Linux userland API discussions

Linux userland API discussions
 help / color / mirror / Atom feed

* [RFC PATCH v5 08/15] kernel/api: add API specification for io_cancel
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
  To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/aio.c | 246 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 237 insertions(+), 9 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index f6f1b3790c88b..710517c9a990d 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -2843,15 +2843,243 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id,
 }
 #endif
 
-/* sys_io_cancel:
- *	Attempts to cancel an iocb previously passed to io_submit.  If
- *	the operation is successfully cancelled, the resulting event is
- *	copied into the memory pointed to by result without being placed
- *	into the completion queue and 0 is returned.  May fail with
- *	-EFAULT if any of the data structures pointed to are invalid.
- *	May fail with -EINVAL if aio_context specified by ctx_id is
- *	invalid.  May fail with -EAGAIN if the iocb specified was not
- *	cancelled.  Will fail with -ENOSYS if not implemented.
+/**
+ * sys_io_cancel - Attempt to cancel an outstanding asynchronous I/O operation
+ * @ctx_id: AIO context handle returned by io_setup
+ * @iocb: Pointer to the iocb structure that was previously submitted
+ * @result: Unused parameter (historically for result storage, now ignored)
+ *
+ * long-desc: Attempts to cancel an asynchronous I/O operation that was
+ *   previously submitted via io_submit(). The syscall searches for the
+ *   specified iocb in the context's active request list and invokes the
+ *   operation-specific cancellation callback if found.
+ *
+ *   The cancellation behavior depends on the type of I/O operation:
+ *   - For poll operations (IOCB_CMD_POLL): The request is marked as cancelled
+ *     and a work item is scheduled to complete the cancellation.
+ *   - For USB gadget I/O: The USB endpoint dequeue function is called, which
+ *     triggers the completion callback with -ECONNRESET status.
+ *   - For most direct I/O operations: Cancellation is typically not supported
+ *     as these operations do not register a cancel callback.
+ *
+ *   If the iocb is found and has a registered cancellation callback, that
+ *   callback is invoked and the iocb is removed from the active request list.
+ *   The completion event is delivered via the ring buffer (not via the result
+ *   parameter, which is now unused for this purpose).
+ *
+ *   On successful cancellation initiation, the syscall returns -EINPROGRESS
+ *   (not 0) to indicate that cancellation is in progress. This is because
+ *   the actual completion may occur asynchronously via the cancel callback.
+ *
+ *   Important limitations:
+ *   - Most file I/O operations do not support cancellation
+ *   - The iocb must still be pending (not yet completed)
+ *   - The iocb must have been submitted via io_submit (aio_key == KIOCB_KEY)
+ *   - Only operations that register a ki_cancel callback can be cancelled
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_ATOMIC
+ *
+ * param: ctx_id
+ *   type: KAPI_TYPE_UINT
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_CUSTOM
+ *   constraint: Must be a valid AIO context handle previously returned by
+ *     io_setup() for the current process. The context must not have been
+ *     destroyed via io_destroy(). A value of 0 is always invalid. The handle
+ *     is actually the virtual address of the ring buffer mapping, and must
+ *     belong to the calling process's address space.
+ *
+ * param: iocb
+ *   type: KAPI_TYPE_USER_PTR
+ *   flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ *   size: sizeof(struct iocb)
+ *   constraint-type: KAPI_CONSTRAINT_USER_PTR
+ *   constraint: Must be a valid userspace pointer to a struct iocb that was
+ *     previously submitted via io_submit(). The iocb's aio_key field must
+ *     contain KIOCB_KEY (0), which is written by the kernel during io_submit.
+ *     A NULL pointer will result in EFAULT. The iocb must still be pending
+ *     (present in the context's active_reqs list) for cancellation to succeed.
+ *
+ * param: result
+ *   type: KAPI_TYPE_USER_PTR
+ *   flags: KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL
+ *   constraint-type: KAPI_CONSTRAINT_NONE
+ *   constraint: This parameter is no longer used by the kernel. It was
+ *     historically intended to receive the io_event result on successful
+ *     cancellation, but completion events are now always delivered via the
+ *     ring buffer. May be NULL.
+ *
+ * return:
+ *   type: KAPI_TYPE_INT
+ *   check-type: KAPI_RETURN_ERROR_CHECK
+ *   success: -EINPROGRESS
+ *   desc: Returns -EINPROGRESS when the cancellation callback was successfully
+ *     invoked and the request is being cancelled. This is the expected return
+ *     value on successful cancellation initiation. The completion event will
+ *     be delivered via the ring buffer. Note that this is different from the
+ *     man page which claims 0 is returned on success.
+ *
+ * error: EFAULT, Cannot read iocb from userspace
+ *   desc: Returned if the iocb pointer is invalid or points to memory that
+ *     cannot be read. Specifically, the kernel attempts to read the aio_key
+ *     field from the iocb via get_user() and returns EFAULT if this fails.
+ *     A NULL iocb pointer will trigger this error.
+ *
+ * error: EINVAL, iocb not submitted via io_submit
+ *   desc: Returned if the aio_key field of the iocb does not contain KIOCB_KEY
+ *     (which is 0). The kernel sets aio_key to KIOCB_KEY when an iocb is
+ *     successfully submitted via io_submit(). If aio_key contains a different
+ *     value, it indicates the iocb was never successfully submitted, is
+ *     corrupted, or the memory has been reused.
+ *
+ * error: EINVAL, Invalid AIO context
+ *   desc: Returned if ctx_id does not refer to a valid AIO context. This can
+ *     occur if: (1) the context was never created, (2) the context was
+ *     destroyed via io_destroy(), (3) the ctx_id is 0, (4) the ring buffer
+ *     header cannot be read from userspace, (5) the context belongs to a
+ *     different process, or (6) the context's internal ID doesn't match.
+ *
+ * error: EINVAL, iocb not found or not cancellable
+ *   desc: Returned if the specified iocb is not present in the context's
+ *     active request list. This occurs when: (1) the operation has already
+ *     completed and the completion event is in the ring buffer, (2) the
+ *     operation was never submitted to this context, (3) the iocb pointer
+ *     does not match any pending operation (comparison is by pointer value
+ *     converted to u64), or (4) the operation did not register a cancellation
+ *     callback (though in this case EINVAL comes from the default ret value).
+ *     Note: The man page documents EAGAIN for this case, but the actual
+ *     implementation returns EINVAL.
+ *
+ * error: ENOSYS, AIO not implemented
+ *   desc: Returned if the kernel was compiled without CONFIG_AIO support.
+ *     This error is returned by the syscall dispatch mechanism before the
+ *     io_cancel implementation is even reached.
+ *
+ * error: (driver-specific), Cancellation callback failed
+ *   desc: If the iocb is found and its ki_cancel callback is invoked, the
+ *     callback's return value is propagated to userspace if non-zero. For
+ *     USB gadget operations, usb_ep_dequeue() may return various errors
+ *     including EINVAL if the request wasn't queued. The aio_poll_cancel
+ *     callback always returns 0. Driver-specific cancellation functions
+ *     may return other error codes.
+ *
+ * lock: RCU read lock
+ *   type: KAPI_LOCK_RCU
+ *   desc: Acquired in lookup_ioctx() during context lookup. Protects against
+ *     concurrent modification of the mm->ioctx_table while searching for the
+ *     context. Released before any spinlocks are acquired.
+ *
+ * lock: ctx->ctx_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   desc: Per-context spinlock acquired with interrupts disabled via
+ *     spin_lock_irq(). Held while iterating through the active_reqs list
+ *     searching for the iocb, while invoking the ki_cancel callback, and
+ *     while removing the iocb from the list. The cancel callback is invoked
+ *     with this lock held, so callbacks must not sleep and must be IRQ-safe.
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: ctx->active_reqs list
+ *   desc: If the iocb is found and its cancellation callback is invoked, the
+ *     kiocb is removed from the context's active_reqs list via list_del_init().
+ *     This prevents the iocb from being found by subsequent io_cancel calls.
+ *   condition: iocb found and ki_cancel callback invoked
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: Pending I/O operation
+ *   desc: The cancellation callback may modify the state of the underlying
+ *     I/O operation. For poll operations, the cancelled flag is set. For USB
+ *     operations, the USB request is dequeued which triggers the completion
+ *     callback. The completion event is delivered via the ring buffer.
+ *   condition: ki_cancel callback is invoked
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_SCHEDULE
+ *   target: aio_poll work queue
+ *   desc: For poll operations (IOCB_CMD_POLL), the aio_poll_cancel callback
+ *     schedules a work item via schedule_work() to complete the cancellation
+ *     asynchronously. This work item will eventually deliver the completion
+ *     event to the ring buffer.
+ *   condition: Cancelling a poll operation
+ *   reversible: no
+ *
+ * state-trans: kiocb state
+ *   from: in_flight (in active_reqs list)
+ *   to: cancelling (removed from list, cancel callback invoked)
+ *   condition: iocb found and ki_cancel invoked
+ *   desc: When the iocb is found in the active_reqs list and its cancellation
+ *     callback is invoked, the kiocb transitions from in-flight to cancelling
+ *     state. The kiocb is removed from the active_reqs list, preventing
+ *     duplicate cancellation attempts. Final completion occurs asynchronously.
+ *
+ * state-trans: poll_iocb cancelled flag
+ *   from: false
+ *   to: true
+ *   condition: aio_poll_cancel is invoked
+ *   desc: For poll operations, the aio_poll_cancel callback sets the cancelled
+ *     flag on the poll_iocb structure. This signals to the poll completion
+ *     handler that the operation was cancelled rather than completed normally.
+ *
+ * constraint: Operation must support cancellation
+ *   desc: Only operations that register a ki_cancel callback can be cancelled.
+ *     Operations that don't set this callback (most direct I/O operations)
+ *     will never appear in the active_reqs list and thus cannot be cancelled.
+ *     Currently, only IOCB_CMD_POLL operations in the kernel AIO subsystem
+ *     and USB gadget operations support cancellation.
+ *
+ * constraint: Timing window for cancellation
+ *   desc: The iocb must still be pending at the time io_cancel is called.
+ *     There is an inherent race condition: the operation may complete
+ *     naturally between the time the application decides to cancel and when
+ *     io_cancel is invoked. In this case, EINVAL is returned because the
+ *     iocb is no longer in the active_reqs list.
+ *
+ * constraint: CONFIG_AIO required
+ *   desc: The kernel must be compiled with CONFIG_AIO=y for this syscall
+ *     to be available. If not configured, ENOSYS is returned.
+ *
+ * examples: io_cancel(ctx, &iocb, NULL);  // Cancel with unused result param
+ *   if (io_cancel(ctx, &iocb, NULL) == -EINPROGRESS) handle_cancellation();
+ *   ret = io_cancel(ctx, &iocb, NULL); if (ret && ret != -EINPROGRESS) error();
+ *
+ * notes: The return value semantics are unusual: -EINPROGRESS indicates
+ *   successful cancellation initiation, not an error. This is because the
+ *   actual cancellation may complete asynchronously, with the completion
+ *   event delivered via the ring buffer.
+ *
+ *   The result parameter is completely ignored by current kernels. It was
+ *   historically used to return the io_event directly, but since commit
+ *   28468cbed92e ("Revert 'fs/aio: Make io_cancel() generate completions
+ *   again'"), completion events are always delivered via the ring buffer.
+ *   Applications should use io_getevents() to retrieve the cancelled
+ *   operation's completion event.
+ *
+ *   The man page documents EAGAIN as a possible error when "the iocb specified
+ *   was not cancelled", but code analysis shows that EINVAL is actually
+ *   returned in this case. The man page is outdated in this regard.
+ *
+ *   The aio_key field must equal KIOCB_KEY (0) because the kernel writes this
+ *   value during io_submit. If an application attempts to cancel an iocb
+ *   before submitting it, or after the memory has been reused, this check
+ *   will fail with EINVAL.
+ *
+ *   For poll operations specifically, the cancellation is marked but the
+ *   actual completion may be delayed until a worker processes it. The
+ *   -EINPROGRESS return value reflects this asynchronous completion model.
+ *
+ *   USB gadget operations are an exception: when usb_ep_dequeue() is called,
+ *   it typically completes the request synchronously with -ECONNRESET status
+ *   in the completion callback.
+ *
+ *   There is no glibc wrapper for this syscall. Applications must use
+ *   syscall(SYS_io_cancel, ...) or the libaio library. The libaio wrapper
+ *   returns negative error numbers directly rather than returning -1 and
+ *   setting errno.
+ *
+ *   io_uring (since Linux 5.1) provides a more capable and widely-supported
+ *   async I/O interface with better cancellation support via IORING_OP_ASYNC_CANCEL.
+ *
+ * since-version: 2.5
  */
 SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
 		struct io_event __user *, result)
-- 
2.51.0


^ permalink raw reply related

* [RFC PATCH v5 07/15] kernel/api: add API specification for io_submit
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
  To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/aio.c | 319 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 308 insertions(+), 11 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index ff2a8527e1b85..f6f1b3790c88b 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -2450,17 +2450,314 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	return err;
 }
 
-/* sys_io_submit:
- *	Queue the nr iocbs pointed to by iocbpp for processing.  Returns
- *	the number of iocbs queued.  May return -EINVAL if the aio_context
- *	specified by ctx_id is invalid, if nr is < 0, if the iocb at
- *	*iocbpp[0] is not properly initialized, if the operation specified
- *	is invalid for the file descriptor in the iocb.  May fail with
- *	-EFAULT if any of the data structures point to invalid data.  May
- *	fail with -EBADF if the file descriptor specified in the first
- *	iocb is invalid.  May fail with -EAGAIN if insufficient resources
- *	are available to queue any iocbs.  Will return 0 if nr is 0.  Will
- *	fail with -ENOSYS if not implemented.
+/**
+ * sys_io_submit - Submit asynchronous I/O operations for processing
+ * @ctx_id: AIO context handle returned by io_setup
+ * @nr: Number of I/O control blocks to submit
+ * @iocbpp: Array of pointers to iocb structures describing the operations
+ *
+ * long-desc: Submits one or more asynchronous I/O operations for processing
+ *   against a previously created AIO context. Each iocb structure describes
+ *   a single I/O operation including the operation type, file descriptor,
+ *   buffer, size, and offset.
+ *
+ *   The syscall processes iocbs sequentially from the array. If an error
+ *   occurs while processing an iocb, submission stops at that point and
+ *   the number of successfully submitted operations is returned. This means
+ *   partial submission is possible: if submitting 10 iocbs and the 5th fails,
+ *   4 is returned and iocbs 0-3 are queued for processing.
+ *
+ *   Supported operations (specified via aio_lio_opcode):
+ *   - IOCB_CMD_PREAD (0): Positioned read from file
+ *   - IOCB_CMD_PWRITE (1): Positioned write to file
+ *   - IOCB_CMD_FSYNC (2): Sync file data and metadata
+ *   - IOCB_CMD_FDSYNC (3): Sync file data only
+ *   - IOCB_CMD_POLL (5): Poll for events on file descriptor
+ *   - IOCB_CMD_NOOP (6): No operation (useful for testing)
+ *   - IOCB_CMD_PREADV (7): Positioned scatter read
+ *   - IOCB_CMD_PWRITEV (8): Positioned gather write
+ *
+ *   The iocb structure fields include:
+ *   - aio_data: User data copied to io_event on completion
+ *   - aio_lio_opcode: Operation type (one of IOCB_CMD_*)
+ *   - aio_fildes: File descriptor for the operation
+ *   - aio_buf: Buffer address (or iovec array for vectored ops)
+ *   - aio_nbytes: Buffer size (or iovec count for vectored ops)
+ *   - aio_offset: File offset for positioned operations
+ *   - aio_flags: Optional flags (IOCB_FLAG_RESFD, IOCB_FLAG_IOPRIO)
+ *   - aio_resfd: eventfd to signal on completion (if IOCB_FLAG_RESFD set)
+ *   - aio_rw_flags: Per-operation RWF_* flags
+ *   - aio_reqprio: I/O priority (if IOCB_FLAG_IOPRIO set)
+ *
+ *   After successful submission, operations complete asynchronously. Results
+ *   are delivered to the completion ring buffer and can be retrieved via
+ *   io_getevents(). If aio_resfd specifies a valid eventfd, it is signaled
+ *   when each operation completes.
+ *
+ *   The actual I/O may complete synchronously if the data is cached or if
+ *   the underlying filesystem doesn't support truly asynchronous I/O. In
+ *   such cases, the operation is still reported via the completion ring.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: ctx_id
+ *   type: KAPI_TYPE_UINT
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_CUSTOM
+ *   constraint: Must be a valid AIO context handle previously returned by
+ *     io_setup() for the current process. The context must not have been
+ *     destroyed. A value of 0 is always invalid. The handle is actually
+ *     the virtual address of the ring buffer mapping.
+ *
+ * param: nr
+ *   type: KAPI_TYPE_INT
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_RANGE
+ *   range: 0, LONG_MAX
+ *   constraint: Must be >= 0. If 0, the syscall returns immediately with 0.
+ *     The actual number processed is capped to ctx->nr_events (the context's
+ *     capacity). Very large values are effectively limited by the context
+ *     capacity and available ring buffer slots.
+ *
+ * param: iocbpp
+ *   type: KAPI_TYPE_USER_PTR
+ *   flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ *   constraint-type: KAPI_CONSTRAINT_CUSTOM
+ *   constraint: Must be a valid userspace pointer to an array of nr pointers
+ *     to struct iocb. Each iocb pointer must itself be valid and point to a
+ *     properly initialized iocb structure. The iocb structures must have
+ *     aio_reserved2 set to 0 for forward compatibility.
+ *
+ * return:
+ *   type: KAPI_TYPE_INT
+ *   check-type: KAPI_RETURN_RANGE
+ *   success: >= 0
+ *   desc: Returns the number of iocbs successfully submitted (0 to nr). If
+ *     partial submission occurs due to an error, returns the count of
+ *     successfully submitted operations. Returns 0 if nr is 0.
+ *
+ * error: EINVAL, Invalid context or parameter
+ *   desc: Returned if ctx_id is invalid, nr is negative, aio_reserved2 is
+ *     non-zero, aio_lio_opcode is invalid, aio_buf/aio_nbytes overflow,
+ *     aio_resfd is not an eventfd, conflicting aio_rw_flags, file lacks
+ *     required operation support, invalid POLL/FSYNC parameters, or
+ *     invalid aio_reqprio class.
+ *
+ * error: EFAULT, Invalid memory access
+ *   desc: Returned if: (1) iocbpp is not a valid userspace pointer, (2) any
+ *     pointer in the iocbpp array is invalid, (3) the iocb data cannot be
+ *     copied from userspace, (4) aio_buf points to invalid memory, or
+ *     (5) the kernel cannot write the aio_key field back to userspace.
+ *
+ * error: EBADF, Bad file descriptor
+ *   desc: Returned if: (1) aio_fildes in an iocb does not refer to an open
+ *     file, (2) aio_resfd does not refer to a valid file descriptor when
+ *     IOCB_FLAG_RESFD is set, (3) the file is not opened with appropriate
+ *     mode for the operation (e.g., read on write-only file).
+ *
+ * error: EAGAIN, Resource temporarily unavailable
+ *   desc: Returned if insufficient slots are available in the completion
+ *     ring buffer. This typically means too many operations are already
+ *     in flight and the application should call io_getevents() to consume
+ *     completed events before submitting more.
+ *
+ * error: EPERM, Operation not permitted
+ *   desc: Returned if: (1) IOCB_FLAG_IOPRIO is set and aio_reqprio specifies
+ *     IOPRIO_CLASS_RT (real-time I/O priority) but the process lacks
+ *     CAP_SYS_ADMIN or CAP_SYS_NICE capability, or (2) RWF_NOAPPEND is
+ *     specified but the file has the append-only attribute (IS_APPEND).
+ *
+ * error: EOPNOTSUPP, Operation not supported
+ *   desc: Returned if: (1) unsupported aio_rw_flags are specified, (2)
+ *     RWF_NOWAIT is specified but the file doesn't support non-blocking I/O
+ *     (FMODE_NOWAIT not set), (3) RWF_ATOMIC is specified for a read or
+ *     the file doesn't support atomic writes, or (4) RWF_DONTCACHE is
+ *     specified but not supported by the filesystem or file is DAX-mapped.
+ *
+ * error: EOVERFLOW, Value too large
+ *   desc: Returned if aio_offset plus aio_nbytes would overflow and the
+ *     file does not support unsigned offsets. This check prevents reading
+ *     or writing past the maximum representable file position.
+ *
+ * error: ENOMEM, Out of memory
+ *   desc: Returned if memory allocation fails when preparing credentials
+ *     for IOCB_CMD_FSYNC operations, or if vectored I/O (preadv/pwritev)
+ *     requires allocating iovec arrays larger than the stack buffer.
+ *
+ * lock: RCU read lock
+ *   type: KAPI_LOCK_RCU
+ *   desc: Acquired during context lookup in lookup_ioctx(). Protects against
+ *     concurrent modification of the ioctx_table while looking up the
+ *     context. Released before processing any iocbs.
+ *
+ * lock: ctx->completion_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   desc: Per-context spinlock acquired briefly during request slot allocation
+ *     via user_refill_reqs_available() if the percpu request counter is empty.
+ *     Protects the ring buffer tail and completed_events counters.
+ *
+ * lock: ctx->ctx_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   desc: Per-context spinlock acquired when adding cancellable requests to
+ *     the active_reqs list. This enables io_cancel() to find and cancel
+ *     in-flight operations.
+ *
+ * lock: blk_plug
+ *   type: KAPI_LOCK_CUSTOM
+ *   desc: Block layer plugging is enabled when nr > 2 (AIO_PLUG_THRESHOLD)
+ *     to batch block I/O requests for better performance. This is not a
+ *     traditional lock but affects I/O scheduling.
+ *
+ * signal: any
+ *   direction: KAPI_SIGNAL_RECEIVE
+ *   action: KAPI_SIGNAL_ACTION_TRANSFORM
+ *   condition: Signal arrives during underlying read/write operation
+ *   desc: If a signal arrives during the underlying file read/write operation
+ *     and the operation returns ERESTARTSYS/ERESTARTNOINTR/etc., the error
+ *     is transformed to EINTR for the completion event. AIO operations cannot
+ *     be restarted in the traditional sense because other operations may have
+ *     already been submitted. The syscall itself (io_submit) is NOT interrupted
+ *     by signals - only the individual async operations can be.
+ *   error: -EINTR (in io_event.res, not syscall return)
+ *   timing: KAPI_SIGNAL_TIME_DURING
+ *   restartable: no
+ *
+ * side-effect: KAPI_EFFECT_ALLOC_MEMORY
+ *   target: aio_kiocb structures
+ *   desc: Allocates one aio_kiocb structure per submitted operation from the
+ *     kiocb_cachep slab cache. These structures track the in-flight operations
+ *     and are freed after completion is recorded in the ring buffer.
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: AIO context request counters
+ *   desc: Decrements the available request slot counter in the context.
+ *     Slots are reclaimed when completion events are consumed from the ring
+ *     buffer via io_getevents().
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: ctx->active_reqs list
+ *   desc: Cancellable operations (reads, writes, polls) are added to the
+ *     context's active_reqs list, enabling cancellation via io_cancel().
+ *   condition: Operation supports cancellation
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: iocb->aio_key field
+ *   desc: The kernel writes KIOCB_KEY (0) to the aio_key field of each
+ *     submitted iocb in userspace memory. This marks the iocb as submitted
+ *     and is checked by io_cancel() to validate the iocb.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: file reference count
+ *   desc: Increments the reference count of the file descriptor's struct file
+ *     via fget() for each submitted operation. The reference is released
+ *     when the operation completes (via fput() in iocb_destroy()).
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ *   target: target file(s)
+ *   desc: For write operations, the file content may be modified. For fsync
+ *     operations, dirty data is flushed to storage. The actual I/O may
+ *     complete synchronously or asynchronously depending on the filesystem.
+ *   condition: IOCB_CMD_PWRITE, IOCB_CMD_PWRITEV, IOCB_CMD_FSYNC, IOCB_CMD_FDSYNC
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_SCHEDULE
+ *   target: fsync work queue
+ *   desc: FSYNC and FDSYNC operations are scheduled to run on a workqueue
+ *     because vfs_fsync() can block. The operation runs asynchronously and
+ *     completion is signaled via the ring buffer.
+ *   condition: IOCB_CMD_FSYNC or IOCB_CMD_FDSYNC
+ *   reversible: no
+ *
+ * state-trans: iocb state
+ *   from: user-prepared iocb
+ *   to: submitted (aio_key set to KIOCB_KEY)
+ *   condition: successful submission of each iocb
+ *   desc: Each successfully submitted iocb transitions from user-prepared
+ *     state to submitted state, marked by the kernel writing KIOCB_KEY to
+ *     aio_key. The iocb remains in submitted state until completion.
+ *
+ * state-trans: AIO context slot availability
+ *   from: slots_available = N
+ *   to: slots_available = N - submitted_count
+ *   condition: successful submission
+ *   desc: Available slots in the context decrease by the number of successfully
+ *     submitted operations. Slots are reclaimed when io_getevents() consumes
+ *     completion events.
+ *
+ * capability: CAP_SYS_ADMIN
+ *   type: KAPI_CAP_GRANT_PERMISSION
+ *   allows: Use of IOPRIO_CLASS_RT (real-time I/O priority class)
+ *   without: Returns EPERM when attempting to use RT I/O priority
+ *   condition: IOCB_FLAG_IOPRIO set and aio_reqprio specifies IOPRIO_CLASS_RT
+ *
+ * capability: CAP_SYS_NICE
+ *   type: KAPI_CAP_GRANT_PERMISSION
+ *   allows: Use of IOPRIO_CLASS_RT (alternative to CAP_SYS_ADMIN)
+ *   without: Returns EPERM when attempting to use RT I/O priority
+ *   condition: IOCB_FLAG_IOPRIO set and aio_reqprio specifies IOPRIO_CLASS_RT
+ *
+ * constraint: Ring buffer slot availability
+ *   desc: There must be available slots in the completion ring buffer for
+ *     each operation to be submitted. If all slots are occupied by pending
+ *     completion events, submission fails with EAGAIN. The number of slots
+ *     is determined by nr_events passed to io_setup(), though internal
+ *     doubling means more slots may be available.
+ *   expr: available_slots >= 1 for each submission
+ *
+ * constraint: Valid file descriptor per iocb
+ *   desc: Each iocb must reference a valid, open file descriptor via
+ *     aio_fildes. The file must be opened with appropriate access mode
+ *     for the requested operation (read access for PREAD, write access
+ *     for PWRITE, etc.).
+ *
+ * constraint: File must support operation
+ *   desc: For read/write operations, the underlying file must implement
+ *     read_iter/write_iter file operations. For fsync, the file must
+ *     implement fsync. For poll, the file must support vfs_poll().
+ *
+ * constraint: CONFIG_AIO required
+ *   desc: The kernel must be compiled with CONFIG_AIO=y for this syscall
+ *     to be available. If not configured, returns -ENOSYS.
+ *
+ * examples: struct iocb iocb, *iocbp = &iocb; io_submit(ctx, 1, &iocbp);
+ *   struct iocb iocbs[10], *ptrs[10]; io_submit(ctx, 10, ptrs);  // Batch submit
+ *
+ * notes: Unlike traditional synchronous I/O, errors from io_submit() indicate
+ *   submission failures, not I/O errors. Actual I/O errors are reported via
+ *   the res field of struct io_event when retrieved via io_getevents().
+ *
+ *   The return value indicates how many iocbs were successfully submitted.
+ *   If this is less than nr, the application should check which operation
+ *   failed (it's the one at index = return_value) and handle the error.
+ *   Previously submitted operations in the batch are still queued.
+ *
+ *   For vectored operations (PREADV/PWRITEV), aio_buf points to an array
+ *   of struct iovec and aio_nbytes contains the iovec count. The maximum
+ *   iovec count is UIO_MAXIOV (1024).
+ *
+ *   Block layer plugging is automatically enabled for batches larger than
+ *   2 operations, improving I/O merging and reducing per-I/O overhead.
+ *
+ *   The COMPAT_SYSCALL variant handles 32-bit userspace on 64-bit kernels,
+ *   using compat_uptr_t for the iocbpp array elements.
+ *
+ *   Historical note: commit d6b2615f7d31d ("aio: simplify - and fix - fget/fput
+ *   for io_submit()") fixed file descriptor reference counting issues. Earlier
+ *   kernels could leak file references on certain error paths.
+ *
+ *   io_uring (since Linux 5.1) is a more modern and performant alternative.
+ *   Consider using io_uring_enter() for new applications requiring async I/O.
+ *
+ *   There is no glibc wrapper; use syscall(SYS_io_submit, ...) or the libaio
+ *   library. The libaio wrapper io_submit() returns negative error numbers
+ *   directly rather than returning -1 and setting errno.
+ *
+ * since-version: 2.5
  */
 SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
 		struct iocb __user * __user *, iocbpp)
-- 
2.51.0


^ permalink raw reply related

* [RFC PATCH v5 06/15] kernel/api: add API specification for io_destroy
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
  To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/aio.c | 189 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 184 insertions(+), 5 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 36556e7a8e2c0..ff2a8527e1b85 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1646,11 +1646,190 @@ COMPAT_SYSCALL_DEFINE2(io_setup, unsigned, nr_events, u32 __user *, ctx32p)
 }
 #endif
 
-/* sys_io_destroy:
- *	Destroy the aio_context specified.  May cancel any outstanding 
- *	AIOs and block on completion.  Will fail with -ENOSYS if not
- *	implemented.  May fail with -EINVAL if the context pointed to
- *	is invalid.
+/**
+ * sys_io_destroy - Destroy an asynchronous I/O context
+ * @ctx: AIO context handle returned by io_setup
+ *
+ * long-desc: Destroys the asynchronous I/O context identified by ctx. This
+ *   syscall will attempt to cancel all outstanding asynchronous I/O operations
+ *   against the context and block until all operations have completed. Once
+ *   this syscall returns successfully, the context handle becomes invalid and
+ *   must not be used with any other io_* syscalls.
+ *
+ *   The context's memory-mapped ring buffer is unmapped from the process address
+ *   space, and all associated kernel resources are freed. The system-wide AIO
+ *   event counter (aio_nr) is decremented by the original nr_events value that
+ *   was passed to io_setup when creating this context.
+ *
+ *   This syscall blocks until all in-flight I/O operations have completed. This
+ *   ensures that userspace buffers passed to io_submit are no longer accessed
+ *   by the kernel after io_destroy returns. The wait is NOT interruptible by
+ *   signals, so callers cannot cancel this blocking behavior.
+ *
+ *   If two threads call io_destroy on the same context simultaneously, only the
+ *   first call will succeed; subsequent calls return -EINVAL as the context is
+ *   already marked as dead.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: ctx
+ *   type: KAPI_TYPE_UINT
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_CUSTOM
+ *   constraint: Must be a valid context handle previously returned by io_setup.
+ *     The handle is actually the virtual address of the ring buffer mapping in
+ *     the calling process's address space. A value of 0 is always invalid.
+ *     The context must not have been previously destroyed.
+ *
+ * return:
+ *   type: KAPI_TYPE_INT
+ *   check-type: KAPI_RETURN_ERROR_CHECK
+ *   success: 0
+ *   desc: Returns 0 on success. After successful return, the context handle is
+ *     invalid and all resources have been released. All outstanding I/O
+ *     operations have completed.
+ *
+ * error: EINVAL, Invalid context
+ *   desc: The ctx argument does not refer to a valid AIO context in the calling
+ *     process. This can occur if: (1) ctx was never returned by io_setup,
+ *     (2) ctx was returned by io_setup in a different process, (3) ctx was
+ *     already destroyed by a previous io_destroy call, (4) ctx is 0 or an
+ *     arbitrary invalid value, or (5) the ring buffer at the ctx address has
+ *     been corrupted (e.g., the id field no longer matches).
+ *
+ * lock: mm->ioctx_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   desc: Per-mm spinlock protecting the ioctx_table. Held briefly while
+ *     marking the context as dead and removing it from the process's AIO
+ *     context table.
+ *
+ * lock: RCU read lock
+ *   type: KAPI_LOCK_RCU
+ *   desc: RCU read-side critical section held during context lookup in
+ *     lookup_ioctx(). Protects against concurrent modification of the
+ *     ioctx_table.
+ *
+ * lock: ctx->ctx_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   desc: Per-context spinlock held while cancelling outstanding I/O requests
+ *     in free_ioctx_users(). Protects the active_reqs list.
+ *
+ * lock: mmap_lock
+ *   type: KAPI_LOCK_RWLOCK
+ *   desc: Process memory map write lock acquired during vm_munmap() when
+ *     unmapping the ring buffer. May contend with other memory operations
+ *     in the same process.
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: ctx->dead flag
+ *   desc: Atomically sets the context's dead flag to 1, marking it as being
+ *     destroyed. This prevents new I/O submissions and ensures subsequent
+ *     io_destroy calls return -EINVAL.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: mm->ioctx_table
+ *   desc: Removes the context from the process's AIO context table by setting
+ *     the corresponding table entry to NULL. After this, lookup_ioctx will
+ *     no longer find this context.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: aio_nr (global counter)
+ *   desc: Decrements the system-wide AIO context counter by the context's
+ *     max_reqs value (the nr_events originally passed to io_setup). This
+ *     counter is visible via /proc/sys/fs/aio-nr.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: process virtual memory
+ *   desc: Unmaps the ring buffer from the process's address space via
+ *     vm_munmap(). The memory region at ctx becomes invalid.
+ *   condition: ctx->mmap_size > 0
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FREE_MEMORY
+ *   target: kioctx structure and associated resources
+ *   desc: Frees the AIO context structure, percpu data, ring buffer pages, and
+ *     the anonymous file backing the ring buffer. Deferred via RCU work queue
+ *     to ensure safe cleanup after all references are dropped.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_SIGNAL_SEND
+ *   target: outstanding AIO operations
+ *   desc: Cancels all outstanding asynchronous I/O operations by invoking their
+ *     ki_cancel callbacks. The specific effect depends on the operation type
+ *     (read, write, fsync, poll).
+ *   condition: active_reqs list is not empty
+ *   reversible: no
+ *
+ * state-trans: AIO context state
+ *   from: alive (ctx->dead == 0)
+ *   to: dead (ctx->dead == 1)
+ *   condition: successful atomic exchange in kill_ioctx
+ *   desc: The context transitions from usable to destroyed. Once dead, the
+ *     context cannot be used for any operations and will be freed after all
+ *     references are dropped.
+ *
+ * state-trans: process AIO state
+ *   from: has AIO context(s)
+ *   to: context removed (or no contexts)
+ *   condition: successful io_destroy
+ *   desc: The destroyed context is removed from the process's context table.
+ *     If this was the only context, the process no longer has any active
+ *     AIO contexts.
+ *
+ * state-trans: system AIO resources
+ *   from: aio_nr = N
+ *   to: aio_nr = N - max_reqs
+ *   condition: successful io_destroy
+ *   desc: System-wide AIO resource counter decreases, making room for other
+ *     processes to create new AIO contexts.
+ *
+ * constraint: CONFIG_AIO required
+ *   desc: The kernel must be compiled with CONFIG_AIO=y for this syscall to be
+ *     available. If not configured, the syscall returns -ENOSYS. This is
+ *     typically enabled by default but may be disabled on embedded systems.
+ *
+ * constraint: Context must belong to calling process
+ *   desc: Each AIO context is bound to a specific process (mm_struct). A context
+ *     created by one process cannot be destroyed by another process, even if
+ *     the context handle value is somehow known.
+ *   expr: ctx belongs to current->mm
+ *
+ * examples: io_destroy(ctx);  // Destroy context and wait for completion
+ *   if (io_destroy(ctx) == -EINVAL) handle_error();  // Invalid context
+ *
+ * notes: The man page documents EFAULT as a possible error, but code analysis
+ *   shows that EFAULT conditions (e.g., invalid ring buffer pointer) actually
+ *   result in EINVAL being returned, as lookup_ioctx returns NULL on any
+ *   failure to access the ring buffer header.
+ *
+ *   This syscall blocks in TASK_UNINTERRUPTIBLE state while waiting for
+ *   outstanding I/O operations to complete. This means the process cannot be
+ *   interrupted by signals during this wait. In extreme cases with very slow
+ *   I/O devices, this could cause the process to appear hung.
+ *
+ *   Historical note: Before kernel 3.11, io_destroy blocked waiting for I/O
+ *   completion. A refactoring in 3.11 accidentally removed this behavior,
+ *   creating a race where userspace buffers could be freed while the kernel
+ *   was still using them. This was fixed by commit e02ba72aabfa that blocks
+ *   io_destroy until all context requests are completed.
+ *
+ *   Race condition handling: A race between io_destroy and io_submit was fixed
+ *   by commit 7137c6bd4552. A race between io_setup and io_destroy was fixed
+ *   by commit 86b62a2cb4fc. Both fixes ensure proper synchronization via
+ *   reference counting.
+ *
+ *   io_uring (since Linux 5.1) is a more modern alternative that provides better
+ *   performance and more features. Consider using io_uring for new applications.
+ *
+ *   There is no glibc wrapper for this syscall. Use syscall(SYS_io_destroy, ctx)
+ *   or the libaio library wrapper io_destroy(). Note: libaio has slightly
+ *   different error semantics, returning negative error numbers directly instead
+ *   of -1 with errno.
+ *
+ * since-version: 2.5
  */
 SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx)
 {
-- 
2.51.0


^ permalink raw reply related

* [RFC PATCH v5 05/15] kernel/api: add API specification for io_setup
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
  To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/aio.c | 228 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 216 insertions(+), 12 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 0a23a8c0717ff..36556e7a8e2c0 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1366,18 +1366,222 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr,
 	return ret;
 }
 
-/* sys_io_setup:
- *	Create an aio_context capable of receiving at least nr_events.
- *	ctxp must not point to an aio_context that already exists, and
- *	must be initialized to 0 prior to the call.  On successful
- *	creation of the aio_context, *ctxp is filled in with the resulting 
- *	handle.  May fail with -EINVAL if *ctxp is not initialized,
- *	if the specified nr_events exceeds internal limits.  May fail 
- *	with -EAGAIN if the specified nr_events exceeds the user's limit 
- *	of available events.  May fail with -ENOMEM if insufficient kernel
- *	resources are available.  May fail with -EFAULT if an invalid
- *	pointer is passed for ctxp.  Will fail with -ENOSYS if not
- *	implemented.
+/**
+ * sys_io_setup - Create an asynchronous I/O context
+ * @nr_events: Minimum number of concurrent AIO operations the context should support
+ * @ctxp: Pointer to aio_context_t variable to receive the context handle
+ *
+ * long-desc: Creates an asynchronous I/O context capable of receiving at least
+ *   nr_events concurrent operations. The context handle is returned via ctxp,
+ *   which must be initialized to 0 prior to the call. The returned context
+ *   handle is used with subsequent AIO operations (io_submit, io_getevents,
+ *   io_cancel, io_destroy).
+ *
+ *   The AIO context consists of a memory-mapped ring buffer shared between
+ *   kernel and userspace for efficient completion notification. The kernel
+ *   internally allocates more capacity than requested to account for percpu
+ *   batching (approximately nr_events * 2, but at least num_cpus * 8).
+ *
+ *   The context is bound to the calling process and cannot be shared across
+ *   processes. Each process can have multiple AIO contexts, limited only by
+ *   the system-wide aio-max-nr sysctl.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: nr_events
+ *   type: KAPI_TYPE_UINT
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_RANGE
+ *   range: 1, 8388608
+ *   constraint: Must be greater than 0. Internal limit of approximately 8M events
+ *     prevents overflow when calculating ring buffer size (0x10000000 / 32 bytes
+ *     per io_event). The kernel may allocate more capacity than requested to
+ *     optimize for percpu batching.
+ *
+ * param: ctxp
+ *   type: KAPI_TYPE_USER_PTR
+ *   flags: KAPI_PARAM_INOUT | KAPI_PARAM_USER
+ *   size: sizeof(aio_context_t)
+ *   constraint-type: KAPI_CONSTRAINT_USER_PTR
+ *   constraint: Must be a valid userspace pointer to an aio_context_t variable.
+ *     The memory pointed to MUST be initialized to 0 before the call. On success,
+ *     receives the context handle (actually the mmap address of the ring buffer).
+ *     The context handle is opaque and should not be interpreted by userspace
+ *     except to pass to other io_* syscalls.
+ *
+ * return:
+ *   type: KAPI_TYPE_INT
+ *   check-type: KAPI_RETURN_ERROR_CHECK
+ *   success: 0
+ *   desc: Returns 0 on success. On success, *ctxp contains the new context handle.
+ *
+ * error: EFAULT, Invalid pointer
+ *   desc: The ctxp pointer is invalid, not accessible, or points to memory that
+ *     cannot be read or written. Returned from get_user() when reading the
+ *     initial value or from put_user() when storing the context handle.
+ *
+ * error: EINVAL, Invalid parameter
+ *   desc: Either *ctxp is not initialized to 0 (indicating an existing context or
+ *     uninitialized memory), or nr_events is 0, or nr_events is too large causing
+ *     internal overflow when calculating ring buffer size. The internal limit is
+ *     approximately 0x10000000 / sizeof(struct io_event) events.
+ *
+ * error: EAGAIN, Resource limit exceeded
+ *   desc: The system-wide limit on AIO contexts would be exceeded. The limit is
+ *     controlled by /proc/sys/fs/aio-max-nr (default 65536). Each context counts
+ *     as nr_events toward this limit. Also returned if nr_events exceeds the
+ *     current aio-max-nr value. Unlike ENOMEM, this error indicates a policy
+ *     limit rather than physical resource exhaustion.
+ *
+ * error: ENOMEM, Insufficient memory
+ *   desc: Kernel could not allocate required memory for the AIO context. This
+ *     includes the kioctx structure, percpu data, ring buffer pages, or the
+ *     anonymous file backing the ring buffer. Also returned if the kernel could
+ *     not establish the memory mapping for the ring buffer, or if ioctx_table
+ *     expansion failed.
+ *
+ * error: EINTR, Interrupted by signal
+ *   desc: A fatal signal was received while attempting to acquire the mmap_lock
+ *     for the ring buffer memory mapping. The operation was aborted before any
+ *     state was modified. Only fatal signals (SIGKILL) can cause this error;
+ *     normal signals like SIGINT do not interrupt the operation.
+ *
+ * lock: aio_nr_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   desc: Global spinlock protecting the system-wide aio_nr counter. Held briefly
+ *     to check and update the system-wide AIO context count.
+ *
+ * lock: mm->ioctx_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   desc: Per-mm spinlock protecting the ioctx_table. Held while adding the new
+ *     context to the process's AIO context table.
+ *
+ * lock: ctx->ring_lock
+ *   type: KAPI_LOCK_MUTEX
+ *   desc: Per-context mutex protecting ring buffer setup. Held throughout context
+ *     initialization to prevent page migration during setup, then released once
+ *     the context is fully initialized.
+ *
+ * lock: mmap_lock
+ *   type: KAPI_LOCK_RWLOCK
+ *   desc: Process memory map write lock. Acquired via mmap_write_lock_killable()
+ *     during ring buffer mmap operation. This is where EINTR can occur.
+ *
+ * signal: SIGKILL
+ *   direction: KAPI_SIGNAL_RECEIVE
+ *   action: KAPI_SIGNAL_ACTION_RETURN
+ *   condition: Fatal signal pending during mmap_write_lock_killable
+ *   desc: Fatal signals can interrupt the context creation during the mmap phase.
+ *     The mmap_write_lock_killable() function checks for fatal signals and returns
+ *     -EINTR if one is pending. Non-fatal signals do not interrupt this syscall.
+ *   error: -EINTR
+ *   timing: KAPI_SIGNAL_TIME_DURING
+ *   priority: 0
+ *   restartable: no
+ *
+ * side-effect: KAPI_EFFECT_ALLOC_MEMORY
+ *   target: kioctx structure
+ *   desc: Allocates the main AIO context structure from kioctx_cachep slab cache.
+ *     Contains ring buffer metadata, locks, and request tracking.
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_ALLOC_MEMORY
+ *   target: percpu kioctx_cpu structures
+ *   desc: Allocates per-CPU structures for request batching via alloc_percpu().
+ *     Used to reduce contention on the global request counter.
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_ALLOC_MEMORY
+ *   target: ring buffer pages
+ *   desc: Allocates pages for the completion event ring buffer. The ring is backed
+ *     by an anonymous file on the internal "aio" filesystem and memory-mapped into
+ *     the process address space.
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_RESOURCE_CREATE
+ *   target: anonymous inode and file
+ *   desc: Creates an anonymous inode and file on the internal aio filesystem to
+ *     back the ring buffer mapping. This enables proper page migration support.
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: process virtual memory
+ *   desc: Creates a new memory mapping (VMA) for the ring buffer in the process
+ *     address space. The mapping is marked VM_DONTEXPAND and uses aio_ring_vm_ops.
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: mm->ioctx_table
+ *   desc: Adds the new context to the process's AIO context table. The table is
+ *     dynamically expanded if needed (grows by 4x each time).
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: aio_nr (global counter)
+ *   desc: Increments the system-wide AIO context counter by nr_events. This counter
+ *     is visible via /proc/sys/fs/aio-nr and counts toward the aio-max-nr limit.
+ *   reversible: yes
+ *
+ * state-trans: process AIO state
+ *   from: no AIO context (or fewer contexts)
+ *   to: has AIO context
+ *   condition: successful io_setup
+ *   desc: Process gains an AIO context that can be used for asynchronous I/O
+ *     operations. The context remains until explicitly destroyed via io_destroy
+ *     or process exit.
+ *
+ * state-trans: system AIO resources
+ *   from: aio_nr = N
+ *   to: aio_nr = N + nr_events
+ *   condition: successful io_setup
+ *   desc: System-wide AIO resource counter increases. The counter tracks total
+ *     requested AIO capacity across all processes.
+ *
+ * constraint: System-wide AIO limit (aio-max-nr)
+ *   desc: The /proc/sys/fs/aio-max-nr sysctl (default 65536) limits the total
+ *     number of AIO events system-wide. Each io_setup call adds nr_events to
+ *     the aio_nr counter. If aio_nr + nr_events would exceed aio_max_nr, the
+ *     call fails with EAGAIN. Administrators can increase aio-max-nr if needed.
+ *   expr: aio_nr + nr_events <= aio_max_nr
+ *
+ * constraint: Per-process context limit
+ *   desc: Each process can have multiple AIO contexts, limited only by the
+ *     system-wide aio-max-nr limit and available memory. The ioctx_table grows
+ *     dynamically to accommodate new contexts.
+ *
+ * constraint: CONFIG_AIO required
+ *   desc: The kernel must be compiled with CONFIG_AIO=y for this syscall to be
+ *     available. If not configured, the syscall returns -ENOSYS. This is typically
+ *     enabled by default but may be disabled on embedded systems.
+ *
+ * constraint: Memory for ring buffer
+ *   desc: The kernel must be able to allocate sufficient contiguous pages for the
+ *     ring buffer and establish the memory mapping. Large nr_events values require
+ *     more memory and may fail with ENOMEM on memory-constrained systems.
+ *
+ * examples: aio_context_t ctx = 0; io_setup(128, &ctx);  // Create context for 128 events
+ *   aio_context_t ctx = 0; io_setup(1024, &ctx);  // Create context for 1024 events
+ *
+ * notes: The returned context handle is actually the virtual address of the ring
+ *   buffer mapping in the process address space. This allows userspace libraries
+ *   to directly access completion events without syscall overhead in some cases.
+ *
+ *   The kernel internally doubles nr_events and ensures a minimum of num_cpus * 8
+ *   events for percpu batching efficiency. This means the actual ring capacity may
+ *   be significantly larger than requested.
+ *
+ *   Historical note: A race condition between io_setup and io_destroy was fixed
+ *   in commit 86b62a2cb4fc ("aio: fix io_setup/io_destroy race"). Earlier kernels
+ *   could have the context freed while io_setup was still completing.
+ *
+ *   io_uring (since Linux 5.1) is a more modern alternative that provides better
+ *   performance and more features. Consider using io_uring for new applications.
+ *
+ *   There is no glibc wrapper for this syscall. Use syscall(SYS_io_setup, ...) or
+ *   the libaio library wrapper (note: libaio has slightly different error semantics,
+ *   returning negative error numbers directly instead of -1 with errno).
+ *
+ * since-version: 2.5
  */
 SYSCALL_DEFINE2(io_setup, unsigned, nr_events, aio_context_t __user *, ctxp)
 {
-- 
2.51.0


^ permalink raw reply related

* [RFC PATCH v5 04/15] tools/kapi: Add kernel API specification extraction tool
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
  To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>

The kapi tool extracts and displays kernel API specifications.

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 Documentation/dev-tools/kernel-api-spec.rst   | 198 +++-
 tools/kapi/.gitignore                         |   4 +
 tools/kapi/Cargo.toml                         |  19 +
 tools/kapi/src/extractor/debugfs.rs           | 442 +++++++++
 tools/kapi/src/extractor/kerneldoc_parser.rs  | 692 ++++++++++++++
 tools/kapi/src/extractor/mod.rs               | 464 ++++++++++
 tools/kapi/src/extractor/source_parser.rs     | 213 +++++
 .../src/extractor/vmlinux/binary_utils.rs     | 180 ++++
 .../src/extractor/vmlinux/magic_finder.rs     | 102 +++
 tools/kapi/src/extractor/vmlinux/mod.rs       | 864 ++++++++++++++++++
 tools/kapi/src/formatter/json.rs              | 468 ++++++++++
 tools/kapi/src/formatter/mod.rs               | 140 +++
 tools/kapi/src/formatter/plain.rs             | 549 +++++++++++
 tools/kapi/src/formatter/rst.rs               | 621 +++++++++++++
 tools/kapi/src/main.rs                        | 116 +++
 15 files changed, 5069 insertions(+), 3 deletions(-)
 create mode 100644 tools/kapi/.gitignore
 create mode 100644 tools/kapi/Cargo.toml
 create mode 100644 tools/kapi/src/extractor/debugfs.rs
 create mode 100644 tools/kapi/src/extractor/kerneldoc_parser.rs
 create mode 100644 tools/kapi/src/extractor/mod.rs
 create mode 100644 tools/kapi/src/extractor/source_parser.rs
 create mode 100644 tools/kapi/src/extractor/vmlinux/binary_utils.rs
 create mode 100644 tools/kapi/src/extractor/vmlinux/magic_finder.rs
 create mode 100644 tools/kapi/src/extractor/vmlinux/mod.rs
 create mode 100644 tools/kapi/src/formatter/json.rs
 create mode 100644 tools/kapi/src/formatter/mod.rs
 create mode 100644 tools/kapi/src/formatter/plain.rs
 create mode 100644 tools/kapi/src/formatter/rst.rs
 create mode 100644 tools/kapi/src/main.rs

diff --git a/Documentation/dev-tools/kernel-api-spec.rst b/Documentation/dev-tools/kernel-api-spec.rst
index 3a63f6711e27b..9b452753111ad 100644
--- a/Documentation/dev-tools/kernel-api-spec.rst
+++ b/Documentation/dev-tools/kernel-api-spec.rst
@@ -31,7 +31,9 @@ The framework aims to:
    common programming errors during development and testing.
 
 3. **Support Tooling**: Export API specifications in machine-readable formats for
-   use by static analyzers, documentation generators, and development tools.
+   use by static analyzers, documentation generators, and development tools. The
+   ``kapi`` tool (see `The kapi Tool`_) provides comprehensive extraction and
+   formatting capabilities.
 
 4. **Enhance Debugging**: Provide detailed API information at runtime through debugfs
    for debugging and introspection.
@@ -71,6 +73,13 @@ The framework consists of several key components:
    - Type-safe parameter specifications
    - Context and constraint definitions
 
+5. **kapi Tool** (``tools/kapi/``)
+
+   - Userspace utility for extracting specifications
+   - Multiple input sources (source, binary, debugfs)
+   - Multiple output formats (plain, JSON, RST)
+   - Testing and validation utilities
+
 Data Model
 ----------
 
@@ -344,8 +353,177 @@ Documentation Generation
 ------------------------
 
 The framework exports specifications via debugfs that can be used
-to generate documentation. Tools for automatic documentation generation
-from specifications are planned for future development.
+to generate documentation. The ``kapi`` tool provides comprehensive
+extraction and formatting capabilities for kernel API specifications.
+
+The kapi Tool
+=============
+
+Overview
+--------
+
+The ``kapi`` tool is a userspace utility that extracts and displays kernel API
+specifications from multiple sources. It provides a unified interface to access
+API documentation whether from compiled kernels, source code, or runtime systems.
+
+Installation
+------------
+
+Build the tool from the kernel source tree::
+
+    $ cd tools/kapi
+    $ cargo build --release
+
+    # Optional: Install system-wide
+    $ cargo install --path .
+
+The tool requires Rust and Cargo to build. The binary will be available at
+``tools/kapi/target/release/kapi``.
+
+Command-Line Usage
+------------------
+
+Basic syntax::
+
+    kapi [OPTIONS] [API_NAME]
+
+Options:
+
+- ``--vmlinux <PATH>``: Extract from compiled kernel binary
+- ``--source <PATH>``: Extract from kernel source code
+- ``--debugfs <PATH>``: Extract from debugfs (default: /sys/kernel/debug)
+- ``-f, --format <FORMAT>``: Output format (plain, json, rst)
+- ``-h, --help``: Display help information
+- ``-V, --version``: Display version information
+
+Input Modes
+-----------
+
+**1. Source Code Mode**
+
+Extract specifications directly from kernel source::
+
+    # Scan entire kernel source tree
+    $ kapi --source /path/to/linux
+
+    # Extract from specific file
+    $ kapi --source kernel/sched/core.c
+
+    # Get details for specific API
+    $ kapi --source /path/to/linux sys_sched_yield
+
+**2. Vmlinux Mode**
+
+Extract from compiled kernel with debug symbols::
+
+    # List all APIs in vmlinux
+    $ kapi --vmlinux /boot/vmlinux-5.15.0
+
+    # Get specific syscall details
+    $ kapi --vmlinux ./vmlinux sys_read
+
+**3. Debugfs Mode**
+
+Extract from running kernel via debugfs::
+
+    # Use default debugfs path
+    $ kapi
+
+    # Use custom debugfs mount
+    $ kapi --debugfs /mnt/debugfs
+
+    # Get specific API from running kernel
+    $ kapi sys_write
+
+Output Formats
+--------------
+
+**Plain Text Format** (default)::
+
+    $ kapi sys_read
+
+    Detailed information for sys_read:
+    ==================================
+    Description: Read from a file descriptor
+
+    Detailed Description:
+    Reads up to count bytes from file descriptor fd into the buffer starting at buf.
+
+    Execution Context:
+      - KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+
+    Parameters (3):
+
+    Available since: 1.0
+
+**JSON Format**::
+
+    $ kapi --format json sys_read
+    {
+      "api_details": {
+        "name": "sys_read",
+        "description": "Read from a file descriptor",
+        "long_description": "Reads up to count bytes...",
+        "context_flags": ["KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE"],
+        "since_version": "1.0"
+      }
+    }
+
+**ReStructuredText Format**::
+
+    $ kapi --format rst sys_read
+
+    sys_read
+    ========
+
+    **Read from a file descriptor**
+
+    Reads up to count bytes from file descriptor fd into the buffer...
+
+Usage Examples
+--------------
+
+**Generate complete API documentation**::
+
+    # Export all kernel APIs to JSON
+    $ kapi --source /path/to/linux --format json > kernel-apis.json
+
+    # Generate RST documentation for all syscalls
+    $ kapi --vmlinux ./vmlinux --format rst > syscalls.rst
+
+    # List APIs from specific subsystem
+    $ kapi --source drivers/gpu/drm/
+
+**Integration with other tools**::
+
+    # Find all APIs that can sleep
+    $ kapi --format json | jq '.apis[] | select(.context_flags[] | contains("SLEEPABLE"))'
+
+    # Generate markdown documentation
+    $ kapi --format rst sys_mmap | pandoc -f rst -t markdown
+
+**Debugging and analysis**::
+
+    # Compare API between kernel versions
+    $ diff <(kapi --vmlinux vmlinux-5.10) <(kapi --vmlinux vmlinux-5.15)
+
+    # Check if specific API exists
+    $ kapi --source . my_custom_api || echo "API not found"
+
+Implementation Details
+----------------------
+
+The tool extracts API specifications from three sources:
+
+1. **Source Code**: Parses KAPI specification macros using regular expressions
+2. **Vmlinux**: Reads the ``.kapi_specs`` ELF section from compiled kernels
+3. **Debugfs**: Reads from ``/sys/kernel/debug/kapi/`` filesystem interface
+
+The tool supports all KAPI specification types:
+
+- System calls (``DEFINE_KERNEL_API_SPEC``)
+- IOCTLs (``DEFINE_IOCTL_API_SPEC``)
+- Kernel functions (``KAPI_DEFINE_SPEC``)
 
 IDE Integration
 ---------------
@@ -357,6 +535,11 @@ Modern IDEs can use the JSON export for:
 - Context validation
 - Error code documentation
 
+Example IDE integration::
+
+    # Generate IDE completion data
+    $ kapi --format json > .vscode/kernel-apis.json
+
 Testing Framework
 -----------------
 
@@ -367,6 +550,15 @@ The framework includes test helpers::
     kapi_test_api("kmalloc", test_cases);
     #endif
 
+The kapi tool can verify specifications against implementations::
+
+    # Run consistency tests
+    $ cd tools/kapi
+    $ ./test_consistency.sh
+
+    # Compare source vs binary specifications
+    $ ./compare_all_syscalls.sh
+
 Best Practices
 ==============
 
diff --git a/tools/kapi/.gitignore b/tools/kapi/.gitignore
new file mode 100644
index 0000000000000..1390bfc12686c
--- /dev/null
+++ b/tools/kapi/.gitignore
@@ -0,0 +1,4 @@
+# Rust build artifacts
+/target/
+**/*.rs.bk
+
diff --git a/tools/kapi/Cargo.toml b/tools/kapi/Cargo.toml
new file mode 100644
index 0000000000000..4e6bcb10d132f
--- /dev/null
+++ b/tools/kapi/Cargo.toml
@@ -0,0 +1,19 @@
+[package]
+name = "kapi"
+version = "0.1.0"
+edition = "2024"
+authors = ["Sasha Levin <sashal@kernel.org>"]
+description = "Tool for extracting and displaying kernel API specifications"
+license = "GPL-2.0"
+
+[dependencies]
+goblin = "0.10"
+clap = { version = "4.4", features = ["derive"] }
+anyhow = "1.0"
+serde = { version = "1.0", features = ["derive"] }
+serde_json = "1.0"
+regex = "1.10"
+walkdir = "2.4"
+
+[dev-dependencies]
+tempfile = "3.8"
diff --git a/tools/kapi/src/extractor/debugfs.rs b/tools/kapi/src/extractor/debugfs.rs
new file mode 100644
index 0000000000000..698c51e50438f
--- /dev/null
+++ b/tools/kapi/src/extractor/debugfs.rs
@@ -0,0 +1,442 @@
+use crate::formatter::OutputFormatter;
+use anyhow::{Context, Result, bail};
+use serde::Deserialize;
+use std::fs;
+use std::io::Write;
+use std::path::PathBuf;
+
+use super::{ApiExtractor, ApiSpec, CapabilitySpec, display_api_spec};
+
+#[derive(Deserialize)]
+struct KernelApiJson {
+    name: String,
+    api_type: Option<String>,
+    version: Option<u32>,
+    description: Option<String>,
+    long_description: Option<String>,
+    context_flags: Option<u32>,
+    since_version: Option<String>,
+    examples: Option<String>,
+    notes: Option<String>,
+    capabilities: Option<Vec<KernelCapabilityJson>>,
+}
+
+#[derive(Deserialize)]
+struct KernelCapabilityJson {
+    capability: i32,
+    name: String,
+    action: String,
+    allows: String,
+    without_cap: String,
+    check_condition: Option<String>,
+    priority: Option<u8>,
+    alternatives: Option<Vec<i32>>,
+}
+
+/// Extractor for kernel API specifications from debugfs
+pub struct DebugfsExtractor {
+    debugfs_path: PathBuf,
+}
+
+impl DebugfsExtractor {
+    /// Create a new debugfs extractor with the specified debugfs path
+    pub fn new(debugfs_path: Option<String>) -> Result<Self> {
+        let path = match debugfs_path {
+            Some(p) => PathBuf::from(p),
+            None => PathBuf::from("/sys/kernel/debug"),
+        };
+
+        // Check if the debugfs path exists
+        if !path.exists() {
+            bail!("Debugfs path does not exist: {}", path.display());
+        }
+
+        // Check if kapi directory exists
+        let kapi_path = path.join("kapi");
+        if !kapi_path.exists() {
+            bail!(
+                "Kernel API debugfs interface not found at: {}",
+                kapi_path.display()
+            );
+        }
+
+        Ok(Self { debugfs_path: path })
+    }
+
+    /// Parse the list file to get all available API names
+    fn parse_list_file(&self) -> Result<Vec<String>> {
+        let list_path = self.debugfs_path.join("kapi/list");
+        let content = fs::read_to_string(&list_path)
+            .with_context(|| format!("Failed to read {}", list_path.display()))?;
+
+        let mut apis = Vec::new();
+        let mut in_list = false;
+
+        for line in content.lines() {
+            if line.contains("===") {
+                in_list = true;
+                continue;
+            }
+
+            if in_list && line.starts_with("Total:") {
+                break;
+            }
+
+            if in_list && !line.trim().is_empty() {
+                // Extract API name from lines like "sys_read - Read from a file descriptor"
+                if let Some(name) = line.split(" - ").next() {
+                    apis.push(name.trim().to_string());
+                }
+            }
+        }
+
+        Ok(apis)
+    }
+
+    /// Try to parse JSON content, convert context flags from u32 to string representations
+    fn parse_context_flags(flags: u32) -> Vec<String> {
+        let mut result = Vec::new();
+
+        // These values should match KAPI_CTX_* flags from kernel
+        if flags & (1 << 0) != 0 {
+            result.push("PROCESS".to_string());
+        }
+        if flags & (1 << 1) != 0 {
+            result.push("SOFTIRQ".to_string());
+        }
+        if flags & (1 << 2) != 0 {
+            result.push("HARDIRQ".to_string());
+        }
+        if flags & (1 << 3) != 0 {
+            result.push("NMI".to_string());
+        }
+        if flags & (1 << 4) != 0 {
+            result.push("ATOMIC".to_string());
+        }
+        if flags & (1 << 5) != 0 {
+            result.push("SLEEPABLE".to_string());
+        }
+        if flags & (1 << 6) != 0 {
+            result.push("PREEMPT_DISABLED".to_string());
+        }
+        if flags & (1 << 7) != 0 {
+            result.push("IRQ_DISABLED".to_string());
+        }
+
+        result
+    }
+
+    /// Convert capability action from kernel representation
+    fn parse_capability_action(action: &str) -> String {
+        match action {
+            "bypass_check" => "Bypasses check".to_string(),
+            "increase_limit" => "Increases limit".to_string(),
+            "override_restriction" => "Overrides restriction".to_string(),
+            "grant_permission" => "Grants permission".to_string(),
+            "modify_behavior" => "Modifies behavior".to_string(),
+            "access_resource" => "Allows resource access".to_string(),
+            "perform_operation" => "Allows operation".to_string(),
+            _ => action.to_string(),
+        }
+    }
+
+    /// Try to parse as JSON first
+    fn try_parse_json(&self, content: &str) -> Option<ApiSpec> {
+        let json_data: KernelApiJson = serde_json::from_str(content).ok()?;
+
+        let mut spec = ApiSpec {
+            name: json_data.name,
+            api_type: json_data.api_type.unwrap_or_else(|| "unknown".to_string()),
+            description: json_data.description,
+            long_description: json_data.long_description,
+            version: json_data.version.map(|v| v.to_string()),
+            context_flags: json_data
+                .context_flags
+                .map_or_else(Vec::new, Self::parse_context_flags),
+            param_count: None,
+            error_count: None,
+            examples: json_data.examples,
+            notes: json_data.notes,
+            since_version: json_data.since_version,
+            subsystem: None,   // Not in current JSON format
+            sysfs_path: None,  // Not in current JSON format
+            permissions: None, // Not in current JSON format
+            socket_state: None,
+            protocol_behaviors: vec![],
+            addr_families: vec![],
+            buffer_spec: None,
+            async_spec: None,
+            net_data_transfer: None,
+            capabilities: vec![],
+            parameters: vec![],
+            return_spec: None,
+            errors: vec![],
+            signals: vec![],
+            signal_masks: vec![],
+            side_effects: vec![],
+            state_transitions: vec![],
+            constraints: vec![],
+            locks: vec![],
+            struct_specs: vec![],
+        };
+
+        // Convert capabilities
+        if let Some(caps) = json_data.capabilities {
+            for cap in caps {
+                spec.capabilities.push(CapabilitySpec {
+                    capability: cap.capability,
+                    name: cap.name,
+                    action: Self::parse_capability_action(&cap.action),
+                    allows: cap.allows,
+                    without_cap: cap.without_cap,
+                    check_condition: cap.check_condition,
+                    priority: cap.priority,
+                    alternatives: cap.alternatives.unwrap_or_default(),
+                });
+            }
+        }
+
+        Some(spec)
+    }
+
+    /// Parse a single API specification file
+    fn parse_spec_file(&self, api_name: &str) -> Result<ApiSpec> {
+        let spec_path = self.debugfs_path.join(format!("kapi/specs/{}", api_name));
+        let content = fs::read_to_string(&spec_path)
+            .with_context(|| format!("Failed to read {}", spec_path.display()))?;
+
+        // Try JSON parsing first
+        if let Some(spec) = self.try_parse_json(&content) {
+            return Ok(spec);
+        }
+
+        // Fall back to plain text parsing
+        let mut spec = ApiSpec {
+            name: api_name.to_string(),
+            api_type: "unknown".to_string(),
+            description: None,
+            long_description: None,
+            version: None,
+            context_flags: Vec::new(),
+            param_count: None,
+            error_count: None,
+            examples: None,
+            notes: None,
+            since_version: None,
+            subsystem: None,
+            sysfs_path: None,
+            permissions: None,
+            socket_state: None,
+            protocol_behaviors: vec![],
+            addr_families: vec![],
+            buffer_spec: None,
+            async_spec: None,
+            net_data_transfer: None,
+            capabilities: vec![],
+            parameters: vec![],
+            return_spec: None,
+            errors: vec![],
+            signals: vec![],
+            signal_masks: vec![],
+            side_effects: vec![],
+            state_transitions: vec![],
+            constraints: vec![],
+            locks: vec![],
+            struct_specs: vec![],
+        };
+
+        // Parse the content
+        let mut collecting_multiline = false;
+        let mut multiline_buffer = String::new();
+        let mut multiline_field = "";
+        let mut parsing_capability = false;
+        let mut current_capability: Option<CapabilitySpec> = None;
+
+        for line in content.lines() {
+            // Handle capability sections
+            if line.starts_with("Capabilities (") {
+                continue; // Skip the header
+            }
+            if line.starts_with("  ") && line.contains(" (") && line.ends_with("):") {
+                // Start of a capability entry like "  CAP_IPC_LOCK (14):"
+                if let Some(cap) = current_capability.take() {
+                    spec.capabilities.push(cap);
+                }
+
+                let parts: Vec<&str> = line.trim().split(" (").collect();
+                if parts.len() == 2 {
+                    let cap_name = parts[0].to_string();
+                    let cap_id = parts[1].trim_end_matches("):").parse().unwrap_or(0);
+                    current_capability = Some(CapabilitySpec {
+                        capability: cap_id,
+                        name: cap_name,
+                        action: String::new(),
+                        allows: String::new(),
+                        without_cap: String::new(),
+                        check_condition: None,
+                        priority: None,
+                        alternatives: Vec::new(),
+                    });
+                    parsing_capability = true;
+                }
+                continue;
+            }
+            if parsing_capability && line.starts_with("    ") {
+                // Parse capability fields
+                if let Some(ref mut cap) = current_capability {
+                    if let Some(action) = line.strip_prefix("    Action: ") {
+                        cap.action = action.to_string();
+                    } else if let Some(allows) = line.strip_prefix("    Allows: ") {
+                        cap.allows = allows.to_string();
+                    } else if let Some(without) = line.strip_prefix("    Without: ") {
+                        cap.without_cap = without.to_string();
+                    } else if let Some(cond) = line.strip_prefix("    Condition: ") {
+                        cap.check_condition = Some(cond.to_string());
+                    } else if let Some(prio) = line.strip_prefix("    Priority: ") {
+                        cap.priority = prio.parse().ok();
+                    } else if let Some(alts) = line.strip_prefix("    Alternatives: ") {
+                        cap.alternatives =
+                            alts.split(", ").filter_map(|s| s.parse().ok()).collect();
+                    }
+                }
+                continue;
+            }
+            if parsing_capability && !line.starts_with("  ") {
+                // End of capabilities section
+                if let Some(cap) = current_capability.take() {
+                    spec.capabilities.push(cap);
+                }
+                parsing_capability = false;
+            }
+
+            // Handle section headers
+            if line.starts_with("Parameters (") {
+                if let Some(count_str) = line
+                    .strip_prefix("Parameters (")
+                    .and_then(|s| s.strip_suffix("):"))
+                {
+                    spec.param_count = count_str.parse().ok();
+                }
+                continue;
+            } else if line.starts_with("Errors (") {
+                if let Some(count_str) = line
+                    .strip_prefix("Errors (")
+                    .and_then(|s| s.strip_suffix("):"))
+                {
+                    spec.error_count = count_str.parse().ok();
+                }
+                continue;
+            } else if line.starts_with("Examples:") {
+                collecting_multiline = true;
+                multiline_field = "examples";
+                multiline_buffer.clear();
+                continue;
+            } else if line.starts_with("Notes:") {
+                collecting_multiline = true;
+                multiline_field = "notes";
+                multiline_buffer.clear();
+                continue;
+            }
+
+            // Handle multiline sections
+            if collecting_multiline {
+                if line.trim().is_empty() && multiline_buffer.ends_with("\n\n") {
+                    collecting_multiline = false;
+                    match multiline_field {
+                        "examples" => spec.examples = Some(multiline_buffer.trim().to_string()),
+                        "notes" => spec.notes = Some(multiline_buffer.trim().to_string()),
+                        _ => {}
+                    }
+                    multiline_buffer.clear();
+                } else {
+                    if !multiline_buffer.is_empty() {
+                        multiline_buffer.push('\n');
+                    }
+                    multiline_buffer.push_str(line);
+                }
+                continue;
+            }
+
+            // Parse regular fields
+            if let Some(desc) = line.strip_prefix("Description: ") {
+                spec.description = Some(desc.to_string());
+            } else if let Some(long_desc) = line.strip_prefix("Long description: ") {
+                spec.long_description = Some(long_desc.to_string());
+            } else if let Some(version) = line.strip_prefix("Version: ") {
+                spec.version = Some(version.to_string());
+            } else if let Some(since) = line.strip_prefix("Since: ") {
+                spec.since_version = Some(since.to_string());
+            } else if let Some(flags) = line.strip_prefix("Context flags: ") {
+                spec.context_flags = flags.split_whitespace().map(str::to_string).collect();
+            } else if let Some(subsys) = line.strip_prefix("Subsystem: ") {
+                spec.subsystem = Some(subsys.to_string());
+            } else if let Some(path) = line.strip_prefix("Sysfs Path: ") {
+                spec.sysfs_path = Some(path.to_string());
+            } else if let Some(perms) = line.strip_prefix("Permissions: ") {
+                spec.permissions = Some(perms.to_string());
+            }
+        }
+
+        // Handle any remaining capability
+        if let Some(cap) = current_capability.take() {
+            spec.capabilities.push(cap);
+        }
+
+        // Determine API type based on name
+        if api_name.starts_with("sys_") {
+            spec.api_type = "syscall".to_string();
+        } else if api_name.contains("_ioctl") || api_name.starts_with("ioctl_") {
+            spec.api_type = "ioctl".to_string();
+        } else if api_name.contains("sysfs")
+            || api_name.ends_with("_show")
+            || api_name.ends_with("_store")
+        {
+            spec.api_type = "sysfs".to_string();
+        } else {
+            spec.api_type = "function".to_string();
+        }
+
+        Ok(spec)
+    }
+}
+
+impl ApiExtractor for DebugfsExtractor {
+    fn extract_all(&self) -> Result<Vec<ApiSpec>> {
+        let api_names = self.parse_list_file()?;
+        let mut specs = Vec::new();
+
+        for name in api_names {
+            match self.parse_spec_file(&name) {
+                Ok(spec) => specs.push(spec),
+                Err(_e) => {} // Silently skip files that fail to parse
+            }
+        }
+
+        Ok(specs)
+    }
+
+    fn extract_by_name(&self, name: &str) -> Result<Option<ApiSpec>> {
+        let api_names = self.parse_list_file()?;
+
+        if api_names.contains(&name.to_string()) {
+            Ok(Some(self.parse_spec_file(name)?))
+        } else {
+            Ok(None)
+        }
+    }
+
+    fn display_api_details(
+        &self,
+        api_name: &str,
+        formatter: &mut dyn OutputFormatter,
+        writer: &mut dyn Write,
+    ) -> Result<()> {
+        if let Some(spec) = self.extract_by_name(api_name)? {
+            display_api_spec(&spec, formatter, writer)?;
+        } else {
+            writeln!(writer, "API '{api_name}' not found in debugfs")?;
+        }
+
+        Ok(())
+    }
+}
diff --git a/tools/kapi/src/extractor/kerneldoc_parser.rs b/tools/kapi/src/extractor/kerneldoc_parser.rs
new file mode 100644
index 0000000000000..1c6924a0b5291
--- /dev/null
+++ b/tools/kapi/src/extractor/kerneldoc_parser.rs
@@ -0,0 +1,692 @@
+use super::{
+    ApiSpec, CapabilitySpec, ConstraintSpec, ErrorSpec, LockSpec, ParamSpec,
+    ReturnSpec, SideEffectSpec, SignalSpec, StateTransitionSpec, StructSpec,
+    StructFieldSpec,
+};
+use anyhow::Result;
+use std::collections::HashMap;
+
+/// Real kerneldoc parser that extracts KAPI annotations
+pub struct KerneldocParserImpl;
+
+impl KerneldocParserImpl {
+    pub fn new() -> Self {
+        KerneldocParserImpl
+    }
+
+    pub fn parse_kerneldoc(
+        &self,
+        doc: &str,
+        name: &str,
+        api_type: &str,
+        _signature: Option<&str>,
+    ) -> Result<ApiSpec> {
+        let mut spec = ApiSpec {
+            name: name.to_string(),
+            api_type: api_type.to_string(),
+            description: None,
+            long_description: None,
+            version: None,
+            context_flags: vec![],
+            param_count: None,
+            error_count: None,
+            examples: None,
+            notes: None,
+            since_version: None,
+            subsystem: None,
+            sysfs_path: None,
+            permissions: None,
+            socket_state: None,
+            protocol_behaviors: vec![],
+            addr_families: vec![],
+            buffer_spec: None,
+            async_spec: None,
+            net_data_transfer: None,
+            capabilities: vec![],
+            parameters: vec![],
+            return_spec: None,
+            errors: vec![],
+            signals: vec![],
+            signal_masks: vec![],
+            side_effects: vec![],
+            state_transitions: vec![],
+            constraints: vec![],
+            locks: vec![],
+            struct_specs: vec![],
+        };
+
+        // Parse line by line
+        let lines: Vec<&str> = doc.lines().collect();
+        let mut i = 0;
+
+        // Extract main description from function name line
+        if let Some(first_line) = lines.first() {
+            if let Some((_, desc)) = first_line.split_once(" - ") {
+                spec.description = Some(desc.trim().to_string());
+            }
+        }
+
+        // Keep track of parameters we've seen
+        let mut param_map: HashMap<String, ParamSpec> = HashMap::new();
+        let mut struct_fields: Vec<StructFieldSpec> = Vec::new();
+        let mut current_lock: Option<LockSpec> = None;
+        let mut current_signal: Option<SignalSpec> = None;
+        let mut current_capability: Option<CapabilitySpec> = None;
+
+        while i < lines.len() {
+            let line = lines[i].trim();
+
+            // Skip empty lines
+            if line.is_empty() {
+                i += 1;
+                continue;
+            }
+
+            // Parse @param lines
+            if let Some(rest) = line.strip_prefix("@") {
+                if let Some((param_name, desc)) = rest.split_once(':') {
+                    let param_name = param_name.trim();
+                    let desc = desc.trim();
+                    if !param_name.contains('-') {
+                        // This is a basic parameter description - add to map
+                        param_map.insert(param_name.to_string(), ParamSpec {
+                            index: param_map.len() as u32,
+                            name: param_name.to_string(),
+                            type_name: String::new(),
+                            description: desc.to_string(),
+                            flags: 0,
+                            param_type: 0,
+                            constraint_type: 0,
+                            constraint: None,
+                            min_value: None,
+                            max_value: None,
+                            valid_mask: None,
+                            enum_values: vec![],
+                            size: None,
+                            alignment: None,
+                        });
+                    }
+                }
+            }
+            // Parse long-desc
+            else if let Some(rest) = line.strip_prefix("long-desc:") {
+                spec.long_description = Some(self.collect_multiline_value(&lines, i, rest));
+            }
+            // Parse context-flags
+            else if let Some(rest) = line.strip_prefix("context-flags:") {
+                spec.context_flags = self.parse_context_flags(rest.trim());
+            }
+            // Parse param-count
+            else if let Some(rest) = line.strip_prefix("param-count:") {
+                spec.param_count = rest.trim().parse().ok();
+            }
+            // Parse param-type
+            else if let Some(rest) = line.strip_prefix("param-type:") {
+                let parts: Vec<&str> = rest.split(',').map(|s| s.trim()).collect();
+                if parts.len() >= 2 {
+                    if let Some(param) = param_map.get_mut(parts[0]) {
+                        param.param_type = self.parse_param_type(parts[1]);
+                    }
+                }
+            }
+            // Parse param-flags
+            else if let Some(rest) = line.strip_prefix("param-flags:") {
+                let parts: Vec<&str> = rest.split(',').map(|s| s.trim()).collect();
+                if parts.len() >= 2 {
+                    if let Some(param) = param_map.get_mut(parts[0]) {
+                        param.flags = self.parse_param_flags(parts[1]);
+                    }
+                }
+            }
+            // Parse param-range
+            else if let Some(rest) = line.strip_prefix("param-range:") {
+                let parts: Vec<&str> = rest.split(',').map(|s| s.trim()).collect();
+                if parts.len() >= 3 {
+                    if let Some(param) = param_map.get_mut(parts[0]) {
+                        param.min_value = parts[1].parse().ok();
+                        param.max_value = parts[2].parse().ok();
+                        param.constraint_type = 1; // KAPI_CONSTRAINT_RANGE
+                    }
+                }
+            }
+            // Parse param-constraint
+            else if let Some(rest) = line.strip_prefix("param-constraint:") {
+                let parts: Vec<&str> = rest.splitn(2, ',').map(|s| s.trim()).collect();
+                if parts.len() >= 2 {
+                    if let Some(param) = param_map.get_mut(parts[0]) {
+                        param.constraint = Some(parts[1].to_string());
+                    }
+                }
+            }
+            // Parse error
+            else if let Some(rest) = line.strip_prefix("error:") {
+                // Parse error in format: "ERROR_CODE, description"
+                let parts: Vec<&str> = rest.splitn(2, ',').map(|s| s.trim()).collect();
+                if parts.len() >= 2 {
+                    let error_name = parts[0].to_string();
+                    let description = parts[1].to_string();
+
+                    // Look for desc: line on the next line
+                    let mut full_description = description;
+                    if i + 1 < lines.len() {
+                        if let Some(desc_line) = lines[i + 1].strip_prefix("*   desc:") {
+                            full_description = desc_line.trim().to_string();
+                        } else if let Some(desc_line) = lines[i + 1].strip_prefix("* desc:") {
+                            full_description = desc_line.trim().to_string();
+                        }
+                    }
+
+                    // Map common error names to codes
+                    let error_code = match error_name.as_str() {
+                        "E2BIG" => -7,
+                        "EACCES" => -13,
+                        "EAGAIN" => -11,
+                        "EBADF" => -9,
+                        "EBUSY" => -16,
+                        "EFAULT" => -14,
+                        "EINTR" => -4,
+                        "EINVAL" => -22,
+                        "EIO" => -5,
+                        "EISDIR" => -21,
+                        "ELIBBAD" => -80,
+                        "ELOOP" => -40,
+                        "EMFILE" => -24,
+                        "ENAMETOOLONG" => -36,
+                        "ENFILE" => -23,
+                        "ENOENT" => -2,
+                        "ENOEXEC" => -8,
+                        "ENOMEM" => -12,
+                        "ENOTDIR" => -20,
+                        "EOPNOTSUPP" => -95,
+                        "EPERM" => -1,
+                        "ESRCH" => -3,
+                        "ETXTBSY" => -26,
+                        _ => 0,
+                    };
+
+                    spec.errors.push(ErrorSpec {
+                        error_code,
+                        name: error_name,
+                        condition: String::new(),
+                        description: full_description,
+                    });
+                }
+            }
+            // Parse lock
+            else if let Some(rest) = line.strip_prefix("lock:") {
+                // Save previous lock if any
+                if let Some(lock) = current_lock.take() {
+                    spec.locks.push(lock);
+                }
+
+                let parts: Vec<&str> = rest.split(',').map(|s| s.trim()).collect();
+                if parts.len() >= 2 {
+                    current_lock = Some(LockSpec {
+                        lock_name: parts[0].to_string(),
+                        lock_type: self.parse_lock_type(parts[1]),
+                        scope: super::KAPI_LOCK_INTERNAL, // default: acquired and released
+                        description: String::new(),
+                    });
+                }
+            }
+            // Parse lock scope
+            else if let Some(rest) = line.strip_prefix("lock-scope:") {
+                if let Some(lock) = current_lock.as_mut() {
+                    lock.scope = match rest.trim() {
+                        "internal" => super::KAPI_LOCK_INTERNAL,
+                        "acquires" => super::KAPI_LOCK_ACQUIRES,
+                        "releases" => super::KAPI_LOCK_RELEASES,
+                        "caller_held" => super::KAPI_LOCK_CALLER_HELD,
+                        _ => super::KAPI_LOCK_INTERNAL,
+                    };
+                }
+            }
+            else if let Some(rest) = line.strip_prefix("lock-desc:") {
+                if let Some(lock) = current_lock.as_mut() {
+                    lock.description = self.collect_multiline_value(&lines, i, rest);
+                }
+            }
+            // Parse signal
+            else if let Some(rest) = line.strip_prefix("signal:") {
+                // Save previous signal if any
+                if let Some(signal) = current_signal.take() {
+                    spec.signals.push(signal);
+                }
+
+                let signal_name = rest.trim().to_string();
+                current_signal = Some(SignalSpec {
+                    signal_num: 0,
+                    signal_name,
+                    direction: 1,
+                    action: 0,
+                    target: None,
+                    condition: None,
+                    description: None,
+                    restartable: false,
+                    timing: 0,
+                    priority: 0,
+                    interruptible: false,
+                    queue: None,
+                    sa_flags: 0,
+                    sa_flags_required: 0,
+                    sa_flags_forbidden: 0,
+                    state_required: 0,
+                    state_forbidden: 0,
+                    error_on_signal: None,
+                });
+            }
+            // Parse signal attributes
+            else if let Some(rest) = line.strip_prefix("signal-direction:") {
+                if let Some(signal) = current_signal.as_mut() {
+                    signal.direction = self.parse_signal_direction(rest.trim());
+                }
+            }
+            else if let Some(rest) = line.strip_prefix("signal-action:") {
+                if let Some(signal) = current_signal.as_mut() {
+                    signal.action = self.parse_signal_action(rest.trim());
+                }
+            }
+            else if let Some(rest) = line.strip_prefix("signal-condition:") {
+                if let Some(signal) = current_signal.as_mut() {
+                    signal.condition = Some(self.collect_multiline_value(&lines, i, rest));
+                }
+            }
+            else if let Some(rest) = line.strip_prefix("signal-desc:") {
+                if let Some(signal) = current_signal.as_mut() {
+                    signal.description = Some(self.collect_multiline_value(&lines, i, rest));
+                }
+            }
+            else if let Some(rest) = line.strip_prefix("signal-timing:") {
+                if let Some(signal) = current_signal.as_mut() {
+                    signal.timing = self.parse_signal_timing(rest.trim());
+                }
+            }
+            else if let Some(rest) = line.strip_prefix("signal-priority:") {
+                if let Some(signal) = current_signal.as_mut() {
+                    signal.priority = rest.trim().parse().unwrap_or(0);
+                }
+            }
+            else if line.strip_prefix("signal-interruptible:").is_some() {
+                if let Some(signal) = current_signal.as_mut() {
+                    signal.interruptible = true;
+                }
+            }
+            else if let Some(rest) = line.strip_prefix("signal-state-req:") {
+                if let Some(signal) = current_signal.as_mut() {
+                    signal.state_required = self.parse_signal_state(rest.trim());
+                }
+            }
+            // Parse side-effect
+            else if let Some(rest) = line.strip_prefix("side-effect:") {
+                let full_effect = self.collect_multiline_value(&lines, i, rest);
+                let parts: Vec<&str> = full_effect.splitn(3, ',').map(|s| s.trim()).collect();
+                if parts.len() >= 3 {
+                    let mut effect = SideEffectSpec {
+                        effect_type: self.parse_effect_type(parts[0]),
+                        target: parts[1].to_string(),
+                        condition: None,
+                        description: parts[2].to_string(),
+                        reversible: false,
+                    };
+
+                    // Check for additional attributes
+                    if let Some(pos) = parts[2].find("condition=") {
+                        let cond_str = &parts[2][pos + 10..];
+                        if let Some(end) = cond_str.find(',') {
+                            effect.condition = Some(cond_str[..end].to_string());
+                        } else {
+                            effect.condition = Some(cond_str.to_string());
+                        }
+                    }
+
+                    if parts[2].contains("reversible=yes") {
+                        effect.reversible = true;
+                    }
+
+                    spec.side_effects.push(effect);
+                }
+            }
+            // Parse state-trans
+            else if let Some(rest) = line.strip_prefix("state-trans:") {
+                let parts: Vec<&str> = rest.split(',').map(|s| s.trim()).collect();
+                if parts.len() >= 4 {
+                    spec.state_transitions.push(StateTransitionSpec {
+                        object: parts[0].to_string(),
+                        from_state: parts[1].to_string(),
+                        to_state: parts[2].to_string(),
+                        condition: None,
+                        description: parts[3].to_string(),
+                    });
+                }
+            }
+            // Parse capability
+            else if let Some(rest) = line.strip_prefix("capability:") {
+                // Save previous capability if any
+                if let Some(cap) = current_capability.take() {
+                    spec.capabilities.push(cap);
+                }
+
+                let parts: Vec<&str> = rest.split(',').map(|s| s.trim()).collect();
+                if parts.len() >= 3 {
+                    current_capability = Some(CapabilitySpec {
+                        capability: self.parse_capability_value(parts[0]),
+                        action: parts[1].to_string(),
+                        name: parts[2].to_string(),
+                        allows: String::new(),
+                        without_cap: String::new(),
+                        check_condition: None,
+                        priority: Some(0),
+                        alternatives: vec![],
+                    });
+                }
+            }
+            // Parse capability attributes
+            else if let Some(rest) = line.strip_prefix("capability-allows:") {
+                if let Some(cap) = current_capability.as_mut() {
+                    cap.allows = self.collect_multiline_value(&lines, i, rest);
+                }
+            }
+            else if let Some(rest) = line.strip_prefix("capability-without:") {
+                if let Some(cap) = current_capability.as_mut() {
+                    cap.without_cap = self.collect_multiline_value(&lines, i, rest);
+                }
+            }
+            else if let Some(rest) = line.strip_prefix("capability-condition:") {
+                if let Some(cap) = current_capability.as_mut() {
+                    cap.check_condition = Some(self.collect_multiline_value(&lines, i, rest));
+                }
+            }
+            else if let Some(rest) = line.strip_prefix("capability-priority:") {
+                if let Some(cap) = current_capability.as_mut() {
+                    cap.priority = rest.trim().parse().ok();
+                }
+            }
+            // Parse constraint
+            else if let Some(rest) = line.strip_prefix("constraint:") {
+                let parts: Vec<&str> = rest.splitn(2, ',').map(|s| s.trim()).collect();
+                if parts.len() >= 2 {
+                    spec.constraints.push(ConstraintSpec {
+                        name: parts[0].to_string(),
+                        description: parts[1].to_string(),
+                        expression: None,
+                    });
+                }
+            }
+            // Parse constraint-expr
+            else if let Some(rest) = line.strip_prefix("constraint-expr:") {
+                let parts: Vec<&str> = rest.splitn(2, ',').map(|s| s.trim()).collect();
+                if parts.len() >= 2 {
+                    // Find matching constraint and update it
+                    if let Some(constraint) = spec.constraints.iter_mut().find(|c| c.name == parts[0]) {
+                        constraint.expression = Some(parts[1].to_string());
+                    }
+                }
+            }
+            // Parse struct-field
+            else if let Some(rest) = line.strip_prefix("struct-field:") {
+                let parts: Vec<&str> = rest.split(',').map(|s| s.trim()).collect();
+                if parts.len() >= 3 {
+                    struct_fields.push(StructFieldSpec {
+                        name: parts[0].to_string(),
+                        field_type: self.parse_field_type(parts[1]),
+                        type_name: parts[1].to_string(),
+                        offset: 0,
+                        size: 0,
+                        flags: 0,
+                        constraint_type: 0,
+                        min_value: 0,
+                        max_value: 0,
+                        valid_mask: 0,
+                        description: parts[2].to_string(),
+                    });
+                }
+            }
+            // Parse struct-field-range
+            else if let Some(rest) = line.strip_prefix("struct-field-range:") {
+                let parts: Vec<&str> = rest.split(',').map(|s| s.trim()).collect();
+                if parts.len() >= 3 {
+                    // Update the field with range
+                    if let Some(field) = struct_fields.iter_mut().find(|f| f.name == parts[0]) {
+                        field.min_value = parts[1].parse().unwrap_or(0);
+                        field.max_value = parts[2].parse().unwrap_or(0);
+                        field.constraint_type = 1; // KAPI_CONSTRAINT_RANGE
+                    }
+                }
+            }
+            // Parse examples
+            else if let Some(rest) = line.strip_prefix("examples:") {
+                spec.examples = Some(self.collect_multiline_value(&lines, i, rest));
+            }
+            // Parse notes
+            else if let Some(rest) = line.strip_prefix("notes:") {
+                spec.notes = Some(self.collect_multiline_value(&lines, i, rest));
+            }
+            // Parse since-version
+            else if let Some(rest) = line.strip_prefix("since-version:") {
+                spec.since_version = Some(rest.trim().to_string());
+            }
+            // Parse return-type
+            else if let Some(rest) = line.strip_prefix("return-type:") {
+                if spec.return_spec.is_none() {
+                    spec.return_spec = Some(ReturnSpec {
+                        type_name: rest.trim().to_string(),
+                        description: String::new(),
+                        return_type: self.parse_param_type(rest.trim()),
+                        check_type: 0,
+                        success_value: None,
+                        success_min: None,
+                        success_max: None,
+                        error_values: vec![],
+                    });
+                }
+            }
+            // Parse return-check-type
+            else if let Some(rest) = line.strip_prefix("return-check-type:") {
+                if let Some(ret) = spec.return_spec.as_mut() {
+                    ret.check_type = self.parse_return_check_type(rest.trim());
+                }
+            }
+            // Parse return-success
+            else if let Some(rest) = line.strip_prefix("return-success:") {
+                if let Some(ret) = spec.return_spec.as_mut() {
+                    ret.success_value = rest.trim().parse().ok();
+                }
+            }
+
+            i += 1;
+        }
+
+        // Save any remaining items
+        if let Some(lock) = current_lock {
+            spec.locks.push(lock);
+        }
+        if let Some(signal) = current_signal {
+            spec.signals.push(signal);
+        }
+        if let Some(cap) = current_capability {
+            spec.capabilities.push(cap);
+        }
+
+        // Convert param_map to vec preserving order
+        let mut params: Vec<ParamSpec> = param_map.into_values().collect();
+        params.sort_by_key(|p| p.index);
+        spec.parameters = params;
+
+        // Create struct spec if we have fields
+        if !struct_fields.is_empty() {
+            spec.struct_specs.push(StructSpec {
+                name: "struct sched_attr".to_string(),
+                size: 120, // Default for sched_attr
+                alignment: 8,
+                field_count: struct_fields.len() as u32,
+                fields: struct_fields,
+                description: "Structure specification".to_string(),
+            });
+        }
+
+        Ok(spec)
+    }
+
+    fn collect_multiline_value(&self, lines: &[&str], start_idx: usize, first_part: &str) -> String {
+        let mut result = String::from(first_part.trim());
+        let mut i = start_idx + 1;
+
+        // Continue collecting lines until we hit another annotation or end
+        while i < lines.len() {
+            let line = lines[i];
+
+            // Stop if we hit another annotation (contains ':' and starts with valid keyword)
+            if self.is_annotation_line(line) {
+                break;
+            }
+
+            // Add continuation lines
+            if !line.trim().is_empty() && line.starts_with("  ") {
+                if !result.is_empty() {
+                    result.push(' ');
+                }
+                result.push_str(line.trim());
+            } else if line.trim().is_empty() {
+                // Empty line might be part of multiline
+                i += 1;
+                continue;
+            } else {
+                // Non-continuation line, stop
+                break;
+            }
+
+            i += 1;
+        }
+
+        result
+    }
+
+    fn is_annotation_line(&self, line: &str) -> bool {
+        let annotations = [
+            "param-", "error-", "lock", "signal", "side-effect:",
+            "state-trans:", "capability", "constraint", "struct-",
+            "return-", "examples:", "notes:", "since-", "context-",
+            "long-desc:"
+        ];
+
+        for ann in &annotations {
+            if line.trim_start().starts_with(ann) {
+                return true;
+            }
+        }
+        false
+    }
+
+    fn parse_context_flags(&self, flags: &str) -> Vec<String> {
+        flags.split('|')
+            .map(|f| f.trim().to_string())
+            .collect()
+    }
+
+    fn parse_param_type(&self, type_str: &str) -> u32 {
+        match type_str {
+            "KAPI_TYPE_INT" => 1,
+            "KAPI_TYPE_UINT" => 2,
+            "KAPI_TYPE_LONG" => 3,
+            "KAPI_TYPE_ULONG" => 4,
+            "KAPI_TYPE_STRING" => 5,
+            "KAPI_TYPE_USER_PTR" => 6,
+            _ => 0,
+        }
+    }
+
+    fn parse_field_type(&self, type_str: &str) -> u32 {
+        match type_str {
+            "__s32" | "int" => 1,
+            "__u32" | "unsigned int" => 2,
+            "__s64" | "long" => 3,
+            "__u64" | "unsigned long" => 4,
+            _ => 0,
+        }
+    }
+
+    fn parse_param_flags(&self, flags: &str) -> u32 {
+        let mut result = 0;
+        for flag in flags.split('|') {
+            match flag.trim() {
+                "KAPI_PARAM_IN" => result |= 1,
+                "KAPI_PARAM_OUT" => result |= 2,
+                "KAPI_PARAM_INOUT" => result |= 3,
+                "KAPI_PARAM_USER" => result |= 4,
+                _ => {}
+            }
+        }
+        result
+    }
+
+    fn parse_lock_type(&self, type_str: &str) -> u32 {
+        match type_str {
+            "KAPI_LOCK_SPINLOCK" => 0,
+            "KAPI_LOCK_MUTEX" => 1,
+            "KAPI_LOCK_RWLOCK" => 2,
+            _ => 3,
+        }
+    }
+
+    fn parse_signal_direction(&self, dir: &str) -> u32 {
+        match dir {
+            "KAPI_SIGNAL_SEND" => 1,
+            "KAPI_SIGNAL_RECEIVE" => 2,
+            _ => 0,
+        }
+    }
+
+    fn parse_signal_action(&self, action: &str) -> u32 {
+        match action {
+            "KAPI_SIGNAL_ACTION_DEFAULT" => 0,
+            "KAPI_SIGNAL_ACTION_IGNORE" => 1,
+            "KAPI_SIGNAL_ACTION_CUSTOM" => 2,
+            _ => 0,
+        }
+    }
+
+    fn parse_signal_timing(&self, timing: &str) -> u32 {
+        match timing {
+            "KAPI_SIGNAL_TIME_BEFORE" => 0,
+            "KAPI_SIGNAL_TIME_DURING" => 1,
+            "KAPI_SIGNAL_TIME_AFTER" => 2,
+            _ => 0,
+        }
+    }
+
+    fn parse_signal_state(&self, state: &str) -> u32 {
+        match state {
+            "KAPI_SIGNAL_STATE_RUNNING" => 1,
+            "KAPI_SIGNAL_STATE_SLEEPING" => 2,
+            _ => 0,
+        }
+    }
+
+    fn parse_effect_type(&self, type_str: &str) -> u32 {
+        let mut result = 0;
+        for flag in type_str.split('|') {
+            match flag.trim() {
+                "KAPI_EFFECT_MODIFY_STATE" => result |= 1,
+                "KAPI_EFFECT_PROCESS_STATE" => result |= 2,
+                "KAPI_EFFECT_SCHEDULE" => result |= 4,
+                _ => {}
+            }
+        }
+        result
+    }
+
+    fn parse_capability_value(&self, cap: &str) -> i32 {
+        match cap {
+            "CAP_SYS_NICE" => 23,
+            _ => 0,
+        }
+    }
+
+    fn parse_return_check_type(&self, check: &str) -> u32 {
+        match check {
+            "KAPI_RETURN_ERROR_CHECK" => 1,
+            "KAPI_RETURN_SUCCESS_CHECK" => 2,
+            _ => 0,
+        }
+    }
+}
\ No newline at end of file
diff --git a/tools/kapi/src/extractor/mod.rs b/tools/kapi/src/extractor/mod.rs
new file mode 100644
index 0000000000000..4eeb03b9a4ca3
--- /dev/null
+++ b/tools/kapi/src/extractor/mod.rs
@@ -0,0 +1,464 @@
+use crate::formatter::OutputFormatter;
+use anyhow::Result;
+use std::convert::TryInto;
+use std::io::Write;
+
+pub mod debugfs;
+pub mod kerneldoc_parser;
+pub mod source_parser;
+pub mod vmlinux;
+
+pub use debugfs::DebugfsExtractor;
+pub use source_parser::SourceExtractor;
+pub use vmlinux::VmlinuxExtractor;
+
+/// Socket state specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct SocketStateSpec {
+    pub required_states: Vec<String>,
+    pub forbidden_states: Vec<String>,
+    pub resulting_state: Option<String>,
+    pub condition: Option<String>,
+    pub applicable_protocols: Option<String>,
+}
+
+/// Protocol behavior specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct ProtocolBehaviorSpec {
+    pub applicable_protocols: String,
+    pub behavior: String,
+    pub protocol_flags: Option<String>,
+    pub flag_description: Option<String>,
+}
+
+/// Address family specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct AddrFamilySpec {
+    pub family: i32,
+    pub family_name: String,
+    pub addr_struct_size: usize,
+    pub min_addr_len: usize,
+    pub max_addr_len: usize,
+    pub addr_format: Option<String>,
+    pub supports_wildcard: bool,
+    pub supports_multicast: bool,
+    pub supports_broadcast: bool,
+    pub special_addresses: Option<String>,
+    pub port_range_min: u32,
+    pub port_range_max: u32,
+}
+
+/// Buffer specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct BufferSpec {
+    pub buffer_behaviors: Option<String>,
+    pub min_buffer_size: Option<usize>,
+    pub max_buffer_size: Option<usize>,
+    pub optimal_buffer_size: Option<usize>,
+}
+
+/// Async specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct AsyncSpec {
+    pub supported_modes: Option<String>,
+    pub nonblock_errno: Option<i32>,
+}
+
+/// Capability specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct CapabilitySpec {
+    pub capability: i32,
+    pub name: String,
+    pub action: String,
+    pub allows: String,
+    pub without_cap: String,
+    pub check_condition: Option<String>,
+    pub priority: Option<u8>,
+    pub alternatives: Vec<i32>,
+}
+
+/// Parameter specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct ParamSpec {
+    pub index: u32,
+    pub name: String,
+    pub type_name: String,
+    pub description: String,
+    pub flags: u32,
+    pub param_type: u32,
+    pub constraint_type: u32,
+    pub constraint: Option<String>,
+    pub min_value: Option<i64>,
+    pub max_value: Option<i64>,
+    pub valid_mask: Option<u64>,
+    pub enum_values: Vec<String>,
+    pub size: Option<u32>,
+    pub alignment: Option<u32>,
+}
+
+/// Return value specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct ReturnSpec {
+    pub type_name: String,
+    pub description: String,
+    pub return_type: u32,
+    pub check_type: u32,
+    pub success_value: Option<i64>,
+    pub success_min: Option<i64>,
+    pub success_max: Option<i64>,
+    pub error_values: Vec<i32>,
+}
+
+/// Error specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct ErrorSpec {
+    pub error_code: i32,
+    pub name: String,
+    pub condition: String,
+    pub description: String,
+}
+
+/// Signal specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct SignalSpec {
+    pub signal_num: i32,
+    pub signal_name: String,
+    pub direction: u32,
+    pub action: u32,
+    pub target: Option<String>,
+    pub condition: Option<String>,
+    pub description: Option<String>,
+    pub timing: u32,
+    pub priority: u32,
+    pub restartable: bool,
+    pub interruptible: bool,
+    pub queue: Option<String>,
+    pub sa_flags: u32,
+    pub sa_flags_required: u32,
+    pub sa_flags_forbidden: u32,
+    pub state_required: u32,
+    pub state_forbidden: u32,
+    pub error_on_signal: Option<i32>,
+}
+
+/// Signal mask specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct SignalMaskSpec {
+    pub name: String,
+    pub description: String,
+}
+
+/// Side effect specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct SideEffectSpec {
+    pub effect_type: u32,
+    pub target: String,
+    pub condition: Option<String>,
+    pub description: String,
+    pub reversible: bool,
+}
+
+/// State transition specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct StateTransitionSpec {
+    pub object: String,
+    pub from_state: String,
+    pub to_state: String,
+    pub condition: Option<String>,
+    pub description: String,
+}
+
+/// Constraint specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct ConstraintSpec {
+    pub name: String,
+    pub description: String,
+    pub expression: Option<String>,
+}
+
+/// Lock scope enum values matching kernel enum kapi_lock_scope
+pub const KAPI_LOCK_INTERNAL: u32 = 0;
+pub const KAPI_LOCK_ACQUIRES: u32 = 1;
+pub const KAPI_LOCK_RELEASES: u32 = 2;
+pub const KAPI_LOCK_CALLER_HELD: u32 = 3;
+
+/// Lock specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct LockSpec {
+    pub lock_name: String,
+    pub lock_type: u32,
+    pub scope: u32,
+    pub description: String,
+}
+
+/// Struct field specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct StructFieldSpec {
+    pub name: String,
+    pub field_type: u32,
+    pub type_name: String,
+    pub offset: usize,
+    pub size: usize,
+    pub flags: u32,
+    pub constraint_type: u32,
+    pub min_value: i64,
+    pub max_value: i64,
+    pub valid_mask: u64,
+    pub description: String,
+}
+
+/// Struct specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct StructSpec {
+    pub name: String,
+    pub size: usize,
+    pub alignment: usize,
+    pub field_count: u32,
+    pub fields: Vec<StructFieldSpec>,
+    pub description: String,
+}
+
+/// Common API specification information that all extractors should provide
+#[derive(Debug, Clone)]
+pub struct ApiSpec {
+    pub name: String,
+    pub api_type: String,
+    pub description: Option<String>,
+    pub long_description: Option<String>,
+    pub version: Option<String>,
+    pub context_flags: Vec<String>,
+    pub param_count: Option<u32>,
+    pub error_count: Option<u32>,
+    pub examples: Option<String>,
+    pub notes: Option<String>,
+    pub since_version: Option<String>,
+    // Sysfs-specific fields
+    pub subsystem: Option<String>,
+    pub sysfs_path: Option<String>,
+    pub permissions: Option<String>,
+    // Networking-specific fields
+    pub socket_state: Option<SocketStateSpec>,
+    pub protocol_behaviors: Vec<ProtocolBehaviorSpec>,
+    pub addr_families: Vec<AddrFamilySpec>,
+    pub buffer_spec: Option<BufferSpec>,
+    pub async_spec: Option<AsyncSpec>,
+    pub net_data_transfer: Option<String>,
+    pub capabilities: Vec<CapabilitySpec>,
+    pub parameters: Vec<ParamSpec>,
+    pub return_spec: Option<ReturnSpec>,
+    pub errors: Vec<ErrorSpec>,
+    pub signals: Vec<SignalSpec>,
+    pub signal_masks: Vec<SignalMaskSpec>,
+    pub side_effects: Vec<SideEffectSpec>,
+    pub state_transitions: Vec<StateTransitionSpec>,
+    pub constraints: Vec<ConstraintSpec>,
+    pub locks: Vec<LockSpec>,
+    pub struct_specs: Vec<StructSpec>,
+}
+
+/// Trait for extracting API specifications from different sources
+pub trait ApiExtractor {
+    /// Extract all API specifications from the source
+    fn extract_all(&self) -> Result<Vec<ApiSpec>>;
+
+    /// Extract a specific API specification by name
+    fn extract_by_name(&self, name: &str) -> Result<Option<ApiSpec>>;
+
+    /// Display detailed information about a specific API
+    fn display_api_details(
+        &self,
+        api_name: &str,
+        formatter: &mut dyn OutputFormatter,
+        writer: &mut dyn Write,
+    ) -> Result<()>;
+}
+
+/// Helper function to display an ApiSpec using a formatter
+pub fn display_api_spec(
+    spec: &ApiSpec,
+    formatter: &mut dyn OutputFormatter,
+    writer: &mut dyn Write,
+) -> Result<()> {
+    formatter.begin_api_details(writer, &spec.name)?;
+
+    if let Some(desc) = &spec.description {
+        formatter.description(writer, desc)?;
+    }
+
+    if let Some(long_desc) = &spec.long_description {
+        formatter.long_description(writer, long_desc)?;
+    }
+
+    if let Some(version) = &spec.since_version {
+        formatter.since_version(writer, version)?;
+    }
+
+    if !spec.context_flags.is_empty() {
+        formatter.begin_context_flags(writer)?;
+        for flag in &spec.context_flags {
+            formatter.context_flag(writer, flag)?;
+        }
+        formatter.end_context_flags(writer)?;
+    }
+
+    if !spec.parameters.is_empty() {
+        formatter.begin_parameters(writer, spec.parameters.len().try_into().unwrap_or(u32::MAX))?;
+        for param in &spec.parameters {
+            formatter.parameter(writer, param)?;
+        }
+        formatter.end_parameters(writer)?;
+    }
+
+    if let Some(ret) = &spec.return_spec {
+        formatter.return_spec(writer, ret)?;
+    }
+
+    if !spec.errors.is_empty() {
+        formatter.begin_errors(writer, spec.errors.len().try_into().unwrap_or(u32::MAX))?;
+        for error in &spec.errors {
+            formatter.error(writer, error)?;
+        }
+        formatter.end_errors(writer)?;
+    }
+
+    if let Some(notes) = &spec.notes {
+        formatter.notes(writer, notes)?;
+    }
+
+    if let Some(examples) = &spec.examples {
+        formatter.examples(writer, examples)?;
+    }
+
+    // Display sysfs-specific fields
+    if spec.api_type == "sysfs" {
+        if let Some(subsystem) = &spec.subsystem {
+            formatter.sysfs_subsystem(writer, subsystem)?;
+        }
+        if let Some(path) = &spec.sysfs_path {
+            formatter.sysfs_path(writer, path)?;
+        }
+        if let Some(perms) = &spec.permissions {
+            formatter.sysfs_permissions(writer, perms)?;
+        }
+    }
+
+    // Display networking-specific fields
+    if let Some(socket_state) = &spec.socket_state {
+        formatter.socket_state(writer, socket_state)?;
+    }
+
+    if !spec.protocol_behaviors.is_empty() {
+        formatter.begin_protocol_behaviors(writer)?;
+        for behavior in &spec.protocol_behaviors {
+            formatter.protocol_behavior(writer, behavior)?;
+        }
+        formatter.end_protocol_behaviors(writer)?;
+    }
+
+    if !spec.addr_families.is_empty() {
+        formatter.begin_addr_families(writer)?;
+        for family in &spec.addr_families {
+            formatter.addr_family(writer, family)?;
+        }
+        formatter.end_addr_families(writer)?;
+    }
+
+    if let Some(buffer_spec) = &spec.buffer_spec {
+        formatter.buffer_spec(writer, buffer_spec)?;
+    }
+
+    if let Some(async_spec) = &spec.async_spec {
+        formatter.async_spec(writer, async_spec)?;
+    }
+
+    if let Some(net_data_transfer) = &spec.net_data_transfer {
+        formatter.net_data_transfer(writer, net_data_transfer)?;
+    }
+
+    if !spec.capabilities.is_empty() {
+        formatter.begin_capabilities(writer)?;
+        for cap in &spec.capabilities {
+            formatter.capability(writer, cap)?;
+        }
+        formatter.end_capabilities(writer)?;
+    }
+
+    // Display signals
+    if !spec.signals.is_empty() {
+        formatter.begin_signals(writer, spec.signals.len().try_into().unwrap_or(u32::MAX))?;
+        for signal in &spec.signals {
+            formatter.signal(writer, signal)?;
+        }
+        formatter.end_signals(writer)?;
+    }
+
+    // Display signal masks
+    if !spec.signal_masks.is_empty() {
+        formatter.begin_signal_masks(
+            writer,
+            spec.signal_masks.len().try_into().unwrap_or(u32::MAX),
+        )?;
+        for mask in &spec.signal_masks {
+            formatter.signal_mask(writer, mask)?;
+        }
+        formatter.end_signal_masks(writer)?;
+    }
+
+    // Display side effects
+    if !spec.side_effects.is_empty() {
+        formatter.begin_side_effects(
+            writer,
+            spec.side_effects.len().try_into().unwrap_or(u32::MAX),
+        )?;
+        for effect in &spec.side_effects {
+            formatter.side_effect(writer, effect)?;
+        }
+        formatter.end_side_effects(writer)?;
+    }
+
+    // Display state transitions
+    if !spec.state_transitions.is_empty() {
+        formatter.begin_state_transitions(
+            writer,
+            spec.state_transitions.len().try_into().unwrap_or(u32::MAX),
+        )?;
+        for trans in &spec.state_transitions {
+            formatter.state_transition(writer, trans)?;
+        }
+        formatter.end_state_transitions(writer)?;
+    }
+
+    // Display constraints
+    if !spec.constraints.is_empty() {
+        formatter.begin_constraints(
+            writer,
+            spec.constraints.len().try_into().unwrap_or(u32::MAX),
+        )?;
+        for constraint in &spec.constraints {
+            formatter.constraint(writer, constraint)?;
+        }
+        formatter.end_constraints(writer)?;
+    }
+
+    // Display locks
+    if !spec.locks.is_empty() {
+        formatter.begin_locks(writer, spec.locks.len().try_into().unwrap_or(u32::MAX))?;
+        for lock in &spec.locks {
+            formatter.lock(writer, lock)?;
+        }
+        formatter.end_locks(writer)?;
+    }
+
+    // Display struct specs
+    if !spec.struct_specs.is_empty() {
+        formatter.begin_struct_specs(writer, spec.struct_specs.len().try_into().unwrap_or(u32::MAX))?;
+        for struct_spec in &spec.struct_specs {
+            formatter.struct_spec(writer, struct_spec)?;
+        }
+        formatter.end_struct_specs(writer)?;
+    }
+
+    formatter.end_api_details(writer)?;
+
+    Ok(())
+}
diff --git a/tools/kapi/src/extractor/source_parser.rs b/tools/kapi/src/extractor/source_parser.rs
new file mode 100644
index 0000000000000..7a72b85a83bea
--- /dev/null
+++ b/tools/kapi/src/extractor/source_parser.rs
@@ -0,0 +1,213 @@
+use super::{
+    ApiExtractor, ApiSpec, display_api_spec,
+};
+use super::kerneldoc_parser::KerneldocParserImpl;
+use crate::formatter::OutputFormatter;
+use anyhow::{Context, Result};
+use regex::Regex;
+use std::fs;
+use std::io::Write;
+use std::path::Path;
+use walkdir::WalkDir;
+
+/// Extractor for kernel source files with KAPI-annotated kerneldoc
+pub struct SourceExtractor {
+    path: String,
+    parser: KerneldocParserImpl,
+    syscall_regex: Regex,
+    ioctl_regex: Regex,
+    function_regex: Regex,
+}
+
+impl SourceExtractor {
+    pub fn new(path: &str) -> Result<Self> {
+        Ok(SourceExtractor {
+            path: path.to_string(),
+            parser: KerneldocParserImpl::new(),
+            syscall_regex: Regex::new(r"SYSCALL_DEFINE\d+\((\w+)")?,
+            ioctl_regex: Regex::new(r"(?:static\s+)?long\s+(\w+_ioctl)\s*\(")?,
+            function_regex: Regex::new(
+                r"(?m)^(?:static\s+)?(?:inline\s+)?(?:(?:unsigned\s+)?(?:long|int|void|char|short|struct\s+\w+\s*\*?|[\w_]+_t)\s*\*?\s+)?(\w+)\s*\([^)]*\)",
+            )?,
+        })
+    }
+
+    fn extract_from_file(&self, path: &Path) -> Result<Vec<ApiSpec>> {
+        let content = fs::read_to_string(path)
+            .with_context(|| format!("Failed to read file: {}", path.display()))?;
+
+        self.extract_from_content(&content)
+    }
+
+    fn extract_from_content(&self, content: &str) -> Result<Vec<ApiSpec>> {
+        let mut specs = Vec::new();
+        let mut in_kerneldoc = false;
+        let mut current_doc = String::new();
+        let lines: Vec<&str> = content.lines().collect();
+        let mut i = 0;
+
+        while i < lines.len() {
+            let line = lines[i];
+
+            // Start of kerneldoc comment
+            if line.trim_start().starts_with("/**") {
+                in_kerneldoc = true;
+                current_doc.clear();
+                i += 1;
+                continue;
+            }
+
+            // Inside kerneldoc comment
+            if in_kerneldoc {
+                if line.contains("*/") {
+                    in_kerneldoc = false;
+
+                    // Check if this kerneldoc has KAPI annotations
+                    if current_doc.contains("context-flags:") ||
+                       current_doc.contains("param-count:") ||
+                       current_doc.contains("side-effect:") ||
+                       current_doc.contains("state-trans:") ||
+                       current_doc.contains("error-code:") {
+
+                        // Look ahead for the function declaration
+                        if let Some((name, api_type, signature)) = self.find_function_after(&lines, i + 1) {
+                            if let Ok(spec) = self.parser.parse_kerneldoc(&current_doc, &name, &api_type, Some(&signature)) {
+                                specs.push(spec);
+                            }
+                        }
+                    }
+                } else {
+                    // Remove leading asterisk and preserve content
+                    let cleaned = if let Some(stripped) = line.trim_start().strip_prefix("*") {
+                        if let Some(no_space) = stripped.strip_prefix(' ') {
+                            no_space
+                        } else {
+                            stripped
+                        }
+                    } else {
+                        line.trim_start()
+                    };
+                    current_doc.push_str(cleaned);
+                    current_doc.push('\n');
+                }
+            }
+
+            i += 1;
+        }
+
+        Ok(specs)
+    }
+
+    fn find_function_after(&self, lines: &[&str], start: usize) -> Option<(String, String, String)> {
+        for i in start..lines.len().min(start + 10) {
+            let line = lines[i];
+
+            // Skip empty lines
+            if line.trim().is_empty() {
+                continue;
+            }
+
+            // Check for SYSCALL_DEFINE
+            if let Some(caps) = self.syscall_regex.captures(line) {
+                let name = format!("sys_{}", caps.get(1).unwrap().as_str());
+                let signature = self.extract_syscall_signature(lines, i);
+                return Some((name, "syscall".to_string(), signature));
+            }
+
+            // Check for ioctl function
+            if let Some(caps) = self.ioctl_regex.captures(line) {
+                let name = caps.get(1).unwrap().as_str().to_string();
+                return Some((name, "ioctl".to_string(), line.to_string()));
+            }
+
+            // Check for regular function
+            if let Some(caps) = self.function_regex.captures(line) {
+                let name = caps.get(1).unwrap().as_str().to_string();
+                return Some((name, "function".to_string(), line.to_string()));
+            }
+
+            // Stop if we hit something that's clearly not part of the function declaration
+            if !line.starts_with(' ') && !line.starts_with('\t') && !line.trim().is_empty() {
+                break;
+            }
+        }
+
+        None
+    }
+
+    fn extract_syscall_signature(&self, lines: &[&str], start: usize) -> String {
+        // Extract the full SYSCALL_DEFINE signature
+        let mut sig = String::new();
+        let mut in_paren = false;
+        let mut paren_count = 0;
+
+        for line in lines.iter().skip(start).take(20) {
+            let line = *line;
+
+            // Start of SYSCALL_DEFINE
+            if line.contains("SYSCALL_DEFINE") {
+                if let Some(pos) = line.find('(') {
+                    sig.push_str(&line[pos..]);
+                    in_paren = true;
+                    paren_count = line[pos..].chars().filter(|&c| c == '(').count() -
+                                  line[pos..].chars().filter(|&c| c == ')').count();
+                }
+            } else if in_paren {
+                sig.push(' ');
+                sig.push_str(line.trim());
+                paren_count += line.chars().filter(|&c| c == '(').count();
+                paren_count -= line.chars().filter(|&c| c == ')').count();
+
+                if paren_count == 0 {
+                    break;
+                }
+            }
+        }
+
+        sig
+    }
+}
+
+impl ApiExtractor for SourceExtractor {
+    fn extract_all(&self) -> Result<Vec<ApiSpec>> {
+        let path = Path::new(&self.path);
+        let mut all_specs = Vec::new();
+
+        if path.is_file() {
+            // Single file
+            all_specs.extend(self.extract_from_file(path)?);
+        } else if path.is_dir() {
+            // Directory - walk all .c files
+            for entry in WalkDir::new(path)
+                .into_iter()
+                .filter_map(|e| e.ok())
+                .filter(|e| e.path().extension().is_some_and(|ext| ext == "c"))
+            {
+                if let Ok(specs) = self.extract_from_file(entry.path()) {
+                    all_specs.extend(specs);
+                }
+            }
+        }
+
+        Ok(all_specs)
+    }
+
+    fn extract_by_name(&self, name: &str) -> Result<Option<ApiSpec>> {
+        let all_specs = self.extract_all()?;
+        Ok(all_specs.into_iter().find(|s| s.name == name))
+    }
+
+    fn display_api_details(
+        &self,
+        api_name: &str,
+        formatter: &mut dyn OutputFormatter,
+        output: &mut dyn Write,
+    ) -> Result<()> {
+        if let Some(spec) = self.extract_by_name(api_name)? {
+            display_api_spec(&spec, formatter, output)?;
+        } else {
+            writeln!(output, "API '{}' not found", api_name)?;
+        }
+        Ok(())
+    }
+}
\ No newline at end of file
diff --git a/tools/kapi/src/extractor/vmlinux/binary_utils.rs b/tools/kapi/src/extractor/vmlinux/binary_utils.rs
new file mode 100644
index 0000000000000..0a51943e1c027
--- /dev/null
+++ b/tools/kapi/src/extractor/vmlinux/binary_utils.rs
@@ -0,0 +1,180 @@
+// Constants for all structure field sizes
+pub mod sizes {
+    pub const NAME: usize = 128;
+    pub const DESC: usize = 512;
+    pub const MAX_PARAMS: usize = 16;
+    pub const MAX_ERRORS: usize = 32;
+    pub const MAX_CONSTRAINTS: usize = 16;
+    pub const MAX_CAPABILITIES: usize = 8;
+    pub const MAX_SIGNALS: usize = 16;
+    pub const MAX_STRUCT_SPECS: usize = 8;
+    pub const MAX_SIDE_EFFECTS: usize = 32;
+    pub const MAX_STATE_TRANS: usize = 16;
+    pub const MAX_PROTOCOL_BEHAVIORS: usize = 8;
+    pub const MAX_ADDR_FAMILIES: usize = 8;
+}
+
+// Helper for reading data at specific offsets
+pub struct DataReader<'a> {
+    pub data: &'a [u8],
+    pub pos: usize,
+}
+
+impl<'a> DataReader<'a> {
+    pub fn new(data: &'a [u8], offset: usize) -> Self {
+        Self { data, pos: offset }
+    }
+
+    pub fn read_bytes(&mut self, len: usize) -> Option<&'a [u8]> {
+        if self.pos + len <= self.data.len() {
+            let bytes = &self.data[self.pos..self.pos + len];
+            self.pos += len;
+            Some(bytes)
+        } else {
+            None
+        }
+    }
+
+    pub fn read_cstring(&mut self, max_len: usize) -> Option<String> {
+        let bytes = self.read_bytes(max_len)?;
+        if let Some(null_pos) = bytes.iter().position(|&b| b == 0) {
+            if null_pos > 0 {
+                if let Ok(s) = std::str::from_utf8(&bytes[..null_pos]) {
+                    return Some(s.to_string());
+                }
+            }
+        }
+        None
+    }
+
+    pub fn read_u32(&mut self) -> Option<u32> {
+        self.read_bytes(4).map(|b| u32::from_le_bytes(b.try_into().unwrap()))
+    }
+
+    pub fn read_u8(&mut self) -> Option<u8> {
+        self.read_bytes(1).map(|b| b[0])
+    }
+
+    pub fn read_i32(&mut self) -> Option<i32> {
+        self.read_bytes(4).map(|b| i32::from_le_bytes(b.try_into().unwrap()))
+    }
+
+    pub fn read_u64(&mut self) -> Option<u64> {
+        self.read_bytes(8).map(|b| u64::from_le_bytes(b.try_into().unwrap()))
+    }
+
+    pub fn read_i64(&mut self) -> Option<i64> {
+        self.read_bytes(8).map(|b| i64::from_le_bytes(b.try_into().unwrap()))
+    }
+
+    pub fn read_usize(&mut self) -> Option<usize> {
+        self.read_u64().map(|v| v as usize)
+    }
+
+    pub fn skip(&mut self, len: usize) {
+        self.pos = (self.pos + len).min(self.data.len());
+    }
+
+    // Helper methods for common patterns
+    pub fn read_bool(&mut self) -> Option<bool> {
+        self.read_u8().map(|v| v != 0)
+    }
+
+    pub fn read_optional_string(&mut self, max_len: usize) -> Option<String> {
+        self.read_cstring(max_len).filter(|s| !s.is_empty())
+    }
+
+    pub fn read_string_or_default(&mut self, max_len: usize) -> String {
+        self.read_cstring(max_len).unwrap_or_default()
+    }
+
+    // Skip and discard - advances position by reading and discarding
+    pub fn discard_cstring(&mut self, max_len: usize) {
+        let _ = self.read_cstring(max_len);
+    }
+
+    // Read multiple booleans at once
+    pub fn read_bools<const N: usize>(&mut self) -> Option<[bool; N]> {
+        let mut result = [false; N];
+        for item in &mut result {
+            *item = self.read_bool()?;
+        }
+        Some(result)
+    }
+
+
+}
+
+// Structure layout definitions for calculating sizes
+pub fn signal_mask_spec_layout_size() -> usize {
+    // Packed structure from struct kapi_signal_mask_spec
+    sizes::NAME + // mask_name
+    4 * sizes::MAX_SIGNALS + // signals array
+    4 + // signal_count
+    sizes::DESC // description
+}
+
+pub fn struct_field_layout_size() -> usize {
+    // Packed structure from struct kapi_struct_field
+    sizes::NAME + // name
+    4 + // type (enum)
+    sizes::NAME + // type_name
+    8 + // offset (size_t)
+    8 + // size (size_t)
+    4 + // flags
+    4 + // constraint_type (enum)
+    8 + // min_value (s64)
+    8 + // max_value (s64)
+    8 + // valid_mask (u64)
+    sizes::DESC + // enum_values
+    sizes::DESC // description
+}
+
+pub fn socket_state_spec_layout_size() -> usize {
+    // struct kapi_socket_state_spec
+    sizes::NAME * sizes::MAX_CONSTRAINTS + // required_states array
+    sizes::NAME * sizes::MAX_CONSTRAINTS + // forbidden_states array
+    sizes::NAME + // resulting_state
+    sizes::DESC + // condition
+    sizes::NAME + // applicable_protocols
+    4 + // required_count
+    4 // forbidden_count
+}
+
+pub fn protocol_behavior_spec_layout_size() -> usize {
+    // struct kapi_protocol_behavior
+    sizes::NAME + // applicable_protocols
+    sizes::DESC + // behavior
+    sizes::NAME + // protocol_flags
+    sizes::DESC // flag_description
+}
+
+pub fn buffer_spec_layout_size() -> usize {
+    // struct kapi_buffer_spec
+    sizes::DESC + // buffer_behaviors
+    8 + // min_buffer_size (size_t)
+    8 + // max_buffer_size (size_t)
+    8 // optimal_buffer_size (size_t)
+}
+
+pub fn async_spec_layout_size() -> usize {
+    // struct kapi_async_spec
+    sizes::NAME + // supported_modes
+    4 // nonblock_errno (int)
+}
+
+pub fn addr_family_spec_layout_size() -> usize {
+    // struct kapi_addr_family_spec
+    4 + // family (int)
+    sizes::NAME + // family_name
+    8 + // addr_struct_size (size_t)
+    8 + // min_addr_len (size_t)
+    8 + // max_addr_len (size_t)
+    sizes::DESC + // addr_format
+    1 + // supports_wildcard (bool)
+    1 + // supports_multicast (bool)
+    1 + // supports_broadcast (bool)
+    sizes::DESC + // special_addresses
+    4 + // port_range_min (u32)
+    4 // port_range_max (u32)
+}
diff --git a/tools/kapi/src/extractor/vmlinux/magic_finder.rs b/tools/kapi/src/extractor/vmlinux/magic_finder.rs
new file mode 100644
index 0000000000000..cb7dc535801a0
--- /dev/null
+++ b/tools/kapi/src/extractor/vmlinux/magic_finder.rs
@@ -0,0 +1,102 @@
+// Magic markers for each section
+pub const MAGIC_PARAM: u32 = 0x4B415031;    // 'KAP1'
+pub const MAGIC_RETURN: u32 = 0x4B415232;   // 'KAR2'
+pub const MAGIC_ERROR: u32 = 0x4B414533;    // 'KAE3'
+pub const MAGIC_LOCK: u32 = 0x4B414C34;     // 'KAL4'
+pub const MAGIC_CONSTRAINT: u32 = 0x4B414335; // 'KAC5'
+pub const MAGIC_INFO: u32 = 0x4B414936;     // 'KAI6'
+pub const MAGIC_SIGNAL: u32 = 0x4B415337;   // 'KAS7'
+pub const MAGIC_SIGMASK: u32 = 0x4B414D38;  // 'KAM8'
+pub const MAGIC_STRUCT: u32 = 0x4B415439;   // 'KAT9'
+pub const MAGIC_EFFECT: u32 = 0x4B414641;   // 'KAFA'
+pub const MAGIC_TRANS: u32 = 0x4B415442;    // 'KATB'
+pub const MAGIC_CAP: u32 = 0x4B414343;      // 'KACC'
+
+pub struct MagicOffsets {
+    pub param_offset: Option<usize>,
+    pub return_offset: Option<usize>,
+    pub error_offset: Option<usize>,
+    pub lock_offset: Option<usize>,
+    pub constraint_offset: Option<usize>,
+    pub info_offset: Option<usize>,
+    pub signal_offset: Option<usize>,
+    pub sigmask_offset: Option<usize>,
+    pub struct_offset: Option<usize>,
+    pub effect_offset: Option<usize>,
+    pub trans_offset: Option<usize>,
+    pub cap_offset: Option<usize>,
+}
+
+impl MagicOffsets {
+    /// Find magic markers in the provided data slice
+    /// data: slice of data to search (typically one spec's worth)
+    /// base_offset: absolute offset where this slice starts in the full buffer
+    pub fn find_in_data(data: &[u8], base_offset: usize) -> Self {
+        let mut offsets = MagicOffsets {
+            param_offset: None,
+            return_offset: None,
+            error_offset: None,
+            lock_offset: None,
+            constraint_offset: None,
+            info_offset: None,
+            signal_offset: None,
+            sigmask_offset: None,
+            struct_offset: None,
+            effect_offset: None,
+            trans_offset: None,
+            cap_offset: None,
+        };
+
+        // Scan through data looking for magic markers
+        // Only find the first occurrence of each magic to avoid cross-spec contamination
+        let mut i = 0;
+        while i + 4 <= data.len() {
+            let bytes = &data[i..i + 4];
+            let value = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
+
+            match value {
+                MAGIC_PARAM if offsets.param_offset.is_none() => {
+                    offsets.param_offset = Some(base_offset + i);
+                },
+                MAGIC_RETURN if offsets.return_offset.is_none() => {
+                    offsets.return_offset = Some(base_offset + i);
+                },
+                MAGIC_ERROR if offsets.error_offset.is_none() => {
+                    offsets.error_offset = Some(base_offset + i);
+                },
+                MAGIC_LOCK if offsets.lock_offset.is_none() => {
+                    offsets.lock_offset = Some(base_offset + i);
+                },
+                MAGIC_CONSTRAINT if offsets.constraint_offset.is_none() => {
+                    offsets.constraint_offset = Some(base_offset + i);
+                },
+                MAGIC_INFO if offsets.info_offset.is_none() => {
+                    offsets.info_offset = Some(base_offset + i);
+                },
+                MAGIC_SIGNAL if offsets.signal_offset.is_none() => {
+                    offsets.signal_offset = Some(base_offset + i);
+                },
+                MAGIC_SIGMASK if offsets.sigmask_offset.is_none() => {
+                    offsets.sigmask_offset = Some(base_offset + i);
+                },
+                MAGIC_STRUCT if offsets.struct_offset.is_none() => {
+                    offsets.struct_offset = Some(base_offset + i);
+                },
+                MAGIC_EFFECT if offsets.effect_offset.is_none() => {
+                    offsets.effect_offset = Some(base_offset + i);
+                },
+                MAGIC_TRANS if offsets.trans_offset.is_none() => {
+                    offsets.trans_offset = Some(base_offset + i);
+                },
+                MAGIC_CAP if offsets.cap_offset.is_none() => {
+                    offsets.cap_offset = Some(base_offset + i);
+                },
+                _ => {}
+            }
+
+            i += 1;
+        }
+
+        offsets
+    }
+}
\ No newline at end of file
diff --git a/tools/kapi/src/extractor/vmlinux/mod.rs b/tools/kapi/src/extractor/vmlinux/mod.rs
new file mode 100644
index 0000000000000..bf3da4df6e66a
--- /dev/null
+++ b/tools/kapi/src/extractor/vmlinux/mod.rs
@@ -0,0 +1,864 @@
+use super::{
+    ApiExtractor, ApiSpec, CapabilitySpec, ConstraintSpec, ErrorSpec, LockSpec, ParamSpec,
+    ReturnSpec, SideEffectSpec, SignalMaskSpec, SignalSpec, StateTransitionSpec, StructSpec,
+    StructFieldSpec,
+};
+use crate::formatter::OutputFormatter;
+use anyhow::{Context, Result};
+use goblin::elf::Elf;
+use std::convert::TryInto;
+use std::fs;
+use std::io::Write;
+
+mod binary_utils;
+mod magic_finder;
+use binary_utils::{
+    DataReader, addr_family_spec_layout_size, async_spec_layout_size, buffer_spec_layout_size,
+    protocol_behavior_spec_layout_size, signal_mask_spec_layout_size,
+    sizes, socket_state_spec_layout_size, struct_field_layout_size,
+};
+
+// Helper to convert empty strings to None
+fn opt_string(s: String) -> Option<String> {
+    if s.is_empty() { None } else { Some(s) }
+}
+
+pub struct VmlinuxExtractor {
+    kapi_data: Vec<u8>,
+    specs: Vec<KapiSpec>,
+}
+
+#[derive(Debug)]
+struct KapiSpec {
+    name: String,
+    api_type: String,
+    offset: usize,
+}
+
+impl VmlinuxExtractor {
+    pub fn new(vmlinux_path: &str) -> Result<Self> {
+        let vmlinux_data = fs::read(vmlinux_path)
+            .with_context(|| format!("Failed to read vmlinux file: {vmlinux_path}"))?;
+
+        let elf = Elf::parse(&vmlinux_data).context("Failed to parse ELF file")?;
+
+        // Find __start_kapi_specs and __stop_kapi_specs symbols first
+        let mut start_addr = None;
+        let mut stop_addr = None;
+
+        for sym in &elf.syms {
+            if let Some(name) = elf.strtab.get_at(sym.st_name) {
+                match name {
+                    "__start_kapi_specs" => start_addr = Some(sym.st_value),
+                    "__stop_kapi_specs" => stop_addr = Some(sym.st_value),
+                    _ => {}
+                }
+            }
+        }
+
+        let start = start_addr.context("Could not find __start_kapi_specs symbol")?;
+        let stop = stop_addr.context("Could not find __stop_kapi_specs symbol")?;
+
+        if stop <= start {
+            anyhow::bail!("No kernel API specifications found in vmlinux");
+        }
+
+        // Find the section containing the kapi specs data
+        // The specs may be in .kapi_specs (standalone) or .rodata (embedded in RO_DATA)
+        let containing_section = elf
+            .section_headers
+            .iter()
+            .find(|sh| {
+                // Check if this section contains the start address
+                start >= sh.sh_addr && start < sh.sh_addr + sh.sh_size
+            })
+            .context("Could not find section containing kapi_specs data")?;
+
+        // Calculate the offset within the file
+        let section_vaddr = containing_section.sh_addr;
+        let file_offset = containing_section.sh_offset + (start - section_vaddr);
+        let data_size: usize = (stop - start)
+            .try_into()
+            .context("Data size too large for platform")?;
+
+        let file_offset_usize: usize = file_offset
+            .try_into()
+            .context("File offset too large for platform")?;
+
+        if file_offset_usize + data_size > vmlinux_data.len() {
+            anyhow::bail!("Invalid offset/size for kapi_specs data");
+        }
+
+        // Extract the raw data
+        let kapi_data = vmlinux_data[file_offset_usize..(file_offset_usize + data_size)].to_vec();
+
+        // Parse the specifications
+        let specs = parse_kapi_specs(&kapi_data)?;
+
+        Ok(VmlinuxExtractor { kapi_data, specs })
+    }
+}
+
+fn parse_kapi_specs(data: &[u8]) -> Result<Vec<KapiSpec>> {
+    let mut specs = Vec::new();
+    let mut offset = 0;
+    let mut last_found_offset = None;
+
+    // Expected offset from struct start to param_magic based on struct layout
+    let param_magic_offset = sizes::NAME + 4 + sizes::DESC + (sizes::DESC * 4) + 4;
+
+    // Find specs by validating API name and magic marker pairs
+    while offset + param_magic_offset + 4 <= data.len() {
+        // Read potential API name
+        let name_bytes = &data[offset..offset + sizes::NAME.min(data.len() - offset)];
+
+        // Find null terminator
+        let name_len = name_bytes.iter().position(|&b| b == 0).unwrap_or(0);
+
+        if name_len > 0 && name_len < 100 {
+            let name = String::from_utf8_lossy(&name_bytes[..name_len]).to_string();
+
+            // Validate API name format
+            if is_valid_api_name(&name) {
+                // Verify magic marker at expected position
+                let magic_offset = offset + param_magic_offset;
+                if magic_offset + 4 <= data.len() {
+                    let magic_bytes = &data[magic_offset..magic_offset + 4];
+                    let magic_value = u32::from_le_bytes([magic_bytes[0], magic_bytes[1], magic_bytes[2], magic_bytes[3]]);
+
+                    if magic_value == magic_finder::MAGIC_PARAM {
+                        // Avoid duplicate detection of the same spec
+                        if last_found_offset.is_none() || offset >= last_found_offset.unwrap() + param_magic_offset {
+                            let api_type = if name.starts_with("sys_") {
+                                "syscall"
+                            } else if name.ends_with("_ioctl") {
+                                "ioctl"
+                            } else if name.contains("sysfs") {
+                                "sysfs"
+                            } else {
+                                "function"
+                            }
+                            .to_string();
+
+                            specs.push(KapiSpec {
+                                name: name.clone(),
+                                api_type,
+                                offset,
+                            });
+
+                            last_found_offset = Some(offset);
+                        }
+                    }
+                }
+            }
+        }
+
+        // Scan byte by byte to find all specs
+        offset += 1;
+    }
+
+    Ok(specs)
+}
+
+
+
+
+fn is_valid_api_name(name: &str) -> bool {
+    // Validate API name format and length
+    if name.is_empty() || name.len() < 3 || name.len() > 100 {
+        return false;
+    }
+
+    // Alphanumeric and underscore characters only
+    if !name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_') {
+        return false;
+    }
+
+    // Must start with letter or underscore
+    let first_char = name.chars().next().unwrap();
+    if !first_char.is_ascii_alphabetic() && first_char != '_' {
+        return false;
+    }
+
+    // Match common kernel API patterns
+    name.starts_with("sys_") ||
+    name.starts_with("__") ||
+    name.ends_with("_ioctl") ||
+    name.contains("_") ||
+    name.len() > 6
+}
+
+impl ApiExtractor for VmlinuxExtractor {
+    fn extract_all(&self) -> Result<Vec<ApiSpec>> {
+        Ok(self
+            .specs
+            .iter()
+            .map(|spec| {
+                // Parse the full spec for listing
+                parse_binary_to_api_spec(&self.kapi_data, spec.offset)
+                    .unwrap_or_else(|_| ApiSpec {
+                        name: spec.name.clone(),
+                        api_type: spec.api_type.clone(),
+                        description: None,
+                        long_description: None,
+                        version: None,
+                        context_flags: vec![],
+                        param_count: None,
+                        error_count: None,
+                        examples: None,
+                        notes: None,
+                        since_version: None,
+                        subsystem: None,
+                        sysfs_path: None,
+                        permissions: None,
+                        socket_state: None,
+                        protocol_behaviors: vec![],
+                        addr_families: vec![],
+                        buffer_spec: None,
+                        async_spec: None,
+                        net_data_transfer: None,
+                        capabilities: vec![],
+                        parameters: vec![],
+                        return_spec: None,
+                        errors: vec![],
+                        signals: vec![],
+                        signal_masks: vec![],
+                        side_effects: vec![],
+                        state_transitions: vec![],
+                        constraints: vec![],
+                        locks: vec![],
+                        struct_specs: vec![],
+                    })
+            })
+            .collect())
+    }
+
+    fn extract_by_name(&self, api_name: &str) -> Result<Option<ApiSpec>> {
+        if let Some(spec) = self.specs.iter().find(|s| s.name == api_name) {
+            Ok(Some(parse_binary_to_api_spec(&self.kapi_data, spec.offset)?))
+        } else {
+            Ok(None)
+        }
+    }
+
+    fn display_api_details(
+        &self,
+        api_name: &str,
+        formatter: &mut dyn OutputFormatter,
+        writer: &mut dyn Write,
+    ) -> Result<()> {
+        if let Some(spec) = self.specs.iter().find(|s| s.name == api_name) {
+            let api_spec = parse_binary_to_api_spec(&self.kapi_data, spec.offset)?;
+            super::display_api_spec(&api_spec, formatter, writer)?;
+        }
+        Ok(())
+    }
+}
+
+/// Helper to read count and parse array items with optional magic offset
+fn parse_array_with_magic<T, F>(
+    reader: &mut DataReader,
+    magic_offset: Option<usize>,
+    max_items: u32,
+    parse_fn: F,
+) -> Vec<T>
+where
+    F: Fn(&mut DataReader) -> Option<T>,
+{
+    // Read count - position at magic+4 if magic offset exists
+    let count = if let Some(offset) = magic_offset {
+        reader.pos = offset + 4;
+        reader.read_u32()
+    } else {
+        reader.read_u32()
+    };
+
+    let mut items = Vec::new();
+    if let Some(count) = count {
+        // Position at start of array data if magic offset exists
+        if let Some(offset) = magic_offset {
+            reader.pos = offset + 8; // +4 for magic, +4 for count
+        }
+        // Parse items up to max_items
+        for _ in 0..count.min(max_items) as usize {
+            if let Some(item) = parse_fn(reader) {
+                items.push(item);
+            }
+        }
+    }
+    items
+}
+
+fn parse_binary_to_api_spec(data: &[u8], offset: usize) -> Result<ApiSpec> {
+    let mut reader = DataReader::new(data, offset);
+
+    // Search for magic markers in the entire spec data
+    let search_end = (offset + 0x70000).min(data.len()); // Search full spec size
+    let spec_data = &data[offset..search_end];
+
+    // Find magic markers relative to the spec start
+    let magic_offsets = magic_finder::MagicOffsets::find_in_data(spec_data, offset);
+
+    // Read fields in exact order of struct kernel_api_spec
+
+    // Read name (128 bytes)
+    let name = reader
+        .read_cstring(sizes::NAME)
+        .ok_or_else(|| anyhow::anyhow!("Failed to read API name"))?;
+
+    // Determine API type
+    let api_type = if name.starts_with("sys_") {
+        "syscall"
+    } else if name.ends_with("_ioctl") {
+        "ioctl"
+    } else if name.contains("sysfs") {
+        "sysfs"
+    } else {
+        "function"
+    }
+    .to_string();
+
+    // Read version (u32)
+    let version = reader.read_u32().map(|v| v.to_string());
+
+    // Read description (512 bytes)
+    let description = reader.read_cstring(sizes::DESC).filter(|s| !s.is_empty());
+
+    // Read long_description (2048 bytes)
+    let long_description = reader
+        .read_cstring(sizes::DESC * 4)
+        .filter(|s| !s.is_empty());
+
+    // Read context_flags (u32)
+    let context_flags = parse_context_flags(&mut reader);
+
+    // Parse params array
+    let parameters = parse_array_with_magic(
+        &mut reader,
+        magic_offsets.param_offset,
+        sizes::MAX_PARAMS as u32,
+        |r| parse_param(r, 0),  // Index doesn't seem to be used in parse_param
+    );
+
+    // Read return_spec
+    let return_spec = parse_return_spec(&mut reader);
+
+    // Parse errors array
+    let errors = parse_array_with_magic(
+        &mut reader,
+        magic_offsets.error_offset,
+        sizes::MAX_ERRORS as u32,
+        parse_error,
+    );
+
+    // Parse locks array
+    let locks = parse_array_with_magic(
+        &mut reader,
+        magic_offsets.lock_offset,
+        sizes::MAX_CONSTRAINTS as u32,
+        parse_lock,
+    );
+
+    // Parse constraints array
+    let constraints = parse_array_with_magic(
+        &mut reader,
+        magic_offsets.constraint_offset,
+        sizes::MAX_CONSTRAINTS as u32,
+        parse_constraint,
+    );
+
+    // Read examples and notes - position reader at info section if magic found
+    let (examples, notes) = if let Some(info_offset) = magic_offsets.info_offset {
+        reader.pos = info_offset + 4; // +4 to skip magic
+        let examples = reader.read_cstring(sizes::DESC * 2).filter(|s| !s.is_empty());
+        let notes = reader.read_cstring(sizes::DESC * 2).filter(|s| !s.is_empty());
+        (examples, notes)
+    } else {
+        let examples = reader.read_cstring(sizes::DESC * 2).filter(|s| !s.is_empty());
+        let notes = reader.read_cstring(sizes::DESC * 2).filter(|s| !s.is_empty());
+        (examples, notes)
+    };
+
+    // Read since_version (32 bytes)
+    let since_version = reader.read_cstring(32).filter(|s| !s.is_empty());
+
+    // Skip deprecated (bool = 1 byte + 3 bytes padding) and replacement (128 bytes)
+    // These fields were removed from kernel but we need to skip them for binary compatibility
+    reader.skip(4); // deprecated + padding
+    reader.discard_cstring(sizes::NAME); // replacement
+
+    // Parse signals array
+    let signals = parse_array_with_magic(
+        &mut reader,
+        magic_offsets.signal_offset,
+        sizes::MAX_SIGNALS as u32,
+        parse_signal,
+    );
+
+    // Read signal_mask_count (u32)
+    let signal_mask_count = reader.read_u32();
+
+    // Parse signal_masks array
+    let mut signal_masks = Vec::new();
+    if let Some(count) = signal_mask_count {
+        for i in 0..sizes::MAX_SIGNALS {
+            if i < count as usize {
+                if let Some(mask) = parse_signal_mask(&mut reader) {
+                    signal_masks.push(mask);
+                }
+            } else {
+                reader.skip(signal_mask_spec_layout_size());
+            }
+        }
+    } else {
+        reader.skip(signal_mask_spec_layout_size() * sizes::MAX_SIGNALS);
+    }
+
+    // Parse struct_specs array
+    let struct_specs = parse_array_with_magic(
+        &mut reader,
+        magic_offsets.struct_offset,
+        sizes::MAX_STRUCT_SPECS as u32,
+        parse_struct_spec,
+    );
+
+    // According to the C struct, the order is:
+    // side_effect_count, side_effects array, state_trans_count, state_transitions array,
+    // capability_count, capabilities array
+
+    // Parse side_effects array
+    let side_effects = parse_array_with_magic(
+        &mut reader,
+        magic_offsets.effect_offset,
+        sizes::MAX_SIDE_EFFECTS as u32,
+        parse_side_effect,
+    );
+
+    // Parse state_transitions array
+    let state_transitions = parse_array_with_magic(
+        &mut reader,
+        magic_offsets.trans_offset,
+        sizes::MAX_STATE_TRANS as u32,
+        parse_state_transition,
+    );
+
+    // Parse capabilities array
+    let capabilities = parse_array_with_magic(
+        &mut reader,
+        magic_offsets.cap_offset,
+        sizes::MAX_CAPABILITIES as u32,
+        parse_capability,
+    );
+
+    // Skip remaining network/socket fields
+    reader.skip(
+        socket_state_spec_layout_size() +
+        protocol_behavior_spec_layout_size() * sizes::MAX_PROTOCOL_BEHAVIORS +
+        4 + // protocol_behavior_count
+        buffer_spec_layout_size() +
+        async_spec_layout_size() +
+        addr_family_spec_layout_size() * sizes::MAX_ADDR_FAMILIES +
+        4 + // addr_family_count
+        6 + 2 + // 6 bool flags + padding
+        sizes::DESC * 3 // 3 semantic descriptions
+    );
+
+    Ok(ApiSpec {
+        name,
+        api_type,
+        description,
+        long_description,
+        version,
+        context_flags,
+        param_count: if parameters.is_empty() { None } else { Some(parameters.len() as u32) },
+        error_count: if errors.is_empty() { None } else { Some(errors.len() as u32) },
+        examples,
+        notes,
+        since_version,
+        subsystem: None,
+        sysfs_path: None,
+        permissions: None,
+        socket_state: None,
+        protocol_behaviors: vec![],
+        addr_families: vec![],
+        buffer_spec: None,
+        async_spec: None,
+        net_data_transfer: None,
+        capabilities,
+        parameters,
+        return_spec,
+        errors,
+        signals,
+        signal_masks,
+        side_effects,
+        state_transitions,
+        constraints,
+        locks,
+        struct_specs,
+    })
+}
+
+// Helper parsing functions
+
+fn parse_context_flags(reader: &mut DataReader) -> Vec<String> {
+    const KAPI_CTX_PROCESS: u32 = 1 << 0;
+    const KAPI_CTX_SOFTIRQ: u32 = 1 << 1;
+    const KAPI_CTX_HARDIRQ: u32 = 1 << 2;
+    const KAPI_CTX_NMI: u32 = 1 << 3;
+    const KAPI_CTX_ATOMIC: u32 = 1 << 4;
+    const KAPI_CTX_SLEEPABLE: u32 = 1 << 5;
+    const KAPI_CTX_PREEMPT_DISABLED: u32 = 1 << 6;
+    const KAPI_CTX_IRQ_DISABLED: u32 = 1 << 7;
+
+    if let Some(flags) = reader.read_u32() {
+        let mut parts = Vec::new();
+
+        if flags & KAPI_CTX_PROCESS != 0 {
+            parts.push("KAPI_CTX_PROCESS");
+        }
+        if flags & KAPI_CTX_SOFTIRQ != 0 {
+            parts.push("KAPI_CTX_SOFTIRQ");
+        }
+        if flags & KAPI_CTX_HARDIRQ != 0 {
+            parts.push("KAPI_CTX_HARDIRQ");
+        }
+        if flags & KAPI_CTX_NMI != 0 {
+            parts.push("KAPI_CTX_NMI");
+        }
+        if flags & KAPI_CTX_ATOMIC != 0 {
+            parts.push("KAPI_CTX_ATOMIC");
+        }
+        if flags & KAPI_CTX_SLEEPABLE != 0 {
+            parts.push("KAPI_CTX_SLEEPABLE");
+        }
+        if flags & KAPI_CTX_PREEMPT_DISABLED != 0 {
+            parts.push("KAPI_CTX_PREEMPT_DISABLED");
+        }
+        if flags & KAPI_CTX_IRQ_DISABLED != 0 {
+            parts.push("KAPI_CTX_IRQ_DISABLED");
+        }
+
+        if !parts.is_empty() {
+            vec![parts.join(" | ")]
+        } else {
+            vec![]
+        }
+    } else {
+        vec![]
+    }
+}
+
+fn parse_param(reader: &mut DataReader, index: usize) -> Option<ParamSpec> {
+    let name = reader.read_cstring(sizes::NAME)?;
+    let type_name = reader.read_cstring(sizes::NAME)?;
+    let param_type = reader.read_u32()?;
+    let flags = reader.read_u32()?;
+    let size = reader.read_usize()?;
+    let alignment = reader.read_usize()?;
+    let min_value = reader.read_i64()?;
+    let max_value = reader.read_i64()?;
+    let valid_mask = reader.read_u64()?;
+
+    // Skip enum_values pointer (8 bytes)
+    reader.skip(8);
+    let _enum_count = reader.read_u32()?; // Must use ? to propagate errors
+    let constraint_type = reader.read_u32()?;
+    // Skip validate function pointer (8 bytes)
+    reader.skip(8);
+
+    let description = reader.read_string_or_default(sizes::DESC);
+    let constraint = reader.read_optional_string(sizes::DESC);
+    let _size_param_idx = reader.read_i32()?; // Must use ? to propagate errors
+    let _size_multiplier = reader.read_usize()?; // Must use ? to propagate errors
+
+    Some(ParamSpec {
+        index: index as u32,
+        name,
+        type_name,
+        description,
+        flags,
+        param_type,
+        constraint_type,
+        constraint,
+        min_value: Some(min_value),
+        max_value: Some(max_value),
+        valid_mask: Some(valid_mask),
+        enum_values: vec![],
+        size: Some(size as u32),
+        alignment: Some(alignment as u32),
+    })
+}
+
+fn parse_return_spec(reader: &mut DataReader) -> Option<ReturnSpec> {
+    // Read type_name, but treat empty as valid (will be empty string)
+    let type_name = reader.read_string_or_default(sizes::NAME);
+
+    // Read return_type and check_type
+    let return_type = reader.read_u32().unwrap_or(0);
+    let check_type = reader.read_u32().unwrap_or(0);
+    let success_value = reader.read_i64().unwrap_or(0);
+    let success_min = reader.read_i64().unwrap_or(0);
+    let success_max = reader.read_i64().unwrap_or(0);
+
+    // Skip error_values pointer (8 bytes)
+    reader.skip(8);
+    let _error_count = reader.read_u32().unwrap_or(0); // Don't fail on return spec
+    // Skip is_success function pointer (8 bytes)
+    reader.skip(8);
+
+    let description = reader.read_string_or_default(sizes::DESC);
+
+    // Return a spec even if type_name is empty, as long as we have some data
+    // The type_name might be a string like "KAPI_TYPE_INT" that gets stored literally
+    if type_name.is_empty() && return_type == 0 && check_type == 0 && success_value == 0 {
+        // No return spec at all
+        return None;
+    }
+
+    Some(ReturnSpec {
+        type_name,
+        description,
+        return_type,
+        check_type,
+        success_value: Some(success_value),
+        success_min: Some(success_min),
+        success_max: Some(success_max),
+        error_values: vec![],
+    })
+}
+
+fn parse_error(reader: &mut DataReader) -> Option<ErrorSpec> {
+    let error_code = reader.read_i32()?;
+    let name = reader.read_cstring(sizes::NAME)?;
+    let condition = reader.read_string_or_default(sizes::DESC);
+    let description = reader.read_string_or_default(sizes::DESC);
+
+    Some(ErrorSpec {
+        error_code,
+        name,
+        condition,
+        description,
+    })
+}
+
+fn parse_lock(reader: &mut DataReader) -> Option<LockSpec> {
+    let lock_name = reader.read_cstring(sizes::NAME)?;
+    let lock_type = reader.read_u32()?;
+    let scope = reader.read_u32()?;
+    let description = reader.read_string_or_default(sizes::DESC);
+
+    Some(LockSpec {
+        lock_name,
+        lock_type,
+        scope,
+        description,
+    })
+}
+
+fn parse_constraint(reader: &mut DataReader) -> Option<ConstraintSpec> {
+    let name = reader.read_cstring(sizes::NAME)?;
+    let description = reader.read_string_or_default(sizes::DESC);
+    let expression = reader.read_string_or_default(sizes::DESC);
+
+    // No function pointer in packed struct
+
+    Some(ConstraintSpec {
+        name,
+        description,
+        expression: opt_string(expression),
+    })
+}
+
+fn parse_signal(reader: &mut DataReader) -> Option<SignalSpec> {
+    let signal_num = reader.read_i32()?;
+    let signal_name = reader.read_cstring(32)?; // signal_name[32]
+    let direction = reader.read_u32()?;
+    let action = reader.read_u32()?;
+    let target = reader.read_optional_string(sizes::DESC); // target[512]
+    let condition = reader.read_optional_string(sizes::DESC); // condition[512]
+    let description = reader.read_optional_string(sizes::DESC); // description[512]
+    let restartable = reader.read_bool()?;
+    let sa_flags_required = reader.read_u32()?;
+    let sa_flags_forbidden = reader.read_u32()?;
+    let error_on_signal = reader.read_i32()?;
+    let _transform_to = reader.read_i32()?; // transform_to
+    let timing_bytes = reader.read_bytes(32)?; // timing[32]
+    let timing = if let Some(end) = timing_bytes.iter().position(|&b| b == 0) {
+        String::from_utf8_lossy(&timing_bytes[..end]).parse().unwrap_or(0)
+    } else {
+        0
+    };
+    let priority = reader.read_u8()?;
+    let interruptible = reader.read_bool()?;
+    let _queue_behavior = reader.read_bytes(128)?; // queue_behavior[128]
+    let state_required = reader.read_u32()?;
+    let state_forbidden = reader.read_u32()?;
+
+    Some(SignalSpec {
+        signal_num,
+        signal_name,
+        direction,
+        action,
+        target,
+        condition,
+        description,
+        timing,
+        priority: priority as u32,
+        restartable,
+        interruptible,
+        queue: None, // queue_behavior not exposed in SignalSpec
+        sa_flags: 0, // Not directly available
+        sa_flags_required,
+        sa_flags_forbidden,
+        state_required,
+        state_forbidden,
+        error_on_signal: Some(error_on_signal),
+    })
+}
+
+fn parse_signal_mask(reader: &mut DataReader) -> Option<SignalMaskSpec> {
+    let name = reader.read_cstring(sizes::NAME)?;
+    let description = reader.read_string_or_default(sizes::DESC);
+
+    // Skip signals array
+    for _ in 0..sizes::MAX_SIGNALS {
+        reader.read_i32();
+    }
+
+    let _signal_count = reader.read_u32()?;
+
+    Some(SignalMaskSpec {
+        name,
+        description,
+    })
+}
+
+fn parse_struct_field(reader: &mut DataReader) -> Option<StructFieldSpec> {
+    let name = reader.read_cstring(sizes::NAME)?;
+    let field_type = reader.read_u32()?;
+    let type_name = reader.read_cstring(sizes::NAME)?;
+    let offset = reader.read_usize()?;
+    let size = reader.read_usize()?;
+    let flags = reader.read_u32()?;
+    let constraint_type = reader.read_u32()?;
+    let min_value = reader.read_i64()?;
+    let max_value = reader.read_i64()?;
+    let valid_mask = reader.read_u64()?;
+    // Skip enum_values field (512 bytes)
+    let _enum_values = reader.read_cstring(sizes::DESC); // Don't fail on optional field
+    let description = reader.read_string_or_default(sizes::DESC);
+
+    Some(StructFieldSpec {
+        name,
+        field_type,
+        type_name,
+        offset,
+        size,
+        flags,
+        constraint_type,
+        min_value,
+        max_value,
+        valid_mask,
+        description,
+    })
+}
+
+fn parse_struct_spec(reader: &mut DataReader) -> Option<StructSpec> {
+    let name = reader.read_cstring(sizes::NAME)?;
+    let size = reader.read_usize()?;
+    let alignment = reader.read_usize()?;
+    let field_count = reader.read_u32()?;
+
+    // Parse fields array
+    let mut fields = Vec::new();
+    for _ in 0..field_count.min(sizes::MAX_PARAMS as u32) {
+        if let Some(field) = parse_struct_field(reader) {
+            fields.push(field);
+        } else {
+            // Skip this field if we can't parse it
+            reader.skip(struct_field_layout_size());
+        }
+    }
+
+    // Skip remaining fields if any
+    let remaining = sizes::MAX_PARAMS as u32 - field_count.min(sizes::MAX_PARAMS as u32);
+    for _ in 0..remaining {
+        reader.skip(struct_field_layout_size());
+    }
+
+    let description = reader.read_string_or_default(sizes::DESC);
+
+    Some(StructSpec {
+        name,
+        size,
+        alignment,
+        field_count,
+        fields,
+        description,
+    })
+}
+
+fn parse_side_effect(reader: &mut DataReader) -> Option<SideEffectSpec> {
+    let effect_type = reader.read_u32()?;
+    let target = reader.read_cstring(sizes::NAME)?;
+    let condition = reader.read_string_or_default(sizes::DESC);
+    let description = reader.read_string_or_default(sizes::DESC);
+    let reversible = reader.read_bool()?;
+    // No padding needed for packed struct
+
+    Some(SideEffectSpec {
+        effect_type,
+        target,
+        condition: opt_string(condition),
+        description,
+        reversible,
+    })
+}
+
+fn parse_state_transition(reader: &mut DataReader) -> Option<StateTransitionSpec> {
+    let from_state = reader.read_cstring(sizes::NAME)?;
+    let to_state = reader.read_cstring(sizes::NAME)?;
+    let condition = reader.read_string_or_default(sizes::DESC);
+    let object = reader.read_cstring(sizes::NAME)?;
+    let description = reader.read_string_or_default(sizes::DESC);
+
+    Some(StateTransitionSpec {
+        object,
+        from_state,
+        to_state,
+        condition: opt_string(condition),
+        description,
+    })
+}
+
+fn parse_capability(reader: &mut DataReader) -> Option<CapabilitySpec> {
+    let capability = reader.read_i32()?;
+    let cap_name = reader.read_cstring(sizes::NAME)?;
+    let action = reader.read_u32()?;
+    let allows = reader.read_string_or_default(sizes::DESC);
+    let without_cap = reader.read_string_or_default(sizes::DESC);
+    let check_condition = reader.read_optional_string(sizes::DESC);
+    let priority = reader.read_u32()?;
+
+    let mut alternatives = Vec::new();
+    for _ in 0..sizes::MAX_CAPABILITIES {
+        if let Some(alt) = reader.read_i32() {
+            if alt != 0 {
+                alternatives.push(alt);
+            }
+        }
+    }
+
+    let _alternative_count = reader.read_u32()?; // alternative_count
+
+    Some(CapabilitySpec {
+        capability,
+        name: cap_name,
+        action: action.to_string(),
+        allows,
+        without_cap,
+        check_condition,
+        priority: Some(priority as u8),
+        alternatives,
+    })
+}
\ No newline at end of file
diff --git a/tools/kapi/src/formatter/json.rs b/tools/kapi/src/formatter/json.rs
new file mode 100644
index 0000000000000..8025467409d64
--- /dev/null
+++ b/tools/kapi/src/formatter/json.rs
@@ -0,0 +1,468 @@
+use super::OutputFormatter;
+use crate::extractor::{
+    AddrFamilySpec, AsyncSpec, BufferSpec, CapabilitySpec, ConstraintSpec, ErrorSpec, LockSpec,
+    ParamSpec, ProtocolBehaviorSpec, ReturnSpec, SideEffectSpec, SignalMaskSpec, SignalSpec,
+    SocketStateSpec, StateTransitionSpec, StructSpec,
+};
+use serde::Serialize;
+use std::io::Write;
+
+pub struct JsonFormatter {
+    data: JsonData,
+}
+
+#[derive(Serialize)]
+struct JsonData {
+    #[serde(skip_serializing_if = "Option::is_none")]
+    apis: Option<Vec<JsonApi>>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    api_details: Option<JsonApiDetails>,
+}
+
+#[derive(Serialize)]
+struct JsonApi {
+    name: String,
+    api_type: String,
+}
+
+#[derive(Serialize)]
+struct JsonApiDetails {
+    name: String,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    description: Option<String>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    long_description: Option<String>,
+    #[serde(skip_serializing_if = "Vec::is_empty")]
+    context_flags: Vec<String>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    examples: Option<String>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    notes: Option<String>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    since_version: Option<String>,
+    // Sysfs-specific fields
+    #[serde(skip_serializing_if = "Option::is_none")]
+    subsystem: Option<String>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    sysfs_path: Option<String>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    permissions: Option<String>,
+    // Networking-specific fields
+    #[serde(skip_serializing_if = "Option::is_none")]
+    socket_state: Option<SocketStateSpec>,
+    #[serde(skip_serializing_if = "Vec::is_empty")]
+    protocol_behaviors: Vec<ProtocolBehaviorSpec>,
+    #[serde(skip_serializing_if = "Vec::is_empty")]
+    addr_families: Vec<AddrFamilySpec>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    buffer_spec: Option<BufferSpec>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    async_spec: Option<AsyncSpec>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    net_data_transfer: Option<String>,
+    #[serde(skip_serializing_if = "Vec::is_empty")]
+    capabilities: Vec<CapabilitySpec>,
+    #[serde(skip_serializing_if = "Vec::is_empty")]
+    state_transitions: Vec<StateTransitionSpec>,
+    #[serde(skip_serializing_if = "Vec::is_empty")]
+    side_effects: Vec<SideEffectSpec>,
+    #[serde(skip_serializing_if = "Vec::is_empty")]
+    parameters: Vec<ParamSpec>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    return_spec: Option<ReturnSpec>,
+    #[serde(skip_serializing_if = "Vec::is_empty")]
+    errors: Vec<ErrorSpec>,
+    #[serde(skip_serializing_if = "Vec::is_empty")]
+    locks: Vec<LockSpec>,
+    #[serde(skip_serializing_if = "Vec::is_empty")]
+    struct_specs: Vec<StructSpec>,
+    #[serde(skip_serializing_if = "Vec::is_empty")]
+    signals: Vec<SignalSpec>,
+    #[serde(skip_serializing_if = "Vec::is_empty")]
+    signal_masks: Vec<SignalMaskSpec>,
+    #[serde(skip_serializing_if = "Vec::is_empty")]
+    constraints: Vec<ConstraintSpec>,
+}
+
+impl JsonFormatter {
+    pub fn new() -> Self {
+        JsonFormatter {
+            data: JsonData {
+                apis: None,
+                api_details: None,
+            },
+        }
+    }
+}
+
+impl OutputFormatter for JsonFormatter {
+    fn begin_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn end_document(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+        let json = serde_json::to_string_pretty(&self.data)?;
+        writeln!(w, "{json}")?;
+        Ok(())
+    }
+
+    fn begin_api_list(&mut self, _w: &mut dyn Write, _title: &str) -> std::io::Result<()> {
+        self.data.apis = Some(Vec::new());
+        Ok(())
+    }
+
+    fn api_item(&mut self, _w: &mut dyn Write, name: &str, api_type: &str) -> std::io::Result<()> {
+        if let Some(apis) = &mut self.data.apis {
+            apis.push(JsonApi {
+                name: name.to_string(),
+                api_type: api_type.to_string(),
+            });
+        }
+        Ok(())
+    }
+
+    fn end_api_list(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn total_specs(&mut self, _w: &mut dyn Write, _count: usize) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_api_details(&mut self, _w: &mut dyn Write, name: &str) -> std::io::Result<()> {
+        self.data.api_details = Some(JsonApiDetails {
+            name: name.to_string(),
+            description: None,
+            long_description: None,
+            context_flags: Vec::new(),
+            examples: None,
+            notes: None,
+            since_version: None,
+            subsystem: None,
+            sysfs_path: None,
+            permissions: None,
+            socket_state: None,
+            protocol_behaviors: Vec::new(),
+            addr_families: Vec::new(),
+            buffer_spec: None,
+            async_spec: None,
+            net_data_transfer: None,
+            capabilities: Vec::new(),
+            state_transitions: Vec::new(),
+            side_effects: Vec::new(),
+            parameters: Vec::new(),
+            return_spec: None,
+            errors: Vec::new(),
+            locks: Vec::new(),
+            struct_specs: Vec::new(),
+            signals: Vec::new(),
+            signal_masks: Vec::new(),
+            constraints: Vec::new(),
+        });
+        Ok(())
+    }
+
+    fn end_api_details(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn description(&mut self, _w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+        if let Some(details) = &mut self.data.api_details {
+            details.description = Some(desc.to_string());
+        }
+        Ok(())
+    }
+
+    fn long_description(&mut self, _w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+        if let Some(details) = &mut self.data.api_details {
+            details.long_description = Some(desc.to_string());
+        }
+        Ok(())
+    }
+
+    fn begin_context_flags(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn context_flag(&mut self, _w: &mut dyn Write, flag: &str) -> std::io::Result<()> {
+        if let Some(details) = &mut self.data.api_details {
+            details.context_flags.push(flag.to_string());
+        }
+        Ok(())
+    }
+
+    fn end_context_flags(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_parameters(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn end_parameters(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_errors(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn end_errors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn examples(&mut self, _w: &mut dyn Write, examples: &str) -> std::io::Result<()> {
+        if let Some(details) = &mut self.data.api_details {
+            details.examples = Some(examples.to_string());
+        }
+        Ok(())
+    }
+
+    fn notes(&mut self, _w: &mut dyn Write, notes: &str) -> std::io::Result<()> {
+        if let Some(details) = &mut self.data.api_details {
+            details.notes = Some(notes.to_string());
+        }
+        Ok(())
+    }
+
+    fn since_version(&mut self, _w: &mut dyn Write, version: &str) -> std::io::Result<()> {
+        if let Some(details) = &mut self.data.api_details {
+            details.since_version = Some(version.to_string());
+        }
+        Ok(())
+    }
+
+    fn sysfs_subsystem(&mut self, _w: &mut dyn Write, subsystem: &str) -> std::io::Result<()> {
+        if let Some(details) = &mut self.data.api_details {
+            details.subsystem = Some(subsystem.to_string());
+        }
+        Ok(())
+    }
+
+    fn sysfs_path(&mut self, _w: &mut dyn Write, path: &str) -> std::io::Result<()> {
+        if let Some(details) = &mut self.data.api_details {
+            details.sysfs_path = Some(path.to_string());
+        }
+        Ok(())
+    }
+
+    fn sysfs_permissions(&mut self, _w: &mut dyn Write, perms: &str) -> std::io::Result<()> {
+        if let Some(details) = &mut self.data.api_details {
+            details.permissions = Some(perms.to_string());
+        }
+        Ok(())
+    }
+
+    // Networking-specific methods
+    fn socket_state(&mut self, _w: &mut dyn Write, state: &SocketStateSpec) -> std::io::Result<()> {
+        if let Some(details) = &mut self.data.api_details {
+            details.socket_state = Some(state.clone());
+        }
+        Ok(())
+    }
+
+    fn begin_protocol_behaviors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn protocol_behavior(
+        &mut self,
+        _w: &mut dyn Write,
+        behavior: &ProtocolBehaviorSpec,
+    ) -> std::io::Result<()> {
+        if let Some(details) = &mut self.data.api_details {
+            details.protocol_behaviors.push(behavior.clone());
+        }
+        Ok(())
+    }
+
+    fn end_protocol_behaviors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_addr_families(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn addr_family(&mut self, _w: &mut dyn Write, family: &AddrFamilySpec) -> std::io::Result<()> {
+        if let Some(details) = &mut self.data.api_details {
+            details.addr_families.push(family.clone());
+        }
+        Ok(())
+    }
+
+    fn end_addr_families(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn buffer_spec(&mut self, _w: &mut dyn Write, spec: &BufferSpec) -> std::io::Result<()> {
+        if let Some(details) = &mut self.data.api_details {
+            details.buffer_spec = Some(spec.clone());
+        }
+        Ok(())
+    }
+
+    fn async_spec(&mut self, _w: &mut dyn Write, spec: &AsyncSpec) -> std::io::Result<()> {
+        if let Some(details) = &mut self.data.api_details {
+            details.async_spec = Some(spec.clone());
+        }
+        Ok(())
+    }
+
+    fn net_data_transfer(&mut self, _w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+        if let Some(details) = &mut self.data.api_details {
+            details.net_data_transfer = Some(desc.to_string());
+        }
+        Ok(())
+    }
+
+    fn begin_capabilities(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn capability(&mut self, _w: &mut dyn Write, cap: &CapabilitySpec) -> std::io::Result<()> {
+        if let Some(details) = &mut self.data.api_details {
+            details.capabilities.push(cap.clone());
+        }
+        Ok(())
+    }
+
+    fn end_capabilities(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    // Stub implementations for new methods
+    fn parameter(&mut self, _w: &mut dyn Write, param: &ParamSpec) -> std::io::Result<()> {
+        if let Some(details) = &mut self.data.api_details {
+            details.parameters.push(param.clone());
+        }
+        Ok(())
+    }
+
+    fn return_spec(&mut self, _w: &mut dyn Write, ret: &ReturnSpec) -> std::io::Result<()> {
+        if let Some(details) = &mut self.data.api_details {
+            details.return_spec = Some(ret.clone());
+        }
+        Ok(())
+    }
+
+    fn error(&mut self, _w: &mut dyn Write, error: &ErrorSpec) -> std::io::Result<()> {
+        if let Some(details) = &mut self.data.api_details {
+            details.errors.push(error.clone());
+        }
+        Ok(())
+    }
+
+    fn begin_signals(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn signal(&mut self, _w: &mut dyn Write, signal: &SignalSpec) -> std::io::Result<()> {
+        if let Some(api_details) = &mut self.data.api_details {
+            api_details.signals.push(signal.clone());
+        }
+        Ok(())
+    }
+
+    fn end_signals(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_signal_masks(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn signal_mask(&mut self, _w: &mut dyn Write, mask: &SignalMaskSpec) -> std::io::Result<()> {
+        if let Some(api_details) = &mut self.data.api_details {
+            api_details.signal_masks.push(mask.clone());
+        }
+        Ok(())
+    }
+
+    fn end_signal_masks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_side_effects(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn side_effect(&mut self, _w: &mut dyn Write, effect: &SideEffectSpec) -> std::io::Result<()> {
+        if let Some(details) = &mut self.data.api_details {
+            details.side_effects.push(effect.clone());
+        }
+        Ok(())
+    }
+
+    fn end_side_effects(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_state_transitions(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn state_transition(
+        &mut self,
+        _w: &mut dyn Write,
+        trans: &StateTransitionSpec,
+    ) -> std::io::Result<()> {
+        if let Some(details) = &mut self.data.api_details {
+            details.state_transitions.push(trans.clone());
+        }
+        Ok(())
+    }
+
+    fn end_state_transitions(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_constraints(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn constraint(
+        &mut self,
+        _w: &mut dyn Write,
+        constraint: &ConstraintSpec,
+    ) -> std::io::Result<()> {
+        if let Some(api_details) = &mut self.data.api_details {
+            api_details.constraints.push(constraint.clone());
+        }
+        Ok(())
+    }
+
+    fn end_constraints(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_locks(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn lock(&mut self, _w: &mut dyn Write, lock: &LockSpec) -> std::io::Result<()> {
+        if let Some(details) = &mut self.data.api_details {
+            details.locks.push(lock.clone());
+        }
+        Ok(())
+    }
+
+    fn end_locks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_struct_specs(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn struct_spec(&mut self, _w: &mut dyn Write, spec: &StructSpec) -> std::io::Result<()> {
+        if let Some(ref mut details) = self.data.api_details {
+            details.struct_specs.push(spec.clone());
+        }
+        Ok(())
+    }
+
+    fn end_struct_specs(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+}
diff --git a/tools/kapi/src/formatter/mod.rs b/tools/kapi/src/formatter/mod.rs
new file mode 100644
index 0000000000000..3de8bf23bc29a
--- /dev/null
+++ b/tools/kapi/src/formatter/mod.rs
@@ -0,0 +1,140 @@
+use crate::extractor::{
+    AddrFamilySpec, AsyncSpec, BufferSpec, CapabilitySpec, ConstraintSpec, ErrorSpec, LockSpec,
+    ParamSpec, ProtocolBehaviorSpec, ReturnSpec, SideEffectSpec, SignalMaskSpec, SignalSpec,
+    SocketStateSpec, StateTransitionSpec, StructSpec,
+};
+use std::io::Write;
+
+mod json;
+mod plain;
+mod rst;
+
+pub use json::JsonFormatter;
+pub use plain::PlainFormatter;
+pub use rst::RstFormatter;
+
+#[derive(Debug, Clone, Copy, PartialEq)]
+pub enum OutputFormat {
+    Plain,
+    Json,
+    Rst,
+}
+
+impl std::str::FromStr for OutputFormat {
+    type Err = String;
+
+    fn from_str(s: &str) -> Result<Self, Self::Err> {
+        match s.to_lowercase().as_str() {
+            "plain" => Ok(OutputFormat::Plain),
+            "json" => Ok(OutputFormat::Json),
+            "rst" => Ok(OutputFormat::Rst),
+            _ => Err(format!("Unknown output format: {}", s)),
+        }
+    }
+}
+
+pub trait OutputFormatter {
+    fn begin_document(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+    fn end_document(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+    fn begin_api_list(&mut self, w: &mut dyn Write, title: &str) -> std::io::Result<()>;
+    fn api_item(&mut self, w: &mut dyn Write, name: &str, api_type: &str) -> std::io::Result<()>;
+    fn end_api_list(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+    fn total_specs(&mut self, w: &mut dyn Write, count: usize) -> std::io::Result<()>;
+
+    fn begin_api_details(&mut self, w: &mut dyn Write, name: &str) -> std::io::Result<()>;
+    fn end_api_details(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+    fn description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()>;
+    fn long_description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()>;
+
+    fn begin_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+    fn context_flag(&mut self, w: &mut dyn Write, flag: &str) -> std::io::Result<()>;
+    fn end_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+    fn begin_parameters(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+    fn parameter(&mut self, w: &mut dyn Write, param: &ParamSpec) -> std::io::Result<()>;
+    fn end_parameters(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+    fn return_spec(&mut self, w: &mut dyn Write, ret: &ReturnSpec) -> std::io::Result<()>;
+
+    fn begin_errors(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+    fn error(&mut self, w: &mut dyn Write, error: &ErrorSpec) -> std::io::Result<()>;
+    fn end_errors(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+    fn examples(&mut self, w: &mut dyn Write, examples: &str) -> std::io::Result<()>;
+    fn notes(&mut self, w: &mut dyn Write, notes: &str) -> std::io::Result<()>;
+    fn since_version(&mut self, w: &mut dyn Write, version: &str) -> std::io::Result<()>;
+
+    // Sysfs-specific methods
+    fn sysfs_subsystem(&mut self, w: &mut dyn Write, subsystem: &str) -> std::io::Result<()>;
+    fn sysfs_path(&mut self, w: &mut dyn Write, path: &str) -> std::io::Result<()>;
+    fn sysfs_permissions(&mut self, w: &mut dyn Write, perms: &str) -> std::io::Result<()>;
+
+    // Networking-specific methods
+    fn socket_state(&mut self, w: &mut dyn Write, state: &SocketStateSpec) -> std::io::Result<()>;
+
+    fn begin_protocol_behaviors(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+    fn protocol_behavior(
+        &mut self,
+        w: &mut dyn Write,
+        behavior: &ProtocolBehaviorSpec,
+    ) -> std::io::Result<()>;
+    fn end_protocol_behaviors(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+    fn begin_addr_families(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+    fn addr_family(&mut self, w: &mut dyn Write, family: &AddrFamilySpec) -> std::io::Result<()>;
+    fn end_addr_families(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+    fn buffer_spec(&mut self, w: &mut dyn Write, spec: &BufferSpec) -> std::io::Result<()>;
+    fn async_spec(&mut self, w: &mut dyn Write, spec: &AsyncSpec) -> std::io::Result<()>;
+    fn net_data_transfer(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()>;
+
+    fn begin_capabilities(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+    fn capability(&mut self, w: &mut dyn Write, cap: &CapabilitySpec) -> std::io::Result<()>;
+    fn end_capabilities(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+    // Signal-related methods
+    fn begin_signals(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+    fn signal(&mut self, w: &mut dyn Write, signal: &SignalSpec) -> std::io::Result<()>;
+    fn end_signals(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+    fn begin_signal_masks(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+    fn signal_mask(&mut self, w: &mut dyn Write, mask: &SignalMaskSpec) -> std::io::Result<()>;
+    fn end_signal_masks(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+    // Side effects and state transitions
+    fn begin_side_effects(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+    fn side_effect(&mut self, w: &mut dyn Write, effect: &SideEffectSpec) -> std::io::Result<()>;
+    fn end_side_effects(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+    fn begin_state_transitions(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+    fn state_transition(
+        &mut self,
+        w: &mut dyn Write,
+        trans: &StateTransitionSpec,
+    ) -> std::io::Result<()>;
+    fn end_state_transitions(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+    // Constraints and locks
+    fn begin_constraints(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+    fn constraint(&mut self, w: &mut dyn Write, constraint: &ConstraintSpec)
+    -> std::io::Result<()>;
+    fn end_constraints(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+    fn begin_locks(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+    fn lock(&mut self, w: &mut dyn Write, lock: &LockSpec) -> std::io::Result<()>;
+    fn end_locks(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+    fn begin_struct_specs(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+    fn struct_spec(&mut self, w: &mut dyn Write, spec: &StructSpec) -> std::io::Result<()>;
+    fn end_struct_specs(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+}
+
+pub fn create_formatter(format: OutputFormat) -> Box<dyn OutputFormatter> {
+    match format {
+        OutputFormat::Plain => Box::new(PlainFormatter::new()),
+        OutputFormat::Json => Box::new(JsonFormatter::new()),
+        OutputFormat::Rst => Box::new(RstFormatter::new()),
+    }
+}
diff --git a/tools/kapi/src/formatter/plain.rs b/tools/kapi/src/formatter/plain.rs
new file mode 100644
index 0000000000000..569af9fd7b09b
--- /dev/null
+++ b/tools/kapi/src/formatter/plain.rs
@@ -0,0 +1,549 @@
+use super::OutputFormatter;
+use crate::extractor::{
+    AddrFamilySpec, AsyncSpec, BufferSpec, CapabilitySpec, ConstraintSpec, ErrorSpec, LockSpec,
+    ParamSpec, ProtocolBehaviorSpec, ReturnSpec, SideEffectSpec, SignalMaskSpec, SignalSpec,
+    SocketStateSpec, StateTransitionSpec,
+};
+use std::io::Write;
+
+pub struct PlainFormatter;
+
+impl PlainFormatter {
+    pub fn new() -> Self {
+        PlainFormatter
+    }
+}
+
+impl OutputFormatter for PlainFormatter {
+    fn begin_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn end_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_api_list(&mut self, w: &mut dyn Write, title: &str) -> std::io::Result<()> {
+        writeln!(w, "\n{title}:")?;
+        writeln!(w, "{}", "-".repeat(title.len() + 1))
+    }
+
+    fn api_item(&mut self, w: &mut dyn Write, name: &str, _api_type: &str) -> std::io::Result<()> {
+        writeln!(w, "  {name}")
+    }
+
+    fn end_api_list(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn total_specs(&mut self, w: &mut dyn Write, count: usize) -> std::io::Result<()> {
+        writeln!(w, "\nTotal specifications found: {count}")
+    }
+
+    fn begin_api_details(&mut self, w: &mut dyn Write, name: &str) -> std::io::Result<()> {
+        writeln!(w, "\nDetailed information for {name}:")?;
+        writeln!(w, "{}=", "=".repeat(25 + name.len()))
+    }
+
+    fn end_api_details(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+        writeln!(w, "Description: {desc}")
+    }
+
+    fn long_description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+        writeln!(w, "\nDetailed Description:")?;
+        writeln!(w, "{desc}")
+    }
+
+    fn begin_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+        writeln!(w, "\nExecution Context:")
+    }
+
+    fn context_flag(&mut self, w: &mut dyn Write, flag: &str) -> std::io::Result<()> {
+        writeln!(w, "  - {flag}")
+    }
+
+    fn end_context_flags(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_parameters(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        writeln!(w, "\nParameters ({count}):")
+    }
+
+    fn parameter(&mut self, w: &mut dyn Write, param: &ParamSpec) -> std::io::Result<()> {
+        writeln!(
+            w,
+            "  [{}] {} ({})",
+            param.index, param.name, param.type_name
+        )?;
+        if !param.description.is_empty() {
+            writeln!(w, "      {}", param.description)?;
+        }
+
+        // Display flags
+        let mut flags = Vec::new();
+        if param.flags & 0x01 != 0 {
+            flags.push("IN");
+        }
+        if param.flags & 0x02 != 0 {
+            flags.push("OUT");
+        }
+        if param.flags & 0x04 != 0 {
+            flags.push("INOUT");
+        }
+        if param.flags & 0x08 != 0 {
+            flags.push("USER");
+        }
+        if param.flags & 0x10 != 0 {
+            flags.push("OPTIONAL");
+        }
+        if !flags.is_empty() {
+            writeln!(w, "      Flags: {}", flags.join(" | "))?;
+        }
+
+        // Display constraints
+        if let Some(constraint) = &param.constraint {
+            writeln!(w, "      Constraint: {constraint}")?;
+        }
+        if let (Some(min), Some(max)) = (param.min_value, param.max_value) {
+            writeln!(w, "      Range: {min} to {max}")?;
+        }
+        if let Some(mask) = param.valid_mask {
+            writeln!(w, "      Valid mask: 0x{mask:x}")?;
+        }
+        Ok(())
+    }
+
+    fn end_parameters(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn return_spec(&mut self, w: &mut dyn Write, ret: &ReturnSpec) -> std::io::Result<()> {
+        writeln!(w, "\nReturn Value:")?;
+        writeln!(w, "  Type: {}", ret.type_name)?;
+        writeln!(w, "  {}", ret.description)?;
+        if let Some(val) = ret.success_value {
+            writeln!(w, "  Success value: {val}")?;
+        }
+        if let (Some(min), Some(max)) = (ret.success_min, ret.success_max) {
+            writeln!(w, "  Success range: {min} to {max}")?;
+        }
+        Ok(())
+    }
+
+    fn begin_errors(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        writeln!(w, "\nPossible Errors ({count}):")
+    }
+
+    fn error(&mut self, w: &mut dyn Write, error: &ErrorSpec) -> std::io::Result<()> {
+        writeln!(w, "  {} ({})", error.name, error.error_code)?;
+        if !error.condition.is_empty() {
+            writeln!(w, "      Condition: {}", error.condition)?;
+        }
+        if !error.description.is_empty() {
+            writeln!(w, "      {}", error.description)?;
+        }
+        Ok(())
+    }
+
+    fn end_errors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn examples(&mut self, w: &mut dyn Write, examples: &str) -> std::io::Result<()> {
+        writeln!(w, "\nExamples:")?;
+        writeln!(w, "{examples}")
+    }
+
+    fn notes(&mut self, w: &mut dyn Write, notes: &str) -> std::io::Result<()> {
+        writeln!(w, "\nNotes:")?;
+        writeln!(w, "{notes}")
+    }
+
+    fn since_version(&mut self, w: &mut dyn Write, version: &str) -> std::io::Result<()> {
+        writeln!(w, "\nAvailable since: {version}")
+    }
+
+    fn sysfs_subsystem(&mut self, w: &mut dyn Write, subsystem: &str) -> std::io::Result<()> {
+        writeln!(w, "Subsystem: {subsystem}")
+    }
+
+    fn sysfs_path(&mut self, w: &mut dyn Write, path: &str) -> std::io::Result<()> {
+        writeln!(w, "Sysfs Path: {path}")
+    }
+
+    fn sysfs_permissions(&mut self, w: &mut dyn Write, perms: &str) -> std::io::Result<()> {
+        writeln!(w, "Permissions: {perms}")
+    }
+
+    // Networking-specific methods
+    fn socket_state(&mut self, w: &mut dyn Write, state: &SocketStateSpec) -> std::io::Result<()> {
+        writeln!(w, "\nSocket State Requirements:")?;
+        if !state.required_states.is_empty() {
+            writeln!(w, "  Required states: {:?}", state.required_states)?;
+        }
+        if !state.forbidden_states.is_empty() {
+            writeln!(w, "  Forbidden states: {:?}", state.forbidden_states)?;
+        }
+        if let Some(result) = &state.resulting_state {
+            writeln!(w, "  Resulting state: {result}")?;
+        }
+        if let Some(cond) = &state.condition {
+            writeln!(w, "  Condition: {cond}")?;
+        }
+        if let Some(protos) = &state.applicable_protocols {
+            writeln!(w, "  Applicable protocols: {protos}")?;
+        }
+        Ok(())
+    }
+
+    fn begin_protocol_behaviors(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+        writeln!(w, "\nProtocol-Specific Behaviors:")
+    }
+
+    fn protocol_behavior(
+        &mut self,
+        w: &mut dyn Write,
+        behavior: &ProtocolBehaviorSpec,
+    ) -> std::io::Result<()> {
+        writeln!(
+            w,
+            "  {} - {}",
+            behavior.applicable_protocols, behavior.behavior
+        )?;
+        if let Some(flags) = &behavior.protocol_flags {
+            writeln!(w, "    Flags: {flags}")?;
+        }
+        Ok(())
+    }
+
+    fn end_protocol_behaviors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_addr_families(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+        writeln!(w, "\nSupported Address Families:")
+    }
+
+    fn addr_family(&mut self, w: &mut dyn Write, family: &AddrFamilySpec) -> std::io::Result<()> {
+        writeln!(w, "  {} ({}):", family.family_name, family.family)?;
+        writeln!(w, "    Struct size: {} bytes", family.addr_struct_size)?;
+        writeln!(
+            w,
+            "    Address length: {}-{} bytes",
+            family.min_addr_len, family.max_addr_len
+        )?;
+        if let Some(format) = &family.addr_format {
+            writeln!(w, "    Format: {format}")?;
+        }
+        writeln!(
+            w,
+            "    Features: wildcard={}, multicast={}, broadcast={}",
+            family.supports_wildcard, family.supports_multicast, family.supports_broadcast
+        )?;
+        if let Some(special) = &family.special_addresses {
+            writeln!(w, "    Special addresses: {special}")?;
+        }
+        if family.port_range_max > 0 {
+            writeln!(
+                w,
+                "    Port range: {}-{}",
+                family.port_range_min, family.port_range_max
+            )?;
+        }
+        Ok(())
+    }
+
+    fn end_addr_families(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn buffer_spec(&mut self, w: &mut dyn Write, spec: &BufferSpec) -> std::io::Result<()> {
+        writeln!(w, "\nBuffer Specification:")?;
+        if let Some(behaviors) = &spec.buffer_behaviors {
+            writeln!(w, "  Behaviors: {behaviors}")?;
+        }
+        if let Some(min) = spec.min_buffer_size {
+            writeln!(w, "  Min size: {min} bytes")?;
+        }
+        if let Some(max) = spec.max_buffer_size {
+            writeln!(w, "  Max size: {max} bytes")?;
+        }
+        if let Some(optimal) = spec.optimal_buffer_size {
+            writeln!(w, "  Optimal size: {optimal} bytes")?;
+        }
+        Ok(())
+    }
+
+    fn async_spec(&mut self, w: &mut dyn Write, spec: &AsyncSpec) -> std::io::Result<()> {
+        writeln!(w, "\nAsynchronous Operation:")?;
+        if let Some(modes) = &spec.supported_modes {
+            writeln!(w, "  Supported modes: {modes}")?;
+        }
+        if let Some(errno) = spec.nonblock_errno {
+            writeln!(w, "  Non-blocking errno: {errno}")?;
+        }
+        Ok(())
+    }
+
+    fn net_data_transfer(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+        writeln!(w, "\nNetwork Data Transfer: {desc}")
+    }
+
+    fn begin_capabilities(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+        writeln!(w, "\nRequired Capabilities:")
+    }
+
+    fn capability(&mut self, w: &mut dyn Write, cap: &CapabilitySpec) -> std::io::Result<()> {
+        writeln!(w, "  {} ({}) - {}", cap.name, cap.capability, cap.action)?;
+        if !cap.allows.is_empty() {
+            writeln!(w, "    Allows: {}", cap.allows)?;
+        }
+        if !cap.without_cap.is_empty() {
+            writeln!(w, "    Without capability: {}", cap.without_cap)?;
+        }
+        if let Some(cond) = &cap.check_condition {
+            writeln!(w, "    Condition: {cond}")?;
+        }
+        Ok(())
+    }
+
+    fn end_capabilities(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    // Signal-related methods
+    fn begin_signals(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        writeln!(w, "\nSignal Specifications ({count}):")
+    }
+
+    fn signal(&mut self, w: &mut dyn Write, signal: &SignalSpec) -> std::io::Result<()> {
+        write!(w, "  {} ({})", signal.signal_name, signal.signal_num)?;
+
+        // Display direction
+        let direction = match signal.direction {
+            0 => "SEND",
+            1 => "RECEIVE",
+            2 => "HANDLE",
+            3 => "IGNORE",
+            _ => "UNKNOWN",
+        };
+        write!(w, " - {direction}")?;
+
+        // Display action
+        let action = match signal.action {
+            0 => "DEFAULT",
+            1 => "TERMINATE",
+            2 => "COREDUMP",
+            3 => "STOP",
+            4 => "CONTINUE",
+            5 => "IGNORE",
+            6 => "CUSTOM",
+            7 => "DISCARD",
+            _ => "UNKNOWN",
+        };
+        writeln!(w, " - {action}")?;
+
+        if let Some(target) = &signal.target {
+            writeln!(w, "      Target: {target}")?;
+        }
+        if let Some(condition) = &signal.condition {
+            writeln!(w, "      Condition: {condition}")?;
+        }
+        if let Some(desc) = &signal.description {
+            writeln!(w, "      {desc}")?;
+        }
+
+        // Display timing
+        let timing = match signal.timing {
+            0 => "BEFORE",
+            1 => "DURING",
+            2 => "AFTER",
+            3 => "EXIT",
+            _ => "UNKNOWN",
+        };
+        writeln!(w, "      Timing: {timing}")?;
+        writeln!(w, "      Priority: {}", signal.priority)?;
+
+        if signal.restartable {
+            writeln!(w, "      Restartable: yes")?;
+        }
+        if signal.interruptible {
+            writeln!(w, "      Interruptible: yes")?;
+        }
+        if let Some(error) = signal.error_on_signal {
+            writeln!(w, "      Error on signal: {error}")?;
+        }
+        Ok(())
+    }
+
+    fn end_signals(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_signal_masks(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        writeln!(w, "\nSignal Masks ({count}):")
+    }
+
+    fn signal_mask(&mut self, w: &mut dyn Write, mask: &SignalMaskSpec) -> std::io::Result<()> {
+        writeln!(w, "  {}", mask.name)?;
+        if !mask.description.is_empty() {
+            writeln!(w, "      {}", mask.description)?;
+        }
+        Ok(())
+    }
+
+    fn end_signal_masks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    // Side effects and state transitions
+    fn begin_side_effects(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        writeln!(w, "\nSide Effects ({count}):")
+    }
+
+    fn side_effect(&mut self, w: &mut dyn Write, effect: &SideEffectSpec) -> std::io::Result<()> {
+        writeln!(w, "  {} - {}", effect.target, effect.description)?;
+        if let Some(condition) = &effect.condition {
+            writeln!(w, "      Condition: {condition}")?;
+        }
+        if effect.reversible {
+            writeln!(w, "      Reversible: yes")?;
+        }
+        Ok(())
+    }
+
+    fn end_side_effects(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_state_transitions(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        writeln!(w, "\nState Transitions ({count}):")
+    }
+
+    fn state_transition(
+        &mut self,
+        w: &mut dyn Write,
+        trans: &StateTransitionSpec,
+    ) -> std::io::Result<()> {
+        writeln!(
+            w,
+            "  {} : {} -> {}",
+            trans.object, trans.from_state, trans.to_state
+        )?;
+        if let Some(condition) = &trans.condition {
+            writeln!(w, "      Condition: {condition}")?;
+        }
+        if !trans.description.is_empty() {
+            writeln!(w, "      {}", trans.description)?;
+        }
+        Ok(())
+    }
+
+    fn end_state_transitions(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    // Constraints and locks
+    fn begin_constraints(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        writeln!(w, "\nAdditional Constraints ({count}):")
+    }
+
+    fn constraint(
+        &mut self,
+        w: &mut dyn Write,
+        constraint: &ConstraintSpec,
+    ) -> std::io::Result<()> {
+        writeln!(w, "  {}", constraint.name)?;
+        if !constraint.description.is_empty() {
+            writeln!(w, "      {}", constraint.description)?;
+        }
+        if let Some(expr) = &constraint.expression {
+            writeln!(w, "      Expression: {expr}")?;
+        }
+        Ok(())
+    }
+
+    fn end_constraints(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_locks(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        writeln!(w, "\nLocking Requirements ({count}):")
+    }
+
+    fn lock(&mut self, w: &mut dyn Write, lock: &LockSpec) -> std::io::Result<()> {
+        write!(w, "  {}", lock.lock_name)?;
+
+        // Display lock type
+        let lock_type = match lock.lock_type {
+            0 => "NONE",
+            1 => "MUTEX",
+            2 => "SPINLOCK",
+            3 => "RWLOCK",
+            4 => "SEQLOCK",
+            5 => "RCU",
+            6 => "SEMAPHORE",
+            7 => "CUSTOM",
+            _ => "UNKNOWN",
+        };
+        writeln!(w, " ({lock_type})")?;
+
+        let scope_str = match lock.scope {
+            0 => "acquired and released",
+            1 => "acquired (not released)",
+            2 => "released (held on entry)",
+            3 => "held by caller",
+            _ => "unknown",
+        };
+        writeln!(w, "      Scope: {scope_str}")?;
+
+        if !lock.description.is_empty() {
+            writeln!(w, "      {}", lock.description)?;
+        }
+        Ok(())
+    }
+
+    fn end_locks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_struct_specs(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        writeln!(w, "\nStructure Specifications ({count}):")
+    }
+
+    fn struct_spec(&mut self, w: &mut dyn Write, spec: &crate::extractor::StructSpec) -> std::io::Result<()> {
+        writeln!(w, "  {} (size={}, align={}):", spec.name, spec.size, spec.alignment)?;
+        if !spec.description.is_empty() {
+            writeln!(w, "      {}", spec.description)?;
+        }
+
+        if !spec.fields.is_empty() {
+            writeln!(w, "      Fields ({}):", spec.field_count)?;
+            for field in &spec.fields {
+                write!(w, "        - {} ({}):", field.name, field.type_name)?;
+                if !field.description.is_empty() {
+                    write!(w, " {}", field.description)?;
+                }
+                writeln!(w)?;
+
+                // Show constraints if present
+                if field.min_value != 0 || field.max_value != 0 {
+                    writeln!(w, "          Range: [{}, {}]", field.min_value, field.max_value)?;
+                }
+                if field.valid_mask != 0 {
+                    writeln!(w, "          Mask: {:#x}", field.valid_mask)?;
+                }
+            }
+        }
+        Ok(())
+    }
+
+    fn end_struct_specs(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+}
diff --git a/tools/kapi/src/formatter/rst.rs b/tools/kapi/src/formatter/rst.rs
new file mode 100644
index 0000000000000..51d0be911480b
--- /dev/null
+++ b/tools/kapi/src/formatter/rst.rs
@@ -0,0 +1,621 @@
+use super::OutputFormatter;
+use crate::extractor::{
+    AddrFamilySpec, AsyncSpec, BufferSpec, CapabilitySpec, ConstraintSpec, ErrorSpec, LockSpec,
+    ParamSpec, ProtocolBehaviorSpec, ReturnSpec, SideEffectSpec, SignalMaskSpec, SignalSpec,
+    SocketStateSpec, StateTransitionSpec,
+};
+use std::io::Write;
+
+pub struct RstFormatter {
+    current_section_level: usize,
+}
+
+impl RstFormatter {
+    pub fn new() -> Self {
+        RstFormatter {
+            current_section_level: 0,
+        }
+    }
+
+    fn section_char(level: usize) -> char {
+        match level {
+            0 => '=',
+            1 => '-',
+            2 => '~',
+            3 => '^',
+            _ => '"',
+        }
+    }
+}
+
+impl OutputFormatter for RstFormatter {
+    fn begin_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn end_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_api_list(&mut self, w: &mut dyn Write, title: &str) -> std::io::Result<()> {
+        writeln!(w, "\n{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(0).to_string().repeat(title.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn api_item(&mut self, w: &mut dyn Write, name: &str, api_type: &str) -> std::io::Result<()> {
+        writeln!(w, "* **{name}** (*{api_type}*)")
+    }
+
+    fn end_api_list(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn total_specs(&mut self, w: &mut dyn Write, count: usize) -> std::io::Result<()> {
+        writeln!(w, "\n**Total specifications found:** {count}")
+    }
+
+    fn begin_api_details(&mut self, w: &mut dyn Write, name: &str) -> std::io::Result<()> {
+        self.current_section_level = 0;
+        writeln!(w, "\n{name}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(0).to_string().repeat(name.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn end_api_details(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+        writeln!(w, "**{desc}**")?;
+        writeln!(w)
+    }
+
+    fn long_description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+        writeln!(w, "{desc}")?;
+        writeln!(w)
+    }
+
+    fn begin_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = "Execution Context";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn context_flag(&mut self, w: &mut dyn Write, flag: &str) -> std::io::Result<()> {
+        writeln!(w, "* {flag}")
+    }
+
+    fn end_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+        writeln!(w)
+    }
+
+    fn begin_parameters(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = format!("Parameters ({count})");
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn end_parameters(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_errors(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = format!("Possible Errors ({count})");
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn end_errors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn examples(&mut self, w: &mut dyn Write, examples: &str) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = "Examples";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)?;
+        writeln!(w, ".. code-block:: c")?;
+        writeln!(w)?;
+        for line in examples.lines() {
+            writeln!(w, "   {line}")?;
+        }
+        writeln!(w)
+    }
+
+    fn notes(&mut self, w: &mut dyn Write, notes: &str) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = "Notes";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)?;
+        writeln!(w, "{notes}")?;
+        writeln!(w)
+    }
+
+    fn since_version(&mut self, w: &mut dyn Write, version: &str) -> std::io::Result<()> {
+        writeln!(w, ":Available since: {version}")?;
+        writeln!(w)
+    }
+
+    fn sysfs_subsystem(&mut self, w: &mut dyn Write, subsystem: &str) -> std::io::Result<()> {
+        writeln!(w, ":Subsystem: {subsystem}")?;
+        writeln!(w)
+    }
+
+    fn sysfs_path(&mut self, w: &mut dyn Write, path: &str) -> std::io::Result<()> {
+        writeln!(w, ":Sysfs Path: {path}")?;
+        writeln!(w)
+    }
+
+    fn sysfs_permissions(&mut self, w: &mut dyn Write, perms: &str) -> std::io::Result<()> {
+        writeln!(w, ":Permissions: {perms}")?;
+        writeln!(w)
+    }
+
+    // Networking-specific methods
+    fn socket_state(&mut self, w: &mut dyn Write, state: &SocketStateSpec) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = "Socket State Requirements";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)?;
+
+        if !state.required_states.is_empty() {
+            writeln!(
+                w,
+                "**Required states:** {}",
+                state.required_states.join(", ")
+            )?;
+        }
+        if !state.forbidden_states.is_empty() {
+            writeln!(
+                w,
+                "**Forbidden states:** {}",
+                state.forbidden_states.join(", ")
+            )?;
+        }
+        if let Some(result) = &state.resulting_state {
+            writeln!(w, "**Resulting state:** {result}")?;
+        }
+        if let Some(cond) = &state.condition {
+            writeln!(w, "**Condition:** {cond}")?;
+        }
+        if let Some(protos) = &state.applicable_protocols {
+            writeln!(w, "**Applicable protocols:** {protos}")?;
+        }
+        writeln!(w)
+    }
+
+    fn begin_protocol_behaviors(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = "Protocol-Specific Behaviors";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn protocol_behavior(
+        &mut self,
+        w: &mut dyn Write,
+        behavior: &ProtocolBehaviorSpec,
+    ) -> std::io::Result<()> {
+        writeln!(w, "**{}**", behavior.applicable_protocols)?;
+        writeln!(w)?;
+        writeln!(w, "{}", behavior.behavior)?;
+        if let Some(flags) = &behavior.protocol_flags {
+            writeln!(w)?;
+            writeln!(w, "*Flags:* {flags}")?;
+        }
+        writeln!(w)
+    }
+
+    fn end_protocol_behaviors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_addr_families(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = "Supported Address Families";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn addr_family(&mut self, w: &mut dyn Write, family: &AddrFamilySpec) -> std::io::Result<()> {
+        writeln!(w, "**{} ({})**", family.family_name, family.family)?;
+        writeln!(w)?;
+        writeln!(w, "* **Struct size:** {} bytes", family.addr_struct_size)?;
+        writeln!(
+            w,
+            "* **Address length:** {}-{} bytes",
+            family.min_addr_len, family.max_addr_len
+        )?;
+        if let Some(format) = &family.addr_format {
+            writeln!(w, "* **Format:** ``{format}``")?;
+        }
+        writeln!(
+            w,
+            "* **Features:** wildcard={}, multicast={}, broadcast={}",
+            family.supports_wildcard, family.supports_multicast, family.supports_broadcast
+        )?;
+        if let Some(special) = &family.special_addresses {
+            writeln!(w, "* **Special addresses:** {special}")?;
+        }
+        if family.port_range_max > 0 {
+            writeln!(
+                w,
+                "* **Port range:** {}-{}",
+                family.port_range_min, family.port_range_max
+            )?;
+        }
+        writeln!(w)
+    }
+
+    fn end_addr_families(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn buffer_spec(&mut self, w: &mut dyn Write, spec: &BufferSpec) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = "Buffer Specification";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)?;
+
+        if let Some(behaviors) = &spec.buffer_behaviors {
+            writeln!(w, "**Behaviors:** {behaviors}")?;
+        }
+        if let Some(min) = spec.min_buffer_size {
+            writeln!(w, "**Min size:** {min} bytes")?;
+        }
+        if let Some(max) = spec.max_buffer_size {
+            writeln!(w, "**Max size:** {max} bytes")?;
+        }
+        if let Some(optimal) = spec.optimal_buffer_size {
+            writeln!(w, "**Optimal size:** {optimal} bytes")?;
+        }
+        writeln!(w)
+    }
+
+    fn async_spec(&mut self, w: &mut dyn Write, spec: &AsyncSpec) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = "Asynchronous Operation";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)?;
+
+        if let Some(modes) = &spec.supported_modes {
+            writeln!(w, "**Supported modes:** {modes}")?;
+        }
+        if let Some(errno) = spec.nonblock_errno {
+            writeln!(w, "**Non-blocking errno:** {errno}")?;
+        }
+        writeln!(w)
+    }
+
+    fn net_data_transfer(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+        writeln!(w, "**Network Data Transfer:** {desc}")?;
+        writeln!(w)
+    }
+
+    fn begin_capabilities(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = "Required Capabilities";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn capability(&mut self, w: &mut dyn Write, cap: &CapabilitySpec) -> std::io::Result<()> {
+        writeln!(w, "**{} ({})** - {}", cap.name, cap.capability, cap.action)?;
+        writeln!(w)?;
+        if !cap.allows.is_empty() {
+            writeln!(w, "* **Allows:** {}", cap.allows)?;
+        }
+        if !cap.without_cap.is_empty() {
+            writeln!(w, "* **Without capability:** {}", cap.without_cap)?;
+        }
+        if let Some(cond) = &cap.check_condition {
+            writeln!(w, "* **Condition:** {}", cond)?;
+        }
+        writeln!(w)
+    }
+
+    fn end_capabilities(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    // Stub implementations for new methods
+    fn parameter(&mut self, w: &mut dyn Write, param: &ParamSpec) -> std::io::Result<()> {
+        writeln!(
+            w,
+            "**[{}] {}** (*{}*)",
+            param.index, param.name, param.type_name
+        )?;
+        writeln!(w)?;
+        writeln!(w, "  {}", param.description)?;
+
+        // Display flags
+        let mut flags = Vec::new();
+        if param.flags & 0x01 != 0 {
+            flags.push("IN");
+        }
+        if param.flags & 0x02 != 0 {
+            flags.push("OUT");
+        }
+        if param.flags & 0x04 != 0 {
+            flags.push("USER");
+        }
+        if param.flags & 0x08 != 0 {
+            flags.push("OPTIONAL");
+        }
+        if !flags.is_empty() {
+            writeln!(w, "  :Flags: {}", flags.join(", "))?;
+        }
+
+        if let Some(constraint) = &param.constraint {
+            writeln!(w, "  :Constraint: {}", constraint)?;
+        }
+
+        if let (Some(min), Some(max)) = (param.min_value, param.max_value) {
+            writeln!(w, "  :Range: {} to {}", min, max)?;
+        }
+
+        writeln!(w)
+    }
+
+    fn return_spec(&mut self, w: &mut dyn Write, ret: &ReturnSpec) -> std::io::Result<()> {
+        writeln!(w, "\nReturn Value")?;
+        writeln!(w, "{}\n", Self::section_char(1).to_string().repeat(12))?;
+        writeln!(w)?;
+        writeln!(w, ":Type: {}", ret.type_name)?;
+        writeln!(w, ":Description: {}", ret.description)?;
+        if let Some(success) = ret.success_value {
+            writeln!(w, ":Success value: {}", success)?;
+        }
+        writeln!(w)
+    }
+
+    fn error(&mut self, w: &mut dyn Write, error: &ErrorSpec) -> std::io::Result<()> {
+        writeln!(w, "**{}** ({})", error.name, error.error_code)?;
+        writeln!(w)?;
+        writeln!(w, "  :Condition: {}", error.condition)?;
+        if !error.description.is_empty() {
+            writeln!(w, "  :Description: {}", error.description)?;
+        }
+        writeln!(w)
+    }
+
+    fn begin_signals(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn signal(&mut self, _w: &mut dyn Write, _signal: &SignalSpec) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn end_signals(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_signal_masks(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn signal_mask(&mut self, _w: &mut dyn Write, _mask: &SignalMaskSpec) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn end_signal_masks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_side_effects(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = format!("Side Effects ({count})");
+        writeln!(w, "{}\n", title)?;
+        writeln!(
+            w,
+            "{}\n",
+            Self::section_char(1).to_string().repeat(title.len())
+        )
+    }
+
+    fn side_effect(&mut self, w: &mut dyn Write, effect: &SideEffectSpec) -> std::io::Result<()> {
+        write!(w, "* **{}**", effect.target)?;
+        if effect.reversible {
+            write!(w, " *(reversible)*")?;
+        }
+        writeln!(w)?;
+        writeln!(w, "  {}", effect.description)?;
+        if let Some(cond) = &effect.condition {
+            writeln!(w, "  :Condition: {}", cond)?;
+        }
+        writeln!(w)
+    }
+
+    fn end_side_effects(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_state_transitions(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = format!("State Transitions ({count})");
+        writeln!(w, "{}\n", title)?;
+        writeln!(
+            w,
+            "{}\n",
+            Self::section_char(1).to_string().repeat(title.len())
+        )
+    }
+
+    fn state_transition(
+        &mut self,
+        w: &mut dyn Write,
+        trans: &StateTransitionSpec,
+    ) -> std::io::Result<()> {
+        writeln!(
+            w,
+            "* **{}**: {} → {}",
+            trans.object, trans.from_state, trans.to_state
+        )?;
+        writeln!(w, "  {}", trans.description)?;
+        if let Some(cond) = &trans.condition {
+            writeln!(w, "  :Condition: {}", cond)?;
+        }
+        writeln!(w)
+    }
+
+    fn end_state_transitions(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_constraints(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn constraint(
+        &mut self,
+        _w: &mut dyn Write,
+        _constraint: &ConstraintSpec,
+    ) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn end_constraints(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_locks(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = format!("Locks ({count})");
+        writeln!(w, "{}\n", title)?;
+        writeln!(
+            w,
+            "{}\n",
+            Self::section_char(1).to_string().repeat(title.len())
+        )
+    }
+
+    fn lock(&mut self, w: &mut dyn Write, lock: &LockSpec) -> std::io::Result<()> {
+        write!(w, "* **{}**", lock.lock_name)?;
+        let lock_type_str = match lock.lock_type {
+            1 => " *(mutex)*",
+            2 => " *(spinlock)*",
+            3 => " *(rwlock)*",
+            4 => " *(semaphore)*",
+            5 => " *(RCU)*",
+            _ => "",
+        };
+        writeln!(w, "{}", lock_type_str)?;
+        if !lock.description.is_empty() {
+            writeln!(w, "  {}", lock.description)?;
+        }
+        writeln!(w)
+    }
+
+    fn end_locks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_struct_specs(&mut self, w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+        writeln!(w)?;
+        writeln!(w, "Structure Specifications")?;
+        writeln!(w, "~~~~~~~~~~~~~~~~~~~~~~~")?;
+        writeln!(w)
+    }
+
+    fn struct_spec(&mut self, w: &mut dyn Write, spec: &crate::extractor::StructSpec) -> std::io::Result<()> {
+        writeln!(w, "**{}**", spec.name)?;
+        writeln!(w)?;
+
+        if !spec.description.is_empty() {
+            writeln!(w, "  {}", spec.description)?;
+            writeln!(w)?;
+        }
+
+        writeln!(w, "  :Size: {} bytes", spec.size)?;
+        writeln!(w, "  :Alignment: {} bytes", spec.alignment)?;
+        writeln!(w, "  :Fields: {}", spec.field_count)?;
+        writeln!(w)?;
+
+        if !spec.fields.is_empty() {
+            for field in &spec.fields {
+                writeln!(w, "  * **{}** ({})", field.name, field.type_name)?;
+                if !field.description.is_empty() {
+                    writeln!(w, "    {}", field.description)?;
+                }
+                if field.min_value != 0 || field.max_value != 0 {
+                    writeln!(w, "    Range: [{}, {}]", field.min_value, field.max_value)?;
+                }
+            }
+            writeln!(w)?;
+        }
+
+        Ok(())
+    }
+
+    fn end_struct_specs(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+}
diff --git a/tools/kapi/src/main.rs b/tools/kapi/src/main.rs
new file mode 100644
index 0000000000000..2d219046f3287
--- /dev/null
+++ b/tools/kapi/src/main.rs
@@ -0,0 +1,116 @@
+//! kapi - Kernel API Specification Tool
+//!
+//! This tool extracts and displays kernel API specifications from multiple sources:
+//! - Kernel source code (KAPI macros)
+//! - Compiled vmlinux binaries (`.kapi_specs` ELF section)
+//! - Running kernel via debugfs
+
+use anyhow::Result;
+use clap::Parser;
+use std::io::{self, Write};
+
+mod extractor;
+mod formatter;
+
+use extractor::{ApiExtractor, DebugfsExtractor, SourceExtractor, VmlinuxExtractor};
+use formatter::{OutputFormat, create_formatter};
+
+#[derive(Parser, Debug)]
+#[command(author, version, about, long_about = None)]
+struct Args {
+    /// Path to the vmlinux file
+    #[arg(long, value_name = "PATH", group = "input")]
+    vmlinux: Option<String>,
+
+    /// Path to kernel source directory or file
+    #[arg(long, value_name = "PATH", group = "input")]
+    source: Option<String>,
+
+    /// Path to debugfs (defaults to /sys/kernel/debug if not specified)
+    #[arg(long, value_name = "PATH", group = "input")]
+    debugfs: Option<String>,
+
+    /// Optional: Name of specific API to show details for
+    api_name: Option<String>,
+
+    /// Output format
+    #[arg(long, short = 'f', default_value = "plain")]
+    format: String,
+}
+
+fn main() -> Result<()> {
+    let args = Args::parse();
+
+    let output_format: OutputFormat = args
+        .format
+        .parse()
+        .map_err(|e: String| anyhow::anyhow!(e))?;
+
+    let extractor: Box<dyn ApiExtractor> = match (args.vmlinux, args.source, args.debugfs.clone()) {
+        (Some(vmlinux_path), None, None) => Box::new(VmlinuxExtractor::new(&vmlinux_path)?),
+        (None, Some(source_path), None) => Box::new(SourceExtractor::new(&source_path)?),
+        (None, None, Some(_) | None) => {
+            // If debugfs is specified or no input is provided, use debugfs
+            Box::new(DebugfsExtractor::new(args.debugfs)?)
+        }
+        _ => {
+            anyhow::bail!("Please specify only one of --vmlinux, --source, or --debugfs")
+        }
+    };
+
+    display_apis(extractor.as_ref(), args.api_name, output_format)
+}
+
+fn display_apis(
+    extractor: &dyn ApiExtractor,
+    api_name: Option<String>,
+    output_format: OutputFormat,
+) -> Result<()> {
+    let mut formatter = create_formatter(output_format);
+    let mut stdout = io::stdout();
+
+    formatter.begin_document(&mut stdout)?;
+
+    if let Some(api_name_req) = api_name {
+        // Use the extractor to display API details
+        if let Some(_spec) = extractor.extract_by_name(&api_name_req)? {
+            extractor.display_api_details(&api_name_req, &mut *formatter, &mut stdout)?;
+        } else if output_format == OutputFormat::Plain {
+            writeln!(stdout, "\nAPI '{}' not found.", api_name_req)?;
+            writeln!(stdout, "\nAvailable APIs:")?;
+            for spec in extractor.extract_all()? {
+                writeln!(stdout, "  {} ({})", spec.name, spec.api_type)?;
+            }
+        }
+    } else {
+        // Display list of APIs using the extractor
+        let all_specs = extractor.extract_all()?;
+
+        // Helper to display API list for a specific type
+        let mut display_api_type = |api_type: &str, title: &str| -> Result<()> {
+            let filtered: Vec<_> = all_specs.iter()
+                .filter(|s| s.api_type == api_type)
+                .collect();
+
+            if !filtered.is_empty() {
+                formatter.begin_api_list(&mut stdout, title)?;
+                for spec in filtered {
+                    formatter.api_item(&mut stdout, &spec.name, &spec.api_type)?;
+                }
+                formatter.end_api_list(&mut stdout)?;
+            }
+            Ok(())
+        };
+
+        display_api_type("syscall", "System Calls")?;
+        display_api_type("ioctl", "IOCTLs")?;
+        display_api_type("function", "Functions")?;
+        display_api_type("sysfs", "Sysfs Attributes")?;
+
+        formatter.total_specs(&mut stdout, all_specs.len())?;
+    }
+
+    formatter.end_document(&mut stdout)?;
+
+    Ok(())
+}
-- 
2.51.0


^ permalink raw reply related

* [RFC PATCH v5 03/15] kernel/api: add debugfs interface for kernel API specifications
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
  To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>

Add a debugfs interface to expose kernel API specifications at runtime.
This allows tools and users to query the complete API specifications
through the debugfs filesystem.

The interface provides:
- /sys/kernel/debug/kapi/list - lists all available API specifications
- /sys/kernel/debug/kapi/specs/<name> - detailed info for each API

Each specification file includes:
- Function name, version, and descriptions
- Execution context requirements and flags
- Parameter details with types, flags, and constraints
- Return value specifications and success conditions
- Error codes with descriptions and conditions
- Locking requirements and constraints
- Signal handling specifications
- Examples, notes, and deprecation status

This enables runtime introspection of kernel APIs for documentation
tools, static analyzers, and debugging purposes.

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/api/Kconfig        |  20 +++
 kernel/api/Makefile       |   3 +
 kernel/api/kapi_debugfs.c | 358 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 381 insertions(+)
 create mode 100644 kernel/api/kapi_debugfs.c

diff --git a/kernel/api/Kconfig b/kernel/api/Kconfig
index fde25ec70e134..d2754b21acc43 100644
--- a/kernel/api/Kconfig
+++ b/kernel/api/Kconfig
@@ -33,3 +33,23 @@ config KAPI_RUNTIME_CHECKS
 	  development. The checks use WARN_ONCE to report violations.
 
 	  If unsure, say N.
+
+config KAPI_SPEC_DEBUGFS
+	bool "Export kernel API specifications via debugfs"
+	depends on KAPI_SPEC
+	depends on DEBUG_FS
+	help
+	  This option enables exporting kernel API specifications through
+	  the debugfs filesystem. When enabled, specifications can be
+	  accessed at /sys/kernel/debug/kapi/.
+
+	  The debugfs interface provides:
+	  - A list of all available API specifications
+	  - Detailed information for each API including parameters,
+	    return values, errors, locking requirements, and constraints
+	  - Complete machine-readable representation of the specs
+
+	  This is useful for documentation tools, static analyzers, and
+	  runtime introspection of kernel APIs.
+
+	  If unsure, say N.
diff --git a/kernel/api/Makefile b/kernel/api/Makefile
index acab17c78afa3..716f128eea71d 100644
--- a/kernel/api/Makefile
+++ b/kernel/api/Makefile
@@ -10,6 +10,9 @@ obj-$(CONFIG_KAPI_SPEC)		+= kernel_api_spec.o
 ifeq ($(CONFIG_KAPI_SPEC),y)
 obj-$(CONFIG_KAPI_SPEC)		+= generated_api_specs.o
 
+# Debugfs interface for kernel API specs
+obj-$(CONFIG_KAPI_SPEC_DEBUGFS) += kapi_debugfs.o
+
 # Find all potential apispec files (this is evaluated at make time)
 apispec-files := $(shell find $(objtree) -name "*.apispec.h" -type f 2>/dev/null)
 
diff --git a/kernel/api/kapi_debugfs.c b/kernel/api/kapi_debugfs.c
new file mode 100644
index 0000000000000..84d5446d93916
--- /dev/null
+++ b/kernel/api/kapi_debugfs.c
@@ -0,0 +1,358 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Kernel API specification debugfs interface
+ *
+ * This provides a debugfs interface to expose kernel API specifications
+ * at runtime, allowing tools and users to query the complete API specs.
+ */
+
+#include <linux/debugfs.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/seq_file.h>
+#include <linux/kernel_api_spec.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+/* External symbols for kernel API spec section */
+extern struct kernel_api_spec __start_kapi_specs[];
+extern struct kernel_api_spec __stop_kapi_specs[];
+
+static struct dentry *kapi_debugfs_root;
+
+/* Helper function to print parameter type as string */
+static const char *param_type_str(enum kapi_param_type type)
+{
+	switch (type) {
+	case KAPI_TYPE_INT: return "int";
+	case KAPI_TYPE_UINT: return "uint";
+	case KAPI_TYPE_PTR: return "ptr";
+	case KAPI_TYPE_STRUCT: return "struct";
+	case KAPI_TYPE_UNION: return "union";
+	case KAPI_TYPE_ARRAY: return "array";
+	case KAPI_TYPE_FD: return "fd";
+	case KAPI_TYPE_ENUM: return "enum";
+	case KAPI_TYPE_USER_PTR: return "user_ptr";
+	case KAPI_TYPE_PATH: return "path";
+	case KAPI_TYPE_FUNC_PTR: return "func_ptr";
+	case KAPI_TYPE_CUSTOM: return "custom";
+	default: return "unknown";
+	}
+}
+
+/* Helper to print parameter flags */
+static void print_param_flags(struct seq_file *m, u32 flags)
+{
+	seq_printf(m, "    flags: ");
+	if (flags & KAPI_PARAM_IN) seq_printf(m, "IN ");
+	if (flags & KAPI_PARAM_OUT) seq_printf(m, "OUT ");
+	if (flags & KAPI_PARAM_INOUT) seq_printf(m, "INOUT ");
+	if (flags & KAPI_PARAM_OPTIONAL) seq_printf(m, "OPTIONAL ");
+	if (flags & KAPI_PARAM_CONST) seq_printf(m, "CONST ");
+	if (flags & KAPI_PARAM_USER) seq_printf(m, "USER ");
+	if (flags & KAPI_PARAM_VOLATILE) seq_printf(m, "VOLATILE ");
+	if (flags & KAPI_PARAM_DMA) seq_printf(m, "DMA ");
+	if (flags & KAPI_PARAM_ALIGNED) seq_printf(m, "ALIGNED ");
+	seq_printf(m, "\n");
+}
+
+/* Helper to print context flags */
+static void print_context_flags(struct seq_file *m, u32 flags)
+{
+	seq_printf(m, "Context flags: ");
+	if (flags & KAPI_CTX_PROCESS) seq_printf(m, "PROCESS ");
+	if (flags & KAPI_CTX_HARDIRQ) seq_printf(m, "HARDIRQ ");
+	if (flags & KAPI_CTX_SOFTIRQ) seq_printf(m, "SOFTIRQ ");
+	if (flags & KAPI_CTX_NMI) seq_printf(m, "NMI ");
+	if (flags & KAPI_CTX_SLEEPABLE) seq_printf(m, "SLEEPABLE ");
+	if (flags & KAPI_CTX_ATOMIC) seq_printf(m, "ATOMIC ");
+	if (flags & KAPI_CTX_PREEMPT_DISABLED) seq_printf(m, "PREEMPT_DISABLED ");
+	if (flags & KAPI_CTX_IRQ_DISABLED) seq_printf(m, "IRQ_DISABLED ");
+	seq_printf(m, "\n");
+}
+
+/* Show function for individual API spec */
+static int kapi_spec_show(struct seq_file *m, void *v)
+{
+	struct kernel_api_spec *spec = m->private;
+	int i;
+
+	seq_printf(m, "Kernel API Specification\n");
+	seq_printf(m, "========================\n\n");
+
+	/* Basic info */
+	seq_printf(m, "Name: %s\n", spec->name);
+	seq_printf(m, "Version: %u\n", spec->version);
+	seq_printf(m, "Description: %s\n", spec->description);
+	if (strlen(spec->long_description) > 0)
+		seq_printf(m, "Long description: %s\n", spec->long_description);
+
+	/* Context */
+	print_context_flags(m, spec->context_flags);
+	seq_printf(m, "\n");
+
+	/* Parameters */
+	if (spec->param_count > 0) {
+		seq_printf(m, "Parameters (%u):\n", spec->param_count);
+		for (i = 0; i < spec->param_count; i++) {
+			struct kapi_param_spec *param = &spec->params[i];
+			seq_printf(m, "  [%d] %s:\n", i, param->name);
+			seq_printf(m, "    type: %s (%s)\n",
+				   param_type_str(param->type), param->type_name);
+			print_param_flags(m, param->flags);
+			if (strlen(param->description) > 0)
+				seq_printf(m, "    description: %s\n", param->description);
+			if (param->size > 0)
+				seq_printf(m, "    size: %zu\n", param->size);
+			if (param->alignment > 0)
+				seq_printf(m, "    alignment: %zu\n", param->alignment);
+
+			/* Print constraints if any */
+			if (param->constraint_type != KAPI_CONSTRAINT_NONE) {
+				seq_printf(m, "    constraints:\n");
+				switch (param->constraint_type) {
+				case KAPI_CONSTRAINT_RANGE:
+					seq_printf(m, "      type: range\n");
+					seq_printf(m, "      min: %lld\n", param->min_value);
+					seq_printf(m, "      max: %lld\n", param->max_value);
+					break;
+				case KAPI_CONSTRAINT_MASK:
+					seq_printf(m, "      type: mask\n");
+					seq_printf(m, "      valid_bits: 0x%llx\n", param->valid_mask);
+					break;
+				case KAPI_CONSTRAINT_ENUM:
+					seq_printf(m, "      type: enum\n");
+					seq_printf(m, "      count: %u\n", param->enum_count);
+					break;
+				case KAPI_CONSTRAINT_USER_STRING:
+					seq_printf(m, "      type: user_string\n");
+					seq_printf(m, "      min_len: %lld\n", param->min_value);
+					seq_printf(m, "      max_len: %lld\n", param->max_value);
+					break;
+				case KAPI_CONSTRAINT_USER_PATH:
+					seq_printf(m, "      type: user_path\n");
+					seq_printf(m, "      max_len: PATH_MAX (4096)\n");
+					break;
+				case KAPI_CONSTRAINT_USER_PTR:
+					seq_printf(m, "      type: user_ptr\n");
+					seq_printf(m, "      size: %zu bytes\n", param->size);
+					break;
+				case KAPI_CONSTRAINT_CUSTOM:
+					seq_printf(m, "      type: custom\n");
+					if (strlen(param->constraints) > 0)
+						seq_printf(m, "      description: %s\n",
+							   param->constraints);
+					break;
+				default:
+					break;
+				}
+			}
+			seq_printf(m, "\n");
+		}
+	}
+
+	/* Return value */
+	seq_printf(m, "Return value:\n");
+	seq_printf(m, "  type: %s\n", spec->return_spec.type_name);
+	if (strlen(spec->return_spec.description) > 0)
+		seq_printf(m, "  description: %s\n", spec->return_spec.description);
+
+	switch (spec->return_spec.check_type) {
+	case KAPI_RETURN_EXACT:
+		seq_printf(m, "  success: == %lld\n", spec->return_spec.success_value);
+		break;
+	case KAPI_RETURN_RANGE:
+		seq_printf(m, "  success: [%lld, %lld]\n",
+			   spec->return_spec.success_min,
+			   spec->return_spec.success_max);
+		break;
+	case KAPI_RETURN_FD:
+		seq_printf(m, "  success: valid file descriptor (>= 0)\n");
+		break;
+	case KAPI_RETURN_ERROR_CHECK:
+		seq_printf(m, "  success: error check\n");
+		break;
+	case KAPI_RETURN_CUSTOM:
+		seq_printf(m, "  success: custom check\n");
+		break;
+	default:
+		break;
+	}
+	seq_printf(m, "\n");
+
+	/* Errors */
+	if (spec->error_count > 0) {
+		seq_printf(m, "Errors (%u):\n", spec->error_count);
+		for (i = 0; i < spec->error_count; i++) {
+			struct kapi_error_spec *err = &spec->errors[i];
+			seq_printf(m, "  %s (%d): %s\n",
+				   err->name, err->error_code, err->description);
+			if (strlen(err->condition) > 0)
+				seq_printf(m, "    condition: %s\n", err->condition);
+		}
+		seq_printf(m, "\n");
+	}
+
+	/* Locks */
+	if (spec->lock_count > 0) {
+		seq_printf(m, "Locks (%u):\n", spec->lock_count);
+		for (i = 0; i < spec->lock_count; i++) {
+			struct kapi_lock_spec *lock = &spec->locks[i];
+			const char *type_str, *scope_str;
+			switch (lock->lock_type) {
+			case KAPI_LOCK_MUTEX: type_str = "mutex"; break;
+			case KAPI_LOCK_SPINLOCK: type_str = "spinlock"; break;
+			case KAPI_LOCK_RWLOCK: type_str = "rwlock"; break;
+			case KAPI_LOCK_SEMAPHORE: type_str = "semaphore"; break;
+			case KAPI_LOCK_RCU: type_str = "rcu"; break;
+			case KAPI_LOCK_SEQLOCK: type_str = "seqlock"; break;
+			default: type_str = "unknown"; break;
+			}
+			switch (lock->scope) {
+			case KAPI_LOCK_INTERNAL: scope_str = "acquired and released"; break;
+			case KAPI_LOCK_ACQUIRES: scope_str = "acquired (not released)"; break;
+			case KAPI_LOCK_RELEASES: scope_str = "released (held on entry)"; break;
+			case KAPI_LOCK_CALLER_HELD: scope_str = "held by caller"; break;
+			default: scope_str = "unknown"; break;
+			}
+			seq_printf(m, "  %s (%s): %s\n",
+				   lock->lock_name, type_str, lock->description);
+			seq_printf(m, "    scope: %s\n", scope_str);
+		}
+		seq_printf(m, "\n");
+	}
+
+	/* Constraints */
+	if (spec->constraint_count > 0) {
+		seq_printf(m, "Additional constraints (%u):\n", spec->constraint_count);
+		for (i = 0; i < spec->constraint_count; i++) {
+			struct kapi_constraint_spec *cons = &spec->constraints[i];
+
+			seq_printf(m, "  - %s", cons->name);
+			if (cons->description[0])
+				seq_printf(m, ": %s", cons->description);
+			seq_printf(m, "\n");
+			if (cons->expression[0])
+				seq_printf(m, "    expression: %s\n", cons->expression);
+		}
+		seq_printf(m, "\n");
+	}
+
+	/* Signals */
+	if (spec->signal_count > 0) {
+		seq_printf(m, "Signal handling (%u):\n", spec->signal_count);
+		for (i = 0; i < spec->signal_count; i++) {
+			struct kapi_signal_spec *sig = &spec->signals[i];
+			seq_printf(m, "  %s (%d):\n", sig->signal_name, sig->signal_num);
+			seq_printf(m, "    direction: ");
+			if (sig->direction & KAPI_SIGNAL_SEND) seq_printf(m, "send ");
+			if (sig->direction & KAPI_SIGNAL_RECEIVE) seq_printf(m, "receive ");
+			if (sig->direction & KAPI_SIGNAL_HANDLE) seq_printf(m, "handle ");
+			if (sig->direction & KAPI_SIGNAL_BLOCK) seq_printf(m, "block ");
+			if (sig->direction & KAPI_SIGNAL_IGNORE) seq_printf(m, "ignore ");
+			seq_printf(m, "\n");
+			seq_printf(m, "    action: ");
+			switch (sig->action) {
+			case KAPI_SIGNAL_ACTION_DEFAULT: seq_printf(m, "default"); break;
+			case KAPI_SIGNAL_ACTION_TERMINATE: seq_printf(m, "terminate"); break;
+			case KAPI_SIGNAL_ACTION_COREDUMP: seq_printf(m, "coredump"); break;
+			case KAPI_SIGNAL_ACTION_STOP: seq_printf(m, "stop"); break;
+			case KAPI_SIGNAL_ACTION_CONTINUE: seq_printf(m, "continue"); break;
+			case KAPI_SIGNAL_ACTION_CUSTOM: seq_printf(m, "custom"); break;
+			case KAPI_SIGNAL_ACTION_RETURN: seq_printf(m, "return"); break;
+			case KAPI_SIGNAL_ACTION_RESTART: seq_printf(m, "restart"); break;
+			default: seq_printf(m, "unknown"); break;
+			}
+			seq_printf(m, "\n");
+			if (strlen(sig->description) > 0)
+				seq_printf(m, "    description: %s\n", sig->description);
+		}
+		seq_printf(m, "\n");
+	}
+
+	/* Additional info */
+	if (strlen(spec->examples) > 0) {
+		seq_printf(m, "Examples:\n%s\n\n", spec->examples);
+	}
+	if (strlen(spec->notes) > 0) {
+		seq_printf(m, "Notes:\n%s\n\n", spec->notes);
+	}
+	if (strlen(spec->since_version) > 0) {
+		seq_printf(m, "Since: %s\n", spec->since_version);
+	}
+
+	return 0;
+}
+
+static int kapi_spec_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, kapi_spec_show, inode->i_private);
+}
+
+static const struct file_operations kapi_spec_fops = {
+	.open = kapi_spec_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+};
+
+/* Show all available API specs */
+static int kapi_list_show(struct seq_file *m, void *v)
+{
+	struct kernel_api_spec *spec;
+	int count = 0;
+
+	seq_printf(m, "Available Kernel API Specifications\n");
+	seq_printf(m, "===================================\n\n");
+
+	for (spec = __start_kapi_specs; spec < __stop_kapi_specs; spec++) {
+		seq_printf(m, "%s - %s\n", spec->name, spec->description);
+		count++;
+	}
+
+	seq_printf(m, "\nTotal: %d specifications\n", count);
+	return 0;
+}
+
+static int kapi_list_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, kapi_list_show, NULL);
+}
+
+static const struct file_operations kapi_list_fops = {
+	.open = kapi_list_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+};
+
+static int __init kapi_debugfs_init(void)
+{
+	struct kernel_api_spec *spec;
+	struct dentry *spec_dir;
+
+	/* Create main directory */
+	kapi_debugfs_root = debugfs_create_dir("kapi", NULL);
+
+	/* Create list file */
+	debugfs_create_file("list", 0444, kapi_debugfs_root, NULL, &kapi_list_fops);
+
+	/* Create specs subdirectory */
+	spec_dir = debugfs_create_dir("specs", kapi_debugfs_root);
+
+	/* Create a file for each API spec */
+	for (spec = __start_kapi_specs; spec < __stop_kapi_specs; spec++) {
+		debugfs_create_file(spec->name, 0444, spec_dir, spec, &kapi_spec_fops);
+	}
+
+	pr_info("Kernel API debugfs interface initialized\n");
+	return 0;
+}
+
+static void __exit kapi_debugfs_exit(void)
+{
+	debugfs_remove_recursive(kapi_debugfs_root);
+}
+
+/* Initialize as part of kernel, not as a module */
+fs_initcall(kapi_debugfs_init);
\ No newline at end of file
-- 
2.51.0


^ permalink raw reply related

* [RFC PATCH v5 02/15] kernel/api: enable kerneldoc-based API specifications
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
  To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>

This patch adds support for extracting API specifications from
kernel-doc comments and generating C macro invocations for the
kernel API specification framework.

Changes include:
- New kdoc_apispec.py module for generating API spec macros
- Updates to kernel-doc.py to support -apispec output format
- Build system integration in Makefile.build
- Generator script for collecting all API specifications
- Support for API-specific sections in kernel-doc comments

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 scripts/Makefile.build                |  28 +
 scripts/Makefile.clean                |   3 +
 scripts/generate_api_specs.sh         |  83 ++-
 scripts/kernel-doc.py                 |   5 +
 tools/lib/python/kdoc/kdoc_apispec.py | 755 ++++++++++++++++++++++++++
 tools/lib/python/kdoc/kdoc_output.py  |   9 +-
 tools/lib/python/kdoc/kdoc_parser.py  |  86 ++-
 7 files changed, 957 insertions(+), 12 deletions(-)
 create mode 100644 tools/lib/python/kdoc/kdoc_apispec.py

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 52c08c4eb0b9a..7a192d29a01f6 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -172,6 +172,34 @@ ifneq ($(KBUILD_EXTRA_WARN),)
         $<
 endif
 
+# Generate API spec headers from kernel-doc comments
+ifeq ($(CONFIG_KAPI_SPEC),y)
+# Function to check if a file has API specifications
+has-apispec = $(shell grep -qE '^\s*\*\s*(long-desc|context-flags|state-trans):' $(src)/$(1) 2>/dev/null && echo $(1))
+
+# Get base names without directory prefix
+c-objs-base := $(notdir $(real-obj-y) $(real-obj-m))
+# Filter to only .o files with corresponding .c source files
+c-files := $(foreach o,$(c-objs-base),$(if $(wildcard $(src)/$(o:.o=.c)),$(o:.o=.c)))
+# Also check for any additional .c files that contain API specs but are included
+extra-c-files := $(shell find $(src) -maxdepth 1 -name "*.c" -exec grep -l '^\s*\*\s*\(long-desc\|context-flags\|state-trans\):' {} \; 2>/dev/null | xargs -r basename -a)
+# Combine both lists and remove duplicates
+all-c-files := $(sort $(c-files) $(extra-c-files))
+# Only include files that actually have API specifications
+apispec-files := $(foreach f,$(all-c-files),$(call has-apispec,$(f)))
+# Generate apispec targets with proper directory prefix
+apispec-y := $(addprefix $(obj)/,$(apispec-files:.c=.apispec.h))
+always-y += $(apispec-y)
+targets += $(apispec-y)
+
+quiet_cmd_apispec = APISPEC $@
+      cmd_apispec = PYTHONDONTWRITEBYTECODE=1 $(KERNELDOC) -apispec \
+                    $(KDOCFLAGS) $< > $@ || rm -f $@
+
+$(obj)/%.apispec.h: $(src)/%.c FORCE
+	$(call if_changed,apispec)
+endif
+
 # Compile C sources (.c)
 # ---------------------------------------------------------------------------
 
diff --git a/scripts/Makefile.clean b/scripts/Makefile.clean
index 6ead00ec7313b..f78dbbe637f27 100644
--- a/scripts/Makefile.clean
+++ b/scripts/Makefile.clean
@@ -35,6 +35,9 @@ __clean-files   := $(filter-out $(no-clean-files), $(__clean-files))
 
 __clean-files   := $(wildcard $(addprefix $(obj)/, $(__clean-files)))
 
+# Also clean generated apispec headers (computed dynamically in Makefile.build)
+__clean-files   += $(wildcard $(obj)/*.apispec.h)
+
 # ==========================================================================
 
 # To make this rule robust against "Argument list too long" error,
diff --git a/scripts/generate_api_specs.sh b/scripts/generate_api_specs.sh
index 2c3078a508fef..3ac6be9b4fe98 100755
--- a/scripts/generate_api_specs.sh
+++ b/scripts/generate_api_specs.sh
@@ -1,18 +1,87 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 #
-# Stub script for generating API specifications collector
-# This is a placeholder until the full implementation is available
+# generate_api_specs.sh - Generate C file that includes all API specification headers
 #
+# Usage: generate_api_specs.sh <srctree> <objtree>
 
-cat << 'EOF'
-// SPDX-License-Identifier: GPL-2.0
+SRCTREE="$1"
+OBJTREE="$2"
+
+if [ -z "$SRCTREE" ] || [ -z "$OBJTREE" ]; then
+    echo "Usage: $0 <srctree> <objtree>" >&2
+    exit 1
+fi
+
+# Generate header
+cat <<EOF
+/* SPDX-License-Identifier: GPL-2.0 */
 /*
- * Auto-generated API specifications collector (stub)
- * Generated by scripts/generate_api_specs.sh
+ * Auto-generated file - DO NOT EDIT
+ * Generated by: scripts/generate_api_specs.sh
+ *
+ * This file includes all kernel API specification headers
  */
 
+#include <linux/kernel.h>
 #include <linux/kernel_api_spec.h>
+#include <linux/errno.h>
+#include <linux/capability.h>
+#include <linux/fcntl.h>
+#include <uapi/linux/aio_abi.h>
+#include <uapi/linux/sched/types.h>
+#include <uapi/linux/xattr.h>
+#include <linux/eventfd.h>
+#include <linux/eventpoll.h>
+#include <linux/wait.h>
+#include <linux/inotify.h>
+#include <linux/splice.h>
+#include <linux/timerfd.h>
+#include <linux/kexec.h>
+#include <uapi/linux/mount.h>
+#include <uapi/linux/signalfd.h>
+#include <linux/fs.h>
+#include <linux/signal.h>
+#include <linux/ptrace.h>
+#include <linux/quota.h>
+#include <linux/uio.h>
+#include <linux/time.h>
+
+#ifdef CONFIG_KAPI_SPEC
+
+EOF
 
-/* No API specifications collected yet */
+# Find all .apispec.h files and generate includes
+# Look in both source tree and object tree
+(find "$SRCTREE" -name "*.apispec.h" -type f 2>/dev/null; \
+ find "$OBJTREE" -name "*.apispec.h" -type f 2>/dev/null) | \
+    grep -v "/generated_api_specs.c" | \
+    sort -u | \
+    while read -r apispec_file; do
+        # Get relative path from srctree or objtree
+        case "$apispec_file" in
+            "$SRCTREE"*)
+                rel_path="${apispec_file#$SRCTREE/}"
+                ;;
+            *)
+                rel_path="${apispec_file#$OBJTREE/}"
+                ;;
+        esac
+
+        # Skip if file is empty
+        if [ ! -s "$apispec_file" ]; then
+            continue
+        fi
+
+        # Generate include statement with relative path from kernel/api/
+        # The generated file is always at kernel/api/generated_api_specs.c,
+        # so we need to go up two directories to reach the root
+        echo "#include \"../../${rel_path}\""
+    done
+
+# Close the ifdef
+cat <<EOF
+
+#endif /* CONFIG_KAPI_SPEC */
 EOF
+
diff --git a/scripts/kernel-doc.py b/scripts/kernel-doc.py
index 7a1eaf986bcd4..8404a9e9e3fed 100755
--- a/scripts/kernel-doc.py
+++ b/scripts/kernel-doc.py
@@ -231,6 +231,8 @@ def main():
                          help="Output reStructuredText format (default).")
     out_fmt.add_argument("-N", "-none", "--none", action="store_true",
                          help="Do not output documentation, only warnings.")
+    out_fmt.add_argument("-apispec", "--apispec", action="store_true",
+                         help="Output C macro invocations for kernel API specifications.")
 
     # Output selection mutually-exclusive group
 
@@ -294,11 +296,14 @@ def main():
     # Import kernel-doc libraries only after checking Python version
     from kdoc.kdoc_files import KernelFiles             # pylint: disable=C0415
     from kdoc.kdoc_output import RestFormat, ManFormat  # pylint: disable=C0415
+    from kdoc.kdoc_apispec import ApiSpecFormat         # pylint: disable=C0415
 
     if args.man:
         out_style = ManFormat(modulename=args.modulename)
     elif args.none:
         out_style = None
+    elif args.apispec:
+        out_style = ApiSpecFormat()
     else:
         out_style = RestFormat()
 
diff --git a/tools/lib/python/kdoc/kdoc_apispec.py b/tools/lib/python/kdoc/kdoc_apispec.py
new file mode 100644
index 0000000000000..ae03af6d10685
--- /dev/null
+++ b/tools/lib/python/kdoc/kdoc_apispec.py
@@ -0,0 +1,755 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""
+Generate C macro invocations for kernel API specifications from kernel-doc comments.
+
+This module creates C header files with API specification macros that match
+the kernel API specification framework introduced in commit 9688de5c25bed.
+"""
+
+from kdoc.kdoc_output import OutputFormat
+import re
+
+
+# Maximum string lengths (from kernel_api_spec.h)
+KAPI_MAX_DESC_LEN = 512
+KAPI_MAX_NAME_LEN = 128
+KAPI_MAX_SIGNAL_NAME_LEN = 32
+
+# Valid KAPI effect types
+VALID_EFFECT_TYPES = {
+    'KAPI_EFFECT_NONE', 'KAPI_EFFECT_MODIFY_STATE', 'KAPI_EFFECT_PROCESS_STATE',
+    'KAPI_EFFECT_IRREVERSIBLE', 'KAPI_EFFECT_SCHEDULE', 'KAPI_EFFECT_FILESYSTEM',
+    'KAPI_EFFECT_HARDWARE', 'KAPI_EFFECT_ALLOC_MEMORY', 'KAPI_EFFECT_FREE_MEMORY',
+    'KAPI_EFFECT_SIGNAL_SEND', 'KAPI_EFFECT_FILE_POSITION', 'KAPI_EFFECT_LOCK_ACQUIRE',
+    'KAPI_EFFECT_LOCK_RELEASE', 'KAPI_EFFECT_RESOURCE_CREATE', 'KAPI_EFFECT_RESOURCE_DESTROY',
+    'KAPI_EFFECT_NETWORK'
+}
+
+
+class ApiSpecFormat(OutputFormat):
+    """Generate C macro invocations for kernel API specifications"""
+
+    def __init__(self):
+        super().__init__()
+        self.header_written = False
+
+    def msg(self, fname, name, args):
+        """Handles a single entry from kernel-doc parser"""
+        if not self.header_written:
+            self.data = self._generate_header()
+            self.header_written = True
+        else:
+            self.data = ""
+
+        result = super().msg(fname, name, args)
+        return result if result else self.data
+
+    def _generate_header(self):
+        """Generate the file header"""
+        return (
+            "/* SPDX-License-Identifier: GPL-2.0 */\n"
+            "/* Auto-generated from kerneldoc annotations - DO NOT EDIT */\n\n"
+            "#include <linux/kernel_api_spec.h>\n"
+            "#include <linux/errno.h>\n\n"
+        )
+
+    def _format_macro_param(self, value, max_len=KAPI_MAX_DESC_LEN):
+        """Format a value for use in C macro parameter, truncating if needed"""
+        if value is None:
+            return '""'
+        value = str(value).replace('\\', '\\\\').replace('"', '\\"')
+        value = value.replace('\n', ' ')
+        # Truncate to fit within max_len, accounting for escaping overhead
+        if len(value) > max_len - 10:
+            value = value[:max_len - 13] + '...'
+        return f'"{value}"'
+
+    def _get_section(self, sections, key):
+        """Get first line from sections, checking with and without @ prefix"""
+        for prefix in ['', '@']:
+            full_key = prefix + key
+            if full_key in sections:
+                content = sections[full_key].strip()
+                # Return only first line to avoid mixing sections
+                return content.split('\n')[0].strip() if content else ''
+        return None
+
+    def _get_raw_section(self, sections, key):
+        """Get full section content, checking with and without @ prefix"""
+        for prefix in ['', '@']:
+            full_key = prefix + key
+            if full_key in sections:
+                return sections[full_key]
+        return ''
+
+    def _parse_indented_items(self, section_content, item_parser):
+        """Generic parser for indented items.
+
+        Args:
+            section_content: Raw section content
+            item_parser: Function that takes (lines, start_index) and returns (item, next_index)
+
+        Returns:
+            List of parsed items
+        """
+        if not section_content:
+            return []
+
+        items = []
+        lines = section_content.strip().split('\n')
+        i = 0
+
+        while i < len(lines):
+            if not lines[i].strip():
+                i += 1
+                continue
+
+            # Check if this is a main item (not indented)
+            if not lines[i].startswith((' ', '\t')):
+                item, i = item_parser(lines, i)
+                if item:
+                    items.append(item)
+            else:
+                i += 1
+
+        return items
+
+    def _parse_subfields(self, lines, start_idx):
+        """Parse indented subfields starting from start_idx+1.
+
+        Returns: (dict of subfields, next index)
+        """
+        subfields = {}
+        i = start_idx + 1
+
+        while i < len(lines) and (lines[i].startswith((' ', '\t'))):
+            line = lines[i].strip()
+            if ':' in line:
+                key, value = line.split(':', 1)
+                subfields[key.strip()] = value.strip()
+            i += 1
+
+        return subfields, i
+
+    def _parse_signal_item(self, lines, i):
+        """Parse a single signal specification"""
+        signal = {'name': lines[i].strip()}
+        subfields, next_i = self._parse_subfields(lines, i)
+
+        # Map subfields to signal attributes
+        signal.update({
+            'direction': subfields.get('direction', 'KAPI_SIGNAL_RECEIVE'),
+            'action': subfields.get('action', 'KAPI_SIGNAL_ACTION_RETURN'),
+            'condition': subfields.get('condition'),
+            'desc': subfields.get('desc'),
+            'error': subfields.get('error'),
+            'timing': subfields.get('timing'),
+            'priority': subfields.get('priority'),
+            'interruptible': subfields.get('interruptible', '').lower() == 'yes',
+            'number': subfields.get('number', '0'),
+        })
+
+        return signal, next_i
+
+    def _parse_error_item(self, lines, i):
+        """Parse a single error specification"""
+        line = lines[i].strip()
+
+        # Skip desc: lines
+        if line.startswith('desc:'):
+            return None, i + 1
+
+        # Check for error pattern
+        if not re.match(r'^[A-Z][A-Z0-9_]+,', line):
+            return None, i + 1
+
+        error = {'line': line, 'desc': ''}
+
+        # Look for desc: continuation
+        i += 1
+        desc_lines = []
+        while i < len(lines):
+            next_line = lines[i].strip()
+            if next_line.startswith('desc:'):
+                desc_lines.append(next_line[5:].strip())
+                i += 1
+            elif not next_line or re.match(r'^[A-Z][A-Z0-9_]+,', next_line):
+                break
+            else:
+                desc_lines.append(next_line)
+                i += 1
+
+        if desc_lines:
+            error['desc'] = ' '.join(desc_lines)
+
+        return error, i
+
+    def _parse_lock_item(self, lines, i):
+        """Parse a single lock specification"""
+        line = lines[i].strip()
+        if ':' not in line:
+            return None, i + 1
+
+        parts = line.split(':', 1)[1].strip().split(',', 1)
+        if len(parts) < 2:
+            return None, i + 1
+
+        lock = {
+            'name': parts[0].strip(),
+            'type': parts[1].strip()
+        }
+
+        subfields, next_i = self._parse_subfields(lines, i)
+
+        # Map boolean fields
+        for field in ['acquired', 'released', 'held-on-entry', 'held-on-exit']:
+            if subfields.get(field, '').lower() == 'true':
+                lock[field] = True
+
+        lock['desc'] = subfields.get('desc', '')
+
+        return lock, next_i
+
+    def _parse_constraint_item(self, lines, i):
+        """Parse a single constraint specification"""
+        line = lines[i].strip()
+
+        # Check for old format with comma
+        if ',' in line:
+            parts = line.split(',', 1)
+            constraint = {
+                'name': parts[0].strip(),
+                'desc': parts[1].strip() if len(parts) > 1 else '',
+                'expr': None
+            }
+        else:
+            constraint = {'name': line, 'desc': '', 'expr': None}
+
+        subfields, next_i = self._parse_subfields(lines, i)
+
+        if 'desc' in subfields:
+            constraint['desc'] = (constraint['desc'] + ' ' + subfields['desc']).strip()
+        constraint['expr'] = subfields.get('expr')
+
+        return constraint, next_i
+
+    def _parse_side_effect_item(self, lines, i):
+        """Parse a single side effect specification"""
+        line = lines[i].strip()
+
+        # Default to new format
+        effect = {
+            'type': line,
+            'target': '',
+            'desc': '',
+            'condition': None,
+            'reversible': False
+        }
+
+        # Check for old format with commas
+        if ',' in line:
+            # Handle condition and reversible flags
+            cond_match = re.search(r',\s*condition=([^,]+?)(?:\s*,\s*reversible=(yes|no)\s*)?$', line)
+            if cond_match:
+                effect['condition'] = cond_match.group(1).strip()
+                effect['reversible'] = cond_match.group(2) == 'yes'
+                line = line[:cond_match.start()]
+            elif ', reversible=yes' in line:
+                effect['reversible'] = True
+                line = line.replace(', reversible=yes', '')
+            elif ', reversible=no' in line:
+                line = line.replace(', reversible=no', '')
+
+            parts = line.split(',', 2)
+            if len(parts) >= 1:
+                effect['type'] = parts[0].strip()
+            if len(parts) >= 2:
+                effect['target'] = parts[1].strip()
+            if len(parts) >= 3:
+                effect['desc'] = parts[2].strip()
+        else:
+            # Multi-line format with subfields
+            subfields, next_i = self._parse_subfields(lines, i)
+            effect.update({
+                'target': subfields.get('target', ''),
+                'desc': subfields.get('desc', ''),
+                'condition': subfields.get('condition'),
+                'reversible': subfields.get('reversible', '').lower() == 'yes'
+            })
+            return effect, next_i
+
+        return effect, i + 1
+
+    def _parse_state_trans_item(self, lines, i):
+        """Parse a single state transition specification"""
+        line = lines[i].strip()
+
+        trans = {
+            'target': line,
+            'from': '',
+            'to': '',
+            'condition': '',
+            'desc': ''
+        }
+
+        # Check for old format with commas
+        if ',' in line:
+            parts = line.split(',', 3)
+            if len(parts) >= 1:
+                trans['target'] = parts[0].strip()
+            if len(parts) >= 2:
+                trans['from'] = parts[1].strip()
+            if len(parts) >= 3:
+                trans['to'] = parts[2].strip()
+            if len(parts) >= 4:
+                desc_part = parts[3].strip()
+                desc_parts = desc_part.split(',', 1)
+                if len(desc_parts) > 1:
+                    trans['condition'] = desc_parts[0].strip()
+                    trans['desc'] = desc_parts[1].strip()
+                else:
+                    trans['desc'] = desc_part
+            return trans, i + 1
+        else:
+            # Multi-line format with subfields
+            subfields, next_i = self._parse_subfields(lines, i)
+            trans.update({
+                'from': subfields.get('from', ''),
+                'to': subfields.get('to', ''),
+                'condition': subfields.get('condition', ''),
+                'desc': subfields.get('desc', '')
+            })
+            return trans, next_i
+
+    def _process_parameters(self, sections, parameterlist, parameterdescs, parametertypes):
+        """Process and output parameter specifications"""
+        param_count = len(parameterlist)
+        if param_count > 0:
+            self.data += f"\n\tKAPI_PARAM_COUNT({param_count})\n"
+
+        for param_idx, param in enumerate(parameterlist):
+            param_name = param.strip()
+            param_desc = parameterdescs.get(param_name, '')
+            param_ctype = parametertypes.get(param_name, '')
+
+            # Parse parameter specifications
+            param_section = self._get_raw_section(sections, 'param')
+            param_specs = {}
+            if param_section:
+                param_specs = self._parse_param_spec(param_section, param_name)
+
+            self.data += f"\n\tKAPI_PARAM({param_idx}, {self._format_macro_param(param_name)}, "
+            self.data += f"{self._format_macro_param(param_ctype)}, {self._format_macro_param(param_desc)})\n"
+
+            # Add parameter attributes
+            for key, macro in [
+                ('param-type', 'KAPI_PARAM_TYPE'),
+                ('param-flags', 'KAPI_PARAM_FLAGS'),
+                ('param-size', 'KAPI_PARAM_SIZE'),
+                ('param-alignment', 'KAPI_PARAM_ALIGNMENT'),
+            ]:
+                if key in param_specs:
+                    self.data += f"\t\t{macro}({param_specs[key]})\n"
+
+            # Handle constraint type
+            if 'param-constraint-type' in param_specs:
+                ctype = param_specs['param-constraint-type']
+                if ctype == 'KAPI_CONSTRAINT_BITMASK':
+                    ctype = 'KAPI_CONSTRAINT_MASK'
+                self.data += f"\t\tKAPI_PARAM_CONSTRAINT_TYPE({ctype})\n"
+
+            # Handle range
+            if 'param-range' in param_specs and ',' in param_specs['param-range']:
+                min_val, max_val = param_specs['param-range'].split(',', 1)
+                self.data += f"\t\tKAPI_PARAM_RANGE({min_val.strip()}, {max_val.strip()})\n"
+
+            # Handle mask
+            if 'param-mask' in param_specs:
+                self.data += f"\t\tKAPI_PARAM_VALID_MASK({param_specs['param-mask']})\n"
+
+            # Handle enum values
+            if 'param-enum-values' in param_specs:
+                self.data += f"\t\tKAPI_PARAM_ENUM_VALUES({param_specs['param-enum-values']})\n"
+
+            # Handle constraint description
+            if 'param-constraint' in param_specs:
+                self.data += f"\t\tKAPI_PARAM_CONSTRAINT({self._format_macro_param(param_specs['param-constraint'])})\n"
+
+            self.data += "\tKAPI_PARAM_END\n"
+
+    def _parse_param_spec(self, section_content, param_name):
+        """Parse parameter specifications from indented format"""
+        specs = {}
+        lines = section_content.strip().split('\n')
+        current_item = None
+
+        # Map to expected keys
+        field_map = {
+            'flags': 'param-flags',
+            'size': 'param-size',
+            'constraint-type': 'param-constraint-type',
+            'constraint': 'param-constraint',
+            'range': 'param-range',
+            'mask': 'param-mask',
+            'valid-mask': 'param-mask',
+            'valid-values': 'param-enum-values',
+            'alignment': 'param-alignment',
+            'struct-type': 'param-struct-type',
+        }
+
+        i = 0
+        while i < len(lines):
+            line = lines[i]
+            if not line.strip():
+                i += 1
+                continue
+
+            # Check if this is our parameter (non-indented line)
+            if not line.startswith((' ', '\t')):
+                parts = line.strip().split(',', 1)
+                current_item = param_name if parts[0].strip() == param_name else None
+                if current_item and len(parts) > 1:
+                    specs['param-type'] = parts[1].strip()
+                i += 1
+            elif current_item == param_name:
+                # Parse subfield
+                stripped = line.strip()
+                if ':' in stripped:
+                    key, value = stripped.split(':', 1)
+                    key = key.strip()
+                    value = value.strip()
+
+                    # Collect continuation lines (indented lines without a colon that
+                    # defines a new key, i.e., lines that are pure continuations)
+                    i += 1
+                    while i < len(lines):
+                        next_line = lines[i]
+                        # Stop if we hit a non-indented line (new param)
+                        if next_line.strip() and not next_line.startswith((' ', '\t')):
+                            break
+                        next_stripped = next_line.strip()
+                        # Stop if we hit a new key (contains colon with known key prefix)
+                        if next_stripped and ':' in next_stripped:
+                            potential_key = next_stripped.split(':', 1)[0].strip()
+                            if potential_key in field_map or potential_key in ['type', 'desc']:
+                                break
+                        # This is a continuation line
+                        if next_stripped:
+                            value = value + ' ' + next_stripped
+                        i += 1
+
+                    if key in field_map:
+                        # Clean up the value - remove excessive whitespace
+                        value = ' '.join(value.split())
+                        specs[field_map[key]] = value
+            else:
+                i += 1
+
+        return specs
+
+    def _validate_effect_type(self, effect_type):
+        """Validate and normalize effect type"""
+        if 'KAPI_EFFECT_SCHEDULER' in effect_type:
+            return effect_type.replace('KAPI_EFFECT_SCHEDULER', 'KAPI_EFFECT_SCHEDULE')
+
+        if 'KAPI_EFFECT_' in effect_type and effect_type not in VALID_EFFECT_TYPES:
+            if '|' in effect_type:
+                parts = [p.strip() for p in effect_type.split('|')]
+                valid_parts = [p if p in VALID_EFFECT_TYPES else 'KAPI_EFFECT_MODIFY_STATE' for p in parts]
+                return ' | '.join(valid_parts)
+            return 'KAPI_EFFECT_MODIFY_STATE'
+
+        return effect_type
+
+    def _has_api_spec(self, sections):
+        """Check if this function has an API specification.
+
+        Returns True if at least 2 KAPI-specific section indicators are present.
+        We require 2+ indicators (not just 1) to avoid false positives from
+        regular kernel-doc comments that happen to use a common section name
+        like 'return' or 'error'. Having multiple KAPI sections strongly
+        suggests intentional API specification rather than coincidence.
+        """
+        indicators = [
+            'api-type', 'context-flags', 'param-type', 'error-code',
+            'capability', 'signal', 'lock', 'state-trans', 'constraint',
+            'return', 'error', 'side-effects', 'struct'
+        ]
+
+        count = sum(1 for ind in indicators
+                   if any(key.lower().startswith(ind.lower()) or
+                         key.lower().startswith('@' + ind.lower())
+                         for key in sections.keys()))
+
+        # Require 2+ indicators to distinguish from regular kernel-doc
+        return count >= 2
+
+    def out_function(self, fname, name, args):
+        """Generate API spec for a function"""
+        function_name = args.get('function', name)
+        sections = args.sections if hasattr(args, 'sections') else args.get('sections', {})
+
+        if not self._has_api_spec(sections):
+            return
+
+        parameterlist = args.parameterlist if hasattr(args, 'parameterlist') else args.get('parameterlist', [])
+        parameterdescs = args.parameterdescs if hasattr(args, 'parameterdescs') else args.get('parameterdescs', {})
+        parametertypes = args.parametertypes if hasattr(args, 'parametertypes') else args.get('parametertypes', {})
+        purpose = args.get('purpose', '')
+
+        # Start macro invocation
+        self.data += f"DEFINE_KERNEL_API_SPEC({function_name})\n"
+
+        # Basic info
+        if purpose:
+            self.data += f"\tKAPI_DESCRIPTION({self._format_macro_param(purpose)})\n"
+
+        long_desc = self._get_section(sections, 'long-desc')
+        if long_desc:
+            self.data += f"\tKAPI_LONG_DESC({self._format_macro_param(long_desc)})\n"
+
+        # Context flags
+        context = self._get_section(sections, 'context-flags') or self._get_section(sections, 'context')
+        if context:
+            self.data += f"\tKAPI_CONTEXT({context})\n"
+
+        # Process parameters
+        self._process_parameters(sections, parameterlist, parameterdescs, parametertypes)
+
+        # Process errors
+        errors = self._parse_indented_items(
+            self._get_raw_section(sections, 'error'),
+            self._parse_error_item
+        )
+
+        if errors:
+            self.data += f"\n\tKAPI_RETURN_ERROR_COUNT({len(errors)})\n"
+            self.data += f"\n\tKAPI_ERROR_COUNT({len(errors)})\n"
+
+            for idx, error in enumerate(errors):
+                self._output_error(idx, error)
+
+        # Process signals
+        signals = self._parse_indented_items(
+            self._get_raw_section(sections, 'signal'),
+            self._parse_signal_item
+        )
+
+        if signals:
+            self.data += f"\n\tKAPI_SIGNAL_COUNT({len(signals)})\n"
+
+            for idx, signal in enumerate(signals):
+                self._output_signal(idx, signal)
+
+        # Process other specifications
+        self._process_locks(sections)
+        self._process_constraints(sections)
+        self._process_side_effects(sections)
+        self._process_state_transitions(sections)
+        self._process_capabilities(sections)
+
+        # Add examples and notes
+        for key, macro in [('examples', 'KAPI_EXAMPLES'), ('notes', 'KAPI_NOTES')]:
+            value = self._get_section(sections, key)
+            if value:
+                self.data += f"\n\t{macro}({self._format_macro_param(value)})\n"
+
+        self.data += "\nKAPI_END_SPEC;\n\n"
+
+    def _output_error(self, idx, error):
+        """Output a single error specification"""
+        line = error['line']
+        if line.startswith('-'):
+            line = line[1:].strip()
+
+        parts = line.split(',', 2)
+        if len(parts) == 2:
+            # Format: NAME, description
+            name = parts[0].strip()
+            short_desc = parts[1].strip()
+            code = f"-{name}"
+        elif len(parts) >= 3:
+            # Format: code, name, description
+            code = parts[0].strip()
+            name = parts[1].strip()
+            short_desc = parts[2].strip()
+            if not code.startswith('-'):
+                code = f"-{code}"
+        else:
+            return
+
+        long_desc = error.get('desc', '') or short_desc
+
+        self.data += f"\n\tKAPI_ERROR({idx}, {code}, {self._format_macro_param(name)}, "
+        self.data += f"{self._format_macro_param(short_desc)},\n\t\t   {self._format_macro_param(long_desc)})\n"
+
+    def _output_signal(self, idx, signal):
+        """Output a single signal specification"""
+        self.data += f"\n\tKAPI_SIGNAL({idx}, {signal['number']}, "
+        self.data += f"{self._format_macro_param(signal['name'], KAPI_MAX_SIGNAL_NAME_LEN)}, "
+        self.data += f"{signal['direction']}, {signal['action']})\n"
+
+        for key, macro in [
+            ('condition', 'KAPI_SIGNAL_CONDITION'),
+            ('desc', 'KAPI_SIGNAL_DESC'),
+            ('error', 'KAPI_SIGNAL_ERROR'),
+            ('timing', 'KAPI_SIGNAL_TIMING'),
+            ('priority', 'KAPI_SIGNAL_PRIORITY'),
+        ]:
+            if signal.get(key):
+                # Priority field is numeric
+                if key == 'priority':
+                    self.data += f"\t\t{macro}({signal[key]})\n"
+                else:
+                    self.data += f"\t\t{macro}({self._format_macro_param(signal[key])})\n"
+
+        if signal.get('interruptible'):
+            self.data += "\t\tKAPI_SIGNAL_INTERRUPTIBLE\n"
+
+        self.data += "\tKAPI_SIGNAL_END\n"
+
+    def _process_locks(self, sections):
+        """Process lock specifications"""
+        locks = self._parse_indented_items(
+            self._get_raw_section(sections, 'lock'),
+            self._parse_lock_item
+        )
+
+        if locks:
+            self.data += f"\n\tKAPI_LOCK_COUNT({len(locks)})\n"
+
+            for idx, lock in enumerate(locks):
+                self.data += f"\n\tKAPI_LOCK({idx}, {self._format_macro_param(lock['name'])}, {lock['type']})\n"
+
+                for flag in ['acquired', 'released']:
+                    if lock.get(flag):
+                        self.data += f"\t\tKAPI_LOCK_{flag.upper()}\n"
+
+                if lock.get('desc'):
+                    self.data += f"\t\tKAPI_LOCK_DESC({self._format_macro_param(lock['desc'])})\n"
+
+                self.data += "\tKAPI_LOCK_END\n"
+
+    def _process_constraints(self, sections):
+        """Process constraint specifications"""
+        constraints = self._parse_indented_items(
+            self._get_raw_section(sections, 'constraint'),
+            self._parse_constraint_item
+        )
+
+        if constraints:
+            self.data += f"\n\tKAPI_CONSTRAINT_COUNT({len(constraints)})\n"
+
+            for idx, constraint in enumerate(constraints):
+                self.data += f"\n\tKAPI_CONSTRAINT({idx}, {self._format_macro_param(constraint['name'])},\n"
+                self.data += f"\t\t\t{self._format_macro_param(constraint['desc'])})\n"
+
+                if constraint.get('expr'):
+                    self.data += f"\t\tKAPI_CONSTRAINT_EXPR({self._format_macro_param(constraint['expr'])})\n"
+
+                self.data += "\tKAPI_CONSTRAINT_END\n"
+
+    def _process_side_effects(self, sections):
+        """Process side effect specifications"""
+        effects = self._parse_indented_items(
+            self._get_raw_section(sections, 'side-effect'),
+            self._parse_side_effect_item
+        )
+
+        if effects:
+            self.data += f"\n\tKAPI_SIDE_EFFECT_COUNT({len(effects)})\n"
+
+            for idx, effect in enumerate(effects):
+                effect_type = self._validate_effect_type(effect['type'])
+
+                self.data += f"\n\tKAPI_SIDE_EFFECT({idx}, {effect_type},\n"
+                self.data += f"\t\t\t {self._format_macro_param(effect['target'])},\n"
+                self.data += f"\t\t\t {self._format_macro_param(effect['desc'])})\n"
+
+                if effect.get('condition'):
+                    self.data += f"\t\tKAPI_EFFECT_CONDITION({self._format_macro_param(effect['condition'])})\n"
+
+                if effect.get('reversible'):
+                    self.data += "\t\tKAPI_EFFECT_REVERSIBLE\n"
+
+                self.data += "\tKAPI_SIDE_EFFECT_END\n"
+
+    def _process_state_transitions(self, sections):
+        """Process state transition specifications"""
+        transitions = self._parse_indented_items(
+            self._get_raw_section(sections, 'state-trans'),
+            self._parse_state_trans_item
+        )
+
+        if transitions:
+            self.data += f"\n\tKAPI_STATE_TRANS_COUNT({len(transitions)})\n"
+
+            for idx, trans in enumerate(transitions):
+                desc = trans['desc']
+                if trans.get('condition'):
+                    desc = trans['condition'] + (', ' + desc if desc else '')
+
+                self.data += f"\n\tKAPI_STATE_TRANS({idx}, {self._format_macro_param(trans['target'])}, "
+                self.data += f"{self._format_macro_param(trans['from'])}, {self._format_macro_param(trans['to'])},\n"
+                self.data += f"\t\t\t {self._format_macro_param(desc)})\n"
+                self.data += "\tKAPI_STATE_TRANS_END\n"
+
+    def _process_capabilities(self, sections):
+        """Process capability specifications"""
+        cap_section = self._get_raw_section(sections, 'capability')
+        if not cap_section:
+            return
+
+        lines = cap_section.strip().split('\n')
+        capabilities = []
+        i = 0
+
+        while i < len(lines):
+            line = lines[i].strip()
+            if not line or line.startswith(('allows:', 'without:', 'condition:', 'priority:')):
+                i += 1
+                continue
+
+            cap_info = {'line': line}
+
+            # Parse subfields
+            subfields, next_i = self._parse_subfields(lines, i)
+            cap_info.update(subfields)
+            capabilities.append(cap_info)
+            i = next_i
+
+        if capabilities:
+            self.data += f"\n\tKAPI_CAPABILITY_COUNT({len(capabilities)})\n"
+
+            for idx, cap in enumerate(capabilities):
+                parts = cap['line'].split(',', 2)
+                if len(parts) >= 2:
+                    cap_name = parts[0].strip()
+                    cap_type = parts[1].strip()
+                    cap_desc = parts[2].strip() if len(parts) > 2 else cap_name
+
+                    # Fix common type issues
+                    if 'BYPASS' in cap_type and cap_type != 'KAPI_CAP_BYPASS_CHECK':
+                        cap_type = 'KAPI_CAP_BYPASS_CHECK'
+
+                    self.data += f"\n\tKAPI_CAPABILITY({idx}, {cap_name}, {self._format_macro_param(cap_desc)}, {cap_type})\n"
+
+                    for key, macro in [
+                        ('allows', 'KAPI_CAP_ALLOWS'),
+                        ('without', 'KAPI_CAP_WITHOUT'),
+                        ('condition', 'KAPI_CAP_CONDITION'),
+                        ('priority', 'KAPI_CAP_PRIORITY'),
+                    ]:
+                        if cap.get(key):
+                            value = self._format_macro_param(cap[key]) if key != 'priority' else cap[key]
+                            self.data += f"\t\t{macro}({value})\n"
+
+                    self.data += "\tKAPI_CAPABILITY_END\n"
+
+    # Skip output methods for non-function types
+    def out_enum(self, fname, name, args): pass
+    def out_typedef(self, fname, name, args): pass
+    def out_struct(self, fname, name, args): pass
+    def out_doc(self, fname, name, args): pass
diff --git a/tools/lib/python/kdoc/kdoc_output.py b/tools/lib/python/kdoc/kdoc_output.py
index b1aaa7fc36041..cc5752cd76a8d 100644
--- a/tools/lib/python/kdoc/kdoc_output.py
+++ b/tools/lib/python/kdoc/kdoc_output.py
@@ -124,8 +124,13 @@ class OutputFormat:
         Output warnings for identifiers that will be displayed.
         """
 
-        for log_msg in args.warnings:
-            self.config.warning(log_msg)
+        warnings = getattr(args, 'warnings', [])
+
+        for log_msg in warnings:
+            # Skip numeric warnings (line numbers) which are false positives
+            # from parameter-specific sections like "param-constraint: name, value"
+            if not isinstance(log_msg, int):
+                self.config.warning(log_msg)
 
     def check_doc(self, name, args):
         """Check if DOC should be output"""
diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index 500aafc500322..ecd218e762a34 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -31,6 +31,23 @@ from kdoc.kdoc_item import KdocItem
 # Allow whitespace at end of comment start.
 doc_start = KernRe(r'^/\*\*\s*$', cache=False)
 
+# Sections that are allowed to be duplicated for API specifications
+# These represent lists of items (multiple errors, signals, etc.)
+ALLOWED_DUPLICATE_SECTIONS = {
+    'param', '@param',
+    'error', '@error',
+    'signal', '@signal',
+    'lock', '@lock',
+    'side-effect', '@side-effect',
+    'state-trans', '@state-trans',
+    'capability', '@capability',
+    'constraint', '@constraint',
+    'validation-group', '@validation-group',
+    'validation-rule', '@validation-rule',
+    'validation-flag', '@validation-flag',
+    'struct-field', '@struct-field',
+}
+
 doc_end = KernRe(r'\*/', cache=False)
 doc_com = KernRe(r'\s*\*\s*', cache=False)
 doc_com_body = KernRe(r'\s*\* ?', cache=False)
@@ -43,10 +60,71 @@ doc_decl = doc_com + KernRe(r'(\w+)', cache=False)
 #         @{section-name}:
 # while trying to not match literal block starts like "example::"
 #
+# Base kernel-doc section names
 known_section_names = 'description|context|returns?|notes?|examples?'
-known_sections = KernRe(known_section_names, flags = re.I)
+
+# API specification section names (for KAPI spec framework)
+# Format: (base_name, has_count_variant, has_other_variants)
+# Sections with has_count_variant=True need negative lookahead in doc_sect
+# to avoid matching 'error' when 'error-count' is intended
+_kapi_base_sections = [
+    # (name, needs_lookahead, additional_variants)
+    ('api-type', False, []),
+    ('api-version', False, []),
+    ('param', True, []),  # has param-count
+    ('struct', True, ['struct-type', 'struct-field', 'struct-field-[a-z\\-]+']),
+    ('validation-group', False, []),
+    ('validation-policy', False, []),
+    ('validation-flag', False, []),
+    ('validation-rule', False, []),
+    ('error', True, ['error-code', 'error-condition']),
+    ('capability', True, []),
+    ('signal', True, []),
+    ('lock', True, []),
+    ('since', False, ['since-version']),
+    ('context-flags', False, []),
+    ('return', True, ['return-type', 'return-check', 'return-check-type',
+                      'return-success', 'return-desc']),
+    ('long-desc', False, []),
+    ('constraint', True, []),
+    ('side-effect', True, []),
+    ('state-trans', True, []),
+]
+
+def _build_kapi_patterns():
+    """Build KAPI section patterns from the base definitions."""
+    validation_parts = []  # For known_sections (simple validation)
+    parsing_parts = []     # For doc_sect (with negative lookaheads)
+
+    for name, has_count, variants in _kapi_base_sections:
+        # Add base name (with optional @ prefix)
+        validation_parts.append(f'@?{name}')
+        if has_count:
+            # Need negative lookahead to not match 'name-count' or 'name-*'
+            parsing_parts.append(f'@?{name}(?!-)')
+            validation_parts.append(f'@?{name}-count')
+            parsing_parts.append(f'@?{name}-count')
+        else:
+            parsing_parts.append(f'@?{name}')
+
+        # Add variants
+        for variant in variants:
+            validation_parts.append(f'@?{variant}')
+            parsing_parts.append(f'@?{variant}')
+
+    # Add catch-all for kapi-* extensions
+    validation_parts.append(r'@?kapi-.*')
+    parsing_parts.append(r'@?kapi-.*')
+
+    return '|'.join(validation_parts), '|'.join(parsing_parts)
+
+_kapi_validation_pattern, _kapi_parsing_pattern = _build_kapi_patterns()
+
+known_sections = KernRe(known_section_names + '|' + _kapi_validation_pattern,
+                        flags=re.I)
 doc_sect = doc_com + \
-    KernRe(r'\s*(@[.\w]+|@\.\.\.|' + known_section_names + r')\s*:([^:].*)?$',
+    KernRe(r'\s*(@[.\w\-]+|@\.\.\.|' + known_section_names + '|' +
+           _kapi_parsing_pattern + r')\s*:([^:].*)?$',
            flags=re.I, cache=False)
 
 doc_content = doc_com_body + KernRe(r'(.*)', cache=False)
@@ -342,7 +420,9 @@ class KernelEntry:
         else:
             if name in self.sections and self.sections[name] != "":
                 # Only warn on user-specified duplicate section names
-                if name != SECTION_DEFAULT:
+                # Skip warning for sections that are expected to have duplicates
+                # (like error, param, signal, etc. for API specifications)
+                if name != SECTION_DEFAULT and name not in ALLOWED_DUPLICATE_SECTIONS:
                     self.emit_msg(self.new_start_line,
                                   f"duplicate section name '{name}'")
                 # Treat as a new paragraph - add a blank line
-- 
2.51.0


^ permalink raw reply related

* [RFC PATCH v5 01/15] kernel/api: introduce kernel API specification framework
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
  To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>

Add a framework for formally documenting kernel APIs with inline
specifications. This framework provides:

- Structured API documentation with parameter specifications, return
  values, error conditions, and execution context requirements
- Runtime validation capabilities for debugging (CONFIG_KAPI_RUNTIME_CHECKS)
- Export of specifications via debugfs for tooling integration
- Support for both internal kernel APIs and system calls

The framework stores specifications in a dedicated ELF section and
provides infrastructure for:
- Compile-time validation of specifications
- Runtime querying of API documentation
- Machine-readable export formats
- Integration with existing SYSCALL_DEFINE macros

This commit introduces the core infrastructure without modifying any
existing APIs. Subsequent patches will add specifications to individual
subsystems.

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 .gitignore                                  |    1 +
 Documentation/dev-tools/kernel-api-spec.rst |  507 ++++++
 MAINTAINERS                                 |    9 +
 include/asm-generic/vmlinux.lds.h           |   28 +
 include/linux/kernel_api_spec.h             | 1597 +++++++++++++++++++
 include/linux/syscall_api_spec.h            |  198 +++
 include/linux/syscalls.h                    |   38 +
 init/Kconfig                                |    2 +
 kernel/Makefile                             |    3 +
 kernel/api/Kconfig                          |   35 +
 kernel/api/Makefile                         |   29 +
 kernel/api/kernel_api_spec.c                | 1185 ++++++++++++++
 scripts/generate_api_specs.sh               |   18 +
 13 files changed, 3650 insertions(+)
 create mode 100644 Documentation/dev-tools/kernel-api-spec.rst
 create mode 100644 include/linux/kernel_api_spec.h
 create mode 100644 include/linux/syscall_api_spec.h
 create mode 100644 kernel/api/Kconfig
 create mode 100644 kernel/api/Makefile
 create mode 100644 kernel/api/kernel_api_spec.c
 create mode 100755 scripts/generate_api_specs.sh

diff --git a/.gitignore b/.gitignore
index 3a7241c941f5e..7130001e444f1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -12,6 +12,7 @@
 #
 .*
 *.a
+*.apispec.h
 *.asn1.[ch]
 *.bin
 *.bz2
diff --git a/Documentation/dev-tools/kernel-api-spec.rst b/Documentation/dev-tools/kernel-api-spec.rst
new file mode 100644
index 0000000000000..3a63f6711e27b
--- /dev/null
+++ b/Documentation/dev-tools/kernel-api-spec.rst
@@ -0,0 +1,507 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================
+Kernel API Specification Framework
+======================================
+
+:Author: Sasha Levin <sashal@kernel.org>
+:Date: June 2025
+
+.. contents:: Table of Contents
+   :depth: 3
+   :local:
+
+Introduction
+============
+
+The Kernel API Specification Framework (KAPI) provides a comprehensive system for
+formally documenting, validating, and introspecting kernel APIs. This framework
+addresses the long-standing challenge of maintaining accurate, machine-readable
+documentation for the thousands of internal kernel APIs and system calls.
+
+Purpose and Goals
+-----------------
+
+The framework aims to:
+
+1. **Improve API Documentation**: Provide structured, inline documentation that
+   lives alongside the code and is maintained as part of the development process.
+
+2. **Enable Runtime Validation**: Optionally validate API usage at runtime to catch
+   common programming errors during development and testing.
+
+3. **Support Tooling**: Export API specifications in machine-readable formats for
+   use by static analyzers, documentation generators, and development tools.
+
+4. **Enhance Debugging**: Provide detailed API information at runtime through debugfs
+   for debugging and introspection.
+
+5. **Formalize Contracts**: Explicitly document API contracts including parameter
+   constraints, execution contexts, locking requirements, and side effects.
+
+Architecture Overview
+=====================
+
+Components
+----------
+
+The framework consists of several key components:
+
+1. **Core Framework** (``kernel/api/kernel_api_spec.c``)
+
+   - API specification registration and storage
+   - Runtime validation engine
+   - Specification lookup and querying
+
+2. **DebugFS Interface** (``kernel/api/kapi_debugfs.c``)
+
+   - Runtime introspection via ``/sys/kernel/debug/kapi/``
+   - JSON and XML export formats
+   - Per-API detailed information
+
+3. **IOCTL Support** (``kernel/api/ioctl_validation.c``)
+
+   - Extended framework for IOCTL specifications
+   - Automatic validation wrappers
+   - Structure field validation
+
+4. **Specification Macros** (``include/linux/kernel_api_spec.h``)
+
+   - Declarative macros for API documentation
+   - Type-safe parameter specifications
+   - Context and constraint definitions
+
+Data Model
+----------
+
+The framework uses a hierarchical data model::
+
+    kernel_api_spec
+    ├── Basic Information
+    │   ├── name (API function name)
+    │   ├── version (specification version)
+    │   ├── description (human-readable description)
+    │   └── kernel_version (when API was introduced)
+    │
+    ├── Parameters (up to 16)
+    │   └── kapi_param_spec
+    │       ├── name
+    │       ├── type (int, pointer, string, etc.)
+    │       ├── direction (in, out, inout)
+    │       ├── constraints (range, mask, enum values)
+    │       └── validation rules
+    │
+    ├── Return Value
+    │   └── kapi_return_spec
+    │       ├── type
+    │       ├── success conditions
+    │       └── validation rules
+    │
+    ├── Error Conditions (up to 32)
+    │   └── kapi_error_spec
+    │       ├── error code
+    │       ├── condition description
+    │       └── recovery advice
+    │
+    ├── Execution Context
+    │   ├── allowed contexts (process, interrupt, etc.)
+    │   ├── locking requirements
+    │   └── preemption/interrupt state
+    │
+    └── Side Effects
+        ├── memory allocation
+        ├── state changes
+        └── signal handling
+
+Usage Guide
+===========
+
+Basic API Specification
+-----------------------
+
+To document a kernel API, use the specification macros in the implementation file:
+
+.. code-block:: c
+
+    #include <linux/kernel_api_spec.h>
+
+    KAPI_DEFINE_SPEC(kmalloc_spec, kmalloc, "3.0")
+    KAPI_DESCRIPTION("Allocate kernel memory")
+    KAPI_PARAM(0, size, KAPI_TYPE_SIZE_T, KAPI_DIR_IN,
+               "Number of bytes to allocate")
+    KAPI_PARAM_RANGE(0, 0, KMALLOC_MAX_SIZE)
+    KAPI_PARAM(1, flags, KAPI_TYPE_FLAGS, KAPI_DIR_IN,
+               "Allocation flags (GFP_*)")
+    KAPI_PARAM_MASK(1, __GFP_BITS_MASK)
+    KAPI_RETURN(KAPI_TYPE_POINTER, "Pointer to allocated memory or NULL")
+    KAPI_ERROR(ENOMEM, "Out of memory")
+    KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SOFTIRQ | KAPI_CTX_HARDIRQ)
+    KAPI_SIDE_EFFECT("Allocates memory from kernel heap")
+    KAPI_LOCK_NOT_REQUIRED("Any lock")
+    KAPI_END_SPEC
+
+    void *kmalloc(size_t size, gfp_t flags)
+    {
+        /* Implementation */
+    }
+
+System Call Specification
+-------------------------
+
+System calls use specialized macros:
+
+.. code-block:: c
+
+    KAPI_DEFINE_SYSCALL_SPEC(open_spec, open, "1.0")
+    KAPI_DESCRIPTION("Open a file")
+    KAPI_PARAM(0, pathname, KAPI_TYPE_USER_STRING, KAPI_DIR_IN,
+               "Path to file")
+    KAPI_PARAM_PATH(0, PATH_MAX)
+    KAPI_PARAM(1, flags, KAPI_TYPE_FLAGS, KAPI_DIR_IN,
+               "Open flags (O_*)")
+    KAPI_PARAM(2, mode, KAPI_TYPE_MODE_T, KAPI_DIR_IN,
+               "File permissions (if creating)")
+    KAPI_RETURN(KAPI_TYPE_INT, "File descriptor or -1")
+    KAPI_ERROR(EACCES, "Permission denied")
+    KAPI_ERROR(ENOENT, "File does not exist")
+    KAPI_ERROR(EMFILE, "Too many open files")
+    KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+    KAPI_SIGNAL(EINTR, "Open can be interrupted by signal")
+    KAPI_END_SYSCALL_SPEC
+
+IOCTL Specification
+-------------------
+
+IOCTLs have extended support for structure validation:
+
+.. code-block:: c
+
+    KAPI_DEFINE_IOCTL_SPEC(vidioc_querycap_spec, VIDIOC_QUERYCAP,
+                           "VIDIOC_QUERYCAP",
+                           sizeof(struct v4l2_capability),
+                           sizeof(struct v4l2_capability),
+                           "video_fops")
+    KAPI_DESCRIPTION("Query device capabilities")
+    KAPI_IOCTL_FIELD(driver, KAPI_TYPE_CHAR_ARRAY, KAPI_DIR_OUT,
+                     "Driver name", 16)
+    KAPI_IOCTL_FIELD(card, KAPI_TYPE_CHAR_ARRAY, KAPI_DIR_OUT,
+                     "Device name", 32)
+    KAPI_IOCTL_FIELD(version, KAPI_TYPE_U32, KAPI_DIR_OUT,
+                     "Driver version")
+    KAPI_IOCTL_FIELD(capabilities, KAPI_TYPE_FLAGS, KAPI_DIR_OUT,
+                     "Device capabilities")
+    KAPI_END_IOCTL_SPEC
+
+Runtime Validation
+==================
+
+Enabling Validation
+-------------------
+
+Runtime validation is controlled by kernel configuration:
+
+1. Enable ``CONFIG_KAPI_SPEC`` to build the framework
+2. Enable ``CONFIG_KAPI_RUNTIME_CHECKS`` for runtime validation
+3. Optionally enable ``CONFIG_KAPI_SPEC_DEBUGFS`` for debugfs interface
+
+Validation Modes
+----------------
+
+The framework supports several validation modes:
+
+.. code-block:: c
+
+    /* Enable validation for specific API */
+    kapi_enable_validation("kmalloc");
+
+    /* Enable validation for all APIs */
+    kapi_enable_all_validation();
+
+    /* Set validation level */
+    kapi_set_validation_level(KAPI_VALIDATE_FULL);
+
+Validation Levels:
+
+- ``KAPI_VALIDATE_NONE``: No validation
+- ``KAPI_VALIDATE_BASIC``: Type and NULL checks only
+- ``KAPI_VALIDATE_NORMAL``: Basic + range and constraint checks
+- ``KAPI_VALIDATE_FULL``: All checks including custom validators
+
+Custom Validators
+-----------------
+
+APIs can register custom validation functions:
+
+.. code-block:: c
+
+    static bool validate_buffer_size(const struct kapi_param_spec *spec,
+                                     const void *value, void *context)
+    {
+        size_t size = *(size_t *)value;
+        struct my_context *ctx = context;
+
+        return size > 0 && size <= ctx->max_buffer_size;
+    }
+
+    KAPI_PARAM_CUSTOM_VALIDATOR(0, validate_buffer_size)
+
+DebugFS Interface
+=================
+
+The debugfs interface provides runtime access to API specifications:
+
+Directory Structure
+-------------------
+
+::
+
+    /sys/kernel/debug/kapi/
+    ├── apis/                    # All registered APIs
+    │   ├── kmalloc/
+    │   │   ├── specification   # Human-readable spec
+    │   │   ├── json           # JSON format
+    │   │   └── xml            # XML format
+    │   └── open/
+    │       └── ...
+    ├── summary                  # Overview of all APIs
+    ├── validation/              # Validation controls
+    │   ├── enabled             # Global enable/disable
+    │   ├── level               # Validation level
+    │   └── stats               # Validation statistics
+    └── export/                  # Bulk export options
+        ├── all.json            # All specs in JSON
+        └── all.xml             # All specs in XML
+
+Usage Examples
+--------------
+
+Query specific API::
+
+    $ cat /sys/kernel/debug/kapi/apis/kmalloc/specification
+    API: kmalloc
+    Version: 3.0
+    Description: Allocate kernel memory
+
+    Parameters:
+      [0] size (size_t, in): Number of bytes to allocate
+          Range: 0 - 4194304
+      [1] flags (flags, in): Allocation flags (GFP_*)
+          Mask: 0x1ffffff
+
+    Returns: pointer - Pointer to allocated memory or NULL
+
+    Errors:
+      ENOMEM: Out of memory
+
+    Context: process, softirq, hardirq
+
+    Side Effects:
+      - Allocates memory from kernel heap
+
+Export all specifications::
+
+    $ cat /sys/kernel/debug/kapi/export/all.json > kernel-apis.json
+
+Enable validation for specific API::
+
+    $ echo 1 > /sys/kernel/debug/kapi/apis/kmalloc/validate
+
+Performance Considerations
+==========================
+
+Memory Overhead
+---------------
+
+Each API specification consumes approximately 2-4KB of memory. With thousands
+of kernel APIs, this can add up to several megabytes. Consider:
+
+1. Building with ``CONFIG_KAPI_SPEC=n`` for production kernels
+2. Using ``__init`` annotations for APIs only used during boot
+3. Implementing lazy loading for rarely used specifications
+
+Runtime Overhead
+----------------
+
+When ``CONFIG_KAPI_RUNTIME_CHECKS`` is enabled:
+
+- Each validated API call adds 50-200ns overhead
+- Complex validations (custom validators) may add more
+- Use validation only in development/testing kernels
+
+Optimization Strategies
+-----------------------
+
+1. **Compile-time optimization**: When validation is disabled, all
+   validation code is optimized away by the compiler.
+
+2. **Selective validation**: Enable validation only for specific APIs
+   or subsystems under test.
+
+3. **Caching**: The framework caches validation results for repeated
+   calls with identical parameters.
+
+Documentation Generation
+------------------------
+
+The framework exports specifications via debugfs that can be used
+to generate documentation. Tools for automatic documentation generation
+from specifications are planned for future development.
+
+IDE Integration
+---------------
+
+Modern IDEs can use the JSON export for:
+
+- Parameter hints
+- Type checking
+- Context validation
+- Error code documentation
+
+Testing Framework
+-----------------
+
+The framework includes test helpers::
+
+    #ifdef CONFIG_KAPI_TESTING
+    /* Verify API behaves according to specification */
+    kapi_test_api("kmalloc", test_cases);
+    #endif
+
+Best Practices
+==============
+
+Writing Specifications
+----------------------
+
+1. **Be Comprehensive**: Document all parameters, errors, and side effects
+2. **Keep Updated**: Update specs when API behavior changes
+3. **Use Examples**: Include usage examples in descriptions
+4. **Validate Constraints**: Define realistic constraints for parameters
+5. **Document Context**: Clearly specify allowed execution contexts
+
+Maintenance
+-----------
+
+1. **Version Specifications**: Increment version when API changes
+2. **Deprecation**: Mark deprecated APIs and suggest replacements
+3. **Cross-reference**: Link related APIs in descriptions
+4. **Test Specifications**: Verify specs match implementation
+
+Common Patterns
+---------------
+
+**Optional Parameters**::
+
+    KAPI_PARAM(2, optional_arg, KAPI_TYPE_POINTER, KAPI_DIR_IN,
+               "Optional argument (may be NULL)")
+    KAPI_PARAM_OPTIONAL(2)
+
+**Variable Arguments**::
+
+    KAPI_PARAM(1, fmt, KAPI_TYPE_FORMAT_STRING, KAPI_DIR_IN,
+               "Printf-style format string")
+    KAPI_PARAM_VARIADIC(2, "Format arguments")
+
+**Callback Functions**::
+
+    KAPI_PARAM(1, callback, KAPI_TYPE_FUNCTION_PTR, KAPI_DIR_IN,
+               "Callback function")
+    KAPI_PARAM_CALLBACK(1, "int (*)(void *data)", "data")
+
+Troubleshooting
+===============
+
+Common Issues
+-------------
+
+**Specification Not Found**::
+
+    kernel: KAPI: Specification for 'my_api' not found
+
+    Solution: Ensure KAPI_DEFINE_SPEC is in the same translation unit
+    as the function implementation.
+
+**Validation Failures**::
+
+    kernel: KAPI: Validation failed for kmalloc parameter 'size':
+            value 5242880 exceeds maximum 4194304
+
+    Solution: Check parameter constraints or adjust specification if
+    the constraint is incorrect.
+
+**Build Errors**::
+
+    error: 'KAPI_TYPE_UNKNOWN' undeclared
+
+    Solution: Include <linux/kernel_api_spec.h> and ensure
+    CONFIG_KAPI_SPEC is enabled.
+
+Debug Options
+-------------
+
+Enable verbose debugging::
+
+    echo 8 > /proc/sys/kernel/printk
+    echo 1 > /sys/kernel/debug/kapi/debug/verbose
+
+Future Directions
+=================
+
+Planned Features
+----------------
+
+1. **Automatic Extraction**: Tool to extract specifications from existing
+   kernel-doc comments
+
+2. **Contract Verification**: Static analysis to verify implementation
+   matches specification
+
+3. **Performance Profiling**: Measure actual API performance against
+   documented expectations
+
+4. **Fuzzing Integration**: Use specifications to guide intelligent
+   fuzzing of kernel APIs
+
+5. **Version Compatibility**: Track API changes across kernel versions
+
+Research Areas
+--------------
+
+1. **Formal Verification**: Use specifications for mathematical proofs
+   of correctness
+
+2. **Runtime Monitoring**: Detect specification violations in production
+   with minimal overhead
+
+3. **API Evolution**: Analyze how kernel APIs change over time
+
+4. **Security Applications**: Use specifications for security policy
+   enforcement
+
+Contributing
+============
+
+Submitting Specifications
+-------------------------
+
+1. Add specifications to the same file as the API implementation
+2. Follow existing patterns and naming conventions
+3. Test with CONFIG_KAPI_RUNTIME_CHECKS enabled
+4. Verify debugfs output is correct
+5. Run scripts/checkpatch.pl on your changes
+
+Review Criteria
+---------------
+
+Specifications will be reviewed for:
+
+1. **Completeness**: All parameters and errors documented
+2. **Accuracy**: Specification matches implementation
+3. **Clarity**: Descriptions are clear and helpful
+4. **Consistency**: Follows framework conventions
+5. **Performance**: No unnecessary runtime overhead
+
+Contact
+-------
+
+- Maintainer: Sasha Levin <sashal@kernel.org>
diff --git a/MAINTAINERS b/MAINTAINERS
index 5b11839cba9de..14cd8b3c95e40 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13647,6 +13647,15 @@ W:	https://linuxtv.org
 T:	git git://linuxtv.org/media.git
 F:	drivers/media/radio/radio-keene*
 
+KERNEL API SPECIFICATION FRAMEWORK (KAPI)
+M:	Sasha Levin <sashal@kernel.org>
+L:	linux-api@vger.kernel.org
+S:	Maintained
+F:	Documentation/admin-guide/kernel-api-spec.rst
+F:	include/linux/kernel_api_spec.h
+F:	kernel/api/
+F:	scripts/extract-kapi-spec.sh
+
 KERNEL AUTOMOUNTER
 M:	Ian Kent <raven@themaw.net>
 L:	autofs@vger.kernel.org
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 8ca130af301fc..658a14f8bf309 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -296,6 +296,33 @@
 #define TRACE_SYSCALLS()
 #endif
 
+#ifdef CONFIG_KAPI_SPEC
+/*
+ * KAPI_SPECS - Include kernel API specifications in current section
+ *
+ * The .kapi_specs input section has 32-byte alignment requirement from
+ * the compiler, so we must align to 32 bytes before setting the start
+ * symbol to avoid padding between the symbol and actual data.
+ */
+#define KAPI_SPECS()				\
+	. = ALIGN(32);				\
+	__start_kapi_specs = .;			\
+	KEEP(*(.kapi_specs))			\
+	__stop_kapi_specs = .;
+
+/* For placing KAPI specs in a dedicated section */
+#define KAPI_SPECS_SECTION()			\
+	.kapi_specs : AT(ADDR(.kapi_specs) - LOAD_OFFSET) {	\
+		. = ALIGN(32);			\
+		__start_kapi_specs = .;		\
+		KEEP(*(.kapi_specs))		\
+		__stop_kapi_specs = .;		\
+	}
+#else
+#define KAPI_SPECS()
+#define KAPI_SPECS_SECTION()
+#endif
+
 #ifdef CONFIG_BPF_EVENTS
 #define BPF_RAW_TP() STRUCT_ALIGN();				\
 	BOUNDED_SECTION_BY(__bpf_raw_tp_map, __bpf_raw_tp)
@@ -485,6 +512,7 @@
 		. = ALIGN(8);						\
 		BOUNDED_SECTION_BY(__tracepoints_ptrs, ___tracepoints_ptrs) \
 		*(__tracepoints_strings)/* Tracepoints: strings */	\
+		KAPI_SPECS()						\
 	}								\
 									\
 	.rodata1          : AT(ADDR(.rodata1) - LOAD_OFFSET) {		\
diff --git a/include/linux/kernel_api_spec.h b/include/linux/kernel_api_spec.h
new file mode 100644
index 0000000000000..b3460d156602e
--- /dev/null
+++ b/include/linux/kernel_api_spec.h
@@ -0,0 +1,1597 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * kernel_api_spec.h - Kernel API Formal Specification Framework
+ *
+ * This framework provides structures and macros to formally specify kernel APIs
+ * in both human and machine-readable formats. It supports comprehensive documentation
+ * of function signatures, parameters, return values, error conditions, and constraints.
+ */
+
+#ifndef _LINUX_KERNEL_API_SPEC_H
+#define _LINUX_KERNEL_API_SPEC_H
+
+#include <linux/types.h>
+#include <linux/stringify.h>
+#include <linux/compiler.h>
+#include <linux/errno.h>
+
+struct sigaction;
+
+#define KAPI_MAX_PARAMS		16
+#define KAPI_MAX_ERRORS		32
+#define KAPI_MAX_CONSTRAINTS	32
+#define KAPI_MAX_SIGNALS	32
+#define KAPI_MAX_NAME_LEN	128
+#define KAPI_MAX_DESC_LEN	512
+#define KAPI_MAX_CAPABILITIES	8
+#define KAPI_MAX_SOCKET_STATES	16
+#define KAPI_MAX_PROTOCOL_BEHAVIORS	8
+#define KAPI_MAX_NET_ERRORS	16
+#define KAPI_MAX_SOCKOPTS	16
+#define KAPI_MAX_ADDR_FAMILIES	8
+
+/* Magic numbers for section validation (ASCII mnemonics) */
+#define KAPI_MAGIC_PARAMS	0x4B415031	/* 'KAP1' */
+#define KAPI_MAGIC_RETURN	0x4B415232	/* 'KAR2' */
+#define KAPI_MAGIC_ERRORS	0x4B414533	/* 'KAE3' */
+#define KAPI_MAGIC_LOCKS	0x4B414C34	/* 'KAL4' */
+#define KAPI_MAGIC_CONSTRAINTS	0x4B414335	/* 'KAC5' */
+#define KAPI_MAGIC_INFO		0x4B414936	/* 'KAI6' */
+#define KAPI_MAGIC_SIGNALS	0x4B415337	/* 'KAS7' */
+#define KAPI_MAGIC_SIGMASK	0x4B414D38	/* 'KAM8' */
+#define KAPI_MAGIC_STRUCTS	0x4B415439	/* 'KAT9' */
+#define KAPI_MAGIC_EFFECTS	0x4B414641	/* 'KAFA' */
+#define KAPI_MAGIC_TRANS	0x4B415442	/* 'KATB' */
+#define KAPI_MAGIC_CAPS		0x4B414343	/* 'KACC' */
+
+/**
+ * enum kapi_param_type - Parameter type classification
+ * @KAPI_TYPE_VOID: void type
+ * @KAPI_TYPE_INT: Integer types (int, long, etc.)
+ * @KAPI_TYPE_UINT: Unsigned integer types
+ * @KAPI_TYPE_PTR: Pointer types
+ * @KAPI_TYPE_STRUCT: Structure types
+ * @KAPI_TYPE_UNION: Union types
+ * @KAPI_TYPE_ENUM: Enumeration types
+ * @KAPI_TYPE_FUNC_PTR: Function pointer types
+ * @KAPI_TYPE_ARRAY: Array types
+ * @KAPI_TYPE_FD: File descriptor - validated in process context
+ * @KAPI_TYPE_USER_PTR: User space pointer - validated for access and size
+ * @KAPI_TYPE_PATH: Pathname - validated for access and path limits
+ * @KAPI_TYPE_CUSTOM: Custom/complex types
+ */
+enum kapi_param_type {
+	KAPI_TYPE_VOID = 0,
+	KAPI_TYPE_INT,
+	KAPI_TYPE_UINT,
+	KAPI_TYPE_PTR,
+	KAPI_TYPE_STRUCT,
+	KAPI_TYPE_UNION,
+	KAPI_TYPE_ENUM,
+	KAPI_TYPE_FUNC_PTR,
+	KAPI_TYPE_ARRAY,
+	KAPI_TYPE_FD,		/* File descriptor - validated in process context */
+	KAPI_TYPE_USER_PTR,	/* User space pointer - validated for access and size */
+	KAPI_TYPE_PATH,		/* Pathname - validated for access and path limits */
+	KAPI_TYPE_CUSTOM,
+};
+
+/**
+ * enum kapi_param_flags - Parameter attribute flags
+ * @KAPI_PARAM_IN: Input parameter
+ * @KAPI_PARAM_OUT: Output parameter
+ * @KAPI_PARAM_INOUT: Input/output parameter
+ * @KAPI_PARAM_OPTIONAL: Optional parameter (can be NULL)
+ * @KAPI_PARAM_CONST: Const qualified parameter
+ * @KAPI_PARAM_VOLATILE: Volatile qualified parameter
+ * @KAPI_PARAM_USER: User space pointer
+ * @KAPI_PARAM_DMA: DMA-capable memory required
+ * @KAPI_PARAM_ALIGNED: Alignment requirements
+ */
+enum kapi_param_flags {
+	KAPI_PARAM_IN		= (1 << 0),
+	KAPI_PARAM_OUT		= (1 << 1),
+	KAPI_PARAM_INOUT	= (1 << 2),
+	KAPI_PARAM_OPTIONAL	= (1 << 3),
+	KAPI_PARAM_CONST	= (1 << 4),
+	KAPI_PARAM_VOLATILE	= (1 << 5),
+	KAPI_PARAM_USER		= (1 << 6),
+	KAPI_PARAM_DMA		= (1 << 7),
+	KAPI_PARAM_ALIGNED	= (1 << 8),
+};
+
+/**
+ * enum kapi_context_flags - Function execution context flags
+ * @KAPI_CTX_PROCESS: Can be called from process context
+ * @KAPI_CTX_SOFTIRQ: Can be called from softirq context
+ * @KAPI_CTX_HARDIRQ: Can be called from hardirq context
+ * @KAPI_CTX_NMI: Can be called from NMI context
+ * @KAPI_CTX_ATOMIC: Must be called in atomic context
+ * @KAPI_CTX_SLEEPABLE: May sleep
+ * @KAPI_CTX_PREEMPT_DISABLED: Requires preemption disabled
+ * @KAPI_CTX_IRQ_DISABLED: Requires interrupts disabled
+ */
+enum kapi_context_flags {
+	KAPI_CTX_PROCESS	= (1 << 0),
+	KAPI_CTX_SOFTIRQ	= (1 << 1),
+	KAPI_CTX_HARDIRQ	= (1 << 2),
+	KAPI_CTX_NMI		= (1 << 3),
+	KAPI_CTX_ATOMIC		= (1 << 4),
+	KAPI_CTX_SLEEPABLE	= (1 << 5),
+	KAPI_CTX_PREEMPT_DISABLED = (1 << 6),
+	KAPI_CTX_IRQ_DISABLED	= (1 << 7),
+};
+
+/**
+ * enum kapi_lock_type - Lock types used/required by the function
+ * @KAPI_LOCK_NONE: No locking requirements
+ * @KAPI_LOCK_MUTEX: Mutex lock
+ * @KAPI_LOCK_SPINLOCK: Spinlock
+ * @KAPI_LOCK_RWLOCK: Read-write lock
+ * @KAPI_LOCK_SEQLOCK: Sequence lock
+ * @KAPI_LOCK_RCU: RCU lock
+ * @KAPI_LOCK_SEMAPHORE: Semaphore
+ * @KAPI_LOCK_CUSTOM: Custom locking mechanism
+ */
+enum kapi_lock_type {
+	KAPI_LOCK_NONE = 0,
+	KAPI_LOCK_MUTEX,
+	KAPI_LOCK_SPINLOCK,
+	KAPI_LOCK_RWLOCK,
+	KAPI_LOCK_SEQLOCK,
+	KAPI_LOCK_RCU,
+	KAPI_LOCK_SEMAPHORE,
+	KAPI_LOCK_CUSTOM,
+};
+
+/**
+ * enum kapi_constraint_type - Types of parameter constraints
+ * @KAPI_CONSTRAINT_NONE: No constraint
+ * @KAPI_CONSTRAINT_RANGE: Numeric range constraint
+ * @KAPI_CONSTRAINT_MASK: Bitmask constraint
+ * @KAPI_CONSTRAINT_ENUM: Enumerated values constraint
+ * @KAPI_CONSTRAINT_ALIGNMENT: Alignment constraint (must be aligned to specified boundary)
+ * @KAPI_CONSTRAINT_POWER_OF_TWO: Value must be a power of two
+ * @KAPI_CONSTRAINT_PAGE_ALIGNED: Value must be page-aligned
+ * @KAPI_CONSTRAINT_NONZERO: Value must be non-zero
+ * @KAPI_CONSTRAINT_USER_STRING: Userspace null-terminated string with length range
+ * @KAPI_CONSTRAINT_USER_PATH: Userspace pathname string (validated for accessibility and PATH_MAX)
+ * @KAPI_CONSTRAINT_USER_PTR: Userspace pointer (validated for accessibility and size)
+ * @KAPI_CONSTRAINT_CUSTOM: Custom validation function
+ */
+enum kapi_constraint_type {
+	KAPI_CONSTRAINT_NONE = 0,
+	KAPI_CONSTRAINT_RANGE,
+	KAPI_CONSTRAINT_MASK,
+	KAPI_CONSTRAINT_ENUM,
+	KAPI_CONSTRAINT_ALIGNMENT,
+	KAPI_CONSTRAINT_POWER_OF_TWO,
+	KAPI_CONSTRAINT_PAGE_ALIGNED,
+	KAPI_CONSTRAINT_NONZERO,
+	KAPI_CONSTRAINT_USER_STRING,
+	KAPI_CONSTRAINT_USER_PATH,
+	KAPI_CONSTRAINT_USER_PTR,
+	KAPI_CONSTRAINT_CUSTOM,
+};
+
+/**
+ * struct kapi_param_spec - Parameter specification
+ * @name: Parameter name
+ * @type_name: Type name as string
+ * @type: Parameter type classification
+ * @flags: Parameter attribute flags
+ * @size: Size in bytes (for arrays/buffers)
+ * @alignment: Required alignment
+ * @min_value: Minimum valid value (for numeric types)
+ * @max_value: Maximum valid value (for numeric types)
+ * @valid_mask: Valid bits mask (for flag parameters)
+ * @enum_values: Array of valid enumerated values
+ * @enum_count: Number of valid enumerated values
+ * @constraint_type: Type of constraint applied
+ * @validate: Custom validation function
+ * @description: Human-readable description
+ * @constraints: Additional constraints description
+ * @size_param_idx: Index of parameter that determines size (-1 if fixed size)
+ * @size_multiplier: Multiplier for size calculation (e.g., sizeof(struct))
+ */
+struct kapi_param_spec {
+	char name[KAPI_MAX_NAME_LEN];
+	char type_name[KAPI_MAX_NAME_LEN];
+	enum kapi_param_type type;
+	u32 flags;
+	size_t size;
+	size_t alignment;
+	s64 min_value;
+	s64 max_value;
+	u64 valid_mask;
+	const s64 *enum_values;
+	u32 enum_count;
+	enum kapi_constraint_type constraint_type;
+	bool (*validate)(s64 value);
+	char description[KAPI_MAX_DESC_LEN];
+	char constraints[KAPI_MAX_DESC_LEN];
+	int size_param_idx;	/* Index of param that determines size, -1 if N/A */
+	size_t size_multiplier;	/* Size per unit (e.g., sizeof(struct epoll_event)) */
+} __attribute__((packed));
+
+/**
+ * struct kapi_error_spec - Error condition specification
+ * @error_code: Error code value
+ * @name: Error code name (e.g., "EINVAL")
+ * @condition: Condition that triggers this error
+ * @description: Detailed error description
+ */
+struct kapi_error_spec {
+	int error_code;
+	char name[KAPI_MAX_NAME_LEN];
+	char condition[KAPI_MAX_DESC_LEN];
+	char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * enum kapi_return_check_type - Return value check types
+ * @KAPI_RETURN_EXACT: Success is an exact value
+ * @KAPI_RETURN_RANGE: Success is within a range
+ * @KAPI_RETURN_ERROR_CHECK: Success is when NOT in error list
+ * @KAPI_RETURN_FD: Return value is a file descriptor (>= 0 is success)
+ * @KAPI_RETURN_CUSTOM: Custom validation function
+ * @KAPI_RETURN_NO_RETURN: Function does not return (e.g., exec on success)
+ */
+enum kapi_return_check_type {
+	KAPI_RETURN_EXACT,
+	KAPI_RETURN_RANGE,
+	KAPI_RETURN_ERROR_CHECK,
+	KAPI_RETURN_FD,
+	KAPI_RETURN_CUSTOM,
+	KAPI_RETURN_NO_RETURN,
+};
+
+/**
+ * struct kapi_return_spec - Return value specification
+ * @type_name: Return type name
+ * @type: Return type classification
+ * @check_type: Type of success check to perform
+ * @success_value: Exact value indicating success (for EXACT)
+ * @success_min: Minimum success value (for RANGE)
+ * @success_max: Maximum success value (for RANGE)
+ * @error_values: Array of error values (for ERROR_CHECK)
+ * @error_count: Number of error values
+ * @is_success: Custom function to check success
+ * @description: Return value description
+ */
+struct kapi_return_spec {
+	char type_name[KAPI_MAX_NAME_LEN];
+	enum kapi_param_type type;
+	enum kapi_return_check_type check_type;
+	s64 success_value;
+	s64 success_min;
+	s64 success_max;
+	const s64 *error_values;
+	u32 error_count;
+	bool (*is_success)(s64 retval);
+	char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * enum kapi_lock_scope - Lock acquisition/release scope
+ * @KAPI_LOCK_INTERNAL: Lock is acquired and released within the function (common case)
+ * @KAPI_LOCK_ACQUIRES: Function acquires lock but does not release it
+ * @KAPI_LOCK_RELEASES: Function releases lock (must be held on entry)
+ * @KAPI_LOCK_CALLER_HELD: Lock must be held by caller throughout the call
+ */
+enum kapi_lock_scope {
+	KAPI_LOCK_INTERNAL = 0,
+	KAPI_LOCK_ACQUIRES,
+	KAPI_LOCK_RELEASES,
+	KAPI_LOCK_CALLER_HELD,
+};
+
+/**
+ * struct kapi_lock_spec - Lock requirement specification
+ * @lock_name: Name of the lock
+ * @lock_type: Type of lock
+ * @scope: Lock scope (internal, acquires, releases, or caller-held)
+ * @description: Additional lock requirements
+ */
+struct kapi_lock_spec {
+	char lock_name[KAPI_MAX_NAME_LEN];
+	enum kapi_lock_type lock_type;
+	enum kapi_lock_scope scope;
+	char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_constraint_spec - Additional constraint specification
+ * @name: Constraint name
+ * @description: Constraint description
+ * @expression: Formal expression (if applicable)
+ */
+struct kapi_constraint_spec {
+	char name[KAPI_MAX_NAME_LEN];
+	char description[KAPI_MAX_DESC_LEN];
+	char expression[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * enum kapi_signal_direction - Signal flow direction
+ * @KAPI_SIGNAL_RECEIVE: Function may receive this signal
+ * @KAPI_SIGNAL_SEND: Function may send this signal
+ * @KAPI_SIGNAL_HANDLE: Function handles this signal specially
+ * @KAPI_SIGNAL_BLOCK: Function blocks this signal
+ * @KAPI_SIGNAL_IGNORE: Function ignores this signal
+ */
+enum kapi_signal_direction {
+	KAPI_SIGNAL_RECEIVE	= (1 << 0),
+	KAPI_SIGNAL_SEND	= (1 << 1),
+	KAPI_SIGNAL_HANDLE	= (1 << 2),
+	KAPI_SIGNAL_BLOCK	= (1 << 3),
+	KAPI_SIGNAL_IGNORE	= (1 << 4),
+};
+
+/**
+ * enum kapi_signal_action - What the function does with the signal
+ * @KAPI_SIGNAL_ACTION_DEFAULT: Default signal action applies
+ * @KAPI_SIGNAL_ACTION_TERMINATE: Causes termination
+ * @KAPI_SIGNAL_ACTION_COREDUMP: Causes termination with core dump
+ * @KAPI_SIGNAL_ACTION_STOP: Stops the process
+ * @KAPI_SIGNAL_ACTION_CONTINUE: Continues a stopped process
+ * @KAPI_SIGNAL_ACTION_CUSTOM: Custom handling described in notes
+ * @KAPI_SIGNAL_ACTION_RETURN: Returns from syscall with EINTR
+ * @KAPI_SIGNAL_ACTION_RESTART: Restarts the syscall
+ * @KAPI_SIGNAL_ACTION_QUEUE: Queues the signal for later delivery
+ * @KAPI_SIGNAL_ACTION_DISCARD: Discards the signal
+ * @KAPI_SIGNAL_ACTION_TRANSFORM: Transforms to another signal
+ */
+enum kapi_signal_action {
+	KAPI_SIGNAL_ACTION_DEFAULT = 0,
+	KAPI_SIGNAL_ACTION_TERMINATE,
+	KAPI_SIGNAL_ACTION_COREDUMP,
+	KAPI_SIGNAL_ACTION_STOP,
+	KAPI_SIGNAL_ACTION_CONTINUE,
+	KAPI_SIGNAL_ACTION_CUSTOM,
+	KAPI_SIGNAL_ACTION_RETURN,
+	KAPI_SIGNAL_ACTION_RESTART,
+	KAPI_SIGNAL_ACTION_QUEUE,
+	KAPI_SIGNAL_ACTION_DISCARD,
+	KAPI_SIGNAL_ACTION_TRANSFORM,
+};
+
+/**
+ * struct kapi_signal_spec - Signal specification
+ * @signal_num: Signal number (e.g., SIGKILL, SIGTERM)
+ * @signal_name: Signal name as string
+ * @direction: Direction flags (OR of kapi_signal_direction)
+ * @action: What happens when signal is received
+ * @target: Description of target process/thread for sent signals
+ * @condition: Condition under which signal is sent/received/handled
+ * @description: Detailed description of signal handling
+ * @restartable: Whether syscall is restartable after this signal
+ * @sa_flags_required: Required signal action flags (SA_*)
+ * @sa_flags_forbidden: Forbidden signal action flags
+ * @error_on_signal: Error code returned when signal occurs (-EINTR, etc)
+ * @transform_to: Signal number to transform to (if action is TRANSFORM)
+ * @timing: When signal can occur ("entry", "during", "exit", "anytime")
+ * @priority: Signal handling priority (lower processed first)
+ * @interruptible: Whether this operation is interruptible by this signal
+ * @queue_behavior: How signal is queued ("realtime", "standard", "coalesce")
+ * @state_required: Required process state for signal to be delivered
+ * @state_forbidden: Forbidden process state for signal delivery
+ */
+struct kapi_signal_spec {
+	int signal_num;
+	char signal_name[32];
+	u32 direction;
+	enum kapi_signal_action action;
+	char target[KAPI_MAX_DESC_LEN];
+	char condition[KAPI_MAX_DESC_LEN];
+	char description[KAPI_MAX_DESC_LEN];
+	bool restartable;
+	u32 sa_flags_required;
+	u32 sa_flags_forbidden;
+	int error_on_signal;
+	int transform_to;
+	char timing[32];
+	u8 priority;
+	bool interruptible;
+	char queue_behavior[128];
+	u32 state_required;
+	u32 state_forbidden;
+} __attribute__((packed));
+
+/**
+ * struct kapi_signal_mask_spec - Signal mask specification
+ * @mask_name: Name of the signal mask
+ * @signals: Array of signal numbers in the mask
+ * @signal_count: Number of signals in the mask
+ * @description: Description of what this mask represents
+ */
+struct kapi_signal_mask_spec {
+	char mask_name[KAPI_MAX_NAME_LEN];
+	int signals[KAPI_MAX_SIGNALS];
+	u32 signal_count;
+	char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_struct_field - Structure field specification
+ * @name: Field name
+ * @type: Field type classification
+ * @type_name: Type name as string
+ * @offset: Offset within structure
+ * @size: Size of field in bytes
+ * @flags: Field attribute flags
+ * @constraint_type: Type of constraint applied
+ * @min_value: Minimum valid value (for numeric types)
+ * @max_value: Maximum valid value (for numeric types)
+ * @valid_mask: Valid bits mask (for flag fields)
+ * @enum_values: Comma-separated list of valid enum values (for enum types)
+ * @description: Field description
+ */
+struct kapi_struct_field {
+	char name[KAPI_MAX_NAME_LEN];
+	enum kapi_param_type type;
+	char type_name[KAPI_MAX_NAME_LEN];
+	size_t offset;
+	size_t size;
+	u32 flags;
+	enum kapi_constraint_type constraint_type;
+	s64 min_value;
+	s64 max_value;
+	u64 valid_mask;
+	char enum_values[KAPI_MAX_DESC_LEN];	/* Comma-separated list of valid enum values */
+	char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_struct_spec - Structure type specification
+ * @name: Structure name
+ * @size: Total size of structure
+ * @alignment: Required alignment
+ * @field_count: Number of fields
+ * @fields: Field specifications
+ * @description: Structure description
+ */
+struct kapi_struct_spec {
+	char name[KAPI_MAX_NAME_LEN];
+	size_t size;
+	size_t alignment;
+	u32 field_count;
+	struct kapi_struct_field fields[KAPI_MAX_PARAMS];
+	char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * enum kapi_capability_action - What the capability allows
+ * @KAPI_CAP_BYPASS_CHECK: Bypasses a check entirely
+ * @KAPI_CAP_INCREASE_LIMIT: Increases or removes a limit
+ * @KAPI_CAP_OVERRIDE_RESTRICTION: Overrides a restriction
+ * @KAPI_CAP_GRANT_PERMISSION: Grants permission that would otherwise be denied
+ * @KAPI_CAP_MODIFY_BEHAVIOR: Changes the behavior of the operation
+ * @KAPI_CAP_ACCESS_RESOURCE: Allows access to restricted resources
+ * @KAPI_CAP_PERFORM_OPERATION: Allows performing a privileged operation
+ */
+enum kapi_capability_action {
+	KAPI_CAP_BYPASS_CHECK = 0,
+	KAPI_CAP_INCREASE_LIMIT,
+	KAPI_CAP_OVERRIDE_RESTRICTION,
+	KAPI_CAP_GRANT_PERMISSION,
+	KAPI_CAP_MODIFY_BEHAVIOR,
+	KAPI_CAP_ACCESS_RESOURCE,
+	KAPI_CAP_PERFORM_OPERATION,
+};
+
+/**
+ * struct kapi_capability_spec - Capability requirement specification
+ * @capability: The capability constant (e.g., CAP_IPC_LOCK)
+ * @cap_name: Capability name as string
+ * @action: What the capability allows (kapi_capability_action)
+ * @allows: Description of what the capability allows
+ * @without_cap: What happens without the capability
+ * @check_condition: Condition when capability is checked
+ * @priority: Check priority (lower checked first)
+ * @alternative: Alternative capabilities that can be used
+ * @alternative_count: Number of alternative capabilities
+ */
+struct kapi_capability_spec {
+	int capability;
+	char cap_name[KAPI_MAX_NAME_LEN];
+	enum kapi_capability_action action;
+	char allows[KAPI_MAX_DESC_LEN];
+	char without_cap[KAPI_MAX_DESC_LEN];
+	char check_condition[KAPI_MAX_DESC_LEN];
+	u8 priority;
+	int alternative[KAPI_MAX_CAPABILITIES];
+	u32 alternative_count;
+} __attribute__((packed));
+
+/**
+ * enum kapi_side_effect_type - Types of side effects
+ * @KAPI_EFFECT_NONE: No side effects
+ * @KAPI_EFFECT_ALLOC_MEMORY: Allocates memory
+ * @KAPI_EFFECT_FREE_MEMORY: Frees memory
+ * @KAPI_EFFECT_MODIFY_STATE: Modifies global/shared state
+ * @KAPI_EFFECT_SIGNAL_SEND: Sends signals
+ * @KAPI_EFFECT_FILE_POSITION: Modifies file position
+ * @KAPI_EFFECT_LOCK_ACQUIRE: Acquires locks
+ * @KAPI_EFFECT_LOCK_RELEASE: Releases locks
+ * @KAPI_EFFECT_RESOURCE_CREATE: Creates system resources (FDs, PIDs, etc)
+ * @KAPI_EFFECT_RESOURCE_DESTROY: Destroys system resources
+ * @KAPI_EFFECT_SCHEDULE: May cause scheduling/context switch
+ * @KAPI_EFFECT_HARDWARE: Interacts with hardware
+ * @KAPI_EFFECT_NETWORK: Network I/O operation
+ * @KAPI_EFFECT_FILESYSTEM: Filesystem modification
+ * @KAPI_EFFECT_PROCESS_STATE: Modifies process state
+ * @KAPI_EFFECT_IRREVERSIBLE: Effect cannot be undone
+ */
+enum kapi_side_effect_type {
+	KAPI_EFFECT_NONE = 0,
+	KAPI_EFFECT_ALLOC_MEMORY = (1 << 0),
+	KAPI_EFFECT_FREE_MEMORY = (1 << 1),
+	KAPI_EFFECT_MODIFY_STATE = (1 << 2),
+	KAPI_EFFECT_SIGNAL_SEND = (1 << 3),
+	KAPI_EFFECT_FILE_POSITION = (1 << 4),
+	KAPI_EFFECT_LOCK_ACQUIRE = (1 << 5),
+	KAPI_EFFECT_LOCK_RELEASE = (1 << 6),
+	KAPI_EFFECT_RESOURCE_CREATE = (1 << 7),
+	KAPI_EFFECT_RESOURCE_DESTROY = (1 << 8),
+	KAPI_EFFECT_SCHEDULE = (1 << 9),
+	KAPI_EFFECT_HARDWARE = (1 << 10),
+	KAPI_EFFECT_NETWORK = (1 << 11),
+	KAPI_EFFECT_FILESYSTEM = (1 << 12),
+	KAPI_EFFECT_PROCESS_STATE = (1 << 13),
+	KAPI_EFFECT_IRREVERSIBLE = (1 << 14),
+};
+
+/**
+ * struct kapi_side_effect - Side effect specification
+ * @type: Bitmask of effect types
+ * @target: What is affected (e.g., "process memory", "file descriptor table")
+ * @condition: Condition under which effect occurs
+ * @description: Detailed description of the effect
+ * @reversible: Whether the effect can be undone
+ */
+struct kapi_side_effect {
+	u32 type;
+	char target[KAPI_MAX_NAME_LEN];
+	char condition[KAPI_MAX_DESC_LEN];
+	char description[KAPI_MAX_DESC_LEN];
+	bool reversible;
+} __attribute__((packed));
+
+/**
+ * struct kapi_state_transition - State transition specification
+ * @from_state: Starting state description
+ * @to_state: Ending state description
+ * @condition: Condition for transition
+ * @object: Object whose state changes
+ * @description: Detailed description
+ */
+struct kapi_state_transition {
+	char from_state[KAPI_MAX_NAME_LEN];
+	char to_state[KAPI_MAX_NAME_LEN];
+	char condition[KAPI_MAX_DESC_LEN];
+	char object[KAPI_MAX_NAME_LEN];
+	char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+#define KAPI_MAX_STRUCT_SPECS	8
+#define KAPI_MAX_SIDE_EFFECTS	32
+#define KAPI_MAX_STATE_TRANS	8
+
+/**
+ * enum kapi_socket_state - Socket states for state machine
+ */
+enum kapi_socket_state {
+	KAPI_SOCK_STATE_UNSPEC = 0,
+	KAPI_SOCK_STATE_CLOSED,
+	KAPI_SOCK_STATE_OPEN,
+	KAPI_SOCK_STATE_BOUND,
+	KAPI_SOCK_STATE_LISTEN,
+	KAPI_SOCK_STATE_SYN_SENT,
+	KAPI_SOCK_STATE_SYN_RECV,
+	KAPI_SOCK_STATE_ESTABLISHED,
+	KAPI_SOCK_STATE_FIN_WAIT1,
+	KAPI_SOCK_STATE_FIN_WAIT2,
+	KAPI_SOCK_STATE_CLOSE_WAIT,
+	KAPI_SOCK_STATE_CLOSING,
+	KAPI_SOCK_STATE_LAST_ACK,
+	KAPI_SOCK_STATE_TIME_WAIT,
+	KAPI_SOCK_STATE_CONNECTED,
+	KAPI_SOCK_STATE_DISCONNECTED,
+};
+
+/**
+ * enum kapi_socket_protocol - Socket protocol types
+ */
+enum kapi_socket_protocol {
+	KAPI_PROTO_TCP		= (1 << 0),
+	KAPI_PROTO_UDP		= (1 << 1),
+	KAPI_PROTO_UNIX		= (1 << 2),
+	KAPI_PROTO_RAW		= (1 << 3),
+	KAPI_PROTO_PACKET	= (1 << 4),
+	KAPI_PROTO_NETLINK	= (1 << 5),
+	KAPI_PROTO_SCTP		= (1 << 6),
+	KAPI_PROTO_DCCP		= (1 << 7),
+	KAPI_PROTO_ALL		= 0xFFFFFFFF,
+};
+
+/**
+ * enum kapi_buffer_behavior - Network buffer handling behaviors
+ */
+enum kapi_buffer_behavior {
+	KAPI_BUF_PEEK		= (1 << 0),
+	KAPI_BUF_TRUNCATE	= (1 << 1),
+	KAPI_BUF_SCATTER	= (1 << 2),
+	KAPI_BUF_ZERO_COPY	= (1 << 3),
+	KAPI_BUF_KERNEL_ALLOC	= (1 << 4),
+	KAPI_BUF_DMA_CAPABLE	= (1 << 5),
+	KAPI_BUF_FRAGMENT	= (1 << 6),
+};
+
+/**
+ * enum kapi_async_behavior - Asynchronous operation behaviors
+ */
+enum kapi_async_behavior {
+	KAPI_ASYNC_BLOCK	= 0,
+	KAPI_ASYNC_NONBLOCK	= (1 << 0),
+	KAPI_ASYNC_POLL_READY	= (1 << 1),
+	KAPI_ASYNC_SIGNAL_DRIVEN = (1 << 2),
+	KAPI_ASYNC_AIO		= (1 << 3),
+	KAPI_ASYNC_IO_URING	= (1 << 4),
+	KAPI_ASYNC_EPOLL	= (1 << 5),
+};
+
+/**
+ * struct kapi_socket_state_spec - Socket state requirement/transition
+ */
+struct kapi_socket_state_spec {
+	enum kapi_socket_state required_states[KAPI_MAX_SOCKET_STATES];
+	u32 required_state_count;
+	enum kapi_socket_state forbidden_states[KAPI_MAX_SOCKET_STATES];
+	u32 forbidden_state_count;
+	enum kapi_socket_state resulting_state;
+	char state_condition[KAPI_MAX_DESC_LEN];
+	u32 applicable_protocols;
+} __attribute__((packed));
+
+/**
+ * struct kapi_protocol_behavior - Protocol-specific behavior
+ */
+struct kapi_protocol_behavior {
+	u32 applicable_protocols;
+	char behavior[KAPI_MAX_DESC_LEN];
+	s64 protocol_flags;
+	char flag_description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_buffer_spec - Network buffer specification
+ */
+struct kapi_buffer_spec {
+	u32 buffer_behaviors;
+	size_t min_buffer_size;
+	size_t max_buffer_size;
+	size_t optimal_buffer_size;
+	char fragmentation_rules[KAPI_MAX_DESC_LEN];
+	bool can_partial_transfer;
+	char partial_transfer_rules[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_async_spec - Asynchronous behavior specification
+ */
+struct kapi_async_spec {
+	enum kapi_async_behavior supported_modes;
+	int nonblock_errno;
+	u32 poll_events_in;
+	u32 poll_events_out;
+	char completion_condition[KAPI_MAX_DESC_LEN];
+	bool supports_timeout;
+	char timeout_behavior[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_addr_family_spec - Address family specification
+ */
+struct kapi_addr_family_spec {
+	int family;
+	char family_name[32];
+	size_t addr_struct_size;
+	size_t min_addr_len;
+	size_t max_addr_len;
+	char addr_format[KAPI_MAX_DESC_LEN];
+	bool supports_wildcard;
+	bool supports_multicast;
+	bool supports_broadcast;
+	char special_addresses[KAPI_MAX_DESC_LEN];
+	u32 port_range_min;
+	u32 port_range_max;
+} __attribute__((packed));
+
+/**
+ * struct kernel_api_spec - Complete kernel API specification
+ * @name: Function name
+ * @version: API version
+ * @description: Brief description
+ * @long_description: Detailed description
+ * @context_flags: Execution context flags
+ * @param_count: Number of parameters
+ * @params: Parameter specifications
+ * @return_spec: Return value specification
+ * @error_count: Number of possible errors
+ * @errors: Error specifications
+ * @lock_count: Number of lock specifications
+ * @locks: Lock requirement specifications
+ * @constraint_count: Number of additional constraints
+ * @constraints: Additional constraint specifications
+ * @examples: Usage examples
+ * @notes: Additional notes
+ * @since_version: Kernel version when introduced
+ * @signal_count: Number of signal specifications
+ * @signals: Signal handling specifications
+ * @signal_mask_count: Number of signal mask specifications
+ * @signal_masks: Signal mask specifications
+ * @struct_spec_count: Number of structure specifications
+ * @struct_specs: Structure type specifications
+ * @side_effect_count: Number of side effect specifications
+ * @side_effects: Side effect specifications
+ * @state_trans_count: Number of state transition specifications
+ * @state_transitions: State transition specifications
+ */
+struct kernel_api_spec {
+	char name[KAPI_MAX_NAME_LEN];
+	u32 version;
+	char description[KAPI_MAX_DESC_LEN];
+	char long_description[KAPI_MAX_DESC_LEN * 4];
+	u32 context_flags;
+
+	/* Parameters */
+	u32 param_magic;  /* 0x4B415031 = 'KAP1' */
+	u32 param_count;
+	struct kapi_param_spec params[KAPI_MAX_PARAMS];
+
+	/* Return value */
+	u32 return_magic; /* 0x4B415232 = 'KAR2' */
+	struct kapi_return_spec return_spec;
+
+	/* Errors */
+	u32 error_magic;  /* 0x4B414533 = 'KAE3' */
+	u32 error_count;
+	struct kapi_error_spec errors[KAPI_MAX_ERRORS];
+
+	/* Locking */
+	u32 lock_magic;   /* 0x4B414C34 = 'KAL4' */
+	u32 lock_count;
+	struct kapi_lock_spec locks[KAPI_MAX_CONSTRAINTS];
+
+	/* Constraints */
+	u32 constraint_magic; /* 0x4B414335 = 'KAC5' */
+	u32 constraint_count;
+	struct kapi_constraint_spec constraints[KAPI_MAX_CONSTRAINTS];
+
+	/* Additional information */
+	u32 info_magic;   /* 0x4B414936 = 'KAI6' */
+	char examples[KAPI_MAX_DESC_LEN * 2];
+	char notes[KAPI_MAX_DESC_LEN * 2];
+	char since_version[32];
+
+	/* Signal specifications */
+	u32 signal_magic; /* 0x4B415337 = 'KAS7' */
+	u32 signal_count;
+	struct kapi_signal_spec signals[KAPI_MAX_SIGNALS];
+
+	/* Signal mask specifications */
+	u32 sigmask_magic; /* 0x4B414D38 = 'KAM8' */
+	u32 signal_mask_count;
+	struct kapi_signal_mask_spec signal_masks[KAPI_MAX_SIGNALS];
+
+	/* Structure specifications */
+	u32 struct_magic; /* 0x4B415439 = 'KAT9' */
+	u32 struct_spec_count;
+	struct kapi_struct_spec struct_specs[KAPI_MAX_STRUCT_SPECS];
+
+	/* Side effects */
+	u32 effect_magic; /* 0x4B414641 = 'KAFA' */
+	u32 side_effect_count;
+	struct kapi_side_effect side_effects[KAPI_MAX_SIDE_EFFECTS];
+
+	/* State transitions */
+	u32 trans_magic;  /* 0x4B415442 = 'KATB' */
+	u32 state_trans_count;
+	struct kapi_state_transition state_transitions[KAPI_MAX_STATE_TRANS];
+
+	/* Capability specifications */
+	u32 cap_magic;    /* 0x4B414343 = 'KACC' */
+	u32 capability_count;
+	struct kapi_capability_spec capabilities[KAPI_MAX_CAPABILITIES];
+
+	/* Extended fields for socket and network operations */
+	struct kapi_socket_state_spec socket_state;
+	struct kapi_protocol_behavior protocol_behaviors[KAPI_MAX_PROTOCOL_BEHAVIORS];
+	u32 protocol_behavior_count;
+	struct kapi_buffer_spec buffer_spec;
+	struct kapi_async_spec async_spec;
+	struct kapi_addr_family_spec addr_families[KAPI_MAX_ADDR_FAMILIES];
+	u32 addr_family_count;
+
+	/* Operation characteristics */
+	bool is_connection_oriented;
+	bool is_message_oriented;
+	bool supports_oob_data;
+	bool supports_peek;
+	bool supports_select_poll;
+	bool is_reentrant;
+
+	/* Semantic descriptions */
+	char connection_establishment[KAPI_MAX_DESC_LEN];
+	char connection_termination[KAPI_MAX_DESC_LEN];
+	char data_transfer_semantics[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/* Macros for defining API specifications */
+
+/**
+ * DEFINE_KERNEL_API_SPEC - Define a kernel API specification
+ * @func_name: Function name to specify
+ */
+#define DEFINE_KERNEL_API_SPEC(func_name) \
+	static struct kernel_api_spec __kapi_spec_##func_name \
+	__used __section(".kapi_specs") = {	\
+		.name = __stringify(func_name),	\
+		.version = 1,
+
+#define KAPI_END_SPEC };
+
+/**
+ * KAPI_DESCRIPTION - Set API description
+ * @desc: Description string
+ */
+#define KAPI_DESCRIPTION(desc) \
+	.description = desc,
+
+/**
+ * KAPI_LONG_DESC - Set detailed API description
+ * @desc: Detailed description string
+ */
+#define KAPI_LONG_DESC(desc) \
+	.long_description = desc,
+
+/**
+ * KAPI_CONTEXT - Set execution context flags
+ * @flags: Context flags (OR'ed KAPI_CTX_* values)
+ */
+#define KAPI_CONTEXT(flags) \
+	.context_flags = flags,
+
+/**
+ * KAPI_PARAM - Define a parameter specification
+ * @idx: Parameter index (0-based)
+ * @pname: Parameter name
+ * @ptype: Type name string
+ * @pdesc: Parameter description
+ */
+#define KAPI_PARAM(idx, pname, ptype, pdesc) \
+	.params[idx] = {			\
+		.name = pname,			\
+		.type_name = ptype,		\
+		.description = pdesc,		\
+		.size_param_idx = -1,		/* Default: no dynamic sizing */
+
+#define KAPI_PARAM_TYPE(ptype) \
+		.type = ptype,
+
+#define KAPI_PARAM_FLAGS(pflags) \
+		.flags = pflags,
+
+#define KAPI_PARAM_SIZE(psize) \
+		.size = psize,
+
+#define KAPI_PARAM_RANGE(pmin, pmax) \
+		.min_value = pmin,	\
+		.max_value = pmax,
+
+#define KAPI_PARAM_CONSTRAINT_TYPE(ctype) \
+		.constraint_type = ctype,
+
+#define KAPI_PARAM_CONSTRAINT(desc) \
+		.constraints = desc,
+
+#define KAPI_PARAM_VALID_MASK(mask) \
+		.valid_mask = mask,
+
+#define KAPI_PARAM_ENUM_VALUES(values) \
+		.enum_values = values, \
+		.enum_count = ARRAY_SIZE(values),
+
+#define KAPI_PARAM_ALIGNMENT(align) \
+		.alignment = align,
+
+#define KAPI_PARAM_SIZE_PARAM(idx) \
+		.size_param_idx = idx,
+
+#define KAPI_PARAM_END },
+
+/**
+ * KAPI_PARAM_COUNT - Set the number of parameters
+ * @n: Number of parameters
+ */
+#define KAPI_PARAM_COUNT(n) \
+	.param_magic = KAPI_MAGIC_PARAMS, \
+	.param_count = n,
+
+/**
+ * KAPI_RETURN - Define return value specification
+ * @rtype: Return type name
+ * @rdesc: Return value description
+ */
+#define KAPI_RETURN(rtype, rdesc) \
+	.return_spec = {		\
+		.type_name = rtype,	\
+		.description = rdesc,
+
+#define KAPI_RETURN_SUCCESS(val) \
+		.success_value = val,
+
+#define KAPI_RETURN_TYPE(rtype) \
+		.type = rtype,
+
+#define KAPI_RETURN_CHECK_TYPE(ctype) \
+		.check_type = ctype,
+
+#define KAPI_RETURN_ERROR_VALUES(values) \
+		.error_values = values,
+
+#define KAPI_RETURN_ERROR_COUNT(count) \
+		.error_count = count,
+
+#define KAPI_RETURN_SUCCESS_RANGE(min, max) \
+		.success_min = min, \
+		.success_max = max,
+
+#define KAPI_RETURN_END },
+
+/**
+ * KAPI_ERROR - Define an error condition
+ * @idx: Error index
+ * @ecode: Error code value
+ * @ename: Error name
+ * @econd: Error condition
+ * @edesc: Error description
+ */
+#define KAPI_ERROR(idx, ecode, ename, econd, edesc) \
+	.errors[idx] = {			\
+		.error_code = ecode,		\
+		.name = ename,			\
+		.condition = econd,		\
+		.description = edesc,		\
+	},
+
+/**
+ * KAPI_ERROR_COUNT - Set the number of errors
+ * @n: Number of errors
+ */
+#define KAPI_ERROR_COUNT(n) \
+	.error_magic = KAPI_MAGIC_ERRORS, \
+	.error_count = n,
+
+/**
+ * KAPI_LOCK - Define a lock requirement
+ * @idx: Lock index
+ * @lname: Lock name
+ * @ltype: Lock type
+ */
+#define KAPI_LOCK(idx, lname, ltype) \
+	.locks[idx] = {			\
+		.lock_name = lname,	\
+		.lock_type = ltype,
+
+#define KAPI_LOCK_ACQUIRED \
+		.acquired = true,
+
+#define KAPI_LOCK_RELEASED \
+		.released = true,
+
+#define KAPI_LOCK_HELD_ENTRY \
+		.held_on_entry = true,
+
+#define KAPI_LOCK_HELD_EXIT \
+		.held_on_exit = true,
+
+#define KAPI_LOCK_DESC(ldesc) \
+		.description = ldesc,
+
+#define KAPI_LOCK_END },
+
+/**
+ * KAPI_CONSTRAINT - Define an additional constraint
+ * @idx: Constraint index
+ * @cname: Constraint name
+ * @cdesc: Constraint description
+ */
+#define KAPI_CONSTRAINT(idx, cname, cdesc) \
+	.constraints[idx] = {		\
+		.name = cname,		\
+		.description = cdesc,
+
+#define KAPI_CONSTRAINT_EXPR(expr) \
+		.expression = expr,
+
+#define KAPI_CONSTRAINT_END },
+
+/**
+ * KAPI_EXAMPLES - Set API usage examples
+ * @examples: Examples string
+ */
+#define KAPI_EXAMPLES(ex) \
+	.info_magic = KAPI_MAGIC_INFO, \
+	.examples = ex,
+
+/**
+ * KAPI_NOTES - Set API notes
+ * @notes: Notes string
+ */
+#define KAPI_NOTES(n) \
+	.notes = n,
+
+
+/**
+ * KAPI_SIGNAL - Define a signal specification
+ * @idx: Signal index
+ * @signum: Signal number (e.g., SIGKILL)
+ * @signame: Signal name string
+ * @dir: Direction flags
+ * @act: Action taken
+ */
+#define KAPI_SIGNAL(idx, signum, signame, dir, act) \
+	.signals[idx] = {			\
+		.signal_num = signum,		\
+		.signal_name = signame,		\
+		.direction = dir,		\
+		.action = act,
+
+#define KAPI_SIGNAL_TARGET(tgt) \
+		.target = tgt,
+
+#define KAPI_SIGNAL_CONDITION(cond) \
+		.condition = cond,
+
+#define KAPI_SIGNAL_DESC(desc) \
+		.description = desc,
+
+#define KAPI_SIGNAL_RESTARTABLE \
+		.restartable = true,
+
+#define KAPI_SIGNAL_SA_FLAGS_REQ(flags) \
+		.sa_flags_required = flags,
+
+#define KAPI_SIGNAL_SA_FLAGS_FORBID(flags) \
+		.sa_flags_forbidden = flags,
+
+#define KAPI_SIGNAL_ERROR(err) \
+		.error_on_signal = err,
+
+#define KAPI_SIGNAL_TRANSFORM(sig) \
+		.transform_to = sig,
+
+#define KAPI_SIGNAL_TIMING(when) \
+		.timing = when,
+
+#define KAPI_SIGNAL_PRIORITY(prio) \
+		.priority = prio,
+
+#define KAPI_SIGNAL_INTERRUPTIBLE \
+		.interruptible = true,
+
+#define KAPI_SIGNAL_QUEUE(behavior) \
+		.queue_behavior = behavior,
+
+#define KAPI_SIGNAL_STATE_REQ(state) \
+		.state_required = state,
+
+#define KAPI_SIGNAL_STATE_FORBID(state) \
+		.state_forbidden = state,
+
+#define KAPI_SIGNAL_END },
+
+#define KAPI_SIGNAL_COUNT(n) \
+	.signal_magic = KAPI_MAGIC_SIGNALS, \
+	.signal_count = n,
+
+/**
+ * KAPI_SIGNAL_MASK - Define a signal mask specification
+ * @idx: Mask index
+ * @name: Mask name
+ * @desc: Mask description
+ */
+#define KAPI_SIGNAL_MASK(idx, name, desc) \
+	.signal_masks[idx] = {		\
+		.mask_name = name,	\
+		.description = desc,
+
+/*
+ * KAPI_SIGNAL_MASK_SIGNALS - Specify signals in a signal mask
+ * @...: Variadic list of signal numbers
+ *
+ * Usage:
+ *   KAPI_SIGNAL_MASK(0, "blocked", "Signals blocked during operation")
+ *   KAPI_SIGNAL_MASK_SIGNALS(SIGINT, SIGTERM, SIGQUIT)
+ *   KAPI_SIGNAL_MASK_END
+ */
+#define KAPI_SIGNAL_MASK_SIGNALS(...) \
+		.signals = { __VA_ARGS__ }, \
+		.signal_count = sizeof((int[]){ __VA_ARGS__ }) / sizeof(int),
+
+#define KAPI_SIGNAL_MASK_END },
+
+/**
+ * KAPI_STRUCT_SPEC - Define a structure specification
+ * @idx: Structure spec index
+ * @sname: Structure name
+ * @sdesc: Structure description
+ */
+#define KAPI_STRUCT_SPEC(idx, sname, sdesc) \
+	.struct_specs[idx] = {		\
+		.name = #sname,		\
+		.description = sdesc,
+
+#define KAPI_STRUCT_SIZE(ssize, salign) \
+		.size = ssize,		\
+		.alignment = salign,
+
+#define KAPI_STRUCT_FIELD_COUNT(n) \
+		.field_count = n,
+
+/**
+ * KAPI_STRUCT_FIELD - Define a structure field
+ * @fidx: Field index
+ * @fname: Field name
+ * @ftype: Field type (KAPI_TYPE_*)
+ * @ftype_name: Type name as string
+ * @fdesc: Field description
+ */
+#define KAPI_STRUCT_FIELD(fidx, fname, ftype, ftype_name, fdesc) \
+		.fields[fidx] = {	\
+			.name = fname,	\
+			.type = ftype,	\
+			.type_name = ftype_name, \
+			.description = fdesc,
+
+#define KAPI_FIELD_OFFSET(foffset) \
+			.offset = foffset,
+
+#define KAPI_FIELD_SIZE(fsize) \
+			.size = fsize,
+
+#define KAPI_FIELD_FLAGS(fflags) \
+			.flags = fflags,
+
+#define KAPI_FIELD_CONSTRAINT_RANGE(min, max) \
+			.constraint_type = KAPI_CONSTRAINT_RANGE, \
+			.min_value = min, \
+			.max_value = max,
+
+#define KAPI_FIELD_CONSTRAINT_MASK(mask) \
+			.constraint_type = KAPI_CONSTRAINT_MASK, \
+			.valid_mask = mask,
+
+#define KAPI_FIELD_CONSTRAINT_ENUM(values) \
+			.constraint_type = KAPI_CONSTRAINT_ENUM, \
+			.enum_values = values,
+
+#define KAPI_STRUCT_FIELD_END },
+
+#define KAPI_STRUCT_SPEC_END },
+
+/* Counter for structure specifications */
+#define KAPI_STRUCT_SPEC_COUNT(n) \
+	.struct_magic = KAPI_MAGIC_STRUCTS, \
+	.struct_spec_count = n,
+
+/* Additional lock-related macros */
+#define KAPI_LOCK_COUNT(n) \
+	.lock_magic = KAPI_MAGIC_LOCKS, \
+	.lock_count = n,
+
+/**
+ * KAPI_SIDE_EFFECT - Define a side effect
+ * @idx: Side effect index
+ * @etype: Effect type bitmask (OR'ed KAPI_EFFECT_* values)
+ * @etarget: What is affected
+ * @edesc: Effect description
+ */
+#define KAPI_SIDE_EFFECT(idx, etype, etarget, edesc) \
+	.side_effects[idx] = {		\
+		.type = etype,		\
+		.target = etarget,	\
+		.description = edesc,	\
+		.reversible = false,	/* Default to non-reversible */
+
+#define KAPI_EFFECT_CONDITION(cond) \
+		.condition = cond,
+
+#define KAPI_EFFECT_REVERSIBLE \
+		.reversible = true,
+
+#define KAPI_SIDE_EFFECT_END },
+
+/**
+ * KAPI_STATE_TRANS - Define a state transition
+ * @idx: State transition index
+ * @obj: Object whose state changes
+ * @from: From state
+ * @to: To state
+ * @desc: Transition description
+ */
+#define KAPI_STATE_TRANS(idx, obj, from, to, desc) \
+	.state_transitions[idx] = {	\
+		.object = obj,		\
+		.from_state = from,	\
+		.to_state = to,		\
+		.description = desc,
+
+#define KAPI_STATE_TRANS_COND(cond) \
+		.condition = cond,
+
+#define KAPI_STATE_TRANS_END },
+
+/* Counters for side effects and state transitions */
+#define KAPI_SIDE_EFFECT_COUNT(n) \
+	.effect_magic = KAPI_MAGIC_EFFECTS, \
+	.side_effect_count = n,
+
+#define KAPI_STATE_TRANS_COUNT(n) \
+	.trans_magic = KAPI_MAGIC_TRANS, \
+	.state_trans_count = n,
+
+/* Helper macros for common side effect patterns */
+#define KAPI_EFFECTS_MEMORY	(KAPI_EFFECT_ALLOC_MEMORY | KAPI_EFFECT_FREE_MEMORY)
+#define KAPI_EFFECTS_LOCKING	(KAPI_EFFECT_LOCK_ACQUIRE | KAPI_EFFECT_LOCK_RELEASE)
+#define KAPI_EFFECTS_RESOURCES	(KAPI_EFFECT_RESOURCE_CREATE | KAPI_EFFECT_RESOURCE_DESTROY)
+#define KAPI_EFFECTS_IO		(KAPI_EFFECT_NETWORK | KAPI_EFFECT_FILESYSTEM)
+
+/*
+ * Helper macros for combining common parameter flag patterns.
+ * Note: KAPI_PARAM_IN, KAPI_PARAM_OUT, KAPI_PARAM_INOUT, and KAPI_PARAM_OPTIONAL
+ * are already defined in enum kapi_param_flags - use those directly.
+ */
+#define KAPI_PARAM_FLAGS_INOUT	(KAPI_PARAM_IN | KAPI_PARAM_OUT)
+#define KAPI_PARAM_FLAGS_USER	(KAPI_PARAM_USER | KAPI_PARAM_IN)
+
+/* Common signal timing constants */
+#define KAPI_SIGNAL_TIME_ENTRY		"entry"
+#define KAPI_SIGNAL_TIME_DURING		"during"
+#define KAPI_SIGNAL_TIME_EXIT		"exit"
+#define KAPI_SIGNAL_TIME_ANYTIME	"anytime"
+#define KAPI_SIGNAL_TIME_BLOCKING	"while_blocked"
+#define KAPI_SIGNAL_TIME_SLEEPING	"while_sleeping"
+#define KAPI_SIGNAL_TIME_BEFORE		"before"
+#define KAPI_SIGNAL_TIME_AFTER		"after"
+
+/* Common signal queue behaviors */
+#define KAPI_SIGNAL_QUEUE_STANDARD	"standard"
+#define KAPI_SIGNAL_QUEUE_REALTIME	"realtime"
+#define KAPI_SIGNAL_QUEUE_COALESCE	"coalesce"
+#define KAPI_SIGNAL_QUEUE_REPLACE	"replace"
+#define KAPI_SIGNAL_QUEUE_DISCARD	"discard"
+
+/* Process state flags for signal delivery */
+#define KAPI_SIGNAL_STATE_RUNNING	(1 << 0)
+#define KAPI_SIGNAL_STATE_SLEEPING	(1 << 1)
+#define KAPI_SIGNAL_STATE_STOPPED	(1 << 2)
+#define KAPI_SIGNAL_STATE_TRACED	(1 << 3)
+#define KAPI_SIGNAL_STATE_ZOMBIE	(1 << 4)
+#define KAPI_SIGNAL_STATE_DEAD		(1 << 5)
+
+/* Capability specification macros */
+
+/**
+ * KAPI_CAPABILITY - Define a capability requirement
+ * @idx: Capability index
+ * @cap: Capability constant (e.g., CAP_IPC_LOCK)
+ * @name: Capability name string
+ * @act: Action type (kapi_capability_action)
+ */
+#define KAPI_CAPABILITY(idx, cap, name, act) \
+	.capabilities[idx] = {		\
+		.capability = cap,	\
+		.cap_name = name,	\
+		.action = act,
+
+#define KAPI_CAP_ALLOWS(desc) \
+		.allows = desc,
+
+#define KAPI_CAP_WITHOUT(desc) \
+		.without_cap = desc,
+
+#define KAPI_CAP_CONDITION(cond) \
+		.check_condition = cond,
+
+#define KAPI_CAP_PRIORITY(prio) \
+		.priority = prio,
+
+#define KAPI_CAP_ALTERNATIVE(caps, count) \
+		.alternative = caps,	\
+		.alternative_count = count,
+
+#define KAPI_CAPABILITY_END },
+
+/* Counter for capability specifications */
+#define KAPI_CAPABILITY_COUNT(n) \
+	.cap_magic = KAPI_MAGIC_CAPS, \
+	.capability_count = n,
+
+/* Common signal patterns for syscalls */
+#define KAPI_SIGNAL_INTERRUPTIBLE_SLEEP \
+	KAPI_SIGNAL(0, SIGINT, "SIGINT", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN) \
+		KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_SLEEPING) \
+		KAPI_SIGNAL_ERROR(-EINTR) \
+		KAPI_SIGNAL_RESTARTABLE \
+		KAPI_SIGNAL_DESC("Interrupts sleep, returns -EINTR") \
+	KAPI_SIGNAL_END, \
+	KAPI_SIGNAL(1, SIGTERM, "SIGTERM", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN) \
+		KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_SLEEPING) \
+		KAPI_SIGNAL_ERROR(-EINTR) \
+		KAPI_SIGNAL_RESTARTABLE \
+		KAPI_SIGNAL_DESC("Interrupts sleep, returns -EINTR") \
+	KAPI_SIGNAL_END
+
+#define KAPI_SIGNAL_FATAL_DEFAULT \
+	KAPI_SIGNAL(2, SIGKILL, "SIGKILL", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_TERMINATE) \
+		KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ANYTIME) \
+		KAPI_SIGNAL_PRIORITY(0) \
+		KAPI_SIGNAL_DESC("Process terminated immediately") \
+	KAPI_SIGNAL_END
+
+#define KAPI_SIGNAL_STOP_CONT \
+	KAPI_SIGNAL(3, SIGSTOP, "SIGSTOP", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_STOP) \
+		KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ANYTIME) \
+		KAPI_SIGNAL_DESC("Process stopped") \
+	KAPI_SIGNAL_END, \
+	KAPI_SIGNAL(4, SIGCONT, "SIGCONT", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_CONTINUE) \
+		KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ANYTIME) \
+		KAPI_SIGNAL_DESC("Process continued") \
+	KAPI_SIGNAL_END
+
+/* Validation and runtime checking */
+
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+bool kapi_validate_params(const struct kernel_api_spec *spec, ...);
+bool kapi_validate_param(const struct kapi_param_spec *param_spec, s64 value);
+bool kapi_validate_param_with_context(const struct kapi_param_spec *param_spec,
+				       s64 value, const s64 *all_params, int param_count);
+int kapi_validate_syscall_param(const struct kernel_api_spec *spec,
+				int param_idx, s64 value);
+int kapi_validate_syscall_params(const struct kernel_api_spec *spec,
+				 const s64 *params, int param_count);
+bool kapi_check_return_success(const struct kapi_return_spec *return_spec, s64 retval);
+bool kapi_validate_return_value(const struct kernel_api_spec *spec, s64 retval);
+int kapi_validate_syscall_return(const struct kernel_api_spec *spec, s64 retval);
+void kapi_check_context(const struct kernel_api_spec *spec);
+void kapi_check_locks(const struct kernel_api_spec *spec);
+bool kapi_check_signal_allowed(const struct kernel_api_spec *spec, int signum);
+bool kapi_validate_signal_action(const struct kernel_api_spec *spec, int signum,
+				 struct sigaction *act);
+int kapi_get_signal_error(const struct kernel_api_spec *spec, int signum);
+bool kapi_is_signal_restartable(const struct kernel_api_spec *spec, int signum);
+#else
+static inline bool kapi_validate_params(const struct kernel_api_spec *spec, ...)
+{
+	return true;
+}
+static inline bool kapi_validate_param(const struct kapi_param_spec *param_spec, s64 value)
+{
+	return true;
+}
+static inline bool kapi_validate_param_with_context(const struct kapi_param_spec *param_spec,
+						     s64 value, const s64 *all_params, int param_count)
+{
+	return true;
+}
+static inline int kapi_validate_syscall_param(const struct kernel_api_spec *spec,
+					       int param_idx, s64 value)
+{
+	return 0;
+}
+static inline int kapi_validate_syscall_params(const struct kernel_api_spec *spec,
+					       const s64 *params, int param_count)
+{
+	return 0;
+}
+static inline bool kapi_check_return_success(const struct kapi_return_spec *return_spec, s64 retval)
+{
+	return true;
+}
+static inline bool kapi_validate_return_value(const struct kernel_api_spec *spec, s64 retval)
+{
+	return true;
+}
+static inline int kapi_validate_syscall_return(const struct kernel_api_spec *spec, s64 retval)
+{
+	return 0;
+}
+static inline void kapi_check_context(const struct kernel_api_spec *spec) {}
+static inline void kapi_check_locks(const struct kernel_api_spec *spec) {}
+static inline bool kapi_check_signal_allowed(const struct kernel_api_spec *spec, int signum)
+{
+	return true;
+}
+static inline bool kapi_validate_signal_action(const struct kernel_api_spec *spec, int signum,
+					       struct sigaction *act)
+{
+	return true;
+}
+static inline int kapi_get_signal_error(const struct kernel_api_spec *spec, int signum)
+{
+	return -EINTR;
+}
+static inline bool kapi_is_signal_restartable(const struct kernel_api_spec *spec, int signum)
+{
+	return false;
+}
+#endif
+
+/* Export/query functions */
+const struct kernel_api_spec *kapi_get_spec(const char *name);
+int kapi_export_json(const struct kernel_api_spec *spec, char *buf, size_t size);
+void kapi_print_spec(const struct kernel_api_spec *spec);
+
+/* Registration for dynamic APIs */
+int kapi_register_spec(struct kernel_api_spec *spec);
+void kapi_unregister_spec(const char *name);
+
+/* Helper to get parameter constraint info */
+static inline bool kapi_get_param_constraint(const char *api_name, int param_idx,
+					      enum kapi_constraint_type *type,
+					      u64 *valid_mask, s64 *min_val, s64 *max_val)
+{
+	const struct kernel_api_spec *spec = kapi_get_spec(api_name);
+
+	if (!spec || param_idx >= spec->param_count)
+		return false;
+
+	if (type)
+		*type = spec->params[param_idx].constraint_type;
+	if (valid_mask)
+		*valid_mask = spec->params[param_idx].valid_mask;
+	if (min_val)
+		*min_val = spec->params[param_idx].min_value;
+	if (max_val)
+		*max_val = spec->params[param_idx].max_value;
+
+	return true;
+}
+
+/* Socket state requirement macros */
+#define KAPI_SOCKET_STATE_REQ(...) \
+	.socket_state = { \
+		.required_states = { __VA_ARGS__ }, \
+		.required_state_count = sizeof((enum kapi_socket_state[]){__VA_ARGS__})/sizeof(enum kapi_socket_state),
+
+#define KAPI_SOCKET_STATE_FORBID(...) \
+		.forbidden_states = { __VA_ARGS__ }, \
+		.forbidden_state_count = sizeof((enum kapi_socket_state[]){__VA_ARGS__})/sizeof(enum kapi_socket_state),
+
+#define KAPI_SOCKET_STATE_RESULT(state) \
+		.resulting_state = state,
+
+#define KAPI_SOCKET_STATE_COND(cond) \
+		.state_condition = cond,
+
+#define KAPI_SOCKET_STATE_PROTOS(protos) \
+		.applicable_protocols = protos,
+
+#define KAPI_SOCKET_STATE_END },
+
+/* Protocol behavior macros */
+#define KAPI_PROTOCOL_BEHAVIOR(idx, protos, desc) \
+	.protocol_behaviors[idx] = { \
+		.applicable_protocols = protos, \
+		.behavior = desc,
+
+#define KAPI_PROTOCOL_FLAGS(flags, desc) \
+		.protocol_flags = flags, \
+		.flag_description = desc,
+
+#define KAPI_PROTOCOL_BEHAVIOR_END },
+
+/* Async behavior macros */
+#define KAPI_ASYNC_SPEC(modes, errno) \
+	.async_spec = { \
+		.supported_modes = modes, \
+		.nonblock_errno = errno,
+
+#define KAPI_ASYNC_POLL(in, out) \
+		.poll_events_in = in, \
+		.poll_events_out = out,
+
+#define KAPI_ASYNC_COMPLETION(cond) \
+		.completion_condition = cond,
+
+#define KAPI_ASYNC_TIMEOUT(supported, desc) \
+		.supports_timeout = supported, \
+		.timeout_behavior = desc,
+
+#define KAPI_ASYNC_END },
+
+/* Buffer behavior macros */
+#define KAPI_BUFFER_SPEC(behaviors) \
+	.buffer_spec = { \
+		.buffer_behaviors = behaviors,
+
+#define KAPI_BUFFER_SIZE(min, max, optimal) \
+		.min_buffer_size = min, \
+		.max_buffer_size = max, \
+		.optimal_buffer_size = optimal,
+
+#define KAPI_BUFFER_PARTIAL(allowed, rules) \
+		.can_partial_transfer = allowed, \
+		.partial_transfer_rules = rules,
+
+#define KAPI_BUFFER_FRAGMENT(rules) \
+		.fragmentation_rules = rules,
+
+#define KAPI_BUFFER_END },
+
+/* Address family macros */
+#define KAPI_ADDR_FAMILY(idx, fam, name, struct_sz, min_len, max_len) \
+	.addr_families[idx] = { \
+		.family = fam, \
+		.family_name = name, \
+		.addr_struct_size = struct_sz, \
+		.min_addr_len = min_len, \
+		.max_addr_len = max_len,
+
+#define KAPI_ADDR_FORMAT(fmt) \
+		.addr_format = fmt,
+
+#define KAPI_ADDR_FEATURES(wildcard, multicast, broadcast) \
+		.supports_wildcard = wildcard, \
+		.supports_multicast = multicast, \
+		.supports_broadcast = broadcast,
+
+#define KAPI_ADDR_SPECIAL(addrs) \
+		.special_addresses = addrs,
+
+#define KAPI_ADDR_PORTS(min, max) \
+		.port_range_min = min, \
+		.port_range_max = max,
+
+#define KAPI_ADDR_FAMILY_END },
+
+#define KAPI_ADDR_FAMILY_COUNT(n) \
+	.addr_family_count = n,
+
+#define KAPI_PROTOCOL_BEHAVIOR_COUNT(n) \
+	.protocol_behavior_count = n,
+
+#define KAPI_CONSTRAINT_COUNT(n) \
+	.constraint_magic = KAPI_MAGIC_CONSTRAINTS, \
+	.constraint_count = n,
+
+/* Network operation characteristics macros */
+#define KAPI_NET_CONNECTION_ORIENTED \
+	.is_connection_oriented = true,
+
+#define KAPI_NET_MESSAGE_ORIENTED \
+	.is_message_oriented = true,
+
+#define KAPI_NET_SUPPORTS_OOB \
+	.supports_oob_data = true,
+
+#define KAPI_NET_SUPPORTS_PEEK \
+	.supports_peek = true,
+
+#define KAPI_NET_REENTRANT \
+	.is_reentrant = true,
+
+/* Semantic description macros */
+#define KAPI_NET_CONN_ESTABLISH(desc) \
+	.connection_establishment = desc,
+
+#define KAPI_NET_CONN_TERMINATE(desc) \
+	.connection_termination = desc,
+
+#define KAPI_NET_DATA_TRANSFER(desc) \
+	.data_transfer_semantics = desc,
+
+#endif /* _LINUX_KERNEL_API_SPEC_H */
diff --git a/include/linux/syscall_api_spec.h b/include/linux/syscall_api_spec.h
new file mode 100644
index 0000000000000..b7f9ba0f978ab
--- /dev/null
+++ b/include/linux/syscall_api_spec.h
@@ -0,0 +1,198 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * syscall_api_spec.h - System Call API Specification Integration
+ *
+ * This header extends the SYSCALL_DEFINEX macros to support inline API specifications,
+ * allowing syscall documentation to be written alongside the implementation in a
+ * human-readable and machine-parseable format.
+ */
+
+#ifndef _LINUX_SYSCALL_API_SPEC_H
+#define _LINUX_SYSCALL_API_SPEC_H
+
+#include <linux/kernel_api_spec.h>
+
+
+
+/* Automatic syscall validation infrastructure */
+/*
+ * The validation is now integrated directly into the SYSCALL_DEFINEx macros
+ * in syscalls.h when CONFIG_KAPI_RUNTIME_CHECKS is enabled.
+ *
+ * The validation happens in the __do_kapi_sys##name wrapper function which:
+ * 1. Validates all parameters before calling the actual syscall
+ * 2. Calls the real syscall implementation
+ * 3. Validates the return value
+ * 4. Returns the result
+ */
+
+
+/*
+ * Helper macros for common syscall patterns
+ */
+
+/* For syscalls that can sleep */
+#define KAPI_SYSCALL_SLEEPABLE \
+	KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+/* For syscalls that must be atomic */
+#define KAPI_SYSCALL_ATOMIC \
+	KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_ATOMIC)
+
+/* Common parameter specifications */
+#define KAPI_PARAM_FD(idx, desc) \
+	KAPI_PARAM(idx, "fd", "int", desc) \
+		KAPI_PARAM_FLAGS(KAPI_PARAM_IN) \
+		.type = KAPI_TYPE_FD, \
+		.constraint_type = KAPI_CONSTRAINT_NONE, \
+	KAPI_PARAM_END
+
+#define KAPI_PARAM_USER_BUF(idx, name, desc) \
+	KAPI_PARAM(idx, name, "void __user *", desc) \
+		KAPI_PARAM_FLAGS(KAPI_PARAM_USER | KAPI_PARAM_IN) \
+	KAPI_PARAM_END
+
+/**
+ * KAPI_PARAM_USER_STRUCT - Define a userspace struct pointer parameter
+ * @idx: Parameter index (0-based)
+ * @name: Parameter name
+ * @struct_type: The struct type (e.g., struct iocb)
+ * @desc: Parameter description
+ *
+ * This macro defines a parameter that is a userspace pointer to a struct.
+ * The pointer will be validated to ensure:
+ * - The pointer is accessible in userspace
+ * - The memory region of sizeof(struct_type) bytes is accessible
+ */
+#define KAPI_PARAM_USER_STRUCT(idx, name, struct_type, desc) \
+	KAPI_PARAM(idx, name, #struct_type " __user *", desc) \
+		KAPI_PARAM_FLAGS(KAPI_PARAM_USER | KAPI_PARAM_IN) \
+		.type = KAPI_TYPE_USER_PTR, \
+		.size = sizeof(struct_type), \
+		.constraint_type = KAPI_CONSTRAINT_USER_PTR, \
+	KAPI_PARAM_END
+
+/**
+ * KAPI_PARAM_USER_PTR_SIZED - Define a userspace pointer with explicit size
+ * @idx: Parameter index (0-based)
+ * @name: Parameter name
+ * @ptr_size: Size in bytes of the memory region
+ * @desc: Parameter description
+ *
+ * This macro defines a parameter that is a userspace pointer to a memory
+ * region of a specific size. The pointer will be validated to ensure:
+ * - The pointer is accessible in userspace
+ * - The memory region of ptr_size bytes is accessible
+ */
+#define KAPI_PARAM_USER_PTR_SIZED(idx, name, ptr_size, desc) \
+	KAPI_PARAM(idx, name, "void __user *", desc) \
+		KAPI_PARAM_FLAGS(KAPI_PARAM_USER | KAPI_PARAM_IN) \
+		.type = KAPI_TYPE_USER_PTR, \
+		.size = ptr_size, \
+		.constraint_type = KAPI_CONSTRAINT_USER_PTR, \
+	KAPI_PARAM_END
+
+/**
+ * KAPI_PARAM_USER_STRING - Define a userspace null-terminated string parameter
+ * @idx: Parameter index (0-based)
+ * @name: Parameter name
+ * @min_len: Minimum string length (excluding null terminator)
+ * @max_len: Maximum string length (excluding null terminator)
+ * @desc: Parameter description
+ *
+ * This macro defines a parameter that is a userspace pointer to a
+ * null-terminated string. The string will be validated to ensure:
+ * - The pointer is accessible in userspace
+ * - The string length (excluding null terminator) is within [min_len, max_len]
+ */
+#define KAPI_PARAM_USER_STRING(idx, name, min_len, max_len, desc) \
+	KAPI_PARAM(idx, name, "const char __user *", desc) \
+		KAPI_PARAM_FLAGS(KAPI_PARAM_USER | KAPI_PARAM_IN) \
+		.type = KAPI_TYPE_USER_PTR, \
+		.constraint_type = KAPI_CONSTRAINT_USER_STRING, \
+		.min_value = min_len, \
+		.max_value = max_len, \
+	KAPI_PARAM_END
+
+/**
+ * KAPI_PARAM_USER_PATH - Define a userspace pathname parameter
+ * @idx: Parameter index (0-based)
+ * @name: Parameter name
+ * @desc: Parameter description
+ *
+ * This macro defines a parameter that is a userspace pointer to a
+ * null-terminated pathname string. The path will be validated to ensure:
+ * - The pointer is accessible in userspace
+ * - The path is a valid null-terminated string
+ * - The path length does not exceed PATH_MAX (4096 bytes)
+ */
+#define KAPI_PARAM_USER_PATH(idx, name, desc) \
+	KAPI_PARAM(idx, name, "const char __user *", desc) \
+		KAPI_PARAM_FLAGS(KAPI_PARAM_USER | KAPI_PARAM_IN) \
+		.type = KAPI_TYPE_PATH, \
+		.constraint_type = KAPI_CONSTRAINT_USER_PATH, \
+	KAPI_PARAM_END
+
+#define KAPI_PARAM_SIZE_T(idx, name, desc) \
+	KAPI_PARAM(idx, name, "size_t", desc) \
+		KAPI_PARAM_FLAGS(KAPI_PARAM_IN) \
+		KAPI_PARAM_RANGE(0, SIZE_MAX) \
+	KAPI_PARAM_END
+
+/* Common error specifications */
+#define KAPI_ERROR_EBADF(idx) \
+	KAPI_ERROR(idx, -EBADF, "EBADF", "Invalid file descriptor", \
+		   "The file descriptor is not valid or has been closed")
+
+#define KAPI_ERROR_EINVAL(idx, condition) \
+	KAPI_ERROR(idx, -EINVAL, "EINVAL", condition, \
+		   "Invalid argument provided")
+
+#define KAPI_ERROR_ENOMEM(idx) \
+	KAPI_ERROR(idx, -ENOMEM, "ENOMEM", "Insufficient memory", \
+		   "Cannot allocate memory for the operation")
+
+#define KAPI_ERROR_EPERM(idx) \
+	KAPI_ERROR(idx, -EPERM, "EPERM", "Operation not permitted", \
+		   "The calling process does not have the required permissions")
+
+#define KAPI_ERROR_EFAULT(idx) \
+	KAPI_ERROR(idx, -EFAULT, "EFAULT", "Bad address", \
+		   "Invalid user space address provided")
+
+/* Standard return value specifications */
+#define KAPI_RETURN_SUCCESS_ZERO \
+	KAPI_RETURN("long", "0 on success, negative error code on failure") \
+		KAPI_RETURN_SUCCESS(0, "== 0") \
+	KAPI_RETURN_END
+
+#define KAPI_RETURN_FD_SPEC \
+	KAPI_RETURN("long", "File descriptor on success, negative error code on failure") \
+		.check_type = KAPI_RETURN_FD, \
+	KAPI_RETURN_END
+
+#define KAPI_RETURN_COUNT \
+	KAPI_RETURN("long", "Number of bytes processed on success, negative error code on failure") \
+		KAPI_RETURN_SUCCESS(0, ">= 0") \
+	KAPI_RETURN_END
+
+/* KAPI_ERROR_COUNT and KAPI_PARAM_COUNT are now defined in kernel_api_spec.h */
+
+/**
+ * KAPI_SINCE_VERSION - Set the since version
+ * @version: Version string when the API was introduced
+ */
+#define KAPI_SINCE_VERSION(version) \
+	.since_version = version,
+
+
+/**
+ * KAPI_SIGNAL_MASK_COUNT - Set the signal mask count
+ * @count: Number of signal masks defined
+ */
+#define KAPI_SIGNAL_MASK_COUNT(count) \
+	.signal_mask_count = count,
+
+
+
+#endif /* _LINUX_SYSCALL_API_SPEC_H */
\ No newline at end of file
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index cf84d98964b2f..eafda2f509999 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -89,6 +89,7 @@ struct file_attr;
 #include <linux/bug.h>
 #include <linux/sem.h>
 #include <asm/siginfo.h>
+#include <linux/syscall_api_spec.h>
 #include <linux/unistd.h>
 #include <linux/quota.h>
 #include <linux/key.h>
@@ -134,6 +135,7 @@ struct file_attr;
 #define __SC_TYPE(t, a)	t
 #define __SC_ARGS(t, a)	a
 #define __SC_TEST(t, a) (void)BUILD_BUG_ON_ZERO(!__TYPE_IS_LL(t) && sizeof(t) > sizeof(long))
+#define __SC_CAST_TO_S64(t, a)	(s64)(a)
 
 #ifdef CONFIG_FTRACE_SYSCALLS
 #define __SC_STR_ADECL(t, a)	#a
@@ -244,6 +246,41 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
  * done within __do_sys_*().
  */
 #ifndef __SYSCALL_DEFINEx
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+#define __SYSCALL_DEFINEx(x, name, ...)					\
+	__diag_push();							\
+	__diag_ignore(GCC, 8, "-Wattribute-alias",			\
+		      "Type aliasing is used to sanitize syscall arguments");\
+	asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))	\
+		__attribute__((alias(__stringify(__se_sys##name))));	\
+	ALLOW_ERROR_INJECTION(sys##name, ERRNO);			\
+	static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));\
+	static inline long __do_kapi_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \
+	asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__));	\
+	asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__))	\
+	{								\
+		long ret = __do_kapi_sys##name(__MAP(x,__SC_CAST,__VA_ARGS__));\
+		__MAP(x,__SC_TEST,__VA_ARGS__);				\
+		__PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__));	\
+		return ret;						\
+	}								\
+	__diag_pop();							\
+	static inline long __do_kapi_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))\
+	{								\
+		const struct kernel_api_spec *__spec = kapi_get_spec("sys_" #name); \
+		if (__spec) {						\
+			s64 __params[x] = { __MAP(x,__SC_CAST_TO_S64,__VA_ARGS__) }; \
+			int __ret = kapi_validate_syscall_params(__spec, __params, x); \
+			if (__ret) return __ret;			\
+		}							\
+		long ret = __do_sys##name(__MAP(x,__SC_ARGS,__VA_ARGS__));	\
+		if (__spec) {						\
+			kapi_validate_syscall_return(__spec, (s64)ret); \
+		}							\
+		return ret;						\
+	}								\
+	static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))
+#else /* !CONFIG_KAPI_RUNTIME_CHECKS */
 #define __SYSCALL_DEFINEx(x, name, ...)					\
 	__diag_push();							\
 	__diag_ignore(GCC, 8, "-Wattribute-alias",			\
@@ -262,6 +299,7 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
 	}								\
 	__diag_pop();							\
 	static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))
+#endif /* CONFIG_KAPI_RUNTIME_CHECKS */
 #endif /* __SYSCALL_DEFINEx */
 
 /* For split 64-bit arguments on 32-bit architectures */
diff --git a/init/Kconfig b/init/Kconfig
index fa79feb8fe57b..6a2a88de3aa84 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2191,6 +2191,8 @@ source "kernel/Kconfig.kexec"
 
 source "kernel/liveupdate/Kconfig"
 
+source "kernel/api/Kconfig"
+
 endmenu		# General setup
 
 source "arch/Kconfig"
diff --git a/kernel/Makefile b/kernel/Makefile
index e83669841b8cc..0531ed6973619 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -57,6 +57,9 @@ obj-y += dma/
 obj-y += entry/
 obj-y += unwind/
 obj-$(CONFIG_MODULES) += module/
+obj-$(CONFIG_KAPI_SPEC) += api/
+# Ensure api/ is always cleaned even when CONFIG_KAPI_SPEC is not set
+obj- += api/
 
 obj-$(CONFIG_KCMP) += kcmp.o
 obj-$(CONFIG_FREEZER) += freezer.o
diff --git a/kernel/api/Kconfig b/kernel/api/Kconfig
new file mode 100644
index 0000000000000..fde25ec70e134
--- /dev/null
+++ b/kernel/api/Kconfig
@@ -0,0 +1,35 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Kernel API Specification Framework Configuration
+#
+
+config KAPI_SPEC
+	bool "Kernel API Specification Framework"
+	help
+	  This option enables the kernel API specification framework,
+	  which provides formal documentation of kernel APIs in both
+	  human and machine-readable formats.
+
+	  The framework allows developers to document APIs inline with
+	  their implementation, including parameter specifications,
+	  return values, error conditions, locking requirements, and
+	  execution context constraints.
+
+	  When enabled, API specifications can be queried at runtime
+	  and exported in various formats (JSON, XML) through debugfs.
+
+	  If unsure, say N.
+
+config KAPI_RUNTIME_CHECKS
+	bool "Runtime API specification checks"
+	depends on KAPI_SPEC
+	depends on DEBUG_KERNEL
+	help
+	  Enable runtime validation of API usage against specifications.
+	  This includes checking execution context requirements, parameter
+	  validation, and lock state verification.
+
+	  This adds overhead and should only be used for debugging and
+	  development. The checks use WARN_ONCE to report violations.
+
+	  If unsure, say N.
diff --git a/kernel/api/Makefile b/kernel/api/Makefile
new file mode 100644
index 0000000000000..acab17c78afa3
--- /dev/null
+++ b/kernel/api/Makefile
@@ -0,0 +1,29 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for the Kernel API Specification Framework
+#
+
+# Core API specification framework
+obj-$(CONFIG_KAPI_SPEC)		+= kernel_api_spec.o
+
+# Auto-generated API specifications collector
+ifeq ($(CONFIG_KAPI_SPEC),y)
+obj-$(CONFIG_KAPI_SPEC)		+= generated_api_specs.o
+
+# Find all potential apispec files (this is evaluated at make time)
+apispec-files := $(shell find $(objtree) -name "*.apispec.h" -type f 2>/dev/null)
+
+# Generate the collector file
+# Note: FORCE ensures this is always regenerated to pick up new apispec files
+$(obj)/generated_api_specs.c: $(srctree)/scripts/generate_api_specs.sh FORCE
+	$(Q)$(CONFIG_SHELL) $< $(srctree) $(objtree) > $@
+
+targets += generated_api_specs.c
+
+# Add explicit dependency on the generator script
+$(obj)/generated_api_specs.o: $(obj)/generated_api_specs.c
+endif
+
+# Always clean generated files, regardless of config
+clean-files += generated_api_specs.c
+
diff --git a/kernel/api/kernel_api_spec.c b/kernel/api/kernel_api_spec.c
new file mode 100644
index 0000000000000..252000db38d2c
--- /dev/null
+++ b/kernel/api/kernel_api_spec.c
@@ -0,0 +1,1185 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * kernel_api_spec.c - Kernel API Specification Framework Implementation
+ *
+ * Provides runtime support for kernel API specifications including validation,
+ * export to various formats, and querying capabilities.
+ */
+
+#include <linux/kernel.h>
+#include <linux/kernel_api_spec.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/export.h>
+#include <linux/preempt.h>
+#include <linux/hardirq.h>
+#include <linux/file.h>
+#include <linux/fdtable.h>
+#include <linux/uaccess.h>
+#include <linux/limits.h>
+#include <linux/fcntl.h>
+#include <linux/mm.h>
+
+/* Section where API specifications are stored */
+extern struct kernel_api_spec __start_kapi_specs[];
+extern struct kernel_api_spec __stop_kapi_specs[];
+
+/* Dynamic API registration */
+static LIST_HEAD(dynamic_api_specs);
+static DEFINE_MUTEX(api_spec_mutex);
+
+struct dynamic_api_spec {
+	struct list_head list;
+	struct kernel_api_spec *spec;
+};
+
+/*
+ * __kapi_find_spec_locked - Internal lookup, caller must hold api_spec_mutex
+ */
+static const struct kernel_api_spec *__kapi_find_spec_locked(const char *name)
+{
+	struct kernel_api_spec *spec;
+	struct dynamic_api_spec *dyn_spec;
+
+	/* Search static specifications */
+	for (spec = __start_kapi_specs; spec < __stop_kapi_specs; spec++) {
+		if (strcmp(spec->name, name) == 0)
+			return spec;
+	}
+
+	/* Search dynamic specifications (mutex already held) */
+	list_for_each_entry(dyn_spec, &dynamic_api_specs, list) {
+		if (strcmp(dyn_spec->spec->name, name) == 0)
+			return dyn_spec->spec;
+	}
+
+	return NULL;
+}
+
+/**
+ * kapi_get_spec - Get API specification by name
+ * @name: Function name to look up
+ *
+ * Return: Pointer to API specification or NULL if not found
+ */
+const struct kernel_api_spec *kapi_get_spec(const char *name)
+{
+	const struct kernel_api_spec *spec;
+
+	mutex_lock(&api_spec_mutex);
+	spec = __kapi_find_spec_locked(name);
+	mutex_unlock(&api_spec_mutex);
+
+	return spec;
+}
+EXPORT_SYMBOL_GPL(kapi_get_spec);
+
+/**
+ * kapi_register_spec - Register a dynamic API specification
+ * @spec: API specification to register
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int kapi_register_spec(struct kernel_api_spec *spec)
+{
+	struct dynamic_api_spec *dyn_spec;
+	int ret = 0;
+
+	if (!spec || !spec->name[0])
+		return -EINVAL;
+
+	dyn_spec = kzalloc(sizeof(*dyn_spec), GFP_KERNEL);
+	if (!dyn_spec)
+		return -ENOMEM;
+
+	dyn_spec->spec = spec;
+
+	mutex_lock(&api_spec_mutex);
+
+	/* Check if already exists while holding lock to prevent races */
+	if (__kapi_find_spec_locked(spec->name)) {
+		ret = -EEXIST;
+		kfree(dyn_spec);
+	} else {
+		list_add_tail(&dyn_spec->list, &dynamic_api_specs);
+	}
+
+	mutex_unlock(&api_spec_mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kapi_register_spec);
+
+/**
+ * kapi_unregister_spec - Unregister a dynamic API specification
+ * @name: Name of API to unregister
+ */
+void kapi_unregister_spec(const char *name)
+{
+	struct dynamic_api_spec *dyn_spec, *tmp;
+
+	mutex_lock(&api_spec_mutex);
+	list_for_each_entry_safe(dyn_spec, tmp, &dynamic_api_specs, list) {
+		if (strcmp(dyn_spec->spec->name, name) == 0) {
+			list_del(&dyn_spec->list);
+			kfree(dyn_spec);
+			break;
+		}
+	}
+	mutex_unlock(&api_spec_mutex);
+}
+EXPORT_SYMBOL_GPL(kapi_unregister_spec);
+
+/**
+ * param_type_to_string - Convert parameter type to string
+ * @type: Parameter type
+ *
+ * Return: String representation of type
+ */
+static const char *param_type_to_string(enum kapi_param_type type)
+{
+	static const char * const type_names[] = {
+		[KAPI_TYPE_VOID] = "void",
+		[KAPI_TYPE_INT] = "int",
+		[KAPI_TYPE_UINT] = "uint",
+		[KAPI_TYPE_PTR] = "pointer",
+		[KAPI_TYPE_STRUCT] = "struct",
+		[KAPI_TYPE_UNION] = "union",
+		[KAPI_TYPE_ENUM] = "enum",
+		[KAPI_TYPE_FUNC_PTR] = "function_pointer",
+		[KAPI_TYPE_ARRAY] = "array",
+		[KAPI_TYPE_FD] = "file_descriptor",
+		[KAPI_TYPE_USER_PTR] = "user_pointer",
+		[KAPI_TYPE_PATH] = "pathname",
+		[KAPI_TYPE_CUSTOM] = "custom",
+	};
+
+	if (type >= ARRAY_SIZE(type_names))
+		return "unknown";
+
+	return type_names[type];
+}
+
+/**
+ * lock_type_to_string - Convert lock type to string
+ * @type: Lock type
+ *
+ * Return: String representation of lock type
+ */
+static const char *lock_type_to_string(enum kapi_lock_type type)
+{
+	static const char * const lock_names[] = {
+		[KAPI_LOCK_NONE] = "none",
+		[KAPI_LOCK_MUTEX] = "mutex",
+		[KAPI_LOCK_SPINLOCK] = "spinlock",
+		[KAPI_LOCK_RWLOCK] = "rwlock",
+		[KAPI_LOCK_SEQLOCK] = "seqlock",
+		[KAPI_LOCK_RCU] = "rcu",
+		[KAPI_LOCK_SEMAPHORE] = "semaphore",
+		[KAPI_LOCK_CUSTOM] = "custom",
+	};
+
+	if (type >= ARRAY_SIZE(lock_names))
+		return "unknown";
+
+	return lock_names[type];
+}
+
+/**
+ * lock_scope_to_string - Convert lock scope to string
+ * @scope: Lock scope
+ *
+ * Return: String representation of lock scope
+ */
+static const char *lock_scope_to_string(enum kapi_lock_scope scope)
+{
+	static const char * const scope_names[] = {
+		[KAPI_LOCK_INTERNAL] = "internal",
+		[KAPI_LOCK_ACQUIRES] = "acquires",
+		[KAPI_LOCK_RELEASES] = "releases",
+		[KAPI_LOCK_CALLER_HELD] = "caller_held",
+	};
+
+	if (scope >= ARRAY_SIZE(scope_names))
+		return "unknown";
+
+	return scope_names[scope];
+}
+
+/**
+ * return_check_type_to_string - Convert return check type to string
+ * @type: Return check type
+ *
+ * Return: String representation of return check type
+ */
+static const char *return_check_type_to_string(enum kapi_return_check_type type)
+{
+	static const char * const check_names[] = {
+		[KAPI_RETURN_EXACT] = "exact",
+		[KAPI_RETURN_RANGE] = "range",
+		[KAPI_RETURN_ERROR_CHECK] = "error_check",
+		[KAPI_RETURN_FD] = "file_descriptor",
+		[KAPI_RETURN_CUSTOM] = "custom",
+		[KAPI_RETURN_NO_RETURN] = "no_return",
+	};
+
+	if (type >= ARRAY_SIZE(check_names))
+		return "unknown";
+
+	return check_names[type];
+}
+
+/**
+ * capability_action_to_string - Convert capability action to string
+ * @action: Capability action
+ *
+ * Return: String representation of capability action
+ */
+static const char *capability_action_to_string(enum kapi_capability_action action)
+{
+	static const char * const action_names[] = {
+		[KAPI_CAP_BYPASS_CHECK] = "bypass_check",
+		[KAPI_CAP_INCREASE_LIMIT] = "increase_limit",
+		[KAPI_CAP_OVERRIDE_RESTRICTION] = "override_restriction",
+		[KAPI_CAP_GRANT_PERMISSION] = "grant_permission",
+		[KAPI_CAP_MODIFY_BEHAVIOR] = "modify_behavior",
+		[KAPI_CAP_ACCESS_RESOURCE] = "access_resource",
+		[KAPI_CAP_PERFORM_OPERATION] = "perform_operation",
+	};
+
+	if (action >= ARRAY_SIZE(action_names))
+		return "unknown";
+
+	return action_names[action];
+}
+
+/**
+ * kapi_export_json - Export API specification to JSON format
+ * @spec: API specification to export
+ * @buf: Buffer to write JSON to
+ * @size: Size of buffer
+ *
+ * Return: Number of bytes written or negative error
+ */
+int kapi_export_json(const struct kernel_api_spec *spec, char *buf, size_t size)
+{
+	int ret = 0;
+	int i;
+
+	if (!spec || !buf || size == 0)
+		return -EINVAL;
+
+	ret = scnprintf(buf, size,
+		"{\n"
+		"  \"name\": \"%s\",\n"
+		"  \"version\": %u,\n"
+		"  \"description\": \"%s\",\n"
+		"  \"long_description\": \"%s\",\n"
+		"  \"context_flags\": \"0x%x\",\n",
+		spec->name,
+		spec->version,
+		spec->description,
+		spec->long_description,
+		spec->context_flags);
+
+	/* Parameters */
+	ret += scnprintf(buf + ret, size - ret,
+		"  \"parameters\": [\n");
+
+	for (i = 0; i < spec->param_count && i < KAPI_MAX_PARAMS; i++) {
+		const struct kapi_param_spec *param = &spec->params[i];
+
+		ret += scnprintf(buf + ret, size - ret,
+			"    {\n"
+			"      \"name\": \"%s\",\n"
+			"      \"type\": \"%s\",\n"
+			"      \"type_class\": \"%s\",\n"
+			"      \"flags\": \"0x%x\",\n"
+			"      \"description\": \"%s\"\n"
+			"    }%s\n",
+			param->name,
+			param->type_name,
+			param_type_to_string(param->type),
+			param->flags,
+			param->description,
+			(i < spec->param_count - 1) ? "," : "");
+	}
+
+	ret += scnprintf(buf + ret, size - ret, "  ],\n");
+
+	/* Return value */
+	ret += scnprintf(buf + ret, size - ret,
+		"  \"return\": {\n"
+		"    \"type\": \"%s\",\n"
+		"    \"type_class\": \"%s\",\n"
+		"    \"check_type\": \"%s\",\n",
+		spec->return_spec.type_name,
+		param_type_to_string(spec->return_spec.type),
+		return_check_type_to_string(spec->return_spec.check_type));
+
+	switch (spec->return_spec.check_type) {
+	case KAPI_RETURN_EXACT:
+		ret += scnprintf(buf + ret, size - ret,
+			"    \"success_value\": %lld,\n",
+			spec->return_spec.success_value);
+		break;
+	case KAPI_RETURN_RANGE:
+		ret += scnprintf(buf + ret, size - ret,
+			"    \"success_min\": %lld,\n"
+			"    \"success_max\": %lld,\n",
+			spec->return_spec.success_min,
+			spec->return_spec.success_max);
+		break;
+	case KAPI_RETURN_ERROR_CHECK:
+		ret += scnprintf(buf + ret, size - ret,
+			"    \"error_count\": %u,\n",
+			spec->return_spec.error_count);
+		break;
+	default:
+		break;
+	}
+
+	ret += scnprintf(buf + ret, size - ret,
+		"    \"description\": \"%s\"\n"
+		"  },\n",
+		spec->return_spec.description);
+
+	/* Errors */
+	ret += scnprintf(buf + ret, size - ret,
+		"  \"errors\": [\n");
+
+	for (i = 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+		const struct kapi_error_spec *error = &spec->errors[i];
+
+		ret += scnprintf(buf + ret, size - ret,
+			"    {\n"
+			"      \"code\": %d,\n"
+			"      \"name\": \"%s\",\n"
+			"      \"condition\": \"%s\",\n"
+			"      \"description\": \"%s\"\n"
+			"    }%s\n",
+			error->error_code,
+			error->name,
+			error->condition,
+			error->description,
+			(i < spec->error_count - 1) ? "," : "");
+	}
+
+	ret += scnprintf(buf + ret, size - ret, "  ],\n");
+
+	/* Locks */
+	ret += scnprintf(buf + ret, size - ret,
+		"  \"locks\": [\n");
+
+	for (i = 0; i < spec->lock_count && i < KAPI_MAX_CONSTRAINTS; i++) {
+		const struct kapi_lock_spec *lock = &spec->locks[i];
+
+		ret += scnprintf(buf + ret, size - ret,
+			"    {\n"
+			"      \"name\": \"%s\",\n"
+			"      \"type\": \"%s\",\n"
+			"      \"scope\": \"%s\",\n"
+			"      \"description\": \"%s\"\n"
+			"    }%s\n",
+			lock->lock_name,
+			lock_type_to_string(lock->lock_type),
+			lock_scope_to_string(lock->scope),
+			lock->description,
+			(i < spec->lock_count - 1) ? "," : "");
+	}
+
+	ret += scnprintf(buf + ret, size - ret, "  ],\n");
+
+	/* Capabilities */
+	ret += scnprintf(buf + ret, size - ret,
+		"  \"capabilities\": [\n");
+
+	for (i = 0; i < spec->capability_count && i < KAPI_MAX_CAPABILITIES; i++) {
+		const struct kapi_capability_spec *cap = &spec->capabilities[i];
+
+		ret += scnprintf(buf + ret, size - ret,
+			"    {\n"
+			"      \"capability\": %d,\n"
+			"      \"name\": \"%s\",\n"
+			"      \"action\": \"%s\",\n"
+			"      \"allows\": \"%s\",\n"
+			"      \"without_cap\": \"%s\",\n"
+			"      \"check_condition\": \"%s\",\n"
+			"      \"priority\": %u",
+			cap->capability,
+			cap->cap_name,
+			capability_action_to_string(cap->action),
+			cap->allows,
+			cap->without_cap,
+			cap->check_condition,
+			cap->priority);
+
+		if (cap->alternative_count > 0) {
+			int j;
+			ret += scnprintf(buf + ret, size - ret,
+				",\n      \"alternatives\": [");
+			for (j = 0; j < cap->alternative_count; j++) {
+				ret += scnprintf(buf + ret, size - ret,
+					"%d%s", cap->alternative[j],
+					(j < cap->alternative_count - 1) ? ", " : "");
+			}
+			ret += scnprintf(buf + ret, size - ret, "]");
+		}
+
+		ret += scnprintf(buf + ret, size - ret,
+			"\n    }%s\n",
+			(i < spec->capability_count - 1) ? "," : "");
+	}
+
+	ret += scnprintf(buf + ret, size - ret, "  ],\n");
+
+	/* Additional info */
+	ret += scnprintf(buf + ret, size - ret,
+		"  \"since_version\": \"%s\",\n"
+		"  \"examples\": \"%s\",\n"
+		"  \"notes\": \"%s\"\n"
+		"}\n",
+		spec->since_version,
+		spec->examples,
+		spec->notes);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kapi_export_json);
+
+
+/**
+ * kapi_print_spec - Print API specification to kernel log
+ * @spec: API specification to print
+ */
+void kapi_print_spec(const struct kernel_api_spec *spec)
+{
+	int i;
+
+	if (!spec)
+		return;
+
+	pr_info("=== Kernel API Specification ===\n");
+	pr_info("Name: %s\n", spec->name);
+	pr_info("Version: %u\n", spec->version);
+	pr_info("Description: %s\n", spec->description);
+
+	if (spec->long_description[0])
+		pr_info("Long Description: %s\n", spec->long_description);
+
+	pr_info("Context Flags: 0x%x\n", spec->context_flags);
+
+	/* Parameters */
+	if (spec->param_count > 0) {
+		pr_info("Parameters:\n");
+		for (i = 0; i < spec->param_count && i < KAPI_MAX_PARAMS; i++) {
+			const struct kapi_param_spec *param = &spec->params[i];
+			pr_info("  [%d] %s: %s (flags: 0x%x)\n",
+				i, param->name, param->type_name, param->flags);
+			if (param->description[0])
+				pr_info("      Description: %s\n", param->description);
+		}
+	}
+
+	/* Return value */
+	pr_info("Return: %s\n", spec->return_spec.type_name);
+	if (spec->return_spec.description[0])
+		pr_info("  Description: %s\n", spec->return_spec.description);
+
+	/* Errors */
+	if (spec->error_count > 0) {
+		pr_info("Possible Errors:\n");
+		for (i = 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+			const struct kapi_error_spec *error = &spec->errors[i];
+			pr_info("  %s (%d): %s\n",
+				error->name, error->error_code, error->condition);
+		}
+	}
+
+	/* Capabilities */
+	if (spec->capability_count > 0) {
+		pr_info("Capabilities:\n");
+		for (i = 0; i < spec->capability_count && i < KAPI_MAX_CAPABILITIES; i++) {
+			const struct kapi_capability_spec *cap = &spec->capabilities[i];
+			pr_info("  %s (%d):\n", cap->cap_name, cap->capability);
+			pr_info("    Action: %s\n", capability_action_to_string(cap->action));
+			pr_info("    Allows: %s\n", cap->allows);
+			pr_info("    Without: %s\n", cap->without_cap);
+			if (cap->check_condition[0])
+				pr_info("    Condition: %s\n", cap->check_condition);
+		}
+	}
+
+	pr_info("================================\n");
+}
+EXPORT_SYMBOL_GPL(kapi_print_spec);
+
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+
+/**
+ * kapi_validate_fd - Validate that a file descriptor is valid in current context
+ * @fd: File descriptor to validate
+ *
+ * Return: true if fd is valid in current process context, false otherwise
+ */
+static bool kapi_validate_fd(int fd)
+{
+	struct fd f;
+
+	/* Special case: AT_FDCWD is always valid */
+	if (fd == AT_FDCWD)
+		return true;
+
+	/* Check basic range */
+	if (fd < 0)
+		return false;
+
+	/* Check if fd is valid in current process context */
+	f = fdget(fd);
+	if (fd_empty(f)) {
+		return false;
+	}
+
+	/* fd is valid, release reference */
+	fdput(f);
+	return true;
+}
+
+/**
+ * kapi_validate_user_ptr - Validate that a user pointer is accessible
+ * @ptr: User pointer to validate
+ * @size: Size in bytes to validate
+ *
+ * Return: true if user memory is accessible, false otherwise
+ */
+static bool kapi_validate_user_ptr(const void __user *ptr, size_t size)
+{
+	/* NULL pointers are not valid; caller handles optional case */
+	if (!ptr)
+		return false;
+
+	return access_ok(ptr, size);
+}
+
+/**
+ * kapi_validate_user_ptr_with_params - Validate user pointer with dynamic size
+ * @param_spec: Parameter specification
+ * @ptr: User pointer to validate
+ * @all_params: Array of all parameter values
+ * @param_count: Number of parameters
+ *
+ * Return: true if user memory is accessible, false otherwise
+ */
+static bool kapi_validate_user_ptr_with_params(const struct kapi_param_spec *param_spec,
+						const void __user *ptr,
+						const s64 *all_params,
+						int param_count)
+{
+	size_t actual_size;
+
+	/* NULL is allowed for optional parameters */
+	if (!ptr && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+		return true;
+
+	/* Calculate actual size based on related parameter */
+	if (param_spec->size_param_idx >= 0 &&
+	    param_spec->size_param_idx < param_count) {
+		s64 count = all_params[param_spec->size_param_idx];
+
+		/* Validate count is positive */
+		if (count <= 0) {
+			pr_warn("Parameter %s: size determinant is non-positive (%lld)\n",
+				param_spec->name, count);
+			return false;
+		}
+
+		/* Check for multiplication overflow */
+		if (param_spec->size_multiplier > 0 &&
+		    count > SIZE_MAX / param_spec->size_multiplier) {
+			pr_warn("Parameter %s: size calculation overflow\n",
+				param_spec->name);
+			return false;
+		}
+
+		actual_size = count * param_spec->size_multiplier;
+	} else {
+		/* Use fixed size */
+		actual_size = param_spec->size;
+	}
+
+	return kapi_validate_user_ptr(ptr, actual_size);
+}
+
+/**
+ * kapi_validate_path - Validate that a pathname is accessible and within limits
+ * @path: User pointer to pathname
+ * @param_spec: Parameter specification
+ *
+ * Return: true if path is valid, false otherwise
+ */
+static bool kapi_validate_path(const char __user *path,
+				const struct kapi_param_spec *param_spec)
+{
+	size_t len;
+
+	/* NULL is allowed for optional parameters */
+	if (!path && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+		return true;
+
+	if (!path) {
+		pr_warn("Parameter %s: NULL path not allowed\n", param_spec->name);
+		return false;
+	}
+
+	/* Check if the path is accessible */
+	if (!access_ok(path, 1)) {
+		pr_warn("Parameter %s: path pointer %p not accessible\n",
+			param_spec->name, path);
+		return false;
+	}
+
+	/* Use strnlen_user to get the length and validate accessibility */
+	len = strnlen_user(path, PATH_MAX + 1);
+	if (len == 0) {
+		pr_warn("Parameter %s: invalid path pointer %p\n",
+			param_spec->name, path);
+		return false;
+	}
+
+	/* Check path length limit */
+	if (len > PATH_MAX) {
+		pr_warn("Parameter %s: path too long (exceeds PATH_MAX)\n",
+			param_spec->name);
+		return false;
+	}
+
+	return true;
+}
+
+/**
+ * kapi_validate_user_string - Validate a userspace null-terminated string
+ * @str: User pointer to string
+ * @param_spec: Parameter specification containing length constraints
+ *
+ * Validates that the userspace string pointer is accessible and that the
+ * string length (excluding null terminator) is within the range specified
+ * by min_value and max_value in the parameter specification.
+ *
+ * Return: true if string is valid, false otherwise
+ */
+static bool kapi_validate_user_string(const char __user *str,
+				       const struct kapi_param_spec *param_spec)
+{
+	size_t len;
+	size_t max_check_len;
+
+	/* NULL is allowed for optional parameters */
+	if (!str && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+		return true;
+
+	if (!str) {
+		pr_warn("Parameter %s: NULL string not allowed\n", param_spec->name);
+		return false;
+	}
+
+	/* Check if the string pointer is accessible */
+	if (!access_ok(str, 1)) {
+		pr_warn("Parameter %s: string pointer %p not accessible\n",
+			param_spec->name, str);
+		return false;
+	}
+
+	/*
+	 * Use strnlen_user to get the string length and validate accessibility.
+	 * Check up to max_value + 1 to detect strings that are too long.
+	 * If max_value is 0 or unset, use PATH_MAX as a reasonable default.
+	 */
+	max_check_len = param_spec->max_value > 0 ?
+			(size_t)param_spec->max_value + 1 : PATH_MAX + 1;
+	len = strnlen_user(str, max_check_len);
+
+	if (len == 0) {
+		pr_warn("Parameter %s: invalid string pointer %p\n",
+			param_spec->name, str);
+		return false;
+	}
+
+	/*
+	 * strnlen_user returns the length including the null terminator.
+	 * Convert to string length (excluding terminator) for range check.
+	 */
+	len--;
+
+	/* Check minimum length constraint */
+	if (param_spec->min_value > 0 && len < (size_t)param_spec->min_value) {
+		pr_warn("Parameter %s: string too short (%zu < %lld)\n",
+			param_spec->name, len, param_spec->min_value);
+		return false;
+	}
+
+	/* Check maximum length constraint */
+	if (param_spec->max_value > 0 && len > (size_t)param_spec->max_value) {
+		pr_warn("Parameter %s: string too long (%zu > %lld)\n",
+			param_spec->name, len, param_spec->max_value);
+		return false;
+	}
+
+	return true;
+}
+
+/**
+ * kapi_validate_user_ptr_constraint - Validate a userspace pointer with size
+ * @ptr: User pointer to validate
+ * @param_spec: Parameter specification containing size
+ *
+ * Validates that the userspace pointer is accessible and that the memory
+ * region of the specified size can be accessed. The size is taken from
+ * the param_spec->size field.
+ *
+ * Return: true if pointer is valid, false otherwise
+ */
+static bool kapi_validate_user_ptr_constraint(const void __user *ptr,
+					       const struct kapi_param_spec *param_spec)
+{
+	/* NULL is allowed for optional parameters */
+	if (!ptr && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+		return true;
+
+	if (!ptr) {
+		pr_warn("Parameter %s: NULL pointer not allowed\n", param_spec->name);
+		return false;
+	}
+
+	/* Validate size is specified */
+	if (param_spec->size == 0) {
+		pr_warn("Parameter %s: size not specified for user pointer validation\n",
+			param_spec->name);
+		return false;
+	}
+
+	/* Check if the memory region is accessible */
+	if (!access_ok(ptr, param_spec->size)) {
+		pr_warn("Parameter %s: user pointer %p not accessible for %zu bytes\n",
+			param_spec->name, ptr, param_spec->size);
+		return false;
+	}
+
+	return true;
+}
+
+/**
+ * kapi_validate_param - Validate a parameter against its specification
+ * @param_spec: Parameter specification
+ * @value: Parameter value to validate
+ *
+ * Return: true if valid, false otherwise
+ */
+bool kapi_validate_param(const struct kapi_param_spec *param_spec, s64 value)
+{
+	int i;
+
+	/* Special handling for file descriptor type */
+	if (param_spec->type == KAPI_TYPE_FD) {
+		if (!kapi_validate_fd((int)value)) {
+			pr_warn("Parameter %s: invalid file descriptor %lld\n",
+				param_spec->name, value);
+			return false;
+		}
+		/* Continue with additional constraint checks if needed */
+	}
+
+	/* Special handling for user pointer type */
+	if (param_spec->type == KAPI_TYPE_USER_PTR) {
+		const void __user *ptr = (const void __user *)value;
+
+		/* NULL is allowed for optional parameters */
+		if (!ptr && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+			return true;
+
+		if (!kapi_validate_user_ptr(ptr, param_spec->size)) {
+			pr_warn("Parameter %s: invalid user pointer %p (size: %zu)\n",
+				param_spec->name, ptr, param_spec->size);
+			return false;
+		}
+		/* Continue with additional constraint checks if needed */
+	}
+
+	/* Special handling for path type */
+	if (param_spec->type == KAPI_TYPE_PATH) {
+		const char __user *path = (const char __user *)value;
+
+		if (!kapi_validate_path(path, param_spec)) {
+			return false;
+		}
+		/* Continue with additional constraint checks if needed */
+	}
+
+	switch (param_spec->constraint_type) {
+	case KAPI_CONSTRAINT_NONE:
+		return true;
+
+	case KAPI_CONSTRAINT_RANGE:
+		if (value < param_spec->min_value || value > param_spec->max_value) {
+			pr_warn("Parameter %s value %lld out of range [%lld, %lld]\n",
+				param_spec->name, value,
+				param_spec->min_value, param_spec->max_value);
+			return false;
+		}
+		return true;
+
+	case KAPI_CONSTRAINT_MASK:
+		if (value & ~param_spec->valid_mask) {
+			pr_warn("Parameter %s value 0x%llx contains invalid bits (valid mask: 0x%llx)\n",
+				param_spec->name, value, param_spec->valid_mask);
+			return false;
+		}
+		return true;
+
+	case KAPI_CONSTRAINT_ENUM:
+		if (!param_spec->enum_values || param_spec->enum_count == 0)
+			return true;
+
+		for (i = 0; i < param_spec->enum_count; i++) {
+			if (value == param_spec->enum_values[i])
+				return true;
+		}
+		pr_warn("Parameter %s value %lld not in valid enumeration\n",
+			param_spec->name, value);
+		return false;
+
+	case KAPI_CONSTRAINT_ALIGNMENT:
+		if (param_spec->alignment == 0) {
+			pr_warn("Parameter %s: alignment constraint specified but alignment is 0\n",
+				param_spec->name);
+			return false;
+		}
+		if (value & (param_spec->alignment - 1)) {
+			pr_warn("Parameter %s value 0x%llx not aligned to %zu boundary\n",
+				param_spec->name, value, param_spec->alignment);
+			return false;
+		}
+		return true;
+
+	case KAPI_CONSTRAINT_POWER_OF_TWO:
+		if (value == 0 || (value & (value - 1))) {
+			pr_warn("Parameter %s value %lld is not a power of two\n",
+				param_spec->name, value);
+			return false;
+		}
+		return true;
+
+	case KAPI_CONSTRAINT_PAGE_ALIGNED:
+		if (value & (PAGE_SIZE - 1)) {
+			pr_warn("Parameter %s value 0x%llx not page-aligned (PAGE_SIZE=%ld)\n",
+				param_spec->name, value, PAGE_SIZE);
+			return false;
+		}
+		return true;
+
+	case KAPI_CONSTRAINT_NONZERO:
+		if (value == 0) {
+			pr_warn("Parameter %s must be non-zero\n", param_spec->name);
+			return false;
+		}
+		return true;
+
+	case KAPI_CONSTRAINT_USER_STRING:
+		return kapi_validate_user_string((const char __user *)value, param_spec);
+
+	case KAPI_CONSTRAINT_USER_PATH:
+		return kapi_validate_path((const char __user *)value, param_spec);
+
+	case KAPI_CONSTRAINT_USER_PTR:
+		return kapi_validate_user_ptr_constraint((const void __user *)value, param_spec);
+
+	case KAPI_CONSTRAINT_CUSTOM:
+		if (param_spec->validate)
+			return param_spec->validate(value);
+		return true;
+
+	default:
+		return true;
+	}
+}
+EXPORT_SYMBOL_GPL(kapi_validate_param);
+
+/**
+ * kapi_validate_param_with_context - Validate parameter with access to all params
+ * @param_spec: Parameter specification
+ * @value: Parameter value to validate
+ * @all_params: Array of all parameter values
+ * @param_count: Number of parameters
+ *
+ * Return: true if valid, false otherwise
+ */
+bool kapi_validate_param_with_context(const struct kapi_param_spec *param_spec,
+				       s64 value, const s64 *all_params, int param_count)
+{
+	/* Special handling for user pointer type with dynamic sizing */
+	if (param_spec->type == KAPI_TYPE_USER_PTR) {
+		const void __user *ptr = (const void __user *)value;
+
+		/* NULL is allowed for optional parameters */
+		if (!ptr && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+			return true;
+
+		if (!kapi_validate_user_ptr_with_params(param_spec, ptr, all_params, param_count)) {
+			pr_warn("Parameter %s: invalid user pointer %p\n",
+				param_spec->name, ptr);
+			return false;
+		}
+		/* Continue with additional constraint checks if needed */
+	}
+
+	/* For other types, fall back to regular validation */
+	return kapi_validate_param(param_spec, value);
+}
+EXPORT_SYMBOL_GPL(kapi_validate_param_with_context);
+
+/**
+ * kapi_validate_syscall_param - Validate syscall parameter with enforcement
+ * @spec: API specification
+ * @param_idx: Parameter index
+ * @value: Parameter value
+ *
+ * Return: -EINVAL if invalid, 0 if valid
+ */
+int kapi_validate_syscall_param(const struct kernel_api_spec *spec,
+				 int param_idx, s64 value)
+{
+	const struct kapi_param_spec *param_spec;
+
+	if (!spec || param_idx >= spec->param_count)
+		return 0;
+
+	param_spec = &spec->params[param_idx];
+
+	if (!kapi_validate_param(param_spec, value)) {
+		if (strncmp(spec->name, "sys_", 4) == 0) {
+			/* For syscalls, we can return EINVAL to userspace */
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_syscall_param);
+
+/**
+ * kapi_validate_syscall_params - Validate all syscall parameters together
+ * @spec: API specification
+ * @params: Array of parameter values
+ * @param_count: Number of parameters
+ *
+ * Return: -EINVAL if any parameter is invalid, 0 if all valid
+ */
+int kapi_validate_syscall_params(const struct kernel_api_spec *spec,
+				 const s64 *params, int param_count)
+{
+	int i;
+
+	if (!spec || !params)
+		return 0;
+
+	/* Validate that we have the expected number of parameters */
+	if (param_count != spec->param_count) {
+		pr_warn("API %s: parameter count mismatch (expected %u, got %d)\n",
+			spec->name, spec->param_count, param_count);
+		return -EINVAL;
+	}
+
+	/* Validate each parameter with context */
+	for (i = 0; i < spec->param_count && i < KAPI_MAX_PARAMS; i++) {
+		const struct kapi_param_spec *param_spec = &spec->params[i];
+
+		if (!kapi_validate_param_with_context(param_spec, params[i], params, param_count)) {
+			if (strncmp(spec->name, "sys_", 4) == 0) {
+				/* For syscalls, we can return EINVAL to userspace */
+				return -EINVAL;
+			}
+		}
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_syscall_params);
+
+/**
+ * kapi_check_return_success - Check if return value indicates success
+ * @return_spec: Return specification
+ * @retval: Return value to check
+ *
+ * Returns true if the return value indicates success according to the spec.
+ */
+bool kapi_check_return_success(const struct kapi_return_spec *return_spec, s64 retval)
+{
+	u32 i;
+
+	if (!return_spec)
+		return true; /* No spec means we can't validate */
+
+	switch (return_spec->check_type) {
+	case KAPI_RETURN_EXACT:
+		return retval == return_spec->success_value;
+
+	case KAPI_RETURN_RANGE:
+		return retval >= return_spec->success_min &&
+		       retval <= return_spec->success_max;
+
+	case KAPI_RETURN_ERROR_CHECK:
+		/* Success if NOT in error list */
+		if (return_spec->error_values) {
+			for (i = 0; i < return_spec->error_count; i++) {
+				if (retval == return_spec->error_values[i])
+					return false; /* Found in error list */
+			}
+		}
+		return true; /* Not in error list = success */
+
+	case KAPI_RETURN_FD:
+		/* File descriptors: >= 0 is success, < 0 is error */
+		return retval >= 0;
+
+	case KAPI_RETURN_CUSTOM:
+		if (return_spec->is_success)
+			return return_spec->is_success(retval);
+		fallthrough;
+
+	default:
+		return true; /* Unknown check type, assume success */
+	}
+}
+EXPORT_SYMBOL_GPL(kapi_check_return_success);
+
+/**
+ * kapi_validate_return_value - Validate that return value matches spec
+ * @spec: API specification
+ * @retval: Return value to validate
+ *
+ * Return: true if return value is valid according to spec, false otherwise.
+ *
+ * This function checks:
+ * 1. If the value indicates success, it must match the success criteria
+ * 2. If the value indicates error, it must be one of the specified error codes
+ */
+bool kapi_validate_return_value(const struct kernel_api_spec *spec, s64 retval)
+{
+	int i;
+	bool is_success;
+
+	if (!spec)
+		return true; /* No spec means we can't validate */
+
+	/* First check if this is a success return */
+	is_success = kapi_check_return_success(&spec->return_spec, retval);
+
+	if (is_success) {
+		/* Special validation for file descriptor returns */
+		if (spec->return_spec.check_type == KAPI_RETURN_FD) {
+			/* For successful FD returns, validate it's a valid FD */
+			if (!kapi_validate_fd((int)retval)) {
+				pr_warn("API %s returned invalid file descriptor %lld\n",
+					spec->name, retval);
+				return false;
+			}
+		}
+		return true;
+	}
+
+	/* Error case - check if it's one of the specified errors */
+	if (spec->error_count == 0) {
+		/* No errors specified, so any error is potentially valid */
+		pr_debug("API %s returned unspecified error %lld\n",
+			 spec->name, retval);
+		return true;
+	}
+
+	/* Check if the error is in our list of specified errors */
+	for (i = 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+		if (retval == spec->errors[i].error_code)
+			return true;
+	}
+
+	/* Error not in spec */
+	pr_warn("API %s returned unspecified error code %lld. Valid errors are:\n",
+		spec->name, retval);
+	for (i = 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+		pr_warn("  %s (%d): %s\n",
+			spec->errors[i].name,
+			spec->errors[i].error_code,
+			spec->errors[i].condition);
+	}
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_return_value);
+
+/**
+ * kapi_validate_syscall_return - Validate syscall return value with enforcement
+ * @spec: API specification
+ * @retval: Return value
+ *
+ * Return: 0 if valid, -EINVAL if the return value doesn't match spec
+ *
+ * For syscalls, this can help detect kernel bugs where unspecified error
+ * codes are returned to userspace.
+ */
+int kapi_validate_syscall_return(const struct kernel_api_spec *spec, s64 retval)
+{
+	if (!spec)
+		return 0;
+
+	if (!kapi_validate_return_value(spec, retval)) {
+		/* Log the violation but don't change the return value */
+		WARN_ONCE(1, "Syscall %s returned unspecified value %lld\n",
+			  spec->name, retval);
+		/* Could return -EINVAL here to enforce, but that might break userspace */
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_syscall_return);
+
+/**
+ * kapi_check_context - Check if current context matches API requirements
+ * @spec: API specification to check against
+ */
+void kapi_check_context(const struct kernel_api_spec *spec)
+{
+	u32 ctx = spec->context_flags;
+	bool valid = false;
+
+	if (!ctx)
+		return;
+
+	/* Check if we're in an allowed context */
+	if ((ctx & KAPI_CTX_PROCESS) && !in_interrupt())
+		valid = true;
+
+	if ((ctx & KAPI_CTX_SOFTIRQ) && in_softirq())
+		valid = true;
+
+	if ((ctx & KAPI_CTX_HARDIRQ) && in_hardirq())
+		valid = true;
+
+	if ((ctx & KAPI_CTX_NMI) && in_nmi())
+		valid = true;
+
+	if (!valid) {
+		WARN_ONCE(1, "API %s called from invalid context\n", spec->name);
+	}
+
+	/* Check specific requirements */
+	if ((ctx & KAPI_CTX_ATOMIC) && preemptible()) {
+		WARN_ONCE(1, "API %s requires atomic context\n", spec->name);
+	}
+
+	if ((ctx & KAPI_CTX_SLEEPABLE) && !preemptible()) {
+		WARN_ONCE(1, "API %s requires sleepable context\n", spec->name);
+	}
+}
+EXPORT_SYMBOL_GPL(kapi_check_context);
+
+#endif /* CONFIG_KAPI_RUNTIME_CHECKS */
diff --git a/scripts/generate_api_specs.sh b/scripts/generate_api_specs.sh
new file mode 100755
index 0000000000000..2c3078a508fef
--- /dev/null
+++ b/scripts/generate_api_specs.sh
@@ -0,0 +1,18 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Stub script for generating API specifications collector
+# This is a placeholder until the full implementation is available
+#
+
+cat << 'EOF'
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Auto-generated API specifications collector (stub)
+ * Generated by scripts/generate_api_specs.sh
+ */
+
+#include <linux/kernel_api_spec.h>
+
+/* No API specifications collected yet */
+EOF
-- 
2.51.0


^ permalink raw reply related

* [RFC PATCH v5 00/15] Kernel API Specification Framework
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
  To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin

This proposal introduces machinery for documenting kernel APIs, addressing the
long-standing challenge of maintaining stable interfaces between the kernel and
user-space programs. Despite the kernel's commitment to never breaking user
space, the lack of machine-readable API specifications has led to breakages and
across system calls and IOCTLs.

Specifications can document parameter types, valid ranges, constraints, and
alignment requirements. They capture return value semantics including success
conditions and error codes with their meaning. Execution context requirements,
capabilities, locking constraints, signal handling behavior, and side effects
can all be formally specified.

These specifications live alongside the code they document and are both
human-readable and machine-parseable. They can be validated at runtime when
CONFIG_KAPI_RUNTIME_CHECKS is enabled, exported via debugfs for userspace
tools, and extracted from either vmlinux or source code.

This enables static analysis tools to verify userspace API usage at compile
time, test generation based on formal specifications, consistent error handling
validation, automated documentation generation, and formal verification of
kernel interfaces.

The implementation includes a core framework with ELF section storage,
kerneldoc integration for inline specification, a debugfs interface for runtime
querying, and a Rust-based extraction tool (tools/kapi) supporting JSON, RST,
and plain text output formats. Example specifications are provided for async
I/O, xattr, and basic file operation syscalls.

The series with runtime testing enabled (CONFIG_KAPI_RUNTIME_CHECKS=y)
currently survives LTP tests in a KVM VM.

I am not looking for a review of the actual specs just yet. Those are
LLM generated, and the main focus right now is the format of the spec
rather than the individual specs. Having said that, there is a set with
more syscall specs (100+) available at
git://git.kernel.org/pub/scm/linux/kernel/git/sashal/linux.git branch
"spec" for folks who are interested in testing it out.

Changes since RFC v4:

- Rebase on top of v6.19-rc1, addressing a few conflict resulting of how
  Documentation/ now handles python.
- Moved documentation from Documentation/admin-guide/ to
  Documentation/dev-tools/ (Randy Dunlap)
- Different examples of specs to hopefully allow for better review.
- Simplified architecture support by using generic vmlinux.lds.h instead of
  per-architecture linker script modifications.

Sasha Levin (15):
  kernel/api: introduce kernel API specification framework
  kernel/api: enable kerneldoc-based API specifications
  kernel/api: add debugfs interface for kernel API specifications
  tools/kapi: Add kernel API specification extraction tool
  kernel/api: add API specification for io_setup
  kernel/api: add API specification for io_destroy
  kernel/api: add API specification for io_submit
  kernel/api: add API specification for io_cancel
  kernel/api: add API specification for setxattr
  kernel/api: add API specification for lsetxattr
  kernel/api: add API specification for fsetxattr
  kernel/api: add API specification for sys_open
  kernel/api: add API specification for sys_close
  kernel/api: add API specification for sys_read
  kernel/api: add API specification for sys_write

 .gitignore                                    |    1 +
 Documentation/dev-tools/kernel-api-spec.rst   |  699 ++++++++
 MAINTAINERS                                   |    9 +
 fs/aio.c                                      |  982 +++++++++-
 fs/open.c                                     |  565 +++++-
 fs/read_write.c                               |  664 +++++++
 fs/xattr.c                                    |  959 ++++++++++
 include/asm-generic/vmlinux.lds.h             |   28 +
 include/linux/kernel_api_spec.h               | 1597 +++++++++++++++++
 include/linux/syscall_api_spec.h              |  198 ++
 include/linux/syscalls.h                      |   38 +
 init/Kconfig                                  |    2 +
 kernel/Makefile                               |    3 +
 kernel/api/Kconfig                            |   55 +
 kernel/api/Makefile                           |   32 +
 kernel/api/kapi_debugfs.c                     |  358 ++++
 kernel/api/kernel_api_spec.c                  | 1185 ++++++++++++
 scripts/Makefile.build                        |   28 +
 scripts/Makefile.clean                        |    3 +
 scripts/generate_api_specs.sh                 |   87 +
 scripts/kernel-doc.py                         |    5 +
 tools/kapi/.gitignore                         |    4 +
 tools/kapi/Cargo.toml                         |   19 +
 tools/kapi/src/extractor/debugfs.rs           |  442 +++++
 tools/kapi/src/extractor/kerneldoc_parser.rs  |  692 +++++++
 tools/kapi/src/extractor/mod.rs               |  464 +++++
 tools/kapi/src/extractor/source_parser.rs     |  213 +++
 .../src/extractor/vmlinux/binary_utils.rs     |  180 ++
 .../src/extractor/vmlinux/magic_finder.rs     |  102 ++
 tools/kapi/src/extractor/vmlinux/mod.rs       |  864 +++++++++
 tools/kapi/src/formatter/json.rs              |  468 +++++
 tools/kapi/src/formatter/mod.rs               |  140 ++
 tools/kapi/src/formatter/plain.rs             |  549 ++++++
 tools/kapi/src/formatter/rst.rs               |  621 +++++++
 tools/kapi/src/main.rs                        |  116 ++
 tools/lib/python/kdoc/kdoc_apispec.py         |  755 ++++++++
 tools/lib/python/kdoc/kdoc_output.py          |    9 +-
 tools/lib/python/kdoc/kdoc_parser.py          |   86 +-
 38 files changed, 13176 insertions(+), 46 deletions(-)
 create mode 100644 Documentation/dev-tools/kernel-api-spec.rst
 create mode 100644 include/linux/kernel_api_spec.h
 create mode 100644 include/linux/syscall_api_spec.h
 create mode 100644 kernel/api/Kconfig
 create mode 100644 kernel/api/Makefile
 create mode 100644 kernel/api/kapi_debugfs.c
 create mode 100644 kernel/api/kernel_api_spec.c
 create mode 100755 scripts/generate_api_specs.sh
 create mode 100644 tools/kapi/.gitignore
 create mode 100644 tools/kapi/Cargo.toml
 create mode 100644 tools/kapi/src/extractor/debugfs.rs
 create mode 100644 tools/kapi/src/extractor/kerneldoc_parser.rs
 create mode 100644 tools/kapi/src/extractor/mod.rs
 create mode 100644 tools/kapi/src/extractor/source_parser.rs
 create mode 100644 tools/kapi/src/extractor/vmlinux/binary_utils.rs
 create mode 100644 tools/kapi/src/extractor/vmlinux/magic_finder.rs
 create mode 100644 tools/kapi/src/extractor/vmlinux/mod.rs
 create mode 100644 tools/kapi/src/formatter/json.rs
 create mode 100644 tools/kapi/src/formatter/mod.rs
 create mode 100644 tools/kapi/src/formatter/plain.rs
 create mode 100644 tools/kapi/src/formatter/rst.rs
 create mode 100644 tools/kapi/src/main.rs
 create mode 100644 tools/lib/python/kdoc/kdoc_apispec.py

-- 
2.51.0


^ permalink raw reply

* Re: [PATCH v23 2/8] Documentation: userspace-api: Add shadow stack API documentation
From: Randy Dunlap @ 2025-12-18 18:55 UTC (permalink / raw)
  To: Mark Brown, Rick P. Edgecombe, Deepak Gupta, H.J. Lu,
	Florian Weimer, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Christian Brauner, Shuah Khan
  Cc: linux-kernel, Catalin Marinas, Will Deacon, jannh, Andrew Morton,
	Yury Khrustalev, Adhemerval Zanella Netto, Wilco Dijkstra,
	CarlosO'Donell, Rich Felker, linux-kselftest, linux-api,
	Kees Cook, Shuah Khan
In-Reply-To: <20251218-clone3-shadow-stack-v23-2-7cb318fbb385@kernel.org>



On 12/18/25 12:10 AM, Mark Brown wrote:
> There are a number of architectures with shadow stack features which we are
> presenting to userspace with as consistent an API as we can (though there
> are some architecture specifics). Especially given that there are some
> important considerations for userspace code interacting directly with the
> feature let's provide some documentation covering the common aspects.
> 
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Reviewed-by: Kees Cook <kees@kernel.org>
> Tested-by: Kees Cook <kees@kernel.org>
> Acked-by: Shuah Khan <skhan@linuxfoundation.org>
> Acked-by: Yury Khrustalev <yury.khrustalev@arm.com>
> Reviewed-by: Deepak Gupta <debug@rivosinc.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Mark Brown <broonie@kernel.org>
> ---
>  Documentation/userspace-api/index.rst        |  1 +
>  Documentation/userspace-api/shadow_stack.rst | 44 ++++++++++++++++++++++++++++
>  2 files changed, 45 insertions(+)
> 
> diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
> index 8a61ac4c1bf1..64b0099ee161 100644
> --- a/Documentation/userspace-api/index.rst
> +++ b/Documentation/userspace-api/index.rst
> @@ -63,6 +63,7 @@ Everything else
>     ELF
>     liveupdate
>     netlink/index
> +   shadow_stack
>     sysfs-platform_profile
>     vduse
>     futex2
> diff --git a/Documentation/userspace-api/shadow_stack.rst b/Documentation/userspace-api/shadow_stack.rst
> new file mode 100644
> index 000000000000..42617d0470ba
> --- /dev/null
> +++ b/Documentation/userspace-api/shadow_stack.rst
> @@ -0,0 +1,44 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=============
> +Shadow Stacks
> +=============
> +
> +Introduction
> +============
> +
> +Several architectures have features which provide backward edge
> +control flow protection through a hardware maintained stack, only
> +writable by userspace through very limited operations.  This feature
> +is referred to as shadow stacks on Linux, on x86 it is part of Intel
> +Control Enforcement Technology (CET), on arm64 it is Guarded Control
> +Stacks feature (FEAT_GCS) and for RISC-V it is the Zicfiss extension.
> +It is expected that this feature will normally be managed by the
> +system dynamic linker and libc in ways broadly transparent to
> +application code, this document covers interfaces and considerations.
> +
> +
> +Enabling
> +========
> +
> +Shadow stacks default to disabled when a userspace process is
> +executed, they can be enabled for the current thread with a syscall:
> +
> + - For x86 the ARCH_SHSTK_ENABLE arch_prctl()
> + - For other architectures the PR_SET_SHADOW_STACK_ENABLE prctl()
> +
> +It is expected that this will normally be done by the dynamic linker.
> +Any new threads created by a thread with shadow stacks enabled will
> +themselves have shadow stacks enabled.
> +
> +
> +Enablement considerations
> +=========================
> +
> +- Returning from the function that enables shadow stacks without first
> +  disabling them will cause a shadow stack exception.  This includes
> +  any syscall wrapper or other library functions, the syscall will need
> +  to be inlined.
> +- A lock feature allows userspace to prevent disabling of shadow stacks.
> +- Those that change the stack context like longjmp() or use of ucontext

     Those <what>?  maybe syscalls, library calls, specific functon calls?

> +  changes on signal return will need support from libc.
> 

-- 
~Randy


^ permalink raw reply

* Re: [PATCH 1/6] uapi: promote EFSCORRUPTED and EUCLEAN to errno.h
From: Darrick J. Wong @ 2025-12-18 18:45 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Christoph Hellwig, brauner, linux-api, linux-ext4, jack,
	linux-xfs, linux-fsdevel, gabriel, amir73il, linux-man
In-Reply-To: <vffjgmklao4a7cb7r7nky7umh2pdyvzdpv4hzi5j3j5lgxxwsp@6phylpwucm2k>

On Thu, Dec 18, 2025 at 12:04:11PM +0100, Alejandro Colomar wrote:
> Hi Christoph,
> 
> On Wed, Dec 17, 2025 at 09:17:15PM -0800, Christoph Hellwig wrote:
> > On Wed, Dec 17, 2025 at 06:02:56PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Stop definining these privately and instead move them to the uapi
> > > errno.h so that they become canonical instead of copy pasta.
> > 
> > Sounds fine:
> > 
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > 
> > Do we need to document these overlay errnos in the man man pages,
> > though?
> 
> I'd say yes.  errno(3) is where these are documented.  Thanks for CCing.

I'll send a manpage update, thanks!

--D

> 
> Have a lovely day!
> Alex
> 
> > 
> > > 
> > > Cc: linux-api@vger.kernel.org
> > > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> > > ---
> > >  arch/alpha/include/uapi/asm/errno.h        |    2 ++
> > >  arch/mips/include/uapi/asm/errno.h         |    2 ++
> > >  arch/parisc/include/uapi/asm/errno.h       |    2 ++
> > >  arch/sparc/include/uapi/asm/errno.h        |    2 ++
> > >  fs/erofs/internal.h                        |    2 --
> > >  fs/ext2/ext2.h                             |    1 -
> > >  fs/ext4/ext4.h                             |    3 ---
> > >  fs/f2fs/f2fs.h                             |    3 ---
> > >  fs/minix/minix.h                           |    2 --
> > >  fs/udf/udf_sb.h                            |    2 --
> > >  fs/xfs/xfs_linux.h                         |    2 --
> > >  include/linux/jbd2.h                       |    3 ---
> > >  include/uapi/asm-generic/errno.h           |    2 ++
> > >  tools/arch/alpha/include/uapi/asm/errno.h  |    2 ++
> > >  tools/arch/mips/include/uapi/asm/errno.h   |    2 ++
> > >  tools/arch/parisc/include/uapi/asm/errno.h |    2 ++
> > >  tools/arch/sparc/include/uapi/asm/errno.h  |    2 ++
> > >  tools/include/uapi/asm-generic/errno.h     |    2 ++
> > >  18 files changed, 20 insertions(+), 18 deletions(-)
> > > 
> > > 
> > > diff --git a/arch/alpha/include/uapi/asm/errno.h b/arch/alpha/include/uapi/asm/errno.h
> > > index 3d265f6babaf0a..6791f6508632ee 100644
> > > --- a/arch/alpha/include/uapi/asm/errno.h
> > > +++ b/arch/alpha/include/uapi/asm/errno.h
> > > @@ -55,6 +55,7 @@
> > >  #define	ENOSR		82	/* Out of streams resources */
> > >  #define	ETIME		83	/* Timer expired */
> > >  #define	EBADMSG		84	/* Not a data message */
> > > +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
> > >  #define	EPROTO		85	/* Protocol error */
> > >  #define	ENODATA		86	/* No data available */
> > >  #define	ENOSTR		87	/* Device not a stream */
> > > @@ -96,6 +97,7 @@
> > >  #define	EREMCHG		115	/* Remote address changed */
> > >  
> > >  #define	EUCLEAN		117	/* Structure needs cleaning */
> > > +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
> > >  #define	ENOTNAM		118	/* Not a XENIX named type file */
> > >  #define	ENAVAIL		119	/* No XENIX semaphores available */
> > >  #define	EISNAM		120	/* Is a named type file */
> > > diff --git a/arch/mips/include/uapi/asm/errno.h b/arch/mips/include/uapi/asm/errno.h
> > > index 2fb714e2d6d8fc..c01ed91b1ef44b 100644
> > > --- a/arch/mips/include/uapi/asm/errno.h
> > > +++ b/arch/mips/include/uapi/asm/errno.h
> > > @@ -50,6 +50,7 @@
> > >  #define EDOTDOT		73	/* RFS specific error */
> > >  #define EMULTIHOP	74	/* Multihop attempted */
> > >  #define EBADMSG		77	/* Not a data message */
> > > +#define EFSBADCRC	EBADMSG	/* Bad CRC detected */
> > >  #define ENAMETOOLONG	78	/* File name too long */
> > >  #define EOVERFLOW	79	/* Value too large for defined data type */
> > >  #define ENOTUNIQ	80	/* Name not unique on network */
> > > @@ -88,6 +89,7 @@
> > >  #define EISCONN		133	/* Transport endpoint is already connected */
> > >  #define ENOTCONN	134	/* Transport endpoint is not connected */
> > >  #define EUCLEAN		135	/* Structure needs cleaning */
> > > +#define EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
> > >  #define ENOTNAM		137	/* Not a XENIX named type file */
> > >  #define ENAVAIL		138	/* No XENIX semaphores available */
> > >  #define EISNAM		139	/* Is a named type file */
> > > diff --git a/arch/parisc/include/uapi/asm/errno.h b/arch/parisc/include/uapi/asm/errno.h
> > > index 8d94739d75c67c..8cbc07c1903e4c 100644
> > > --- a/arch/parisc/include/uapi/asm/errno.h
> > > +++ b/arch/parisc/include/uapi/asm/errno.h
> > > @@ -36,6 +36,7 @@
> > >  
> > >  #define	EDOTDOT		66	/* RFS specific error */
> > >  #define	EBADMSG		67	/* Not a data message */
> > > +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
> > >  #define	EUSERS		68	/* Too many users */
> > >  #define	EDQUOT		69	/* Quota exceeded */
> > >  #define	ESTALE		70	/* Stale file handle */
> > > @@ -62,6 +63,7 @@
> > >  #define	ERESTART	175	/* Interrupted system call should be restarted */
> > >  #define	ESTRPIPE	176	/* Streams pipe error */
> > >  #define	EUCLEAN		177	/* Structure needs cleaning */
> > > +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
> > >  #define	ENOTNAM		178	/* Not a XENIX named type file */
> > >  #define	ENAVAIL		179	/* No XENIX semaphores available */
> > >  #define	EISNAM		180	/* Is a named type file */
> > > diff --git a/arch/sparc/include/uapi/asm/errno.h b/arch/sparc/include/uapi/asm/errno.h
> > > index 81a732b902ee38..4a41e7835fd5b8 100644
> > > --- a/arch/sparc/include/uapi/asm/errno.h
> > > +++ b/arch/sparc/include/uapi/asm/errno.h
> > > @@ -48,6 +48,7 @@
> > >  #define	ENOSR		74	/* Out of streams resources */
> > >  #define	ENOMSG		75	/* No message of desired type */
> > >  #define	EBADMSG		76	/* Not a data message */
> > > +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
> > >  #define	EIDRM		77	/* Identifier removed */
> > >  #define	EDEADLK		78	/* Resource deadlock would occur */
> > >  #define	ENOLCK		79	/* No record locks available */
> > > @@ -91,6 +92,7 @@
> > >  #define	ENOTUNIQ	115	/* Name not unique on network */
> > >  #define	ERESTART	116	/* Interrupted syscall should be restarted */
> > >  #define	EUCLEAN		117	/* Structure needs cleaning */
> > > +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
> > >  #define	ENOTNAM		118	/* Not a XENIX named type file */
> > >  #define	ENAVAIL		119	/* No XENIX semaphores available */
> > >  #define	EISNAM		120	/* Is a named type file */
> > > diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> > > index f7f622836198da..d06e99baf5d5ae 100644
> > > --- a/fs/erofs/internal.h
> > > +++ b/fs/erofs/internal.h
> > > @@ -541,6 +541,4 @@ long erofs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
> > >  long erofs_compat_ioctl(struct file *filp, unsigned int cmd,
> > >  			unsigned long arg);
> > >  
> > > -#define EFSCORRUPTED    EUCLEAN         /* Filesystem is corrupted */
> > > -
> > >  #endif	/* __EROFS_INTERNAL_H */
> > > diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
> > > index cf97b76e9fd3e9..5e0c6c5fcb6cd6 100644
> > > --- a/fs/ext2/ext2.h
> > > +++ b/fs/ext2/ext2.h
> > > @@ -357,7 +357,6 @@ struct ext2_inode {
> > >   */
> > >  #define	EXT2_VALID_FS			0x0001	/* Unmounted cleanly */
> > >  #define	EXT2_ERROR_FS			0x0002	/* Errors detected */
> > > -#define	EFSCORRUPTED			EUCLEAN	/* Filesystem is corrupted */
> > >  
> > >  /*
> > >   * Mount flags
> > > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > > index 56112f201cace7..62c091b52bacdf 100644
> > > --- a/fs/ext4/ext4.h
> > > +++ b/fs/ext4/ext4.h
> > > @@ -3938,7 +3938,4 @@ extern int ext4_block_write_begin(handle_t *handle, struct folio *folio,
> > >  				  get_block_t *get_block);
> > >  #endif	/* __KERNEL__ */
> > >  
> > > -#define EFSBADCRC	EBADMSG		/* Bad CRC detected */
> > > -#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
> > > -
> > >  #endif	/* _EXT4_H */
> > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > > index 20edbb99b814a7..9f3aa3c7f12613 100644
> > > --- a/fs/f2fs/f2fs.h
> > > +++ b/fs/f2fs/f2fs.h
> > > @@ -5004,7 +5004,4 @@ static inline void f2fs_invalidate_internal_cache(struct f2fs_sb_info *sbi,
> > >  	f2fs_invalidate_compress_pages_range(sbi, blkaddr, len);
> > >  }
> > >  
> > > -#define EFSBADCRC	EBADMSG		/* Bad CRC detected */
> > > -#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
> > > -
> > >  #endif /* _LINUX_F2FS_H */
> > > diff --git a/fs/minix/minix.h b/fs/minix/minix.h
> > > index 2bfaf377f2086c..7e1f652f16d311 100644
> > > --- a/fs/minix/minix.h
> > > +++ b/fs/minix/minix.h
> > > @@ -175,6 +175,4 @@ static inline int minix_test_bit(int nr, const void *vaddr)
> > >  	__minix_error_inode((inode), __func__, __LINE__,	\
> > >  			    (fmt), ##__VA_ARGS__)
> > >  
> > > -#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
> > > -
> > >  #endif /* FS_MINIX_H */
> > > diff --git a/fs/udf/udf_sb.h b/fs/udf/udf_sb.h
> > > index 08ec8756b9487b..8399accc788dea 100644
> > > --- a/fs/udf/udf_sb.h
> > > +++ b/fs/udf/udf_sb.h
> > > @@ -55,8 +55,6 @@
> > >  #define MF_DUPLICATE_MD		0x01
> > >  #define MF_MIRROR_FE_LOADED	0x02
> > >  
> > > -#define EFSCORRUPTED EUCLEAN
> > > -
> > >  struct udf_meta_data {
> > >  	__u32	s_meta_file_loc;
> > >  	__u32	s_mirror_file_loc;
> > > diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> > > index 4dd747bdbccab2..55064228c4d574 100644
> > > --- a/fs/xfs/xfs_linux.h
> > > +++ b/fs/xfs/xfs_linux.h
> > > @@ -121,8 +121,6 @@ typedef __u32			xfs_nlink_t;
> > >  
> > >  #define ENOATTR		ENODATA		/* Attribute not found */
> > >  #define EWRONGFS	EINVAL		/* Mount with wrong filesystem type */
> > > -#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
> > > -#define EFSBADCRC	EBADMSG		/* Bad CRC detected */
> > >  
> > >  #define __return_address __builtin_return_address(0)
> > >  
> > > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> > > index f5eaf76198f377..a53a00d36228ce 100644
> > > --- a/include/linux/jbd2.h
> > > +++ b/include/linux/jbd2.h
> > > @@ -1815,7 +1815,4 @@ static inline int jbd2_handle_buffer_credits(handle_t *handle)
> > >  
> > >  #endif	/* __KERNEL__ */
> > >  
> > > -#define EFSBADCRC	EBADMSG		/* Bad CRC detected */
> > > -#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
> > > -
> > >  #endif	/* _LINUX_JBD2_H */
> > > diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h
> > > index cf9c51ac49f97e..92e7ae493ee315 100644
> > > --- a/include/uapi/asm-generic/errno.h
> > > +++ b/include/uapi/asm-generic/errno.h
> > > @@ -55,6 +55,7 @@
> > >  #define	EMULTIHOP	72	/* Multihop attempted */
> > >  #define	EDOTDOT		73	/* RFS specific error */
> > >  #define	EBADMSG		74	/* Not a data message */
> > > +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
> > >  #define	EOVERFLOW	75	/* Value too large for defined data type */
> > >  #define	ENOTUNIQ	76	/* Name not unique on network */
> > >  #define	EBADFD		77	/* File descriptor in bad state */
> > > @@ -98,6 +99,7 @@
> > >  #define	EINPROGRESS	115	/* Operation now in progress */
> > >  #define	ESTALE		116	/* Stale file handle */
> > >  #define	EUCLEAN		117	/* Structure needs cleaning */
> > > +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
> > >  #define	ENOTNAM		118	/* Not a XENIX named type file */
> > >  #define	ENAVAIL		119	/* No XENIX semaphores available */
> > >  #define	EISNAM		120	/* Is a named type file */
> > > diff --git a/tools/arch/alpha/include/uapi/asm/errno.h b/tools/arch/alpha/include/uapi/asm/errno.h
> > > index 3d265f6babaf0a..6791f6508632ee 100644
> > > --- a/tools/arch/alpha/include/uapi/asm/errno.h
> > > +++ b/tools/arch/alpha/include/uapi/asm/errno.h
> > > @@ -55,6 +55,7 @@
> > >  #define	ENOSR		82	/* Out of streams resources */
> > >  #define	ETIME		83	/* Timer expired */
> > >  #define	EBADMSG		84	/* Not a data message */
> > > +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
> > >  #define	EPROTO		85	/* Protocol error */
> > >  #define	ENODATA		86	/* No data available */
> > >  #define	ENOSTR		87	/* Device not a stream */
> > > @@ -96,6 +97,7 @@
> > >  #define	EREMCHG		115	/* Remote address changed */
> > >  
> > >  #define	EUCLEAN		117	/* Structure needs cleaning */
> > > +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
> > >  #define	ENOTNAM		118	/* Not a XENIX named type file */
> > >  #define	ENAVAIL		119	/* No XENIX semaphores available */
> > >  #define	EISNAM		120	/* Is a named type file */
> > > diff --git a/tools/arch/mips/include/uapi/asm/errno.h b/tools/arch/mips/include/uapi/asm/errno.h
> > > index 2fb714e2d6d8fc..c01ed91b1ef44b 100644
> > > --- a/tools/arch/mips/include/uapi/asm/errno.h
> > > +++ b/tools/arch/mips/include/uapi/asm/errno.h
> > > @@ -50,6 +50,7 @@
> > >  #define EDOTDOT		73	/* RFS specific error */
> > >  #define EMULTIHOP	74	/* Multihop attempted */
> > >  #define EBADMSG		77	/* Not a data message */
> > > +#define EFSBADCRC	EBADMSG	/* Bad CRC detected */
> > >  #define ENAMETOOLONG	78	/* File name too long */
> > >  #define EOVERFLOW	79	/* Value too large for defined data type */
> > >  #define ENOTUNIQ	80	/* Name not unique on network */
> > > @@ -88,6 +89,7 @@
> > >  #define EISCONN		133	/* Transport endpoint is already connected */
> > >  #define ENOTCONN	134	/* Transport endpoint is not connected */
> > >  #define EUCLEAN		135	/* Structure needs cleaning */
> > > +#define EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
> > >  #define ENOTNAM		137	/* Not a XENIX named type file */
> > >  #define ENAVAIL		138	/* No XENIX semaphores available */
> > >  #define EISNAM		139	/* Is a named type file */
> > > diff --git a/tools/arch/parisc/include/uapi/asm/errno.h b/tools/arch/parisc/include/uapi/asm/errno.h
> > > index 8d94739d75c67c..8cbc07c1903e4c 100644
> > > --- a/tools/arch/parisc/include/uapi/asm/errno.h
> > > +++ b/tools/arch/parisc/include/uapi/asm/errno.h
> > > @@ -36,6 +36,7 @@
> > >  
> > >  #define	EDOTDOT		66	/* RFS specific error */
> > >  #define	EBADMSG		67	/* Not a data message */
> > > +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
> > >  #define	EUSERS		68	/* Too many users */
> > >  #define	EDQUOT		69	/* Quota exceeded */
> > >  #define	ESTALE		70	/* Stale file handle */
> > > @@ -62,6 +63,7 @@
> > >  #define	ERESTART	175	/* Interrupted system call should be restarted */
> > >  #define	ESTRPIPE	176	/* Streams pipe error */
> > >  #define	EUCLEAN		177	/* Structure needs cleaning */
> > > +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
> > >  #define	ENOTNAM		178	/* Not a XENIX named type file */
> > >  #define	ENAVAIL		179	/* No XENIX semaphores available */
> > >  #define	EISNAM		180	/* Is a named type file */
> > > diff --git a/tools/arch/sparc/include/uapi/asm/errno.h b/tools/arch/sparc/include/uapi/asm/errno.h
> > > index 81a732b902ee38..4a41e7835fd5b8 100644
> > > --- a/tools/arch/sparc/include/uapi/asm/errno.h
> > > +++ b/tools/arch/sparc/include/uapi/asm/errno.h
> > > @@ -48,6 +48,7 @@
> > >  #define	ENOSR		74	/* Out of streams resources */
> > >  #define	ENOMSG		75	/* No message of desired type */
> > >  #define	EBADMSG		76	/* Not a data message */
> > > +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
> > >  #define	EIDRM		77	/* Identifier removed */
> > >  #define	EDEADLK		78	/* Resource deadlock would occur */
> > >  #define	ENOLCK		79	/* No record locks available */
> > > @@ -91,6 +92,7 @@
> > >  #define	ENOTUNIQ	115	/* Name not unique on network */
> > >  #define	ERESTART	116	/* Interrupted syscall should be restarted */
> > >  #define	EUCLEAN		117	/* Structure needs cleaning */
> > > +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
> > >  #define	ENOTNAM		118	/* Not a XENIX named type file */
> > >  #define	ENAVAIL		119	/* No XENIX semaphores available */
> > >  #define	EISNAM		120	/* Is a named type file */
> > > diff --git a/tools/include/uapi/asm-generic/errno.h b/tools/include/uapi/asm-generic/errno.h
> > > index cf9c51ac49f97e..92e7ae493ee315 100644
> > > --- a/tools/include/uapi/asm-generic/errno.h
> > > +++ b/tools/include/uapi/asm-generic/errno.h
> > > @@ -55,6 +55,7 @@
> > >  #define	EMULTIHOP	72	/* Multihop attempted */
> > >  #define	EDOTDOT		73	/* RFS specific error */
> > >  #define	EBADMSG		74	/* Not a data message */
> > > +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
> > >  #define	EOVERFLOW	75	/* Value too large for defined data type */
> > >  #define	ENOTUNIQ	76	/* Name not unique on network */
> > >  #define	EBADFD		77	/* File descriptor in bad state */
> > > @@ -98,6 +99,7 @@
> > >  #define	EINPROGRESS	115	/* Operation now in progress */
> > >  #define	ESTALE		116	/* Stale file handle */
> > >  #define	EUCLEAN		117	/* Structure needs cleaning */
> > > +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
> > >  #define	ENOTNAM		118	/* Not a XENIX named type file */
> > >  #define	ENAVAIL		119	/* No XENIX semaphores available */
> > >  #define	EISNAM		120	/* Is a named type file */
> > > 
> > > 
> > ---end quoted text---
> > 
> 
> -- 
> <https://www.alejandro-colomar.es>



^ permalink raw reply

* Re: [PATCH 1/6] uapi: promote EFSCORRUPTED and EUCLEAN to errno.h
From: Alejandro Colomar @ 2025-12-18 11:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Darrick J. Wong, brauner, linux-api, linux-ext4, jack, linux-xfs,
	linux-fsdevel, gabriel, amir73il, linux-man
In-Reply-To: <aUOOW9z3K5ff-531@infradead.org>

[-- Attachment #1: Type: text/plain, Size: 16436 bytes --]

Hi Christoph,

On Wed, Dec 17, 2025 at 09:17:15PM -0800, Christoph Hellwig wrote:
> On Wed, Dec 17, 2025 at 06:02:56PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Stop definining these privately and instead move them to the uapi
> > errno.h so that they become canonical instead of copy pasta.
> 
> Sounds fine:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> Do we need to document these overlay errnos in the man man pages,
> though?

I'd say yes.  errno(3) is where these are documented.  Thanks for CCing.


Have a lovely day!
Alex

> 
> > 
> > Cc: linux-api@vger.kernel.org
> > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> > ---
> >  arch/alpha/include/uapi/asm/errno.h        |    2 ++
> >  arch/mips/include/uapi/asm/errno.h         |    2 ++
> >  arch/parisc/include/uapi/asm/errno.h       |    2 ++
> >  arch/sparc/include/uapi/asm/errno.h        |    2 ++
> >  fs/erofs/internal.h                        |    2 --
> >  fs/ext2/ext2.h                             |    1 -
> >  fs/ext4/ext4.h                             |    3 ---
> >  fs/f2fs/f2fs.h                             |    3 ---
> >  fs/minix/minix.h                           |    2 --
> >  fs/udf/udf_sb.h                            |    2 --
> >  fs/xfs/xfs_linux.h                         |    2 --
> >  include/linux/jbd2.h                       |    3 ---
> >  include/uapi/asm-generic/errno.h           |    2 ++
> >  tools/arch/alpha/include/uapi/asm/errno.h  |    2 ++
> >  tools/arch/mips/include/uapi/asm/errno.h   |    2 ++
> >  tools/arch/parisc/include/uapi/asm/errno.h |    2 ++
> >  tools/arch/sparc/include/uapi/asm/errno.h  |    2 ++
> >  tools/include/uapi/asm-generic/errno.h     |    2 ++
> >  18 files changed, 20 insertions(+), 18 deletions(-)
> > 
> > 
> > diff --git a/arch/alpha/include/uapi/asm/errno.h b/arch/alpha/include/uapi/asm/errno.h
> > index 3d265f6babaf0a..6791f6508632ee 100644
> > --- a/arch/alpha/include/uapi/asm/errno.h
> > +++ b/arch/alpha/include/uapi/asm/errno.h
> > @@ -55,6 +55,7 @@
> >  #define	ENOSR		82	/* Out of streams resources */
> >  #define	ETIME		83	/* Timer expired */
> >  #define	EBADMSG		84	/* Not a data message */
> > +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
> >  #define	EPROTO		85	/* Protocol error */
> >  #define	ENODATA		86	/* No data available */
> >  #define	ENOSTR		87	/* Device not a stream */
> > @@ -96,6 +97,7 @@
> >  #define	EREMCHG		115	/* Remote address changed */
> >  
> >  #define	EUCLEAN		117	/* Structure needs cleaning */
> > +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
> >  #define	ENOTNAM		118	/* Not a XENIX named type file */
> >  #define	ENAVAIL		119	/* No XENIX semaphores available */
> >  #define	EISNAM		120	/* Is a named type file */
> > diff --git a/arch/mips/include/uapi/asm/errno.h b/arch/mips/include/uapi/asm/errno.h
> > index 2fb714e2d6d8fc..c01ed91b1ef44b 100644
> > --- a/arch/mips/include/uapi/asm/errno.h
> > +++ b/arch/mips/include/uapi/asm/errno.h
> > @@ -50,6 +50,7 @@
> >  #define EDOTDOT		73	/* RFS specific error */
> >  #define EMULTIHOP	74	/* Multihop attempted */
> >  #define EBADMSG		77	/* Not a data message */
> > +#define EFSBADCRC	EBADMSG	/* Bad CRC detected */
> >  #define ENAMETOOLONG	78	/* File name too long */
> >  #define EOVERFLOW	79	/* Value too large for defined data type */
> >  #define ENOTUNIQ	80	/* Name not unique on network */
> > @@ -88,6 +89,7 @@
> >  #define EISCONN		133	/* Transport endpoint is already connected */
> >  #define ENOTCONN	134	/* Transport endpoint is not connected */
> >  #define EUCLEAN		135	/* Structure needs cleaning */
> > +#define EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
> >  #define ENOTNAM		137	/* Not a XENIX named type file */
> >  #define ENAVAIL		138	/* No XENIX semaphores available */
> >  #define EISNAM		139	/* Is a named type file */
> > diff --git a/arch/parisc/include/uapi/asm/errno.h b/arch/parisc/include/uapi/asm/errno.h
> > index 8d94739d75c67c..8cbc07c1903e4c 100644
> > --- a/arch/parisc/include/uapi/asm/errno.h
> > +++ b/arch/parisc/include/uapi/asm/errno.h
> > @@ -36,6 +36,7 @@
> >  
> >  #define	EDOTDOT		66	/* RFS specific error */
> >  #define	EBADMSG		67	/* Not a data message */
> > +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
> >  #define	EUSERS		68	/* Too many users */
> >  #define	EDQUOT		69	/* Quota exceeded */
> >  #define	ESTALE		70	/* Stale file handle */
> > @@ -62,6 +63,7 @@
> >  #define	ERESTART	175	/* Interrupted system call should be restarted */
> >  #define	ESTRPIPE	176	/* Streams pipe error */
> >  #define	EUCLEAN		177	/* Structure needs cleaning */
> > +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
> >  #define	ENOTNAM		178	/* Not a XENIX named type file */
> >  #define	ENAVAIL		179	/* No XENIX semaphores available */
> >  #define	EISNAM		180	/* Is a named type file */
> > diff --git a/arch/sparc/include/uapi/asm/errno.h b/arch/sparc/include/uapi/asm/errno.h
> > index 81a732b902ee38..4a41e7835fd5b8 100644
> > --- a/arch/sparc/include/uapi/asm/errno.h
> > +++ b/arch/sparc/include/uapi/asm/errno.h
> > @@ -48,6 +48,7 @@
> >  #define	ENOSR		74	/* Out of streams resources */
> >  #define	ENOMSG		75	/* No message of desired type */
> >  #define	EBADMSG		76	/* Not a data message */
> > +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
> >  #define	EIDRM		77	/* Identifier removed */
> >  #define	EDEADLK		78	/* Resource deadlock would occur */
> >  #define	ENOLCK		79	/* No record locks available */
> > @@ -91,6 +92,7 @@
> >  #define	ENOTUNIQ	115	/* Name not unique on network */
> >  #define	ERESTART	116	/* Interrupted syscall should be restarted */
> >  #define	EUCLEAN		117	/* Structure needs cleaning */
> > +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
> >  #define	ENOTNAM		118	/* Not a XENIX named type file */
> >  #define	ENAVAIL		119	/* No XENIX semaphores available */
> >  #define	EISNAM		120	/* Is a named type file */
> > diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> > index f7f622836198da..d06e99baf5d5ae 100644
> > --- a/fs/erofs/internal.h
> > +++ b/fs/erofs/internal.h
> > @@ -541,6 +541,4 @@ long erofs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
> >  long erofs_compat_ioctl(struct file *filp, unsigned int cmd,
> >  			unsigned long arg);
> >  
> > -#define EFSCORRUPTED    EUCLEAN         /* Filesystem is corrupted */
> > -
> >  #endif	/* __EROFS_INTERNAL_H */
> > diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
> > index cf97b76e9fd3e9..5e0c6c5fcb6cd6 100644
> > --- a/fs/ext2/ext2.h
> > +++ b/fs/ext2/ext2.h
> > @@ -357,7 +357,6 @@ struct ext2_inode {
> >   */
> >  #define	EXT2_VALID_FS			0x0001	/* Unmounted cleanly */
> >  #define	EXT2_ERROR_FS			0x0002	/* Errors detected */
> > -#define	EFSCORRUPTED			EUCLEAN	/* Filesystem is corrupted */
> >  
> >  /*
> >   * Mount flags
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 56112f201cace7..62c091b52bacdf 100644
> > --- a/fs/ext4/ext4.h
> > +++ b/fs/ext4/ext4.h
> > @@ -3938,7 +3938,4 @@ extern int ext4_block_write_begin(handle_t *handle, struct folio *folio,
> >  				  get_block_t *get_block);
> >  #endif	/* __KERNEL__ */
> >  
> > -#define EFSBADCRC	EBADMSG		/* Bad CRC detected */
> > -#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
> > -
> >  #endif	/* _EXT4_H */
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 20edbb99b814a7..9f3aa3c7f12613 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -5004,7 +5004,4 @@ static inline void f2fs_invalidate_internal_cache(struct f2fs_sb_info *sbi,
> >  	f2fs_invalidate_compress_pages_range(sbi, blkaddr, len);
> >  }
> >  
> > -#define EFSBADCRC	EBADMSG		/* Bad CRC detected */
> > -#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
> > -
> >  #endif /* _LINUX_F2FS_H */
> > diff --git a/fs/minix/minix.h b/fs/minix/minix.h
> > index 2bfaf377f2086c..7e1f652f16d311 100644
> > --- a/fs/minix/minix.h
> > +++ b/fs/minix/minix.h
> > @@ -175,6 +175,4 @@ static inline int minix_test_bit(int nr, const void *vaddr)
> >  	__minix_error_inode((inode), __func__, __LINE__,	\
> >  			    (fmt), ##__VA_ARGS__)
> >  
> > -#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
> > -
> >  #endif /* FS_MINIX_H */
> > diff --git a/fs/udf/udf_sb.h b/fs/udf/udf_sb.h
> > index 08ec8756b9487b..8399accc788dea 100644
> > --- a/fs/udf/udf_sb.h
> > +++ b/fs/udf/udf_sb.h
> > @@ -55,8 +55,6 @@
> >  #define MF_DUPLICATE_MD		0x01
> >  #define MF_MIRROR_FE_LOADED	0x02
> >  
> > -#define EFSCORRUPTED EUCLEAN
> > -
> >  struct udf_meta_data {
> >  	__u32	s_meta_file_loc;
> >  	__u32	s_mirror_file_loc;
> > diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> > index 4dd747bdbccab2..55064228c4d574 100644
> > --- a/fs/xfs/xfs_linux.h
> > +++ b/fs/xfs/xfs_linux.h
> > @@ -121,8 +121,6 @@ typedef __u32			xfs_nlink_t;
> >  
> >  #define ENOATTR		ENODATA		/* Attribute not found */
> >  #define EWRONGFS	EINVAL		/* Mount with wrong filesystem type */
> > -#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
> > -#define EFSBADCRC	EBADMSG		/* Bad CRC detected */
> >  
> >  #define __return_address __builtin_return_address(0)
> >  
> > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> > index f5eaf76198f377..a53a00d36228ce 100644
> > --- a/include/linux/jbd2.h
> > +++ b/include/linux/jbd2.h
> > @@ -1815,7 +1815,4 @@ static inline int jbd2_handle_buffer_credits(handle_t *handle)
> >  
> >  #endif	/* __KERNEL__ */
> >  
> > -#define EFSBADCRC	EBADMSG		/* Bad CRC detected */
> > -#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
> > -
> >  #endif	/* _LINUX_JBD2_H */
> > diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h
> > index cf9c51ac49f97e..92e7ae493ee315 100644
> > --- a/include/uapi/asm-generic/errno.h
> > +++ b/include/uapi/asm-generic/errno.h
> > @@ -55,6 +55,7 @@
> >  #define	EMULTIHOP	72	/* Multihop attempted */
> >  #define	EDOTDOT		73	/* RFS specific error */
> >  #define	EBADMSG		74	/* Not a data message */
> > +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
> >  #define	EOVERFLOW	75	/* Value too large for defined data type */
> >  #define	ENOTUNIQ	76	/* Name not unique on network */
> >  #define	EBADFD		77	/* File descriptor in bad state */
> > @@ -98,6 +99,7 @@
> >  #define	EINPROGRESS	115	/* Operation now in progress */
> >  #define	ESTALE		116	/* Stale file handle */
> >  #define	EUCLEAN		117	/* Structure needs cleaning */
> > +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
> >  #define	ENOTNAM		118	/* Not a XENIX named type file */
> >  #define	ENAVAIL		119	/* No XENIX semaphores available */
> >  #define	EISNAM		120	/* Is a named type file */
> > diff --git a/tools/arch/alpha/include/uapi/asm/errno.h b/tools/arch/alpha/include/uapi/asm/errno.h
> > index 3d265f6babaf0a..6791f6508632ee 100644
> > --- a/tools/arch/alpha/include/uapi/asm/errno.h
> > +++ b/tools/arch/alpha/include/uapi/asm/errno.h
> > @@ -55,6 +55,7 @@
> >  #define	ENOSR		82	/* Out of streams resources */
> >  #define	ETIME		83	/* Timer expired */
> >  #define	EBADMSG		84	/* Not a data message */
> > +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
> >  #define	EPROTO		85	/* Protocol error */
> >  #define	ENODATA		86	/* No data available */
> >  #define	ENOSTR		87	/* Device not a stream */
> > @@ -96,6 +97,7 @@
> >  #define	EREMCHG		115	/* Remote address changed */
> >  
> >  #define	EUCLEAN		117	/* Structure needs cleaning */
> > +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
> >  #define	ENOTNAM		118	/* Not a XENIX named type file */
> >  #define	ENAVAIL		119	/* No XENIX semaphores available */
> >  #define	EISNAM		120	/* Is a named type file */
> > diff --git a/tools/arch/mips/include/uapi/asm/errno.h b/tools/arch/mips/include/uapi/asm/errno.h
> > index 2fb714e2d6d8fc..c01ed91b1ef44b 100644
> > --- a/tools/arch/mips/include/uapi/asm/errno.h
> > +++ b/tools/arch/mips/include/uapi/asm/errno.h
> > @@ -50,6 +50,7 @@
> >  #define EDOTDOT		73	/* RFS specific error */
> >  #define EMULTIHOP	74	/* Multihop attempted */
> >  #define EBADMSG		77	/* Not a data message */
> > +#define EFSBADCRC	EBADMSG	/* Bad CRC detected */
> >  #define ENAMETOOLONG	78	/* File name too long */
> >  #define EOVERFLOW	79	/* Value too large for defined data type */
> >  #define ENOTUNIQ	80	/* Name not unique on network */
> > @@ -88,6 +89,7 @@
> >  #define EISCONN		133	/* Transport endpoint is already connected */
> >  #define ENOTCONN	134	/* Transport endpoint is not connected */
> >  #define EUCLEAN		135	/* Structure needs cleaning */
> > +#define EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
> >  #define ENOTNAM		137	/* Not a XENIX named type file */
> >  #define ENAVAIL		138	/* No XENIX semaphores available */
> >  #define EISNAM		139	/* Is a named type file */
> > diff --git a/tools/arch/parisc/include/uapi/asm/errno.h b/tools/arch/parisc/include/uapi/asm/errno.h
> > index 8d94739d75c67c..8cbc07c1903e4c 100644
> > --- a/tools/arch/parisc/include/uapi/asm/errno.h
> > +++ b/tools/arch/parisc/include/uapi/asm/errno.h
> > @@ -36,6 +36,7 @@
> >  
> >  #define	EDOTDOT		66	/* RFS specific error */
> >  #define	EBADMSG		67	/* Not a data message */
> > +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
> >  #define	EUSERS		68	/* Too many users */
> >  #define	EDQUOT		69	/* Quota exceeded */
> >  #define	ESTALE		70	/* Stale file handle */
> > @@ -62,6 +63,7 @@
> >  #define	ERESTART	175	/* Interrupted system call should be restarted */
> >  #define	ESTRPIPE	176	/* Streams pipe error */
> >  #define	EUCLEAN		177	/* Structure needs cleaning */
> > +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
> >  #define	ENOTNAM		178	/* Not a XENIX named type file */
> >  #define	ENAVAIL		179	/* No XENIX semaphores available */
> >  #define	EISNAM		180	/* Is a named type file */
> > diff --git a/tools/arch/sparc/include/uapi/asm/errno.h b/tools/arch/sparc/include/uapi/asm/errno.h
> > index 81a732b902ee38..4a41e7835fd5b8 100644
> > --- a/tools/arch/sparc/include/uapi/asm/errno.h
> > +++ b/tools/arch/sparc/include/uapi/asm/errno.h
> > @@ -48,6 +48,7 @@
> >  #define	ENOSR		74	/* Out of streams resources */
> >  #define	ENOMSG		75	/* No message of desired type */
> >  #define	EBADMSG		76	/* Not a data message */
> > +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
> >  #define	EIDRM		77	/* Identifier removed */
> >  #define	EDEADLK		78	/* Resource deadlock would occur */
> >  #define	ENOLCK		79	/* No record locks available */
> > @@ -91,6 +92,7 @@
> >  #define	ENOTUNIQ	115	/* Name not unique on network */
> >  #define	ERESTART	116	/* Interrupted syscall should be restarted */
> >  #define	EUCLEAN		117	/* Structure needs cleaning */
> > +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
> >  #define	ENOTNAM		118	/* Not a XENIX named type file */
> >  #define	ENAVAIL		119	/* No XENIX semaphores available */
> >  #define	EISNAM		120	/* Is a named type file */
> > diff --git a/tools/include/uapi/asm-generic/errno.h b/tools/include/uapi/asm-generic/errno.h
> > index cf9c51ac49f97e..92e7ae493ee315 100644
> > --- a/tools/include/uapi/asm-generic/errno.h
> > +++ b/tools/include/uapi/asm-generic/errno.h
> > @@ -55,6 +55,7 @@
> >  #define	EMULTIHOP	72	/* Multihop attempted */
> >  #define	EDOTDOT		73	/* RFS specific error */
> >  #define	EBADMSG		74	/* Not a data message */
> > +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
> >  #define	EOVERFLOW	75	/* Value too large for defined data type */
> >  #define	ENOTUNIQ	76	/* Name not unique on network */
> >  #define	EBADFD		77	/* File descriptor in bad state */
> > @@ -98,6 +99,7 @@
> >  #define	EINPROGRESS	115	/* Operation now in progress */
> >  #define	ESTALE		116	/* Stale file handle */
> >  #define	EUCLEAN		117	/* Structure needs cleaning */
> > +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
> >  #define	ENOTNAM		118	/* Not a XENIX named type file */
> >  #define	ENAVAIL		119	/* No XENIX semaphores available */
> >  #define	EISNAM		120	/* Is a named type file */
> > 
> > 
> ---end quoted text---
> 

-- 
<https://www.alejandro-colomar.es>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 1/6] uapi: promote EFSCORRUPTED and EUCLEAN to errno.h
From: Gao Xiang @ 2025-12-18  9:33 UTC (permalink / raw)
  To: Darrick J. Wong, brauner
  Cc: linux-api, linux-ext4, jack, linux-xfs, linux-fsdevel, gabriel,
	hch, amir73il
In-Reply-To: <176602332146.686273.6355079912638580915.stgit@frogsfrogsfrogs>



On 2025/12/18 10:02, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Stop definining these privately and instead move them to the uapi
> errno.h so that they become canonical instead of copy pasta.
> 
> Cc: linux-api@vger.kernel.org
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>

Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>

(...I really think it could be done earlier)
Thanks,
Gao Xiang

^ permalink raw reply

* [PATCH v23 8/8] selftests/clone3: Test shadow stack support
From: Mark Brown @ 2025-12-18  8:10 UTC (permalink / raw)
  To: Rick P. Edgecombe, Deepak Gupta, H.J. Lu, Florian Weimer,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Christian Brauner, Shuah Khan
  Cc: linux-kernel, Catalin Marinas, Will Deacon, jannh, bsegall,
	Andrew Morton, Yury Khrustalev, H.J. Lu, Adhemerval Zanella Netto,
	Wilco Dijkstra, CarlosO'Donell, Florian Weimer, Rich Felker,
	linux-kselftest, linux-api, Mark Brown, Kees Cook, Shuah Khan
In-Reply-To: <20251218-clone3-shadow-stack-v23-0-7cb318fbb385@kernel.org>

Add basic test coverage for specifying the shadow stack for a newly
created thread via clone3(), including coverage of the newly extended
argument structure.  We check that a user specified shadow stack can be
provided, and that invalid combinations of parameters are rejected.

In order to facilitate testing on systems without userspace shadow stack
support we manually enable shadow stacks on startup, this is architecture
specific due to the use of an arch_prctl() on x86. Due to interactions with
potential userspace locking of features we actually detect support for
shadow stacks on the running system by attempting to allocate a shadow
stack page during initialisation using map_shadow_stack(), warning if this
succeeds when the enable failed.

In order to allow testing of user configured shadow stacks on
architectures with that feature we need to ensure that we do not return
from the function where the clone3() syscall is called in the child
process, doing so would trigger a shadow stack underflow.  To do this we
use inline assembly rather than the standard syscall wrapper to call
clone3().  In order to avoid surprises we also use a syscall rather than
the libc exit() function., this should be overly cautious.

Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 tools/testing/selftests/clone3/clone3.c           | 143 +++++++++++++++++++++-
 tools/testing/selftests/clone3/clone3_selftests.h |  63 ++++++++++
 2 files changed, 205 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index 932cc64e9ae4..829f0d1268c8 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -3,6 +3,7 @@
 /* Based on Christian Brauner's clone3() example */
 
 #define _GNU_SOURCE
+#include <asm/mman.h>
 #include <errno.h>
 #include <inttypes.h>
 #include <linux/types.h>
@@ -11,6 +12,7 @@
 #include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
+#include <sys/mman.h>
 #include <sys/syscall.h>
 #include <sys/types.h>
 #include <sys/un.h>
@@ -19,8 +21,12 @@
 #include <sched.h>
 
 #include "kselftest.h"
+#include "ksft_shstk.h"
 #include "clone3_selftests.h"
 
+static bool shadow_stack_supported;
+static size_t max_supported_args_size;
+
 enum test_mode {
 	CLONE3_ARGS_NO_TEST,
 	CLONE3_ARGS_ALL_0,
@@ -28,6 +34,10 @@ enum test_mode {
 	CLONE3_ARGS_INVAL_EXIT_SIGNAL_NEG,
 	CLONE3_ARGS_INVAL_EXIT_SIGNAL_CSIG,
 	CLONE3_ARGS_INVAL_EXIT_SIGNAL_NSIG,
+	CLONE3_ARGS_SHADOW_STACK,
+	CLONE3_ARGS_SHADOW_STACK_MISALIGNED,
+	CLONE3_ARGS_SHADOW_STACK_NO_TOKEN,
+	CLONE3_ARGS_SHADOW_STACK_NORMAL_MEMORY,
 };
 
 typedef bool (*filter_function)(void);
@@ -44,6 +54,44 @@ struct test {
 	filter_function filter;
 };
 
+
+/*
+ * We check for shadow stack support by attempting to use
+ * map_shadow_stack() since features may have been locked by the
+ * dynamic linker resulting in spurious errors when we attempt to
+ * enable on startup.  We warn if the enable failed.
+ */
+static void test_shadow_stack_supported(void)
+{
+	long ret;
+
+	ret = syscall(__NR_map_shadow_stack, 0, getpagesize(), 0);
+	if (ret == -1) {
+		ksft_print_msg("map_shadow_stack() not supported\n");
+	} else if ((void *)ret == MAP_FAILED) {
+		ksft_print_msg("Failed to map shadow stack\n");
+	} else {
+		ksft_print_msg("Shadow stack supportd\n");
+		shadow_stack_supported = true;
+
+		if (!shadow_stack_enabled)
+			ksft_print_msg("Mapped but did not enable shadow stack\n");
+	}
+}
+
+static void *get_shadow_stack_page(unsigned long flags)
+{
+	unsigned long long page;
+
+	page = syscall(__NR_map_shadow_stack, 0, getpagesize(), flags);
+	if ((void *)page == MAP_FAILED) {
+		ksft_print_msg("map_shadow_stack() failed: %d\n", errno);
+		return 0;
+	}
+
+	return (void *)page;
+}
+
 static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
 {
 	struct __clone_args args = {
@@ -57,6 +105,7 @@ static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
 	} args_ext;
 
 	pid_t pid = -1;
+	void *p;
 	int status;
 
 	memset(&args_ext, 0, sizeof(args_ext));
@@ -89,6 +138,26 @@ static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
 	case CLONE3_ARGS_INVAL_EXIT_SIGNAL_NSIG:
 		args.exit_signal = 0x00000000000000f0ULL;
 		break;
+	case CLONE3_ARGS_SHADOW_STACK:
+		p = get_shadow_stack_page(SHADOW_STACK_SET_TOKEN);
+		p += getpagesize() - sizeof(void *);
+		args.shstk_token = (unsigned long long)p;
+		break;
+	case CLONE3_ARGS_SHADOW_STACK_MISALIGNED:
+		p = get_shadow_stack_page(SHADOW_STACK_SET_TOKEN);
+		p += getpagesize() - sizeof(void *) - 1;
+		args.shstk_token = (unsigned long long)p;
+		break;
+	case CLONE3_ARGS_SHADOW_STACK_NORMAL_MEMORY:
+		p = malloc(getpagesize());
+		p += getpagesize() - sizeof(void *);
+		args.shstk_token = (unsigned long long)p;
+		break;
+	case CLONE3_ARGS_SHADOW_STACK_NO_TOKEN:
+		p = get_shadow_stack_page(0);
+		p += getpagesize() - sizeof(void *);
+		args.shstk_token = (unsigned long long)p;
+		break;
 	}
 
 	memcpy(&args_ext.args, &args, sizeof(struct __clone_args));
@@ -102,7 +171,12 @@ static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
 
 	if (pid == 0) {
 		ksft_print_msg("I am the child, my PID is %d\n", getpid());
-		_exit(EXIT_SUCCESS);
+		/*
+		 * Use a raw syscall to ensure we don't get issues
+		 * with manually specified shadow stack and exit handlers.
+		 */
+		syscall(__NR_exit, EXIT_SUCCESS);
+		ksft_print_msg("CHILD FAILED TO EXIT PID is %d\n", getpid());
 	}
 
 	ksft_print_msg("I am the parent (%d). My child's pid is %d\n",
@@ -184,6 +258,26 @@ static bool no_timenamespace(void)
 	return true;
 }
 
+static bool have_shadow_stack(void)
+{
+	if (shadow_stack_supported) {
+		ksft_print_msg("Shadow stack supported\n");
+		return true;
+	}
+
+	return false;
+}
+
+static bool no_shadow_stack(void)
+{
+	if (!shadow_stack_supported) {
+		ksft_print_msg("Shadow stack not supported\n");
+		return true;
+	}
+
+	return false;
+}
+
 static size_t page_size_plus_8(void)
 {
 	return getpagesize() + 8;
@@ -327,6 +421,50 @@ static const struct test tests[] = {
 		.expected = -EINVAL,
 		.test_mode = CLONE3_ARGS_NO_TEST,
 	},
+	{
+		.name = "Shadow stack on system with shadow stack",
+		.size = 0,
+		.expected = 0,
+		.e2big_valid = true,
+		.test_mode = CLONE3_ARGS_SHADOW_STACK,
+		.filter = no_shadow_stack,
+	},
+	{
+		.name = "Shadow stack with misaligned address",
+		.flags = CLONE_VM,
+		.size = 0,
+		.expected = -EINVAL,
+		.e2big_valid = true,
+		.test_mode = CLONE3_ARGS_SHADOW_STACK_MISALIGNED,
+		.filter = no_shadow_stack,
+	},
+	{
+		.name = "Shadow stack with normal memory",
+		.flags = CLONE_VM,
+		.size = 0,
+		.expected = -EFAULT,
+		.e2big_valid = true,
+		.test_mode = CLONE3_ARGS_SHADOW_STACK_NORMAL_MEMORY,
+		.filter = no_shadow_stack,
+	},
+	{
+		.name = "Shadow stack with no token",
+		.flags = CLONE_VM,
+		.size = 0,
+		.expected = -EINVAL,
+		.e2big_valid = true,
+		.test_mode = CLONE3_ARGS_SHADOW_STACK_NO_TOKEN,
+		.filter = no_shadow_stack,
+	},
+	{
+		.name = "Shadow stack on system without shadow stack",
+		.flags = CLONE_VM,
+		.size = 0,
+		.expected = -EFAULT,
+		.e2big_valid = true,
+		.test_mode = CLONE3_ARGS_SHADOW_STACK_NORMAL_MEMORY,
+		.filter = have_shadow_stack,
+	},
 };
 
 int main(int argc, char *argv[])
@@ -334,9 +472,12 @@ int main(int argc, char *argv[])
 	size_t size;
 	int i;
 
+	enable_shadow_stack();
+
 	ksft_print_header();
 	ksft_set_plan(ARRAY_SIZE(tests));
 	test_clone3_supported();
+	test_shadow_stack_supported();
 
 	for (i = 0; i < ARRAY_SIZE(tests); i++)
 		test_clone3(&tests[i]);
diff --git a/tools/testing/selftests/clone3/clone3_selftests.h b/tools/testing/selftests/clone3/clone3_selftests.h
index a7ab2f1cccda..97d98d07fb78 100644
--- a/tools/testing/selftests/clone3/clone3_selftests.h
+++ b/tools/testing/selftests/clone3/clone3_selftests.h
@@ -31,12 +31,75 @@ struct __clone_args {
 	__aligned_u64 set_tid;
 	__aligned_u64 set_tid_size;
 	__aligned_u64 cgroup;
+#ifndef CLONE_ARGS_SIZE_VER2
+#define CLONE_ARGS_SIZE_VER2 88	/* sizeof third published struct */
+#endif
+	__aligned_u64 shstk_token;
+#ifndef CLONE_ARGS_SIZE_VER3
+#define CLONE_ARGS_SIZE_VER3 96 /* sizeof fourth published struct */
+#endif
 };
 
+/*
+ * For architectures with shadow stack support we need to be
+ * absolutely sure that the clone3() syscall will be inline and not a
+ * function call so we open code.
+ */
+#ifdef __x86_64__
+static __always_inline pid_t sys_clone3(struct __clone_args *args, size_t size)
+{
+	register long _num  __asm__ ("rax") = __NR_clone3;
+	register long _args __asm__ ("rdi") = (long)(args);
+	register long _size __asm__ ("rsi") = (long)(size);
+	long ret;
+
+	__asm__ volatile (
+		"syscall\n"
+		: "=a"(ret)
+		: "r"(_args), "r"(_size),
+		  "0"(_num)
+		: "rcx", "r11", "memory", "cc"
+	);
+
+	if (ret < 0) {
+		errno = -ret;
+		return -1;
+	}
+
+	return ret;
+}
+#elif defined(__aarch64__)
+static __always_inline pid_t sys_clone3(struct __clone_args *args, size_t size)
+{
+	register long _num  __asm__ ("x8") = __NR_clone3;
+	register long _args __asm__ ("x0") = (long)(args);
+	register long _size __asm__ ("x1") = (long)(size);
+	register long arg2 __asm__ ("x2") = 0;
+	register long arg3 __asm__ ("x3") = 0;
+	register long arg4 __asm__ ("x4") = 0;
+
+	__asm__ volatile (
+		"svc #0\n"
+		: "=r"(_args)
+		: "r"(_args), "r"(_size),
+		  "r"(_num), "r"(arg2),
+		  "r"(arg3), "r"(arg4)
+		: "memory", "cc"
+	);
+
+	if ((int)_args < 0) {
+		errno = -((int)_args);
+		return -1;
+	}
+
+	return _args;
+}
+#else
 static pid_t sys_clone3(struct __clone_args *args, size_t size)
 {
 	return syscall(__NR_clone3, args, size);
 }
+#endif
 
 static inline void test_clone3_supported(void)
 {

-- 
2.47.3


^ permalink raw reply related

* [PATCH v23 7/8] selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
From: Mark Brown @ 2025-12-18  8:10 UTC (permalink / raw)
  To: Rick P. Edgecombe, Deepak Gupta, H.J. Lu, Florian Weimer,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Christian Brauner, Shuah Khan
  Cc: linux-kernel, Catalin Marinas, Will Deacon, jannh, bsegall,
	Andrew Morton, Yury Khrustalev, H.J. Lu, Adhemerval Zanella Netto,
	Wilco Dijkstra, CarlosO'Donell, Florian Weimer, Rich Felker,
	linux-kselftest, linux-api, Mark Brown, Kees Cook, Kees Cook,
	Shuah Khan
In-Reply-To: <20251218-clone3-shadow-stack-v23-0-7cb318fbb385@kernel.org>

The clone_args structure is extensible, with the syscall passing in the
length of the structure. Inside the kernel we use copy_struct_from_user()
to read the struct but this has the unfortunate side effect of silently
accepting some overrun in the structure size providing the extra data is
all zeros. This means that we can't discover the clone3() features that
the running kernel supports by simply probing with various struct sizes.
We need to check this for the benefit of test systems which run newer
kselftests on old kernels.

Add a flag which can be set on a test to indicate that clone3() may return
-E2BIG due to the use of newer struct versions. Currently no tests need
this but it will become an issue for testing clone3() support for shadow
stacks, the support for shadow stacks is already present on x86.

Reviewed-by: Kees Cook <kees@kernel.org>
Tested-by: Kees Cook <kees@kernel.org>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 tools/testing/selftests/clone3/clone3.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index 8c852d022c55..932cc64e9ae4 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -39,6 +39,7 @@ struct test {
 	size_t size;
 	size_function size_function;
 	int expected;
+	bool e2big_valid;
 	enum test_mode test_mode;
 	filter_function filter;
 };
@@ -146,6 +147,11 @@ static void test_clone3(const struct test *test)
 	ksft_print_msg("[%d] clone3() with flags says: %d expected %d\n",
 			getpid(), ret, test->expected);
 	if (ret != test->expected) {
+		if (test->e2big_valid && ret == -E2BIG) {
+			ksft_print_msg("Test reported -E2BIG\n");
+			ksft_test_result_skip("%s\n", test->name);
+			return;
+		}
 		ksft_print_msg(
 			"[%d] Result (%d) is different than expected (%d)\n",
 			getpid(), ret, test->expected);

-- 
2.47.3


^ permalink raw reply related

* [PATCH v23 6/8] selftests/clone3: Factor more of main loop into test_clone3()
From: Mark Brown @ 2025-12-18  8:10 UTC (permalink / raw)
  To: Rick P. Edgecombe, Deepak Gupta, H.J. Lu, Florian Weimer,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Christian Brauner, Shuah Khan
  Cc: linux-kernel, Catalin Marinas, Will Deacon, jannh, bsegall,
	Andrew Morton, Yury Khrustalev, H.J. Lu, Adhemerval Zanella Netto,
	Wilco Dijkstra, CarlosO'Donell, Florian Weimer, Rich Felker,
	linux-kselftest, linux-api, Mark Brown, Kees Cook, Kees Cook,
	Shuah Khan
In-Reply-To: <20251218-clone3-shadow-stack-v23-0-7cb318fbb385@kernel.org>

In order to make it easier to add more configuration for the tests and
more support for runtime detection of when tests can be run pass the
structure describing the tests into test_clone3() rather than picking
the arguments out of it and have that function do all the per-test work.

No functional change.

Reviewed-by: Kees Cook <kees@kernel.org>
Tested-by: Kees Cook <kees@kernel.org>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 tools/testing/selftests/clone3/clone3.c | 77 ++++++++++++++++-----------------
 1 file changed, 37 insertions(+), 40 deletions(-)

diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index 289e0c7c1f09..8c852d022c55 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -30,6 +30,19 @@ enum test_mode {
 	CLONE3_ARGS_INVAL_EXIT_SIGNAL_NSIG,
 };
 
+typedef bool (*filter_function)(void);
+typedef size_t (*size_function)(void);
+
+struct test {
+	const char *name;
+	uint64_t flags;
+	size_t size;
+	size_function size_function;
+	int expected;
+	enum test_mode test_mode;
+	filter_function filter;
+};
+
 static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
 {
 	struct __clone_args args = {
@@ -109,30 +122,40 @@ static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
 	return 0;
 }
 
-static bool test_clone3(uint64_t flags, size_t size, int expected,
-			enum test_mode test_mode)
+static void test_clone3(const struct test *test)
 {
+	size_t size;
 	int ret;
 
+	if (test->filter && test->filter()) {
+		ksft_test_result_skip("%s\n", test->name);
+		return;
+	}
+
+	if (test->size_function)
+		size = test->size_function();
+	else
+		size = test->size;
+
+	ksft_print_msg("Running test '%s'\n", test->name);
+
 	ksft_print_msg(
 		"[%d] Trying clone3() with flags %#" PRIx64 " (size %zu)\n",
-		getpid(), flags, size);
-	ret = call_clone3(flags, size, test_mode);
+		getpid(), test->flags, size);
+	ret = call_clone3(test->flags, size, test->test_mode);
 	ksft_print_msg("[%d] clone3() with flags says: %d expected %d\n",
-			getpid(), ret, expected);
-	if (ret != expected) {
+			getpid(), ret, test->expected);
+	if (ret != test->expected) {
 		ksft_print_msg(
 			"[%d] Result (%d) is different than expected (%d)\n",
-			getpid(), ret, expected);
-		return false;
+			getpid(), ret, test->expected);
+		ksft_test_result_fail("%s\n", test->name);
+		return;
 	}
 
-	return true;
+	ksft_test_result_pass("%s\n", test->name);
 }
 
-typedef bool (*filter_function)(void);
-typedef size_t (*size_function)(void);
-
 static bool not_root(void)
 {
 	if (getuid() != 0) {
@@ -160,16 +183,6 @@ static size_t page_size_plus_8(void)
 	return getpagesize() + 8;
 }
 
-struct test {
-	const char *name;
-	uint64_t flags;
-	size_t size;
-	size_function size_function;
-	int expected;
-	enum test_mode test_mode;
-	filter_function filter;
-};
-
 static const struct test tests[] = {
 	{
 		.name = "simple clone3()",
@@ -319,24 +332,8 @@ int main(int argc, char *argv[])
 	ksft_set_plan(ARRAY_SIZE(tests));
 	test_clone3_supported();
 
-	for (i = 0; i < ARRAY_SIZE(tests); i++) {
-		if (tests[i].filter && tests[i].filter()) {
-			ksft_test_result_skip("%s\n", tests[i].name);
-			continue;
-		}
-
-		if (tests[i].size_function)
-			size = tests[i].size_function();
-		else
-			size = tests[i].size;
-
-		ksft_print_msg("Running test '%s'\n", tests[i].name);
-
-		ksft_test_result(test_clone3(tests[i].flags, size,
-					     tests[i].expected,
-					     tests[i].test_mode),
-				 "%s\n", tests[i].name);
-	}
+	for (i = 0; i < ARRAY_SIZE(tests); i++)
+		test_clone3(&tests[i]);
 
 	ksft_finished();
 }

-- 
2.47.3


^ permalink raw reply related

* [PATCH v23 5/8] selftests/clone3: Remove redundant flushes of output streams
From: Mark Brown @ 2025-12-18  8:10 UTC (permalink / raw)
  To: Rick P. Edgecombe, Deepak Gupta, H.J. Lu, Florian Weimer,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Christian Brauner, Shuah Khan
  Cc: linux-kernel, Catalin Marinas, Will Deacon, jannh, bsegall,
	Andrew Morton, Yury Khrustalev, H.J. Lu, Adhemerval Zanella Netto,
	Wilco Dijkstra, CarlosO'Donell, Florian Weimer, Rich Felker,
	linux-kselftest, linux-api, Mark Brown, Kees Cook, Kees Cook,
	Shuah Khan
In-Reply-To: <20251218-clone3-shadow-stack-v23-0-7cb318fbb385@kernel.org>

Since there were widespread issues with output not being flushed the
kselftest framework was modified to explicitly set the output streams
unbuffered in commit 58e2847ad2e6 ("selftests: line buffer test
program's stdout") so there is no need to explicitly flush in the clone3
tests.

Reviewed-by: Kees Cook <kees@kernel.org>
Tested-by: Kees Cook <kees@kernel.org>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 tools/testing/selftests/clone3/clone3_selftests.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tools/testing/selftests/clone3/clone3_selftests.h b/tools/testing/selftests/clone3/clone3_selftests.h
index a0593e8950f0..a7ab2f1cccda 100644
--- a/tools/testing/selftests/clone3/clone3_selftests.h
+++ b/tools/testing/selftests/clone3/clone3_selftests.h
@@ -35,8 +35,6 @@ struct __clone_args {
 
 static pid_t sys_clone3(struct __clone_args *args, size_t size)
 {
-	fflush(stdout);
-	fflush(stderr);
 	return syscall(__NR_clone3, args, size);
 }
 

-- 
2.47.3


^ permalink raw reply related

* [PATCH v23 4/8] fork: Add shadow stack support to clone3()
From: Mark Brown @ 2025-12-18  8:10 UTC (permalink / raw)
  To: Rick P. Edgecombe, Deepak Gupta, H.J. Lu, Florian Weimer,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Christian Brauner, Shuah Khan
  Cc: linux-kernel, Catalin Marinas, Will Deacon, jannh, bsegall,
	Andrew Morton, Yury Khrustalev, H.J. Lu, Adhemerval Zanella Netto,
	Wilco Dijkstra, CarlosO'Donell, Florian Weimer, Rich Felker,
	linux-kselftest, linux-api, Mark Brown, Kees Cook
In-Reply-To: <20251218-clone3-shadow-stack-v23-0-7cb318fbb385@kernel.org>

Unlike with the normal stack there is no API for configuring the shadow
stack for a new thread, instead the kernel will dynamically allocate a
new shadow stack with the same size as the normal stack. This appears to
be due to the shadow stack series having been in development since
before the more extensible clone3() was added rather than anything more
deliberate.

Add a parameter to clone3() specifying a shadow stack pointer to use
for the new thread, this is inconsistent with the way we specify the
normal stack but during review concerns were expressed about having to
identify where the shadow stack pointer should be placed especially in
cases where the shadow stack has been previously active.  If no shadow
stack is specified then the existing implicit allocation behaviour is
maintained.

If a shadow stack pointer is specified then it is required to have an
architecture defined token placed on the stack, this will be consumed by
the new task, the shadow stack is specified by pointing to this token.  If
no valid token is present then this will be reported with -EINVAL.  This
token prevents new threads being created pointing at the shadow stack of
an existing running thread.  On architectures with support for userspace
pivoting of shadow stacks it is expected that the same format and placement
of tokens will be used, this is the case for arm64 and x86.

If the architecture does not support shadow stacks the shadow stack
pointer must be not be specified, architectures that do support the
feature are expected to enforce the same requirement on individual
systems that lack shadow stack support.

Update the existing arm64 and x86 implementations to pay attention to
the newly added arguments, in order to maintain compatibility we use the
existing behaviour if no shadow stack is specified. Since we are now
using more fields from the kernel_clone_args we pass that into the
shadow stack code rather than individual fields.

Portions of the x86 architecture code were written by Rick Edgecombe.

Acked-by: Yury Khrustalev <yury.khrustalev@arm.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/mm/gcs.c              | 47 +++++++++++++++++++-
 arch/x86/include/asm/shstk.h     | 11 +++--
 arch/x86/kernel/process.c        |  2 +-
 arch/x86/kernel/shstk.c          | 53 ++++++++++++++++++++---
 include/asm-generic/cacheflush.h | 11 +++++
 include/linux/sched/task.h       | 17 ++++++++
 include/uapi/linux/sched.h       |  9 ++--
 kernel/fork.c                    | 93 ++++++++++++++++++++++++++++++++++------
 8 files changed, 217 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c
index 3abcbf9adb5c..fd1d5a6655de 100644
--- a/arch/arm64/mm/gcs.c
+++ b/arch/arm64/mm/gcs.c
@@ -43,8 +43,23 @@ int gcs_alloc_thread_stack(struct task_struct *tsk,
 {
 	unsigned long addr, size;
 
-	if (!system_supports_gcs())
+	if (!system_supports_gcs()) {
+		if (args->shstk_token)
+			return -EINVAL;
+
 		return 0;
+	}
+
+	/*
+	 * If the user specified a GCS then use it, otherwise fall
+	 * back to a default allocation strategy. Validation is done
+	 * in arch_shstk_validate_clone().
+	 */
+	if (args->shstk_token) {
+		tsk->thread.gcs_base = 0;
+		tsk->thread.gcs_size = 0;
+		return 0;
+	}
 
 	if (!task_gcs_el0_enabled(tsk))
 		return 0;
@@ -68,6 +83,36 @@ int gcs_alloc_thread_stack(struct task_struct *tsk,
 	return 0;
 }
 
+static bool gcs_consume_token(struct vm_area_struct *vma, struct page *page,
+			      unsigned long user_addr)
+{
+	u64 expected = GCS_CAP(user_addr);
+	u64 *token = page_address(page) + offset_in_page(user_addr);
+
+	if (!cmpxchg_to_user_page(vma, page, user_addr, token, expected, 0))
+		return false;
+	set_page_dirty_lock(page);
+
+	return true;
+}
+
+int arch_shstk_validate_clone(struct task_struct *tsk,
+			      struct vm_area_struct *vma,
+			      struct page *page,
+			      struct kernel_clone_args *args)
+{
+	unsigned long gcspr_el0;
+	int ret = 0;
+
+	gcspr_el0 = args->shstk_token;
+	if (!gcs_consume_token(vma, page, gcspr_el0))
+		return -EINVAL;
+
+	tsk->thread.gcspr_el0 = gcspr_el0 + sizeof(u64);
+
+	return ret;
+}
+
 SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags)
 {
 	unsigned long alloc_size;
diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
index fc7dcec58fd4..42a5f03a51e2 100644
--- a/arch/x86/include/asm/shstk.h
+++ b/arch/x86/include/asm/shstk.h
@@ -6,6 +6,7 @@
 #include <linux/types.h>
 
 struct task_struct;
+struct kernel_clone_args;
 struct ksignal;
 
 #ifdef CONFIG_X86_USER_SHADOW_STACK
@@ -16,8 +17,8 @@ struct thread_shstk {
 
 long shstk_prctl(struct task_struct *task, int option, unsigned long arg2);
 void reset_thread_features(void);
-unsigned long shstk_alloc_thread_stack(struct task_struct *p, u64 clone_flags,
-				       unsigned long stack_size);
+unsigned long shstk_alloc_thread_stack(struct task_struct *p,
+				       const struct kernel_clone_args *args);
 void shstk_free(struct task_struct *p);
 int setup_signal_shadow_stack(struct ksignal *ksig);
 int restore_signal_shadow_stack(void);
@@ -30,8 +31,10 @@ static inline long shstk_prctl(struct task_struct *task, int option,
 			       unsigned long arg2) { return -EINVAL; }
 static inline void reset_thread_features(void) {}
 static inline unsigned long shstk_alloc_thread_stack(struct task_struct *p,
-						     u64 clone_flags,
-						     unsigned long stack_size) { return 0; }
+						     const struct kernel_clone_args *args)
+{
+	return 0;
+}
 static inline void shstk_free(struct task_struct *p) {}
 static inline int setup_signal_shadow_stack(struct ksignal *ksig) { return 0; }
 static inline int restore_signal_shadow_stack(void) { return 0; }
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 4c718f8adc59..449da29a9f92 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -219,7 +219,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	 * is disabled, new_ssp will remain 0, and fpu_clone() will know not to
 	 * update it.
 	 */
-	new_ssp = shstk_alloc_thread_stack(p, clone_flags, args->stack_size);
+	new_ssp = shstk_alloc_thread_stack(p, args);
 	if (IS_ERR_VALUE(new_ssp))
 		return PTR_ERR((void *)new_ssp);
 
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 978232b6d48d..d66c866274d0 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -191,18 +191,61 @@ void reset_thread_features(void)
 	current->thread.features_locked = 0;
 }
 
-unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, u64 clone_flags,
-				       unsigned long stack_size)
+int arch_shstk_validate_clone(struct task_struct *t,
+			      struct vm_area_struct *vma,
+			      struct page *page,
+			      struct kernel_clone_args *args)
+{
+	void *maddr = page_address(page);
+	unsigned long token;
+	int offset;
+	u64 expected;
+
+	/*
+	 * kernel_clone_args() verification assures token address is 8
+	 * byte aligned.
+	 */
+	token = args->shstk_token;
+	expected = (token + SS_FRAME_SIZE) | BIT(0);
+	offset = offset_in_page(token);
+
+	if (!cmpxchg_to_user_page(vma, page, token, (unsigned long *)(maddr + offset),
+				  expected, 0))
+		return -EINVAL;
+	set_page_dirty_lock(page);
+
+	return 0;
+}
+
+unsigned long shstk_alloc_thread_stack(struct task_struct *tsk,
+				       const struct kernel_clone_args *args)
 {
 	struct thread_shstk *shstk = &tsk->thread.shstk;
+	u64 clone_flags = args->flags;
 	unsigned long addr, size;
 
 	/*
 	 * If shadow stack is not enabled on the new thread, skip any
-	 * switch to a new shadow stack.
+	 * implicit switch to a new shadow stack and reject attempts to
+	 * explicitly specify one.
 	 */
-	if (!features_enabled(ARCH_SHSTK_SHSTK))
+	if (!features_enabled(ARCH_SHSTK_SHSTK)) {
+		if (args->shstk_token)
+			return (unsigned long)ERR_PTR(-EINVAL);
+
 		return 0;
+	}
+
+	/*
+	 * If the user specified a shadow stack then use it, otherwise
+	 * fall back to a default allocation strategy. Validation is
+	 * done in arch_shstk_validate_clone().
+	 */
+	if (args->shstk_token) {
+		shstk->base = 0;
+		shstk->size = 0;
+		return args->shstk_token + 8;
+	}
 
 	/*
 	 * For CLONE_VFORK the child will share the parents shadow stack.
@@ -222,7 +265,7 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, u64 clone_flags,
 	if (!(clone_flags & CLONE_VM))
 		return 0;
 
-	size = adjust_shstk_size(stack_size);
+	size = adjust_shstk_size(args->stack_size);
 	addr = alloc_shstk(0, size, 0, false);
 	if (IS_ERR_VALUE(addr))
 		return addr;
diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
index 7ee8a179d103..96cc0c7a5c90 100644
--- a/include/asm-generic/cacheflush.h
+++ b/include/asm-generic/cacheflush.h
@@ -124,4 +124,15 @@ static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
 	} while (0)
 #endif
 
+#ifndef cmpxchg_to_user_page
+#define cmpxchg_to_user_page(vma, page, vaddr, ptr, old, new)  \
+({							  \
+	bool ret;						  \
+								  \
+	ret = try_cmpxchg(ptr, &old, new);			  \
+	flush_icache_user_page(vma, page, vaddr, sizeof(*ptr));	  \
+	ret;							  \
+})
+#endif
+
 #endif /* _ASM_GENERIC_CACHEFLUSH_H */
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 525aa2a632b2..7f860edc58da 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -16,6 +16,7 @@ struct task_struct;
 struct rusage;
 union thread_union;
 struct css_set;
+struct vm_area_struct;
 
 /* All the bits taken by the old clone syscall. */
 #define CLONE_LEGACY_FLAGS 0xffffffffULL
@@ -44,6 +45,7 @@ struct kernel_clone_args {
 	struct cgroup *cgrp;
 	struct css_set *cset;
 	unsigned int kill_seq;
+	unsigned long shstk_token;
 };
 
 /*
@@ -225,4 +227,19 @@ static inline void task_unlock(struct task_struct *p)
 
 DEFINE_GUARD(task_lock, struct task_struct *, task_lock(_T), task_unlock(_T))
 
+#ifdef CONFIG_ARCH_HAS_USER_SHADOW_STACK
+int arch_shstk_validate_clone(struct task_struct *p,
+			      struct vm_area_struct *vma,
+			      struct page *page,
+			      struct kernel_clone_args *args);
+#else
+static inline int arch_shstk_validate_clone(struct task_struct *p,
+					    struct vm_area_struct *vma,
+					    struct page *page,
+					    struct kernel_clone_args *args)
+{
+	return 0;
+}
+#endif
+
 #endif /* _LINUX_SCHED_TASK_H */
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 359a14cc76a4..7e18e7b3df3a 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -84,6 +84,7 @@
  *                kernel's limit of nested PID namespaces.
  * @cgroup:       If CLONE_INTO_CGROUP is specified set this to
  *                a file descriptor for the cgroup.
+ * @shstk_token:  Pointer to shadow stack token at top of stack.
  *
  * The structure is versioned by size and thus extensible.
  * New struct members must go at the end of the struct and
@@ -101,12 +102,14 @@ struct clone_args {
 	__aligned_u64 set_tid;
 	__aligned_u64 set_tid_size;
 	__aligned_u64 cgroup;
+	__aligned_u64 shstk_token;
 };
 #endif
 
-#define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
-#define CLONE_ARGS_SIZE_VER1 80 /* sizeof second published struct */
-#define CLONE_ARGS_SIZE_VER2 88 /* sizeof third published struct */
+#define CLONE_ARGS_SIZE_VER0  64 /* sizeof first published struct */
+#define CLONE_ARGS_SIZE_VER1  80 /* sizeof second published struct */
+#define CLONE_ARGS_SIZE_VER2  88 /* sizeof third published struct */
+#define CLONE_ARGS_SIZE_VER3  96 /* sizeof fourth published struct */
 
 /*
  * Scheduling policies
diff --git a/kernel/fork.c b/kernel/fork.c
index b1f3915d5f8e..0d171e9055d6 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1955,6 +1955,51 @@ static bool need_futex_hash_allocate_default(u64 clone_flags)
 	return true;
 }
 
+static int shstk_validate_clone(struct task_struct *p,
+				struct kernel_clone_args *args)
+{
+	struct mm_struct *mm;
+	struct vm_area_struct *vma;
+	struct page *page;
+	unsigned long addr;
+	int ret;
+
+	if (!IS_ENABLED(CONFIG_ARCH_HAS_USER_SHADOW_STACK))
+		return 0;
+
+	if (!args->shstk_token)
+		return 0;
+
+	mm = get_task_mm(p);
+	if (!mm)
+		return -EFAULT;
+
+	mmap_read_lock(mm);
+
+	addr = untagged_addr_remote(mm, args->shstk_token);
+	page = get_user_page_vma_remote(mm, addr, FOLL_FORCE | FOLL_WRITE,
+					&vma);
+	if (IS_ERR(page)) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	if (!(vma->vm_flags & VM_SHADOW_STACK) ||
+	    !(vma->vm_flags & VM_WRITE)) {
+		ret = -EFAULT;
+		goto out_page;
+	}
+
+	ret = arch_shstk_validate_clone(p, vma, page, args);
+
+out_page:
+	put_page(page);
+out:
+	mmap_read_unlock(mm);
+	mmput(mm);
+	return ret;
+}
+
 /*
  * This creates a new process as a copy of the old one,
  * but does not actually start it yet.
@@ -2228,6 +2273,9 @@ __latent_entropy struct task_struct *copy_process(
 	if (retval)
 		goto bad_fork_cleanup_namespaces;
 	retval = copy_thread(p, args);
+	if (retval)
+		goto bad_fork_cleanup_io;
+	retval = shstk_validate_clone(p, args);
 	if (retval)
 		goto bad_fork_cleanup_io;
 
@@ -2807,7 +2855,9 @@ static noinline int copy_clone_args_from_user(struct kernel_clone_args *kargs,
 		     CLONE_ARGS_SIZE_VER1);
 	BUILD_BUG_ON(offsetofend(struct clone_args, cgroup) !=
 		     CLONE_ARGS_SIZE_VER2);
-	BUILD_BUG_ON(sizeof(struct clone_args) != CLONE_ARGS_SIZE_VER2);
+	BUILD_BUG_ON(offsetofend(struct clone_args, shstk_token) !=
+		     CLONE_ARGS_SIZE_VER3);
+	BUILD_BUG_ON(sizeof(struct clone_args) != CLONE_ARGS_SIZE_VER3);
 
 	if (unlikely(usize > PAGE_SIZE))
 		return -E2BIG;
@@ -2840,16 +2890,17 @@ static noinline int copy_clone_args_from_user(struct kernel_clone_args *kargs,
 		return -EINVAL;
 
 	*kargs = (struct kernel_clone_args){
-		.flags		= args.flags,
-		.pidfd		= u64_to_user_ptr(args.pidfd),
-		.child_tid	= u64_to_user_ptr(args.child_tid),
-		.parent_tid	= u64_to_user_ptr(args.parent_tid),
-		.exit_signal	= args.exit_signal,
-		.stack		= args.stack,
-		.stack_size	= args.stack_size,
-		.tls		= args.tls,
-		.set_tid_size	= args.set_tid_size,
-		.cgroup		= args.cgroup,
+		.flags			= args.flags,
+		.pidfd			= u64_to_user_ptr(args.pidfd),
+		.child_tid		= u64_to_user_ptr(args.child_tid),
+		.parent_tid		= u64_to_user_ptr(args.parent_tid),
+		.exit_signal		= args.exit_signal,
+		.stack			= args.stack,
+		.stack_size		= args.stack_size,
+		.tls			= args.tls,
+		.set_tid_size		= args.set_tid_size,
+		.cgroup			= args.cgroup,
+		.shstk_token		= args.shstk_token,
 	};
 
 	if (args.set_tid &&
@@ -2890,6 +2941,24 @@ static inline bool clone3_stack_valid(struct kernel_clone_args *kargs)
 	return true;
 }
 
+/**
+ * clone3_shadow_stack_valid - check and prepare shadow stack
+ * @kargs: kernel clone args
+ *
+ * Verify that shadow stacks are only enabled if supported.
+ */
+static inline bool clone3_shadow_stack_valid(struct kernel_clone_args *kargs)
+{
+	if (!kargs->shstk_token)
+		return true;
+
+	if (!IS_ALIGNED(kargs->shstk_token, sizeof(void *)))
+		return false;
+
+	/* Fail if the kernel wasn't built with shadow stacks */
+	return IS_ENABLED(CONFIG_ARCH_HAS_USER_SHADOW_STACK);
+}
+
 static bool clone3_args_valid(struct kernel_clone_args *kargs)
 {
 	/* Verify that no unknown flags are passed along. */
@@ -2912,7 +2981,7 @@ static bool clone3_args_valid(struct kernel_clone_args *kargs)
 	    kargs->exit_signal)
 		return false;
 
-	if (!clone3_stack_valid(kargs))
+	if (!clone3_stack_valid(kargs) || !clone3_shadow_stack_valid(kargs))
 		return false;
 
 	return true;

-- 
2.47.3


^ permalink raw reply related

* [PATCH v23 3/8] selftests: Provide helper header for shadow stack testing
From: Mark Brown @ 2025-12-18  8:10 UTC (permalink / raw)
  To: Rick P. Edgecombe, Deepak Gupta, H.J. Lu, Florian Weimer,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Christian Brauner, Shuah Khan
  Cc: linux-kernel, Catalin Marinas, Will Deacon, jannh, bsegall,
	Andrew Morton, Yury Khrustalev, H.J. Lu, Adhemerval Zanella Netto,
	Wilco Dijkstra, CarlosO'Donell, Florian Weimer, Rich Felker,
	linux-kselftest, linux-api, Mark Brown, Kees Cook, Kees Cook,
	Shuah Khan
In-Reply-To: <20251218-clone3-shadow-stack-v23-0-7cb318fbb385@kernel.org>

While almost all users of shadow stacks should be relying on the dynamic
linker and libc to enable the feature there are several low level test
programs where it is useful to enable without any libc support, allowing
testing without full system enablement. This low level testing is helpful
during bringup of the support itself, and also in enabling coverage by
automated testing without needing all system components in the target root
filesystems to have enablement.

Provide a header with helpers for this purpose, intended for use only by
test programs directly exercising shadow stack interfaces.

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Tested-by: Kees Cook <kees@kernel.org>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 tools/testing/selftests/ksft_shstk.h | 98 ++++++++++++++++++++++++++++++++++++
 1 file changed, 98 insertions(+)

diff --git a/tools/testing/selftests/ksft_shstk.h b/tools/testing/selftests/ksft_shstk.h
new file mode 100644
index 000000000000..fecf91218ea5
--- /dev/null
+++ b/tools/testing/selftests/ksft_shstk.h
@@ -0,0 +1,98 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Helpers for shadow stack enablement, this is intended to only be
+ * used by low level test programs directly exercising interfaces for
+ * working with shadow stacks.
+ *
+ * Copyright (C) 2024 ARM Ltd.
+ */
+
+#ifndef __KSFT_SHSTK_H
+#define __KSFT_SHSTK_H
+
+#include <asm/mman.h>
+
+/* This is currently only defined for x86 */
+#ifndef SHADOW_STACK_SET_TOKEN
+#define SHADOW_STACK_SET_TOKEN (1ULL << 0)
+#endif
+
+static bool shadow_stack_enabled;
+
+#ifdef __x86_64__
+#define ARCH_SHSTK_ENABLE	0x5001
+#define ARCH_SHSTK_SHSTK	(1ULL <<  0)
+
+#define ARCH_PRCTL(arg1, arg2)					\
+({								\
+	long _ret;						\
+	register long _num  asm("eax") = __NR_arch_prctl;	\
+	register long _arg1 asm("rdi") = (long)(arg1);		\
+	register long _arg2 asm("rsi") = (long)(arg2);		\
+								\
+	asm volatile (						\
+		"syscall\n"					\
+		: "=a"(_ret)					\
+		: "r"(_arg1), "r"(_arg2),			\
+		  "0"(_num)					\
+		: "rcx", "r11", "memory", "cc"			\
+	);							\
+	_ret;							\
+})
+
+#define ENABLE_SHADOW_STACK
+static __always_inline void enable_shadow_stack(void)
+{
+	int ret = ARCH_PRCTL(ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK);
+	if (ret == 0)
+		shadow_stack_enabled = true;
+}
+
+#endif
+
+#ifdef __aarch64__
+#define PR_SET_SHADOW_STACK_STATUS      75
+# define PR_SHADOW_STACK_ENABLE         (1UL << 0)
+
+#define my_syscall2(num, arg1, arg2)                                          \
+({                                                                            \
+	register long _num  __asm__ ("x8") = (num);                           \
+	register long _arg1 __asm__ ("x0") = (long)(arg1);                    \
+	register long _arg2 __asm__ ("x1") = (long)(arg2);                    \
+	register long _arg3 __asm__ ("x2") = 0;                               \
+	register long _arg4 __asm__ ("x3") = 0;                               \
+	register long _arg5 __asm__ ("x4") = 0;                               \
+									      \
+	__asm__  volatile (                                                   \
+		"svc #0\n"                                                    \
+		: "=r"(_arg1)                                                 \
+		: "r"(_arg1), "r"(_arg2),                                     \
+		  "r"(_arg3), "r"(_arg4),                                     \
+		  "r"(_arg5), "r"(_num)					      \
+		: "memory", "cc"                                              \
+	);                                                                    \
+	_arg1;                                                                \
+})
+
+#define ENABLE_SHADOW_STACK
+static __always_inline void enable_shadow_stack(void)
+{
+	int ret;
+
+	ret = my_syscall2(__NR_prctl, PR_SET_SHADOW_STACK_STATUS,
+			  PR_SHADOW_STACK_ENABLE);
+	if (ret == 0)
+		shadow_stack_enabled = true;
+}
+
+#endif
+
+#ifndef __NR_map_shadow_stack
+#define __NR_map_shadow_stack 453
+#endif
+
+#ifndef ENABLE_SHADOW_STACK
+static inline void enable_shadow_stack(void) { }
+#endif
+
+#endif

-- 
2.47.3


^ permalink raw reply related

* [PATCH v23 2/8] Documentation: userspace-api: Add shadow stack API documentation
From: Mark Brown @ 2025-12-18  8:10 UTC (permalink / raw)
  To: Rick P. Edgecombe, Deepak Gupta, H.J. Lu, Florian Weimer,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Christian Brauner, Shuah Khan
  Cc: linux-kernel, Catalin Marinas, Will Deacon, jannh, bsegall,
	Andrew Morton, Yury Khrustalev, H.J. Lu, Adhemerval Zanella Netto,
	Wilco Dijkstra, CarlosO'Donell, Florian Weimer, Rich Felker,
	linux-kselftest, linux-api, Mark Brown, Kees Cook, Kees Cook,
	Shuah Khan
In-Reply-To: <20251218-clone3-shadow-stack-v23-0-7cb318fbb385@kernel.org>

There are a number of architectures with shadow stack features which we are
presenting to userspace with as consistent an API as we can (though there
are some architecture specifics). Especially given that there are some
important considerations for userspace code interacting directly with the
feature let's provide some documentation covering the common aspects.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Tested-by: Kees Cook <kees@kernel.org>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Yury Khrustalev <yury.khrustalev@arm.com>
Reviewed-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 Documentation/userspace-api/index.rst        |  1 +
 Documentation/userspace-api/shadow_stack.rst | 44 ++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index 8a61ac4c1bf1..64b0099ee161 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -63,6 +63,7 @@ Everything else
    ELF
    liveupdate
    netlink/index
+   shadow_stack
    sysfs-platform_profile
    vduse
    futex2
diff --git a/Documentation/userspace-api/shadow_stack.rst b/Documentation/userspace-api/shadow_stack.rst
new file mode 100644
index 000000000000..42617d0470ba
--- /dev/null
+++ b/Documentation/userspace-api/shadow_stack.rst
@@ -0,0 +1,44 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+Shadow Stacks
+=============
+
+Introduction
+============
+
+Several architectures have features which provide backward edge
+control flow protection through a hardware maintained stack, only
+writable by userspace through very limited operations.  This feature
+is referred to as shadow stacks on Linux, on x86 it is part of Intel
+Control Enforcement Technology (CET), on arm64 it is Guarded Control
+Stacks feature (FEAT_GCS) and for RISC-V it is the Zicfiss extension.
+It is expected that this feature will normally be managed by the
+system dynamic linker and libc in ways broadly transparent to
+application code, this document covers interfaces and considerations.
+
+
+Enabling
+========
+
+Shadow stacks default to disabled when a userspace process is
+executed, they can be enabled for the current thread with a syscall:
+
+ - For x86 the ARCH_SHSTK_ENABLE arch_prctl()
+ - For other architectures the PR_SET_SHADOW_STACK_ENABLE prctl()
+
+It is expected that this will normally be done by the dynamic linker.
+Any new threads created by a thread with shadow stacks enabled will
+themselves have shadow stacks enabled.
+
+
+Enablement considerations
+=========================
+
+- Returning from the function that enables shadow stacks without first
+  disabling them will cause a shadow stack exception.  This includes
+  any syscall wrapper or other library functions, the syscall will need
+  to be inlined.
+- A lock feature allows userspace to prevent disabling of shadow stacks.
+- Those that change the stack context like longjmp() or use of ucontext
+  changes on signal return will need support from libc.

-- 
2.47.3


^ permalink raw reply related

* [PATCH v23 1/8] arm64/gcs: Return a success value from gcs_alloc_thread_stack()
From: Mark Brown @ 2025-12-18  8:10 UTC (permalink / raw)
  To: Rick P. Edgecombe, Deepak Gupta, H.J. Lu, Florian Weimer,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Christian Brauner, Shuah Khan
  Cc: linux-kernel, Catalin Marinas, Will Deacon, jannh, bsegall,
	Andrew Morton, Yury Khrustalev, H.J. Lu, Adhemerval Zanella Netto,
	Wilco Dijkstra, CarlosO'Donell, Florian Weimer, Rich Felker,
	linux-kselftest, linux-api, Mark Brown, Kees Cook
In-Reply-To: <20251218-clone3-shadow-stack-v23-0-7cb318fbb385@kernel.org>

Currently as a result of templating from x86 code gcs_alloc_thread_stack()
returns a pointer as an unsigned int however on arm64 we don't actually use
this pointer value as anything other than a pass/fail flag. Simplify the
interface to just return an int with 0 on success and a negative error code
on failure.

Acked-by: Deepak Gupta <debug@rivosinc.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/gcs.h | 8 ++++----
 arch/arm64/kernel/process.c  | 8 ++++----
 arch/arm64/mm/gcs.c          | 8 ++++----
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/gcs.h b/arch/arm64/include/asm/gcs.h
index 8fa0707069e8..534ea5ae9281 100644
--- a/arch/arm64/include/asm/gcs.h
+++ b/arch/arm64/include/asm/gcs.h
@@ -64,8 +64,8 @@ static inline bool task_gcs_el0_enabled(struct task_struct *task)
 void gcs_set_el0_mode(struct task_struct *task);
 void gcs_free(struct task_struct *task);
 void gcs_preserve_current_state(void);
-unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
-				     const struct kernel_clone_args *args);
+int gcs_alloc_thread_stack(struct task_struct *tsk,
+			   const struct kernel_clone_args *args);
 
 static inline int gcs_check_locked(struct task_struct *task,
 				   unsigned long new_val)
@@ -171,8 +171,8 @@ static inline void put_user_gcs(unsigned long val, unsigned long __user *addr,
 				int *err) { }
 static inline void push_user_gcs(unsigned long val, int *err) { }
 
-static inline unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
-						   const struct kernel_clone_args *args)
+static inline int gcs_alloc_thread_stack(struct task_struct *tsk,
+					 const struct kernel_clone_args *args)
 {
 	return -ENOTSUPP;
 }
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index fba7ca102a8c..4dadc70df16b 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -299,7 +299,7 @@ static void flush_gcs(void)
 static int copy_thread_gcs(struct task_struct *p,
 			   const struct kernel_clone_args *args)
 {
-	unsigned long gcs;
+	int ret;
 
 	if (!system_supports_gcs())
 		return 0;
@@ -310,9 +310,9 @@ static int copy_thread_gcs(struct task_struct *p,
 	p->thread.gcs_el0_mode = current->thread.gcs_el0_mode;
 	p->thread.gcs_el0_locked = current->thread.gcs_el0_locked;
 
-	gcs = gcs_alloc_thread_stack(p, args);
-	if (IS_ERR_VALUE(gcs))
-		return PTR_ERR((void *)gcs);
+	ret = gcs_alloc_thread_stack(p, args);
+	if (ret != 0)
+		return ret;
 
 	return 0;
 }
diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c
index 6e93f78de79b..3abcbf9adb5c 100644
--- a/arch/arm64/mm/gcs.c
+++ b/arch/arm64/mm/gcs.c
@@ -38,8 +38,8 @@ static unsigned long gcs_size(unsigned long size)
 	return max(PAGE_SIZE, size);
 }
 
-unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
-				     const struct kernel_clone_args *args)
+int gcs_alloc_thread_stack(struct task_struct *tsk,
+			   const struct kernel_clone_args *args)
 {
 	unsigned long addr, size;
 
@@ -59,13 +59,13 @@ unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
 	size = gcs_size(size);
 	addr = alloc_gcs(0, size);
 	if (IS_ERR_VALUE(addr))
-		return addr;
+		return PTR_ERR((void *)addr);
 
 	tsk->thread.gcs_base = addr;
 	tsk->thread.gcs_size = size;
 	tsk->thread.gcspr_el0 = addr + size - sizeof(u64);
 
-	return addr;
+	return 0;
 }
 
 SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags)

-- 
2.47.3


^ permalink raw reply related

* [PATCH v23 0/8] fork: Support shadow stacks in clone3()
From: Mark Brown @ 2025-12-18  8:10 UTC (permalink / raw)
  To: Rick P. Edgecombe, Deepak Gupta, H.J. Lu, Florian Weimer,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Christian Brauner, Shuah Khan
  Cc: linux-kernel, Catalin Marinas, Will Deacon, jannh, bsegall,
	Andrew Morton, Yury Khrustalev, H.J. Lu, Adhemerval Zanella Netto,
	Wilco Dijkstra, CarlosO'Donell, Florian Weimer, Rich Felker,
	linux-kselftest, linux-api, Mark Brown, Kees Cook, Kees Cook,
	Shuah Khan

At this point I think everyone in the on the kernel side is happy with
this but there were some questions from the glibc side about the value
of controlling the shadow stack placement and size, especially with the
current inability to reuse the shadow stack for an exited thread.  With
support for reuse it would be possible to have a cache of shadow stacks
as is currently supported for the normal stack.

Since the discussion petered out I'm resending this in order to give
people something work with while prototyping.  It should be possible to
prototype any potential kernel features to help build out shadow stack
support in userspace by enabling shadow stack writes, as suggested by
Rick Edgecombe this may end up being required anyway for supporting more
exotic scenarios.  On all current architectures with the feature writes
to shadow stack require specific instructions so there are still
security benefits even with writes enabled.

I did send a change implementing a feature writing a token on thread
exit to allow reuse:

   https://lore.kernel.org/r/20250921-arm64-gcs-exit-token-v1-0-45cf64e648d5@kernel.org

but wasn't planning to refresh it without some indication from the
userspace side that that'd be useful.

Non-process cover letter:

The kernel has added support for shadow stacks, currently x86 only using
their CET feature but both arm64 and RISC-V have equivalent features
(GCS and Zicfiss respectively), I am actively working on GCS[1].  With
shadow stacks the hardware maintains an additional stack containing only
the return addresses for branch instructions which is not generally
writeable by userspace and ensures that any returns are to the recorded
addresses.  This provides some protection against ROP attacks and making
it easier to collect call stacks.  These shadow stacks are allocated in
the address space of the userspace process.

Our API for shadow stacks does not currently offer userspace any
flexiblity for managing the allocation of shadow stacks for newly
created threads, instead the kernel allocates a new shadow stack with
the same size as the normal stack whenever a thread is created with the
feature enabled.  The stacks allocated in this way are freed by the
kernel when the thread exits or shadow stacks are disabled for the
thread.  This lack of flexibility and control isn't ideal, in the vast
majority of cases the shadow stack will be over allocated and the
implicit allocation and deallocation is not consistent with other
interfaces.  As far as I can tell the interface is done in this manner
mainly because the shadow stack patches were in development since before
clone3() was implemented.

Since clone3() is readily extensible let's add support for specifying a
shadow stack when creating a new thread or process, keeping the current
implicit allocation behaviour if one is not specified either with
clone3() or through the use of clone().  The user must provide a shadow
stack pointer, this must point to memory mapped for use as a shadow
stackby map_shadow_stack() with an architecture specified shadow stack
token at the top of the stack.

Yuri Khrustalev has raised questions from the libc side regarding
discoverability of extended clone3() structure sizes[2], this seems like
a general issue with clone3().  There was a suggestion to add a hwcap on
arm64 which isn't ideal but is doable there, though architecture
specific mechanisms would also be needed for x86 (and RISC-V if it's
support gets merged before this does).  The idea has, however, had
strong pushback from the architecture maintainers and it is possible to
detect support for this in clone3() by attempting a call with a
misaligned shadow stack pointer specified so no hwcap has been added.

[1] https://lore.kernel.org/linux-arm-kernel/20241001-arm64-gcs-v13-0-222b78d87eee@kernel.org/T/#mc58f97f27461749ccf400ebabf6f9f937116a86b
[2] https://lore.kernel.org/r/aCs65ccRQtJBnZ_5@arm.com

Signed-off-by: Mark Brown <broonie@kernel.org>
---
Changes in v23:
- Rebase onto v6.19-rc1.
- Link to v22: https://lore.kernel.org/r/20251015-clone3-shadow-stack-v22-0-a8c8da011427@kernel.org

Changes in v22:
- Rebase onto v6.18-rc1.
- Cover letter updates.
- Link to v21: https://lore.kernel.org/r/20250916-clone3-shadow-stack-v21-0-910493527013@kernel.org

Changes in v21:
- Rebase onto https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git kernel-6.18.clone3
- Rename shadow_stack_token to shstk_token, since it's a simple rename I've
  kept the acks and reviews but I dropped the tested-bys just to be safe.
- Link to v20: https://lore.kernel.org/r/20250902-clone3-shadow-stack-v20-0-4d9fff1c53e7@kernel.org

Changes in v20:
- Comment fixes and clarifications in x86 arch_shstk_validate_clone()
  from Rick Edgecombe.
- Spelling fix in documentation.
- Link to v19: https://lore.kernel.org/r/20250819-clone3-shadow-stack-v19-0-bc957075479b@kernel.org

Changes in v19:
- Rebase onto v6.17-rc1.
- Link to v18: https://lore.kernel.org/r/20250702-clone3-shadow-stack-v18-0-7965d2b694db@kernel.org

Changes in v18:
- Rebase onto v6.16-rc3.
- Thanks to pointers from Yuri Khrustalev this version has been tested
  on x86 so I have removed the RFT tag.
- Clarify clone3_shadow_stack_valid() comment about the Kconfig check.
- Remove redundant GCSB DSYNCs in arm64 code.
- Fix token validation on x86.
- Link to v17: https://lore.kernel.org/r/20250609-clone3-shadow-stack-v17-0-8840ed97ff6f@kernel.org

Changes in v17:
- Rebase onto v6.16-rc1.
- Link to v16: https://lore.kernel.org/r/20250416-clone3-shadow-stack-v16-0-2ffc9ca3917b@kernel.org

Changes in v16:
- Rebase onto v6.15-rc2.
- Roll in fixes from x86 testing from Rick Edgecombe.
- Rework so that the argument is shadow_stack_token.
- Link to v15: https://lore.kernel.org/r/20250408-clone3-shadow-stack-v15-0-3fa245c6e3be@kernel.org

Changes in v15:
- Rebase onto v6.15-rc1.
- Link to v14: https://lore.kernel.org/r/20250206-clone3-shadow-stack-v14-0-805b53af73b9@kernel.org

Changes in v14:
- Rebase onto v6.14-rc1.
- Link to v13: https://lore.kernel.org/r/20241203-clone3-shadow-stack-v13-0-93b89a81a5ed@kernel.org

Changes in v13:
- Rebase onto v6.13-rc1.
- Link to v12: https://lore.kernel.org/r/20241031-clone3-shadow-stack-v12-0-7183eb8bee17@kernel.org

Changes in v12:
- Add the regular prctl() to the userspace API document since arm64
  support is queued in -next.
- Link to v11: https://lore.kernel.org/r/20241005-clone3-shadow-stack-v11-0-2a6a2bd6d651@kernel.org

Changes in v11:
- Rebase onto arm64 for-next/gcs, which is based on v6.12-rc1, and
  integrate arm64 support.
- Rework the interface to specify a shadow stack pointer rather than a
  base and size like we do for the regular stack.
- Link to v10: https://lore.kernel.org/r/20240821-clone3-shadow-stack-v10-0-06e8797b9445@kernel.org

Changes in v10:
- Integrate fixes & improvements for the x86 implementation from Rick
  Edgecombe.
- Require that the shadow stack be VM_WRITE.
- Require that the shadow stack base and size be sizeof(void *) aligned.
- Clean up trailing newline.
- Link to v9: https://lore.kernel.org/r/20240819-clone3-shadow-stack-v9-0-962d74f99464@kernel.org

Changes in v9:
- Pull token validation earlier and report problems with an error return
  to parent rather than signal delivery to the child.
- Verify that the top of the supplied shadow stack is VM_SHADOW_STACK.
- Rework token validation to only do the page mapping once.
- Drop no longer needed support for testing for signals in selftest.
- Fix typo in comments.
- Link to v8: https://lore.kernel.org/r/20240808-clone3-shadow-stack-v8-0-0acf37caf14c@kernel.org

Changes in v8:
- Fix token verification with user specified shadow stack.
- Don't track user managed shadow stacks for child processes.
- Link to v7: https://lore.kernel.org/r/20240731-clone3-shadow-stack-v7-0-a9532eebfb1d@kernel.org

Changes in v7:
- Rebase onto v6.11-rc1.
- Typo fixes.
- Link to v6: https://lore.kernel.org/r/20240623-clone3-shadow-stack-v6-0-9ee7783b1fb9@kernel.org

Changes in v6:
- Rebase onto v6.10-rc3.
- Ensure we don't try to free the parent shadow stack in error paths of
  x86 arch code.
- Spelling fixes in userspace API document.
- Additional cleanups and improvements to the clone3() tests to support
  the shadow stack tests.
- Link to v5: https://lore.kernel.org/r/20240203-clone3-shadow-stack-v5-0-322c69598e4b@kernel.org

Changes in v5:
- Rebase onto v6.8-rc2.
- Rework ABI to have the user allocate the shadow stack memory with
  map_shadow_stack() and a token.
- Force inlining of the x86 shadow stack enablement.
- Move shadow stack enablement out into a shared header for reuse by
  other tests.
- Link to v4: https://lore.kernel.org/r/20231128-clone3-shadow-stack-v4-0-8b28ffe4f676@kernel.org

Changes in v4:
- Formatting changes.
- Use a define for minimum shadow stack size and move some basic
  validation to fork.c.
- Link to v3: https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2acc@kernel.org

Changes in v3:
- Rebase onto v6.7-rc2.
- Remove stale shadow_stack in internal kargs.
- If a shadow stack is specified unconditionally use it regardless of
  CLONE_ parameters.
- Force enable shadow stacks in the selftest.
- Update changelogs for RISC-V feature rename.
- Link to v2: https://lore.kernel.org/r/20231114-clone3-shadow-stack-v2-0-b613f8681155@kernel.org

Changes in v2:
- Rebase onto v6.7-rc1.
- Remove ability to provide preallocated shadow stack, just specify the
  desired size.
- Link to v1: https://lore.kernel.org/r/20231023-clone3-shadow-stack-v1-0-d867d0b5d4d0@kernel.org

---
Mark Brown (8):
      arm64/gcs: Return a success value from gcs_alloc_thread_stack()
      Documentation: userspace-api: Add shadow stack API documentation
      selftests: Provide helper header for shadow stack testing
      fork: Add shadow stack support to clone3()
      selftests/clone3: Remove redundant flushes of output streams
      selftests/clone3: Factor more of main loop into test_clone3()
      selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
      selftests/clone3: Test shadow stack support

 Documentation/userspace-api/index.rst             |   1 +
 Documentation/userspace-api/shadow_stack.rst      |  44 +++++
 arch/arm64/include/asm/gcs.h                      |   8 +-
 arch/arm64/kernel/process.c                       |   8 +-
 arch/arm64/mm/gcs.c                               |  55 +++++-
 arch/x86/include/asm/shstk.h                      |  11 +-
 arch/x86/kernel/process.c                         |   2 +-
 arch/x86/kernel/shstk.c                           |  53 ++++-
 include/asm-generic/cacheflush.h                  |  11 ++
 include/linux/sched/task.h                        |  17 ++
 include/uapi/linux/sched.h                        |   9 +-
 kernel/fork.c                                     |  93 +++++++--
 tools/testing/selftests/clone3/clone3.c           | 226 ++++++++++++++++++----
 tools/testing/selftests/clone3/clone3_selftests.h |  65 ++++++-
 tools/testing/selftests/ksft_shstk.h              |  98 ++++++++++
 15 files changed, 620 insertions(+), 81 deletions(-)
---
base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
change-id: 20231019-clone3-shadow-stack-15d40d2bf536

Best regards,
--  
Mark Brown <broonie@kernel.org>

^ permalink raw reply

* Re: [PATCH 1/6] uapi: promote EFSCORRUPTED and EUCLEAN to errno.h
From: Christoph Hellwig @ 2025-12-18  5:17 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: brauner, linux-api, linux-ext4, jack, linux-xfs, linux-fsdevel,
	gabriel, amir73il, linux-man
In-Reply-To: <176602332146.686273.6355079912638580915.stgit@frogsfrogsfrogs>

On Wed, Dec 17, 2025 at 06:02:56PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Stop definining these privately and instead move them to the uapi
> errno.h so that they become canonical instead of copy pasta.

Sounds fine:

Reviewed-by: Christoph Hellwig <hch@lst.de>

Do we need to document these overlay errnos in the man man pages,
though?

> 
> Cc: linux-api@vger.kernel.org
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> ---
>  arch/alpha/include/uapi/asm/errno.h        |    2 ++
>  arch/mips/include/uapi/asm/errno.h         |    2 ++
>  arch/parisc/include/uapi/asm/errno.h       |    2 ++
>  arch/sparc/include/uapi/asm/errno.h        |    2 ++
>  fs/erofs/internal.h                        |    2 --
>  fs/ext2/ext2.h                             |    1 -
>  fs/ext4/ext4.h                             |    3 ---
>  fs/f2fs/f2fs.h                             |    3 ---
>  fs/minix/minix.h                           |    2 --
>  fs/udf/udf_sb.h                            |    2 --
>  fs/xfs/xfs_linux.h                         |    2 --
>  include/linux/jbd2.h                       |    3 ---
>  include/uapi/asm-generic/errno.h           |    2 ++
>  tools/arch/alpha/include/uapi/asm/errno.h  |    2 ++
>  tools/arch/mips/include/uapi/asm/errno.h   |    2 ++
>  tools/arch/parisc/include/uapi/asm/errno.h |    2 ++
>  tools/arch/sparc/include/uapi/asm/errno.h  |    2 ++
>  tools/include/uapi/asm-generic/errno.h     |    2 ++
>  18 files changed, 20 insertions(+), 18 deletions(-)
> 
> 
> diff --git a/arch/alpha/include/uapi/asm/errno.h b/arch/alpha/include/uapi/asm/errno.h
> index 3d265f6babaf0a..6791f6508632ee 100644
> --- a/arch/alpha/include/uapi/asm/errno.h
> +++ b/arch/alpha/include/uapi/asm/errno.h
> @@ -55,6 +55,7 @@
>  #define	ENOSR		82	/* Out of streams resources */
>  #define	ETIME		83	/* Timer expired */
>  #define	EBADMSG		84	/* Not a data message */
> +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
>  #define	EPROTO		85	/* Protocol error */
>  #define	ENODATA		86	/* No data available */
>  #define	ENOSTR		87	/* Device not a stream */
> @@ -96,6 +97,7 @@
>  #define	EREMCHG		115	/* Remote address changed */
>  
>  #define	EUCLEAN		117	/* Structure needs cleaning */
> +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
>  #define	ENOTNAM		118	/* Not a XENIX named type file */
>  #define	ENAVAIL		119	/* No XENIX semaphores available */
>  #define	EISNAM		120	/* Is a named type file */
> diff --git a/arch/mips/include/uapi/asm/errno.h b/arch/mips/include/uapi/asm/errno.h
> index 2fb714e2d6d8fc..c01ed91b1ef44b 100644
> --- a/arch/mips/include/uapi/asm/errno.h
> +++ b/arch/mips/include/uapi/asm/errno.h
> @@ -50,6 +50,7 @@
>  #define EDOTDOT		73	/* RFS specific error */
>  #define EMULTIHOP	74	/* Multihop attempted */
>  #define EBADMSG		77	/* Not a data message */
> +#define EFSBADCRC	EBADMSG	/* Bad CRC detected */
>  #define ENAMETOOLONG	78	/* File name too long */
>  #define EOVERFLOW	79	/* Value too large for defined data type */
>  #define ENOTUNIQ	80	/* Name not unique on network */
> @@ -88,6 +89,7 @@
>  #define EISCONN		133	/* Transport endpoint is already connected */
>  #define ENOTCONN	134	/* Transport endpoint is not connected */
>  #define EUCLEAN		135	/* Structure needs cleaning */
> +#define EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
>  #define ENOTNAM		137	/* Not a XENIX named type file */
>  #define ENAVAIL		138	/* No XENIX semaphores available */
>  #define EISNAM		139	/* Is a named type file */
> diff --git a/arch/parisc/include/uapi/asm/errno.h b/arch/parisc/include/uapi/asm/errno.h
> index 8d94739d75c67c..8cbc07c1903e4c 100644
> --- a/arch/parisc/include/uapi/asm/errno.h
> +++ b/arch/parisc/include/uapi/asm/errno.h
> @@ -36,6 +36,7 @@
>  
>  #define	EDOTDOT		66	/* RFS specific error */
>  #define	EBADMSG		67	/* Not a data message */
> +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
>  #define	EUSERS		68	/* Too many users */
>  #define	EDQUOT		69	/* Quota exceeded */
>  #define	ESTALE		70	/* Stale file handle */
> @@ -62,6 +63,7 @@
>  #define	ERESTART	175	/* Interrupted system call should be restarted */
>  #define	ESTRPIPE	176	/* Streams pipe error */
>  #define	EUCLEAN		177	/* Structure needs cleaning */
> +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
>  #define	ENOTNAM		178	/* Not a XENIX named type file */
>  #define	ENAVAIL		179	/* No XENIX semaphores available */
>  #define	EISNAM		180	/* Is a named type file */
> diff --git a/arch/sparc/include/uapi/asm/errno.h b/arch/sparc/include/uapi/asm/errno.h
> index 81a732b902ee38..4a41e7835fd5b8 100644
> --- a/arch/sparc/include/uapi/asm/errno.h
> +++ b/arch/sparc/include/uapi/asm/errno.h
> @@ -48,6 +48,7 @@
>  #define	ENOSR		74	/* Out of streams resources */
>  #define	ENOMSG		75	/* No message of desired type */
>  #define	EBADMSG		76	/* Not a data message */
> +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
>  #define	EIDRM		77	/* Identifier removed */
>  #define	EDEADLK		78	/* Resource deadlock would occur */
>  #define	ENOLCK		79	/* No record locks available */
> @@ -91,6 +92,7 @@
>  #define	ENOTUNIQ	115	/* Name not unique on network */
>  #define	ERESTART	116	/* Interrupted syscall should be restarted */
>  #define	EUCLEAN		117	/* Structure needs cleaning */
> +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
>  #define	ENOTNAM		118	/* Not a XENIX named type file */
>  #define	ENAVAIL		119	/* No XENIX semaphores available */
>  #define	EISNAM		120	/* Is a named type file */
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index f7f622836198da..d06e99baf5d5ae 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -541,6 +541,4 @@ long erofs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
>  long erofs_compat_ioctl(struct file *filp, unsigned int cmd,
>  			unsigned long arg);
>  
> -#define EFSCORRUPTED    EUCLEAN         /* Filesystem is corrupted */
> -
>  #endif	/* __EROFS_INTERNAL_H */
> diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
> index cf97b76e9fd3e9..5e0c6c5fcb6cd6 100644
> --- a/fs/ext2/ext2.h
> +++ b/fs/ext2/ext2.h
> @@ -357,7 +357,6 @@ struct ext2_inode {
>   */
>  #define	EXT2_VALID_FS			0x0001	/* Unmounted cleanly */
>  #define	EXT2_ERROR_FS			0x0002	/* Errors detected */
> -#define	EFSCORRUPTED			EUCLEAN	/* Filesystem is corrupted */
>  
>  /*
>   * Mount flags
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 56112f201cace7..62c091b52bacdf 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -3938,7 +3938,4 @@ extern int ext4_block_write_begin(handle_t *handle, struct folio *folio,
>  				  get_block_t *get_block);
>  #endif	/* __KERNEL__ */
>  
> -#define EFSBADCRC	EBADMSG		/* Bad CRC detected */
> -#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
> -
>  #endif	/* _EXT4_H */
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 20edbb99b814a7..9f3aa3c7f12613 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -5004,7 +5004,4 @@ static inline void f2fs_invalidate_internal_cache(struct f2fs_sb_info *sbi,
>  	f2fs_invalidate_compress_pages_range(sbi, blkaddr, len);
>  }
>  
> -#define EFSBADCRC	EBADMSG		/* Bad CRC detected */
> -#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
> -
>  #endif /* _LINUX_F2FS_H */
> diff --git a/fs/minix/minix.h b/fs/minix/minix.h
> index 2bfaf377f2086c..7e1f652f16d311 100644
> --- a/fs/minix/minix.h
> +++ b/fs/minix/minix.h
> @@ -175,6 +175,4 @@ static inline int minix_test_bit(int nr, const void *vaddr)
>  	__minix_error_inode((inode), __func__, __LINE__,	\
>  			    (fmt), ##__VA_ARGS__)
>  
> -#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
> -
>  #endif /* FS_MINIX_H */
> diff --git a/fs/udf/udf_sb.h b/fs/udf/udf_sb.h
> index 08ec8756b9487b..8399accc788dea 100644
> --- a/fs/udf/udf_sb.h
> +++ b/fs/udf/udf_sb.h
> @@ -55,8 +55,6 @@
>  #define MF_DUPLICATE_MD		0x01
>  #define MF_MIRROR_FE_LOADED	0x02
>  
> -#define EFSCORRUPTED EUCLEAN
> -
>  struct udf_meta_data {
>  	__u32	s_meta_file_loc;
>  	__u32	s_mirror_file_loc;
> diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> index 4dd747bdbccab2..55064228c4d574 100644
> --- a/fs/xfs/xfs_linux.h
> +++ b/fs/xfs/xfs_linux.h
> @@ -121,8 +121,6 @@ typedef __u32			xfs_nlink_t;
>  
>  #define ENOATTR		ENODATA		/* Attribute not found */
>  #define EWRONGFS	EINVAL		/* Mount with wrong filesystem type */
> -#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
> -#define EFSBADCRC	EBADMSG		/* Bad CRC detected */
>  
>  #define __return_address __builtin_return_address(0)
>  
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index f5eaf76198f377..a53a00d36228ce 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1815,7 +1815,4 @@ static inline int jbd2_handle_buffer_credits(handle_t *handle)
>  
>  #endif	/* __KERNEL__ */
>  
> -#define EFSBADCRC	EBADMSG		/* Bad CRC detected */
> -#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
> -
>  #endif	/* _LINUX_JBD2_H */
> diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h
> index cf9c51ac49f97e..92e7ae493ee315 100644
> --- a/include/uapi/asm-generic/errno.h
> +++ b/include/uapi/asm-generic/errno.h
> @@ -55,6 +55,7 @@
>  #define	EMULTIHOP	72	/* Multihop attempted */
>  #define	EDOTDOT		73	/* RFS specific error */
>  #define	EBADMSG		74	/* Not a data message */
> +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
>  #define	EOVERFLOW	75	/* Value too large for defined data type */
>  #define	ENOTUNIQ	76	/* Name not unique on network */
>  #define	EBADFD		77	/* File descriptor in bad state */
> @@ -98,6 +99,7 @@
>  #define	EINPROGRESS	115	/* Operation now in progress */
>  #define	ESTALE		116	/* Stale file handle */
>  #define	EUCLEAN		117	/* Structure needs cleaning */
> +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
>  #define	ENOTNAM		118	/* Not a XENIX named type file */
>  #define	ENAVAIL		119	/* No XENIX semaphores available */
>  #define	EISNAM		120	/* Is a named type file */
> diff --git a/tools/arch/alpha/include/uapi/asm/errno.h b/tools/arch/alpha/include/uapi/asm/errno.h
> index 3d265f6babaf0a..6791f6508632ee 100644
> --- a/tools/arch/alpha/include/uapi/asm/errno.h
> +++ b/tools/arch/alpha/include/uapi/asm/errno.h
> @@ -55,6 +55,7 @@
>  #define	ENOSR		82	/* Out of streams resources */
>  #define	ETIME		83	/* Timer expired */
>  #define	EBADMSG		84	/* Not a data message */
> +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
>  #define	EPROTO		85	/* Protocol error */
>  #define	ENODATA		86	/* No data available */
>  #define	ENOSTR		87	/* Device not a stream */
> @@ -96,6 +97,7 @@
>  #define	EREMCHG		115	/* Remote address changed */
>  
>  #define	EUCLEAN		117	/* Structure needs cleaning */
> +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
>  #define	ENOTNAM		118	/* Not a XENIX named type file */
>  #define	ENAVAIL		119	/* No XENIX semaphores available */
>  #define	EISNAM		120	/* Is a named type file */
> diff --git a/tools/arch/mips/include/uapi/asm/errno.h b/tools/arch/mips/include/uapi/asm/errno.h
> index 2fb714e2d6d8fc..c01ed91b1ef44b 100644
> --- a/tools/arch/mips/include/uapi/asm/errno.h
> +++ b/tools/arch/mips/include/uapi/asm/errno.h
> @@ -50,6 +50,7 @@
>  #define EDOTDOT		73	/* RFS specific error */
>  #define EMULTIHOP	74	/* Multihop attempted */
>  #define EBADMSG		77	/* Not a data message */
> +#define EFSBADCRC	EBADMSG	/* Bad CRC detected */
>  #define ENAMETOOLONG	78	/* File name too long */
>  #define EOVERFLOW	79	/* Value too large for defined data type */
>  #define ENOTUNIQ	80	/* Name not unique on network */
> @@ -88,6 +89,7 @@
>  #define EISCONN		133	/* Transport endpoint is already connected */
>  #define ENOTCONN	134	/* Transport endpoint is not connected */
>  #define EUCLEAN		135	/* Structure needs cleaning */
> +#define EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
>  #define ENOTNAM		137	/* Not a XENIX named type file */
>  #define ENAVAIL		138	/* No XENIX semaphores available */
>  #define EISNAM		139	/* Is a named type file */
> diff --git a/tools/arch/parisc/include/uapi/asm/errno.h b/tools/arch/parisc/include/uapi/asm/errno.h
> index 8d94739d75c67c..8cbc07c1903e4c 100644
> --- a/tools/arch/parisc/include/uapi/asm/errno.h
> +++ b/tools/arch/parisc/include/uapi/asm/errno.h
> @@ -36,6 +36,7 @@
>  
>  #define	EDOTDOT		66	/* RFS specific error */
>  #define	EBADMSG		67	/* Not a data message */
> +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
>  #define	EUSERS		68	/* Too many users */
>  #define	EDQUOT		69	/* Quota exceeded */
>  #define	ESTALE		70	/* Stale file handle */
> @@ -62,6 +63,7 @@
>  #define	ERESTART	175	/* Interrupted system call should be restarted */
>  #define	ESTRPIPE	176	/* Streams pipe error */
>  #define	EUCLEAN		177	/* Structure needs cleaning */
> +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
>  #define	ENOTNAM		178	/* Not a XENIX named type file */
>  #define	ENAVAIL		179	/* No XENIX semaphores available */
>  #define	EISNAM		180	/* Is a named type file */
> diff --git a/tools/arch/sparc/include/uapi/asm/errno.h b/tools/arch/sparc/include/uapi/asm/errno.h
> index 81a732b902ee38..4a41e7835fd5b8 100644
> --- a/tools/arch/sparc/include/uapi/asm/errno.h
> +++ b/tools/arch/sparc/include/uapi/asm/errno.h
> @@ -48,6 +48,7 @@
>  #define	ENOSR		74	/* Out of streams resources */
>  #define	ENOMSG		75	/* No message of desired type */
>  #define	EBADMSG		76	/* Not a data message */
> +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
>  #define	EIDRM		77	/* Identifier removed */
>  #define	EDEADLK		78	/* Resource deadlock would occur */
>  #define	ENOLCK		79	/* No record locks available */
> @@ -91,6 +92,7 @@
>  #define	ENOTUNIQ	115	/* Name not unique on network */
>  #define	ERESTART	116	/* Interrupted syscall should be restarted */
>  #define	EUCLEAN		117	/* Structure needs cleaning */
> +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
>  #define	ENOTNAM		118	/* Not a XENIX named type file */
>  #define	ENAVAIL		119	/* No XENIX semaphores available */
>  #define	EISNAM		120	/* Is a named type file */
> diff --git a/tools/include/uapi/asm-generic/errno.h b/tools/include/uapi/asm-generic/errno.h
> index cf9c51ac49f97e..92e7ae493ee315 100644
> --- a/tools/include/uapi/asm-generic/errno.h
> +++ b/tools/include/uapi/asm-generic/errno.h
> @@ -55,6 +55,7 @@
>  #define	EMULTIHOP	72	/* Multihop attempted */
>  #define	EDOTDOT		73	/* RFS specific error */
>  #define	EBADMSG		74	/* Not a data message */
> +#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
>  #define	EOVERFLOW	75	/* Value too large for defined data type */
>  #define	ENOTUNIQ	76	/* Name not unique on network */
>  #define	EBADFD		77	/* File descriptor in bad state */
> @@ -98,6 +99,7 @@
>  #define	EINPROGRESS	115	/* Operation now in progress */
>  #define	ESTALE		116	/* Stale file handle */
>  #define	EUCLEAN		117	/* Structure needs cleaning */
> +#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
>  #define	ENOTNAM		118	/* Not a XENIX named type file */
>  #define	ENAVAIL		119	/* No XENIX semaphores available */
>  #define	EISNAM		120	/* Is a named type file */
> 
> 
---end quoted text---

^ permalink raw reply

* [PATCH 1/6] uapi: promote EFSCORRUPTED and EUCLEAN to errno.h
From: Darrick J. Wong @ 2025-12-18  2:02 UTC (permalink / raw)
  To: brauner, djwong
  Cc: linux-api, linux-ext4, jack, linux-xfs, linux-fsdevel, gabriel,
	hch, amir73il
In-Reply-To: <176602332085.686273.7564676516217176769.stgit@frogsfrogsfrogs>

From: Darrick J. Wong <djwong@kernel.org>

Stop definining these privately and instead move them to the uapi
errno.h so that they become canonical instead of copy pasta.

Cc: linux-api@vger.kernel.org
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 arch/alpha/include/uapi/asm/errno.h        |    2 ++
 arch/mips/include/uapi/asm/errno.h         |    2 ++
 arch/parisc/include/uapi/asm/errno.h       |    2 ++
 arch/sparc/include/uapi/asm/errno.h        |    2 ++
 fs/erofs/internal.h                        |    2 --
 fs/ext2/ext2.h                             |    1 -
 fs/ext4/ext4.h                             |    3 ---
 fs/f2fs/f2fs.h                             |    3 ---
 fs/minix/minix.h                           |    2 --
 fs/udf/udf_sb.h                            |    2 --
 fs/xfs/xfs_linux.h                         |    2 --
 include/linux/jbd2.h                       |    3 ---
 include/uapi/asm-generic/errno.h           |    2 ++
 tools/arch/alpha/include/uapi/asm/errno.h  |    2 ++
 tools/arch/mips/include/uapi/asm/errno.h   |    2 ++
 tools/arch/parisc/include/uapi/asm/errno.h |    2 ++
 tools/arch/sparc/include/uapi/asm/errno.h  |    2 ++
 tools/include/uapi/asm-generic/errno.h     |    2 ++
 18 files changed, 20 insertions(+), 18 deletions(-)


diff --git a/arch/alpha/include/uapi/asm/errno.h b/arch/alpha/include/uapi/asm/errno.h
index 3d265f6babaf0a..6791f6508632ee 100644
--- a/arch/alpha/include/uapi/asm/errno.h
+++ b/arch/alpha/include/uapi/asm/errno.h
@@ -55,6 +55,7 @@
 #define	ENOSR		82	/* Out of streams resources */
 #define	ETIME		83	/* Timer expired */
 #define	EBADMSG		84	/* Not a data message */
+#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
 #define	EPROTO		85	/* Protocol error */
 #define	ENODATA		86	/* No data available */
 #define	ENOSTR		87	/* Device not a stream */
@@ -96,6 +97,7 @@
 #define	EREMCHG		115	/* Remote address changed */
 
 #define	EUCLEAN		117	/* Structure needs cleaning */
+#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
 #define	ENOTNAM		118	/* Not a XENIX named type file */
 #define	ENAVAIL		119	/* No XENIX semaphores available */
 #define	EISNAM		120	/* Is a named type file */
diff --git a/arch/mips/include/uapi/asm/errno.h b/arch/mips/include/uapi/asm/errno.h
index 2fb714e2d6d8fc..c01ed91b1ef44b 100644
--- a/arch/mips/include/uapi/asm/errno.h
+++ b/arch/mips/include/uapi/asm/errno.h
@@ -50,6 +50,7 @@
 #define EDOTDOT		73	/* RFS specific error */
 #define EMULTIHOP	74	/* Multihop attempted */
 #define EBADMSG		77	/* Not a data message */
+#define EFSBADCRC	EBADMSG	/* Bad CRC detected */
 #define ENAMETOOLONG	78	/* File name too long */
 #define EOVERFLOW	79	/* Value too large for defined data type */
 #define ENOTUNIQ	80	/* Name not unique on network */
@@ -88,6 +89,7 @@
 #define EISCONN		133	/* Transport endpoint is already connected */
 #define ENOTCONN	134	/* Transport endpoint is not connected */
 #define EUCLEAN		135	/* Structure needs cleaning */
+#define EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
 #define ENOTNAM		137	/* Not a XENIX named type file */
 #define ENAVAIL		138	/* No XENIX semaphores available */
 #define EISNAM		139	/* Is a named type file */
diff --git a/arch/parisc/include/uapi/asm/errno.h b/arch/parisc/include/uapi/asm/errno.h
index 8d94739d75c67c..8cbc07c1903e4c 100644
--- a/arch/parisc/include/uapi/asm/errno.h
+++ b/arch/parisc/include/uapi/asm/errno.h
@@ -36,6 +36,7 @@
 
 #define	EDOTDOT		66	/* RFS specific error */
 #define	EBADMSG		67	/* Not a data message */
+#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
 #define	EUSERS		68	/* Too many users */
 #define	EDQUOT		69	/* Quota exceeded */
 #define	ESTALE		70	/* Stale file handle */
@@ -62,6 +63,7 @@
 #define	ERESTART	175	/* Interrupted system call should be restarted */
 #define	ESTRPIPE	176	/* Streams pipe error */
 #define	EUCLEAN		177	/* Structure needs cleaning */
+#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
 #define	ENOTNAM		178	/* Not a XENIX named type file */
 #define	ENAVAIL		179	/* No XENIX semaphores available */
 #define	EISNAM		180	/* Is a named type file */
diff --git a/arch/sparc/include/uapi/asm/errno.h b/arch/sparc/include/uapi/asm/errno.h
index 81a732b902ee38..4a41e7835fd5b8 100644
--- a/arch/sparc/include/uapi/asm/errno.h
+++ b/arch/sparc/include/uapi/asm/errno.h
@@ -48,6 +48,7 @@
 #define	ENOSR		74	/* Out of streams resources */
 #define	ENOMSG		75	/* No message of desired type */
 #define	EBADMSG		76	/* Not a data message */
+#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
 #define	EIDRM		77	/* Identifier removed */
 #define	EDEADLK		78	/* Resource deadlock would occur */
 #define	ENOLCK		79	/* No record locks available */
@@ -91,6 +92,7 @@
 #define	ENOTUNIQ	115	/* Name not unique on network */
 #define	ERESTART	116	/* Interrupted syscall should be restarted */
 #define	EUCLEAN		117	/* Structure needs cleaning */
+#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
 #define	ENOTNAM		118	/* Not a XENIX named type file */
 #define	ENAVAIL		119	/* No XENIX semaphores available */
 #define	EISNAM		120	/* Is a named type file */
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index f7f622836198da..d06e99baf5d5ae 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -541,6 +541,4 @@ long erofs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
 long erofs_compat_ioctl(struct file *filp, unsigned int cmd,
 			unsigned long arg);
 
-#define EFSCORRUPTED    EUCLEAN         /* Filesystem is corrupted */
-
 #endif	/* __EROFS_INTERNAL_H */
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index cf97b76e9fd3e9..5e0c6c5fcb6cd6 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -357,7 +357,6 @@ struct ext2_inode {
  */
 #define	EXT2_VALID_FS			0x0001	/* Unmounted cleanly */
 #define	EXT2_ERROR_FS			0x0002	/* Errors detected */
-#define	EFSCORRUPTED			EUCLEAN	/* Filesystem is corrupted */
 
 /*
  * Mount flags
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 56112f201cace7..62c091b52bacdf 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3938,7 +3938,4 @@ extern int ext4_block_write_begin(handle_t *handle, struct folio *folio,
 				  get_block_t *get_block);
 #endif	/* __KERNEL__ */
 
-#define EFSBADCRC	EBADMSG		/* Bad CRC detected */
-#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
-
 #endif	/* _EXT4_H */
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 20edbb99b814a7..9f3aa3c7f12613 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -5004,7 +5004,4 @@ static inline void f2fs_invalidate_internal_cache(struct f2fs_sb_info *sbi,
 	f2fs_invalidate_compress_pages_range(sbi, blkaddr, len);
 }
 
-#define EFSBADCRC	EBADMSG		/* Bad CRC detected */
-#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
-
 #endif /* _LINUX_F2FS_H */
diff --git a/fs/minix/minix.h b/fs/minix/minix.h
index 2bfaf377f2086c..7e1f652f16d311 100644
--- a/fs/minix/minix.h
+++ b/fs/minix/minix.h
@@ -175,6 +175,4 @@ static inline int minix_test_bit(int nr, const void *vaddr)
 	__minix_error_inode((inode), __func__, __LINE__,	\
 			    (fmt), ##__VA_ARGS__)
 
-#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
-
 #endif /* FS_MINIX_H */
diff --git a/fs/udf/udf_sb.h b/fs/udf/udf_sb.h
index 08ec8756b9487b..8399accc788dea 100644
--- a/fs/udf/udf_sb.h
+++ b/fs/udf/udf_sb.h
@@ -55,8 +55,6 @@
 #define MF_DUPLICATE_MD		0x01
 #define MF_MIRROR_FE_LOADED	0x02
 
-#define EFSCORRUPTED EUCLEAN
-
 struct udf_meta_data {
 	__u32	s_meta_file_loc;
 	__u32	s_mirror_file_loc;
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index 4dd747bdbccab2..55064228c4d574 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -121,8 +121,6 @@ typedef __u32			xfs_nlink_t;
 
 #define ENOATTR		ENODATA		/* Attribute not found */
 #define EWRONGFS	EINVAL		/* Mount with wrong filesystem type */
-#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
-#define EFSBADCRC	EBADMSG		/* Bad CRC detected */
 
 #define __return_address __builtin_return_address(0)
 
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index f5eaf76198f377..a53a00d36228ce 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1815,7 +1815,4 @@ static inline int jbd2_handle_buffer_credits(handle_t *handle)
 
 #endif	/* __KERNEL__ */
 
-#define EFSBADCRC	EBADMSG		/* Bad CRC detected */
-#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
-
 #endif	/* _LINUX_JBD2_H */
diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h
index cf9c51ac49f97e..92e7ae493ee315 100644
--- a/include/uapi/asm-generic/errno.h
+++ b/include/uapi/asm-generic/errno.h
@@ -55,6 +55,7 @@
 #define	EMULTIHOP	72	/* Multihop attempted */
 #define	EDOTDOT		73	/* RFS specific error */
 #define	EBADMSG		74	/* Not a data message */
+#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
 #define	EOVERFLOW	75	/* Value too large for defined data type */
 #define	ENOTUNIQ	76	/* Name not unique on network */
 #define	EBADFD		77	/* File descriptor in bad state */
@@ -98,6 +99,7 @@
 #define	EINPROGRESS	115	/* Operation now in progress */
 #define	ESTALE		116	/* Stale file handle */
 #define	EUCLEAN		117	/* Structure needs cleaning */
+#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
 #define	ENOTNAM		118	/* Not a XENIX named type file */
 #define	ENAVAIL		119	/* No XENIX semaphores available */
 #define	EISNAM		120	/* Is a named type file */
diff --git a/tools/arch/alpha/include/uapi/asm/errno.h b/tools/arch/alpha/include/uapi/asm/errno.h
index 3d265f6babaf0a..6791f6508632ee 100644
--- a/tools/arch/alpha/include/uapi/asm/errno.h
+++ b/tools/arch/alpha/include/uapi/asm/errno.h
@@ -55,6 +55,7 @@
 #define	ENOSR		82	/* Out of streams resources */
 #define	ETIME		83	/* Timer expired */
 #define	EBADMSG		84	/* Not a data message */
+#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
 #define	EPROTO		85	/* Protocol error */
 #define	ENODATA		86	/* No data available */
 #define	ENOSTR		87	/* Device not a stream */
@@ -96,6 +97,7 @@
 #define	EREMCHG		115	/* Remote address changed */
 
 #define	EUCLEAN		117	/* Structure needs cleaning */
+#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
 #define	ENOTNAM		118	/* Not a XENIX named type file */
 #define	ENAVAIL		119	/* No XENIX semaphores available */
 #define	EISNAM		120	/* Is a named type file */
diff --git a/tools/arch/mips/include/uapi/asm/errno.h b/tools/arch/mips/include/uapi/asm/errno.h
index 2fb714e2d6d8fc..c01ed91b1ef44b 100644
--- a/tools/arch/mips/include/uapi/asm/errno.h
+++ b/tools/arch/mips/include/uapi/asm/errno.h
@@ -50,6 +50,7 @@
 #define EDOTDOT		73	/* RFS specific error */
 #define EMULTIHOP	74	/* Multihop attempted */
 #define EBADMSG		77	/* Not a data message */
+#define EFSBADCRC	EBADMSG	/* Bad CRC detected */
 #define ENAMETOOLONG	78	/* File name too long */
 #define EOVERFLOW	79	/* Value too large for defined data type */
 #define ENOTUNIQ	80	/* Name not unique on network */
@@ -88,6 +89,7 @@
 #define EISCONN		133	/* Transport endpoint is already connected */
 #define ENOTCONN	134	/* Transport endpoint is not connected */
 #define EUCLEAN		135	/* Structure needs cleaning */
+#define EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
 #define ENOTNAM		137	/* Not a XENIX named type file */
 #define ENAVAIL		138	/* No XENIX semaphores available */
 #define EISNAM		139	/* Is a named type file */
diff --git a/tools/arch/parisc/include/uapi/asm/errno.h b/tools/arch/parisc/include/uapi/asm/errno.h
index 8d94739d75c67c..8cbc07c1903e4c 100644
--- a/tools/arch/parisc/include/uapi/asm/errno.h
+++ b/tools/arch/parisc/include/uapi/asm/errno.h
@@ -36,6 +36,7 @@
 
 #define	EDOTDOT		66	/* RFS specific error */
 #define	EBADMSG		67	/* Not a data message */
+#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
 #define	EUSERS		68	/* Too many users */
 #define	EDQUOT		69	/* Quota exceeded */
 #define	ESTALE		70	/* Stale file handle */
@@ -62,6 +63,7 @@
 #define	ERESTART	175	/* Interrupted system call should be restarted */
 #define	ESTRPIPE	176	/* Streams pipe error */
 #define	EUCLEAN		177	/* Structure needs cleaning */
+#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
 #define	ENOTNAM		178	/* Not a XENIX named type file */
 #define	ENAVAIL		179	/* No XENIX semaphores available */
 #define	EISNAM		180	/* Is a named type file */
diff --git a/tools/arch/sparc/include/uapi/asm/errno.h b/tools/arch/sparc/include/uapi/asm/errno.h
index 81a732b902ee38..4a41e7835fd5b8 100644
--- a/tools/arch/sparc/include/uapi/asm/errno.h
+++ b/tools/arch/sparc/include/uapi/asm/errno.h
@@ -48,6 +48,7 @@
 #define	ENOSR		74	/* Out of streams resources */
 #define	ENOMSG		75	/* No message of desired type */
 #define	EBADMSG		76	/* Not a data message */
+#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
 #define	EIDRM		77	/* Identifier removed */
 #define	EDEADLK		78	/* Resource deadlock would occur */
 #define	ENOLCK		79	/* No record locks available */
@@ -91,6 +92,7 @@
 #define	ENOTUNIQ	115	/* Name not unique on network */
 #define	ERESTART	116	/* Interrupted syscall should be restarted */
 #define	EUCLEAN		117	/* Structure needs cleaning */
+#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
 #define	ENOTNAM		118	/* Not a XENIX named type file */
 #define	ENAVAIL		119	/* No XENIX semaphores available */
 #define	EISNAM		120	/* Is a named type file */
diff --git a/tools/include/uapi/asm-generic/errno.h b/tools/include/uapi/asm-generic/errno.h
index cf9c51ac49f97e..92e7ae493ee315 100644
--- a/tools/include/uapi/asm-generic/errno.h
+++ b/tools/include/uapi/asm-generic/errno.h
@@ -55,6 +55,7 @@
 #define	EMULTIHOP	72	/* Multihop attempted */
 #define	EDOTDOT		73	/* RFS specific error */
 #define	EBADMSG		74	/* Not a data message */
+#define	EFSBADCRC	EBADMSG	/* Bad CRC detected */
 #define	EOVERFLOW	75	/* Value too large for defined data type */
 #define	ENOTUNIQ	76	/* Name not unique on network */
 #define	EBADFD		77	/* File descriptor in bad state */
@@ -98,6 +99,7 @@
 #define	EINPROGRESS	115	/* Operation now in progress */
 #define	ESTALE		116	/* Stale file handle */
 #define	EUCLEAN		117	/* Structure needs cleaning */
+#define	EFSCORRUPTED	EUCLEAN	/* Filesystem is corrupted */
 #define	ENOTNAM		118	/* Not a XENIX named type file */
 #define	ENAVAIL		119	/* No XENIX semaphores available */
 #define	EISNAM		120	/* Is a named type file */


^ permalink raw reply related

* [PATCHSET V4 1/2] fs: generic file IO error reporting
From: Darrick J. Wong @ 2025-12-18  2:02 UTC (permalink / raw)
  To: brauner, djwong
  Cc: linux-api, hch, linux-ext4, jack, linux-xfs, linux-fsdevel,
	gabriel, hch, amir73il

Hi all,

This patchset adds some generic helpers so that filesystems can report
errors to fsnotify in a standard way.  Then it adapts iomap to use the
generic helpers so that any iomap-enabled filesystem can report I/O
errors through this mechanism as well.  Finally, it makes XFS report
metadata errors through this mechanism in much the same way that ext4
does now.

These are a prerequisite for the XFS self-healing V4 series which will
come at a later time.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=filesystem-error-reporting

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=filesystem-error-reporting
---
Commits in this patchset:
 * uapi: promote EFSCORRUPTED and EUCLEAN to errno.h
 * fs: report filesystem and file I/O errors to fsnotify
 * iomap: report file I/O errors to the VFS
 * xfs: report fs metadata errors via fsnotify
 * xfs: translate fsdax media errors into file "data lost" errors when convenient
 * ext4: convert to new fserror helpers
---
 arch/alpha/include/uapi/asm/errno.h        |    2 
 arch/mips/include/uapi/asm/errno.h         |    2 
 arch/parisc/include/uapi/asm/errno.h       |    2 
 arch/sparc/include/uapi/asm/errno.h        |    2 
 fs/erofs/internal.h                        |    2 
 fs/ext2/ext2.h                             |    1 
 fs/ext4/ext4.h                             |    3 -
 fs/f2fs/f2fs.h                             |    3 -
 fs/minix/minix.h                           |    2 
 fs/udf/udf_sb.h                            |    2 
 fs/xfs/xfs_linux.h                         |    2 
 include/linux/fs/super_types.h             |    7 +
 include/linux/fserror.h                    |   93 ++++++++++++++++
 include/linux/jbd2.h                       |    3 -
 include/uapi/asm-generic/errno.h           |    2 
 tools/arch/alpha/include/uapi/asm/errno.h  |    2 
 tools/arch/mips/include/uapi/asm/errno.h   |    2 
 tools/arch/parisc/include/uapi/asm/errno.h |    2 
 tools/arch/sparc/include/uapi/asm/errno.h  |    2 
 tools/include/uapi/asm-generic/errno.h     |    2 
 fs/Makefile                                |    2 
 fs/ext4/ioctl.c                            |    2 
 fs/ext4/super.c                            |   13 ++
 fs/fserror.c                               |  168 ++++++++++++++++++++++++++++
 fs/iomap/buffered-io.c                     |   23 ++++
 fs/iomap/direct-io.c                       |   12 ++
 fs/iomap/ioend.c                           |    6 +
 fs/super.c                                 |    3 +
 fs/xfs/xfs_fsops.c                         |    4 +
 fs/xfs/xfs_health.c                        |   14 ++
 fs/xfs/xfs_notify_failure.c                |    4 +
 31 files changed, 365 insertions(+), 24 deletions(-)
 create mode 100644 include/linux/fserror.h
 create mode 100644 fs/fserror.c


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox