* [RFC 00/19] Kernel API Specification Framework
From: Sasha Levin @ 2025-06-14 13:48 UTC
To: linux-kernel; +Cc: linux-api, workflows, tools, Sasha Levin

This patch series introduces a framework for formally specifying kernel
APIs, addressing the long-standing challenge of maintaining stable
interfaces between the kernel and user-space programs. As outlined in
previous discussions about kernel ABI stability, the lack of
machine-readable API specifications has led to inadvertent breakages and
inconsistent validation across system calls and IOCTLs.

The framework provides three key components: declarative macros for
specifying system call and IOCTL interfaces directly in the kernel
source, automated extraction tools for generating machine-readable
specifications, and a runtime validation infrastructure accessible
through debugfs. By embedding specifications alongside implementation
code, we ensure they remain synchronized and enable automated detection
of API/ABI changes that could break user-space applications.

This implementation demonstrates the approach with specifications for
core system calls (epoll, exec, mlock families) and complex IOCTL
interfaces (binder, fwctl). The specifications capture parameter types,
validation rules, return values, and error conditions in a structured
format that enables both documentation generation and runtime
verification. Future work will expand coverage to additional subsystems
and integrate with existing testing infrastructure to provide
API compatibility guarantees.
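
To make the "validation rules" above a little more concrete, here is a
rough user-space sketch of how range, mask, and enum constraints on a
single parameter might be checked. It loosely mirrors the constraint
fields of struct kapi_param_spec from patch 01; the demo_* names are
hypothetical and exist only for illustration, and this is not the
validator the series implements.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down mirror of the constraint fields in
 * struct kapi_param_spec; illustration only. */
enum demo_constraint_type {
	DEMO_CONSTRAINT_NONE = 0,
	DEMO_CONSTRAINT_RANGE,
	DEMO_CONSTRAINT_MASK,
	DEMO_CONSTRAINT_ENUM,
};

struct demo_param_spec {
	enum demo_constraint_type constraint_type;
	int64_t min_value;		/* inclusive lower bound (RANGE) */
	int64_t max_value;		/* inclusive upper bound (RANGE) */
	uint64_t valid_mask;		/* bits allowed to be set (MASK) */
	const int64_t *enum_values;	/* accepted values (ENUM) */
	uint32_t enum_count;
};

/* Return true if @value satisfies the constraint described by @spec. */
static bool demo_param_valid(const struct demo_param_spec *spec, int64_t value)
{
	switch (spec->constraint_type) {
	case DEMO_CONSTRAINT_NONE:
		return true;
	case DEMO_CONSTRAINT_RANGE:
		return value >= spec->min_value && value <= spec->max_value;
	case DEMO_CONSTRAINT_MASK:
		/* Reject any bit outside the valid mask. */
		return ((uint64_t)value & ~spec->valid_mask) == 0;
	case DEMO_CONSTRAINT_ENUM:
		for (uint32_t i = 0; i < spec->enum_count; i++)
			if (spec->enum_values[i] == value)
				return true;
		return false;
	}
	return false;
}
```

With a range of 0 to 4194304, as in the kmalloc example from the
documentation in patch 01, a size of 4096 passes and 5242880 is rejected.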

To complement the framework, we introduce the 'kapi' tool, a
utility for extracting and analyzing kernel API specifications from
multiple sources. The tool can extract specifications from kernel source
code (parsing KAPI macros), compiled vmlinux binaries (reading the
.kapi_specs ELF section), or from a running kernel via debugfs. It
supports multiple output formats (plain text, JSON, RST) to facilitate
integration with documentation systems and automated testing workflows.
This tool enables developers to easily inspect API specifications,
verify changes across kernel versions, and generate documentation
without requiring kernel rebuilds.
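
For the debugfs source specifically, the following is a minimal C sketch
of reading one specification from a running kernel. It assumes the
/sys/kernel/debug/kapi/apis/<name>/specification layout described in the
documentation added by patch 01; the demo_* helpers are hypothetical and
are not part of the 'kapi' tool, which is written in Rust.

```c
#include <stdio.h>
#include <string.h>

/* Assumed debugfs location, taken from the series' documentation. */
#define DEMO_KAPI_DEBUGFS_ROOT "/sys/kernel/debug/kapi/apis"

/*
 * Build the path of the human-readable spec file for one API.
 * Returns 0 on success, -1 if the name is empty, contains '/',
 * or the buffer is too small.
 */
static int demo_spec_path(char *buf, size_t len, const char *api)
{
	int n;

	if (api[0] == '\0' || strchr(api, '/') != NULL)
		return -1;
	n = snprintf(buf, len, "%s/%s/specification",
		     DEMO_KAPI_DEBUGFS_ROOT, api);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Copy the spec text for @api to @out. Returns 0 on success, -1 on failure. */
static int demo_dump_spec(const char *api, FILE *out)
{
	char path[256], line[512];
	FILE *in;

	if (demo_spec_path(path, sizeof(path), api) != 0)
		return -1;
	in = fopen(path, "r");
	if (!in)
		return -1;	/* debugfs not mounted, option off, or unknown API */
	while (fgets(line, sizeof(line), in))
		fputs(line, out);
	fclose(in);
	return 0;
}
```

Reading debugfs normally requires root and a mounted debugfs, so a
failure from demo_dump_spec() does not by itself mean the API lacks a
specification.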

Sasha Levin (19):
kernel/api: introduce kernel API specification framework
eventpoll: add API specification for epoll_create1
eventpoll: add API specification for epoll_create
eventpoll: add API specification for epoll_ctl
eventpoll: add API specification for epoll_wait
eventpoll: add API specification for epoll_pwait
eventpoll: add API specification for epoll_pwait2
exec: add API specification for execve
exec: add API specification for execveat
mm/mlock: add API specification for mlock
mm/mlock: add API specification for mlock2
mm/mlock: add API specification for mlockall
mm/mlock: add API specification for munlock
mm/mlock: add API specification for munlockall
kernel/api: add debugfs interface for kernel API specifications
kernel/api: add IOCTL specification infrastructure
fwctl: add detailed IOCTL API specifications
binder: add detailed IOCTL API specifications
tools/kapi: Add kernel API specification extraction tool
Documentation/admin-guide/kernel-api-spec.rst | 699 +++++++++
MAINTAINERS | 9 +
arch/um/kernel/dyn.lds.S | 3 +
arch/um/kernel/uml.lds.S | 3 +
arch/x86/kernel/vmlinux.lds.S | 3 +
drivers/android/binder.c | 758 ++++++++++
drivers/fwctl/main.c | 295 +++-
fs/eventpoll.c | 1056 ++++++++++++++
fs/exec.c | 463 ++++++
include/asm-generic/vmlinux.lds.h | 20 +
include/linux/ioctl_api_spec.h | 540 +++++++
include/linux/kernel_api_spec.h | 942 ++++++++++++
include/linux/syscall_api_spec.h | 341 +++++
include/linux/syscalls.h | 1 +
init/Kconfig | 2 +
kernel/Makefile | 1 +
kernel/api/Kconfig | 55 +
kernel/api/Makefile | 13 +
kernel/api/ioctl_validation.c | 360 +++++
kernel/api/kapi_debugfs.c | 340 +++++
kernel/api/kernel_api_spec.c | 1257 +++++++++++++++++
mm/mlock.c | 646 +++++++++
tools/kapi/.gitignore | 4 +
tools/kapi/Cargo.toml | 19 +
tools/kapi/src/extractor/debugfs.rs | 204 +++
tools/kapi/src/extractor/mod.rs | 95 ++
tools/kapi/src/extractor/source_parser.rs | 488 +++++++
.../src/extractor/vmlinux/binary_utils.rs | 130 ++
tools/kapi/src/extractor/vmlinux/mod.rs | 372 +++++
tools/kapi/src/formatter/json.rs | 170 +++
tools/kapi/src/formatter/mod.rs | 68 +
tools/kapi/src/formatter/plain.rs | 99 ++
tools/kapi/src/formatter/rst.rs | 144 ++
tools/kapi/src/main.rs | 121 ++
34 files changed, 9719 insertions(+), 2 deletions(-)
create mode 100644 Documentation/admin-guide/kernel-api-spec.rst
create mode 100644 include/linux/ioctl_api_spec.h
create mode 100644 include/linux/kernel_api_spec.h
create mode 100644 include/linux/syscall_api_spec.h
create mode 100644 kernel/api/Kconfig
create mode 100644 kernel/api/Makefile
create mode 100644 kernel/api/ioctl_validation.c
create mode 100644 kernel/api/kapi_debugfs.c
create mode 100644 kernel/api/kernel_api_spec.c
create mode 100644 tools/kapi/.gitignore
create mode 100644 tools/kapi/Cargo.toml
create mode 100644 tools/kapi/src/extractor/debugfs.rs
create mode 100644 tools/kapi/src/extractor/mod.rs
create mode 100644 tools/kapi/src/extractor/source_parser.rs
create mode 100644 tools/kapi/src/extractor/vmlinux/binary_utils.rs
create mode 100644 tools/kapi/src/extractor/vmlinux/mod.rs
create mode 100644 tools/kapi/src/formatter/json.rs
create mode 100644 tools/kapi/src/formatter/mod.rs
create mode 100644 tools/kapi/src/formatter/plain.rs
create mode 100644 tools/kapi/src/formatter/rst.rs
create mode 100644 tools/kapi/src/main.rs
--
2.39.5
* [RFC 01/19] kernel/api: introduce kernel API specification framework
From: Sasha Levin @ 2025-06-14 13:48 UTC
To: linux-kernel; +Cc: linux-api, workflows, tools, Sasha Levin

Add a comprehensive framework for formally documenting kernel APIs with
inline specifications. This framework provides:

- Structured API documentation with parameter specifications, return
  values, error conditions, and execution context requirements
- Runtime validation capabilities for debugging (CONFIG_KAPI_RUNTIME_CHECKS)
- Export of specifications via debugfs for tooling integration
- Support for both internal kernel APIs and system calls

The framework stores specifications in a dedicated ELF section and
provides infrastructure for:

- Compile-time validation of specifications
- Runtime querying of API documentation
- Machine-readable export formats
- Integration with existing SYSCALL_DEFINE macros

This commit introduces the core infrastructure without modifying any
existing APIs. Subsequent patches will add specifications to individual
subsystems.
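
As a rough user-space illustration of the dedicated-section idea, the
sketch below registers specs by emitting a pointer into a named ELF
section and walks that section at run time. It assumes an ELF target
with GCC or Clang and a linker (GNU ld, lld) that provides __start_ and
__stop_ symbols for sections whose names are valid C identifiers. Every
demo_* name is hypothetical; the kernel side instead uses the
.kapi_specs section and the __start_kapi_specs/__stop_kapi_specs symbols
defined in the linker script changes of this patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for a kernel API spec. */
struct demo_api_spec {
	const char *name;
	const char *description;
};

/*
 * Define a spec and place a pointer to it in the "demo_kapi_specs"
 * section. Storing pointers keeps the section a plain array with no
 * padding surprises between entries.
 */
#define DEMO_DEFINE_SPEC(ident, api_name, desc) \
	static const struct demo_api_spec ident = { \
		.name = api_name, \
		.description = desc, \
	}; \
	static const struct demo_api_spec *const ident##_ptr \
	__attribute__((used, section("demo_kapi_specs"))) = &ident

/* Section bounds, provided by the linker (assumption: GNU ld or lld). */
extern const struct demo_api_spec *const __start_demo_kapi_specs[];
extern const struct demo_api_spec *const __stop_demo_kapi_specs[];

DEMO_DEFINE_SPEC(demo_open_spec, "open", "Open a file");
DEMO_DEFINE_SPEC(demo_mlock_spec, "mlock", "Lock pages in memory");

/* Look up a registered spec by API name; returns NULL if none matches. */
static const struct demo_api_spec *demo_find_spec(const char *name)
{
	const struct demo_api_spec *const *p;

	for (p = __start_demo_kapi_specs; p < __stop_demo_kapi_specs; p++)
		if (strcmp((*p)->name, name) == 0)
			return *p;
	return NULL;
}
```

Because registration happens at link time, adding a spec needs no init
call or central table; the cost is that the set of specs is fixed once
the image is linked.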

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
Documentation/admin-guide/kernel-api-spec.rst | 507 +++++++
MAINTAINERS | 9 +
arch/um/kernel/dyn.lds.S | 3 +
arch/um/kernel/uml.lds.S | 3 +
arch/x86/kernel/vmlinux.lds.S | 3 +
include/asm-generic/vmlinux.lds.h | 20 +
include/linux/kernel_api_spec.h | 942 +++++++++++++
include/linux/syscall_api_spec.h | 341 +++++
include/linux/syscalls.h | 1 +
init/Kconfig | 2 +
kernel/Makefile | 1 +
kernel/api/Kconfig | 35 +
kernel/api/Makefile | 7 +
kernel/api/kernel_api_spec.c | 1169 +++++++++++++++++
14 files changed, 3043 insertions(+)
create mode 100644 Documentation/admin-guide/kernel-api-spec.rst
create mode 100644 include/linux/kernel_api_spec.h
create mode 100644 include/linux/syscall_api_spec.h
create mode 100644 kernel/api/Kconfig
create mode 100644 kernel/api/Makefile
create mode 100644 kernel/api/kernel_api_spec.c
diff --git a/Documentation/admin-guide/kernel-api-spec.rst b/Documentation/admin-guide/kernel-api-spec.rst
new file mode 100644
index 0000000000000..3a63f6711e27b
--- /dev/null
+++ b/Documentation/admin-guide/kernel-api-spec.rst
@@ -0,0 +1,507 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================
+Kernel API Specification Framework
+======================================
+
+:Author: Sasha Levin <sashal@kernel.org>
+:Date: June 2025
+
+.. contents:: Table of Contents
+ :depth: 3
+ :local:
+
+Introduction
+============
+
+The Kernel API Specification Framework (KAPI) provides a comprehensive system for
+formally documenting, validating, and introspecting kernel APIs. This framework
+addresses the long-standing challenge of maintaining accurate, machine-readable
+documentation for the thousands of internal kernel APIs and system calls.
+
+Purpose and Goals
+-----------------
+
+The framework aims to:
+
+1. **Improve API Documentation**: Provide structured, inline documentation that
+ lives alongside the code and is maintained as part of the development process.
+
+2. **Enable Runtime Validation**: Optionally validate API usage at runtime to catch
+ common programming errors during development and testing.
+
+3. **Support Tooling**: Export API specifications in machine-readable formats for
+ use by static analyzers, documentation generators, and development tools.
+
+4. **Enhance Debugging**: Provide detailed API information at runtime through debugfs
+ for debugging and introspection.
+
+5. **Formalize Contracts**: Explicitly document API contracts including parameter
+ constraints, execution contexts, locking requirements, and side effects.
+
+Architecture Overview
+=====================
+
+Components
+----------
+
+The framework consists of several key components:
+
+1. **Core Framework** (``kernel/api/kernel_api_spec.c``)
+
+ - API specification registration and storage
+ - Runtime validation engine
+ - Specification lookup and querying
+
+2. **DebugFS Interface** (``kernel/api/kapi_debugfs.c``)
+
+ - Runtime introspection via ``/sys/kernel/debug/kapi/``
+ - JSON and XML export formats
+ - Per-API detailed information
+
+3. **IOCTL Support** (``kernel/api/ioctl_validation.c``)
+
+ - Extended framework for IOCTL specifications
+ - Automatic validation wrappers
+ - Structure field validation
+
+4. **Specification Macros** (``include/linux/kernel_api_spec.h``)
+
+ - Declarative macros for API documentation
+ - Type-safe parameter specifications
+ - Context and constraint definitions
+
+Data Model
+----------
+
+The framework uses a hierarchical data model::
+
+ kernel_api_spec
+ ├── Basic Information
+ │ ├── name (API function name)
+ │ ├── version (specification version)
+ │ ├── description (human-readable description)
+ │ └── kernel_version (when API was introduced)
+ │
+ ├── Parameters (up to 16)
+ │ └── kapi_param_spec
+ │ ├── name
+ │ ├── type (int, pointer, string, etc.)
+ │ ├── direction (in, out, inout)
+ │ ├── constraints (range, mask, enum values)
+ │ └── validation rules
+ │
+ ├── Return Value
+ │ └── kapi_return_spec
+ │ ├── type
+ │ ├── success conditions
+ │ └── validation rules
+ │
+ ├── Error Conditions (up to 32)
+ │ └── kapi_error_spec
+ │ ├── error code
+ │ ├── condition description
+ │ └── recovery advice
+ │
+ ├── Execution Context
+ │ ├── allowed contexts (process, interrupt, etc.)
+ │ ├── locking requirements
+ │ └── preemption/interrupt state
+ │
+ └── Side Effects
+ ├── memory allocation
+ ├── state changes
+ └── signal handling
+
+Usage Guide
+===========
+
+Basic API Specification
+-----------------------
+
+To document a kernel API, use the specification macros in the implementation file:
+
+.. code-block:: c
+
+ #include <linux/kernel_api_spec.h>
+
+ KAPI_DEFINE_SPEC(kmalloc_spec, kmalloc, "3.0")
+ KAPI_DESCRIPTION("Allocate kernel memory")
+ KAPI_PARAM(0, size, KAPI_TYPE_SIZE_T, KAPI_DIR_IN,
+ "Number of bytes to allocate")
+ KAPI_PARAM_RANGE(0, 0, KMALLOC_MAX_SIZE)
+ KAPI_PARAM(1, flags, KAPI_TYPE_FLAGS, KAPI_DIR_IN,
+ "Allocation flags (GFP_*)")
+ KAPI_PARAM_MASK(1, __GFP_BITS_MASK)
+ KAPI_RETURN(KAPI_TYPE_POINTER, "Pointer to allocated memory or NULL")
+ KAPI_ERROR(ENOMEM, "Out of memory")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SOFTIRQ | KAPI_CTX_HARDIRQ)
+ KAPI_SIDE_EFFECT("Allocates memory from kernel heap")
+ KAPI_LOCK_NOT_REQUIRED("Any lock")
+ KAPI_END_SPEC
+
+ void *kmalloc(size_t size, gfp_t flags)
+ {
+ /* Implementation */
+ }
+
+System Call Specification
+-------------------------
+
+System calls use specialized macros:
+
+.. code-block:: c
+
+ KAPI_DEFINE_SYSCALL_SPEC(open_spec, open, "1.0")
+ KAPI_DESCRIPTION("Open a file")
+ KAPI_PARAM(0, pathname, KAPI_TYPE_USER_STRING, KAPI_DIR_IN,
+ "Path to file")
+ KAPI_PARAM_PATH(0, PATH_MAX)
+ KAPI_PARAM(1, flags, KAPI_TYPE_FLAGS, KAPI_DIR_IN,
+ "Open flags (O_*)")
+ KAPI_PARAM(2, mode, KAPI_TYPE_MODE_T, KAPI_DIR_IN,
+ "File permissions (if creating)")
+ KAPI_RETURN(KAPI_TYPE_INT, "File descriptor or -1")
+ KAPI_ERROR(EACCES, "Permission denied")
+ KAPI_ERROR(ENOENT, "File does not exist")
+ KAPI_ERROR(EMFILE, "Too many open files")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+ KAPI_SIGNAL(EINTR, "Open can be interrupted by signal")
+ KAPI_END_SYSCALL_SPEC
+
+IOCTL Specification
+-------------------
+
+IOCTLs have extended support for structure validation:
+
+.. code-block:: c
+
+ KAPI_DEFINE_IOCTL_SPEC(vidioc_querycap_spec, VIDIOC_QUERYCAP,
+ "VIDIOC_QUERYCAP",
+ sizeof(struct v4l2_capability),
+ sizeof(struct v4l2_capability),
+ "video_fops")
+ KAPI_DESCRIPTION("Query device capabilities")
+ KAPI_IOCTL_FIELD(driver, KAPI_TYPE_CHAR_ARRAY, KAPI_DIR_OUT,
+ "Driver name", 16)
+ KAPI_IOCTL_FIELD(card, KAPI_TYPE_CHAR_ARRAY, KAPI_DIR_OUT,
+ "Device name", 32)
+ KAPI_IOCTL_FIELD(version, KAPI_TYPE_U32, KAPI_DIR_OUT,
+ "Driver version")
+ KAPI_IOCTL_FIELD(capabilities, KAPI_TYPE_FLAGS, KAPI_DIR_OUT,
+ "Device capabilities")
+ KAPI_END_IOCTL_SPEC
+
+Runtime Validation
+==================
+
+Enabling Validation
+-------------------
+
+Runtime validation is controlled by kernel configuration:
+
+1. Enable ``CONFIG_KAPI_SPEC`` to build the framework
+2. Enable ``CONFIG_KAPI_RUNTIME_CHECKS`` for runtime validation
+3. Optionally enable ``CONFIG_KAPI_SPEC_DEBUGFS`` for the debugfs interface
+
+Validation Modes
+----------------
+
+The framework supports several validation modes:
+
+.. code-block:: c
+
+ /* Enable validation for specific API */
+ kapi_enable_validation("kmalloc");
+
+ /* Enable validation for all APIs */
+ kapi_enable_all_validation();
+
+ /* Set validation level */
+ kapi_set_validation_level(KAPI_VALIDATE_FULL);
+
+Validation Levels:
+
+- ``KAPI_VALIDATE_NONE``: No validation
+- ``KAPI_VALIDATE_BASIC``: Type and NULL checks only
+- ``KAPI_VALIDATE_NORMAL``: Basic + range and constraint checks
+- ``KAPI_VALIDATE_FULL``: All checks including custom validators
+
+Custom Validators
+-----------------
+
+APIs can register custom validation functions:
+
+.. code-block:: c
+
+ static bool validate_buffer_size(const struct kapi_param_spec *spec,
+ const void *value, void *context)
+ {
+ size_t size = *(size_t *)value;
+ struct my_context *ctx = context;
+
+ return size > 0 && size <= ctx->max_buffer_size;
+ }
+
+ KAPI_PARAM_CUSTOM_VALIDATOR(0, validate_buffer_size)
+
+DebugFS Interface
+=================
+
+The debugfs interface provides runtime access to API specifications:
+
+Directory Structure
+-------------------
+
+::
+
+ /sys/kernel/debug/kapi/
+ ├── apis/ # All registered APIs
+ │ ├── kmalloc/
+ │ │ ├── specification # Human-readable spec
+ │ │ ├── json # JSON format
+ │ │ └── xml # XML format
+ │ └── open/
+ │ └── ...
+ ├── summary # Overview of all APIs
+ ├── validation/ # Validation controls
+ │ ├── enabled # Global enable/disable
+ │ ├── level # Validation level
+ │ └── stats # Validation statistics
+ └── export/ # Bulk export options
+ ├── all.json # All specs in JSON
+ └── all.xml # All specs in XML
+
+Usage Examples
+--------------
+
+Query a specific API::
+
+ $ cat /sys/kernel/debug/kapi/apis/kmalloc/specification
+ API: kmalloc
+ Version: 3.0
+ Description: Allocate kernel memory
+
+ Parameters:
+ [0] size (size_t, in): Number of bytes to allocate
+ Range: 0 - 4194304
+ [1] flags (flags, in): Allocation flags (GFP_*)
+ Mask: 0x1ffffff
+
+ Returns: pointer - Pointer to allocated memory or NULL
+
+ Errors:
+ ENOMEM: Out of memory
+
+ Context: process, softirq, hardirq
+
+ Side Effects:
+ - Allocates memory from kernel heap
+
+Export all specifications::
+
+ $ cat /sys/kernel/debug/kapi/export/all.json > kernel-apis.json
+
+Enable validation for a specific API::
+
+ $ echo 1 > /sys/kernel/debug/kapi/apis/kmalloc/validate
+
+Performance Considerations
+==========================
+
+Memory Overhead
+---------------
+
+Each API specification consumes approximately 2-4KB of memory. With thousands
+of kernel APIs, this can add up to several megabytes. Consider:
+
+1. Building with ``CONFIG_KAPI_SPEC=n`` for production kernels
+2. Using ``__init`` annotations for APIs only used during boot
+3. Implementing lazy loading for rarely used specifications
+
+Runtime Overhead
+----------------
+
+When ``CONFIG_KAPI_RUNTIME_CHECKS`` is enabled:
+
+- Each validated API call adds 50-200ns overhead
+- Complex validations (custom validators) may add more
+- Use validation only in development/testing kernels
+
+Optimization Strategies
+-----------------------
+
+1. **Compile-time optimization**: When validation is disabled, all
+ validation code is optimized away by the compiler.
+
+2. **Selective validation**: Enable validation only for specific APIs
+ or subsystems under test.
+
+3. **Caching**: The framework caches validation results for repeated
+ calls with identical parameters.
+
+Documentation Generation
+------------------------
+
+The framework exports specifications via debugfs that can be used
+to generate documentation. Tools for automatic documentation generation
+from specifications are planned for future development.
+
+IDE Integration
+---------------
+
+Modern IDEs can use the JSON export for:
+
+- Parameter hints
+- Type checking
+- Context validation
+- Error code documentation
+
+Testing Framework
+-----------------
+
+The framework includes test helpers::
+
+ #ifdef CONFIG_KAPI_TESTING
+ /* Verify API behaves according to specification */
+ kapi_test_api("kmalloc", test_cases);
+ #endif
+
+Best Practices
+==============
+
+Writing Specifications
+----------------------
+
+1. **Be Comprehensive**: Document all parameters, errors, and side effects
+2. **Keep Updated**: Update specs when API behavior changes
+3. **Use Examples**: Include usage examples in descriptions
+4. **Validate Constraints**: Define realistic constraints for parameters
+5. **Document Context**: Clearly specify allowed execution contexts
+
+Maintenance
+-----------
+
+1. **Version Specifications**: Increment version when API changes
+2. **Deprecation**: Mark deprecated APIs and suggest replacements
+3. **Cross-reference**: Link related APIs in descriptions
+4. **Test Specifications**: Verify specs match implementation
+
+Common Patterns
+---------------
+
+**Optional Parameters**::
+
+ KAPI_PARAM(2, optional_arg, KAPI_TYPE_POINTER, KAPI_DIR_IN,
+ "Optional argument (may be NULL)")
+ KAPI_PARAM_OPTIONAL(2)
+
+**Variable Arguments**::
+
+ KAPI_PARAM(1, fmt, KAPI_TYPE_FORMAT_STRING, KAPI_DIR_IN,
+ "Printf-style format string")
+ KAPI_PARAM_VARIADIC(2, "Format arguments")
+
+**Callback Functions**::
+
+ KAPI_PARAM(1, callback, KAPI_TYPE_FUNCTION_PTR, KAPI_DIR_IN,
+ "Callback function")
+ KAPI_PARAM_CALLBACK(1, "int (*)(void *data)", "data")
+
+Troubleshooting
+===============
+
+Common Issues
+-------------
+
+**Specification Not Found**::
+
+ kernel: KAPI: Specification for 'my_api' not found
+
+ Solution: Ensure KAPI_DEFINE_SPEC is in the same translation unit
+ as the function implementation.
+
+**Validation Failures**::
+
+ kernel: KAPI: Validation failed for kmalloc parameter 'size':
+ value 5242880 exceeds maximum 4194304
+
+ Solution: Check parameter constraints or adjust specification if
+ the constraint is incorrect.
+
+**Build Errors**::
+
+ error: 'KAPI_TYPE_UNKNOWN' undeclared
+
+ Solution: Include <linux/kernel_api_spec.h> and ensure
+ CONFIG_KAPI_SPEC is enabled.
+
+Debug Options
+-------------
+
+Enable verbose debugging::
+
+ echo 8 > /proc/sys/kernel/printk
+ echo 1 > /sys/kernel/debug/kapi/debug/verbose
+
+Future Directions
+=================
+
+Planned Features
+----------------
+
+1. **Automatic Extraction**: Tool to extract specifications from existing
+ kernel-doc comments
+
+2. **Contract Verification**: Static analysis to verify implementation
+ matches specification
+
+3. **Performance Profiling**: Measure actual API performance against
+ documented expectations
+
+4. **Fuzzing Integration**: Use specifications to guide intelligent
+ fuzzing of kernel APIs
+
+5. **Version Compatibility**: Track API changes across kernel versions
+
+Research Areas
+--------------
+
+1. **Formal Verification**: Use specifications for mathematical proofs
+ of correctness
+
+2. **Runtime Monitoring**: Detect specification violations in production
+ with minimal overhead
+
+3. **API Evolution**: Analyze how kernel APIs change over time
+
+4. **Security Applications**: Use specifications for security policy
+ enforcement
+
+Contributing
+============
+
+Submitting Specifications
+-------------------------
+
+1. Add specifications to the same file as the API implementation
+2. Follow existing patterns and naming conventions
+3. Test with CONFIG_KAPI_RUNTIME_CHECKS enabled
+4. Verify debugfs output is correct
+5. Run scripts/checkpatch.pl on your changes
+
+Review Criteria
+---------------
+
+Specifications will be reviewed for:
+
+1. **Completeness**: All parameters and errors documented
+2. **Accuracy**: Specification matches implementation
+3. **Clarity**: Descriptions are clear and helpful
+4. **Consistency**: Follows framework conventions
+5. **Performance**: No unnecessary runtime overhead
+
+Contact
+-------
+
+- Maintainer: Sasha Levin <sashal@kernel.org>
diff --git a/MAINTAINERS b/MAINTAINERS
index a92290fffa163..7a2cb663131bd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13047,6 +13047,15 @@ W: https://linuxtv.org
T: git git://linuxtv.org/media.git
F: drivers/media/radio/radio-keene*
+KERNEL API SPECIFICATION FRAMEWORK (KAPI)
+M: Sasha Levin <sashal@kernel.org>
+L: linux-api@vger.kernel.org
+S: Maintained
+F: Documentation/admin-guide/kernel-api-spec.rst
+F: include/linux/kernel_api_spec.h
+F: kernel/api/
+F: scripts/extract-kapi-spec.sh
+
KERNEL AUTOMOUNTER
M: Ian Kent <raven@themaw.net>
L: autofs@vger.kernel.org
diff --git a/arch/um/kernel/dyn.lds.S b/arch/um/kernel/dyn.lds.S
index a36b7918a011a..283ab11788d8c 100644
--- a/arch/um/kernel/dyn.lds.S
+++ b/arch/um/kernel/dyn.lds.S
@@ -102,6 +102,9 @@ SECTIONS
init.data : { INIT_DATA }
__init_end = .;
+ /* Kernel API specifications in dedicated section */
+ KAPI_SPECS_SECTION()
+
/* Ensure the __preinit_array_start label is properly aligned. We
could instead move the label definition inside the section, but
the linker would then create the section even if it turns out to
diff --git a/arch/um/kernel/uml.lds.S b/arch/um/kernel/uml.lds.S
index a409d4b66114f..e3850d8293436 100644
--- a/arch/um/kernel/uml.lds.S
+++ b/arch/um/kernel/uml.lds.S
@@ -74,6 +74,9 @@ SECTIONS
init.data : { INIT_DATA }
__init_end = .;
+ /* Kernel API specifications in dedicated section */
+ KAPI_SPECS_SECTION()
+
.data :
{
INIT_TASK_DATA(KERNEL_STACK_SIZE)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 4fa0be732af10..8cc508adc9d51 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -173,6 +173,9 @@ SECTIONS
RO_DATA(PAGE_SIZE)
X86_ALIGN_RODATA_END
+ /* Kernel API specifications in dedicated section */
+ KAPI_SPECS_SECTION()
+
/* Data */
.data : AT(ADDR(.data) - LOAD_OFFSET) {
/* Start of data section */
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index fa5f19b8d53a0..7b47736057e01 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -279,6 +279,26 @@ defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
#define TRACE_SYSCALLS()
#endif
+#ifdef CONFIG_KAPI_SPEC
+#define KAPI_SPECS() \
+ . = ALIGN(8); \
+ __start_kapi_specs = .; \
+ KEEP(*(.kapi_specs)) \
+ __stop_kapi_specs = .;
+
+/* For placing KAPI specs in a dedicated section */
+#define KAPI_SPECS_SECTION() \
+ .kapi_specs : AT(ADDR(.kapi_specs) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __start_kapi_specs = .; \
+ KEEP(*(.kapi_specs)) \
+ __stop_kapi_specs = .; \
+ }
+#else
+#define KAPI_SPECS()
+#define KAPI_SPECS_SECTION()
+#endif
+
#ifdef CONFIG_BPF_EVENTS
#define BPF_RAW_TP() STRUCT_ALIGN(); \
BOUNDED_SECTION_BY(__bpf_raw_tp_map, __bpf_raw_tp)
diff --git a/include/linux/kernel_api_spec.h b/include/linux/kernel_api_spec.h
new file mode 100644
index 0000000000000..04df5892bc6d6
--- /dev/null
+++ b/include/linux/kernel_api_spec.h
@@ -0,0 +1,942 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * kernel_api_spec.h - Kernel API Formal Specification Framework
+ *
+ * This framework provides structures and macros to formally specify kernel APIs
+ * in both human and machine-readable formats. It supports comprehensive documentation
+ * of function signatures, parameters, return values, error conditions, and constraints.
+ */
+
+#ifndef _LINUX_KERNEL_API_SPEC_H
+#define _LINUX_KERNEL_API_SPEC_H
+
+#include <linux/types.h>
+#include <linux/stringify.h>
+#include <linux/compiler.h>
+
+#define KAPI_MAX_PARAMS 16
+#define KAPI_MAX_ERRORS 32
+#define KAPI_MAX_CONSTRAINTS 16
+#define KAPI_MAX_SIGNALS 32
+#define KAPI_MAX_NAME_LEN 128
+#define KAPI_MAX_DESC_LEN 512
+
+/**
+ * enum kapi_param_type - Parameter type classification
+ * @KAPI_TYPE_VOID: void type
+ * @KAPI_TYPE_INT: Integer types (int, long, etc.)
+ * @KAPI_TYPE_UINT: Unsigned integer types
+ * @KAPI_TYPE_PTR: Pointer types
+ * @KAPI_TYPE_STRUCT: Structure types
+ * @KAPI_TYPE_UNION: Union types
+ * @KAPI_TYPE_ENUM: Enumeration types
+ * @KAPI_TYPE_FUNC_PTR: Function pointer types
+ * @KAPI_TYPE_ARRAY: Array types
+ * @KAPI_TYPE_FD: File descriptor - validated in process context
+ * @KAPI_TYPE_USER_PTR: User space pointer - validated for access and size
+ * @KAPI_TYPE_PATH: Pathname - validated for access and path limits
+ * @KAPI_TYPE_CUSTOM: Custom/complex types
+ */
+enum kapi_param_type {
+ KAPI_TYPE_VOID = 0,
+ KAPI_TYPE_INT,
+ KAPI_TYPE_UINT,
+ KAPI_TYPE_PTR,
+ KAPI_TYPE_STRUCT,
+ KAPI_TYPE_UNION,
+ KAPI_TYPE_ENUM,
+ KAPI_TYPE_FUNC_PTR,
+ KAPI_TYPE_ARRAY,
+ KAPI_TYPE_FD, /* File descriptor - validated in process context */
+ KAPI_TYPE_USER_PTR, /* User space pointer - validated for access and size */
+ KAPI_TYPE_PATH, /* Pathname - validated for access and path limits */
+ KAPI_TYPE_CUSTOM,
+};
+
+/**
+ * enum kapi_param_flags - Parameter attribute flags
+ * @KAPI_PARAM_IN: Input parameter
+ * @KAPI_PARAM_OUT: Output parameter
+ * @KAPI_PARAM_INOUT: Input/output parameter
+ * @KAPI_PARAM_OPTIONAL: Optional parameter (can be NULL)
+ * @KAPI_PARAM_CONST: Const qualified parameter
+ * @KAPI_PARAM_VOLATILE: Volatile qualified parameter
+ * @KAPI_PARAM_USER: User space pointer
+ * @KAPI_PARAM_DMA: DMA-capable memory required
+ * @KAPI_PARAM_ALIGNED: Alignment requirements
+ */
+enum kapi_param_flags {
+ KAPI_PARAM_IN = (1 << 0),
+ KAPI_PARAM_OUT = (1 << 1),
+ KAPI_PARAM_INOUT = (1 << 2),
+ KAPI_PARAM_OPTIONAL = (1 << 3),
+ KAPI_PARAM_CONST = (1 << 4),
+ KAPI_PARAM_VOLATILE = (1 << 5),
+ KAPI_PARAM_USER = (1 << 6),
+ KAPI_PARAM_DMA = (1 << 7),
+ KAPI_PARAM_ALIGNED = (1 << 8),
+};
+
+/**
+ * enum kapi_context_flags - Function execution context flags
+ * @KAPI_CTX_PROCESS: Can be called from process context
+ * @KAPI_CTX_SOFTIRQ: Can be called from softirq context
+ * @KAPI_CTX_HARDIRQ: Can be called from hardirq context
+ * @KAPI_CTX_NMI: Can be called from NMI context
+ * @KAPI_CTX_ATOMIC: Must be called in atomic context
+ * @KAPI_CTX_SLEEPABLE: May sleep
+ * @KAPI_CTX_PREEMPT_DISABLED: Requires preemption disabled
+ * @KAPI_CTX_IRQ_DISABLED: Requires interrupts disabled
+ */
+enum kapi_context_flags {
+ KAPI_CTX_PROCESS = (1 << 0),
+ KAPI_CTX_SOFTIRQ = (1 << 1),
+ KAPI_CTX_HARDIRQ = (1 << 2),
+ KAPI_CTX_NMI = (1 << 3),
+ KAPI_CTX_ATOMIC = (1 << 4),
+ KAPI_CTX_SLEEPABLE = (1 << 5),
+ KAPI_CTX_PREEMPT_DISABLED = (1 << 6),
+ KAPI_CTX_IRQ_DISABLED = (1 << 7),
+};
+
+/**
+ * enum kapi_lock_type - Lock types used/required by the function
+ * @KAPI_LOCK_NONE: No locking requirements
+ * @KAPI_LOCK_MUTEX: Mutex lock
+ * @KAPI_LOCK_SPINLOCK: Spinlock
+ * @KAPI_LOCK_RWLOCK: Read-write lock
+ * @KAPI_LOCK_SEQLOCK: Sequence lock
+ * @KAPI_LOCK_RCU: RCU lock
+ * @KAPI_LOCK_SEMAPHORE: Semaphore
+ * @KAPI_LOCK_CUSTOM: Custom locking mechanism
+ */
+enum kapi_lock_type {
+ KAPI_LOCK_NONE = 0,
+ KAPI_LOCK_MUTEX,
+ KAPI_LOCK_SPINLOCK,
+ KAPI_LOCK_RWLOCK,
+ KAPI_LOCK_SEQLOCK,
+ KAPI_LOCK_RCU,
+ KAPI_LOCK_SEMAPHORE,
+ KAPI_LOCK_CUSTOM,
+};
+
+/**
+ * enum kapi_constraint_type - Types of parameter constraints
+ * @KAPI_CONSTRAINT_NONE: No constraint
+ * @KAPI_CONSTRAINT_RANGE: Numeric range constraint
+ * @KAPI_CONSTRAINT_MASK: Bitmask constraint
+ * @KAPI_CONSTRAINT_ENUM: Enumerated values constraint
+ * @KAPI_CONSTRAINT_CUSTOM: Custom validation function
+ */
+enum kapi_constraint_type {
+ KAPI_CONSTRAINT_NONE = 0,
+ KAPI_CONSTRAINT_RANGE,
+ KAPI_CONSTRAINT_MASK,
+ KAPI_CONSTRAINT_ENUM,
+ KAPI_CONSTRAINT_CUSTOM,
+};
+
+/**
+ * struct kapi_param_spec - Parameter specification
+ * @name: Parameter name
+ * @type_name: Type name as string
+ * @type: Parameter type classification
+ * @flags: Parameter attribute flags
+ * @size: Size in bytes (for arrays/buffers)
+ * @alignment: Required alignment
+ * @min_value: Minimum valid value (for numeric types)
+ * @max_value: Maximum valid value (for numeric types)
+ * @valid_mask: Valid bits mask (for flag parameters)
+ * @enum_values: Array of valid enumerated values
+ * @enum_count: Number of valid enumerated values
+ * @constraint_type: Type of constraint applied
+ * @validate: Custom validation function
+ * @description: Human-readable description
+ * @constraints: Additional constraints description
+ * @size_param_idx: Index of parameter that determines size (-1 if fixed size)
+ * @size_multiplier: Multiplier for size calculation (e.g., sizeof(struct))
+ */
+struct kapi_param_spec {
+ char name[KAPI_MAX_NAME_LEN];
+ char type_name[KAPI_MAX_NAME_LEN];
+ enum kapi_param_type type;
+ u32 flags;
+ size_t size;
+ size_t alignment;
+ s64 min_value;
+ s64 max_value;
+ u64 valid_mask;
+ const s64 *enum_values;
+ u32 enum_count;
+ enum kapi_constraint_type constraint_type;
+ bool (*validate)(s64 value);
+ char description[KAPI_MAX_DESC_LEN];
+ char constraints[KAPI_MAX_DESC_LEN];
+ int size_param_idx; /* Index of param that determines size, -1 if N/A */
+ size_t size_multiplier; /* Size per unit (e.g., sizeof(struct epoll_event)) */
+} __attribute__((packed));
+
+/**
+ * struct kapi_error_spec - Error condition specification
+ * @error_code: Error code value
+ * @name: Error code name (e.g., "EINVAL")
+ * @condition: Condition that triggers this error
+ * @description: Detailed error description
+ */
+struct kapi_error_spec {
+ int error_code;
+ char name[KAPI_MAX_NAME_LEN];
+ char condition[KAPI_MAX_DESC_LEN];
+ char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * enum kapi_return_check_type - Return value check types
+ * @KAPI_RETURN_EXACT: Success is an exact value
+ * @KAPI_RETURN_RANGE: Success is within a range
+ * @KAPI_RETURN_ERROR_CHECK: Success is when NOT in error list
+ * @KAPI_RETURN_FD: Return value is a file descriptor (>= 0 is success)
+ * @KAPI_RETURN_CUSTOM: Custom validation function
+ */
+enum kapi_return_check_type {
+ KAPI_RETURN_EXACT,
+ KAPI_RETURN_RANGE,
+ KAPI_RETURN_ERROR_CHECK,
+ KAPI_RETURN_FD,
+ KAPI_RETURN_CUSTOM,
+};
+
+/**
+ * struct kapi_return_spec - Return value specification
+ * @type_name: Return type name
+ * @type: Return type classification
+ * @check_type: Type of success check to perform
+ * @success_value: Exact value indicating success (for EXACT)
+ * @success_min: Minimum success value (for RANGE)
+ * @success_max: Maximum success value (for RANGE)
+ * @error_values: Array of error values (for ERROR_CHECK)
+ * @error_count: Number of error values
+ * @is_success: Custom function to check success
+ * @description: Return value description
+ */
+struct kapi_return_spec {
+ char type_name[KAPI_MAX_NAME_LEN];
+ enum kapi_param_type type;
+ enum kapi_return_check_type check_type;
+ s64 success_value;
+ s64 success_min;
+ s64 success_max;
+ const s64 *error_values;
+ u32 error_count;
+ bool (*is_success)(s64 retval);
+ char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_lock_spec - Lock requirement specification
+ * @lock_name: Name of the lock
+ * @lock_type: Type of lock
+ * @acquired: Whether function acquires this lock
+ * @released: Whether function releases this lock
+ * @held_on_entry: Whether lock must be held on entry
+ * @held_on_exit: Whether lock is held on exit
+ * @description: Additional lock requirements
+ */
+struct kapi_lock_spec {
+ char lock_name[KAPI_MAX_NAME_LEN];
+ enum kapi_lock_type lock_type;
+ bool acquired;
+ bool released;
+ bool held_on_entry;
+ bool held_on_exit;
+ char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_constraint_spec - Additional constraint specification
+ * @name: Constraint name
+ * @description: Constraint description
+ * @expression: Formal expression (if applicable)
+ */
+struct kapi_constraint_spec {
+ char name[KAPI_MAX_NAME_LEN];
+ char description[KAPI_MAX_DESC_LEN];
+ char expression[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * enum kapi_signal_direction - Signal flow direction
+ * @KAPI_SIGNAL_RECEIVE: Function may receive this signal
+ * @KAPI_SIGNAL_SEND: Function may send this signal
+ * @KAPI_SIGNAL_HANDLE: Function handles this signal specially
+ * @KAPI_SIGNAL_BLOCK: Function blocks this signal
+ * @KAPI_SIGNAL_IGNORE: Function ignores this signal
+ */
+enum kapi_signal_direction {
+ KAPI_SIGNAL_RECEIVE = (1 << 0),
+ KAPI_SIGNAL_SEND = (1 << 1),
+ KAPI_SIGNAL_HANDLE = (1 << 2),
+ KAPI_SIGNAL_BLOCK = (1 << 3),
+ KAPI_SIGNAL_IGNORE = (1 << 4),
+};
+
+/**
+ * enum kapi_signal_action - What the function does with the signal
+ * @KAPI_SIGNAL_ACTION_DEFAULT: Default signal action applies
+ * @KAPI_SIGNAL_ACTION_TERMINATE: Causes termination
+ * @KAPI_SIGNAL_ACTION_COREDUMP: Causes termination with core dump
+ * @KAPI_SIGNAL_ACTION_STOP: Stops the process
+ * @KAPI_SIGNAL_ACTION_CONTINUE: Continues a stopped process
+ * @KAPI_SIGNAL_ACTION_CUSTOM: Custom handling described in notes
+ * @KAPI_SIGNAL_ACTION_RETURN: Returns from syscall with EINTR
+ * @KAPI_SIGNAL_ACTION_RESTART: Restarts the syscall
+ */
+enum kapi_signal_action {
+ KAPI_SIGNAL_ACTION_DEFAULT = 0,
+ KAPI_SIGNAL_ACTION_TERMINATE,
+ KAPI_SIGNAL_ACTION_COREDUMP,
+ KAPI_SIGNAL_ACTION_STOP,
+ KAPI_SIGNAL_ACTION_CONTINUE,
+ KAPI_SIGNAL_ACTION_CUSTOM,
+ KAPI_SIGNAL_ACTION_RETURN,
+ KAPI_SIGNAL_ACTION_RESTART,
+};
+
+/**
+ * struct kapi_signal_spec - Signal specification
+ * @signal_num: Signal number (e.g., SIGKILL, SIGTERM)
+ * @signal_name: Signal name as string
+ * @direction: Direction flags (OR of kapi_signal_direction)
+ * @action: What happens when signal is received
+ * @target: Description of target process/thread for sent signals
+ * @condition: Condition under which signal is sent/received/handled
+ * @description: Detailed description of signal handling
+ * @restartable: Whether syscall is restartable after this signal
+ */
+struct kapi_signal_spec {
+ int signal_num;
+ char signal_name[32];
+ u32 direction;
+ enum kapi_signal_action action;
+ char target[KAPI_MAX_DESC_LEN];
+ char condition[KAPI_MAX_DESC_LEN];
+ char description[KAPI_MAX_DESC_LEN];
+ bool restartable;
+} __attribute__((packed));
+
+/**
+ * struct kapi_signal_mask_spec - Signal mask specification
+ * @mask_name: Name of the signal mask
+ * @signals: Array of signal numbers in the mask
+ * @signal_count: Number of signals in the mask
+ * @description: Description of what this mask represents
+ */
+struct kapi_signal_mask_spec {
+ char mask_name[KAPI_MAX_NAME_LEN];
+ int signals[KAPI_MAX_SIGNALS];
+ u32 signal_count;
+ char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_struct_field - Structure field specification
+ * @name: Field name
+ * @type: Field type classification
+ * @type_name: Type name as string
+ * @offset: Offset within structure
+ * @size: Size of field in bytes
+ * @flags: Field attribute flags
+ * @constraint_type: Type of constraint applied
+ * @min_value: Minimum valid value (for numeric types)
+ * @max_value: Maximum valid value (for numeric types)
+ * @valid_mask: Valid bits mask (for flag fields)
+ * @description: Field description
+ */
+struct kapi_struct_field {
+ char name[KAPI_MAX_NAME_LEN];
+ enum kapi_param_type type;
+ char type_name[KAPI_MAX_NAME_LEN];
+ size_t offset;
+ size_t size;
+ u32 flags;
+ enum kapi_constraint_type constraint_type;
+ s64 min_value;
+ s64 max_value;
+ u64 valid_mask;
+ char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_struct_spec - Structure type specification
+ * @name: Structure name
+ * @size: Total size of structure
+ * @alignment: Required alignment
+ * @field_count: Number of fields
+ * @fields: Field specifications
+ * @description: Structure description
+ */
+struct kapi_struct_spec {
+ char name[KAPI_MAX_NAME_LEN];
+ size_t size;
+ size_t alignment;
+ u32 field_count;
+ struct kapi_struct_field fields[KAPI_MAX_PARAMS];
+ char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * enum kapi_side_effect_type - Types of side effects
+ * @KAPI_EFFECT_NONE: No side effects
+ * @KAPI_EFFECT_ALLOC_MEMORY: Allocates memory
+ * @KAPI_EFFECT_FREE_MEMORY: Frees memory
+ * @KAPI_EFFECT_MODIFY_STATE: Modifies global/shared state
+ * @KAPI_EFFECT_SIGNAL_SEND: Sends signals
+ * @KAPI_EFFECT_FILE_POSITION: Modifies file position
+ * @KAPI_EFFECT_LOCK_ACQUIRE: Acquires locks
+ * @KAPI_EFFECT_LOCK_RELEASE: Releases locks
+ * @KAPI_EFFECT_RESOURCE_CREATE: Creates system resources (FDs, PIDs, etc)
+ * @KAPI_EFFECT_RESOURCE_DESTROY: Destroys system resources
+ * @KAPI_EFFECT_SCHEDULE: May cause scheduling/context switch
+ * @KAPI_EFFECT_HARDWARE: Interacts with hardware
+ * @KAPI_EFFECT_NETWORK: Network I/O operation
+ * @KAPI_EFFECT_FILESYSTEM: Filesystem modification
+ * @KAPI_EFFECT_PROCESS_STATE: Modifies process state
+ */
+enum kapi_side_effect_type {
+ KAPI_EFFECT_NONE = 0,
+ KAPI_EFFECT_ALLOC_MEMORY = (1 << 0),
+ KAPI_EFFECT_FREE_MEMORY = (1 << 1),
+ KAPI_EFFECT_MODIFY_STATE = (1 << 2),
+ KAPI_EFFECT_SIGNAL_SEND = (1 << 3),
+ KAPI_EFFECT_FILE_POSITION = (1 << 4),
+ KAPI_EFFECT_LOCK_ACQUIRE = (1 << 5),
+ KAPI_EFFECT_LOCK_RELEASE = (1 << 6),
+ KAPI_EFFECT_RESOURCE_CREATE = (1 << 7),
+ KAPI_EFFECT_RESOURCE_DESTROY = (1 << 8),
+ KAPI_EFFECT_SCHEDULE = (1 << 9),
+ KAPI_EFFECT_HARDWARE = (1 << 10),
+ KAPI_EFFECT_NETWORK = (1 << 11),
+ KAPI_EFFECT_FILESYSTEM = (1 << 12),
+ KAPI_EFFECT_PROCESS_STATE = (1 << 13),
+};
+
+/**
+ * struct kapi_side_effect - Side effect specification
+ * @type: Bitmask of effect types
+ * @target: What is affected (e.g., "process memory", "file descriptor table")
+ * @condition: Condition under which effect occurs
+ * @description: Detailed description of the effect
+ * @reversible: Whether the effect can be undone
+ */
+struct kapi_side_effect {
+ u32 type;
+ char target[KAPI_MAX_NAME_LEN];
+ char condition[KAPI_MAX_DESC_LEN];
+ char description[KAPI_MAX_DESC_LEN];
+ bool reversible;
+} __attribute__((packed));
+
+/**
+ * struct kapi_state_transition - State transition specification
+ * @from_state: Starting state description
+ * @to_state: Ending state description
+ * @condition: Condition for transition
+ * @object: Object whose state changes
+ * @description: Detailed description
+ */
+struct kapi_state_transition {
+ char from_state[KAPI_MAX_NAME_LEN];
+ char to_state[KAPI_MAX_NAME_LEN];
+ char condition[KAPI_MAX_DESC_LEN];
+ char object[KAPI_MAX_NAME_LEN];
+ char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+#define KAPI_MAX_STRUCT_SPECS 8
+#define KAPI_MAX_SIDE_EFFECTS 16
+#define KAPI_MAX_STATE_TRANS 8
+
+/**
+ * struct kernel_api_spec - Complete kernel API specification
+ * @name: Function name
+ * @version: API version
+ * @description: Brief description
+ * @long_description: Detailed description
+ * @context_flags: Execution context flags
+ * @param_count: Number of parameters
+ * @params: Parameter specifications
+ * @return_spec: Return value specification
+ * @error_count: Number of possible errors
+ * @errors: Error specifications
+ * @lock_count: Number of lock specifications
+ * @locks: Lock requirement specifications
+ * @constraint_count: Number of additional constraints
+ * @constraints: Additional constraint specifications
+ * @examples: Usage examples
+ * @notes: Additional notes
+ * @since_version: Kernel version when introduced
+ * @deprecated: Whether API is deprecated
+ * @replacement: Replacement API if deprecated
+ * @signal_count: Number of signal specifications
+ * @signals: Signal handling specifications
+ * @signal_mask_count: Number of signal mask specifications
+ * @signal_masks: Signal mask specifications
+ * @struct_spec_count: Number of structure specifications
+ * @struct_specs: Structure type specifications
+ * @side_effect_count: Number of side effect specifications
+ * @side_effects: Side effect specifications
+ * @state_trans_count: Number of state transition specifications
+ * @state_transitions: State transition specifications
+ */
+struct kernel_api_spec {
+ char name[KAPI_MAX_NAME_LEN];
+ u32 version;
+ char description[KAPI_MAX_DESC_LEN];
+ char long_description[KAPI_MAX_DESC_LEN * 4];
+ u32 context_flags;
+
+ /* Parameters */
+ u32 param_count;
+ struct kapi_param_spec params[KAPI_MAX_PARAMS];
+
+ /* Return value */
+ struct kapi_return_spec return_spec;
+
+ /* Errors */
+ u32 error_count;
+ struct kapi_error_spec errors[KAPI_MAX_ERRORS];
+
+ /* Locking */
+ u32 lock_count;
+ struct kapi_lock_spec locks[KAPI_MAX_CONSTRAINTS];
+
+ /* Constraints */
+ u32 constraint_count;
+ struct kapi_constraint_spec constraints[KAPI_MAX_CONSTRAINTS];
+
+ /* Additional information */
+ char examples[KAPI_MAX_DESC_LEN * 2];
+ char notes[KAPI_MAX_DESC_LEN];
+ char since_version[32];
+ bool deprecated;
+ char replacement[KAPI_MAX_NAME_LEN];
+
+ /* Signal specifications */
+ u32 signal_count;
+ struct kapi_signal_spec signals[KAPI_MAX_SIGNALS];
+
+ /* Signal mask specifications */
+ u32 signal_mask_count;
+ struct kapi_signal_mask_spec signal_masks[KAPI_MAX_SIGNALS];
+
+ /* Structure specifications */
+ u32 struct_spec_count;
+ struct kapi_struct_spec struct_specs[KAPI_MAX_STRUCT_SPECS];
+
+ /* Side effects */
+ u32 side_effect_count;
+ struct kapi_side_effect side_effects[KAPI_MAX_SIDE_EFFECTS];
+
+ /* State transitions */
+ u32 state_trans_count;
+ struct kapi_state_transition state_transitions[KAPI_MAX_STATE_TRANS];
+} __attribute__((packed));
+
+/* Macros for defining API specifications */
+
+/**
+ * DEFINE_KERNEL_API_SPEC - Define a kernel API specification
+ * @func_name: Function name to specify
+ */
+#define DEFINE_KERNEL_API_SPEC(func_name) \
+ static struct kernel_api_spec __kapi_spec_##func_name \
+ __used __section(".kapi_specs") = { \
+ .name = __stringify(func_name), \
+ .version = 1,
+
+#define KAPI_END_SPEC };
+
+/**
+ * KAPI_DESCRIPTION - Set API description
+ * @desc: Description string
+ */
+#define KAPI_DESCRIPTION(desc) \
+ .description = desc,
+
+/**
+ * KAPI_LONG_DESC - Set detailed API description
+ * @desc: Detailed description string
+ */
+#define KAPI_LONG_DESC(desc) \
+ .long_description = desc,
+
+/**
+ * KAPI_CONTEXT - Set execution context flags
+ * @flags: Context flags (OR'ed KAPI_CTX_* values)
+ */
+#define KAPI_CONTEXT(flags) \
+ .context_flags = flags,
+
+/**
+ * KAPI_PARAM - Define a parameter specification
+ * @idx: Parameter index (0-based)
+ * @pname: Parameter name
+ * @ptype: Type name string
+ * @pdesc: Parameter description
+ */
+#define KAPI_PARAM(idx, pname, ptype, pdesc) \
+ .params[idx] = { \
+ .name = pname, \
+ .type_name = ptype, \
+ .description = pdesc, \
+ .size_param_idx = -1, /* Default: no dynamic sizing */
+
+#define KAPI_PARAM_FLAGS(pflags) \
+ .flags = pflags,
+
+#define KAPI_PARAM_SIZE(psize) \
+ .size = psize,
+
+#define KAPI_PARAM_RANGE(pmin, pmax) \
+ .min_value = pmin, \
+ .max_value = pmax,
+
+#define KAPI_PARAM_END },
+
+/**
+ * KAPI_RETURN - Define return value specification
+ * @rtype: Return type name
+ * @rdesc: Return value description
+ */
+#define KAPI_RETURN(rtype, rdesc) \
+ .return_spec = { \
+ .type_name = rtype, \
+ .description = rdesc,
+
+#define KAPI_RETURN_SUCCESS(val) \
+ .success_value = val,
+
+#define KAPI_RETURN_END },
+
+/**
+ * KAPI_ERROR - Define an error condition
+ * @idx: Error index
+ * @ecode: Error code value
+ * @ename: Error name
+ * @econd: Error condition
+ * @edesc: Error description
+ */
+#define KAPI_ERROR(idx, ecode, ename, econd, edesc) \
+ .errors[idx] = { \
+ .error_code = ecode, \
+ .name = ename, \
+ .condition = econd, \
+ .description = edesc, \
+ },
+
+/**
+ * KAPI_LOCK - Define a lock requirement
+ * @idx: Lock index
+ * @lname: Lock name
+ * @ltype: Lock type
+ */
+#define KAPI_LOCK(idx, lname, ltype) \
+ .locks[idx] = { \
+ .lock_name = lname, \
+ .lock_type = ltype,
+
+#define KAPI_LOCK_ACQUIRED \
+ .acquired = true,
+
+#define KAPI_LOCK_RELEASED \
+ .released = true,
+
+#define KAPI_LOCK_HELD_ENTRY \
+ .held_on_entry = true,
+
+#define KAPI_LOCK_HELD_EXIT \
+ .held_on_exit = true,
+
+#define KAPI_LOCK_DESC(ldesc) \
+ .description = ldesc,
+
+#define KAPI_LOCK_END },
+
+/**
+ * KAPI_CONSTRAINT - Define an additional constraint
+ * @idx: Constraint index
+ * @cname: Constraint name
+ * @cdesc: Constraint description
+ */
+#define KAPI_CONSTRAINT(idx, cname, cdesc) \
+ .constraints[idx] = { \
+ .name = cname, \
+ .description = cdesc,
+
+#define KAPI_CONSTRAINT_EXPR(expr) \
+ .expression = expr,
+
+#define KAPI_CONSTRAINT_END },
+
+/**
+ * KAPI_SIGNAL - Define a signal specification
+ * @idx: Signal index
+ * @signum: Signal number (e.g., SIGKILL)
+ * @signame: Signal name string
+ * @dir: Direction flags
+ * @act: Action taken
+ */
+#define KAPI_SIGNAL(idx, signum, signame, dir, act) \
+ .signals[idx] = { \
+ .signal_num = signum, \
+ .signal_name = signame, \
+ .direction = dir, \
+ .action = act,
+
+#define KAPI_SIGNAL_TARGET(tgt) \
+ .target = tgt,
+
+#define KAPI_SIGNAL_CONDITION(cond) \
+ .condition = cond,
+
+#define KAPI_SIGNAL_DESC(desc) \
+ .description = desc,
+
+#define KAPI_SIGNAL_RESTARTABLE \
+ .restartable = true,
+
+#define KAPI_SIGNAL_END },
+
+/**
+ * KAPI_SIGNAL_MASK - Define a signal mask specification
+ * @idx: Mask index
+ * @name: Mask name
+ * @desc: Mask description
+ */
+#define KAPI_SIGNAL_MASK(idx, name, desc) \
+ .signal_masks[idx] = { \
+ .mask_name = name, \
+ .description = desc,
+
+#define KAPI_SIGNAL_MASK_ADD(sidx, signum) \
+ .signals[sidx] = signum,
+
+#define KAPI_SIGNAL_MASK_END },
+
+/**
+ * KAPI_STRUCT_SPEC - Define a structure specification
+ * @idx: Structure spec index
+ * @sname: Structure name
+ * @sdesc: Structure description
+ */
+#define KAPI_STRUCT_SPEC(idx, sname, sdesc) \
+ .struct_specs[idx] = { \
+ .name = #sname, \
+ .description = sdesc,
+
+#define KAPI_STRUCT_SIZE(ssize, salign) \
+ .size = ssize, \
+ .alignment = salign,
+
+#define KAPI_STRUCT_FIELD_COUNT(n) \
+ .field_count = n,
+
+/**
+ * KAPI_STRUCT_FIELD - Define a structure field
+ * @fidx: Field index
+ * @fname: Field name
+ * @ftype: Field type (KAPI_TYPE_*)
+ * @ftype_name: Type name as string
+ * @fdesc: Field description
+ */
+#define KAPI_STRUCT_FIELD(fidx, fname, ftype, ftype_name, fdesc) \
+ .fields[fidx] = { \
+ .name = fname, \
+ .type = ftype, \
+ .type_name = ftype_name, \
+ .description = fdesc,
+
+#define KAPI_FIELD_OFFSET(foffset) \
+ .offset = foffset,
+
+#define KAPI_FIELD_SIZE(fsize) \
+ .size = fsize,
+
+#define KAPI_FIELD_FLAGS(fflags) \
+ .flags = fflags,
+
+#define KAPI_FIELD_CONSTRAINT_RANGE(min, max) \
+ .constraint_type = KAPI_CONSTRAINT_RANGE, \
+ .min_value = min, \
+ .max_value = max,
+
+#define KAPI_FIELD_CONSTRAINT_MASK(mask) \
+ .constraint_type = KAPI_CONSTRAINT_MASK, \
+ .valid_mask = mask,
+
+#define KAPI_FIELD_CONSTRAINT_ENUM(values, count) \
+ .constraint_type = KAPI_CONSTRAINT_ENUM, \
+ .enum_values = values, \
+ .enum_count = count,
+
+#define KAPI_STRUCT_FIELD_END },
+
+#define KAPI_STRUCT_SPEC_END },
+
+/* Counter for structure specifications */
+#define KAPI_STRUCT_SPEC_COUNT(n) \
+ .struct_spec_count = n,
+
+/**
+ * KAPI_SIDE_EFFECT - Define a side effect
+ * @idx: Side effect index
+ * @etype: Effect type bitmask (OR'ed KAPI_EFFECT_* values)
+ * @etarget: What is affected
+ * @edesc: Effect description
+ */
+#define KAPI_SIDE_EFFECT(idx, etype, etarget, edesc) \
+ .side_effects[idx] = { \
+ .type = etype, \
+ .target = etarget, \
+ .description = edesc, \
+ .reversible = false, /* Default to non-reversible */
+
+#define KAPI_EFFECT_CONDITION(cond) \
+ .condition = cond,
+
+#define KAPI_EFFECT_REVERSIBLE \
+ .reversible = true,
+
+#define KAPI_SIDE_EFFECT_END },
+
+/**
+ * KAPI_STATE_TRANS - Define a state transition
+ * @idx: State transition index
+ * @obj: Object whose state changes
+ * @from: From state
+ * @to: To state
+ * @desc: Transition description
+ */
+#define KAPI_STATE_TRANS(idx, obj, from, to, desc) \
+ .state_transitions[idx] = { \
+ .object = obj, \
+ .from_state = from, \
+ .to_state = to, \
+ .description = desc,
+
+#define KAPI_STATE_TRANS_COND(cond) \
+ .condition = cond,
+
+#define KAPI_STATE_TRANS_END },
+
+/* Counters for side effects and state transitions */
+#define KAPI_SIDE_EFFECT_COUNT(n) \
+ .side_effect_count = n,
+
+#define KAPI_STATE_TRANS_COUNT(n) \
+ .state_trans_count = n,
+
+/* Helper macros for common side effect patterns */
+#define KAPI_EFFECTS_MEMORY (KAPI_EFFECT_ALLOC_MEMORY | KAPI_EFFECT_FREE_MEMORY)
+#define KAPI_EFFECTS_LOCKING (KAPI_EFFECT_LOCK_ACQUIRE | KAPI_EFFECT_LOCK_RELEASE)
+#define KAPI_EFFECTS_RESOURCES (KAPI_EFFECT_RESOURCE_CREATE | KAPI_EFFECT_RESOURCE_DESTROY)
+#define KAPI_EFFECTS_IO (KAPI_EFFECT_NETWORK | KAPI_EFFECT_FILESYSTEM)
+
+/* Helper macros for common patterns */
+
+#define KAPI_PARAM_IN (KAPI_PARAM_IN)
+#define KAPI_PARAM_OUT (KAPI_PARAM_OUT)
+#define KAPI_PARAM_INOUT (KAPI_PARAM_IN | KAPI_PARAM_OUT)
+#define KAPI_PARAM_OPTIONAL (KAPI_PARAM_OPTIONAL)
+#define KAPI_PARAM_USER_PTR (KAPI_PARAM_USER | KAPI_PARAM_PTR)
+
+/* Validation and runtime checking */
+
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+bool kapi_validate_params(const struct kernel_api_spec *spec, ...);
+bool kapi_validate_param(const struct kapi_param_spec *param_spec, s64 value);
+bool kapi_validate_param_with_context(const struct kapi_param_spec *param_spec,
+ s64 value, const s64 *all_params, int param_count);
+int kapi_validate_syscall_param(const struct kernel_api_spec *spec,
+ int param_idx, s64 value);
+int kapi_validate_syscall_params(const struct kernel_api_spec *spec,
+ const s64 *params, int param_count);
+bool kapi_check_return_success(const struct kapi_return_spec *return_spec, s64 retval);
+bool kapi_validate_return_value(const struct kernel_api_spec *spec, s64 retval);
+int kapi_validate_syscall_return(const struct kernel_api_spec *spec, s64 retval);
+void kapi_check_context(const struct kernel_api_spec *spec);
+void kapi_check_locks(const struct kernel_api_spec *spec);
+#else
+static inline bool kapi_validate_params(const struct kernel_api_spec *spec, ...)
+{
+ return true;
+}
+static inline bool kapi_validate_param(const struct kapi_param_spec *param_spec, s64 value)
+{
+ return true;
+}
+static inline bool kapi_validate_param_with_context(const struct kapi_param_spec *param_spec,
+ s64 value, const s64 *all_params, int param_count)
+{
+ return true;
+}
+static inline int kapi_validate_syscall_param(const struct kernel_api_spec *spec,
+ int param_idx, s64 value)
+{
+ return 0;
+}
+static inline int kapi_validate_syscall_params(const struct kernel_api_spec *spec,
+ const s64 *params, int param_count)
+{
+ return 0;
+}
+static inline bool kapi_check_return_success(const struct kapi_return_spec *return_spec, s64 retval)
+{
+ return true;
+}
+static inline bool kapi_validate_return_value(const struct kernel_api_spec *spec, s64 retval)
+{
+ return true;
+}
+static inline int kapi_validate_syscall_return(const struct kernel_api_spec *spec, s64 retval)
+{
+ return 0;
+}
+static inline void kapi_check_context(const struct kernel_api_spec *spec) {}
+static inline void kapi_check_locks(const struct kernel_api_spec *spec) {}
+#endif
+
+/* Export/query functions */
+const struct kernel_api_spec *kapi_get_spec(const char *name);
+int kapi_export_json(const struct kernel_api_spec *spec, char *buf, size_t size);
+int kapi_export_xml(const struct kernel_api_spec *spec, char *buf, size_t size);
+void kapi_print_spec(const struct kernel_api_spec *spec);
+
+/* Registration for dynamic APIs */
+int kapi_register_spec(struct kernel_api_spec *spec);
+void kapi_unregister_spec(const char *name);
+
+/* Helper to get parameter constraint info */
+static inline bool kapi_get_param_constraint(const char *api_name, int param_idx,
+ enum kapi_constraint_type *type,
+ u64 *valid_mask, s64 *min_val, s64 *max_val)
+{
+ const struct kernel_api_spec *spec = kapi_get_spec(api_name);
+
+ if (!spec || param_idx < 0 || (u32)param_idx >= spec->param_count)
+ return false;
+
+ if (type)
+ *type = spec->params[param_idx].constraint_type;
+ if (valid_mask)
+ *valid_mask = spec->params[param_idx].valid_mask;
+ if (min_val)
+ *min_val = spec->params[param_idx].min_value;
+ if (max_val)
+ *max_val = spec->params[param_idx].max_value;
+
+ return true;
+}
+
+#endif /* _LINUX_KERNEL_API_SPEC_H */
\ No newline at end of file
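For illustration only: the five kapi_return_check_type values defined in
kernel_api_spec.h could be evaluated roughly as sketched below. This is a
minimal user-space sketch, not the implementation from this series. The names
struct return_spec_sketch and sketch_check_return_success() are hypothetical
and mirror only a subset of struct kapi_return_spec; the real
kapi_check_return_success() may behave differently, for example in how it
treats a missing custom callback.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors enum kapi_return_check_type from the header. */
enum kapi_return_check_type {
	KAPI_RETURN_EXACT,
	KAPI_RETURN_RANGE,
	KAPI_RETURN_ERROR_CHECK,
	KAPI_RETURN_FD,
	KAPI_RETURN_CUSTOM,
};

/* Hypothetical, trimmed-down stand-in for struct kapi_return_spec. */
struct return_spec_sketch {
	enum kapi_return_check_type check_type;
	int64_t success_value;			/* used by EXACT */
	int64_t success_min;			/* used by RANGE */
	int64_t success_max;			/* used by RANGE */
	const int64_t *error_values;		/* used by ERROR_CHECK */
	uint32_t error_count;
	bool (*is_success)(int64_t retval);	/* used by CUSTOM */
};

/*
 * Hypothetical evaluator: returns true when retval counts as success
 * under the check type recorded in the spec.
 */
static bool sketch_check_return_success(const struct return_spec_sketch *rs,
					int64_t retval)
{
	uint32_t i;

	switch (rs->check_type) {
	case KAPI_RETURN_EXACT:
		return retval == rs->success_value;
	case KAPI_RETURN_RANGE:
		return retval >= rs->success_min && retval <= rs->success_max;
	case KAPI_RETURN_ERROR_CHECK:
		/* Success means "not one of the listed error values". */
		for (i = 0; i < rs->error_count; i++)
			if (retval == rs->error_values[i])
				return false;
		return true;
	case KAPI_RETURN_FD:
		/* Any non-negative value is a valid descriptor. */
		return retval >= 0;
	case KAPI_RETURN_CUSTOM:
		/* Assumption: a missing callback is treated as success. */
		return rs->is_success ? rs->is_success(retval) : true;
	}
	return true;
}
```

A spec with .check_type = KAPI_RETURN_FD would, under this sketch, accept any
return value >= 0 and reject negative error codes, which matches the
description of KAPI_RETURN_FD in the kernel-doc comment.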
diff --git a/include/linux/syscall_api_spec.h b/include/linux/syscall_api_spec.h
new file mode 100644
index 0000000000000..48ad95647dd39
--- /dev/null
+++ b/include/linux/syscall_api_spec.h
@@ -0,0 +1,341 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * syscall_api_spec.h - System Call API Specification Integration
+ *
+ * This header extends the SYSCALL_DEFINEx macros to support inline API specifications,
+ * allowing syscall documentation to be written alongside the implementation in a
+ * human-readable and machine-parseable format.
+ */
+
+#ifndef _LINUX_SYSCALL_API_SPEC_H
+#define _LINUX_SYSCALL_API_SPEC_H
+
+#include <linux/kernel_api_spec.h>
+
+/*
+ * Extended SYSCALL_DEFINE macros with API specification support
+ *
+ * Usage example:
+ *
+ * SYSCALL_DEFINE_SPEC2(example,
+ * KAPI_DESCRIPTION("Example system call"),
+ * KAPI_LONG_DESC("This is a detailed description of the example syscall"),
+ * KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE),
+ *
+ * KAPI_PARAM(0, "fd", "int", "File descriptor to operate on")
+ * KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ * KAPI_PARAM_RANGE(0, INT_MAX)
+ * KAPI_PARAM_END,
+ *
+ * KAPI_PARAM(1, "flags", "unsigned int", "Operation flags")
+ * KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ * KAPI_PARAM_END,
+ *
+ * KAPI_RETURN("long", "0 on success, negative error code on failure")
+ * KAPI_RETURN_SUCCESS(0)
+ * KAPI_RETURN_END,
+ *
+ * KAPI_ERROR(0, -EBADF, "EBADF", "fd is not a valid file descriptor",
+ * "The file descriptor is invalid or closed"),
+ * KAPI_ERROR(1, -EINVAL, "EINVAL", "flags contains invalid values",
+ * "Invalid flag combination specified"),
+ *
+ * .error_count = 2,
+ * .param_count = 2,
+ *
+ * int, fd, unsigned int, flags)
+ * {
+ * // Implementation here
+ * }
+ */
+
+/* Helper to count parameters */
+#define __SYSCALL_PARAM_COUNT(...) __SYSCALL_PARAM_COUNT_I(__VA_ARGS__, 6, 5, 4, 3, 2, 1, 0)
+#define __SYSCALL_PARAM_COUNT_I(_1, _2, _3, _4, _5, _6, N, ...) N
+
+/* Extract syscall name from parameters */
+#define __SYSCALL_NAME(name, ...) name
+
+/* Generate API spec structure name */
+#define __SYSCALL_API_SPEC_NAME(name) __kapi_spec_sys_##name
+
+/* Helper to count syscall parameters (pairs of type, name) */
+#define __SYSCALL_ARG_COUNT(...) __SYSCALL_ARG_COUNT_I(__VA_ARGS__, 6, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 0)
+#define __SYSCALL_ARG_COUNT_I(_1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, N, ...) N
+
+/* Automatic syscall validation infrastructure */
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+
+/* Helper to inject validation at the beginning of syscall */
+#define __KAPI_SYSCALL_VALIDATE_0(name)
+#define __KAPI_SYSCALL_VALIDATE_1(name, t1, a1) \
+ const struct kernel_api_spec *__spec = kapi_get_spec("sys_" #name); \
+ if (__spec) { \
+ s64 __params[1] = { (s64)(a1) }; \
+ int __ret = kapi_validate_syscall_params(__spec, __params, 1); \
+ if (__ret) return __ret; \
+ }
+#define __KAPI_SYSCALL_VALIDATE_2(name, t1, a1, t2, a2) \
+ const struct kernel_api_spec *__spec = kapi_get_spec("sys_" #name); \
+ if (__spec) { \
+ s64 __params[2] = { (s64)(a1), (s64)(a2) }; \
+ int __ret = kapi_validate_syscall_params(__spec, __params, 2); \
+ if (__ret) return __ret; \
+ }
+#define __KAPI_SYSCALL_VALIDATE_3(name, t1, a1, t2, a2, t3, a3) \
+ const struct kernel_api_spec *__spec = kapi_get_spec("sys_" #name); \
+ if (__spec) { \
+ s64 __params[3] = { (s64)(a1), (s64)(a2), (s64)(a3) }; \
+ int __ret = kapi_validate_syscall_params(__spec, __params, 3); \
+ if (__ret) return __ret; \
+ }
+#define __KAPI_SYSCALL_VALIDATE_4(name, t1, a1, t2, a2, t3, a3, t4, a4) \
+ const struct kernel_api_spec *__spec = kapi_get_spec("sys_" #name); \
+ if (__spec) { \
+ s64 __params[4] = { (s64)(a1), (s64)(a2), (s64)(a3), (s64)(a4) }; \
+ int __ret = kapi_validate_syscall_params(__spec, __params, 4); \
+ if (__ret) return __ret; \
+ }
+#define __KAPI_SYSCALL_VALIDATE_5(name, t1, a1, t2, a2, t3, a3, t4, a4, t5, a5) \
+ const struct kernel_api_spec *__spec = kapi_get_spec("sys_" #name); \
+ if (__spec) { \
+ s64 __params[5] = { (s64)(a1), (s64)(a2), (s64)(a3), (s64)(a4), (s64)(a5) }; \
+ int __ret = kapi_validate_syscall_params(__spec, __params, 5); \
+ if (__ret) return __ret; \
+ }
+#define __KAPI_SYSCALL_VALIDATE_6(name, t1, a1, t2, a2, t3, a3, t4, a4, t5, a5, t6, a6) \
+ const struct kernel_api_spec *__spec = kapi_get_spec("sys_" #name); \
+ if (__spec) { \
+ s64 __params[6] = { (s64)(a1), (s64)(a2), (s64)(a3), (s64)(a4), (s64)(a5), (s64)(a6) }; \
+ int __ret = kapi_validate_syscall_params(__spec, __params, 6); \
+ if (__ret) return __ret; \
+ }
+
+#else /* !CONFIG_KAPI_RUNTIME_CHECKS */
+
+#define __KAPI_SYSCALL_VALIDATE_0(name)
+#define __KAPI_SYSCALL_VALIDATE_1(name, t1, a1)
+#define __KAPI_SYSCALL_VALIDATE_2(name, t1, a1, t2, a2)
+#define __KAPI_SYSCALL_VALIDATE_3(name, t1, a1, t2, a2, t3, a3)
+#define __KAPI_SYSCALL_VALIDATE_4(name, t1, a1, t2, a2, t3, a3, t4, a4)
+#define __KAPI_SYSCALL_VALIDATE_5(name, t1, a1, t2, a2, t3, a3, t4, a4, t5, a5)
+#define __KAPI_SYSCALL_VALIDATE_6(name, t1, a1, t2, a2, t3, a3, t4, a4, t5, a5, t6, a6)
+
+#endif /* CONFIG_KAPI_RUNTIME_CHECKS */
+
+/* Helper to inject validation for return values */
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+
+#define __KAPI_SYSCALL_VALIDATE_RETURN(name, retval) \
+ do { \
+ const struct kernel_api_spec *__spec = kapi_get_spec("sys_" #name); \
+ if (__spec) { \
+ kapi_validate_syscall_return(__spec, (s64)(retval)); \
+ } \
+ } while (0)
+
+/* Wrapper to validate both params and return value */
+#define __SYSCALL_DEFINE_SPEC(name, spec_args, ...) \
+ DEFINE_KERNEL_API_SPEC(sys_##name) \
+ .name = "sys_" #name, \
+ spec_args \
+ KAPI_END_SPEC; \
+ static long __kapi_sys_##name(__PASTE(__MAP, __SYSCALL_ARG_COUNT(__VA_ARGS__))(__SC_DECL, __VA_ARGS__)); \
+ __PASTE(SYSCALL_DEFINE, __SYSCALL_ARG_COUNT(__VA_ARGS__))(name, __VA_ARGS__) \
+ { \
+ long __ret; \
+ __PASTE(__KAPI_SYSCALL_VALIDATE_, __SYSCALL_ARG_COUNT(__VA_ARGS__))(name, __VA_ARGS__); \
+ __ret = __kapi_sys_##name(__PASTE(__MAP, __SYSCALL_ARG_COUNT(__VA_ARGS__))(__SC_CAST, __VA_ARGS__)); \
+ __KAPI_SYSCALL_VALIDATE_RETURN(name, __ret); \
+ return __ret; \
+ } \
+ static long __kapi_sys_##name(__PASTE(__MAP, __SYSCALL_ARG_COUNT(__VA_ARGS__))(__SC_DECL, __VA_ARGS__))
+
+#else /* !CONFIG_KAPI_RUNTIME_CHECKS */
+
+#define __SYSCALL_DEFINE_SPEC(name, spec_args, ...) \
+ DEFINE_KERNEL_API_SPEC(sys_##name) \
+ .name = "sys_" #name, \
+ spec_args \
+ KAPI_END_SPEC; \
+ __PASTE(SYSCALL_DEFINE, __SYSCALL_ARG_COUNT(__VA_ARGS__))(name, __VA_ARGS__)
+
+#endif /* CONFIG_KAPI_RUNTIME_CHECKS */
+
+
+/* Convenience macros for different parameter counts */
+#define SYSCALL_DEFINE_SPEC0(name, spec_args) \
+ DEFINE_KERNEL_API_SPEC(sys_##name) \
+ .name = "sys_" #name, \
+ .param_count = 0, \
+ spec_args \
+ KAPI_END_SPEC; \
+ SYSCALL_DEFINE0(name)
+
+#define SYSCALL_DEFINE_SPEC1(name, spec_args, t1, a1) \
+ __SYSCALL_DEFINE_SPEC(name, spec_args, t1, a1)
+
+#define SYSCALL_DEFINE_SPEC2(name, spec_args, t1, a1, t2, a2) \
+ __SYSCALL_DEFINE_SPEC(name, spec_args, t1, a1, t2, a2)
+
+#define SYSCALL_DEFINE_SPEC3(name, spec_args, t1, a1, t2, a2, t3, a3) \
+ __SYSCALL_DEFINE_SPEC(name, spec_args, t1, a1, t2, a2, t3, a3)
+
+#define SYSCALL_DEFINE_SPEC4(name, spec_args, t1, a1, t2, a2, t3, a3, \
+ t4, a4) \
+ __SYSCALL_DEFINE_SPEC(name, spec_args, t1, a1, t2, a2, t3, a3, t4, a4)
+
+#define SYSCALL_DEFINE_SPEC5(name, spec_args, t1, a1, t2, a2, t3, a3, \
+ t4, a4, t5, a5) \
+ __SYSCALL_DEFINE_SPEC(name, spec_args, t1, a1, t2, a2, t3, a3, \
+ t4, a4, t5, a5)
+
+#define SYSCALL_DEFINE_SPEC6(name, spec_args, t1, a1, t2, a2, t3, a3, \
+ t4, a4, t5, a5, t6, a6) \
+ __SYSCALL_DEFINE_SPEC(name, spec_args, t1, a1, t2, a2, t3, a3, \
+ t4, a4, t5, a5, t6, a6)
+
+/*
+ * Helper macros for common syscall patterns
+ */
+
+/* For syscalls that can sleep */
+#define KAPI_SYSCALL_SLEEPABLE \
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+/* For syscalls that must be atomic */
+#define KAPI_SYSCALL_ATOMIC \
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_ATOMIC)
+
+/* Common parameter specifications */
+#define KAPI_PARAM_FD(idx, desc) \
+ KAPI_PARAM(idx, "fd", "int", desc) \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN) \
+ .type = KAPI_TYPE_FD, \
+ .constraint_type = KAPI_CONSTRAINT_NONE, \
+ KAPI_PARAM_END
+
+#define KAPI_PARAM_USER_BUF(idx, name, desc) \
+ KAPI_PARAM(idx, name, "void __user *", desc) \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_USER_PTR | KAPI_PARAM_IN) \
+ KAPI_PARAM_END
+
+#define KAPI_PARAM_USER_STRUCT(idx, name, struct_type, desc) \
+ KAPI_PARAM(idx, name, #struct_type " __user *", desc) \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_USER | KAPI_PARAM_IN) \
+ .type = KAPI_TYPE_USER_PTR, \
+ .size = sizeof(struct_type), \
+ .constraint_type = KAPI_CONSTRAINT_NONE, \
+ KAPI_PARAM_END
+
+#define KAPI_PARAM_SIZE_T(idx, name, desc) \
+ KAPI_PARAM(idx, name, "size_t", desc) \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN) \
+ KAPI_PARAM_RANGE(0, S64_MAX) \
+ KAPI_PARAM_END
+
+/* Common error specifications */
+#define KAPI_ERROR_EBADF(idx) \
+ KAPI_ERROR(idx, -EBADF, "EBADF", "Invalid file descriptor", \
+ "The file descriptor is not valid or has been closed")
+
+#define KAPI_ERROR_EINVAL(idx, condition) \
+ KAPI_ERROR(idx, -EINVAL, "EINVAL", condition, \
+ "Invalid argument provided")
+
+#define KAPI_ERROR_ENOMEM(idx) \
+ KAPI_ERROR(idx, -ENOMEM, "ENOMEM", "Insufficient memory", \
+ "Cannot allocate memory for the operation")
+
+#define KAPI_ERROR_EPERM(idx) \
+ KAPI_ERROR(idx, -EPERM, "EPERM", "Operation not permitted", \
+ "The calling process does not have the required permissions")
+
+#define KAPI_ERROR_EFAULT(idx) \
+ KAPI_ERROR(idx, -EFAULT, "EFAULT", "Bad address", \
+ "Invalid user space address provided")
+
+/* Standard return value specifications */
+#define KAPI_RETURN_SUCCESS_ZERO \
+ KAPI_RETURN("long", "0 on success, negative error code on failure") \
+ KAPI_RETURN_SUCCESS(0) \
+ KAPI_RETURN_END
+
+#define KAPI_RETURN_FD_SPEC \
+ KAPI_RETURN("long", "File descriptor on success, negative error code on failure") \
+ .check_type = KAPI_RETURN_FD, \
+ KAPI_RETURN_END
+
+#define KAPI_RETURN_COUNT \
+ KAPI_RETURN("long", "Number of bytes processed on success, negative error code on failure") \
+ KAPI_RETURN_SUCCESS(0, ">= 0") \
+ KAPI_RETURN_END
+
+
+/*
+ * Compat syscall support
+ */
+#ifdef CONFIG_COMPAT
+
+#define COMPAT_SYSCALL_DEFINE_SPEC0(name, spec_args) \
+ DEFINE_KERNEL_API_SPEC(compat_sys_##name) \
+ .name = "compat_sys_" #name, \
+ .param_count = 0, \
+ spec_args \
+ KAPI_END_SPEC; \
+ COMPAT_SYSCALL_DEFINE0(name)
+
+#define COMPAT_SYSCALL_DEFINE_SPEC1(name, spec_args, t1, a1) \
+ DEFINE_KERNEL_API_SPEC(compat_sys_##name) \
+ .name = "compat_sys_" #name, \
+ .param_count = 1, \
+ spec_args \
+ KAPI_END_SPEC; \
+ COMPAT_SYSCALL_DEFINE1(name, t1, a1)
+
+#define COMPAT_SYSCALL_DEFINE_SPEC2(name, spec_args, t1, a1, t2, a2) \
+ DEFINE_KERNEL_API_SPEC(compat_sys_##name) \
+ .name = "compat_sys_" #name, \
+ .param_count = 2, \
+ spec_args \
+ KAPI_END_SPEC; \
+ COMPAT_SYSCALL_DEFINE2(name, t1, a1, t2, a2)
+
+#define COMPAT_SYSCALL_DEFINE_SPEC3(name, spec_args, t1, a1, t2, a2, t3, a3) \
+ DEFINE_KERNEL_API_SPEC(compat_sys_##name) \
+ .name = "compat_sys_" #name, \
+ .param_count = 3, \
+ spec_args \
+ KAPI_END_SPEC; \
+ COMPAT_SYSCALL_DEFINE3(name, t1, a1, t2, a2, t3, a3)
+
+#define COMPAT_SYSCALL_DEFINE_SPEC4(name, spec_args, t1, a1, t2, a2, t3, a3, \
+ t4, a4) \
+ DEFINE_KERNEL_API_SPEC(compat_sys_##name) \
+ .name = "compat_sys_" #name, \
+ .param_count = 4, \
+ spec_args \
+ KAPI_END_SPEC; \
+ COMPAT_SYSCALL_DEFINE4(name, t1, a1, t2, a2, t3, a3, t4, a4)
+
+#define COMPAT_SYSCALL_DEFINE_SPEC5(name, spec_args, t1, a1, t2, a2, t3, a3, \
+ t4, a4, t5, a5) \
+ DEFINE_KERNEL_API_SPEC(compat_sys_##name) \
+ .name = "compat_sys_" #name, \
+ .param_count = 5, \
+ spec_args \
+ KAPI_END_SPEC; \
+ COMPAT_SYSCALL_DEFINE5(name, t1, a1, t2, a2, t3, a3, t4, a4, t5, a5)
+
+#define COMPAT_SYSCALL_DEFINE_SPEC6(name, spec_args, t1, a1, t2, a2, t3, a3, \
+ t4, a4, t5, a5, t6, a6) \
+ DEFINE_KERNEL_API_SPEC(compat_sys_##name) \
+ .name = "compat_sys_" #name, \
+ .param_count = 6, \
+ spec_args \
+ KAPI_END_SPEC; \
+ COMPAT_SYSCALL_DEFINE6(name, t1, a1, t2, a2, t3, a3, t4, a4, t5, a5, t6, a6)
+
+#endif /* CONFIG_COMPAT */
+
+#endif /* _LINUX_SYSCALL_API_SPEC_H */
\ No newline at end of file
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index e5603cc91963d..f2951ece2068b 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -87,6 +87,7 @@ struct xattr_args;
#include <linux/bug.h>
#include <linux/sem.h>
#include <asm/siginfo.h>
+#include <linux/syscall_api_spec.h>
#include <linux/unistd.h>
#include <linux/quota.h>
#include <linux/key.h>
diff --git a/init/Kconfig b/init/Kconfig
index af4c2f0854554..7a15248933895 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2079,6 +2079,8 @@ config TRACEPOINTS
source "kernel/Kconfig.kexec"
+source "kernel/api/Kconfig"
+
endmenu # General setup
source "arch/Kconfig"
diff --git a/kernel/Makefile b/kernel/Makefile
index 32e80dd626af0..ba94ee4bb2292 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -56,6 +56,7 @@ obj-y += livepatch/
obj-y += dma/
obj-y += entry/
obj-$(CONFIG_MODULES) += module/
+obj-$(CONFIG_KAPI_SPEC) += api/
obj-$(CONFIG_KCMP) += kcmp.o
obj-$(CONFIG_FREEZER) += freezer.o
diff --git a/kernel/api/Kconfig b/kernel/api/Kconfig
new file mode 100644
index 0000000000000..fde25ec70e134
--- /dev/null
+++ b/kernel/api/Kconfig
@@ -0,0 +1,35 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Kernel API Specification Framework Configuration
+#
+
+config KAPI_SPEC
+ bool "Kernel API Specification Framework"
+ help
+ This option enables the kernel API specification framework,
+ which provides formal documentation of kernel APIs in both
+ human- and machine-readable formats.
+
+ The framework allows developers to document APIs inline with
+ their implementation, including parameter specifications,
+ return values, error conditions, locking requirements, and
+ execution context constraints.
+
+ When enabled, API specifications can be queried at runtime
+ and exported in various formats (JSON, XML) through debugfs.
+
+ If unsure, say N.
+
+config KAPI_RUNTIME_CHECKS
+ bool "Runtime API specification checks"
+ depends on KAPI_SPEC
+ depends on DEBUG_KERNEL
+ help
+ Enable runtime validation of API usage against specifications.
+ This includes checking execution context requirements, parameter
+ validation, and lock state verification.
+
+ This adds overhead and should only be used for debugging and
+ development. The checks use WARN_ONCE to report violations.
+
+ If unsure, say N.
diff --git a/kernel/api/Makefile b/kernel/api/Makefile
new file mode 100644
index 0000000000000..4120ded7e5cf1
--- /dev/null
+++ b/kernel/api/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for the Kernel API Specification Framework
+#
+
+# Core API specification framework
+obj-$(CONFIG_KAPI_SPEC) += kernel_api_spec.o
\ No newline at end of file
diff --git a/kernel/api/kernel_api_spec.c b/kernel/api/kernel_api_spec.c
new file mode 100644
index 0000000000000..29c0c84d87f7c
--- /dev/null
+++ b/kernel/api/kernel_api_spec.c
@@ -0,0 +1,1169 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * kernel_api_spec.c - Kernel API Specification Framework Implementation
+ *
+ * Provides runtime support for kernel API specifications including validation,
+ * export to various formats, and querying capabilities.
+ */
+
+#include <linux/kernel.h>
+#include <linux/kernel_api_spec.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/seq_file.h>
+#include <linux/debugfs.h>
+#include <linux/export.h>
+#include <linux/preempt.h>
+#include <linux/hardirq.h>
+#include <linux/file.h>
+#include <linux/fdtable.h>
+#include <linux/uaccess.h>
+#include <linux/limits.h>
+#include <linux/fcntl.h>
+
+/* Section where API specifications are stored */
+extern struct kernel_api_spec __start_kapi_specs[];
+extern struct kernel_api_spec __stop_kapi_specs[];
+
+/* Dynamic API registration */
+static LIST_HEAD(dynamic_api_specs);
+static DEFINE_MUTEX(api_spec_mutex);
+
+struct dynamic_api_spec {
+ struct list_head list;
+ struct kernel_api_spec *spec;
+};
+
+/**
+ * kapi_get_spec - Get API specification by name
+ * @name: Function name to look up
+ *
+ * Return: Pointer to API specification or NULL if not found
+ */
+const struct kernel_api_spec *kapi_get_spec(const char *name)
+{
+ struct kernel_api_spec *spec;
+ struct dynamic_api_spec *dyn_spec;
+
+ /* Search static specifications */
+ for (spec = __start_kapi_specs; spec < __stop_kapi_specs; spec++) {
+ if (strcmp(spec->name, name) == 0)
+ return spec;
+ }
+
+ /* Search dynamic specifications */
+ mutex_lock(&api_spec_mutex);
+ list_for_each_entry(dyn_spec, &dynamic_api_specs, list) {
+ if (strcmp(dyn_spec->spec->name, name) == 0) {
+ mutex_unlock(&api_spec_mutex);
+ return dyn_spec->spec;
+ }
+ }
+ mutex_unlock(&api_spec_mutex);
+
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(kapi_get_spec);
+
+/**
+ * kapi_register_spec - Register a dynamic API specification
+ * @spec: API specification to register
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int kapi_register_spec(struct kernel_api_spec *spec)
+{
+ struct dynamic_api_spec *dyn_spec;
+
+ if (!spec || !spec->name[0])
+ return -EINVAL;
+
+ /* Check if already exists */
+ if (kapi_get_spec(spec->name))
+ return -EEXIST;
+
+ dyn_spec = kzalloc(sizeof(*dyn_spec), GFP_KERNEL);
+ if (!dyn_spec)
+ return -ENOMEM;
+
+ dyn_spec->spec = spec;
+
+ mutex_lock(&api_spec_mutex);
+ list_add_tail(&dyn_spec->list, &dynamic_api_specs);
+ mutex_unlock(&api_spec_mutex);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_register_spec);
+
+/**
+ * kapi_unregister_spec - Unregister a dynamic API specification
+ * @name: Name of API to unregister
+ */
+void kapi_unregister_spec(const char *name)
+{
+ struct dynamic_api_spec *dyn_spec, *tmp;
+
+ mutex_lock(&api_spec_mutex);
+ list_for_each_entry_safe(dyn_spec, tmp, &dynamic_api_specs, list) {
+ if (strcmp(dyn_spec->spec->name, name) == 0) {
+ list_del(&dyn_spec->list);
+ kfree(dyn_spec);
+ break;
+ }
+ }
+ mutex_unlock(&api_spec_mutex);
+}
+EXPORT_SYMBOL_GPL(kapi_unregister_spec);
+
+/**
+ * param_type_to_string - Convert parameter type to string
+ * @type: Parameter type
+ *
+ * Return: String representation of type
+ */
+static const char *param_type_to_string(enum kapi_param_type type)
+{
+ static const char * const type_names[] = {
+ [KAPI_TYPE_VOID] = "void",
+ [KAPI_TYPE_INT] = "int",
+ [KAPI_TYPE_UINT] = "uint",
+ [KAPI_TYPE_PTR] = "pointer",
+ [KAPI_TYPE_STRUCT] = "struct",
+ [KAPI_TYPE_UNION] = "union",
+ [KAPI_TYPE_ENUM] = "enum",
+ [KAPI_TYPE_FUNC_PTR] = "function_pointer",
+ [KAPI_TYPE_ARRAY] = "array",
+ [KAPI_TYPE_FD] = "file_descriptor",
+ [KAPI_TYPE_USER_PTR] = "user_pointer",
+ [KAPI_TYPE_PATH] = "pathname",
+ [KAPI_TYPE_CUSTOM] = "custom",
+ };
+
+ if (type >= ARRAY_SIZE(type_names))
+ return "unknown";
+
+ return type_names[type];
+}
+
+/**
+ * lock_type_to_string - Convert lock type to string
+ * @type: Lock type
+ *
+ * Return: String representation of lock type
+ */
+static const char *lock_type_to_string(enum kapi_lock_type type)
+{
+ static const char * const lock_names[] = {
+ [KAPI_LOCK_NONE] = "none",
+ [KAPI_LOCK_MUTEX] = "mutex",
+ [KAPI_LOCK_SPINLOCK] = "spinlock",
+ [KAPI_LOCK_RWLOCK] = "rwlock",
+ [KAPI_LOCK_SEQLOCK] = "seqlock",
+ [KAPI_LOCK_RCU] = "rcu",
+ [KAPI_LOCK_SEMAPHORE] = "semaphore",
+ [KAPI_LOCK_CUSTOM] = "custom",
+ };
+
+ if (type >= ARRAY_SIZE(lock_names))
+ return "unknown";
+
+ return lock_names[type];
+}
+
+/**
+ * return_check_type_to_string - Convert return check type to string
+ * @type: Return check type
+ *
+ * Return: String representation of return check type
+ */
+static const char *return_check_type_to_string(enum kapi_return_check_type type)
+{
+ static const char * const check_names[] = {
+ [KAPI_RETURN_EXACT] = "exact",
+ [KAPI_RETURN_RANGE] = "range",
+ [KAPI_RETURN_ERROR_CHECK] = "error_check",
+ [KAPI_RETURN_FD] = "file_descriptor",
+ [KAPI_RETURN_CUSTOM] = "custom",
+ };
+
+ if (type >= ARRAY_SIZE(check_names))
+ return "unknown";
+
+ return check_names[type];
+}
+
+/**
+ * kapi_export_json - Export API specification to JSON format
+ * @spec: API specification to export
+ * @buf: Buffer to write JSON to
+ * @size: Size of buffer
+ *
+ * Return: Number of bytes written or negative error
+ */
+int kapi_export_json(const struct kernel_api_spec *spec, char *buf, size_t size)
+{
+ int ret = 0;
+ int i;
+
+ if (!spec || !buf || size == 0)
+ return -EINVAL;
+
+ ret = scnprintf(buf, size,
+ "{\n"
+ " \"name\": \"%s\",\n"
+ " \"version\": %u,\n"
+ " \"description\": \"%s\",\n"
+ " \"long_description\": \"%s\",\n"
+ " \"context_flags\": \"0x%x\",\n",
+ spec->name,
+ spec->version,
+ spec->description,
+ spec->long_description,
+ spec->context_flags);
+
+ /* Parameters */
+ ret += scnprintf(buf + ret, size - ret,
+ " \"parameters\": [\n");
+
+ for (i = 0; i < spec->param_count && i < KAPI_MAX_PARAMS; i++) {
+ const struct kapi_param_spec *param = &spec->params[i];
+
+ ret += scnprintf(buf + ret, size - ret,
+ " {\n"
+ " \"name\": \"%s\",\n"
+ " \"type\": \"%s\",\n"
+ " \"type_class\": \"%s\",\n"
+ " \"flags\": \"0x%x\",\n"
+ " \"description\": \"%s\"\n"
+ " }%s\n",
+ param->name,
+ param->type_name,
+ param_type_to_string(param->type),
+ param->flags,
+ param->description,
+ (i < spec->param_count - 1) ? "," : "");
+ }
+
+ ret += scnprintf(buf + ret, size - ret, " ],\n");
+
+ /* Return value */
+ ret += scnprintf(buf + ret, size - ret,
+ " \"return\": {\n"
+ " \"type\": \"%s\",\n"
+ " \"type_class\": \"%s\",\n"
+ " \"check_type\": \"%s\",\n",
+ spec->return_spec.type_name,
+ param_type_to_string(spec->return_spec.type),
+ return_check_type_to_string(spec->return_spec.check_type));
+
+ switch (spec->return_spec.check_type) {
+ case KAPI_RETURN_EXACT:
+ ret += scnprintf(buf + ret, size - ret,
+ " \"success_value\": %lld,\n",
+ spec->return_spec.success_value);
+ break;
+ case KAPI_RETURN_RANGE:
+ ret += scnprintf(buf + ret, size - ret,
+ " \"success_min\": %lld,\n"
+ " \"success_max\": %lld,\n",
+ spec->return_spec.success_min,
+ spec->return_spec.success_max);
+ break;
+ case KAPI_RETURN_ERROR_CHECK:
+ ret += scnprintf(buf + ret, size - ret,
+ " \"error_count\": %u,\n",
+ spec->return_spec.error_count);
+ break;
+ default:
+ break;
+ }
+
+ ret += scnprintf(buf + ret, size - ret,
+ " \"description\": \"%s\"\n"
+ " },\n",
+ spec->return_spec.description);
+
+ /* Errors */
+ ret += scnprintf(buf + ret, size - ret,
+ " \"errors\": [\n");
+
+ for (i = 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+ const struct kapi_error_spec *error = &spec->errors[i];
+
+ ret += scnprintf(buf + ret, size - ret,
+ " {\n"
+ " \"code\": %d,\n"
+ " \"name\": \"%s\",\n"
+ " \"condition\": \"%s\",\n"
+ " \"description\": \"%s\"\n"
+ " }%s\n",
+ error->error_code,
+ error->name,
+ error->condition,
+ error->description,
+ (i < spec->error_count - 1) ? "," : "");
+ }
+
+ ret += scnprintf(buf + ret, size - ret, " ],\n");
+
+ /* Locks */
+ ret += scnprintf(buf + ret, size - ret,
+ " \"locks\": [\n");
+
+ for (i = 0; i < spec->lock_count && i < KAPI_MAX_CONSTRAINTS; i++) {
+ const struct kapi_lock_spec *lock = &spec->locks[i];
+
+ ret += scnprintf(buf + ret, size - ret,
+ " {\n"
+ " \"name\": \"%s\",\n"
+ " \"type\": \"%s\",\n"
+ " \"acquired\": %s,\n"
+ " \"released\": %s,\n"
+ " \"held_on_entry\": %s,\n"
+ " \"held_on_exit\": %s,\n"
+ " \"description\": \"%s\"\n"
+ " }%s\n",
+ lock->lock_name,
+ lock_type_to_string(lock->lock_type),
+ lock->acquired ? "true" : "false",
+ lock->released ? "true" : "false",
+ lock->held_on_entry ? "true" : "false",
+ lock->held_on_exit ? "true" : "false",
+ lock->description,
+ (i < spec->lock_count - 1) ? "," : "");
+ }
+
+ ret += scnprintf(buf + ret, size - ret, " ],\n");
+
+ /* Additional info */
+ ret += scnprintf(buf + ret, size - ret,
+ " \"since_version\": \"%s\",\n"
+ " \"deprecated\": %s,\n"
+ " \"replacement\": \"%s\",\n"
+ " \"examples\": \"%s\",\n"
+ " \"notes\": \"%s\"\n"
+ "}\n",
+ spec->since_version,
+ spec->deprecated ? "true" : "false",
+ spec->replacement,
+ spec->examples,
+ spec->notes);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kapi_export_json);
+
+/**
+ * kapi_export_xml - Export API specification to XML format
+ * @spec: API specification to export
+ * @buf: Buffer to write XML to
+ * @size: Size of buffer
+ *
+ * Return: Number of bytes written or negative error
+ */
+int kapi_export_xml(const struct kernel_api_spec *spec, char *buf, size_t size)
+{
+ int ret = 0;
+ int i;
+
+ if (!spec || !buf || size == 0)
+ return -EINVAL;
+
+ ret = scnprintf(buf, size,
+ "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
+ "<kernel_api>\n"
+ " <name>%s</name>\n"
+ " <version>%u</version>\n"
+ " <description>%s</description>\n"
+ " <long_description><![CDATA[%s]]></long_description>\n"
+ " <context_flags>0x%x</context_flags>\n",
+ spec->name,
+ spec->version,
+ spec->description,
+ spec->long_description,
+ spec->context_flags);
+
+ /* Parameters */
+ ret += scnprintf(buf + ret, size - ret, " <parameters>\n");
+
+ for (i = 0; i < spec->param_count && i < KAPI_MAX_PARAMS; i++) {
+ const struct kapi_param_spec *param = &spec->params[i];
+
+ ret += scnprintf(buf + ret, size - ret,
+ " <parameter>\n"
+ " <name>%s</name>\n"
+ " <type>%s</type>\n"
+ " <type_class>%s</type_class>\n"
+ " <flags>0x%x</flags>\n"
+ " <description><![CDATA[%s]]></description>\n"
+ " </parameter>\n",
+ param->name,
+ param->type_name,
+ param_type_to_string(param->type),
+ param->flags,
+ param->description);
+ }
+
+ ret += scnprintf(buf + ret, size - ret, " </parameters>\n");
+
+ /* Return value */
+ ret += scnprintf(buf + ret, size - ret,
+ " <return>\n"
+ " <type>%s</type>\n"
+ " <type_class>%s</type_class>\n"
+ " <check_type>%s</check_type>\n",
+ spec->return_spec.type_name,
+ param_type_to_string(spec->return_spec.type),
+ return_check_type_to_string(spec->return_spec.check_type));
+
+ switch (spec->return_spec.check_type) {
+ case KAPI_RETURN_EXACT:
+ ret += scnprintf(buf + ret, size - ret,
+ " <success_value>%lld</success_value>\n",
+ spec->return_spec.success_value);
+ break;
+ case KAPI_RETURN_RANGE:
+ ret += scnprintf(buf + ret, size - ret,
+ " <success_min>%lld</success_min>\n"
+ " <success_max>%lld</success_max>\n",
+ spec->return_spec.success_min,
+ spec->return_spec.success_max);
+ break;
+ case KAPI_RETURN_ERROR_CHECK:
+ ret += scnprintf(buf + ret, size - ret,
+ " <error_count>%u</error_count>\n",
+ spec->return_spec.error_count);
+ break;
+ default:
+ break;
+ }
+
+ ret += scnprintf(buf + ret, size - ret,
+ " <description><![CDATA[%s]]></description>\n"
+ " </return>\n",
+ spec->return_spec.description);
+
+ /* Errors */
+ ret += scnprintf(buf + ret, size - ret, " <errors>\n");
+
+ for (i = 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+ const struct kapi_error_spec *error = &spec->errors[i];
+
+ ret += scnprintf(buf + ret, size - ret,
+ " <error>\n"
+ " <code>%d</code>\n"
+ " <name>%s</name>\n"
+ " <condition><![CDATA[%s]]></condition>\n"
+ " <description><![CDATA[%s]]></description>\n"
+ " </error>\n",
+ error->error_code,
+ error->name,
+ error->condition,
+ error->description);
+ }
+
+ ret += scnprintf(buf + ret, size - ret, " </errors>\n");
+
+ /* Additional info */
+ ret += scnprintf(buf + ret, size - ret,
+ " <since_version>%s</since_version>\n"
+ " <deprecated>%s</deprecated>\n"
+ " <replacement>%s</replacement>\n"
+ " <examples><![CDATA[%s]]></examples>\n"
+ " <notes><![CDATA[%s]]></notes>\n"
+ "</kernel_api>\n",
+ spec->since_version,
+ spec->deprecated ? "true" : "false",
+ spec->replacement,
+ spec->examples,
+ spec->notes);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kapi_export_xml);
+
+/**
+ * kapi_print_spec - Print API specification to kernel log
+ * @spec: API specification to print
+ */
+void kapi_print_spec(const struct kernel_api_spec *spec)
+{
+ int i;
+
+ if (!spec)
+ return;
+
+ pr_info("=== Kernel API Specification ===\n");
+ pr_info("Name: %s\n", spec->name);
+ pr_info("Version: %u\n", spec->version);
+ pr_info("Description: %s\n", spec->description);
+
+ if (spec->long_description[0])
+ pr_info("Long Description: %s\n", spec->long_description);
+
+ pr_info("Context Flags: 0x%x\n", spec->context_flags);
+
+ /* Parameters */
+ if (spec->param_count > 0) {
+ pr_info("Parameters:\n");
+ for (i = 0; i < spec->param_count && i < KAPI_MAX_PARAMS; i++) {
+ const struct kapi_param_spec *param = &spec->params[i];
+ pr_info(" [%d] %s: %s (flags: 0x%x)\n",
+ i, param->name, param->type_name, param->flags);
+ if (param->description[0])
+ pr_info(" Description: %s\n", param->description);
+ }
+ }
+
+ /* Return value */
+ pr_info("Return: %s\n", spec->return_spec.type_name);
+ if (spec->return_spec.description[0])
+ pr_info(" Description: %s\n", spec->return_spec.description);
+
+ /* Errors */
+ if (spec->error_count > 0) {
+ pr_info("Possible Errors:\n");
+ for (i = 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+ const struct kapi_error_spec *error = &spec->errors[i];
+ pr_info(" %s (%d): %s\n",
+ error->name, error->error_code, error->condition);
+ }
+ }
+
+ pr_info("================================\n");
+}
+EXPORT_SYMBOL_GPL(kapi_print_spec);
+
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+
+/**
+ * kapi_validate_fd - Validate that a file descriptor is valid in current context
+ * @fd: File descriptor to validate
+ *
+ * Return: true if fd is valid in current process context, false otherwise
+ */
+static bool kapi_validate_fd(int fd)
+{
+ struct fd f;
+
+ /* Special case: AT_FDCWD is always valid */
+ if (fd == AT_FDCWD)
+ return true;
+
+ /* Check basic range */
+ if (fd < 0)
+ return false;
+
+ /* Check if fd is valid in current process context */
+ f = fdget(fd);
+ if (fd_empty(f)) {
+ return false;
+ }
+
+ /* fd is valid, release reference */
+ fdput(f);
+ return true;
+}
+
+/**
+ * kapi_validate_user_ptr - Validate that a user pointer is accessible
+ * @ptr: User pointer to validate
+ * @size: Size in bytes to validate
+ * @write: Whether write access is required
+ *
+ * Return: true if user memory is accessible, false otherwise
+ */
+static bool kapi_validate_user_ptr(const void __user *ptr, size_t size, bool write)
+{
+ /* NULL is never valid here; callers handle KAPI_PARAM_OPTIONAL */
+ if (!ptr)
+ return false;
+
+ /*
+ * access_ok() only checks that the range lies within the user
+ * address space; it does not distinguish read from write access,
+ * so @write does not change the check.
+ */
+ return access_ok(ptr, size);
+}
+
+/**
+ * kapi_validate_user_ptr_with_params - Validate user pointer with dynamic size
+ * @param_spec: Parameter specification
+ * @ptr: User pointer to validate
+ * @all_params: Array of all parameter values
+ * @param_count: Number of parameters
+ *
+ * Return: true if user memory is accessible, false otherwise
+ */
+static bool kapi_validate_user_ptr_with_params(const struct kapi_param_spec *param_spec,
+ const void __user *ptr,
+ const s64 *all_params,
+ int param_count)
+{
+ size_t actual_size;
+ bool write;
+
+ /* NULL is allowed for optional parameters */
+ if (!ptr && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+ return true;
+
+ /* Calculate actual size based on related parameter */
+ if (param_spec->size_param_idx >= 0 &&
+ param_spec->size_param_idx < param_count) {
+ s64 count = all_params[param_spec->size_param_idx];
+
+ /* Validate count is positive */
+ if (count <= 0) {
+ pr_warn("Parameter %s: size determinant is non-positive (%lld)\n",
+ param_spec->name, count);
+ return false;
+ }
+
+ /* Check for multiplication overflow */
+ if (param_spec->size_multiplier > 0 &&
+ count > SIZE_MAX / param_spec->size_multiplier) {
+ pr_warn("Parameter %s: size calculation overflow\n",
+ param_spec->name);
+ return false;
+ }
+
+ actual_size = count * param_spec->size_multiplier;
+ } else {
+ /* Use fixed size */
+ actual_size = param_spec->size;
+ }
+
+ write = (param_spec->flags & KAPI_PARAM_OUT) ||
+ (param_spec->flags & KAPI_PARAM_INOUT);
+
+ return kapi_validate_user_ptr(ptr, actual_size, write);
+}
+
+/**
+ * kapi_validate_path - Validate that a pathname is accessible and within limits
+ * @path: User pointer to pathname
+ * @param_spec: Parameter specification
+ *
+ * Return: true if path is valid, false otherwise
+ */
+static bool kapi_validate_path(const char __user *path,
+ const struct kapi_param_spec *param_spec)
+{
+ size_t len;
+
+ /* NULL is allowed for optional parameters */
+ if (!path && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+ return true;
+
+ if (!path) {
+ pr_warn("Parameter %s: NULL path not allowed\n", param_spec->name);
+ return false;
+ }
+
+ /* Check if the path is accessible */
+ if (!access_ok(path, 1)) {
+ pr_warn("Parameter %s: path pointer %p not accessible\n",
+ param_spec->name, path);
+ return false;
+ }
+
+ /* Use strnlen_user to get the length and validate accessibility */
+ len = strnlen_user(path, PATH_MAX + 1);
+ if (len == 0) {
+ pr_warn("Parameter %s: invalid path pointer %p\n",
+ param_spec->name, path);
+ return false;
+ }
+
+ /* Check path length limit */
+ if (len > PATH_MAX) {
+ pr_warn("Parameter %s: path too long (exceeds PATH_MAX)\n",
+ param_spec->name);
+ return false;
+ }
+
+ return true;
+}
+
+/**
+ * kapi_validate_param - Validate a parameter against its specification
+ * @param_spec: Parameter specification
+ * @value: Parameter value to validate
+ *
+ * Return: true if valid, false otherwise
+ */
+bool kapi_validate_param(const struct kapi_param_spec *param_spec, s64 value)
+{
+ int i;
+
+ /* Special handling for file descriptor type */
+ if (param_spec->type == KAPI_TYPE_FD) {
+ if (!kapi_validate_fd((int)value)) {
+ pr_warn("Parameter %s: invalid file descriptor %lld\n",
+ param_spec->name, value);
+ return false;
+ }
+ /* Continue with additional constraint checks if needed */
+ }
+
+ /* Special handling for user pointer type */
+ if (param_spec->type == KAPI_TYPE_USER_PTR) {
+ const void __user *ptr = (const void __user *)(unsigned long)value;
+ bool write = (param_spec->flags & KAPI_PARAM_OUT) ||
+ (param_spec->flags & KAPI_PARAM_INOUT);
+
+ /* NULL is allowed for optional parameters */
+ if (!ptr && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+ return true;
+
+ if (!kapi_validate_user_ptr(ptr, param_spec->size, write)) {
+ pr_warn("Parameter %s: invalid user pointer %p (size: %zu, %s)\n",
+ param_spec->name, ptr, param_spec->size,
+ write ? "write" : "read");
+ return false;
+ }
+ /* Continue with additional constraint checks if needed */
+ }
+
+ /* Special handling for path type */
+ if (param_spec->type == KAPI_TYPE_PATH) {
+ const char __user *path = (const char __user *)(unsigned long)value;
+
+ if (!kapi_validate_path(path, param_spec)) {
+ return false;
+ }
+ /* Continue with additional constraint checks if needed */
+ }
+
+ switch (param_spec->constraint_type) {
+ case KAPI_CONSTRAINT_NONE:
+ return true;
+
+ case KAPI_CONSTRAINT_RANGE:
+ if (value < param_spec->min_value || value > param_spec->max_value) {
+ pr_warn("Parameter %s value %lld out of range [%lld, %lld]\n",
+ param_spec->name, value,
+ param_spec->min_value, param_spec->max_value);
+ return false;
+ }
+ return true;
+
+ case KAPI_CONSTRAINT_MASK:
+ if (value & ~param_spec->valid_mask) {
+ pr_warn("Parameter %s value 0x%llx contains invalid bits (valid mask: 0x%llx)\n",
+ param_spec->name, value, param_spec->valid_mask);
+ return false;
+ }
+ return true;
+
+ case KAPI_CONSTRAINT_ENUM:
+ if (!param_spec->enum_values || param_spec->enum_count == 0)
+ return true;
+
+ for (i = 0; i < param_spec->enum_count; i++) {
+ if (value == param_spec->enum_values[i])
+ return true;
+ }
+ pr_warn("Parameter %s value %lld not in valid enumeration\n",
+ param_spec->name, value);
+ return false;
+
+ case KAPI_CONSTRAINT_CUSTOM:
+ if (param_spec->validate)
+ return param_spec->validate(value);
+ return true;
+
+ default:
+ return true;
+ }
+}
+EXPORT_SYMBOL_GPL(kapi_validate_param);
+
+/**
+ * kapi_validate_param_with_context - Validate parameter with access to all params
+ * @param_spec: Parameter specification
+ * @value: Parameter value to validate
+ * @all_params: Array of all parameter values
+ * @param_count: Number of parameters
+ *
+ * Return: true if valid, false otherwise
+ */
+bool kapi_validate_param_with_context(const struct kapi_param_spec *param_spec,
+ s64 value, const s64 *all_params, int param_count)
+{
+ /* Special handling for user pointer type with dynamic sizing */
+ if (param_spec->type == KAPI_TYPE_USER_PTR) {
+ const void __user *ptr = (const void __user *)(unsigned long)value;
+
+ /* NULL is allowed for optional parameters */
+ if (!ptr && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+ return true;
+
+ if (!kapi_validate_user_ptr_with_params(param_spec, ptr, all_params, param_count)) {
+ pr_warn("Parameter %s: invalid user pointer %p\n",
+ param_spec->name, ptr);
+ return false;
+ }
+ /* Continue with additional constraint checks if needed */
+ }
+
+ /* Apply the regular type-specific and constraint checks */
+ return kapi_validate_param(param_spec, value);
+}
+EXPORT_SYMBOL_GPL(kapi_validate_param_with_context);
+
+/**
+ * kapi_validate_syscall_param - Validate syscall parameter with enforcement
+ * @spec: API specification
+ * @param_idx: Parameter index
+ * @value: Parameter value
+ *
+ * Return: -EINVAL if invalid, 0 if valid
+ */
+int kapi_validate_syscall_param(const struct kernel_api_spec *spec,
+ int param_idx, s64 value)
+{
+ const struct kapi_param_spec *param_spec;
+
+ if (!spec || param_idx >= spec->param_count)
+ return 0;
+
+ param_spec = &spec->params[param_idx];
+
+ if (!kapi_validate_param(param_spec, value)) {
+ if (strncmp(spec->name, "sys_", 4) == 0) {
+ /* For syscalls, we can return EINVAL to userspace */
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_syscall_param);
+
+/**
+ * kapi_validate_syscall_params - Validate all syscall parameters together
+ * @spec: API specification
+ * @params: Array of parameter values
+ * @param_count: Number of parameters
+ *
+ * Return: -EINVAL if any parameter is invalid, 0 if all valid
+ */
+int kapi_validate_syscall_params(const struct kernel_api_spec *spec,
+ const s64 *params, int param_count)
+{
+ int i;
+
+ if (!spec || !params)
+ return 0;
+
+ /* Validate that we have the expected number of parameters */
+ if (param_count != spec->param_count) {
+ pr_warn("API %s: parameter count mismatch (expected %u, got %d)\n",
+ spec->name, spec->param_count, param_count);
+ return -EINVAL;
+ }
+
+ /* Validate each parameter with context */
+ for (i = 0; i < spec->param_count && i < KAPI_MAX_PARAMS; i++) {
+ const struct kapi_param_spec *param_spec = &spec->params[i];
+
+ if (!kapi_validate_param_with_context(param_spec, params[i], params, param_count)) {
+ if (strncmp(spec->name, "sys_", 4) == 0) {
+ /* For syscalls, we can return EINVAL to userspace */
+ return -EINVAL;
+ }
+ }
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_syscall_params);
+
+/**
+ * kapi_check_return_success - Check if return value indicates success
+ * @return_spec: Return specification
+ * @retval: Return value to check
+ *
+ * Return: true if the return value indicates success according to the spec.
+ */
+bool kapi_check_return_success(const struct kapi_return_spec *return_spec, s64 retval)
+{
+ u32 i;
+
+ if (!return_spec)
+ return true; /* No spec means we can't validate */
+
+ switch (return_spec->check_type) {
+ case KAPI_RETURN_EXACT:
+ return retval == return_spec->success_value;
+
+ case KAPI_RETURN_RANGE:
+ return retval >= return_spec->success_min &&
+ retval <= return_spec->success_max;
+
+ case KAPI_RETURN_ERROR_CHECK:
+ /* Success if NOT in error list */
+ if (return_spec->error_values) {
+ for (i = 0; i < return_spec->error_count; i++) {
+ if (retval == return_spec->error_values[i])
+ return false; /* Found in error list */
+ }
+ }
+ return true; /* Not in error list = success */
+
+ case KAPI_RETURN_FD:
+ /* File descriptors: >= 0 is success, < 0 is error */
+ return retval >= 0;
+
+ case KAPI_RETURN_CUSTOM:
+ if (return_spec->is_success)
+ return return_spec->is_success(retval);
+ fallthrough;
+
+ default:
+ return true; /* Unknown check type, assume success */
+ }
+}
+EXPORT_SYMBOL_GPL(kapi_check_return_success);
+
+/**
+ * kapi_validate_return_value - Validate that return value matches spec
+ * @spec: API specification
+ * @retval: Return value to validate
+ *
+ * Return: true if return value is valid according to spec, false otherwise.
+ *
+ * This function checks:
+ * 1. If the value indicates success, it must match the success criteria
+ * 2. If the value indicates error, it must be one of the specified error codes
+ */
+bool kapi_validate_return_value(const struct kernel_api_spec *spec, s64 retval)
+{
+ int i;
+ bool is_success;
+
+ if (!spec)
+ return true; /* No spec means we can't validate */
+
+ /* First check if this is a success return */
+ is_success = kapi_check_return_success(&spec->return_spec, retval);
+
+ if (is_success) {
+ /*
+ * For successful FD returns, also check that the descriptor
+ * is valid in the current process. This must happen before
+ * the early return below, otherwise it is never reached.
+ */
+ if (spec->return_spec.check_type == KAPI_RETURN_FD &&
+ !kapi_validate_fd((int)retval)) {
+ pr_warn("API %s returned invalid file descriptor %lld\n",
+ spec->name, retval);
+ return false;
+ }
+ /* Success criteria already checked by kapi_check_return_success() */
+ return true;
+ }
+
+ /* Error case - check if it's one of the specified errors */
+ if (spec->error_count == 0) {
+ /* No errors specified, so any error is potentially valid */
+ pr_debug("API %s returned unspecified error %lld\n",
+ spec->name, retval);
+ return true;
+ }
+
+ /* Check if the error is in our list of specified errors */
+ for (i = 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+ if (retval == spec->errors[i].error_code)
+ return true;
+ }
+
+ /* Error not in spec */
+ pr_warn("API %s returned unspecified error code %lld. Valid errors are:\n",
+ spec->name, retval);
+ for (i = 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+ pr_warn(" %s (%d): %s\n",
+ spec->errors[i].name,
+ spec->errors[i].error_code,
+ spec->errors[i].condition);
+ }
+
+ return false;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_return_value);
+
+/**
+ * kapi_validate_syscall_return - Check a syscall return value against its spec
+ * @spec: API specification
+ * @retval: Return value
+ *
+ * Return: always 0. A mismatch is reported via WARN_ONCE() but not enforced.
+ *
+ * For syscalls, this can help detect kernel bugs where unspecified error
+ * codes are returned to userspace.
+ */
+int kapi_validate_syscall_return(const struct kernel_api_spec *spec, s64 retval)
+{
+ if (!spec)
+ return 0;
+
+ if (!kapi_validate_return_value(spec, retval)) {
+ /* Log the violation but don't change the return value */
+ WARN_ONCE(1, "Syscall %s returned unspecified value %lld\n",
+ spec->name, retval);
+ /* Could return -EINVAL here to enforce, but that might break userspace */
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_syscall_return);
+
+/**
+ * kapi_check_context - Check if current context matches API requirements
+ * @spec: API specification to check against
+ */
+void kapi_check_context(const struct kernel_api_spec *spec)
+{
+ u32 ctx;
+ bool valid = false;
+
+ if (!spec)
+ return; /* No spec means we can't check */
+
+ ctx = spec->context_flags;
+ if (!ctx)
+ return;
+
+ /* Check if we're in an allowed context */
+ if ((ctx & KAPI_CTX_PROCESS) && in_task())
+ valid = true;
+
+ if ((ctx & KAPI_CTX_SOFTIRQ) && in_serving_softirq())
+ valid = true;
+
+ if ((ctx & KAPI_CTX_HARDIRQ) && in_hardirq())
+ valid = true;
+
+ if ((ctx & KAPI_CTX_NMI) && in_nmi())
+ valid = true;
+
+ if (!valid) {
+ WARN_ONCE(1, "API %s called from invalid context\n", spec->name);
+ }
+
+ /* Check specific requirements */
+ if ((ctx & KAPI_CTX_ATOMIC) && preemptible()) {
+ WARN_ONCE(1, "API %s requires atomic context\n", spec->name);
+ }
+
+ if ((ctx & KAPI_CTX_SLEEPABLE) && !preemptible()) {
+ WARN_ONCE(1, "API %s requires sleepable context\n", spec->name);
+ }
+}
+EXPORT_SYMBOL_GPL(kapi_check_context);
+
+#endif /* CONFIG_KAPI_RUNTIME_CHECKS */
+
+/* DebugFS interface */
+#ifdef CONFIG_DEBUG_FS
+
+static struct dentry *kapi_debugfs_root;
+
+static int kapi_spec_show(struct seq_file *s, void *v)
+{
+ struct kernel_api_spec *spec = s->private;
+ char *buf;
+ int ret;
+
+ buf = kmalloc(PAGE_SIZE * 4, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ ret = kapi_export_json(spec, buf, PAGE_SIZE * 4);
+ if (ret > 0)
+ seq_puts(s, buf);
+
+ kfree(buf);
+ return 0;
+}
+
+static int kapi_spec_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, kapi_spec_show, inode->i_private);
+}
+
+static const struct file_operations kapi_spec_fops = {
+ .open = kapi_spec_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static int kapi_list_show(struct seq_file *s, void *v)
+{
+ struct kernel_api_spec *spec;
+ struct dynamic_api_spec *dyn_spec;
+
+ seq_puts(s, "Kernel API Specifications:\n\n");
+
+ /* List static specifications */
+ seq_puts(s, "Static APIs:\n");
+ for (spec = __start_kapi_specs; spec < __stop_kapi_specs; spec++) {
+ seq_printf(s, " %s (v%u): %s\n",
+ spec->name, spec->version, spec->description);
+ }
+
+ /* List dynamic specifications */
+ seq_puts(s, "\nDynamic APIs:\n");
+ mutex_lock(&api_spec_mutex);
+ list_for_each_entry(dyn_spec, &dynamic_api_specs, list) {
+ spec = dyn_spec->spec;
+ seq_printf(s, " %s (v%u): %s\n",
+ spec->name, spec->version, spec->description);
+ }
+ mutex_unlock(&api_spec_mutex);
+
+ return 0;
+}
+
+static int kapi_list_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, kapi_list_show, NULL);
+}
+
+static const struct file_operations kapi_list_fops = {
+ .open = kapi_list_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static int __init kapi_debugfs_init(void)
+{
+ struct kernel_api_spec *spec;
+ struct dentry *spec_dir;
+
+ kapi_debugfs_root = debugfs_create_dir("kapi", NULL);
+ if (IS_ERR(kapi_debugfs_root))
+ return PTR_ERR(kapi_debugfs_root);
+
+ /* Create list file */
+ debugfs_create_file("list", 0444, kapi_debugfs_root, NULL,
+ &kapi_list_fops);
+
+ /* Create directory for specifications */
+ spec_dir = debugfs_create_dir("specs", kapi_debugfs_root);
+
+ /* Create files for each static specification */
+ for (spec = __start_kapi_specs; spec < __stop_kapi_specs; spec++) {
+ debugfs_create_file(spec->name, 0444, spec_dir, spec,
+ &kapi_spec_fops);
+ }
+
+ return 0;
+}
+
+late_initcall(kapi_debugfs_init);
+
+#endif /* CONFIG_DEBUG_FS */
\ No newline at end of file
--
2.39.5
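
Not part of the patch: a minimal userspace model of the decision order in
kapi_validate_return_value() (success test first, then the list of specified
errors). The model_* names and the simplified struct are hypothetical stand-ins,
not the kernel definitions, and logging and FD validation are left out. The
error list is the one the epoll_create1 spec in patch 02 declares.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum model_check_type {
	MODEL_RETURN_FD,	/* success when retval >= 0 */
	MODEL_RETURN_EXACT,	/* success when retval == success_value */
};

struct model_spec {
	enum model_check_type check_type;
	long long success_value;
	const long long *errors;	/* specified (negative) error codes */
	size_t error_count;
};

static bool model_is_success(const struct model_spec *spec, long long retval)
{
	assert(spec != NULL);
	if (spec->check_type == MODEL_RETURN_FD)
		return retval >= 0;
	return retval == spec->success_value;
}

/* Same decision order as the kernel helper, minus logging and FD checks. */
static bool model_validate_return(const struct model_spec *spec,
				  long long retval)
{
	size_t i;

	if (spec == NULL)
		return true;	/* no spec, nothing to validate */
	if (model_is_success(spec, retval))
		return true;
	if (spec->error_count == 0)
		return true;	/* no errors listed, accept any error */
	for (i = 0; i < spec->error_count; i++)
		if (retval == spec->errors[i])
			return true;
	return false;		/* error code not in the spec */
}

/* Error list copied from the epoll_create1 spec in patch 02. */
static const long long epoll_create1_errors[] = {
	-EINVAL, -EMFILE, -ENFILE, -ENOMEM, -EINTR,
};

static const struct model_spec epoll_create1_model = {
	.check_type = MODEL_RETURN_FD,
	.errors = epoll_create1_errors,
	.error_count = sizeof(epoll_create1_errors) /
		       sizeof(epoll_create1_errors[0]),
};
```

With this model, a return of 3 or -EINVAL is accepted for epoll_create1,
while -EPERM is flagged because the spec does not list it.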
* [RFC 02/19] eventpoll: add API specification for epoll_create1
2025-06-14 13:48 [RFC 00/19] Kernel API Specification Framework Sasha Levin
2025-06-14 13:48 ` [RFC 01/19] kernel/api: introduce kernel API specification framework Sasha Levin
@ 2025-06-14 13:48 ` Sasha Levin
2025-06-14 13:48 ` [RFC 03/19] eventpoll: add API specification for epoll_create Sasha Levin
` (18 subsequent siblings)
20 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-06-14 13:48 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-api, workflows, tools, Sasha Levin
Add kernel API specification for the epoll_create1() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
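
Not part of the patch: a minimal userspace sketch of how the "flags"
constraint and the FD return type in this spec could be observed from a test
program. The helper names are hypothetical, and the sketch assumes a Linux
system with glibc; it only exercises 0, EPOLL_CLOEXEC and one invalid bit.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns the new descriptor, or -errno in the kernel's convention. */
static int create_epoll(int flags)
{
	int fd = epoll_create1(flags);

	return fd >= 0 ? fd : -errno;
}

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if not, -errno on failure. */
static int fd_is_cloexec(int fd)
{
	int fl;

	assert(fd >= 0);
	fl = fcntl(fd, F_GETFD);
	if (fl < 0)
		return -errno;
	return (fl & FD_CLOEXEC) != 0;
}

/* Returns 0 when the observed behaviour matches the spec's constraints. */
static int epoll_create1_demo(void)
{
	int plain = create_epoll(0);
	int cloexec = create_epoll(EPOLL_CLOEXEC);
	int bad = create_epoll(EPOLL_CLOEXEC | 1);	/* bit 0 is not a valid flag */
	int ok;

	ok = plain >= 0 && cloexec >= 0 &&
	     fd_is_cloexec(plain) == 0 &&
	     fd_is_cloexec(cloexec) == 1 &&
	     bad == -EINVAL;

	if (plain >= 0)
		close(plain);
	if (cloexec >= 0)
		close(cloexec);
	if (bad >= 0)
		close(bad);
	return ok ? 0 : -1;
}
```

epoll_create1_demo() returns 0 when both valid flag values yield a
descriptor and the invalid one is rejected with -EINVAL.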
fs/eventpoll.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 86 insertions(+)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index d4dbffdedd08e..8f8a64ebbaef6 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -21,6 +21,7 @@
#include <linux/hash.h>
#include <linux/spinlock.h>
#include <linux/syscalls.h>
+#include <linux/syscall_api_spec.h>
#include <linux/rbtree.h>
#include <linux/wait.h>
#include <linux/eventpoll.h>
@@ -2265,6 +2266,91 @@ static int do_epoll_create(int flags)
return error;
}
+
+/* Valid values for epoll_create1 flags parameter */
+static const s64 epoll_create1_valid_values[] = { 0, EPOLL_CLOEXEC };
+
+DEFINE_KERNEL_API_SPEC(sys_epoll_create1)
+ KAPI_DESCRIPTION("Create an epoll instance")
+ KAPI_LONG_DESC("Creates a new epoll instance and returns a file descriptor "
+ "referring to that instance. The file descriptor is used for all "
+ "subsequent calls to the epoll interface.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "flags", "int", "Creation flags for the epoll instance")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .constraint_type = KAPI_CONSTRAINT_ENUM,
+ .enum_values = epoll_create1_valid_values,
+ .enum_count = ARRAY_SIZE(epoll_create1_valid_values),
+ .constraints = "Must be 0 or EPOLL_CLOEXEC",
+ KAPI_PARAM_END
+
+ KAPI_RETURN("long", "File descriptor on success, negative error code on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_FD,
+ KAPI_RETURN_END
+
+ KAPI_ERROR(0, -EINVAL, "EINVAL", "Invalid flags specified",
+ "The flags parameter contains invalid values. Only EPOLL_CLOEXEC is allowed.")
+ KAPI_ERROR(1, -EMFILE, "EMFILE", "Per-process file descriptor limit reached",
+ "The per-process limit on the number of open file descriptors has been reached.")
+ KAPI_ERROR(2, -ENFILE, "ENFILE", "System file table overflow",
+ "The system-wide limit on the total number of open files has been reached.")
+ KAPI_ERROR(3, -ENOMEM, "ENOMEM", "Insufficient kernel memory",
+ "There was insufficient kernel memory to create the epoll instance.")
+ KAPI_ERROR(4, -EINTR, "EINTR", "Interrupted by signal",
+ "The system call was interrupted by a signal before the epoll instance could be created.")
+
+ .error_count = 5,
+ .param_count = 1,
+ .since_version = "2.6.27",
+ .examples = "int epfd = epoll_create1(EPOLL_CLOEXEC);",
+ .notes = "EPOLL_CLOEXEC sets the close-on-exec (FD_CLOEXEC) flag on the new file descriptor. "
+ "When all file descriptors referring to an epoll instance are closed, the kernel "
+ "destroys the instance and releases associated resources.",
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_RESOURCE_CREATE | KAPI_EFFECT_ALLOC_MEMORY,
+ "epoll instance",
+ "Creates a new epoll instance and allocates kernel memory for it")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_RESOURCE_CREATE,
+ "file descriptor",
+ "Allocates a new file descriptor in the process's file descriptor table")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(2)
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "epoll instance", "non-existent", "created and empty",
+ "A new epoll instance is created with no monitored file descriptors")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS_COUNT(1)
+
+ /* Signal specifications */
+ KAPI_SIGNAL(0, SIGINT, "SIGINT", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("During creation if process receives SIGINT")
+ KAPI_SIGNAL_DESC("If interrupted during kernel memory allocation, returns -EINTR")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(1, SIGTERM, "SIGTERM", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("During creation if process receives SIGTERM")
+ KAPI_SIGNAL_DESC("If interrupted during kernel memory allocation, returns -EINTR")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(2, SIGKILL, "SIGKILL", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_TERMINATE)
+ KAPI_SIGNAL_CONDITION("At any point during the syscall")
+ KAPI_SIGNAL_DESC("Process is terminated immediately, epoll instance creation may be incomplete")
+ KAPI_SIGNAL_END
+
+ .signal_count = 3,
+KAPI_END_SPEC;
SYSCALL_DEFINE1(epoll_create1, int, flags)
{
return do_epoll_create(flags);
--
2.39.5
* [RFC 03/19] eventpoll: add API specification for epoll_create
2025-06-14 13:48 [RFC 00/19] Kernel API Specification Framework Sasha Levin
2025-06-14 13:48 ` [RFC 01/19] kernel/api: introduce kernel API specification framework Sasha Levin
2025-06-14 13:48 ` [RFC 02/19] eventpoll: add API specification for epoll_create1 Sasha Levin
@ 2025-06-14 13:48 ` Sasha Levin
2025-06-14 13:48 ` [RFC 04/19] eventpoll: add API specification for epoll_ctl Sasha Levin
` (17 subsequent siblings)
20 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-06-14 13:48 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-api, workflows, tools, Sasha Levin
Add kernel API specification for the epoll_create() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
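
Not part of the patch: a minimal userspace sketch of the "size" range
constraint described in this spec (any value greater than zero is accepted,
zero and negative values fail with EINVAL). The helper names are hypothetical
and the sketch assumes Linux with glibc.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns the new descriptor, or -errno in the kernel's convention. */
static int create_epoll_legacy(int size)
{
	int fd = epoll_create(size);

	return fd >= 0 ? fd : -errno;
}

/* Returns 0 when the observed behaviour matches the spec's constraints. */
static int epoll_create_demo(void)
{
	int small = create_epoll_legacy(1);
	int large = create_epoll_legacy(1024);
	int ok;

	ok = small >= 0 && large >= 0 &&
	     create_epoll_legacy(0) == -EINVAL &&
	     create_epoll_legacy(-1) == -EINVAL;

	if (small >= 0 && large >= 0)
		assert(small != large);	/* two live descriptors are distinct */
	if (small >= 0)
		close(small);
	if (large >= 0)
		close(large);
	return ok ? 0 : -1;
}
```

The two successful calls differ only in the hint value, which matches the
note that the argument is ignored as long as it is positive.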
fs/eventpoll.c | 111 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 111 insertions(+)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 8f8a64ebbaef6..50adea7ba43d1 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2356,6 +2356,117 @@ SYSCALL_DEFINE1(epoll_create1, int, flags)
return do_epoll_create(flags);
}
+
+DEFINE_KERNEL_API_SPEC(sys_epoll_create)
+ KAPI_DESCRIPTION("Create an epoll instance (obsolete)")
+ KAPI_LONG_DESC("Creates a new epoll instance and returns a file descriptor "
+ "referring to that instance. This is the obsolete interface; "
+ "new applications should use epoll_create1() instead.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "size", "int", "Ignored hint about expected number of file descriptors")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_RANGE(1, INT_MAX)
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ .constraints = "Must be greater than zero (ignored since Linux 2.6.8)",
+ KAPI_PARAM_END
+
+ KAPI_RETURN("long", "File descriptor on success, negative error code on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_FD,
+ KAPI_RETURN_END
+
+ KAPI_ERROR(0, -EINVAL, "EINVAL", "size <= 0",
+ "The size parameter must be greater than zero.")
+ KAPI_ERROR(1, -EMFILE, "EMFILE", "Per-process file descriptor limit reached",
+ "The per-process limit on the number of open file descriptors has been reached.")
+ KAPI_ERROR(2, -ENFILE, "ENFILE", "System file table overflow",
+ "The system-wide limit on the total number of open files has been reached.")
+ KAPI_ERROR(3, -ENOMEM, "ENOMEM", "Insufficient kernel memory",
+ "There was insufficient kernel memory to create the epoll instance.")
+ KAPI_ERROR(4, -EINTR, "EINTR", "Interrupted by signal",
+ "The system call was interrupted by a signal before the epoll instance could be created.")
+
+ .error_count = 5,
+ .param_count = 1,
+ .since_version = "2.6",
+ .deprecated = true,
+ .replacement = "epoll_create1",
+ .examples = "int epfd = epoll_create(1024); // size is ignored since Linux 2.6.8",
+ .notes = "Since Linux 2.6.8, the size argument is ignored but must be greater than zero. "
+ "The kernel dynamically sizes the data structures as needed. "
+ "For new applications, epoll_create1() should be preferred as it allows "
+ "setting close-on-exec flag atomically.",
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_RESOURCE_CREATE | KAPI_EFFECT_ALLOC_MEMORY,
+ "epoll instance",
+ "Creates a new epoll instance and allocates kernel memory for it")
+ KAPI_EFFECT_CONDITION("Always when successful")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_RESOURCE_CREATE,
+ "file descriptor",
+ "Allocates a new file descriptor in the process's file descriptor table")
+ KAPI_EFFECT_CONDITION("Always when successful")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_MODIFY_STATE,
+ "kernel file table",
+ "Adds new file structure to system-wide file table")
+ KAPI_EFFECT_CONDITION("Always when successful")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(3)
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "epoll instance", "non-existent", "created and empty",
+ "A new epoll instance is created with no monitored file descriptors")
+ KAPI_STATE_TRANS_COND("On successful creation")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "file descriptor", "unallocated", "allocated and open",
+ "A new file descriptor is allocated in the process's fd table")
+ KAPI_STATE_TRANS_COND("On successful creation")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS_COUNT(2)
+
+ /* Signal specifications */
+ KAPI_SIGNAL(0, SIGINT, "SIGINT", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("During kernel memory allocation")
+ KAPI_SIGNAL_DESC("If interrupted during memory allocation or fd allocation, returns -EINTR")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(1, SIGTERM, "SIGTERM", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("During kernel memory allocation")
+ KAPI_SIGNAL_DESC("If interrupted during memory allocation or fd allocation, returns -EINTR")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(2, SIGKILL, "SIGKILL", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_TERMINATE)
+ KAPI_SIGNAL_CONDITION("At any point during the syscall")
+ KAPI_SIGNAL_DESC("Process is terminated immediately, epoll instance creation may be incomplete")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(3, SIGHUP, "SIGHUP", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("During kernel operations")
+ KAPI_SIGNAL_DESC("If process is being terminated due to terminal hangup, may return -EINTR")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(4, SIGPIPE, "SIGPIPE", KAPI_SIGNAL_IGNORE, KAPI_SIGNAL_ACTION_DEFAULT)
+ KAPI_SIGNAL_CONDITION("Never generated by epoll_create")
+ KAPI_SIGNAL_DESC("This signal is not relevant to epoll_create as it doesn't involve pipes")
+ KAPI_SIGNAL_END
+
+ .signal_count = 5,
+KAPI_END_SPEC;
+
SYSCALL_DEFINE1(epoll_create, int, size)
{
if (size <= 0)
--
2.39.5
* [RFC 04/19] eventpoll: add API specification for epoll_ctl
2025-06-14 13:48 [RFC 00/19] Kernel API Specification Framework Sasha Levin
` (2 preceding siblings ...)
2025-06-14 13:48 ` [RFC 03/19] eventpoll: add API specification for epoll_create Sasha Levin
@ 2025-06-14 13:48 ` Sasha Levin
2025-06-14 13:48 ` [RFC 05/19] eventpoll: add API specification for epoll_wait Sasha Levin
` (16 subsequent siblings)
20 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-06-14 13:48 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-api, workflows, tools, Sasha Levin
Add kernel API specification for the epoll_ctl() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
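
Not part of the patch: a minimal userspace sketch that walks one descriptor
through the ADD, MOD and DEL transitions listed in this spec and checks three
of the specified errors (EEXIST, ENOENT, EINVAL). The ctl() wrapper and the
demo function are hypothetical names; the sketch assumes Linux with glibc and
uses the read end of a pipe as the monitored file.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns 0 or -errno, mirroring the kernel-side return convention. */
static int ctl(int epfd, int op, int fd, uint32_t events)
{
	struct epoll_event ev = { .events = events, .data = { .fd = fd } };

	return epoll_ctl(epfd, op, fd, &ev) == 0 ? 0 : -errno;
}

/* Returns 0 when every step produced the result the spec describes. */
static int epoll_ctl_demo(void)
{
	int pipefd[2];
	int epfd, rfd, ok;

	if (pipe(pipefd) != 0)
		return -errno;
	epfd = epoll_create1(EPOLL_CLOEXEC);
	if (epfd < 0) {
		int err = -errno;

		close(pipefd[0]);
		close(pipefd[1]);
		return err;
	}
	rfd = pipefd[0];
	assert(rfd != epfd);

	ok = ctl(epfd, EPOLL_CTL_ADD, rfd, EPOLLIN) == 0 &&
	     ctl(epfd, EPOLL_CTL_ADD, rfd, EPOLLIN) == -EEXIST &&	/* already registered */
	     ctl(epfd, EPOLL_CTL_MOD, rfd, EPOLLIN | EPOLLET) == 0 &&
	     ctl(epfd, EPOLL_CTL_DEL, rfd, 0) == 0 &&
	     ctl(epfd, EPOLL_CTL_DEL, rfd, 0) == -ENOENT &&		/* no longer registered */
	     ctl(epfd, EPOLL_CTL_ADD, epfd, EPOLLIN) == -EINVAL;	/* epfd == fd */

	close(epfd);
	close(pipefd[0]);
	close(pipefd[1]);
	return ok ? 0 : -1;
}
```

The sequence covers the "non-existent", "monitored" and back to
"non-existent" states of an epoll entry; the remaining error codes in the spec
need more setup and are not exercised here.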
fs/eventpoll.c | 203 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 203 insertions(+)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 50adea7ba43d1..409a0c440f112 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2647,6 +2647,209 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
* the eventpoll file that enables the insertion/removal/change of
* file descriptors inside the interest set.
*/
+
+/* Valid values for epoll_ctl op parameter */
+static const s64 epoll_ctl_valid_ops[] = {
+ EPOLL_CTL_ADD,
+ EPOLL_CTL_DEL,
+ EPOLL_CTL_MOD,
+};
+
+DEFINE_KERNEL_API_SPEC(sys_epoll_ctl)
+ KAPI_DESCRIPTION("Control interface for an epoll file descriptor")
+ KAPI_LONG_DESC("Performs control operations on the epoll instance referred to by epfd. "
+ "It requests that the operation op be performed for the target file "
+ "descriptor fd. Valid operations are adding, modifying, or deleting "
+ "file descriptors from the interest set.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "epfd", "int", "File descriptor referring to the epoll instance")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_FD,
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "op", "int", "Operation to be performed")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_INT,
+ .constraint_type = KAPI_CONSTRAINT_ENUM,
+ .enum_values = epoll_ctl_valid_ops,
+ .enum_count = ARRAY_SIZE(epoll_ctl_valid_ops),
+ .constraints = "Must be EPOLL_CTL_ADD, EPOLL_CTL_DEL, or EPOLL_CTL_MOD",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "fd", "int", "File descriptor to be monitored")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_FD,
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ .constraints = "Must refer to a file that supports poll operations",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(3, "event", "struct epoll_event __user *", "Settings for the file descriptor")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL)
+ .type = KAPI_TYPE_USER_PTR,
+ KAPI_PARAM_SIZE(sizeof(struct epoll_event))
+ .constraints = "Required for ADD and MOD operations, ignored for DEL",
+ KAPI_PARAM_END
+
+ KAPI_RETURN("long", "0 on success, negative error code on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_EXACT,
+ KAPI_RETURN_SUCCESS(0)
+ KAPI_RETURN_END
+
+ KAPI_ERROR(0, -EBADF, "EBADF", "epfd or fd is not a valid file descriptor",
+ "One of the file descriptors is invalid or has been closed.")
+ KAPI_ERROR(1, -EEXIST, "EEXIST", "op is EPOLL_CTL_ADD and fd is already registered",
+ "The file descriptor is already present in the epoll instance.")
+ KAPI_ERROR(2, -EINVAL, "EINVAL", "Invalid operation or parameters",
+ "epfd is not an epoll file descriptor, epfd == fd, op is not valid, "
+ "or EPOLLEXCLUSIVE was specified with invalid events.")
+ KAPI_ERROR(3, -ENOENT, "ENOENT", "op is EPOLL_CTL_MOD or EPOLL_CTL_DEL and fd is not registered",
+ "The file descriptor is not registered with this epoll instance.")
+ KAPI_ERROR(4, -ENOMEM, "ENOMEM", "Insufficient kernel memory",
+ "There was insufficient memory to handle the requested operation.")
+ KAPI_ERROR(5, -EPERM, "EPERM", "Target file does not support epoll",
+ "The target file fd does not support poll operations.")
+ KAPI_ERROR(6, -ELOOP, "ELOOP", "Circular monitoring detected",
+ "fd refers to an epoll instance and this operation would result "
+ "in a circular loop of epoll instances monitoring one another.")
+ KAPI_ERROR(7, -EFAULT, "EFAULT", "event points outside accessible address space",
+ "The memory area pointed to by event is not accessible with read permissions.")
+ KAPI_ERROR(8, -EAGAIN, "EAGAIN", "Nonblocking mode and lock not available",
+ "The operation was called in nonblocking mode and could not acquire necessary locks.")
+ KAPI_ERROR(9, -ENOSPC, "ENOSPC", "User epoll watch limit exceeded",
+ "The limit on the total number of epoll watches was exceeded. "
+ "See /proc/sys/fs/epoll/max_user_watches.")
+ KAPI_ERROR(10, -EINTR, "EINTR", "Interrupted by signal",
+ "The system call was interrupted by a signal before completion.")
+
+ .error_count = 11,
+ .param_count = 4,
+ .since_version = "2.6",
+
+ /* Locking specifications */
+ KAPI_LOCK(0, "ep->mtx", KAPI_LOCK_MUTEX)
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Protects the epoll instance during control operations")
+ KAPI_LOCK_END
+
+ KAPI_LOCK(1, "epnested_mutex", KAPI_LOCK_MUTEX)
+ KAPI_LOCK_DESC("Global mutex to prevent circular epoll structures (acquired for nested epoll)")
+ KAPI_LOCK_END
+
+ .lock_count = 2,
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY,
+ "epoll interest list",
+ "Adds new epitem structure to the epoll interest list")
+ KAPI_EFFECT_CONDITION("When op is EPOLL_CTL_ADD")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_FREE_MEMORY,
+ "epoll interest list",
+ "Removes epitem structure from the epoll interest list")
+ KAPI_EFFECT_CONDITION("When op is EPOLL_CTL_DEL")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_MODIFY_STATE,
+ "epoll event mask",
+ "Modifies the event mask for an existing epitem")
+ KAPI_EFFECT_CONDITION("When op is EPOLL_CTL_MOD")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_MODIFY_STATE,
+ "file reference count",
+ "Increases reference count on the monitored file")
+ KAPI_EFFECT_CONDITION("When op is EPOLL_CTL_ADD")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(4, KAPI_EFFECT_MODIFY_STATE,
+ "file reference count",
+ "Decreases reference count on the monitored file")
+ KAPI_EFFECT_CONDITION("When op is EPOLL_CTL_DEL")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(5, KAPI_EFFECT_SCHEDULE,
+ "process state",
+ "May wake up processes waiting on the epoll instance if events become available")
+ KAPI_EFFECT_CONDITION("When adding or modifying entries that match current events")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(6)
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "epoll entry", "non-existent", "monitored",
+ "File descriptor is added to epoll interest list")
+ KAPI_STATE_TRANS_COND("When op is EPOLL_CTL_ADD")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "epoll entry", "monitored", "non-existent",
+ "File descriptor is removed from epoll interest list")
+ KAPI_STATE_TRANS_COND("When op is EPOLL_CTL_DEL")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "epoll entry", "monitored with events A", "monitored with events B",
+ "Event mask for file descriptor is modified")
+ KAPI_STATE_TRANS_COND("When op is EPOLL_CTL_MOD")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(3, "epoll ready list", "empty or partial", "contains new events",
+ "Ready list may be updated if new/modified entry has pending events")
+ KAPI_STATE_TRANS_COND("When monitored fd has events matching the mask")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS_COUNT(4)
+
+ /* Signal specifications */
+ KAPI_SIGNAL(0, SIGINT, "SIGINT", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("During mutex acquisition or memory allocation")
+ KAPI_SIGNAL_DESC("Returns -EINTR if interrupted before completing the operation")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(1, SIGTERM, "SIGTERM", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("During mutex acquisition or memory allocation")
+ KAPI_SIGNAL_DESC("Returns -EINTR if interrupted before completing the operation")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(2, SIGKILL, "SIGKILL", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_TERMINATE)
+ KAPI_SIGNAL_CONDITION("At any point during the syscall")
+ KAPI_SIGNAL_DESC("Process is terminated immediately, operation may be partially completed")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(3, SIGHUP, "SIGHUP", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("During blocking operations")
+ KAPI_SIGNAL_DESC("Returns -EINTR if terminal hangup occurs")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(4, SIGURG, "SIGURG", KAPI_SIGNAL_IGNORE, KAPI_SIGNAL_ACTION_DEFAULT)
+ KAPI_SIGNAL_CONDITION("May be generated by monitored sockets")
+ KAPI_SIGNAL_DESC("Urgent data signals from monitored sockets do not affect epoll_ctl")
+ KAPI_SIGNAL_END
+
+ .signal_count = 5,
+
+ .examples = "struct epoll_event ev;\n"
+ "ev.events = EPOLLIN | EPOLLOUT;\n"
+ "ev.data.fd = fd;\n"
+ "if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)\n"
+ " handle_error();\n",
+ .notes = "EPOLL_CTL_DEL ignores the event parameter (can be NULL). "
+ "EPOLLEXCLUSIVE flag has restrictions and cannot be used with EPOLL_CTL_MOD. "
+ "The epoll instance maintains a reference to registered files until they are "
+ "explicitly removed with EPOLL_CTL_DEL or the epoll instance is closed.",
+KAPI_END_SPEC;
+
SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
struct epoll_event __user *, event)
{
--
2.39.5
* [RFC 05/19] eventpoll: add API specification for epoll_wait
2025-06-14 13:48 [RFC 00/19] Kernel API Specification Framework Sasha Levin
` (3 preceding siblings ...)
2025-06-14 13:48 ` [RFC 04/19] eventpoll: add API specification for epoll_ctl Sasha Levin
@ 2025-06-14 13:48 ` Sasha Levin
2025-06-14 13:48 ` [RFC 06/19] eventpoll: add API specification for epoll_pwait Sasha Levin
` (15 subsequent siblings)
20 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-06-14 13:48 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-api, workflows, tools, Sasha Levin
Add kernel API specification for the epoll_wait() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
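
Not part of the patch: a minimal userspace sketch of the return range and the
"maxevents" constraint in this spec, using a pipe as the event source. The
wait_ready() wrapper and the demo function are hypothetical names; the sketch
assumes Linux with glibc. The wrapper retries on EINTR with the full timeout,
which is a simplification: a careful caller would subtract the elapsed time.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns the number of ready events, or -errno. Retries on EINTR. */
static int wait_ready(int epfd, struct epoll_event *evs, int maxevents,
		      int timeout_ms)
{
	int n;

	assert(evs != NULL);
	do {
		n = epoll_wait(epfd, evs, maxevents, timeout_ms);
	} while (n < 0 && errno == EINTR);
	return n >= 0 ? n : -errno;
}

/* Returns 0 when every step produced the result the spec describes. */
static int epoll_wait_demo(void)
{
	struct epoll_event ev = { .events = EPOLLIN };
	struct epoll_event out[4];
	int pipefd[2];
	int epfd, ok = 0;

	if (pipe(pipefd) != 0)
		return -errno;
	epfd = epoll_create1(EPOLL_CLOEXEC);
	if (epfd < 0)
		goto out_pipe;
	ev.data.fd = pipefd[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) != 0)
		goto out_ep;

	ok = wait_ready(epfd, out, 4, 0) == 0 &&	/* nothing readable yet */
	     wait_ready(epfd, out, 0, 0) == -EINVAL &&	/* maxevents must be > 0 */
	     write(pipefd[1], "x", 1) == 1 &&
	     wait_ready(epfd, out, 4, 1000) == 1 &&	/* one descriptor ready */
	     (out[0].events & EPOLLIN) &&
	     out[0].data.fd == pipefd[0];
out_ep:
	close(epfd);
out_pipe:
	close(pipefd[0]);
	close(pipefd[1]);
	return ok ? 0 : -1;
}
```

A timeout of 0 returns immediately with 0 events, and after one byte is
written the same call reports exactly one ready descriptor, both inside the
0..INT_MAX success range the spec declares.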
fs/eventpoll.c | 182 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 182 insertions(+)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 409a0c440f112..254b50d687d37 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2932,6 +2932,188 @@ static int do_epoll_wait(int epfd, struct epoll_event __user *events,
return ep_poll(ep, events, maxevents, to);
}
+
+DEFINE_KERNEL_API_SPEC(sys_epoll_wait)
+ KAPI_DESCRIPTION("Wait for events on an epoll instance")
+ KAPI_LONG_DESC("Waits for events on the epoll instance referred to by epfd. "
+ "The function blocks the calling thread until either at least one of the "
+ "file descriptors referred to by epfd becomes ready for some I/O operation, "
+ "the call is interrupted by a signal handler, or the timeout expires.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "epfd", "int", "File descriptor referring to the epoll instance")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_FD,
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "events", "struct epoll_event __user *", "Buffer where ready events will be stored")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT | KAPI_PARAM_USER)
+ .type = KAPI_TYPE_USER_PTR,
+ KAPI_PARAM_SIZE(sizeof(struct epoll_event)) /* Base size of single element */
+ .size_param_idx = 2, /* Size determined by maxevents parameter */
+ .size_multiplier = sizeof(struct epoll_event),
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ .constraints = "Must point to an array of at least maxevents epoll_event structures",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "maxevents", "int", "Maximum number of events to return")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_INT,
+ KAPI_PARAM_RANGE(1, INT_MAX / sizeof(struct epoll_event)) /* EP_MAX_EVENTS */
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ .constraints = "Must be greater than zero and not exceed system limits",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(3, "timeout", "int", "Timeout in milliseconds")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_INT,
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ .constraints = "-1 blocks indefinitely, 0 returns immediately, >0 specifies milliseconds to wait",
+ KAPI_PARAM_END
+
+ KAPI_RETURN("long", "Number of ready file descriptors on success, negative error code on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_RANGE,
+ .success_min = 0,
+ .success_max = INT_MAX,
+ KAPI_RETURN_END
+
+ KAPI_ERROR(0, -EBADF, "EBADF", "epfd is not a valid file descriptor",
+ "The epoll file descriptor is invalid or has been closed.")
+ KAPI_ERROR(1, -EFAULT, "EFAULT", "events points outside accessible address space",
+ "The memory area pointed to by events is not accessible with write permissions.")
+ KAPI_ERROR(2, -EINTR, "EINTR", "Call interrupted by signal handler",
+ "The call was interrupted by a signal handler before any events "
+ "became ready or the timeout expired.")
+ KAPI_ERROR(3, -EINVAL, "EINVAL", "Invalid parameters",
+ "epfd is not an epoll file descriptor, or maxevents is less than or equal to zero.")
+
+ .error_count = 4,
+ .param_count = 4,
+ .since_version = "2.6",
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_MODIFY_STATE,
+ "ready list",
+ "Removes events from the epoll ready list as they are reported")
+ KAPI_EFFECT_CONDITION("When events are available; level-triggered entries are re-added to the ready list after being reported")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_SCHEDULE,
+ "process state",
+ "Blocks the calling thread until events are available or timeout")
+ KAPI_EFFECT_CONDITION("When timeout != 0 and no events are immediately available")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_MODIFY_STATE,
+ "user memory",
+ "Writes event data to user-provided buffer")
+ KAPI_EFFECT_CONDITION("When events are available")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_PROCESS_STATE,
+ "signal state",
+ "Clears TIF_SIGPENDING if a signal was pending")
+ KAPI_EFFECT_CONDITION("When returning due to signal interruption")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(4)
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "process", "running", "blocked",
+ "Process blocks waiting for events")
+ KAPI_STATE_TRANS_COND("When no events available and timeout != 0")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "process", "blocked", "running",
+ "Process wakes up due to events, timeout, or signal")
+ KAPI_STATE_TRANS_COND("When wait condition is satisfied")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "epoll ready list", "has events", "events consumed",
+ "Ready events are consumed from the epoll instance")
+ KAPI_STATE_TRANS_COND("When returning events to userspace")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(3, "events buffer", "uninitialized", "contains event data",
+ "User buffer is populated with ready events")
+ KAPI_STATE_TRANS_COND("When events are available")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS_COUNT(4)
+
+ /* Signal specifications */
+ KAPI_SIGNAL(0, 0, "ANY", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("Any pending signal")
+ KAPI_SIGNAL_DESC("Any signal delivered to the thread will interrupt epoll_wait() "
+ "and cause it to return -EINTR. This is checked via signal_pending() "
+ "after checking for available events.")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(1, SIGKILL, "SIGKILL", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_TERMINATE)
+ KAPI_SIGNAL_CONDITION("Always delivered, cannot be blocked")
+ KAPI_SIGNAL_DESC("SIGKILL will terminate the process. The epoll_wait call will "
+ "not return as the process is terminated immediately.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(2, SIGSTOP, "SIGSTOP", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_STOP)
+ KAPI_SIGNAL_CONDITION("Always delivered, cannot be blocked")
+ KAPI_SIGNAL_DESC("SIGSTOP will stop the process. When continued with SIGCONT, "
+ "epoll_wait may return -EINTR if the timeout has not expired.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(3, SIGCONT, "SIGCONT", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_CONTINUE)
+ KAPI_SIGNAL_CONDITION("When process is stopped")
+ KAPI_SIGNAL_DESC("SIGCONT resumes a stopped process. If epoll_wait was interrupted "
+ "by SIGSTOP, it may return -EINTR when continued.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(4, SIGALRM, "SIGALRM", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("Timer expiration")
+ KAPI_SIGNAL_DESC("SIGALRM from timer expiration will interrupt epoll_wait with -EINTR")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ .signal_count = 5,
+ .signal_mask_count = 0, /* No signal mask manipulation in epoll_wait */
+
+ /* Locking specifications */
+ KAPI_LOCK(0, "ep->lock", KAPI_LOCK_SPINLOCK)
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Protects the ready list while checking for and consuming events")
+ KAPI_LOCK_END
+
+ KAPI_LOCK(1, "ep->mtx", KAPI_LOCK_MUTEX)
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Protects against concurrent epoll_ctl operations during wait")
+ KAPI_LOCK_END
+
+ .lock_count = 2,
+
+ .examples = "struct epoll_event events[10];\n"
+ "int nfds = epoll_wait(epfd, events, 10, 1000);\n"
+ "if (nfds == -1) {\n"
+ " perror(\"epoll_wait\");\n"
+ " exit(EXIT_FAILURE);\n"
+ "}\n"
+ "for (int n = 0; n < nfds; ++n) {\n"
+ " if (events[n].data.fd == listen_sock) {\n"
+ " accept_new_connection();\n"
+ " } else {\n"
+ " handle_io(events[n].data.fd);\n"
+ " }\n"
+ "}",
+ .notes = "The timeout uses CLOCK_MONOTONIC and may be rounded up to system clock granularity. "
+ "A timeout of -1 causes epoll_wait to block indefinitely, while a timeout of 0 "
+ "causes it to return immediately even if no events are available. "
+ "The struct epoll_event is defined as containing events (uint32_t) and data (epoll_data_t union).",
+KAPI_END_SPEC;
+
SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events,
int, maxevents, int, timeout)
{
--
2.39.5
* [RFC 06/19] eventpoll: add API specification for epoll_pwait
2025-06-14 13:48 [RFC 00/19] Kernel API Specification Framework Sasha Levin
` (4 preceding siblings ...)
2025-06-14 13:48 ` [RFC 05/19] eventpoll: add API specification for epoll_wait Sasha Levin
@ 2025-06-14 13:48 ` Sasha Levin
2025-06-14 13:48 ` [RFC 07/19] eventpoll: add API specification for epoll_pwait2 Sasha Levin
` (14 subsequent siblings)
20 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-06-14 13:48 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-api, workflows, tools, Sasha Levin
Add kernel API specification for the epoll_pwait() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
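Note for reviewers: a minimal user-space sketch of the behaviour described
in this spec. It is not part of the patch. It assumes the glibc wrapper,
which takes five arguments and passes sigsetsize to the kernel itself. The
helper names wait_sigint_blocked() and demo_pipe_ready() are hypothetical.
The wait is retried on EINTR rather than treated as fatal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait on epfd while the thread's signal mask is
 * temporarily replaced by {SIGINT}. The call is retried if a signal
 * handler interrupts it; note that each retry restarts the full timeout.
 * Returns the number of ready events, or -1 with errno set.
 */
static int wait_sigint_blocked(int epfd, struct epoll_event *evs, int max,
			       int timeout_ms)
{
	sigset_t mask;
	int n;

	sigemptyset(&mask);
	sigaddset(&mask, SIGINT);

	do {
		/* glibc wrapper: no explicit sigsetsize argument */
		n = epoll_pwait(epfd, evs, max, timeout_ms, &mask);
	} while (n == -1 && errno == EINTR);

	return n;
}

/*
 * Hypothetical self-check: register the read end of a pipe, make it
 * readable, and expect exactly one ready event.
 */
static int demo_pipe_ready(void)
{
	struct epoll_event ev = { .events = EPOLLIN }, out[4];
	int fds[2], epfd, n = -1;

	if (pipe(fds) == -1)
		return -1;
	epfd = epoll_create1(0);
	if (epfd != -1) {
		ev.data.fd = fds[0];
		if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
		    write(fds[1], "x", 1) == 1)
			n = wait_sigint_blocked(epfd, out, 4, 1000);
		close(epfd);
	}
	close(fds[0]);
	close(fds[1]);
	return n;
}
```

Callers that need an overall deadline would have to recompute the remaining
timeout before each retry; the sketch leaves that out for brevity.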
fs/eventpoll.c | 230 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 230 insertions(+)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 254b50d687d37..8bd25f9230fc8 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -3148,6 +3148,236 @@ static int do_epoll_pwait(int epfd, struct epoll_event __user *events,
return error;
}
+
+DEFINE_KERNEL_API_SPEC(sys_epoll_pwait)
+ KAPI_DESCRIPTION("Wait for events on an epoll instance with signal handling")
+ KAPI_LONG_DESC("Similar to epoll_wait(), but allows the caller to safely wait for "
+ "either events on the epoll instance or the delivery of a signal. "
+ "The sigmask argument specifies a signal mask which is atomically "
+ "set during the wait, allowing signals to be blocked while not waiting "
+ "and ensuring no signal is lost between checking for events and blocking.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "epfd", "int", "File descriptor referring to the epoll instance")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_FD,
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "events", "struct epoll_event __user *", "Buffer where ready events will be stored")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT | KAPI_PARAM_USER)
+ .type = KAPI_TYPE_USER_PTR,
+ KAPI_PARAM_SIZE(sizeof(struct epoll_event))
+ .size_param_idx = 2, /* Size determined by maxevents parameter */
+ .size_multiplier = sizeof(struct epoll_event),
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ .constraints = "Must point to an array of at least maxevents epoll_event structures",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "maxevents", "int", "Maximum number of events to return")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_INT,
+ KAPI_PARAM_RANGE(1, INT_MAX / sizeof(struct epoll_event)) /* EP_MAX_EVENTS */
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ .constraints = "Must be greater than zero and not exceed system limits",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(3, "timeout", "int", "Timeout in milliseconds")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_INT,
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ .constraints = "-1 blocks indefinitely, 0 returns immediately, >0 specifies milliseconds to wait",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(4, "sigmask", "const sigset_t __user *", "Signal mask to atomically set during wait")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL)
+ .type = KAPI_TYPE_USER_PTR,
+ KAPI_PARAM_SIZE(sizeof(sigset_t))
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ .constraints = "Can be NULL if no signal mask change is desired",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(5, "sigsetsize", "size_t", "Size of the signal set in bytes")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_RANGE(sizeof(sigset_t), sizeof(sigset_t))
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ .constraints = "Must be sizeof(sigset_t)",
+ KAPI_PARAM_END
+
+ KAPI_RETURN("long", "Number of ready file descriptors on success, negative error code on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_RANGE,
+ .success_min = 0,
+ .success_max = INT_MAX,
+ KAPI_RETURN_END
+
+ KAPI_ERROR(0, -EBADF, "EBADF", "epfd is not a valid file descriptor",
+ "The epoll file descriptor is invalid or has been closed.")
+ KAPI_ERROR(1, -EFAULT, "EFAULT", "Memory area not accessible",
+ "The memory area pointed to by events or sigmask is not accessible.")
+ KAPI_ERROR(2, -EINTR, "EINTR", "Call interrupted by signal handler",
+ "The call was interrupted by a signal handler before any events "
+ "became ready or the timeout expired; see signal(7).")
+ KAPI_ERROR(3, -EINVAL, "EINVAL", "Invalid parameters",
+ "epfd is not an epoll file descriptor, maxevents is less than or equal to zero, "
+ "or sigsetsize is not equal to sizeof(sigset_t).")
+
+ .error_count = 4,
+ .param_count = 6,
+ .since_version = "2.6.19",
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_MODIFY_STATE,
+ "signal mask",
+ "Atomically sets the signal mask for the calling thread")
+ KAPI_EFFECT_CONDITION("When sigmask is not NULL")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE,
+ "ready list",
+ "Removes events from the epoll ready list as they are reported")
+ KAPI_EFFECT_CONDITION("When events are available; level-triggered entries are re-queued")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_SCHEDULE,
+ "process state",
+ "Blocks the calling thread until events are available, timeout, or signal")
+ KAPI_EFFECT_CONDITION("When timeout != 0 and no events are immediately available")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_MODIFY_STATE,
+ "user memory",
+ "Writes event data to user-provided buffer")
+ KAPI_EFFECT_CONDITION("When events are available")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(4, KAPI_EFFECT_PROCESS_STATE,
+ "saved signal mask",
+ "Saves and restores the original signal mask")
+ KAPI_EFFECT_CONDITION("When sigmask is not NULL")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(5)
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "signal mask", "original mask", "user-specified mask",
+ "Thread's signal mask is atomically changed to the provided mask")
+ KAPI_STATE_TRANS_COND("When sigmask is not NULL")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "process", "running", "blocked",
+ "Process blocks waiting for events with specified signal mask")
+ KAPI_STATE_TRANS_COND("When no events available and timeout != 0")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "process", "blocked", "running",
+ "Process wakes up due to events, timeout, or unblocked signal")
+ KAPI_STATE_TRANS_COND("When wait condition is satisfied")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(3, "signal mask", "user-specified mask", "original mask",
+ "Thread's signal mask is restored to its original value")
+ KAPI_STATE_TRANS_COND("When returning from epoll_pwait")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(4, "pending signals", "blocked", "deliverable",
+ "Signals that were blocked by the temporary mask become deliverable")
+ KAPI_STATE_TRANS_COND("When signal mask is restored and signals were pending")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS_COUNT(5)
+
+ /* Signal specifications */
+ KAPI_SIGNAL(0, 0, "ANY_UNBLOCKED", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("Signal not blocked by provided sigmask")
+ KAPI_SIGNAL_DESC("Any signal not blocked by the sigmask parameter will interrupt "
+ "epoll_pwait() and cause it to return -EINTR. The signal mask is "
+ "atomically set via set_user_sigmask() and restored via "
+ "restore_saved_sigmask_unless() before returning.")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(1, SIGKILL, "SIGKILL", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_TERMINATE)
+ KAPI_SIGNAL_CONDITION("Cannot be blocked by sigmask")
+ KAPI_SIGNAL_DESC("SIGKILL cannot be blocked and will terminate the process immediately. "
+ "The epoll_pwait call will not return.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(2, SIGSTOP, "SIGSTOP", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_STOP)
+ KAPI_SIGNAL_CONDITION("Cannot be blocked by sigmask")
+ KAPI_SIGNAL_DESC("SIGSTOP cannot be blocked and will stop the process. When continued "
+ "with SIGCONT, epoll_pwait may return -EINTR.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(3, 0, "BLOCKED_SIGNALS", KAPI_SIGNAL_BLOCK, KAPI_SIGNAL_ACTION_DEFAULT)
+ KAPI_SIGNAL_CONDITION("Signals in provided sigmask")
+ KAPI_SIGNAL_DESC("Signals specified in the sigmask parameter are blocked for the "
+ "duration of the epoll_pwait call. They remain pending and will be "
+ "delivered after the signal mask is restored.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(4, SIGCONT, "SIGCONT", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_CONTINUE)
+ KAPI_SIGNAL_CONDITION("When process is stopped")
+ KAPI_SIGNAL_DESC("SIGCONT resumes a stopped process. If epoll_pwait was interrupted "
+ "by SIGSTOP, it may return -EINTR when continued.")
+ KAPI_SIGNAL_END
+
+ .signal_count = 5,
+
+ /* Signal mask specifications */
+ KAPI_SIGNAL_MASK(0, "user_sigmask", "User-provided signal mask atomically applied")
+ .description = "The signal mask provided in the sigmask parameter is atomically "
+ "set for the duration of the wait operation. This prevents race "
+ "conditions between checking for events and blocking. The original "
+ "signal mask is restored before epoll_pwait returns, unless the "
+ "return value is -EINTR (in which case the mask is restored by "
+ "the signal delivery mechanism)."
+ KAPI_SIGNAL_MASK_END
+
+ .signal_mask_count = 1,
+
+ /* Locking specifications */
+ KAPI_LOCK(0, "ep->lock", KAPI_LOCK_SPINLOCK)
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Protects the ready list while checking for and consuming events")
+ KAPI_LOCK_END
+
+ KAPI_LOCK(1, "ep->mtx", KAPI_LOCK_MUTEX)
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Protects against concurrent epoll_ctl operations during wait")
+ KAPI_LOCK_END
+
+ .lock_count = 2,
+
+ .examples = "sigset_t sigmask;\n"
+ "struct epoll_event events[10];\n\n"
+ "/* Block SIGINT during epoll_pwait */\n"
+ "sigemptyset(&sigmask);\n"
+ "sigaddset(&sigmask, SIGINT);\n\n"
+ "int nfds = epoll_pwait(epfd, events, 10, 1000, &sigmask, sizeof(sigmask));\n"
+ "if (nfds == -1) {\n"
+ " if (errno != EINTR) {\n"
+ " perror(\"epoll_pwait\");\n"
+ " exit(EXIT_FAILURE);\n"
+ " }\n"
+ " /* EINTR: interrupted by a signal; handle it or retry */\n"
+ "}",
+ .notes = "epoll_pwait() is equivalent to atomically executing:\n"
+ " sigset_t origmask;\n"
+ " pthread_sigmask(SIG_SETMASK, &sigmask, &origmask);\n"
+ " ready = epoll_wait(epfd, events, maxevents, timeout);\n"
+ " pthread_sigmask(SIG_SETMASK, &origmask, NULL);\n"
+ "This atomicity prevents race conditions where a signal could be delivered "
+ "after checking for events but before blocking in epoll_wait(). "
+ "The signal mask is always restored before epoll_pwait() returns.",
+KAPI_END_SPEC;
+
SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
int, maxevents, int, timeout, const sigset_t __user *, sigmask,
size_t, sigsetsize)
--
2.39.5
* [RFC 07/19] eventpoll: add API specification for epoll_pwait2
2025-06-14 13:48 [RFC 00/19] Kernel API Specification Framework Sasha Levin
` (5 preceding siblings ...)
2025-06-14 13:48 ` [RFC 06/19] eventpoll: add API specification for epoll_pwait Sasha Levin
@ 2025-06-14 13:48 ` Sasha Levin
2025-06-14 13:48 ` [RFC 08/19] exec: add API specification for execve Sasha Levin
` (13 subsequent siblings)
20 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-06-14 13:48 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-api, workflows, tools, Sasha Levin
Add kernel API specification for the epoll_pwait2() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
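Note for reviewers: a small sketch, not part of the patch, of how the
millisecond timeout of epoll_wait()/epoll_pwait() maps onto the timespec
form described in this spec (negative means block indefinitely, which
becomes a NULL pointer; zero means return immediately). The helper name
ms_to_timespec() is hypothetical, and the user-space struct timespec is
assumed here in place of struct __kernel_timespec.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: convert an epoll_wait()-style timeout in
 * milliseconds to the pointer form taken by epoll_pwait2().
 * Returns NULL for a negative timeout (block indefinitely); otherwise
 * fills *ts and returns ts.
 */
static struct timespec *ms_to_timespec(int timeout_ms, struct timespec *ts)
{
	if (timeout_ms < 0)
		return NULL;

	ts->tv_sec = timeout_ms / 1000;
	ts->tv_nsec = (long)(timeout_ms % 1000) * 1000000L;
	return ts;
}
```

The result can be passed directly as the timeout argument, so a caller
migrating from epoll_pwait() keeps the same blocking semantics.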
fs/eventpoll.c | 244 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 244 insertions(+)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 8bd25f9230fc8..0e90d66467010 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -3389,6 +3389,250 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
sigmask, sigsetsize);
}
+
+DEFINE_KERNEL_API_SPEC(sys_epoll_pwait2)
+ KAPI_DESCRIPTION("Wait for events on an epoll instance with nanosecond precision timeout")
+ KAPI_LONG_DESC("Similar to epoll_pwait(), but takes a timespec structure that allows "
+ "nanosecond precision for the timeout value. This provides more accurate "
+ "timeout control compared to the millisecond precision of epoll_pwait(). "
+ "Like epoll_pwait(), it atomically sets a signal mask during the wait.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "epfd", "int", "File descriptor referring to the epoll instance")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_FD,
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "events", "struct epoll_event __user *", "Buffer where ready events will be stored")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT | KAPI_PARAM_USER)
+ .type = KAPI_TYPE_USER_PTR,
+ KAPI_PARAM_SIZE(sizeof(struct epoll_event))
+ .size_param_idx = 2, /* Size determined by maxevents parameter */
+ .size_multiplier = sizeof(struct epoll_event),
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ .constraints = "Must point to an array of at least maxevents epoll_event structures",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "maxevents", "int", "Maximum number of events to return")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_INT,
+ KAPI_PARAM_RANGE(1, INT_MAX / sizeof(struct epoll_event)) /* EP_MAX_EVENTS */
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ .constraints = "Must be greater than zero and not exceed system limits",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(3, "timeout", "const struct __kernel_timespec __user *", "Timeout with nanosecond precision")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL)
+ .type = KAPI_TYPE_USER_PTR,
+ KAPI_PARAM_SIZE(sizeof(struct __kernel_timespec))
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ .constraints = "NULL means block indefinitely, {0, 0} returns immediately, "
+ "negative values are invalid",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(4, "sigmask", "const sigset_t __user *", "Signal mask to atomically set during wait")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL)
+ .type = KAPI_TYPE_USER_PTR,
+ KAPI_PARAM_SIZE(sizeof(sigset_t))
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ .constraints = "Can be NULL if no signal mask change is desired",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(5, "sigsetsize", "size_t", "Size of the signal set in bytes")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_RANGE(sizeof(sigset_t), sizeof(sigset_t))
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ .constraints = "Must be sizeof(sigset_t)",
+ KAPI_PARAM_END
+
+ KAPI_RETURN("long", "Number of ready file descriptors on success, negative error code on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_RANGE,
+ .success_min = 0,
+ .success_max = INT_MAX,
+ KAPI_RETURN_END
+
+ KAPI_ERROR(0, -EBADF, "EBADF", "epfd is not a valid file descriptor",
+ "The epoll file descriptor is invalid or has been closed.")
+ KAPI_ERROR(1, -EFAULT, "EFAULT", "Memory area not accessible",
+ "The memory area pointed to by events, timeout, or sigmask is not accessible.")
+ KAPI_ERROR(2, -EINTR, "EINTR", "Call interrupted by signal handler",
+ "The call was interrupted by a signal handler before any events "
+ "became ready or the timeout expired.")
+ KAPI_ERROR(3, -EINVAL, "EINVAL", "Invalid parameters",
+ "epfd is not an epoll file descriptor, maxevents is less than or equal to zero, "
+ "sigsetsize is not equal to sizeof(sigset_t), or timeout values are invalid.")
+
+ .error_count = 4,
+ .param_count = 6,
+ .since_version = "5.11",
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_MODIFY_STATE,
+ "signal mask",
+ "Atomically sets the signal mask for the calling thread")
+ KAPI_EFFECT_CONDITION("When sigmask is not NULL")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE,
+ "ready list",
+ "Removes events from the epoll ready list as they are reported")
+ KAPI_EFFECT_CONDITION("When events are available; level-triggered entries are re-queued")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_SCHEDULE,
+ "process state",
+ "Blocks the calling thread until events, timeout, or signal")
+ KAPI_EFFECT_CONDITION("When timeout is NULL or non-zero and no events are immediately available")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_MODIFY_STATE,
+ "user memory",
+ "Writes event data to user-provided buffer")
+ KAPI_EFFECT_CONDITION("When events are available")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(4, KAPI_EFFECT_PROCESS_STATE,
+ "saved signal mask",
+ "Saves and restores the original signal mask")
+ KAPI_EFFECT_CONDITION("When sigmask is not NULL")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(5, KAPI_EFFECT_MODIFY_STATE,
+ "timer precision",
+ "Timeout may be rounded up to system timer granularity")
+ KAPI_EFFECT_CONDITION("When timeout is specified")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(6)
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "signal mask", "original mask", "user-specified mask",
+ "Thread's signal mask is atomically changed to the provided mask")
+ KAPI_STATE_TRANS_COND("When sigmask is not NULL")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "process", "running", "blocked",
+ "Process blocks waiting for events with specified signal mask")
+ KAPI_STATE_TRANS_COND("When no events available and not immediate return")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "process", "blocked", "running",
+ "Process wakes up due to events, timeout expiry, or unblocked signal")
+ KAPI_STATE_TRANS_COND("When wait condition is satisfied")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(3, "signal mask", "user-specified mask", "original mask",
+ "Thread's signal mask is restored to its original value")
+ KAPI_STATE_TRANS_COND("When returning from epoll_pwait2")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(4, "pending signals", "blocked", "deliverable",
+ "Signals that were blocked by the temporary mask become deliverable")
+ KAPI_STATE_TRANS_COND("When signal mask is restored and signals were pending")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(5, "timeout timer", "not started", "armed with nanosecond precision",
+ "High resolution timer is armed with the specified timeout")
+ KAPI_STATE_TRANS_COND("When timeout is specified and > 0")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS_COUNT(6)
+
+ /* Signal specifications */
+ KAPI_SIGNAL(0, 0, "ANY_UNBLOCKED", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("Signal not blocked by provided sigmask")
+ KAPI_SIGNAL_DESC("Any signal not blocked by the sigmask parameter will interrupt "
+ "epoll_pwait2() and cause it to return -EINTR. Signal handling is "
+ "identical to epoll_pwait().")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(1, SIGKILL, "SIGKILL", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_TERMINATE)
+ KAPI_SIGNAL_CONDITION("Cannot be blocked by sigmask")
+ KAPI_SIGNAL_DESC("SIGKILL cannot be blocked and will terminate the process immediately.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(2, SIGSTOP, "SIGSTOP", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_STOP)
+ KAPI_SIGNAL_CONDITION("Cannot be blocked by sigmask")
+ KAPI_SIGNAL_DESC("SIGSTOP cannot be blocked and will stop the process.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(3, 0, "BLOCKED_SIGNALS", KAPI_SIGNAL_BLOCK, KAPI_SIGNAL_ACTION_DEFAULT)
+ KAPI_SIGNAL_CONDITION("Signals in provided sigmask")
+ KAPI_SIGNAL_DESC("Signals specified in the sigmask parameter are blocked during "
+ "the epoll_pwait2 call.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(4, SIGCONT, "SIGCONT", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_CONTINUE)
+ KAPI_SIGNAL_CONDITION("When process is stopped")
+ KAPI_SIGNAL_DESC("SIGCONT resumes a stopped process. If epoll_pwait2 was interrupted "
+ "by SIGSTOP, it may return -EINTR when continued.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(5, SIGALRM, "SIGALRM", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("Timer expiration")
+ KAPI_SIGNAL_DESC("SIGALRM or other timer signals will interrupt epoll_pwait2 with -EINTR "
+ "if not blocked by sigmask")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ .signal_count = 6,
+
+ /* Signal mask specifications */
+ KAPI_SIGNAL_MASK(0, "user_sigmask", "User-provided signal mask atomically applied")
+ .description = "The signal mask is atomically set and restored exactly as in "
+ "epoll_pwait(), providing the same race-condition prevention."
+ KAPI_SIGNAL_MASK_END
+
+ .signal_mask_count = 1,
+
+ /* Locking specifications */
+ KAPI_LOCK(0, "ep->lock", KAPI_LOCK_SPINLOCK)
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Protects the ready list while checking for and consuming events")
+ KAPI_LOCK_END
+
+ KAPI_LOCK(1, "ep->mtx", KAPI_LOCK_MUTEX)
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Protects against concurrent epoll_ctl operations during wait")
+ KAPI_LOCK_END
+
+ .lock_count = 2,
+
+ .examples = "sigset_t sigmask;\n"
+ "struct epoll_event events[10];\n"
+ "struct __kernel_timespec ts;\n\n"
+ "/* Block SIGINT during epoll_pwait2 */\n"
+ "sigemptyset(&sigmask);\n"
+ "sigaddset(&sigmask, SIGINT);\n\n"
+ "/* Wait for 1.5 seconds */\n"
+ "ts.tv_sec = 1;\n"
+ "ts.tv_nsec = 500000000; /* 500 milliseconds */\n\n"
+ "int nfds = epoll_pwait2(epfd, events, 10, &ts, &sigmask, sizeof(sigmask));\n"
+ "if (nfds == -1) {\n"
+ " if (errno != EINTR) {\n"
+ " perror(\"epoll_pwait2\");\n"
+ " exit(EXIT_FAILURE);\n"
+ " }\n"
+ " /* EINTR: interrupted by a signal; handle it or retry */\n"
+ "}\n\n"
+ "/* Example with infinite timeout */\n"
+ "nfds = epoll_pwait2(epfd, events, 10, NULL, &sigmask, sizeof(sigmask));",
+ .notes = "epoll_pwait2() provides nanosecond precision timeouts, addressing the limitation "
+ "of epoll_pwait() which only supports millisecond precision. The timeout parameter "
+ "uses struct __kernel_timespec which is compatible with 64-bit time values, making "
+ "it Y2038-safe. Like epoll_pwait(), the signal mask operation is atomic. "
+ "The timeout is still subject to system timer granularity and may be rounded up.",
+KAPI_END_SPEC;
+
SYSCALL_DEFINE6(epoll_pwait2, int, epfd, struct epoll_event __user *, events,
int, maxevents, const struct __kernel_timespec __user *, timeout,
const sigset_t __user *, sigmask, size_t, sigsetsize)
--
2.39.5
* [RFC 08/19] exec: add API specification for execve
2025-06-14 13:48 [RFC 00/19] Kernel API Specification Framework Sasha Levin
` (6 preceding siblings ...)
2025-06-14 13:48 ` [RFC 07/19] eventpoll: add API specification for epoll_pwait2 Sasha Levin
@ 2025-06-14 13:48 ` Sasha Levin
2025-06-16 21:39 ` Florian Weimer
2025-06-14 13:48 ` [RFC 09/19] exec: add API specification for execveat Sasha Levin
` (12 subsequent siblings)
20 siblings, 1 reply; 44+ messages in thread
From: Sasha Levin @ 2025-06-14 13:48 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-api, workflows, tools, Sasha Levin
Add comprehensive kernel API specification for the execve() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
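Note for reviewers: a minimal user-space sketch, not part of the patch, of
the "does not return on success" contract described in this spec. The
child replaces itself via execve() and the parent collects the exit
status. The helper name run_shell_exit() is hypothetical, and the sketch
assumes a POSIX shell is installed at /bin/sh.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork, execve() /bin/sh -c <script> in the child,
 * and return the child's exit status (or -1 on failure or abnormal exit).
 */
static int run_shell_exit(const char *script)
{
	char *argv[] = { "sh", "-c", (char *)script, NULL };
	char *envp[] = { "PATH=/bin:/usr/bin", NULL };
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return -1;

	if (pid == 0) {
		execve("/bin/sh", argv, envp);
		/* Reached only if execve() failed */
		perror("execve");
		_exit(127);
	}

	if (waitpid(pid, &status, 0) == -1)
		return -1;

	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Using _exit() rather than exit() on the failure path avoids flushing the
parent's stdio buffers a second time from the child.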
fs/exec.c | 218 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 218 insertions(+)
diff --git a/fs/exec.c b/fs/exec.c
index 1f5fdd2e096e3..3d006105ab23d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -52,6 +52,7 @@
#include <linux/mount.h>
#include <linux/security.h>
#include <linux/syscalls.h>
+#include <linux/syscall_api_spec.h>
#include <linux/tsacct_kern.h>
#include <linux/cn_proc.h>
#include <linux/audit.h>
@@ -1997,7 +1998,224 @@ void set_dumpable(struct mm_struct *mm, int value)
set_mask_bits(&mm->flags, MMF_DUMPABLE_MASK, value);
}
+
+DEFINE_KERNEL_API_SPEC(sys_execve)
+ KAPI_DESCRIPTION("Execute a new program")
+ KAPI_LONG_DESC("Executes the program referred to by filename. This causes the program "
+ "that is currently being run by the calling process to be replaced with "
+ "a new program, with newly initialized stack, heap, and (initialized and "
+ "uninitialized) data segments. The process ID remains the same.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "filename", "const char __user *", "Pathname of the program to execute")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER)
+ .type = KAPI_TYPE_PATH,
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ .constraints = "Must be a valid pathname to an executable file or script",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "argv", "const char __user *const __user *", "Array of argument strings passed to the new program")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER)
+ .type = KAPI_TYPE_USER_PTR,
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ .constraints = "NULL-terminated array of pointers to null-terminated strings",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "envp", "const char __user *const __user *", "Array of environment strings for the new program")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER)
+ .type = KAPI_TYPE_USER_PTR,
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ .constraints = "NULL-terminated array of pointers to null-terminated strings in form key=value",
+ KAPI_PARAM_END
+
+ KAPI_RETURN("long", "Does not return on success; negative error code on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ KAPI_RETURN_END
+
+ KAPI_ERROR(0, -E2BIG, "E2BIG", "Argument list too long",
+ "The total size of argv and envp exceeds the system limit.")
+ KAPI_ERROR(1, -EACCES, "EACCES", "Permission denied",
+ "Search permission denied on a component of the path, file is not regular, "
+ "or execute permission denied for file or interpreter.")
+ KAPI_ERROR(2, -EFAULT, "EFAULT", "Bad address",
+ "filename, argv, or envp points outside accessible address space.")
+ KAPI_ERROR(3, -EINVAL, "EINVAL", "Invalid executable format",
+ "An ELF executable has more than one PT_INTERP segment.")
+ KAPI_ERROR(4, -EIO, "EIO", "I/O error",
+ "An I/O error occurred while reading from the file system.")
+ KAPI_ERROR(5, -EISDIR, "EISDIR", "Is a directory",
+ "An ELF interpreter was a directory.")
+ KAPI_ERROR(6, -ELIBBAD, "ELIBBAD", "Invalid ELF interpreter",
+ "An ELF interpreter was not in a recognized format.")
+ KAPI_ERROR(7, -ELOOP, "ELOOP", "Too many symbolic links",
+ "Too many symbolic links encountered while resolving filename or interpreter.")
+ KAPI_ERROR(8, -EMFILE, "EMFILE", "Too many open files",
+ "The per-process limit on open file descriptors has been reached.")
+ KAPI_ERROR(9, -ENAMETOOLONG, "ENAMETOOLONG", "Filename too long",
+ "filename or one of the strings in argv or envp is too long.")
+ KAPI_ERROR(10, -ENFILE, "ENFILE", "System file table overflow",
+ "The system-wide limit on open files has been reached.")
+ KAPI_ERROR(11, -ENOENT, "ENOENT", "File not found",
+ "The file filename or an interpreter does not exist.")
+ KAPI_ERROR(12, -ENOEXEC, "ENOEXEC", "Exec format error",
+ "An executable is not in a recognized format, is for wrong architecture, "
+ "or has other format errors preventing execution.")
+ KAPI_ERROR(13, -ENOMEM, "ENOMEM", "Out of memory",
+ "Insufficient kernel memory available.")
+ KAPI_ERROR(14, -ENOTDIR, "ENOTDIR", "Not a directory",
+ "A component of the path prefix is not a directory.")
+ KAPI_ERROR(15, -EPERM, "EPERM", "Operation not permitted",
+ "The filesystem is mounted nosuid, the user is not root, and the file has "
+ "set-user-ID or set-group-ID bit set.")
+ KAPI_ERROR(16, -ETXTBSY, "ETXTBSY", "Text file busy",
+ "The executable was open for writing by one or more processes.")
+ KAPI_ERROR(17, -EAGAIN, "EAGAIN", "Resource temporarily unavailable",
+ "RLIMIT_NPROC limit exceeded - too many processes for this user.")
+
+ .error_count = 18,
+ .param_count = 3,
+ .since_version = "1.0",
+ .examples = "char *argv[] = { \"echo\", \"hello\", \"world\", NULL };\n"
+ "char *envp[] = { \"PATH=/bin\", NULL };\n"
+ "execve(\"/bin/echo\", argv, envp);\n"
+ "/* This point is only reached on error */\n"
+ "perror(\"execve failed\");\n"
+ "exit(EXIT_FAILURE);",
+ .notes = "On success, execve() does not return; the new program is executed. "
+ "File descriptors remain open unless marked close-on-exec. "
+ "Signal dispositions are reset to default except for ignored signals. "
+ "Any alternate signal stack is not preserved. "
+ "The set of pending signals is preserved (see signal(7)). "
+ "All threads except the calling thread are destroyed.",
+
+ /* Fatal signals can interrupt exec */
+ KAPI_SIGNAL(0, 0, "FATAL_SIGNALS", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_TERMINATE)
+ KAPI_SIGNAL_CONDITION("Fatal signal pending during exec setup")
+ KAPI_SIGNAL_DESC("Fatal signals (checked via fatal_signal_pending()) can interrupt "
+ "exec during setup phases like de_thread(). This causes exec to fail "
+ "and the process to exit.")
+ KAPI_SIGNAL_END
+
+ /* SIGKILL sent to other threads */
+ KAPI_SIGNAL(1, SIGKILL, "SIGKILL", KAPI_SIGNAL_SEND, KAPI_SIGNAL_ACTION_TERMINATE)
+ KAPI_SIGNAL_TARGET("All other threads in the thread group")
+ KAPI_SIGNAL_CONDITION("Multi-threaded process doing exec")
+ KAPI_SIGNAL_DESC("During de_thread(), zap_other_threads() sends SIGKILL to all "
+ "other threads in the thread group to ensure only the execing "
+ "thread survives.")
+ KAPI_SIGNAL_END
+
+ /* Signal handlers reset */
+ KAPI_SIGNAL(2, 0, "ALL_HANDLERS", KAPI_SIGNAL_HANDLE, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Signal has a handler installed")
+ KAPI_SIGNAL_DESC("flush_signal_handlers() resets all signal handlers to SIG_DFL "
+ "except for signals that are ignored (SIG_IGN). This happens "
+ "after de_thread() completes.")
+ KAPI_SIGNAL_END
+
+ /* Ignored signals preserved */
+ KAPI_SIGNAL(3, 0, "IGNORED_SIGNALS", KAPI_SIGNAL_IGNORE, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Signal disposition is SIG_IGN")
+ KAPI_SIGNAL_DESC("Signals set to SIG_IGN are preserved across exec. This is "
+ "POSIX-compliant behavior allowing parent processes to ignore "
+ "signals in children.")
+ KAPI_SIGNAL_END
+
+ /* Pending signals preserved */
+ KAPI_SIGNAL(4, 0, "PENDING_SIGNALS", KAPI_SIGNAL_HANDLE, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Any pending signals")
+ KAPI_SIGNAL_DESC("The pending signal set is preserved across exec (see signal(7)). "
+ "Timer-generated signals are handled separately (see TIMER_SIGNALS).")
+ KAPI_SIGNAL_END
+
+ /* Timer signals cleared */
+ KAPI_SIGNAL(5, 0, "TIMER_SIGNALS", KAPI_SIGNAL_HANDLE, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Timer-generated signals pending")
+ KAPI_SIGNAL_DESC("flush_itimer_signals() clears any pending timer signals "
+ "(SIGALRM, SIGVTALRM, SIGPROF) to prevent confusion in the "
+ "new program.")
+ KAPI_SIGNAL_END
+
+ /* Exit signal set to SIGCHLD */
+ KAPI_SIGNAL(6, SIGCHLD, "SIGCHLD", KAPI_SIGNAL_SEND, KAPI_SIGNAL_ACTION_DEFAULT)
+ KAPI_SIGNAL_TARGET("Parent process when this process exits")
+ KAPI_SIGNAL_CONDITION("Process exit after exec")
+ KAPI_SIGNAL_DESC("The exit_signal is set to SIGCHLD during exec, ensuring the "
+ "parent will receive SIGCHLD when this process terminates.")
+ KAPI_SIGNAL_END
+
+ /* Alternate signal stack cleared */
+ KAPI_SIGNAL(7, 0, "SIGALTSTACK", KAPI_SIGNAL_HANDLE, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Process had alternate signal stack")
+ KAPI_SIGNAL_DESC("Any alternate signal stack (sigaltstack) is not preserved "
+ "across exec. The new program starts with no alternate stack.")
+ KAPI_SIGNAL_END
+
+ .signal_count = 8,
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_PROCESS_STATE | KAPI_EFFECT_FREE_MEMORY | KAPI_EFFECT_ALLOC_MEMORY,
+ "process image",
+ "Replaces entire process image including code, data, heap, and stack")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_RESOURCE_DESTROY,
+ "file descriptors",
+ "Closes all file descriptors with close-on-exec flag set")
+ KAPI_EFFECT_CONDITION("FD_CLOEXEC flag set")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_MODIFY_STATE,
+ "signal handlers",
+ "Resets all signal handlers to default, preserves ignored signals")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_PROCESS_STATE | KAPI_EFFECT_SIGNAL_SEND,
+ "thread group",
+ "Kills all other threads in the thread group with SIGKILL")
+ KAPI_EFFECT_CONDITION("Multi-threaded process")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(4, KAPI_EFFECT_MODIFY_STATE,
+ "process attributes",
+ "Clears pending signals, timers, alternate signal stack, and various process attributes")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(5, KAPI_EFFECT_FILESYSTEM,
+ "executable file",
+ "Opens and reads the executable file, may trigger filesystem operations")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(6)
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "process memory",
+ "old program image", "new program image",
+ "Complete replacement of process address space with new program")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "process credentials",
+ "current credentials", "potentially modified credentials",
+ "May change effective UID/GID based on file permissions")
+ KAPI_STATE_TRANS_COND("setuid/setgid binary")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "thread state",
+ "multi-threaded", "single-threaded",
+ "Process becomes single-threaded after killing other threads")
+ KAPI_STATE_TRANS_COND("Multi-threaded process")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(3, "signal state",
+ "custom handlers and pending signals", "default handlers, no pending signals",
+ "Signal handling reset to clean state for new program")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS_COUNT(4)
+KAPI_END_SPEC;
+
SYSCALL_DEFINE3(execve,
const char __user *, filename,
const char __user *const __user *, argv,
const char __user *const __user *, envp)
--
2.39.5
* [RFC 09/19] exec: add API specification for execveat
2025-06-14 13:48 [RFC 00/19] Kernel API Specification Framework Sasha Levin
` (7 preceding siblings ...)
2025-06-14 13:48 ` [RFC 08/19] exec: add API specification for execve Sasha Levin
@ 2025-06-14 13:48 ` Sasha Levin
2025-06-14 13:48 ` [RFC 10/19] mm/mlock: add API specification for mlock Sasha Levin
` (11 subsequent siblings)
20 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-06-14 13:48 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-api, workflows, tools, Sasha Levin
Add comprehensive kernel API specification for the execveat() system
call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/exec.c | 245 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 245 insertions(+)
diff --git a/fs/exec.c b/fs/exec.c
index 3d006105ab23d..49d8647c053ef 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -2223,6 +2223,251 @@ SYSCALL_DEFINE3(execve,
return do_execve(getname(filename), argv, envp);
}
+
+/* Valid flag combinations for execveat */
+static const s64 execveat_valid_flags[] = {
+ 0,
+ AT_EMPTY_PATH,
+ AT_SYMLINK_NOFOLLOW,
+ AT_EXECVE_CHECK,
+ AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW,
+ AT_EMPTY_PATH | AT_EXECVE_CHECK,
+ AT_SYMLINK_NOFOLLOW | AT_EXECVE_CHECK,
+ AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW | AT_EXECVE_CHECK,
+};
+
+DEFINE_KERNEL_API_SPEC(sys_execveat)
+ KAPI_DESCRIPTION("Execute a new program relative to a directory file descriptor")
+ KAPI_LONG_DESC("Executes the program referred to by the combination of fd and filename. "
+ "This system call is useful when implementing a secure execution environment "
+ "or when the calling process has an open file descriptor but no access to "
+ "the corresponding pathname. Like execve(), it replaces the current process "
+ "image with a new process image.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "fd", "int", "Directory file descriptor")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_FD,
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ .constraints = "AT_FDCWD for current directory, or valid directory file descriptor",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "filename", "const char __user *", "Pathname of the program to execute")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL)
+ .type = KAPI_TYPE_PATH,
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ .constraints = "Relative or absolute path; empty string with AT_EMPTY_PATH to use fd directly",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "argv", "const char __user *const __user *", "Array of argument strings passed to the new program")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER)
+ .type = KAPI_TYPE_USER_PTR,
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ .constraints = "NULL-terminated array of pointers to null-terminated strings",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(3, "envp", "const char __user *const __user *", "Array of environment strings for the new program")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER)
+ .type = KAPI_TYPE_USER_PTR,
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ .constraints = "NULL-terminated array of pointers to null-terminated strings in form key=value",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(4, "flags", "int", "Execution flags")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_INT,
+ .constraint_type = KAPI_CONSTRAINT_MASK,
+ .valid_mask = AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW | AT_EXECVE_CHECK,
+ .constraints = "0 or combination of AT_EMPTY_PATH, AT_SYMLINK_NOFOLLOW, and AT_EXECVE_CHECK",
+ KAPI_PARAM_END
+
+ /* Return specification */
+ KAPI_RETURN("long", "Does not return on success (except with AT_EXECVE_CHECK, which returns 0); returns a negative error code on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ KAPI_RETURN_END
+
+ /* Error codes */
+ KAPI_ERROR(0, -E2BIG, "E2BIG", "Argument list too long", "The total size of argv and envp exceeds the system limit.")
+ KAPI_ERROR(1, -EACCES, "EACCES", "Permission denied", "Search permission denied on a component of the path, file is not regular, or execute permission denied for file or interpreter.")
+ KAPI_ERROR(2, -EBADF, "EBADF", "Bad file descriptor", "fd is not a valid file descriptor.")
+ KAPI_ERROR(3, -EFAULT, "EFAULT", "Bad address", "filename, argv, or envp points outside accessible address space.")
+ KAPI_ERROR(4, -EINVAL, "EINVAL", "Invalid flags or executable format", "Invalid flags specified, or ELF executable has more than one PT_INTERP segment.")
+ KAPI_ERROR(5, -EIO, "EIO", "I/O error", "An I/O error occurred while reading from the file system.")
+ KAPI_ERROR(6, -EISDIR, "EISDIR", "Is a directory", "An ELF interpreter was a directory.")
+ KAPI_ERROR(7, -ELIBBAD, "ELIBBAD", "Invalid ELF interpreter", "An ELF interpreter was not in a recognized format.")
+ KAPI_ERROR(8, -ELOOP, "ELOOP", "Too many symbolic links", "Too many symbolic links encountered, or AT_SYMLINK_NOFOLLOW was specified but filename refers to a symbolic link.")
+ KAPI_ERROR(9, -EMFILE, "EMFILE", "Too many open files", "The per-process limit on open file descriptors has been reached.")
+ KAPI_ERROR(10, -ENAMETOOLONG, "ENAMETOOLONG", "Filename too long", "filename or one of the strings in argv or envp is too long.")
+ KAPI_ERROR(11, -ENFILE, "ENFILE", "System file table overflow", "The system-wide limit on open files has been reached.")
+ KAPI_ERROR(12, -ENOENT, "ENOENT", "File not found", "The file filename or an interpreter does not exist, or filename is empty and AT_EMPTY_PATH was not specified in flags.")
+ KAPI_ERROR(13, -ENOEXEC, "ENOEXEC", "Exec format error", "An executable is not in a recognized format, is for the wrong architecture, or has other format errors preventing execution.")
+ KAPI_ERROR(14, -ENOMEM, "ENOMEM", "Out of memory", "Insufficient kernel memory available.")
+ KAPI_ERROR(15, -ENOTDIR, "ENOTDIR", "Not a directory", "A component of the path prefix is not a directory, or fd is not a directory when a relative path is given.")
+ KAPI_ERROR(16, -EPERM, "EPERM", "Operation not permitted", "The filesystem is mounted nosuid, the user is not root, and the file has set-user-ID or set-group-ID bit set.")
+ KAPI_ERROR(17, -ETXTBSY, "ETXTBSY", "Text file busy", "The executable was open for writing by one or more processes.")
+ KAPI_ERROR(18, -EAGAIN, "EAGAIN", "Resource temporarily unavailable", "RLIMIT_NPROC limit exceeded - too many processes for this user.")
+ KAPI_ERROR(19, -EINTR, "EINTR", "Interrupted by signal", "The exec was interrupted by a fatal signal during the setup phase.")
+
+ /* Signal specifications */
+ KAPI_SIGNAL(0, 0, "FATAL_SIGNALS", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_TERMINATE)
+ KAPI_SIGNAL_CONDITION("Fatal signal pending during exec setup")
+ KAPI_SIGNAL_DESC("Fatal signals (checked via fatal_signal_pending()) can interrupt exec during setup phases like de_thread(). This causes exec to fail and the process to exit.")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(1, SIGKILL, "SIGKILL", KAPI_SIGNAL_SEND, KAPI_SIGNAL_ACTION_TERMINATE)
+ KAPI_SIGNAL_TARGET("All other threads in the thread group")
+ KAPI_SIGNAL_CONDITION("Multi-threaded process doing exec")
+ KAPI_SIGNAL_DESC("During de_thread(), zap_other_threads() sends SIGKILL to all other threads in the thread group to ensure only the execing thread survives.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(2, 0, "ALL_HANDLERS", KAPI_SIGNAL_HANDLE, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Signal has a handler installed")
+ KAPI_SIGNAL_DESC("flush_signal_handlers() resets all signal handlers to SIG_DFL except for signals that are ignored (SIG_IGN). This happens after de_thread() completes.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(3, 0, "IGNORED_SIGNALS", KAPI_SIGNAL_IGNORE, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Signal disposition is SIG_IGN")
+ KAPI_SIGNAL_DESC("Signals set to SIG_IGN are preserved across exec. This is POSIX-compliant behavior allowing parent processes to ignore signals in children.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(4, 0, "PENDING_SIGNALS", KAPI_SIGNAL_HANDLE, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Any pending signals")
+ KAPI_SIGNAL_DESC("All pending signals are cleared during exec. This includes both thread-specific and process-wide pending signals.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(5, 0, "TIMER_SIGNALS", KAPI_SIGNAL_HANDLE, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Timer-generated signals pending")
+ KAPI_SIGNAL_DESC("flush_itimer_signals() clears any pending timer signals (SIGALRM, SIGVTALRM, SIGPROF) to prevent confusion in the new program.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(6, SIGCHLD, "SIGCHLD", KAPI_SIGNAL_SEND, KAPI_SIGNAL_ACTION_DEFAULT)
+ KAPI_SIGNAL_TARGET("Parent process when this process exits")
+ KAPI_SIGNAL_CONDITION("Process exit after exec")
+ KAPI_SIGNAL_DESC("The exit_signal is set to SIGCHLD during exec, ensuring the parent will receive SIGCHLD when this process terminates.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(7, 0, "SIGALTSTACK", KAPI_SIGNAL_HANDLE, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Process had alternate signal stack")
+ KAPI_SIGNAL_DESC("Any alternate signal stack (sigaltstack) is not preserved across exec. The new program starts with no alternate stack.")
+ KAPI_SIGNAL_END
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_PROCESS_STATE | KAPI_EFFECT_FREE_MEMORY | KAPI_EFFECT_ALLOC_MEMORY,
+ "process image",
+ "Replaces entire process image including code, data, heap, and stack")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_RESOURCE_DESTROY,
+ "file descriptors",
+ "Closes all file descriptors with close-on-exec flag set")
+ KAPI_EFFECT_CONDITION("FD_CLOEXEC flag set")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_MODIFY_STATE,
+ "signal handlers",
+ "Resets all signal handlers to default, preserves ignored signals")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_PROCESS_STATE | KAPI_EFFECT_SIGNAL_SEND,
+ "thread group",
+ "Kills all other threads in the thread group with SIGKILL")
+ KAPI_EFFECT_CONDITION("Multi-threaded process")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(4, KAPI_EFFECT_MODIFY_STATE,
+ "process attributes",
+ "Clears pending signals, timers, alternate signal stack, and various process attributes")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(5, KAPI_EFFECT_FILESYSTEM,
+ "executable file",
+ "Opens and reads the executable file, may trigger filesystem operations")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(6, KAPI_EFFECT_MODIFY_STATE,
+ "security context",
+ "May change SELinux/AppArmor context based on file labels and transitions")
+ KAPI_EFFECT_CONDITION("LSM enabled")
+ KAPI_SIDE_EFFECT_END
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "process memory",
+ "old program image", "new program image",
+ "Complete replacement of process address space with new program")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "process credentials",
+ "current credentials", "potentially modified credentials",
+ "May change effective UID/GID based on file permissions")
+ KAPI_STATE_TRANS_COND("setuid/setgid binary")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "thread state",
+ "multi-threaded", "single-threaded",
+ "Process becomes single-threaded after killing other threads")
+ KAPI_STATE_TRANS_COND("Multi-threaded process")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(3, "signal state",
+ "custom handlers and pending signals", "default handlers, no pending signals",
+ "Signal handling reset to clean state for new program")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(4, "file descriptor table",
+ "contains close-on-exec FDs", "close-on-exec FDs closed",
+ "All file descriptors marked FD_CLOEXEC are closed during exec")
+ KAPI_STATE_TRANS_COND("FDs with FD_CLOEXEC")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(5, "working directory",
+ "fd-relative operations", "resolved to absolute paths",
+ "Directory fd operations resolved before exec completes")
+ KAPI_STATE_TRANS_COND("Using dirfd != AT_FDCWD")
+ KAPI_STATE_TRANS_END
+
+ /* Locking information */
+ KAPI_LOCK(0, "cred_guard_mutex", KAPI_LOCK_MUTEX)
+ KAPI_LOCK_DESC("Protects against concurrent credential changes, ensuring "
+ "an atomic credential transition during exec")
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_END
+
+ KAPI_LOCK(1, "sighand->siglock", KAPI_LOCK_SPINLOCK)
+ KAPI_LOCK_DESC("Protects signal handler modifications; taken during "
+ "signal handler reset and pending signal clearing")
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_END
+
+ KAPI_SIDE_EFFECT_COUNT(7)
+ KAPI_STATE_TRANS_COUNT(6)
+
+ .error_count = 20,
+ .param_count = 5,
+ .since_version = "3.19",
+ .examples = "/* Execute /bin/echo using AT_FDCWD */\n"
+ "char *argv[] = { \"echo\", \"hello\", NULL };\n"
+ "char *envp[] = { \"PATH=/bin\", NULL };\n"
+ "execveat(AT_FDCWD, \"/bin/echo\", argv, envp, 0);\n\n"
+ "/* Execute via file descriptor */\n"
+ "int fd = open(\"/bin/echo\", O_PATH);\n"
+ "execveat(fd, \"\", argv, envp, AT_EMPTY_PATH);\n\n"
+ "/* Execute relative to directory fd */\n"
+ "int dirfd = open(\"/bin\", O_RDONLY | O_DIRECTORY);\n"
+ "execveat(dirfd, \"echo\", argv, envp, 0);",
+ .notes = "execveat() was added to allow fexecve() to be implemented on systems that "
+ "do not have /proc mounted. When filename is an empty string and AT_EMPTY_PATH "
+ "is specified, the file descriptor fd specifies the file to be executed. "
+ "AT_SYMLINK_NOFOLLOW makes the call fail with ELOOP if filename refers to a symbolic link. "
+ "AT_EXECVE_CHECK (since Linux 6.14) only checks whether execution would be allowed, "
+ "without actually executing. Like execve(), on success execveat() does not return "
+ "(except with AT_EXECVE_CHECK, which returns 0).",
+ .signal_count = 8,
+ .lock_count = 2,
+KAPI_END_SPEC;
+
SYSCALL_DEFINE5(execveat,
int, fd, const char __user *, filename,
const char __user *const __user *, argv,
--
2.39.5
* [RFC 10/19] mm/mlock: add API specification for mlock
2025-06-14 13:48 [RFC 00/19] Kernel API Specification Framework Sasha Levin
` (8 preceding siblings ...)
2025-06-14 13:48 ` [RFC 09/19] exec: add API specification for execveat Sasha Levin
@ 2025-06-14 13:48 ` Sasha Levin
2025-06-14 13:48 ` [RFC 11/19] mm/mlock: add API specification for mlock2 Sasha Levin
` (10 subsequent siblings)
20 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-06-14 13:48 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-api, workflows, tools, Sasha Levin
Add kernel API specification for the mlock() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/mlock.c | 105 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 105 insertions(+)
diff --git a/mm/mlock.c b/mm/mlock.c
index 3cb72b579ffd3..a37102df54b01 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -25,6 +25,7 @@
#include <linux/memcontrol.h>
#include <linux/mm_inline.h>
#include <linux/secretmem.h>
+#include <linux/syscall_api_spec.h>
#include "internal.h"
@@ -658,6 +659,110 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla
return 0;
}
+
+DEFINE_KERNEL_API_SPEC(sys_mlock)
+ KAPI_DESCRIPTION("Lock pages in memory")
+ KAPI_LONG_DESC("Locks pages in the specified address range into RAM, "
+ "preventing them from being paged to swap. The caller needs either the "
+ "CAP_IPC_LOCK capability or a sufficient RLIMIT_MEMLOCK resource limit.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "start", "unsigned long", "Starting address of memory range to lock")
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ .constraints = "Rounded down to page boundary",
+ KAPI_PARAM_END
+ KAPI_PARAM(1, "len", "size_t", "Length of memory range to lock in bytes")
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ KAPI_PARAM_RANGE(0, LONG_MAX)
+ .constraints = "Rounded up to page boundary",
+ KAPI_PARAM_END
+
+ .return_spec = {
+ .type_name = "long",
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ .success_value = 0,
+ .description = "0 on success, negative error code on failure",
+ },
+
+ KAPI_ERROR(0, -ENOMEM, "ENOMEM", "Address range or limit issue",
+ "Some of the specified range is not mapped, the lock would exceed RLIMIT_MEMLOCK without "
+ "CAP_IPC_LOCK, or it would cause the number of mapped regions to exceed the limit.")
+ KAPI_ERROR(1, -EPERM, "EPERM", "Insufficient privileges",
+ "The caller is not privileged (no CAP_IPC_LOCK) and RLIMIT_MEMLOCK is 0.")
+ KAPI_ERROR(2, -EINVAL, "EINVAL", "Address overflow",
+ "The result of the addition start+len was less than start (arithmetic overflow).")
+ KAPI_ERROR(3, -EAGAIN, "EAGAIN", "Some or all memory could not be locked",
+ "Some or all of the specified address range could not be locked.")
+ KAPI_ERROR(4, -EINTR, "EINTR", "Interrupted by signal",
+ "The operation was interrupted by a fatal signal before completion.")
+
+ .error_count = 5,
+ .param_count = 2,
+ .since_version = "2.0",
+
+ .locks[0] = {
+ .lock_name = "mmap_lock",
+ .lock_type = KAPI_LOCK_RWLOCK,
+ .acquired = true,
+ .released = true,
+ .description = "Process memory map write lock",
+ },
+ .lock_count = 1,
+
+ /* Signal specifications */
+ .signal_count = 1,
+
+ /* Fatal signals can interrupt mmap_write_lock_killable */
+ KAPI_SIGNAL(0, 0, "FATAL", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("Fatal signal pending")
+ KAPI_SIGNAL_DESC("Fatal signals (SIGKILL, etc.) can interrupt the operation "
+ "when acquiring mmap_write_lock_killable(), causing -EINTR return")
+ KAPI_SIGNAL_END
+
+ .examples = "mlock(addr, 4096); // Lock one page\n"
+ "mlock(addr, len); // Lock range of pages",
+ .notes = "Memory locks do not stack - multiple calls on the same range can be "
+ "undone by a single munlock. Locks are not inherited by child processes. "
+ "Pages are locked on whole page boundaries.",
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY,
+ "process memory",
+ "Locks pages into physical memory, preventing swapping")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE,
+ "mm->locked_vm",
+ "Increases process locked memory counter")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_ALLOC_MEMORY,
+ "physical pages",
+ "May allocate and populate page table entries")
+ KAPI_EFFECT_CONDITION("Pages not already present")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(3)
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "memory pages", "swappable", "locked in RAM",
+ "Pages become non-swappable and pinned in physical memory")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "VMA flags", "unlocked", "VM_LOCKED set",
+ "Virtual memory area marked as locked")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS_COUNT(2)
+KAPI_END_SPEC;
+
SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
{
return do_mlock(start, len, VM_LOCKED);
--
2.39.5
* [RFC 11/19] mm/mlock: add API specification for mlock2
2025-06-14 13:48 [RFC 00/19] Kernel API Specification Framework Sasha Levin
` (9 preceding siblings ...)
2025-06-14 13:48 ` [RFC 10/19] mm/mlock: add API specification for mlock Sasha Levin
@ 2025-06-14 13:48 ` Sasha Levin
2025-06-14 13:48 ` [RFC 12/19] mm/mlock: add API specification for mlockall Sasha Levin
` (9 subsequent siblings)
20 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-06-14 13:48 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-api, workflows, tools, Sasha Levin
Add kernel API specification for the mlock2() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/mlock.c | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 148 insertions(+)
diff --git a/mm/mlock.c b/mm/mlock.c
index a37102df54b01..af2ab78acc226 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -768,6 +768,154 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
return do_mlock(start, len, VM_LOCKED);
}
+
+DEFINE_KERNEL_API_SPEC(sys_mlock2)
+ KAPI_DESCRIPTION("Lock pages in memory with flags")
+ KAPI_LONG_DESC("Enhanced version of mlock() that supports flags. "
+ "MLOCK_ONFAULT flag allows locking pages on fault rather than immediately.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "start", "unsigned long", "Starting address of memory range to lock")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ .constraints = "Rounded down to page boundary",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "len", "size_t", "Length of memory range to lock in bytes")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ KAPI_PARAM_RANGE(0, LONG_MAX)
+ .constraints = "Rounded up to page boundary",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "flags", "int", "Flags controlling lock behavior")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_INT,
+ .constraint_type = KAPI_CONSTRAINT_MASK,
+ .valid_mask = MLOCK_ONFAULT,
+ .constraints = "Only MLOCK_ONFAULT flag is currently supported",
+ KAPI_PARAM_END
+
+ /* Return specification */
+ KAPI_RETURN("long", "0 on success, negative error code on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ .success_value = 0,
+ KAPI_RETURN_END
+
+ /* Error codes */
+ KAPI_ERROR(0, -EINVAL, "EINVAL", "Invalid flags", "Unknown flags were specified (flags & ~MLOCK_ONFAULT).")
+ KAPI_ERROR(1, -ENOMEM, "ENOMEM", "Address range or limit issue", "Some of the specified range is not mapped, the lock would exceed RLIMIT_MEMLOCK without CAP_IPC_LOCK, or it would cause the number of mapped regions to exceed the limit.")
+ KAPI_ERROR(2, -EPERM, "EPERM", "Insufficient privileges", "The caller is not privileged (no CAP_IPC_LOCK) and RLIMIT_MEMLOCK is 0.")
+ KAPI_ERROR(3, -EAGAIN, "EAGAIN", "Some or all memory could not be locked", "Some or all of the specified address range could not be locked.")
+ KAPI_ERROR(4, -EINTR, "EINTR", "Interrupted by signal", "The operation was interrupted by a fatal signal before completion.")
+
+ /* Signal specifications */
+ KAPI_SIGNAL(0, 0, "FATAL_SIGNALS", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("Fatal signal pending during mmap_write_lock_killable")
+ KAPI_SIGNAL_DESC("Fatal signals (SIGKILL, SIGTERM, etc.) can interrupt the operation when acquiring mmap_write_lock_killable(), causing -EINTR return")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(1, SIGBUS, "SIGBUS", KAPI_SIGNAL_SEND, KAPI_SIGNAL_ACTION_DEFAULT)
+ KAPI_SIGNAL_TARGET("Current process")
+ KAPI_SIGNAL_CONDITION("Memory access to locked page fails")
+ KAPI_SIGNAL_DESC("Can be generated if accessing a locked page that cannot be brought into memory (e.g., truncated file mapping)")
+ KAPI_SIGNAL_END
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY,
+ "process memory",
+ "Locks pages into physical memory, preventing swapping")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_EFFECT_CONDITION("Pages become resident in RAM")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE,
+ "mm->locked_vm",
+ "Increases process locked memory counter")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_EFFECT_CONDITION("Counted against RLIMIT_MEMLOCK")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_ALLOC_MEMORY,
+ "page tables",
+ "May allocate and populate page table entries")
+ KAPI_EFFECT_CONDITION("Pages not already present")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_MODIFY_STATE,
+ "VMA flags",
+ "Sets VM_LOCKED and optionally VM_LOCKONFAULT on affected VMAs")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(4, KAPI_EFFECT_MODIFY_STATE,
+ "page fault behavior",
+ "With MLOCK_ONFAULT, changes how future page faults are handled")
+ KAPI_EFFECT_CONDITION("MLOCK_ONFAULT flag specified")
+ KAPI_SIDE_EFFECT_END
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "memory pages",
+ "swappable", "locked in RAM",
+ "Pages become non-swappable and pinned in physical memory")
+ KAPI_STATE_TRANS_COND("Without MLOCK_ONFAULT")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "VMA flags",
+ "unlocked", "VM_LOCKED set",
+ "Virtual memory area marked as locked")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "VMA flags",
+ "normal fault", "VM_LOCKONFAULT set",
+ "VMA marked to lock pages on future faults")
+ KAPI_STATE_TRANS_COND("MLOCK_ONFAULT flag specified")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(3, "page residency",
+ "may be swapped", "resident in memory",
+ "Pages brought into RAM and kept there")
+ KAPI_STATE_TRANS_COND("Without MLOCK_ONFAULT")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(4, "process statistics",
+ "normal memory accounting", "locked memory accounting",
+ "Memory counted against RLIMIT_MEMLOCK")
+ KAPI_STATE_TRANS_END
+
+ /* Locking information */
+ KAPI_LOCK(0, "mmap_lock", KAPI_LOCK_RWLOCK)
+ KAPI_LOCK_DESC("Process memory map write lock; protects VMA "
+ "modifications during the lock operation")
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_END
+
+ KAPI_LOCK(1, "lru_lock", KAPI_LOCK_SPINLOCK)
+ KAPI_LOCK_DESC("Per-memcg LRU list lock; taken when moving locked "
+ "pages to the unevictable list")
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_END
+
+ .error_count = 5,
+ .param_count = 3,
+ .since_version = "4.4",
+ .signal_count = 2,
+ .side_effect_count = 5,
+ .state_trans_count = 5,
+ .lock_count = 2,
+ .examples = "mlock2(addr, len, 0); // Same as mlock()\n"
+ "mlock2(addr, len, MLOCK_ONFAULT); // Lock on fault",
+ .notes = "MLOCK_ONFAULT flag defers actual page locking until pages are accessed. "
+ "Memory locks do not stack. Locks are not inherited by child processes.",
+KAPI_END_SPEC;
+
SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
{
vm_flags_t vm_flags = VM_LOCKED;
--
2.39.5
* [RFC 12/19] mm/mlock: add API specification for mlockall
2025-06-14 13:48 [RFC 00/19] Kernel API Specification Framework Sasha Levin
` (10 preceding siblings ...)
2025-06-14 13:48 ` [RFC 11/19] mm/mlock: add API specification for mlock2 Sasha Levin
@ 2025-06-14 13:48 ` Sasha Levin
2025-06-14 13:48 ` [RFC 13/19] mm/mlock: add API specification for munlock Sasha Levin
` (8 subsequent siblings)
20 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-06-14 13:48 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-api, workflows, tools, Sasha Levin
Add kernel API specification for the mlockall() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/mlock.c | 144 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 144 insertions(+)
diff --git a/mm/mlock.c b/mm/mlock.c
index af2ab78acc226..95ee707c5922f 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -997,6 +997,150 @@ static int apply_mlockall_flags(int flags)
return 0;
}
+
+DEFINE_KERNEL_API_SPEC(sys_mlockall)
+ KAPI_DESCRIPTION("Lock all process pages in memory")
+ KAPI_LONG_DESC("Locks all pages mapped into the process address space. "
+ "MCL_CURRENT locks current pages, MCL_FUTURE locks future mappings, "
+ "MCL_ONFAULT defers locking until page fault.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "flags", "int", "Flags controlling which pages to lock")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_INT,
+ .constraint_type = KAPI_CONSTRAINT_MASK,
+ .valid_mask = MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT,
+ .constraints = "Must specify MCL_CURRENT and/or MCL_FUTURE; MCL_ONFAULT can be OR'd",
+ KAPI_PARAM_END
+
+ /* Return specification */
+ KAPI_RETURN("long", "0 on success, negative error code on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ .success_value = 0,
+ KAPI_RETURN_END
+
+ /* Error codes */
+ KAPI_ERROR(0, -EINVAL, "EINVAL", "Invalid flags", "Invalid combination of flags specified, or no flags set, or only MCL_ONFAULT without MCL_CURRENT or MCL_FUTURE.")
+ KAPI_ERROR(1, -EPERM, "EPERM", "Insufficient privileges", "The caller is not privileged (no CAP_IPC_LOCK) and RLIMIT_MEMLOCK is 0.")
+ KAPI_ERROR(2, -ENOMEM, "ENOMEM", "Insufficient resources", "MCL_CURRENT is set and total VM size exceeds RLIMIT_MEMLOCK and caller lacks CAP_IPC_LOCK.")
+ KAPI_ERROR(3, -EINTR, "EINTR", "Interrupted by signal", "The operation was interrupted by a signal before completion.")
+ KAPI_ERROR(4, -EAGAIN, "EAGAIN", "Some memory could not be locked", "Some pages could not be locked, possibly due to memory pressure.")
+
+ /* Signal specifications */
+ KAPI_SIGNAL(0, 0, "FATAL_SIGNALS", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("Fatal signal pending during mmap_write_lock_killable")
+ KAPI_SIGNAL_DESC("Fatal signals (SIGKILL, SIGTERM, etc.) can interrupt the operation when acquiring mmap_write_lock_killable(), causing -EINTR return")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(1, SIGBUS, "SIGBUS", KAPI_SIGNAL_SEND, KAPI_SIGNAL_ACTION_DEFAULT)
+ KAPI_SIGNAL_TARGET("Current process")
+ KAPI_SIGNAL_CONDITION("Memory access to locked page fails")
+ KAPI_SIGNAL_DESC("Can be generated later if accessing a locked page that cannot be brought into memory (e.g., truncated file mapping)")
+ KAPI_SIGNAL_END
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY,
+ "all process memory",
+ "Locks all current pages into physical memory, preventing swapping")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_EFFECT_CONDITION("MCL_CURRENT flag set")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE,
+ "mm->def_flags",
+ "Sets VM_LOCKED in default flags for future mappings")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_EFFECT_CONDITION("MCL_FUTURE flag set")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_MODIFY_STATE,
+ "mm->locked_vm",
+ "Increases process locked memory counter for entire address space")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_EFFECT_CONDITION("MCL_CURRENT flag set")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_ALLOC_MEMORY,
+ "page tables",
+ "May allocate and populate page table entries for all mappings")
+ KAPI_EFFECT_CONDITION("MCL_CURRENT without MCL_ONFAULT")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(4, KAPI_EFFECT_MODIFY_STATE,
+ "VMA flags",
+ "Sets VM_LOCKED on all existing VMAs")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_EFFECT_CONDITION("MCL_CURRENT flag set")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(5, KAPI_EFFECT_SCHEDULE,
+ "mm_populate",
+ "Triggers population of entire address space")
+ KAPI_EFFECT_CONDITION("MCL_CURRENT without MCL_ONFAULT")
+ KAPI_SIDE_EFFECT_END
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "all memory pages",
+ "swappable", "locked in RAM",
+ "All pages in process become non-swappable")
+ KAPI_STATE_TRANS_COND("MCL_CURRENT flag set")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "future mappings",
+ "normal", "auto-locked",
+ "New mappings will be automatically locked")
+ KAPI_STATE_TRANS_COND("MCL_FUTURE flag set")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "VMA flags",
+ "varied", "all VM_LOCKED",
+ "All virtual memory areas marked as locked")
+ KAPI_STATE_TRANS_COND("MCL_CURRENT flag set")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(3, "page fault behavior",
+ "normal faulting", "lock on fault",
+ "Pages locked when faulted in rather than immediately")
+ KAPI_STATE_TRANS_COND("MCL_ONFAULT flag set")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(4, "process statistics",
+ "partial locked memory", "all memory locked",
+ "Entire VM size counted against RLIMIT_MEMLOCK")
+ KAPI_STATE_TRANS_COND("MCL_CURRENT flag set")
+ KAPI_STATE_TRANS_END
+
+ /* Locking information */
+ KAPI_LOCK(0, "mmap_lock", KAPI_LOCK_RWLOCK)
+ KAPI_LOCK_DESC("Process memory map write lock")
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Protects VMA modifications during mlockall operation")
+ KAPI_LOCK_END
+
+ KAPI_LOCK(1, "lru_lock", KAPI_LOCK_SPINLOCK)
+ KAPI_LOCK_DESC("Per-memcg LRU list lock")
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Taken when moving pages to unevictable list for all locked pages")
+ KAPI_LOCK_END
+
+ .error_count = 5,
+ .param_count = 1,
+ .since_version = "2.0",
+ .signal_count = 2,
+ .side_effect_count = 6,
+ .state_trans_count = 5,
+ .lock_count = 2,
+ .examples = "mlockall(MCL_CURRENT); // Lock current mappings\n"
+ "mlockall(MCL_CURRENT | MCL_FUTURE); // Lock current and future\n"
+ "mlockall(MCL_CURRENT | MCL_ONFAULT); // Lock current on fault",
+ .notes = "Affects all current VMAs and optionally future mappings via mm->def_flags",
+KAPI_END_SPEC;
+
SYSCALL_DEFINE1(mlockall, int, flags)
{
unsigned long lock_limit;
--
2.39.5
* [RFC 13/19] mm/mlock: add API specification for munlock
2025-06-14 13:48 [RFC 00/19] Kernel API Specification Framework Sasha Levin
` (11 preceding siblings ...)
2025-06-14 13:48 ` [RFC 12/19] mm/mlock: add API specification for mlockall Sasha Levin
@ 2025-06-14 13:48 ` Sasha Levin
2025-06-14 13:48 ` [RFC 14/19] mm/mlock: add API specification for munlockall Sasha Levin
` (7 subsequent siblings)
20 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-06-14 13:48 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-api, workflows, tools, Sasha Levin
Add kernel API specification for the munlock() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
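Note for reviewers: the parameter constraints below ("rounded down to page boundary", "rounded up to page boundary") and the EINVAL overflow case can be illustrated roughly as follows. This is a user-space sketch under stated assumptions: a fixed 4 KiB page size, hypothetical `sketch_*` names, and any arithmetic overflow treated as -EINVAL, which is a simplification of what the kernel does.

```c
#include <errno.h>
#include <limits.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

struct sketch_range {
	unsigned long start;	/* page-aligned start address */
	unsigned long len;	/* length in bytes, multiple of the page size */
};

/*
 * Round start down and len up to page boundaries, then reject ranges whose
 * end wraps around the address space. Returns 0 or -EINVAL.
 */
static int sketch_normalize_range(unsigned long start, unsigned long len,
				  struct sketch_range *out)
{
	unsigned long offset = start & ~SKETCH_PAGE_MASK;
	unsigned long aligned_start = start & SKETCH_PAGE_MASK;
	unsigned long aligned_len;

	/* Guard the round-up itself against wrapping. */
	if (len > ULONG_MAX - offset - (SKETCH_PAGE_SIZE - 1))
		return -EINVAL;
	aligned_len = (len + offset + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK;

	/* start + len must not be smaller than start. */
	if (aligned_start + aligned_len < aligned_start)
		return -EINVAL;

	out->start = aligned_start;
	out->len = aligned_len;
	return 0;
}
```

For example, a one-byte request at address 4097 covers the whole page starting at 4096.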
mm/mlock.c | 129 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 129 insertions(+)
diff --git a/mm/mlock.c b/mm/mlock.c
index 95ee707c5922f..ef691adc78ad7 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -929,6 +929,135 @@ SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
return do_mlock(start, len, vm_flags);
}
+
+DEFINE_KERNEL_API_SPEC(sys_munlock)
+ KAPI_DESCRIPTION("Unlock pages in memory")
+ KAPI_LONG_DESC("Unlocks pages in the specified address range, allowing them "
+ "to be paged out to swap if needed.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "start", "unsigned long", "Starting address of memory range to unlock")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ .constraint_type = KAPI_CONSTRAINT_NONE,
+ .constraints = "Rounded down to page boundary",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "len", "size_t", "Length of memory range to unlock in bytes")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ KAPI_PARAM_RANGE(0, LONG_MAX)
+ .constraints = "Rounded up to page boundary",
+ KAPI_PARAM_END
+
+ /* Return specification */
+ KAPI_RETURN("long", "0 on success, negative error code on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ .success_value = 0,
+ KAPI_RETURN_END
+
+ /* Error codes */
+ KAPI_ERROR(0, -ENOMEM, "ENOMEM", "Memory range not mapped", "(Linux 2.6.9 and later) Some of the specified address range does not correspond to mapped pages in the process address space.")
+ KAPI_ERROR(1, -EINTR, "EINTR", "Interrupted by signal", "The operation was interrupted by a signal before completion.")
+ KAPI_ERROR(2, -EINVAL, "EINVAL", "Address overflow", "The result of the addition start+len was less than start (arithmetic overflow).")
+
+ /* Signal specifications */
+ KAPI_SIGNAL(0, 0, "FATAL_SIGNALS", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("Fatal signal pending during mmap_write_lock_killable")
+ KAPI_SIGNAL_DESC("Fatal signals (SIGKILL, SIGTERM, etc.) can interrupt the operation when acquiring mmap_write_lock_killable(), causing -EINTR return")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_MODIFY_STATE,
+ "process memory",
+ "Unlocks pages, making them eligible for swapping")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_EFFECT_CONDITION("Pages were previously locked")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE,
+ "mm->locked_vm",
+ "Decreases process locked memory counter")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_EFFECT_CONDITION("Pages were counted in locked_vm")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_MODIFY_STATE,
+ "VMA flags",
+ "Clears VM_LOCKED and VM_LOCKONFAULT from affected VMAs")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_MODIFY_STATE,
+ "page flags",
+ "Clears PG_mlocked flag from unlocked pages")
+ KAPI_EFFECT_CONDITION("Pages had PG_mlocked set")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(4, KAPI_EFFECT_MODIFY_STATE,
+ "LRU lists",
+ "Moves pages from unevictable to appropriate LRU list")
+ KAPI_EFFECT_CONDITION("Pages were on unevictable list")
+ KAPI_SIDE_EFFECT_END
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "memory pages",
+ "locked in RAM", "swappable",
+ "Pages become eligible for swap out")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "VMA flags",
+ "VM_LOCKED set", "VM_LOCKED cleared",
+ "Virtual memory areas no longer marked as locked")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "page residency",
+ "guaranteed resident", "may be swapped",
+ "Pages can now be evicted under memory pressure")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(3, "process statistics",
+ "locked memory accounted", "normal memory accounting",
+ "Memory no longer counted against RLIMIT_MEMLOCK")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(4, "page LRU status",
+ "unevictable list", "active/inactive list",
+ "Pages moved to normal LRU lists for reclaim")
+ KAPI_STATE_TRANS_COND("Pages were mlocked")
+ KAPI_STATE_TRANS_END
+
+ /* Locking information */
+ KAPI_LOCK(0, "mmap_lock", KAPI_LOCK_RWLOCK)
+ KAPI_LOCK_DESC("Process memory map write lock")
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Protects VMA modifications during unlock operation")
+ KAPI_LOCK_END
+
+ KAPI_LOCK(1, "lru_lock", KAPI_LOCK_SPINLOCK)
+ KAPI_LOCK_DESC("Per-memcg LRU list lock")
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Taken when moving pages from unevictable to normal LRU lists")
+ KAPI_LOCK_END
+
+ .error_count = 3,
+ .param_count = 2,
+ .since_version = "2.0",
+ .signal_count = 1,
+ .side_effect_count = 5,
+ .state_trans_count = 5,
+ .lock_count = 2,
+ .examples = "munlock(addr, 4096); // Unlock one page\n"
+ "munlock(addr, len); // Unlock range of pages",
+ .notes = "No special permissions required to unlock memory",
+KAPI_END_SPEC;
+
SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
{
int ret;
--
2.39.5
* [RFC 14/19] mm/mlock: add API specification for munlockall
2025-06-14 13:48 [RFC 00/19] Kernel API Specification Framework Sasha Levin
` (12 preceding siblings ...)
2025-06-14 13:48 ` [RFC 13/19] mm/mlock: add API specification for munlock Sasha Levin
@ 2025-06-14 13:48 ` Sasha Levin
2025-06-14 13:48 ` [RFC 15/19] kernel/api: add debugfs interface for kernel API specifications Sasha Levin
` (6 subsequent siblings)
20 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-06-14 13:48 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-api, workflows, tools, Sasha Levin
Add kernel API specification for the munlockall() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
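Note for reviewers: from user space, the behaviour specified below is typically paired with mlockall(). A minimal sketch of that pairing, assuming a hypothetical helper name `with_locked_memory()` and a caller-supplied callback; whether mlockall() succeeds depends on RLIMIT_MEMLOCK and CAP_IPC_LOCK in the calling environment.

```c
#include <errno.h>
#include <sys/mman.h>

/*
 * Lock all current mappings, run an optional critical section, then unlock
 * everything again. Returns 0 on success or a negative errno value.
 */
static int with_locked_memory(void (*critical)(void *), void *arg)
{
	if (mlockall(MCL_CURRENT) == -1)
		return -errno;	/* e.g. EPERM or ENOMEM under a low memlock limit */

	if (critical)
		critical(arg);

	/* Drops every lock held by the process, including MCL_FUTURE state. */
	if (munlockall() == -1)
		return -errno;

	return 0;
}
```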
mm/mlock.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 120 insertions(+)
diff --git a/mm/mlock.c b/mm/mlock.c
index ef691adc78ad7..80f51e932aa95 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -1299,6 +1299,126 @@ SYSCALL_DEFINE1(mlockall, int, flags)
return ret;
}
+
+DEFINE_KERNEL_API_SPEC(sys_munlockall)
+ KAPI_DESCRIPTION("Unlock all process pages")
+ KAPI_LONG_DESC("Unlocks all pages mapped into the process address space and "
+ "clears the MCL_FUTURE flag if set.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* No parameters - this is a SYSCALL_DEFINE0 */
+ .param_count = 0,
+
+ /* Return specification */
+ KAPI_RETURN("long", "0 on success, negative error code on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ .success_value = 0,
+ KAPI_RETURN_END
+
+ /* Error codes */
+ KAPI_ERROR(0, -EINTR, "EINTR", "Interrupted by signal", "The operation was interrupted by a signal before completion.")
+ KAPI_ERROR(1, -ENOMEM, "ENOMEM", "Memory operation failed", "Failed to modify memory mappings (should not normally occur).")
+
+ /* Signal specifications */
+ KAPI_SIGNAL(0, 0, "FATAL_SIGNALS", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("Fatal signal pending during mmap_write_lock_killable")
+ KAPI_SIGNAL_DESC("Fatal signals (SIGKILL, SIGTERM, etc.) can interrupt the operation when acquiring mmap_write_lock_killable(), causing -EINTR return")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_MODIFY_STATE,
+ "all process memory",
+ "Unlocks all pages, making entire address space swappable")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_EFFECT_CONDITION("Process had locked pages")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE,
+ "mm->def_flags",
+ "Clears VM_LOCKED from default flags for future mappings")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_EFFECT_CONDITION("MCL_FUTURE was previously set")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_MODIFY_STATE,
+ "mm->locked_vm",
+ "Resets process locked memory counter to zero")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_MODIFY_STATE,
+ "all VMA flags",
+ "Clears VM_LOCKED and VM_LOCKONFAULT from all VMAs")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(4, KAPI_EFFECT_MODIFY_STATE,
+ "page flags",
+ "Clears PG_mlocked flag from all locked pages")
+ KAPI_EFFECT_CONDITION("Pages had PG_mlocked set")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(5, KAPI_EFFECT_MODIFY_STATE,
+ "LRU lists",
+ "Moves all pages from unevictable to normal LRU lists")
+ KAPI_EFFECT_CONDITION("Pages were on unevictable list")
+ KAPI_SIDE_EFFECT_END
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "all memory pages",
+ "locked in RAM", "swappable",
+ "All pages in process become eligible for swap out")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "future mappings",
+ "auto-locked", "normal",
+ "New mappings will no longer be automatically locked")
+ KAPI_STATE_TRANS_COND("MCL_FUTURE was set")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "all VMA flags",
+ "VM_LOCKED set", "VM_LOCKED cleared",
+ "All virtual memory areas no longer marked as locked")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(3, "process statistics",
+ "all memory locked", "no memory locked",
+ "Entire locked memory accounting reset to zero")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(4, "page LRU status",
+ "unevictable list", "active/inactive list",
+ "All pages moved to normal LRU lists for reclaim")
+ KAPI_STATE_TRANS_COND("Pages were mlocked")
+ KAPI_STATE_TRANS_END
+
+ /* Locking information */
+ KAPI_LOCK(0, "mmap_lock", KAPI_LOCK_RWLOCK)
+ KAPI_LOCK_DESC("Process memory map write lock")
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Protects VMA modifications during munlockall operation")
+ KAPI_LOCK_END
+
+ KAPI_LOCK(1, "lru_lock", KAPI_LOCK_SPINLOCK)
+ KAPI_LOCK_DESC("Per-memcg LRU list lock")
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Taken when moving all pages from unevictable to normal LRU lists")
+ KAPI_LOCK_END
+
+ .error_count = 2,
+ .since_version = "2.0",
+ .signal_count = 1,
+ .side_effect_count = 6,
+ .state_trans_count = 5,
+ .lock_count = 2,
+ .examples = "munlockall(); // Unlock all pages",
+ .notes = "Clears VM_LOCKED and VM_LOCKONFAULT from all VMAs and mm->def_flags",
+KAPI_END_SPEC;
+
SYSCALL_DEFINE0(munlockall)
{
int ret;
--
2.39.5
* [RFC 15/19] kernel/api: add debugfs interface for kernel API specifications
2025-06-14 13:48 [RFC 00/19] Kernel API Specification Framework Sasha Levin
` (13 preceding siblings ...)
2025-06-14 13:48 ` [RFC 14/19] mm/mlock: add API specification for munlockall Sasha Levin
@ 2025-06-14 13:48 ` Sasha Levin
2025-06-14 13:48 ` [RFC 16/19] kernel/api: add IOCTL specification infrastructure Sasha Levin
` (5 subsequent siblings)
20 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-06-14 13:48 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-api, workflows, tools, Sasha Levin
Add a debugfs interface to expose kernel API specifications at runtime.
This allows tools and users to query the complete API specifications
through the debugfs filesystem.
The interface provides:
- /sys/kernel/debug/kapi/list - lists all available API specifications
- /sys/kernel/debug/kapi/specs/<name> - detailed info for each API
Each specification file includes:
- Function name, version, and descriptions
- Execution context requirements and flags
- Parameter details with types, flags, and constraints
- Return value specifications and success conditions
- Error codes with descriptions and conditions
- Locking requirements and constraints
- Signal handling specifications
- Examples, notes, and deprecation status
This enables runtime introspection of kernel APIs for documentation
tools, static analyzers, and debugging purposes.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
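Note for reviewers: a user-space consumer of this interface could look roughly like the sketch below. It assumes debugfs is mounted at /sys/kernel/debug and that spec files are named after the spec (for example "sys_mlockall"); the `kapi_spec_path()` and `kapi_dump_spec()` helpers are hypothetical and not part of this series.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define KAPI_DEBUGFS_SPECS "/sys/kernel/debug/kapi/specs"

/* Build the path of one spec file; rejects empty names and path separators. */
static int kapi_spec_path(char *buf, size_t len, const char *name)
{
	int n;

	if (name[0] == '\0' || strchr(name, '/'))
		return -EINVAL;

	n = snprintf(buf, len, "%s/%s", KAPI_DEBUGFS_SPECS, name);
	if (n < 0 || (size_t)n >= len)
		return -ENAMETOOLONG;
	return 0;
}

/* Copy the text of one spec file to 'out'. Returns 0 or a negative errno. */
static int kapi_dump_spec(const char *name, FILE *out)
{
	char path[256];
	char line[512];
	FILE *f;
	int rc;

	rc = kapi_spec_path(path, sizeof(path), name);
	if (rc)
		return rc;

	f = fopen(path, "r");
	if (!f)
		return -errno;	/* typically ENOENT or EACCES */

	while (fgets(line, sizeof(line), f))
		fputs(line, out);

	fclose(f);
	return 0;
}
```

Calling `kapi_dump_spec("sys_mlockall", stdout)` as root on a kernel built with CONFIG_KAPI_SPEC_DEBUGFS would be expected to print the text produced by kapi_spec_show().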
kernel/api/Kconfig | 20 +++
kernel/api/Makefile | 5 +-
kernel/api/kapi_debugfs.c | 340 ++++++++++++++++++++++++++++++++++++++
3 files changed, 364 insertions(+), 1 deletion(-)
create mode 100644 kernel/api/kapi_debugfs.c
diff --git a/kernel/api/Kconfig b/kernel/api/Kconfig
index fde25ec70e134..d2754b21acc43 100644
--- a/kernel/api/Kconfig
+++ b/kernel/api/Kconfig
@@ -33,3 +33,23 @@ config KAPI_RUNTIME_CHECKS
development. The checks use WARN_ONCE to report violations.
If unsure, say N.
+
+config KAPI_SPEC_DEBUGFS
+ bool "Export kernel API specifications via debugfs"
+ depends on KAPI_SPEC
+ depends on DEBUG_FS
+ help
+ This option enables exporting kernel API specifications through
+ the debugfs filesystem. When enabled, specifications can be
+ accessed at /sys/kernel/debug/kapi/.
+
+ The debugfs interface provides:
+ - A list of all available API specifications
+ - Detailed information for each API including parameters,
+ return values, errors, locking requirements, and constraints
+ - Complete machine-readable representation of the specs
+
+ This is useful for documentation tools, static analyzers, and
+ runtime introspection of kernel APIs.
+
+ If unsure, say N.
diff --git a/kernel/api/Makefile b/kernel/api/Makefile
index 4120ded7e5cf1..07b8c007ec156 100644
--- a/kernel/api/Makefile
+++ b/kernel/api/Makefile
@@ -4,4 +4,7 @@
#
# Core API specification framework
-obj-$(CONFIG_KAPI_SPEC) += kernel_api_spec.o
\ No newline at end of file
+obj-$(CONFIG_KAPI_SPEC) += kernel_api_spec.o
+
+# Debugfs interface for kernel API specs
+obj-$(CONFIG_KAPI_SPEC_DEBUGFS) += kapi_debugfs.o
\ No newline at end of file
diff --git a/kernel/api/kapi_debugfs.c b/kernel/api/kapi_debugfs.c
new file mode 100644
index 0000000000000..bf65ea6a49205
--- /dev/null
+++ b/kernel/api/kapi_debugfs.c
@@ -0,0 +1,340 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Kernel API specification debugfs interface
+ *
+ * This provides a debugfs interface to expose kernel API specifications
+ * at runtime, allowing tools and users to query the complete API specs.
+ */
+
+#include <linux/debugfs.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/seq_file.h>
+#include <linux/kernel_api_spec.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+/* External symbols for kernel API spec section */
+extern struct kernel_api_spec __start_kapi_specs[];
+extern struct kernel_api_spec __stop_kapi_specs[];
+
+static struct dentry *kapi_debugfs_root;
+
+/* Helper function to print parameter type as string */
+static const char *param_type_str(enum kapi_param_type type)
+{
+ switch (type) {
+ case KAPI_TYPE_INT: return "int";
+ case KAPI_TYPE_UINT: return "uint";
+ case KAPI_TYPE_PTR: return "ptr";
+ case KAPI_TYPE_STRUCT: return "struct";
+ case KAPI_TYPE_UNION: return "union";
+ case KAPI_TYPE_ARRAY: return "array";
+ case KAPI_TYPE_FD: return "fd";
+ case KAPI_TYPE_ENUM: return "enum";
+ case KAPI_TYPE_USER_PTR: return "user_ptr";
+ case KAPI_TYPE_PATH: return "path";
+ case KAPI_TYPE_FUNC_PTR: return "func_ptr";
+ case KAPI_TYPE_CUSTOM: return "custom";
+ default: return "unknown";
+ }
+}
+
+/* Helper to print parameter flags */
+static void print_param_flags(struct seq_file *m, u32 flags)
+{
+ seq_puts(m, " flags: ");
+ if (flags & KAPI_PARAM_IN) seq_puts(m, "IN ");
+ if (flags & KAPI_PARAM_OUT) seq_puts(m, "OUT ");
+ if (flags & KAPI_PARAM_INOUT) seq_puts(m, "INOUT ");
+ if (flags & KAPI_PARAM_OPTIONAL) seq_puts(m, "OPTIONAL ");
+ if (flags & KAPI_PARAM_CONST) seq_puts(m, "CONST ");
+ if (flags & KAPI_PARAM_USER) seq_puts(m, "USER ");
+ if (flags & KAPI_PARAM_VOLATILE) seq_puts(m, "VOLATILE ");
+ if (flags & KAPI_PARAM_DMA) seq_puts(m, "DMA ");
+ if (flags & KAPI_PARAM_ALIGNED) seq_puts(m, "ALIGNED ");
+ seq_putc(m, '\n');
+}
+
+/* Helper to print context flags */
+static void print_context_flags(struct seq_file *m, u32 flags)
+{
+ seq_puts(m, "Context flags: ");
+ if (flags & KAPI_CTX_PROCESS) seq_puts(m, "PROCESS ");
+ if (flags & KAPI_CTX_HARDIRQ) seq_puts(m, "HARDIRQ ");
+ if (flags & KAPI_CTX_SOFTIRQ) seq_puts(m, "SOFTIRQ ");
+ if (flags & KAPI_CTX_NMI) seq_puts(m, "NMI ");
+ if (flags & KAPI_CTX_SLEEPABLE) seq_puts(m, "SLEEPABLE ");
+ if (flags & KAPI_CTX_ATOMIC) seq_puts(m, "ATOMIC ");
+ if (flags & KAPI_CTX_PREEMPT_DISABLED) seq_puts(m, "PREEMPT_DISABLED ");
+ if (flags & KAPI_CTX_IRQ_DISABLED) seq_puts(m, "IRQ_DISABLED ");
+ seq_putc(m, '\n');
+}
+
+/* Show function for individual API spec */
+static int kapi_spec_show(struct seq_file *m, void *v)
+{
+ struct kernel_api_spec *spec = m->private;
+ int i;
+
+ seq_printf(m, "Kernel API Specification\n");
+ seq_printf(m, "========================\n\n");
+
+ /* Basic info */
+ seq_printf(m, "Name: %s\n", spec->name);
+ seq_printf(m, "Version: %u\n", spec->version);
+ seq_printf(m, "Description: %s\n", spec->description);
+ if (strlen(spec->long_description) > 0)
+ seq_printf(m, "Long description: %s\n", spec->long_description);
+
+ /* Context */
+ print_context_flags(m, spec->context_flags);
+ seq_printf(m, "\n");
+
+ /* Parameters */
+ if (spec->param_count > 0) {
+ seq_printf(m, "Parameters (%u):\n", spec->param_count);
+ for (i = 0; i < spec->param_count; i++) {
+ struct kapi_param_spec *param = &spec->params[i];
+ seq_printf(m, " [%d] %s:\n", i, param->name);
+ seq_printf(m, " type: %s (%s)\n",
+ param_type_str(param->type), param->type_name);
+ print_param_flags(m, param->flags);
+ if (strlen(param->description) > 0)
+ seq_printf(m, " description: %s\n", param->description);
+ if (param->size > 0)
+ seq_printf(m, " size: %zu\n", param->size);
+ if (param->alignment > 0)
+ seq_printf(m, " alignment: %zu\n", param->alignment);
+
+ /* Print constraints if any */
+ if (param->constraint_type != KAPI_CONSTRAINT_NONE) {
+ seq_printf(m, " constraints:\n");
+ switch (param->constraint_type) {
+ case KAPI_CONSTRAINT_RANGE:
+ seq_printf(m, " type: range\n");
+ seq_printf(m, " min: %lld\n", param->min_value);
+ seq_printf(m, " max: %lld\n", param->max_value);
+ break;
+ case KAPI_CONSTRAINT_MASK:
+ seq_printf(m, " type: mask\n");
+ seq_printf(m, " valid_bits: 0x%llx\n", param->valid_mask);
+ break;
+ case KAPI_CONSTRAINT_ENUM:
+ seq_printf(m, " type: enum\n");
+ seq_printf(m, " count: %u\n", param->enum_count);
+ break;
+ case KAPI_CONSTRAINT_CUSTOM:
+ seq_printf(m, " type: custom\n");
+ if (strlen(param->constraints) > 0)
+ seq_printf(m, " description: %s\n",
+ param->constraints);
+ break;
+ default:
+ break;
+ }
+ }
+ seq_printf(m, "\n");
+ }
+ }
+
+ /* Return value */
+ seq_printf(m, "Return value:\n");
+ seq_printf(m, " type: %s\n", spec->return_spec.type_name);
+ if (strlen(spec->return_spec.description) > 0)
+ seq_printf(m, " description: %s\n", spec->return_spec.description);
+
+ switch (spec->return_spec.check_type) {
+ case KAPI_RETURN_EXACT:
+ seq_printf(m, " success: == %lld\n", spec->return_spec.success_value);
+ break;
+ case KAPI_RETURN_RANGE:
+ seq_printf(m, " success: [%lld, %lld]\n",
+ spec->return_spec.success_min,
+ spec->return_spec.success_max);
+ break;
+ case KAPI_RETURN_FD:
+ seq_printf(m, " success: valid file descriptor (>= 0)\n");
+ break;
+ case KAPI_RETURN_ERROR_CHECK:
+ seq_printf(m, " success: error check\n");
+ break;
+ case KAPI_RETURN_CUSTOM:
+ seq_printf(m, " success: custom check\n");
+ break;
+ default:
+ break;
+ }
+ seq_printf(m, "\n");
+
+ /* Errors */
+ if (spec->error_count > 0) {
+ seq_printf(m, "Errors (%u):\n", spec->error_count);
+ for (i = 0; i < spec->error_count; i++) {
+ struct kapi_error_spec *err = &spec->errors[i];
+ seq_printf(m, " %s (%d): %s\n",
+ err->name, err->error_code, err->description);
+ if (strlen(err->condition) > 0)
+ seq_printf(m, " condition: %s\n", err->condition);
+ }
+ seq_printf(m, "\n");
+ }
+
+ /* Locks */
+ if (spec->lock_count > 0) {
+ seq_printf(m, "Locks (%u):\n", spec->lock_count);
+ for (i = 0; i < spec->lock_count; i++) {
+ struct kapi_lock_spec *lock = &spec->locks[i];
+ const char *type_str;
+ switch (lock->lock_type) {
+ case KAPI_LOCK_MUTEX: type_str = "mutex"; break;
+ case KAPI_LOCK_SPINLOCK: type_str = "spinlock"; break;
+ case KAPI_LOCK_RWLOCK: type_str = "rwlock"; break;
+ case KAPI_LOCK_SEMAPHORE: type_str = "semaphore"; break;
+ case KAPI_LOCK_RCU: type_str = "rcu"; break;
+ case KAPI_LOCK_SEQLOCK: type_str = "seqlock"; break;
+ default: type_str = "unknown"; break;
+ }
+ seq_printf(m, " %s (%s): %s\n",
+ lock->lock_name, type_str, lock->description);
+ if (lock->acquired)
+ seq_printf(m, " acquired by function\n");
+ if (lock->released)
+ seq_printf(m, " released by function\n");
+ }
+ seq_printf(m, "\n");
+ }
+
+ /* Constraints */
+ if (spec->constraint_count > 0) {
+ seq_printf(m, "Additional constraints (%u):\n", spec->constraint_count);
+ for (i = 0; i < spec->constraint_count; i++) {
+ seq_printf(m, " - %s\n", spec->constraints[i].description);
+ }
+ seq_printf(m, "\n");
+ }
+
+ /* Signals */
+ if (spec->signal_count > 0) {
+ seq_printf(m, "Signal handling (%u):\n", spec->signal_count);
+ for (i = 0; i < spec->signal_count; i++) {
+ struct kapi_signal_spec *sig = &spec->signals[i];
+ seq_printf(m, " %s (%d):\n", sig->signal_name, sig->signal_num);
+ seq_printf(m, " direction: ");
+ if (sig->direction & KAPI_SIGNAL_SEND) seq_printf(m, "send ");
+ if (sig->direction & KAPI_SIGNAL_RECEIVE) seq_printf(m, "receive ");
+ if (sig->direction & KAPI_SIGNAL_HANDLE) seq_printf(m, "handle ");
+ if (sig->direction & KAPI_SIGNAL_BLOCK) seq_printf(m, "block ");
+ if (sig->direction & KAPI_SIGNAL_IGNORE) seq_printf(m, "ignore ");
+ seq_printf(m, "\n");
+ seq_printf(m, " action: ");
+ switch (sig->action) {
+ case KAPI_SIGNAL_ACTION_DEFAULT: seq_printf(m, "default"); break;
+ case KAPI_SIGNAL_ACTION_TERMINATE: seq_printf(m, "terminate"); break;
+ case KAPI_SIGNAL_ACTION_COREDUMP: seq_printf(m, "coredump"); break;
+ case KAPI_SIGNAL_ACTION_STOP: seq_printf(m, "stop"); break;
+ case KAPI_SIGNAL_ACTION_CONTINUE: seq_printf(m, "continue"); break;
+ case KAPI_SIGNAL_ACTION_CUSTOM: seq_printf(m, "custom"); break;
+ case KAPI_SIGNAL_ACTION_RETURN: seq_printf(m, "return"); break;
+ case KAPI_SIGNAL_ACTION_RESTART: seq_printf(m, "restart"); break;
+ default: seq_printf(m, "unknown"); break;
+ }
+ seq_printf(m, "\n");
+ if (strlen(sig->description) > 0)
+ seq_printf(m, " description: %s\n", sig->description);
+ }
+ seq_printf(m, "\n");
+ }
+
+ /* Additional info */
+ if (strlen(spec->examples) > 0) {
+ seq_printf(m, "Examples:\n%s\n\n", spec->examples);
+ }
+ if (strlen(spec->notes) > 0) {
+ seq_printf(m, "Notes:\n%s\n\n", spec->notes);
+ }
+ if (strlen(spec->since_version) > 0) {
+ seq_printf(m, "Since: %s\n", spec->since_version);
+ }
+ if (spec->deprecated) {
+ seq_printf(m, "DEPRECATED");
+ if (strlen(spec->replacement) > 0)
+ seq_printf(m, " - use %s instead", spec->replacement);
+ seq_printf(m, "\n");
+ }
+
+ return 0;
+}
+
+static int kapi_spec_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, kapi_spec_show, inode->i_private);
+}
+
+static const struct file_operations kapi_spec_fops = {
+ .open = kapi_spec_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+/* Show all available API specs */
+static int kapi_list_show(struct seq_file *m, void *v)
+{
+ struct kernel_api_spec *spec;
+ int count = 0;
+
+ seq_puts(m, "Available Kernel API Specifications\n");
+ seq_puts(m, "===================================\n\n");
+
+ for (spec = __start_kapi_specs; spec < __stop_kapi_specs; spec++) {
+ seq_printf(m, "%s - %s\n", spec->name, spec->description);
+ count++;
+ }
+
+ seq_printf(m, "\nTotal: %d specifications\n", count);
+ return 0;
+}
+
+static int kapi_list_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, kapi_list_show, NULL);
+}
+
+static const struct file_operations kapi_list_fops = {
+ .open = kapi_list_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static int __init kapi_debugfs_init(void)
+{
+ struct kernel_api_spec *spec;
+ struct dentry *spec_dir;
+
+ /* Create main directory */
+ kapi_debugfs_root = debugfs_create_dir("kapi", NULL);
+
+ /* Create list file */
+ debugfs_create_file("list", 0444, kapi_debugfs_root, NULL, &kapi_list_fops);
+
+ /* Create specs subdirectory */
+ spec_dir = debugfs_create_dir("specs", kapi_debugfs_root);
+
+ /* Create a file for each API spec */
+ for (spec = __start_kapi_specs; spec < __stop_kapi_specs; spec++) {
+ debugfs_create_file(spec->name, 0444, spec_dir, spec, &kapi_spec_fops);
+ }
+
+ pr_info("Kernel API debugfs interface initialized\n");
+ return 0;
+}
+
+static void __exit kapi_debugfs_exit(void)
+{
+ debugfs_remove_recursive(kapi_debugfs_root);
+}
+
+/* Initialize as part of kernel, not as a module */
+fs_initcall(kapi_debugfs_init);
\ No newline at end of file
--
2.39.5
* [RFC 16/19] kernel/api: add IOCTL specification infrastructure
2025-06-14 13:48 [RFC 00/19] Kernel API Specification Framework Sasha Levin
` (14 preceding siblings ...)
2025-06-14 13:48 ` [RFC 15/19] kernel/api: add debugfs interface for kernel API specifications Sasha Levin
@ 2025-06-14 13:48 ` Sasha Levin
2025-06-14 13:48 ` [RFC 17/19] fwctl: add detailed IOCTL API specifications Sasha Levin
` (4 subsequent siblings)
20 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-06-14 13:48 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-api, workflows, tools, Sasha Levin
Add IOCTL API specification support to the kernel API specification
framework. This enables detailed documentation and runtime validation of
IOCTL interfaces.
Key features:
- IOCTL specification structure with command info and parameter details
- Registration/unregistration functions for IOCTL specs
- Helper macros for defining IOCTL specifications
- KAPI_IOCTL_SPEC_DRIVER macro for simplified driver integration
- Runtime validation support with KAPI_DEFINE_FOPS wrapper
- Validation of IOCTL parameters and return values
- Integration with existing kernel API spec infrastructure
The validation framework checks:
- Parameter constraints (ranges, enums, masks)
- User pointer validity
- Buffer size constraints
- Return value correctness against specification
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
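Note for reviewers: the parameter-constraint part of the checks listed above (ranges, enums, masks) amounts to something like the sketch below. It is a stand-alone illustration with hypothetical `sketch_*` types, not the structures added by this patch, and it leaves out the user-pointer and buffer-size checks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_constraint {
	SKETCH_NONE,
	SKETCH_RANGE,
	SKETCH_MASK,
	SKETCH_ENUM,
};

struct sketch_param {
	enum sketch_constraint kind;
	int64_t min, max;		/* SKETCH_RANGE: inclusive bounds */
	uint64_t valid_mask;		/* SKETCH_MASK: permitted bits */
	const int64_t *enum_values;	/* SKETCH_ENUM: permitted values */
	size_t enum_count;
};

/* Return true if 'value' satisfies the constraint described by 'p'. */
static bool sketch_param_ok(const struct sketch_param *p, int64_t value)
{
	size_t i;

	switch (p->kind) {
	case SKETCH_RANGE:
		return value >= p->min && value <= p->max;
	case SKETCH_MASK:
		/* Any bit outside the valid mask is a violation. */
		return ((uint64_t)value & ~p->valid_mask) == 0;
	case SKETCH_ENUM:
		for (i = 0; i < p->enum_count; i++)
			if (p->enum_values[i] == value)
				return true;
		return false;
	case SKETCH_NONE:
	default:
		return true;
	}
}
```

Keeping the constraint as data rather than code is what lets the same description drive both documentation output and runtime checks.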
include/linux/ioctl_api_spec.h | 540 ++++++++++++++++++++++++++++++++
include/linux/kernel_api_spec.h | 2 +-
kernel/api/Makefile | 5 +-
kernel/api/ioctl_validation.c | 360 +++++++++++++++++++++
kernel/api/kernel_api_spec.c | 90 +++++-
5 files changed, 994 insertions(+), 3 deletions(-)
create mode 100644 include/linux/ioctl_api_spec.h
create mode 100644 kernel/api/ioctl_validation.c
diff --git a/include/linux/ioctl_api_spec.h b/include/linux/ioctl_api_spec.h
new file mode 100644
index 0000000000000..ab3337449ad77
--- /dev/null
+++ b/include/linux/ioctl_api_spec.h
@@ -0,0 +1,540 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * ioctl_api_spec.h - IOCTL API specification framework
+ *
+ * Extends the kernel API specification framework to support ioctl validation
+ * and documentation.
+ */
+
+#ifndef _LINUX_IOCTL_API_SPEC_H
+#define _LINUX_IOCTL_API_SPEC_H
+
+#include <linux/kernel_api_spec.h>
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+/* Forward declarations */
+struct file;
+
+/**
+ * struct kapi_ioctl_spec - IOCTL-specific API specification
+ * @api_spec: Base API specification
+ * @cmd: IOCTL command number
+ * @cmd_name: Human-readable command name
+ * @input_size: Size of input structure (0 if none)
+ * @output_size: Size of output structure (0 if none)
+ * @file_ops_name: Name of the file_operations structure
+ */
+struct kapi_ioctl_spec {
+ struct kernel_api_spec api_spec;
+ unsigned int cmd;
+ const char *cmd_name;
+ size_t input_size;
+ size_t output_size;
+ const char *file_ops_name;
+};
+
+/* Registry functions for IOCTL specifications */
+#ifdef CONFIG_KAPI_SPEC
+int kapi_register_ioctl_spec(const struct kapi_ioctl_spec *spec);
+void kapi_unregister_ioctl_spec(unsigned int cmd);
+const struct kapi_ioctl_spec *kapi_get_ioctl_spec(unsigned int cmd);
+
+/* IOCTL validation functions */
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+int kapi_validate_ioctl(struct file *filp, unsigned int cmd, void __user *arg);
+int kapi_validate_ioctl_struct(const struct kapi_ioctl_spec *spec,
+ const void *data, size_t size);
+#else
+static inline int kapi_validate_ioctl(struct file *filp, unsigned int cmd,
+ void __user *arg)
+{
+ return 0;
+}
+#endif /* CONFIG_KAPI_RUNTIME_CHECKS */
+
+#else /* !CONFIG_KAPI_SPEC */
+static inline int kapi_register_ioctl_spec(const struct kapi_ioctl_spec *spec)
+{
+ return 0;
+}
+static inline void kapi_unregister_ioctl_spec(unsigned int cmd) {}
+static inline const struct kapi_ioctl_spec *kapi_get_ioctl_spec(unsigned int cmd)
+{
+ return NULL;
+}
+#endif /* CONFIG_KAPI_SPEC */
+
+/* Helper macros for IOCTL specification */
+
+/**
+ * DEFINE_IOCTL_API_SPEC - Start an IOCTL API specification
+ * @name: Unique identifier for the specification
+ * @cmd: IOCTL command number
+ * @cmd_name_str: String name of the command
+ */
+#define DEFINE_IOCTL_API_SPEC(name, cmd, cmd_name_str) \
+static const struct kapi_ioctl_spec name##_spec = { \
+ .cmd = cmd, \
+ .cmd_name = cmd_name_str, \
+ .api_spec = { \
+ .name = #name,
+
+/**
+ * KAPI_IOCTL_SIZE - Specify input/output structure sizes
+ * @in_size: Size of input structure
+ * @out_size: Size of output structure
+ */
+#define KAPI_IOCTL_SIZE(in_size, out_size) \
+ }, \
+ .input_size = in_size, \
+ .output_size = out_size,
+
+/**
+ * KAPI_IOCTL_FILE_OPS - Specify the file_operations structure name
+ * @ops_name: Name of the file_operations structure
+ */
+#define KAPI_IOCTL_FILE_OPS(ops_name) \
+ .file_ops_name = #ops_name,
+
+/*
+ * Common IOCTL parameter specifications
+ */
+#define KAPI_IOCTL_PARAM_SIZE \
+ KAPI_PARAM(0, "size", "__u32", "Size of the structure") \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN) \
+ .type = KAPI_TYPE_UINT, \
+ .constraint_type = KAPI_CONSTRAINT_CUSTOM, \
+ .constraints = "Must match sizeof(struct)", \
+ KAPI_PARAM_END
+
+#define KAPI_IOCTL_PARAM_FLAGS \
+ KAPI_PARAM(1, "flags", "__u32", "Feature flags") \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN) \
+ .type = KAPI_TYPE_UINT, \
+ .constraint_type = KAPI_CONSTRAINT_MASK, \
+ .valid_mask = 0, /* 0 means no flags currently */ \
+ KAPI_PARAM_END
+
+/**
+ * KAPI_IOCTL_PARAM_USER_BUF - User buffer parameter
+ * @idx: Parameter index
+ * @name: Parameter name
+ * @desc: Parameter description
+ * @len_idx: Index of the length parameter
+ */
+#define KAPI_IOCTL_PARAM_USER_BUF(idx, name, desc, len_idx) \
+ KAPI_PARAM(idx, name, "__aligned_u64", desc) \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER_PTR) \
+ .type = KAPI_TYPE_USER_PTR, \
+ .size_param_idx = len_idx, \
+ KAPI_PARAM_END
+
+/**
+ * KAPI_IOCTL_PARAM_USER_OUT_BUF - User output buffer parameter
+ * @idx: Parameter index
+ * @name: Parameter name
+ * @desc: Parameter description
+ * @len_idx: Index of the length parameter
+ */
+#define KAPI_IOCTL_PARAM_USER_OUT_BUF(idx, name, desc, len_idx) \
+ KAPI_PARAM(idx, name, "__aligned_u64", desc) \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT | KAPI_PARAM_USER_PTR) \
+ .type = KAPI_TYPE_USER_PTR, \
+ .size_param_idx = len_idx, \
+ KAPI_PARAM_END
+
+/**
+ * KAPI_IOCTL_PARAM_LEN - Buffer length parameter
+ * @idx: Parameter index
+ * @name: Parameter name
+ * @desc: Parameter description
+ * @max_size: Maximum allowed size
+ */
+#define KAPI_IOCTL_PARAM_LEN(idx, name, desc, max_size) \
+ KAPI_PARAM(idx, name, "__u32", desc) \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_INOUT) \
+ .type = KAPI_TYPE_UINT, \
+ .constraint_type = KAPI_CONSTRAINT_RANGE, \
+ .min_value = 0, \
+ .max_value = max_size, \
+ KAPI_PARAM_END
+
+/* End the IOCTL specification; @name must match DEFINE_IOCTL_API_SPEC's @name */
+#define KAPI_IOCTL_END_SPEC(name) \
+}; \
+ \
+static int __init name##_spec_init(void) \
+{ \
+ return kapi_register_ioctl_spec(&name##_spec); \
+} \
+ \
+static void __exit name##_spec_exit(void) \
+{ \
+ kapi_unregister_ioctl_spec(name##_spec.cmd); \
+} \
+ \
+module_init(name##_spec_init); \
+module_exit(name##_spec_exit);
+
+/* Inline IOCTL specification support */
+
+/* Forward declaration */
+struct fwctl_ucmd;
+
+/**
+ * struct kapi_ioctl_handler - IOCTL handler with inline specification
+ * @spec: IOCTL specification
+ * @handler: Original IOCTL handler function
+ */
+struct kapi_ioctl_handler {
+ struct kapi_ioctl_spec spec;
+ int (*handler)(struct fwctl_ucmd *ucmd);
+};
+
+/**
+ * DEFINE_IOCTL_HANDLER - Define an IOCTL handler with inline specification
+ * @name: Handler name
+ * @cmd: IOCTL command number
+ * @handler_func: Handler function
+ * @struct_type: Structure type for this IOCTL
+ * @last_field: Last field in the structure
+ */
+#define DEFINE_IOCTL_HANDLER(name, cmd, handler_func, struct_type, last_field) \
+static const struct kapi_ioctl_handler name = { \
+ .spec = { \
+ .cmd = cmd, \
+ .cmd_name = #cmd, \
+ .input_size = sizeof(struct_type), \
+ .output_size = sizeof(struct_type), \
+ .api_spec = { \
+ .name = #name,
+
+#define KAPI_IOCTL_HANDLER_END \
+ }, \
+ }, \
+ .handler = handler_func, \
+}
+
+/**
+ * kapi_ioctl_wrapper - Wrapper function for transparent IOCTL validation
+ * @filp: File pointer
+ * @cmd: IOCTL command
+ * @arg: User argument
+ * @real_ioctl: The real ioctl handler
+ *
+ * This wrapper performs validation before and after the actual IOCTL call
+ */
+static inline long kapi_ioctl_wrapper(struct file *filp, unsigned int cmd,
+ unsigned long arg,
+ long (*real_ioctl)(struct file *, unsigned int, unsigned long))
+{
+ long ret;
+
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+ /* Pre-validation */
+ ret = kapi_validate_ioctl(filp, cmd, (void __user *)arg);
+ if (ret)
+ return ret;
+#endif
+
+ /* Call the real IOCTL handler */
+ ret = real_ioctl(filp, cmd, arg);
+
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+ /* Post-validation could be added here if needed */
+ /* For example, validating output parameters */
+#endif
+
+ return ret;
+}
+
+/**
+ * KAPI_IOCTL_OPS - Define file_operations with transparent validation
+ * @name: Name of the file_operations structure
+ * @real_ioctl: The real ioctl handler function
+ * @... : Other file operation handlers
+ */
+#define KAPI_IOCTL_OPS(name, real_ioctl, ...) \
+static long name##_validated_ioctl(struct file *filp, unsigned int cmd, \
+ unsigned long arg) \
+{ \
+ return kapi_ioctl_wrapper(filp, cmd, arg, real_ioctl); \
+} \
+ \
+static const struct file_operations name = { \
+ .unlocked_ioctl = name##_validated_ioctl, \
+ __VA_ARGS__ \
+}
+
+/**
+ * KAPI_IOCTL_OP_ENTRY - Define an IOCTL operation table entry with spec
+ * @_ioctl: IOCTL command macro
+ * @_handler: Handler structure (defined with DEFINE_IOCTL_HANDLER)
+ * @_struct: Structure type
+ * @_last: Last field name
+ */
+#define KAPI_IOCTL_OP_ENTRY(_ioctl, _handler, _struct, _last) \
+ [_IOC_NR(_ioctl) - FWCTL_CMD_BASE] = { \
+ .size = sizeof(_struct) + \
+ BUILD_BUG_ON_ZERO(sizeof(union fwctl_ucmd_buffer) < \
+ sizeof(_struct)), \
+ .min_size = offsetofend(_struct, _last), \
+ .ioctl_num = _ioctl, \
+ .execute = _handler.handler, \
+ }
+
+/* Helper to register all handlers in a module */
+#define KAPI_REGISTER_IOCTL_HANDLERS(handlers, count) \
+static int __init kapi_ioctl_handlers_init(void) \
+{ \
+ int i, ret; \
+ for (i = 0; i < count; i++) { \
+ ret = kapi_register_ioctl_spec(&handlers[i].spec); \
+ if (ret) { \
+ while (--i >= 0) \
+ kapi_unregister_ioctl_spec(handlers[i].spec.cmd); \
+ return ret; \
+ } \
+ } \
+ return 0; \
+} \
+ \
+static void __exit kapi_ioctl_handlers_exit(void) \
+{ \
+ int i; \
+ for (i = 0; i < count; i++) \
+ kapi_unregister_ioctl_spec(handlers[i].spec.cmd); \
+} \
+ \
+module_init(kapi_ioctl_handlers_init); \
+module_exit(kapi_ioctl_handlers_exit)
+
+/**
+ * KAPI_REGISTER_IOCTL_SPECS - Register an array of IOCTL specifications
+ * @name: Suffix for the generated register/unregister function names
+ * @specs: Array of pointers to kapi_ioctl_spec (a real array, not a pointer)
+ *
+ * This macro generates kapi_register_<name>() and kapi_unregister_<name>().
+ * The register function returns 0 on success or a negative error code on
+ * failure, unregistering any specs it had already registered.
+ *
+ * Usage:
+ * static const struct kapi_ioctl_spec *my_ioctl_specs[] = {
+ * &spec1, &spec2, &spec3,
+ * };
+ * KAPI_REGISTER_IOCTL_SPECS(my_driver, my_ioctl_specs)
+ *
+ * Then call the generated functions in your module init/exit:
+ * ret = kapi_register_my_driver();
+ * kapi_unregister_my_driver();
+ */
+#define KAPI_REGISTER_IOCTL_SPECS(name, specs) \
+static int kapi_register_##name(void) \
+{ \
+ int i, ret; \
+ for (i = 0; i < ARRAY_SIZE(specs); i++) { \
+ ret = kapi_register_ioctl_spec(specs[i]); \
+ if (ret) { \
+ pr_warn("Failed to register IOCTL spec for %s: %d\n", \
+ specs[i]->cmd_name, ret); \
+ while (--i >= 0) \
+ kapi_unregister_ioctl_spec(specs[i]->cmd); \
+ return ret; \
+ } \
+ } \
+ pr_info("Registered %zu IOCTL specifications\n", \
+ ARRAY_SIZE(specs)); \
+ return 0; \
+} \
+ \
+static void kapi_unregister_##name(void) \
+{ \
+ int i; \
+ for (i = 0; i < ARRAY_SIZE(specs); i++) \
+ kapi_unregister_ioctl_spec(specs[i]->cmd); \
+}
+
+/**
+ * KAPI_DEFINE_IOCTL_SPEC - Define a single IOCTL specification
+ * @name: Name of the specification variable
+ * @cmd: IOCTL command number
+ * @cmd_name: String name of the command
+ * @in_size: Input structure size
+ * @out_size: Output structure size
+ * @fops_name: Name of the file_operations structure
+ *
+ * This macro starts the definition of an IOCTL specification.
+ * It must be followed by the API specification details and
+ * ended with KAPI_END_IOCTL_SPEC.
+ *
+ * Example:
+ * KAPI_DEFINE_IOCTL_SPEC(my_ioctl_spec, MY_IOCTL, "MY_IOCTL",
+ * sizeof(struct my_input), sizeof(struct my_output),
+ * "my_fops")
+ * KAPI_DESCRIPTION("Description here")
+ * ...
+ * KAPI_END_IOCTL_SPEC;
+ */
+#define KAPI_DEFINE_IOCTL_SPEC(name, cmd, cmd_name_str, in_size, out_size, fops) \
+static const struct kapi_ioctl_spec name = { \
+ .cmd = (cmd), \
+ .cmd_name = cmd_name_str, \
+ .input_size = in_size, \
+ .output_size = out_size, \
+ .file_ops_name = fops, \
+ .api_spec = { \
+ .name = #name,
+
+#define KAPI_END_IOCTL_SPEC \
+ }, \
+}
+
+/**
+ * KAPI_IOCTL_SPEC_DRIVER - Complete IOCTL specification for a driver
+ * @driver_name: Name of the driver (used for logging)
+ * @...: Brace-enclosed initializer list of pointers to the driver's IOCTL specs
+ *
+ * This macro provides everything needed for IOCTL spec registration:
+ * 1. Generates the specs array declaration
+ * 2. Creates init/exit functions for registration
+ * 3. Provides simple function names to call from module init/exit
+ *
+ * Usage:
+ * // Define individual specs
+ * KAPI_DEFINE_IOCTL_SPEC(spec1, ...) ... KAPI_END_IOCTL_SPEC;
+ * KAPI_DEFINE_IOCTL_SPEC(spec2, ...) ... KAPI_END_IOCTL_SPEC;
+ *
+ * // Create the driver registration (at end of file)
+ * KAPI_IOCTL_SPEC_DRIVER("my_driver", {
+ * &spec1,
+ * &spec2,
+ * })
+ *
+ * // In module init: ret = kapi_ioctl_specs_init();
+ * // In module exit: kapi_ioctl_specs_exit();
+ */
+#define KAPI_IOCTL_SPEC_DRIVER(driver_name, ...) \
+static const struct kapi_ioctl_spec *__kapi_ioctl_specs[] = __VA_ARGS__; \
+ \
+static int __init kapi_ioctl_specs_init(void) \
+{ \
+ int i, ret; \
+ for (i = 0; i < ARRAY_SIZE(__kapi_ioctl_specs); i++) { \
+ ret = kapi_register_ioctl_spec(__kapi_ioctl_specs[i]); \
+ if (ret) { \
+ pr_warn("%s: Failed to register %s: %d\n", \
+ driver_name, \
+ __kapi_ioctl_specs[i]->cmd_name, ret); \
+ while (--i >= 0) \
+ kapi_unregister_ioctl_spec( \
+ __kapi_ioctl_specs[i]->cmd); \
+ return ret; \
+ } \
+ } \
+ pr_info("%s: Registered %zu IOCTL specifications\n", \
+ driver_name, ARRAY_SIZE(__kapi_ioctl_specs)); \
+ return 0; \
+} \
+ \
+static void kapi_ioctl_specs_exit(void) \
+{ \
+ int i; \
+ for (i = 0; i < ARRAY_SIZE(__kapi_ioctl_specs); i++) \
+ kapi_unregister_ioctl_spec(__kapi_ioctl_specs[i]->cmd);\
+}
+
+/* Transparent IOCTL validation wrapper support */
+
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+
+/**
+ * struct kapi_fops_wrapper - Wrapper for file_operations with validation
+ * @real_fops: Original file_operations
+ * @wrapped_fops: Modified file_operations with validation wrapper
+ * @real_ioctl: Original unlocked_ioctl handler
+ */
+struct kapi_fops_wrapper {
+ const struct file_operations *real_fops;
+ const struct file_operations *wrapped_fops;
+ long (*real_ioctl)(struct file *, unsigned int, unsigned long);
+};
+
+/* Forward declarations */
+long kapi_ioctl_validation_wrapper(struct file *filp, unsigned int cmd,
+ unsigned long arg);
+void kapi_register_wrapper(struct kapi_fops_wrapper *wrapper);
+
+/**
+ * kapi_wrap_file_operations - Wrap file_operations for transparent validation
+ * @fops: Original file_operations to wrap
+ *
+ * This creates a wrapper that intercepts ioctl calls for validation.
+ * The wrapper is stored in a static variable in the calling module.
+ */
+#define kapi_wrap_file_operations(fops) \
+({ \
+ static struct file_operations __kapi_wrapped; \
+ static struct kapi_fops_wrapper __kapi_wrapper; \
+ const struct file_operations *__kapi_ret = &(fops); \
+ if ((fops).unlocked_ioctl) { \
+ __kapi_wrapped = (fops); \
+ __kapi_wrapped.unlocked_ioctl = kapi_ioctl_validation_wrapper; \
+ __kapi_wrapper.real_fops = &(fops); \
+ __kapi_wrapper.wrapped_fops = &__kapi_wrapped; \
+ __kapi_wrapper.real_ioctl = (fops).unlocked_ioctl; \
+ kapi_register_wrapper(&__kapi_wrapper); \
+ __kapi_ret = &__kapi_wrapped; \
+ } \
+ __kapi_ret; \
+})
+
+/**
+ * KAPI_DEFINE_FOPS - Define file_operations with automatic validation
+ * @name: Name of the file_operations structure
+ * @... : File operation handlers
+ *
+ * Usage:
+ * KAPI_DEFINE_FOPS(my_fops,
+ * .owner = THIS_MODULE,
+ * .open = my_open,
+ * .unlocked_ioctl = my_ioctl,
+ * );
+ *
+ * Then in your module init, call: kapi_init_fops_##name()
+ */
+#define KAPI_DEFINE_FOPS(name, ...) \
+static const struct file_operations __kapi_real_##name = { \
+ __VA_ARGS__ \
+}; \
+static struct file_operations __kapi_wrapped_##name; \
+static struct kapi_fops_wrapper __kapi_wrapper_##name; \
+static const struct file_operations *name; \
+static void kapi_init_fops_##name(void) \
+{ \
+ if (__kapi_real_##name.unlocked_ioctl) { \
+ __kapi_wrapped_##name = __kapi_real_##name; \
+ __kapi_wrapper_##name.real_fops = &__kapi_real_##name; \
+ __kapi_wrapper_##name.wrapped_fops = &__kapi_wrapped_##name; \
+ __kapi_wrapper_##name.real_ioctl = \
+ __kapi_real_##name.unlocked_ioctl; \
+ __kapi_wrapped_##name.unlocked_ioctl = \
+ kapi_ioctl_validation_wrapper; \
+ kapi_register_wrapper(&__kapi_wrapper_##name); \
+ name = &__kapi_wrapped_##name; \
+ } else { \
+ name = &__kapi_real_##name; \
+ } \
+}
+
+#else /* !CONFIG_KAPI_RUNTIME_CHECKS */
+
+/* When runtime checks are disabled, no wrapping occurs */
+#define kapi_wrap_file_operations(fops) (&(fops))
+#define KAPI_DEFINE_FOPS(name, ...) \
+static const struct file_operations name = { __VA_ARGS__ }; \
+static inline void kapi_init_fops_##name(void) {}
+
+#endif /* CONFIG_KAPI_RUNTIME_CHECKS */
+
+#endif /* _LINUX_IOCTL_API_SPEC_H */
\ No newline at end of file
diff --git a/include/linux/kernel_api_spec.h b/include/linux/kernel_api_spec.h
index 04df5892bc6d6..9590fe3bb007c 100644
--- a/include/linux/kernel_api_spec.h
+++ b/include/linux/kernel_api_spec.h
@@ -849,7 +849,7 @@ struct kernel_api_spec {
#define KAPI_PARAM_OUT (KAPI_PARAM_OUT)
#define KAPI_PARAM_INOUT (KAPI_PARAM_IN | KAPI_PARAM_OUT)
#define KAPI_PARAM_OPTIONAL (KAPI_PARAM_OPTIONAL)
-#define KAPI_PARAM_USER_PTR (KAPI_PARAM_USER | KAPI_PARAM_PTR)
+#define KAPI_PARAM_USER_PTR (KAPI_PARAM_USER)
/* Validation and runtime checking */
diff --git a/kernel/api/Makefile b/kernel/api/Makefile
index 07b8c007ec156..9d2daf38f0029 100644
--- a/kernel/api/Makefile
+++ b/kernel/api/Makefile
@@ -6,5 +6,8 @@
# Core API specification framework
obj-$(CONFIG_KAPI_SPEC) += kernel_api_spec.o
+# IOCTL validation framework
+obj-$(CONFIG_KAPI_SPEC) += ioctl_validation.o
+
# Debugfs interface for kernel API specs
-obj-$(CONFIG_KAPI_SPEC_DEBUGFS) += kapi_debugfs.o
\ No newline at end of file
+obj-$(CONFIG_KAPI_SPEC_DEBUGFS) += kapi_debugfs.o
diff --git a/kernel/api/ioctl_validation.c b/kernel/api/ioctl_validation.c
new file mode 100644
index 0000000000000..25f6db8cb33eb
--- /dev/null
+++ b/kernel/api/ioctl_validation.c
@@ -0,0 +1,360 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ioctl_validation.c - Runtime validation for IOCTL API specifications
+ *
+ * Provides functions to validate ioctl parameters against their specifications
+ * at runtime when CONFIG_KAPI_RUNTIME_CHECKS is enabled.
+ */
+
+#include <linux/kernel.h>
+#include <linux/ioctl_api_spec.h>
+#include <linux/kernel_api_spec.h>
+#include <linux/uaccess.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/container_of.h>
+#include <linux/export.h>
+#include <uapi/fwctl/fwctl.h>
+
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+
+/**
+ * kapi_validate_ioctl - Validate an ioctl call against its specification
+ * @filp: File pointer
+ * @cmd: IOCTL command
+ * @arg: IOCTL argument
+ *
+ * Return: 0 if valid, negative errno if validation fails
+ */
+int kapi_validate_ioctl(struct file *filp, unsigned int cmd, void __user *arg)
+{
+ const struct kapi_ioctl_spec *spec;
+ const struct kernel_api_spec *api_spec;
+ void *data = NULL;
+ size_t copy_size;
+ int ret = 0;
+ int i;
+
+ spec = kapi_get_ioctl_spec(cmd);
+ if (!spec)
+ return 0; /* No spec, can't validate */
+
+ api_spec = &spec->api_spec;
+
+ pr_debug("kapi: validating ioctl %s (0x%x)\n", spec->cmd_name, cmd);
+
+ /* Check if this ioctl requires specific capabilities */
+ if (api_spec->param_count > 0) {
+ for (i = 0; i < api_spec->param_count; i++) {
+ const struct kapi_param_spec *param = &api_spec->params[i];
+
+ /* Check for capability requirements in constraints */
+ if (param->constraint_type == KAPI_CONSTRAINT_CUSTOM &&
+ param->constraints[0] && strstr(param->constraints, "CAP_")) {
+ /* Could add capability checks here if needed */
+ }
+ }
+ }
+
+ /* For ioctls with input/output structures, copy and validate */
+ if (spec->input_size > 0 || spec->output_size > 0) {
+ copy_size = max(spec->input_size, spec->output_size);
+
+ /* Allocate temporary buffer for validation */
+ data = kzalloc(copy_size, GFP_KERNEL);
+ if (!data)
+ return -ENOMEM;
+
+ /* Copy input data from user */
+ if (spec->input_size > 0) {
+ ret = copy_from_user(data, arg, spec->input_size);
+ if (ret) {
+ ret = -EFAULT;
+ goto out;
+ }
+ }
+
+ /* Validate structure fields */
+ ret = kapi_validate_ioctl_struct(spec, data, copy_size);
+ if (ret)
+ goto out;
+ }
+
+out:
+ kfree(data);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_ioctl);
+
+/**
+ * struct field_offset - Maps structure fields to their offsets
+ * @field_idx: Parameter index
+ * @offset: Offset in structure
+ * @size: Size of field
+ */
+struct field_offset {
+ int field_idx;
+ size_t offset;
+ size_t size;
+};
+
+/* Common ioctl structure layouts */
+static const struct field_offset fwctl_info_offsets[] = {
+ {0, 0, sizeof(u32)}, /* size */
+ {1, 4, sizeof(u32)}, /* flags */
+ {2, 8, sizeof(u32)}, /* out_device_type */
+ {3, 12, sizeof(u32)}, /* device_data_len */
+ {4, 16, sizeof(u64)}, /* out_device_data */
+};
+
+static const struct field_offset fwctl_rpc_offsets[] = {
+ {0, 0, sizeof(u32)}, /* size */
+ {1, 4, sizeof(u32)}, /* scope */
+ {2, 8, sizeof(u32)}, /* in_len */
+ {3, 12, sizeof(u32)}, /* out_len */
+ {4, 16, sizeof(u64)}, /* in */
+ {5, 24, sizeof(u64)}, /* out */
+};
+
+/**
+ * get_field_offsets - Get field offset information for an ioctl
+ * @cmd: IOCTL command
+ * @count: Returns number of fields
+ *
+ * Return: Array of field offsets or NULL
+ */
+static const struct field_offset *get_field_offsets(unsigned int cmd, int *count)
+{
+ switch (cmd) {
+ case FWCTL_INFO:
+ *count = ARRAY_SIZE(fwctl_info_offsets);
+ return fwctl_info_offsets;
+ case FWCTL_RPC:
+ *count = ARRAY_SIZE(fwctl_rpc_offsets);
+ return fwctl_rpc_offsets;
+ default:
+ *count = 0;
+ return NULL;
+ }
+}
+
+/**
+ * extract_field_value - Extract a field value from structure
+ * @data: Structure data
+ * @param: Parameter specification
+ * @offset_info: Field offset information
+ *
+ * Return: Field value or 0 on error
+ */
+static s64 extract_field_value(const void *data,
+ const struct kapi_param_spec *param,
+ const struct field_offset *offset_info)
+{
+ const void *field = data + offset_info->offset;
+
+ switch (param->type) {
+ case KAPI_TYPE_UINT:
+ if (offset_info->size == sizeof(u32))
+ return *(u32 *)field;
+ else if (offset_info->size == sizeof(u64))
+ return *(u64 *)field;
+ break;
+ case KAPI_TYPE_INT:
+ if (offset_info->size == sizeof(s32))
+ return *(s32 *)field;
+ else if (offset_info->size == sizeof(s64))
+ return *(s64 *)field;
+ break;
+ case KAPI_TYPE_USER_PTR:
+ /* User pointers are typically u64 in ioctl structures */
+ return (s64)(*(u64 *)field);
+ default:
+ break;
+ }
+
+ return 0;
+}
+
+/**
+ * kapi_validate_ioctl_struct - Validate an ioctl structure against specification
+ * @spec: IOCTL specification
+ * @data: Structure data
+ * @size: Size of the structure
+ *
+ * Return: 0 if valid, negative errno if validation fails
+ */
+int kapi_validate_ioctl_struct(const struct kapi_ioctl_spec *spec,
+ const void *data, size_t size)
+{
+ const struct kernel_api_spec *api_spec;
+ const struct field_offset *offsets;
+ int offset_count, i, j;
+
+ if (!spec || !data)
+ return -EINVAL;
+ api_spec = &spec->api_spec;
+
+ /* Get field offset information for this ioctl */
+ offsets = get_field_offsets(spec->cmd, &offset_count);
+
+ /* Validate each parameter in the structure */
+ for (i = 0; i < api_spec->param_count && i < KAPI_MAX_PARAMS; i++) {
+ const struct kapi_param_spec *param = &api_spec->params[i];
+ const struct field_offset *offset_info = NULL;
+ s64 value;
+
+ /* Find offset information for this parameter */
+ if (offsets) {
+ for (j = 0; j < offset_count; j++) {
+ if (offsets[j].field_idx == i) {
+ offset_info = &offsets[j];
+ break;
+ }
+ }
+ }
+
+ if (!offset_info) {
+ pr_debug("kapi: no offset info for param %d\n", i);
+ continue;
+ }
+
+ /* Extract field value */
+ value = extract_field_value(data, param, offset_info);
+
+ /* Special handling for user pointers */
+ if (param->type == KAPI_TYPE_USER_PTR) {
+ /* Check if pointer looks valid (non-kernel address) */
+ if (value && ((u64)value >= TASK_SIZE)) {
+ pr_warn("ioctl %s: parameter %s has kernel pointer %llx\n",
+ spec->cmd_name, param->name, (u64)value);
+ return -EINVAL;
+ }
+
+ /* For size validation, check against size_param_idx */
+ if (param->size_param_idx >= 0 &&
+ param->size_param_idx < offset_count) {
+ const struct field_offset *size_offset = NULL;
+
+ for (j = 0; j < offset_count; j++) {
+ if (offsets[j].field_idx == param->size_param_idx) {
+ size_offset = &offsets[j];
+ break;
+ }
+ }
+
+ if (size_offset) {
+ s64 buf_size = extract_field_value(data,
+ &api_spec->params[param->size_param_idx],
+ size_offset);
+
+ /* Validate buffer size constraints */
+ if (buf_size > 0 &&
+ !kapi_validate_param(&api_spec->params[param->size_param_idx],
+ buf_size)) {
+ pr_warn("ioctl %s: buffer size %lld invalid for %s\n",
+ spec->cmd_name, buf_size, param->name);
+ return -EINVAL;
+ }
+ }
+ }
+ } else {
+ /* Validate using the standard parameter validation */
+ if (!kapi_validate_param(param, value)) {
+ pr_warn("ioctl %s: parameter %s validation failed (value=%lld)\n",
+ spec->cmd_name, param->name, value);
+ return -EINVAL;
+ }
+ }
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_ioctl_struct);
+
+/* Global registry of wrappers - in real implementation this would be per-module */
+static struct kapi_fops_wrapper *kapi_global_wrapper;
+
+/**
+ * kapi_register_wrapper - Register a wrapper (called from macro)
+ * @wrapper: Wrapper to register
+ */
+void kapi_register_wrapper(struct kapi_fops_wrapper *wrapper)
+{
+ /* Simple implementation - just store the last one */
+ kapi_global_wrapper = wrapper;
+}
+EXPORT_SYMBOL_GPL(kapi_register_wrapper);
+
+/**
+ * kapi_find_wrapper - Find wrapper for given file_operations
+ * @fops: File operations structure to check
+ *
+ * Return: Wrapper structure or NULL if not wrapped
+ */
+static struct kapi_fops_wrapper *kapi_find_wrapper(const struct file_operations *fops)
+{
+ /* Simple implementation - just return the global one if it matches */
+ if (kapi_global_wrapper && kapi_global_wrapper->wrapped_fops == fops)
+ return kapi_global_wrapper;
+ return NULL;
+}
+
+/**
+ * kapi_ioctl_validation_wrapper - Wrapper function for transparent validation
+ * @filp: File pointer
+ * @cmd: IOCTL command
+ * @arg: User argument
+ *
+ * This function is called instead of the real ioctl handler when validation
+ * is enabled. It performs pre-validation, calls the real handler, then does
+ * post-validation.
+ *
+ * Return: Result from the real ioctl handler or error
+ */
+long kapi_ioctl_validation_wrapper(struct file *filp, unsigned int cmd,
+ unsigned long arg)
+{
+ struct kapi_fops_wrapper *wrapper;
+ const struct kapi_ioctl_spec *spec;
+ long ret;
+
+ wrapper = kapi_find_wrapper(filp->f_op);
+ if (!wrapper || !wrapper->real_ioctl)
+ return -EINVAL;
+
+ /* Pre-validation */
+ spec = kapi_get_ioctl_spec(cmd);
+ if (spec) {
+ ret = kapi_validate_ioctl(filp, cmd, (void __user *)arg);
+ if (ret)
+ return ret;
+ }
+
+ /* Call the real ioctl handler */
+ ret = wrapper->real_ioctl(filp, cmd, arg);
+
+ /* Post-validation - check return value against spec */
+ if (spec && spec->api_spec.error_count > 0) {
+ /* Validate that returned error is in the spec */
+ if (ret < 0) {
+ int i;
+ bool found = false;
+ for (i = 0; i < spec->api_spec.error_count; i++) {
+ if (ret == spec->api_spec.errors[i].error_code) {
+ found = true;
+ break;
+ }
+ }
+ if (!found) {
+ pr_warn("IOCTL %s returned unexpected error %ld\n",
+ spec->cmd_name, ret);
+ }
+ }
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kapi_ioctl_validation_wrapper);
+
+#endif /* CONFIG_KAPI_RUNTIME_CHECKS */
diff --git a/kernel/api/kernel_api_spec.c b/kernel/api/kernel_api_spec.c
index 29c0c84d87f7c..70e16a49f5dbe 100644
--- a/kernel/api/kernel_api_spec.c
+++ b/kernel/api/kernel_api_spec.c
@@ -1166,4 +1166,92 @@ static int __init kapi_debugfs_init(void)
late_initcall(kapi_debugfs_init);
-#endif /* CONFIG_DEBUG_FS */
\ No newline at end of file
+#endif /* CONFIG_DEBUG_FS */
+
+/* IOCTL specification registry */
+#ifdef CONFIG_KAPI_SPEC
+
+#include <linux/ioctl_api_spec.h>
+
+static DEFINE_MUTEX(ioctl_spec_mutex);
+static LIST_HEAD(ioctl_specs);
+
+struct ioctl_spec_entry {
+ struct list_head list;
+ const struct kapi_ioctl_spec *spec;
+};
+
+/**
+ * kapi_register_ioctl_spec - Register an IOCTL API specification
+ * @spec: IOCTL specification to register
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int kapi_register_ioctl_spec(const struct kapi_ioctl_spec *spec)
+{
+ struct ioctl_spec_entry *entry;
+
+ if (!spec || !spec->cmd_name)
+ return -EINVAL;
+
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry)
+ return -ENOMEM;
+
+ entry->spec = spec;
+
+ mutex_lock(&ioctl_spec_mutex);
+ list_add_tail(&entry->list, &ioctl_specs);
+ mutex_unlock(&ioctl_spec_mutex);
+
+ pr_debug("Registered IOCTL spec: %s (0x%x)\n", spec->cmd_name, spec->cmd);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_register_ioctl_spec);
+
+/**
+ * kapi_unregister_ioctl_spec - Unregister an IOCTL API specification
+ * @cmd: IOCTL command number to unregister
+ */
+void kapi_unregister_ioctl_spec(unsigned int cmd)
+{
+ struct ioctl_spec_entry *entry, *tmp;
+
+ mutex_lock(&ioctl_spec_mutex);
+ list_for_each_entry_safe(entry, tmp, &ioctl_specs, list) {
+ if (entry->spec->cmd == cmd) {
+ list_del(&entry->list);
+ kfree(entry);
+ pr_debug("Unregistered IOCTL spec for cmd 0x%x\n", cmd);
+ break;
+ }
+ }
+ mutex_unlock(&ioctl_spec_mutex);
+}
+EXPORT_SYMBOL_GPL(kapi_unregister_ioctl_spec);
+
+/**
+ * kapi_get_ioctl_spec - Retrieve IOCTL specification by command number
+ * @cmd: IOCTL command number
+ *
+ * Return: Pointer to the specification or NULL if not found
+ */
+const struct kapi_ioctl_spec *kapi_get_ioctl_spec(unsigned int cmd)
+{
+ struct ioctl_spec_entry *entry;
+ const struct kapi_ioctl_spec *spec = NULL;
+
+ mutex_lock(&ioctl_spec_mutex);
+ list_for_each_entry(entry, &ioctl_specs, list) {
+ if (entry->spec->cmd == cmd) {
+ spec = entry->spec;
+ break;
+ }
+ }
+ mutex_unlock(&ioctl_spec_mutex);
+
+ return spec;
+}
+EXPORT_SYMBOL_GPL(kapi_get_ioctl_spec);
+
+#endif /* CONFIG_KAPI_SPEC */
--
2.39.5
* [RFC 17/19] fwctl: add detailed IOCTL API specifications
From: Sasha Levin @ 2025-06-14 13:48 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-api, workflows, tools, Sasha Levin
Add kernel API specifications to the fwctl driver using the IOCTL
specification framework. This provides detailed documentation and
enables runtime validation of the fwctl IOCTL interface.
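The structure specifications in this patch record field offsets with
offsetof() rather than hard-coded numbers. As a sketch of that approach, the
userspace snippet below derives a field table from a struct definition. The
struct fwctl_info_model is a hypothetical mirror of the FWCTL_INFO layout
(five fields at offsets 0, 4, 8, 12 and 16), assumed here for illustration;
it is not the uapi header, and field_desc, FIELD and find_field are invented
names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed userspace mirror of the FWCTL_INFO argument layout. */
struct fwctl_info_model {
	uint32_t size;
	uint32_t flags;
	uint32_t out_device_type;
	uint32_t device_data_len;
	uint64_t out_device_data;
};

/* Hypothetical field descriptor: name, byte offset, byte size. */
struct field_desc {
	const char *name;
	size_t offset;
	size_t size;
};

/* Let the compiler compute offset and size from the struct definition. */
#define FIELD(type, member) \
	{ #member, offsetof(type, member), sizeof(((type *)0)->member) }

static const struct field_desc fwctl_info_fields[] = {
	FIELD(struct fwctl_info_model, size),
	FIELD(struct fwctl_info_model, flags),
	FIELD(struct fwctl_info_model, out_device_type),
	FIELD(struct fwctl_info_model, device_data_len),
	FIELD(struct fwctl_info_model, out_device_data),
};

/* Look up a field descriptor by name; returns NULL when it is unknown. */
static const struct field_desc *find_field(const char *name)
{
	size_t i;
	size_t count = sizeof(fwctl_info_fields) / sizeof(fwctl_info_fields[0]);

	assert(name != NULL);

	for (i = 0; i < count; i++)
		if (strcmp(fwctl_info_fields[i].name, name) == 0)
			return &fwctl_info_fields[i];
	return NULL;
}
```

Deriving the table this way keeps it in step with the structure if a field
is ever added or resized, which a table of literal offsets would not do.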
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/fwctl/main.c | 295 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 293 insertions(+), 2 deletions(-)
diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
index bc6378506296c..fc85d54ecb6a0 100644
--- a/drivers/fwctl/main.c
+++ b/drivers/fwctl/main.c
@@ -10,6 +10,8 @@
#include <linux/module.h>
#include <linux/sizes.h>
#include <linux/slab.h>
+#include <linux/ioctl_api_spec.h>
+#include <linux/kernel_api_spec.h>
#include <uapi/fwctl/fwctl.h>
@@ -261,13 +263,291 @@ static int fwctl_fops_release(struct inode *inode, struct file *filp)
return 0;
}
-static const struct file_operations fwctl_fops = {
+/* Use KAPI_DEFINE_FOPS for automatic validation wrapping */
+KAPI_DEFINE_FOPS(fwctl_fops,
.owner = THIS_MODULE,
.open = fwctl_fops_open,
.release = fwctl_fops_release,
.unlocked_ioctl = fwctl_fops_ioctl,
+);
+
+/* IOCTL API Specifications */
+
+static const struct kapi_ioctl_spec fwctl_info_spec = {
+ .cmd = FWCTL_INFO,
+ .cmd_name = "FWCTL_INFO",
+ .input_size = sizeof(struct fwctl_info),
+ .output_size = sizeof(struct fwctl_info),
+ .file_ops_name = "fwctl_fops",
+ .api_spec = {
+ .name = "fwctl_info",
+ KAPI_DESCRIPTION("Query device information and capabilities")
+ KAPI_LONG_DESC("Returns basic information about the fwctl instance, "
+ "including the device type and driver-specific data. "
+ "The driver-specific data format depends on the device type.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_IOCTL_PARAM_SIZE
+ KAPI_IOCTL_PARAM_FLAGS
+
+ KAPI_PARAM(2, "out_device_type", "__u32", "Device type from enum fwctl_device_type")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_UINT,
+ .constraint_type = KAPI_CONSTRAINT_ENUM,
+ .enum_values = (const s64[]){FWCTL_DEVICE_TYPE_ERROR,
+ FWCTL_DEVICE_TYPE_MLX5,
+ FWCTL_DEVICE_TYPE_CXL,
+ FWCTL_DEVICE_TYPE_PDS},
+ .enum_count = 4,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(3, "device_data_len", "__u32", "Length of device data buffer")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_INOUT)
+ .type = KAPI_TYPE_UINT,
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ .min_value = 0,
+ .max_value = SZ_1M, /* Reasonable limit for device info */
+ KAPI_PARAM_END
+
+ KAPI_IOCTL_PARAM_USER_OUT_BUF(4, "out_device_data",
+ "Driver-specific device data", 3)
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ .error_values = (const s64[]){-EFAULT, -EOPNOTSUPP, -ENODEV},
+ .error_count = 3,
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EFAULT, "EFAULT", "Failed to copy data to/from user space",
+ "Check that provided pointers are valid user space addresses")
+ KAPI_ERROR(1, -EOPNOTSUPP, "EOPNOTSUPP", "Invalid flags provided",
+ "Currently flags must be 0")
+ KAPI_ERROR(2, -ENODEV, "ENODEV", "Device has been hot-unplugged",
+ "The underlying device is no longer available")
+
+ .error_count = 3,
+ .param_count = 5,
+ .since_version = "6.15",
+
+ /* Structure specifications */
+ KAPI_STRUCT_SPEC(0, fwctl_info, "Device information query structure")
+ KAPI_STRUCT_SIZE(sizeof(struct fwctl_info), __alignof__(struct fwctl_info))
+ KAPI_STRUCT_FIELD_COUNT(4)
+
+ KAPI_STRUCT_FIELD(0, "size", KAPI_TYPE_UINT, "__u32",
+ "Structure size for versioning")
+ KAPI_FIELD_OFFSET(offsetof(struct fwctl_info, size))
+ KAPI_FIELD_SIZE(sizeof(__u32))
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(1, "flags", KAPI_TYPE_UINT, "__u32",
+ "Must be 0, reserved for future use")
+ KAPI_FIELD_OFFSET(offsetof(struct fwctl_info, flags))
+ KAPI_FIELD_SIZE(sizeof(__u32))
+ KAPI_FIELD_CONSTRAINT_RANGE(0, 0)
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(2, "out_device_type", KAPI_TYPE_UINT, "__u32",
+ "Device type identifier")
+ KAPI_FIELD_OFFSET(offsetof(struct fwctl_info, out_device_type))
+ KAPI_FIELD_SIZE(sizeof(__u32))
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(3, "device_data_len", KAPI_TYPE_UINT, "__u32",
+ "Length of device-specific data")
+ KAPI_FIELD_OFFSET(offsetof(struct fwctl_info, device_data_len))
+ KAPI_FIELD_SIZE(sizeof(__u32))
+ KAPI_STRUCT_FIELD_END
+ KAPI_STRUCT_SPEC_END
+
+ KAPI_STRUCT_SPEC_COUNT(1)
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_NONE,
+ "none",
+ "Read-only operation with no side effects")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(1)
+
+ /* State transitions */
+ KAPI_STATE_TRANS_COUNT(0) /* No state transitions for query operation */
+ },
};
+static const struct kapi_ioctl_spec fwctl_rpc_spec = {
+ .cmd = FWCTL_RPC,
+ .cmd_name = "FWCTL_RPC",
+ .input_size = sizeof(struct fwctl_rpc),
+ .output_size = sizeof(struct fwctl_rpc),
+ .file_ops_name = "fwctl_fops",
+ .api_spec = {
+ .name = "fwctl_rpc",
+ KAPI_DESCRIPTION("Execute a Remote Procedure Call to device firmware")
+ KAPI_LONG_DESC("Delivers an RPC to the device firmware and returns the response. "
+ "The RPC format is device-specific and determined by out_device_type "
+ "from FWCTL_INFO. Different scopes have different permission requirements.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_IOCTL_PARAM_SIZE
+
+ KAPI_PARAM(1, "scope", "__u32", "Access scope from enum fwctl_rpc_scope")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ .constraint_type = KAPI_CONSTRAINT_ENUM,
+ .enum_values = (const s64[]){FWCTL_RPC_CONFIGURATION,
+ FWCTL_RPC_DEBUG_READ_ONLY,
+ FWCTL_RPC_DEBUG_WRITE,
+ FWCTL_RPC_DEBUG_WRITE_FULL},
+ .enum_count = 4,
+ .constraints = "FWCTL_RPC_DEBUG_WRITE_FULL requires CAP_SYS_RAWIO",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "in_len", "__u32", "Length of input buffer")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ .min_value = 0,
+ .max_value = MAX_RPC_LEN,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(3, "out_len", "__u32", "Length of output buffer")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_INOUT)
+ .type = KAPI_TYPE_UINT,
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ .min_value = 0,
+ .max_value = MAX_RPC_LEN,
+ KAPI_PARAM_END
+
+ KAPI_IOCTL_PARAM_USER_BUF(4, "in", "RPC request in device-specific format", 2)
+ KAPI_IOCTL_PARAM_USER_OUT_BUF(5, "out", "RPC response in device-specific format", 3)
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ .error_values = (const s64[]){-EMSGSIZE, -EOPNOTSUPP, -EPERM,
+ -ENOMEM, -EFAULT, -ENODEV},
+ .error_count = 6,
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EMSGSIZE, "EMSGSIZE", "RPC message too large",
+ "in_len or out_len exceeds MAX_RPC_LEN (2MB)")
+ KAPI_ERROR(1, -EOPNOTSUPP, "EOPNOTSUPP", "Invalid scope value",
+ "scope must be one of the defined fwctl_rpc_scope values")
+ KAPI_ERROR(2, -EPERM, "EPERM", "Insufficient permissions",
+ "FWCTL_RPC_DEBUG_WRITE_FULL requires CAP_SYS_RAWIO")
+ KAPI_ERROR(3, -ENOMEM, "ENOMEM", "Memory allocation failed",
+ "Unable to allocate buffers for RPC")
+ KAPI_ERROR(4, -EFAULT, "EFAULT", "Failed to copy data to/from user space",
+ "Check that provided pointers are valid user space addresses")
+ KAPI_ERROR(5, -ENODEV, "ENODEV", "Device has been hot-unplugged",
+ "The underlying device is no longer available")
+
+ .error_count = 6,
+ .param_count = 6,
+ .since_version = "6.15",
+ .notes = "FWCTL_RPC_DEBUG_WRITE and FWCTL_RPC_DEBUG_WRITE_FULL will "
+ "taint the kernel with TAINT_FWCTL on first use",
+
+ /* Structure specifications */
+ KAPI_STRUCT_SPEC(0, fwctl_rpc, "RPC request/response structure")
+ KAPI_STRUCT_SIZE(sizeof(struct fwctl_rpc), __alignof__(struct fwctl_rpc))
+ KAPI_STRUCT_FIELD_COUNT(6)
+
+ KAPI_STRUCT_FIELD(0, "size", KAPI_TYPE_UINT, "__u32",
+ "Structure size for versioning")
+ KAPI_FIELD_OFFSET(offsetof(struct fwctl_rpc, size))
+ KAPI_FIELD_SIZE(sizeof(__u32))
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(1, "scope", KAPI_TYPE_UINT, "__u32",
+ "Access scope level")
+ KAPI_FIELD_OFFSET(offsetof(struct fwctl_rpc, scope))
+ KAPI_FIELD_SIZE(sizeof(__u32))
+ KAPI_FIELD_CONSTRAINT_ENUM((const s64[]){FWCTL_RPC_CONFIGURATION,
+ FWCTL_RPC_DEBUG_READ_ONLY,
+ FWCTL_RPC_DEBUG_WRITE,
+ FWCTL_RPC_DEBUG_WRITE_FULL}, 4)
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(2, "in_len", KAPI_TYPE_UINT, "__u32",
+ "Input data length")
+ KAPI_FIELD_OFFSET(offsetof(struct fwctl_rpc, in_len))
+ KAPI_FIELD_SIZE(sizeof(__u32))
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(3, "out_len", KAPI_TYPE_UINT, "__u32",
+ "Output buffer length")
+ KAPI_FIELD_OFFSET(offsetof(struct fwctl_rpc, out_len))
+ KAPI_FIELD_SIZE(sizeof(__u32))
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(4, "in", KAPI_TYPE_PTR, "__aligned_u64",
+ "Pointer to input data")
+ KAPI_FIELD_OFFSET(offsetof(struct fwctl_rpc, in))
+ KAPI_FIELD_SIZE(sizeof(__aligned_u64))
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(5, "out", KAPI_TYPE_PTR, "__aligned_u64",
+ "Pointer to output buffer")
+ KAPI_FIELD_OFFSET(offsetof(struct fwctl_rpc, out))
+ KAPI_FIELD_SIZE(sizeof(__aligned_u64))
+ KAPI_STRUCT_FIELD_END
+ KAPI_STRUCT_SPEC_END
+
+ KAPI_STRUCT_SPEC_COUNT(1)
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_HARDWARE | KAPI_EFFECT_MODIFY_STATE,
+ "device firmware",
+ "May modify device configuration or firmware state")
+ KAPI_EFFECT_CONDITION("scope >= FWCTL_RPC_DEBUG_WRITE")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE,
+ "kernel taint",
+ "Taints kernel with TAINT_FWCTL on first debug write")
+ KAPI_EFFECT_CONDITION("scope >= FWCTL_RPC_DEBUG_WRITE && first use")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_SCHEDULE,
+ "process",
+ "May block while firmware processes the RPC")
+ KAPI_EFFECT_CONDITION("firmware operation takes time")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(3)
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "device state",
+ "current configuration", "modified configuration",
+ "Device configuration changed by RPC command")
+ KAPI_STATE_TRANS_COND("RPC modifies device settings")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "kernel taint state",
+ "untainted", "TAINT_FWCTL set",
+ "Kernel marked as tainted due to firmware modification")
+ KAPI_STATE_TRANS_COND("First debug write operation")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS_COUNT(2)
+ },
+};
+
+/* Register all fwctl IOCTL specifications */
+KAPI_IOCTL_SPEC_DRIVER("fwctl", {
+ &fwctl_info_spec,
+ &fwctl_rpc_spec,
+})
+
static void fwctl_device_release(struct device *device)
{
struct fwctl_device *fwctl =
@@ -325,7 +605,7 @@ struct fwctl_device *_fwctl_alloc_device(struct device *parent,
if (!fwctl)
return NULL;
- cdev_init(&fwctl->cdev, &fwctl_fops);
+ cdev_init(&fwctl->cdev, fwctl_fops);
/*
* The driver module is protected by fwctl_register/unregister(),
* unregister won't complete until we are done with the driver's module.
@@ -395,6 +675,9 @@ static int __init fwctl_init(void)
{
int ret;
+ /* Initialize the wrapped file_operations */
+ kapi_init_fops_fwctl_fops();
+
ret = alloc_chrdev_region(&fwctl_dev, 0, FWCTL_MAX_DEVICES, "fwctl");
if (ret)
return ret;
@@ -402,8 +685,15 @@ static int __init fwctl_init(void)
ret = class_register(&fwctl_class);
if (ret)
goto err_chrdev;
+
+ ret = kapi_ioctl_specs_init();
+ if (ret)
+ goto err_class;
+
return 0;
+err_class:
+ class_unregister(&fwctl_class);
err_chrdev:
unregister_chrdev_region(fwctl_dev, FWCTL_MAX_DEVICES);
return ret;
@@ -411,6 +701,7 @@ static int __init fwctl_init(void)
static void __exit fwctl_exit(void)
{
+ kapi_ioctl_specs_exit();
class_unregister(&fwctl_class);
unregister_chrdev_region(fwctl_dev, FWCTL_MAX_DEVICES);
}
--
2.39.5
* [RFC 18/19] binder: add detailed IOCTL API specifications
From: Sasha Levin @ 2025-06-14 13:48 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-api, workflows, tools, Sasha Levin
Add kernel API specifications to the binder driver using the IOCTL
specification framework. This provides detailed documentation and
enables runtime validation of all binder IOCTL interfaces.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
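As an illustration only (not part of the patch), the wrapping done by
kapi_init_fops_binder_fops() below follows a copy-and-substitute
pattern: copy the operations table, remember the original ioctl handler,
and install a validating handler in the copy. This is a minimal
user-space sketch of that pattern. All sketch_ names are hypothetical,
and the validation rule (one known command with an upper bound on its
argument, everything else passed through) is an assumption made for the
example, not the behaviour of kapi_ioctl_validation_wrapper().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of an ops table; this is not struct file_operations. */
struct sketch_ops {
	long (*ioctl)(unsigned int cmd, unsigned long arg);
};

struct sketch_wrapper {
	const struct sketch_ops *real_ops;
	struct sketch_ops *wrapped_ops;
	long (*real_ioctl)(unsigned int cmd, unsigned long arg);
};

#define SKETCH_KNOWN_CMD 1u	/* the only command with a "spec" here */
#define SKETCH_MAX_ARG   100ul	/* assumed upper bound for its argument */

struct sketch_wrapper sketch_w;
struct sketch_ops sketch_wrapped;

/* The "driver" handler: echoes its argument so callers can observe it. */
long sketch_real_ioctl(unsigned int cmd, unsigned long arg)
{
	(void)cmd;
	return (long)arg;
}

const struct sketch_ops sketch_real = { .ioctl = sketch_real_ioctl };

/* Validate known commands, then forward to the saved original handler. */
long sketch_validating_ioctl(unsigned int cmd, unsigned long arg)
{
	if (cmd == SKETCH_KNOWN_CMD && arg > SKETCH_MAX_ARG)
		return -EINVAL;
	return sketch_w.real_ioctl(cmd, arg);
}

/* Mirror of the init helper: copy the table, then swap only .ioctl. */
void sketch_init_wrapper(void)
{
	if (!sketch_real.ioctl)
		return;
	sketch_wrapped = sketch_real;
	sketch_w.real_ops = &sketch_real;
	sketch_w.wrapped_ops = &sketch_wrapped;
	sketch_w.real_ioctl = sketch_real.ioctl;
	sketch_wrapped.ioctl = sketch_validating_ioctl;
}
```

Because the original table is left untouched, callers that still hold a
pointer to it bypass validation; only users of the wrapped copy are
checked.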
drivers/android/binder.c | 758 +++++++++++++++++++++++++++++++++++++++
1 file changed, 758 insertions(+)
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index c463ca4a8fff8..975f07216724b 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -67,6 +67,8 @@
#include <linux/task_work.h>
#include <linux/sizes.h>
#include <linux/ktime.h>
+#include <linux/ioctl_api_spec.h>
+#include <linux/kernel_api_spec.h>
#include <uapi/linux/android/binder.h>
@@ -6930,6 +6932,7 @@ static int transaction_log_show(struct seq_file *m, void *unused)
return 0;
}
+/* Define the actual binder_fops structure */
const struct file_operations binder_fops = {
.owner = THIS_MODULE,
.poll = binder_poll,
@@ -6941,6 +6944,751 @@ const struct file_operations binder_fops = {
.release = binder_release,
};
+/* Define wrapper for KAPI validation */
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+static struct file_operations __kapi_wrapped_binder_fops;
+static struct kapi_fops_wrapper __kapi_wrapper_binder_fops;
+
+static void kapi_init_fops_binder_fops(void)
+{
+ if (binder_fops.unlocked_ioctl) {
+ __kapi_wrapped_binder_fops = binder_fops;
+ __kapi_wrapper_binder_fops.real_fops = &binder_fops;
+ __kapi_wrapper_binder_fops.wrapped_fops = &__kapi_wrapped_binder_fops;
+ __kapi_wrapper_binder_fops.real_ioctl = binder_fops.unlocked_ioctl;
+ __kapi_wrapped_binder_fops.unlocked_ioctl = kapi_ioctl_validation_wrapper;
+ kapi_register_wrapper(&__kapi_wrapper_binder_fops);
+ }
+}
+#else
+static inline void kapi_init_fops_binder_fops(void) {}
+#endif
+
+/* IOCTL API Specifications for Binder */
+
+static const struct kapi_ioctl_spec binder_write_read_spec = {
+ .cmd = BINDER_WRITE_READ,
+ .cmd_name = "BINDER_WRITE_READ",
+ .input_size = sizeof(struct binder_write_read),
+ .output_size = sizeof(struct binder_write_read),
+ .file_ops_name = "binder_fops",
+ .api_spec = {
+ .name = "binder_write_read",
+ KAPI_DESCRIPTION("Perform read/write operations on binder")
+ KAPI_LONG_DESC("Main workhorse of binder IPC. Allows writing commands to "
+ "binder driver and reading responses. Commands are encoded "
+ "in a special protocol format. Both read and write operations "
+ "can be performed in a single ioctl call.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "write_size", "binder_size_t", "Bytes to write")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ .min_value = 0,
+ .max_value = SZ_4M, /* Reasonable limit for IPC */
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "write_consumed", "binder_size_t", "Bytes consumed by driver")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_UINT,
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ .min_value = 0,
+ .max_value = SZ_4M,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "write_buffer", "binder_uintptr_t", "User buffer with commands")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER)
+ .type = KAPI_TYPE_USER_PTR,
+ .size_param_idx = 0,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(3, "read_size", "binder_size_t", "Bytes to read")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ .min_value = 0,
+ .max_value = SZ_4M,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(4, "read_consumed", "binder_size_t", "Bytes consumed by driver")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_UINT,
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ .min_value = 0,
+ .max_value = SZ_4M,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(5, "read_buffer", "binder_uintptr_t", "User buffer for responses")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT | KAPI_PARAM_USER)
+ .type = KAPI_TYPE_USER_PTR,
+ .size_param_idx = 3,
+ KAPI_PARAM_END
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ .error_values = (const s64[]){-EFAULT, -EINVAL, -EAGAIN, -EINTR,
+ -ENOMEM, -ECONNREFUSED},
+ .error_count = 6,
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EFAULT, "EFAULT", "Failed to copy data to/from user space",
+ "Check buffer pointers are valid user space addresses")
+ KAPI_ERROR(1, -EINVAL, "EINVAL", "Invalid parameters",
+ "Buffer sizes or commands are invalid")
+ KAPI_ERROR(2, -EAGAIN, "EAGAIN", "Try again",
+ "Non-blocking read with no data available")
+ KAPI_ERROR(3, -EINTR, "EINTR", "Interrupted by signal",
+ "Operation interrupted, should be retried")
+ KAPI_ERROR(4, -ENOMEM, "ENOMEM", "Out of memory",
+ "Unable to allocate memory for operation")
+ KAPI_ERROR(5, -ECONNREFUSED, "ECONNREFUSED", "Connection refused",
+ "Process is being destroyed, no further operations allowed")
+
+ .error_count = 6,
+ .param_count = 6,
+ .since_version = "3.0",
+ .notes = "This is the primary interface for binder IPC. Most other "
+ "ioctls are for configuration and management.",
+
+ /* Structure specifications */
+ KAPI_STRUCT_SPEC(0, binder_write_read, "Read/write operation structure")
+ KAPI_STRUCT_SIZE(sizeof(struct binder_write_read), __alignof__(struct binder_write_read))
+ KAPI_STRUCT_FIELD_COUNT(6)
+
+ KAPI_STRUCT_FIELD(0, "write_size", KAPI_TYPE_UINT, "binder_size_t",
+ "Number of bytes to write")
+ KAPI_FIELD_OFFSET(offsetof(struct binder_write_read, write_size))
+ KAPI_FIELD_SIZE(sizeof(binder_size_t))
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(1, "write_consumed", KAPI_TYPE_UINT, "binder_size_t",
+ "Number of bytes consumed by driver")
+ KAPI_FIELD_OFFSET(offsetof(struct binder_write_read, write_consumed))
+ KAPI_FIELD_SIZE(sizeof(binder_size_t))
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(2, "write_buffer", KAPI_TYPE_PTR, "binder_uintptr_t",
+ "Pointer to write buffer")
+ KAPI_FIELD_OFFSET(offsetof(struct binder_write_read, write_buffer))
+ KAPI_FIELD_SIZE(sizeof(binder_uintptr_t))
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(3, "read_size", KAPI_TYPE_UINT, "binder_size_t",
+ "Number of bytes to read")
+ KAPI_FIELD_OFFSET(offsetof(struct binder_write_read, read_size))
+ KAPI_FIELD_SIZE(sizeof(binder_size_t))
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(4, "read_consumed", KAPI_TYPE_UINT, "binder_size_t",
+ "Number of bytes consumed by driver")
+ KAPI_FIELD_OFFSET(offsetof(struct binder_write_read, read_consumed))
+ KAPI_FIELD_SIZE(sizeof(binder_size_t))
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(5, "read_buffer", KAPI_TYPE_PTR, "binder_uintptr_t",
+ "Pointer to read buffer")
+ KAPI_FIELD_OFFSET(offsetof(struct binder_write_read, read_buffer))
+ KAPI_FIELD_SIZE(sizeof(binder_uintptr_t))
+ KAPI_STRUCT_FIELD_END
+ KAPI_STRUCT_SPEC_END
+
+ KAPI_STRUCT_SPEC_COUNT(1)
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_NETWORK,
+ "binder transaction queue",
+ "Enqueues transactions or commands to target process")
+ KAPI_EFFECT_CONDITION("write_size > 0")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_SCHEDULE,
+ "process state",
+ "May block waiting for incoming transactions")
+ KAPI_EFFECT_CONDITION("read_size > 0 && no data available")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_RESOURCE_CREATE,
+ "binder nodes/refs",
+ "May create or destroy binder nodes and references")
+ KAPI_EFFECT_CONDITION("specific commands")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_SIGNAL_SEND,
+ "target process",
+ "May trigger death notifications to linked processes")
+ KAPI_EFFECT_CONDITION("death notification")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(4)
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "transaction",
+ "pending in sender", "queued in target",
+ "Transaction moves from sender to target's queue")
+ KAPI_STATE_TRANS_COND("BC_TRANSACTION command")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "thread state",
+ "running", "waiting for work",
+ "Thread blocks waiting for incoming transactions")
+ KAPI_STATE_TRANS_COND("read with no work available")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "binder ref",
+ "active", "released",
+ "Reference count decremented, may trigger cleanup")
+ KAPI_STATE_TRANS_COND("BC_RELEASE command")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS_COUNT(3)
+ },
+};
+
+static const struct kapi_ioctl_spec binder_set_max_threads_spec = {
+ .cmd = BINDER_SET_MAX_THREADS,
+ .cmd_name = "BINDER_SET_MAX_THREADS",
+ .input_size = sizeof(__u32),
+ .output_size = 0,
+ .file_ops_name = "binder_fops",
+ .api_spec = {
+ .name = "binder_set_max_threads",
+ KAPI_DESCRIPTION("Set maximum number of binder threads")
+ KAPI_LONG_DESC("Sets the maximum number of threads that the binder driver "
+ "will request this process to spawn for handling incoming "
+ "transactions. The driver sends BR_SPAWN_LOOPER when it needs "
+ "more threads.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "max_threads", "__u32", "Maximum number of threads")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ .min_value = 0,
+ .max_value = INT_MAX,
+ KAPI_PARAM_END
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ .error_values = (const s64[]){-EINVAL, -EFAULT},
+ .error_count = 2,
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EINVAL, "EINVAL", "Invalid thread count",
+ "Thread count exceeds system limits")
+ KAPI_ERROR(1, -EFAULT, "EFAULT", "Failed to copy from user",
+ "Invalid user pointer provided")
+
+ .error_count = 2,
+ .param_count = 1,
+ .since_version = "3.0",
+ },
+};
+
+static const struct kapi_ioctl_spec binder_set_context_mgr_spec = {
+ .cmd = BINDER_SET_CONTEXT_MGR,
+ .cmd_name = "BINDER_SET_CONTEXT_MGR",
+ .input_size = 0,
+ .output_size = 0,
+ .file_ops_name = "binder_fops",
+ .api_spec = {
+ .name = "binder_set_context_mgr",
+ KAPI_DESCRIPTION("Become the context manager (handle 0)")
+ KAPI_LONG_DESC("Registers the calling process as the context manager for "
+ "this binder domain. The context manager has special handle 0 "
+ "and typically implements the service manager. Only one process "
+ "per binder domain can be the context manager.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ .error_values = (const s64[]){-EBUSY, -EPERM, -ENOMEM},
+ .error_count = 3,
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EBUSY, "EBUSY", "Context manager already set",
+ "Another process is already the context manager")
+ KAPI_ERROR(1, -EPERM, "EPERM", "Permission denied",
+ "Caller lacks permission or wrong UID")
+ KAPI_ERROR(2, -ENOMEM, "ENOMEM", "Out of memory",
+ "Unable to allocate context manager node")
+
+ .error_count = 3,
+ .param_count = 0,
+ .since_version = "3.0",
+ .notes = "Requires CAP_SYS_NICE or proper SELinux permissions",
+ },
+};
+
+static const struct kapi_ioctl_spec binder_set_context_mgr_ext_spec = {
+ .cmd = BINDER_SET_CONTEXT_MGR_EXT,
+ .cmd_name = "BINDER_SET_CONTEXT_MGR_EXT",
+ .input_size = sizeof(struct flat_binder_object),
+ .output_size = 0,
+ .file_ops_name = "binder_fops",
+ .api_spec = {
+ .name = "binder_set_context_mgr_ext",
+ KAPI_DESCRIPTION("Become context manager with extended info")
+ KAPI_LONG_DESC("Extended version of BINDER_SET_CONTEXT_MGR that allows "
+ "specifying additional properties of the context manager "
+ "through a flat_binder_object structure.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "object", "struct flat_binder_object", "Context manager properties")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_STRUCT,
+ .size = sizeof(struct flat_binder_object),
+ KAPI_PARAM_END
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ .error_values = (const s64[]){-EINVAL, -EFAULT, -EBUSY, -EPERM, -ENOMEM},
+ .error_count = 5,
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EINVAL, "EINVAL", "Invalid parameters",
+ "Invalid flat_binder_object structure")
+ KAPI_ERROR(1, -EFAULT, "EFAULT", "Failed to copy from user",
+ "Invalid user pointer provided")
+ KAPI_ERROR(2, -EBUSY, "EBUSY", "Context manager already set",
+ "Another process is already the context manager")
+ KAPI_ERROR(3, -EPERM, "EPERM", "Permission denied",
+ "Caller lacks permission or wrong UID")
+ KAPI_ERROR(4, -ENOMEM, "ENOMEM", "Out of memory",
+ "Unable to allocate context manager node")
+
+ .error_count = 5,
+ .param_count = 1,
+ .since_version = "4.14",
+ },
+};
+
+static const struct kapi_ioctl_spec binder_thread_exit_spec = {
+ .cmd = BINDER_THREAD_EXIT,
+ .cmd_name = "BINDER_THREAD_EXIT",
+ .input_size = 0,
+ .output_size = 0,
+ .file_ops_name = "binder_fops",
+ .api_spec = {
+ .name = "binder_thread_exit",
+ KAPI_DESCRIPTION("Exit binder thread")
+ KAPI_LONG_DESC("Notifies the binder driver that this thread is exiting. "
+ "The driver will clean up any pending transactions and "
+ "remove the thread from the thread pool.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ .error_values = (const s64[]){},
+ .error_count = 0,
+ KAPI_RETURN_END
+
+ .error_count = 0,
+ .param_count = 0,
+ .since_version = "3.0",
+ .notes = "Should be called before thread termination to ensure clean shutdown",
+ },
+};
+
+static const struct kapi_ioctl_spec binder_version_spec = {
+ .cmd = BINDER_VERSION,
+ .cmd_name = "BINDER_VERSION",
+ .input_size = 0,
+ .output_size = sizeof(struct binder_version),
+ .file_ops_name = "binder_fops",
+ .api_spec = {
+ .name = "binder_version",
+ KAPI_DESCRIPTION("Get binder protocol version")
+ KAPI_LONG_DESC("Returns the current binder protocol version supported "
+ "by the driver. Used for compatibility checking.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "protocol_version", "__s32", "Binder protocol version")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_INT,
+ .constraint_type = KAPI_CONSTRAINT_ENUM,
+ .enum_values = (const s64[]){BINDER_CURRENT_PROTOCOL_VERSION},
+ .enum_count = 1,
+ KAPI_PARAM_END
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ .error_values = (const s64[]){-EINVAL, -EFAULT},
+ .error_count = 2,
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EINVAL, "EINVAL", "Invalid version structure",
+ "Invalid user pointer for version structure")
+ KAPI_ERROR(1, -EFAULT, "EFAULT", "Failed to copy to user",
+ "Unable to write version to user space")
+
+ .error_count = 2,
+ .param_count = 1,
+ .since_version = "3.0",
+ },
+};
+
+static const struct kapi_ioctl_spec binder_get_node_info_for_ref_spec = {
+ .cmd = BINDER_GET_NODE_INFO_FOR_REF,
+ .cmd_name = "BINDER_GET_NODE_INFO_FOR_REF",
+ .input_size = sizeof(struct binder_node_info_for_ref),
+ .output_size = sizeof(struct binder_node_info_for_ref),
+ .file_ops_name = "binder_fops",
+ .api_spec = {
+ .name = "binder_get_node_info_for_ref",
+ KAPI_DESCRIPTION("Get node information for a reference")
+ KAPI_LONG_DESC("Retrieves information about a binder node given its handle. "
+ "Returns the current strong and weak reference counts.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "handle", "__u32", "Binder handle")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "strong_count", "__u32", "Strong reference count")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "weak_count", "__u32", "Weak reference count")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_END
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ .error_values = (const s64[]){-EINVAL, -EFAULT, -ENOENT},
+ .error_count = 3,
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EINVAL, "EINVAL", "Invalid parameters",
+ "Reserved fields must be zero")
+ KAPI_ERROR(1, -EFAULT, "EFAULT", "Failed to copy data",
+ "Invalid user pointer provided")
+ KAPI_ERROR(2, -ENOENT, "ENOENT", "Handle not found",
+ "No node exists for the given handle")
+
+ .error_count = 3,
+ .param_count = 3,
+ .since_version = "4.14",
+ },
+};
+
+static const struct kapi_ioctl_spec binder_get_node_debug_info_spec = {
+ .cmd = BINDER_GET_NODE_DEBUG_INFO,
+ .cmd_name = "BINDER_GET_NODE_DEBUG_INFO",
+ .input_size = sizeof(struct binder_node_debug_info),
+ .output_size = sizeof(struct binder_node_debug_info),
+ .file_ops_name = "binder_fops",
+ .api_spec = {
+ .name = "binder_get_node_debug_info",
+ KAPI_DESCRIPTION("Get debug info for binder nodes")
+ KAPI_LONG_DESC("Iterates through all binder nodes in the process. "
+ "Start with ptr=0 to get the first node, then pass the "
+ "returned ptr to the next call. Returns ptr=0 when done.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "ptr", "binder_uintptr_t", "Node pointer (NULL for first)")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_INOUT)
+ .type = KAPI_TYPE_PTR,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "cookie", "binder_uintptr_t", "Node cookie value")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "has_strong_ref", "__u32", "Has strong references")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_UINT,
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ .min_value = 0,
+ .max_value = 1,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(3, "has_weak_ref", "__u32", "Has weak references")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_UINT,
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ .min_value = 0,
+ .max_value = 1,
+ KAPI_PARAM_END
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ .error_values = (const s64[]){-EFAULT, -EINVAL},
+ .error_count = 2,
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EFAULT, "EFAULT", "Failed to copy data",
+ "Invalid user pointer provided")
+ KAPI_ERROR(1, -EINVAL, "EINVAL", "Invalid node pointer",
+ "Provided ptr is not a valid node")
+
+ .error_count = 2,
+ .param_count = 4,
+ .since_version = "4.14",
+ },
+};
+
+static const struct kapi_ioctl_spec binder_freeze_spec = {
+ .cmd = BINDER_FREEZE,
+ .cmd_name = "BINDER_FREEZE",
+ .input_size = sizeof(struct binder_freeze_info),
+ .output_size = 0,
+ .file_ops_name = "binder_fops",
+ .api_spec = {
+ .name = "binder_freeze",
+ KAPI_DESCRIPTION("Freeze or unfreeze a binder process")
+ KAPI_LONG_DESC("Controls whether a process can receive binder transactions. "
+ "When frozen, new transactions are blocked. Can wait for "
+ "existing transactions to complete with timeout.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "pid", "__u32", "Process ID to freeze/unfreeze")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ .min_value = 1,
+ .max_value = PID_MAX_LIMIT,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "enable", "__u32", "1 to freeze, 0 to unfreeze")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ .min_value = 0,
+ .max_value = 1,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "timeout_ms", "__u32", "Timeout in milliseconds (0 = no wait)")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ .min_value = 0,
+ .max_value = 60000, /* 1 minute max */
+ KAPI_PARAM_END
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ .error_values = (const s64[]){-EINVAL, -EAGAIN, -EFAULT, -ENOMEM},
+ .error_count = 4,
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EINVAL, "EINVAL", "Invalid process",
+ "Process not found or invalid parameters")
+ KAPI_ERROR(1, -EAGAIN, "EAGAIN", "Timeout waiting for transactions",
+ "Existing transactions did not complete within timeout")
+ KAPI_ERROR(2, -EFAULT, "EFAULT", "Failed to copy from user",
+ "Invalid user pointer provided")
+ KAPI_ERROR(3, -ENOMEM, "ENOMEM", "Out of memory",
+ "Unable to allocate memory for freeze operation")
+
+ .error_count = 4,
+ .param_count = 3,
+ .since_version = "5.13",
+ .notes = "Requires appropriate permissions to freeze other processes",
+ },
+};
+
+static const struct kapi_ioctl_spec binder_get_frozen_info_spec = {
+ .cmd = BINDER_GET_FROZEN_INFO,
+ .cmd_name = "BINDER_GET_FROZEN_INFO",
+ .input_size = sizeof(struct binder_frozen_status_info),
+ .output_size = sizeof(struct binder_frozen_status_info),
+ .file_ops_name = "binder_fops",
+ .api_spec = {
+ .name = "binder_get_frozen_info",
+ KAPI_DESCRIPTION("Get frozen status of a process")
+ KAPI_LONG_DESC("Queries whether a process is frozen and if it has "
+ "received transactions while frozen. Useful for "
+ "debugging frozen process issues.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "pid", "__u32", "Process ID to query")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ .min_value = 1,
+ .max_value = PID_MAX_LIMIT,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "sync_recv", "__u32", "Sync transactions received while frozen")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_UINT,
+ .constraints = "Bit 0: received after frozen, Bit 1: pending during freeze",
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "async_recv", "__u32", "Async transactions received while frozen")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_END
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ .error_values = (const s64[]){-EINVAL, -EFAULT},
+ .error_count = 2,
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EINVAL, "EINVAL", "Process not found",
+ "No binder process found with given PID")
+ KAPI_ERROR(1, -EFAULT, "EFAULT", "Failed to copy data",
+ "Invalid user pointer provided")
+
+ .error_count = 2,
+ .param_count = 3,
+ .since_version = "5.9",
+ },
+};
+
+static const struct kapi_ioctl_spec binder_enable_oneway_spam_detection_spec = {
+ .cmd = BINDER_ENABLE_ONEWAY_SPAM_DETECTION,
+ .cmd_name = "BINDER_ENABLE_ONEWAY_SPAM_DETECTION",
+ .input_size = sizeof(__u32),
+ .output_size = 0,
+ .file_ops_name = "binder_fops",
+ .api_spec = {
+ .name = "binder_enable_oneway_spam_detection",
+ KAPI_DESCRIPTION("Enable/disable oneway spam detection")
+ KAPI_LONG_DESC("Controls whether the driver monitors for excessive "
+ "oneway transactions that might indicate spam or abuse. "
+ "When enabled, BR_ONEWAY_SPAM_SUSPECT is sent when threshold exceeded.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "enable", "__u32", "1 to enable, 0 to disable")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ .min_value = 0,
+ .max_value = 1,
+ KAPI_PARAM_END
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ .error_values = (const s64[]){-EFAULT},
+ .error_count = 1,
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EFAULT, "EFAULT", "Failed to copy from user",
+ "Invalid user pointer provided")
+
+ .error_count = 1,
+ .param_count = 1,
+ .since_version = "5.13",
+ },
+};
+
+static const struct kapi_ioctl_spec binder_get_extended_error_spec = {
+ .cmd = BINDER_GET_EXTENDED_ERROR,
+ .cmd_name = "BINDER_GET_EXTENDED_ERROR",
+ .input_size = 0,
+ .output_size = sizeof(struct binder_extended_error),
+ .file_ops_name = "binder_fops",
+ .api_spec = {
+ .name = "binder_get_extended_error",
+ KAPI_DESCRIPTION("Get extended error information")
+ KAPI_LONG_DESC("Retrieves detailed error information from the last "
+ "failed binder operation on this thread. Clears the "
+ "error after reading.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "id", "__u32", "Error identifier")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "command", "__u32", "Binder command that failed")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "param", "__s32", "Error parameter (negative errno)")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_INT,
+ .constraint_type = KAPI_CONSTRAINT_RANGE,
+ .min_value = -MAX_ERRNO,
+ .max_value = 0,
+ KAPI_PARAM_END
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ .error_values = (const s64[]){-EFAULT},
+ .error_count = 1,
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EFAULT, "EFAULT", "Failed to copy to user",
+ "Invalid user pointer provided")
+
+ .error_count = 1,
+ .param_count = 3,
+ .since_version = "5.16",
+ .notes = "Error is cleared after reading; subsequent calls return BR_OK",
+ },
+};
+
+/* Register all binder IOCTL specifications */
+KAPI_IOCTL_SPEC_DRIVER("binder", {
+ &binder_write_read_spec,
+ &binder_set_max_threads_spec,
+ &binder_set_context_mgr_spec,
+ &binder_set_context_mgr_ext_spec,
+ &binder_thread_exit_spec,
+ &binder_version_spec,
+ &binder_get_node_info_for_ref_spec,
+ &binder_get_node_debug_info_spec,
+ &binder_freeze_spec,
+ &binder_get_frozen_info_spec,
+ &binder_enable_oneway_spam_detection_spec,
+ &binder_get_extended_error_spec,
+})
+
DEFINE_SHOW_ATTRIBUTE(state);
DEFINE_SHOW_ATTRIBUTE(state_hashed);
DEFINE_SHOW_ATTRIBUTE(stats);
@@ -7050,6 +7798,13 @@ static int __init binder_init(void)
if (ret)
return ret;
+ /* Initialize the wrapped file_operations */
+ kapi_init_fops_binder_fops();
+
+ ret = kapi_ioctl_specs_init();
+ if (ret)
+ goto err_kapi_init;
+
atomic_set(&binder_transaction_log.cur, ~0U);
atomic_set(&binder_transaction_log_failed.cur, ~0U);
@@ -7102,6 +7857,9 @@ static int __init binder_init(void)
err_alloc_device_names_failed:
debugfs_remove_recursive(binder_debugfs_dir_entry_root);
+ kapi_ioctl_specs_exit();
+
+err_kapi_init:
binder_alloc_shrinker_exit();
return ret;
--
2.39.5
* [RFC 19/19] tools/kapi: Add kernel API specification extraction tool
2025-06-14 13:48 [RFC 00/19] Kernel API Specification Framework Sasha Levin
` (17 preceding siblings ...)
2025-06-14 13:48 ` [RFC 18/19] binder: " Sasha Levin
@ 2025-06-14 13:48 ` Sasha Levin
2025-06-17 12:08 ` [RFC 00/19] Kernel API Specification Framework David Laight
2025-06-18 21:29 ` Kees Cook
20 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-06-14 13:48 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-api, workflows, tools, Sasha Levin
The kapi tool extracts kernel API specifications from kernel source,
vmlinux binaries, or debugfs and displays them as plain text, JSON, or
RST.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
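Note on debugfs.rs (below the cut line, so not part of the commit
message): parse_spec_file() stores the "Examples:" and "Notes:" sections
only after it has seen several consecutive blank lines, so a section that
is followed directly by another header, or that runs to the end of the
file, is dropped. The following is a minimal sketch of one way to flush the
pending section in both cases. It assumes a section header sits on its own
line and that a section ends only at the next header or at end of input;
the `Sections` type and the `flush` and `collect_sections` functions are
hypothetical names used for illustration, not code from this patch.

```rust
/// Free-form sections gathered from a spec file (hypothetical helper type).
#[derive(Debug, Default, PartialEq)]
pub struct Sections {
    pub examples: Option<String>,
    pub notes: Option<String>,
}

/// Store the pending section text, if any, into the matching field.
fn flush(field: Option<&str>, buffer: &mut String, out: &mut Sections) {
    let text = buffer.trim();
    if !text.is_empty() {
        match field {
            Some("examples") => out.examples = Some(text.to_string()),
            Some("notes") => out.notes = Some(text.to_string()),
            _ => {}
        }
    }
    buffer.clear();
}

/// Collect the "Examples:" and "Notes:" sections, flushing the pending
/// section whenever a new header starts and once more at end of input.
pub fn collect_sections(content: &str) -> Sections {
    let mut out = Sections::default();
    let mut field: Option<&str> = None;
    let mut buffer = String::new();

    for line in content.lines() {
        let header = if line.starts_with("Examples:") {
            Some("examples")
        } else if line.starts_with("Notes:") {
            Some("notes")
        } else {
            None
        };

        if header.is_some() {
            // A new section begins: save whatever was collected so far.
            flush(field, &mut buffer, &mut out);
            field = header;
            continue;
        }

        // Lines seen before the first section header are ignored here.
        if field.is_some() {
            buffer.push_str(line);
            buffer.push('\n');
        }
    }

    // End of input: do not lose a section that was still being collected.
    flush(field, &mut buffer, &mut out);
    out
}
```

Calling collect_sections() on the contents of a spec file returns both
sections even when "Notes:" is the last thing in the file. Whether a blank
line should also end a section depends on the debugfs output format, which
this sketch does not try to guess.
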
Documentation/admin-guide/kernel-api-spec.rst | 198 ++++++-
tools/kapi/.gitignore | 4 +
tools/kapi/Cargo.toml | 19 +
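tools/kapi/src/extractor/debugfs.rs | 204 ++++++++
tools/kapi/src/extractor/mod.rs | 95 ++++
tools/kapi/src/extractor/source_parser.rs | 488 ++++++++++++++++++
tools/kapi/src/extractor/vmlinux/binary_utils.rs | 130 +++++
tools/kapi/src/extractor/vmlinux/mod.rs | 372 +++++++++++++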
tools/kapi/src/formatter/json.rs | 170 ++++++
tools/kapi/src/formatter/mod.rs | 68 +++
tools/kapi/src/formatter/plain.rs | 99 ++++
tools/kapi/src/formatter/rst.rs | 144 ++++++
tools/kapi/src/main.rs | 121 +++++
13 files changed, 2109 insertions(+), 3 deletions(-)
create mode 100644 tools/kapi/.gitignore
create mode 100644 tools/kapi/Cargo.toml
create mode 100644 tools/kapi/src/extractor/debugfs.rs
create mode 100644 tools/kapi/src/extractor/mod.rs
create mode 100644 tools/kapi/src/extractor/source_parser.rs
create mode 100644 tools/kapi/src/extractor/vmlinux/binary_utils.rs
create mode 100644 tools/kapi/src/extractor/vmlinux/mod.rs
create mode 100644 tools/kapi/src/formatter/json.rs
create mode 100644 tools/kapi/src/formatter/mod.rs
create mode 100644 tools/kapi/src/formatter/plain.rs
create mode 100644 tools/kapi/src/formatter/rst.rs
create mode 100644 tools/kapi/src/main.rs
diff --git a/Documentation/admin-guide/kernel-api-spec.rst b/Documentation/admin-guide/kernel-api-spec.rst
index 3a63f6711e27b..9b452753111ad 100644
--- a/Documentation/admin-guide/kernel-api-spec.rst
+++ b/Documentation/admin-guide/kernel-api-spec.rst
@@ -31,7 +31,9 @@ The framework aims to:
common programming errors during development and testing.
3. **Support Tooling**: Export API specifications in machine-readable formats for
- use by static analyzers, documentation generators, and development tools.
+ use by static analyzers, documentation generators, and development tools. The
+ ``kapi`` tool (see `The kapi Tool`_) provides comprehensive extraction and
+ formatting capabilities.
4. **Enhance Debugging**: Provide detailed API information at runtime through debugfs
for debugging and introspection.
@@ -71,6 +73,13 @@ The framework consists of several key components:
- Type-safe parameter specifications
- Context and constraint definitions
+5. **kapi Tool** (``tools/kapi/``)
+
+ - Userspace utility for extracting specifications
+ - Multiple input sources (source, binary, debugfs)
+ - Multiple output formats (plain, JSON, RST)
+ - Testing and validation utilities
+
Data Model
----------
@@ -344,8 +353,177 @@ Documentation Generation
------------------------
The framework exports specifications via debugfs that can be used
-to generate documentation. Tools for automatic documentation generation
-from specifications are planned for future development.
+to generate documentation. The ``kapi`` tool extracts these specifications
+and formats them as plain text, JSON, or reStructuredText.
+
+The kapi Tool
+=============
+
+Overview
+--------
+
+The ``kapi`` tool is a userspace utility that extracts and displays kernel API
+specifications from multiple sources. It provides a unified interface to API
+documentation, whether it comes from a compiled kernel, source code, or a running system.
+
+Installation
+------------
+
+Build the tool from the kernel source tree::
+
+ $ cd tools/kapi
+ $ cargo build --release
+
+ # Optional: Install system-wide
+ $ cargo install --path .
+
+The tool requires Rust and Cargo to build. The binary will be available at
+``tools/kapi/target/release/kapi``.
+
+Command-Line Usage
+------------------
+
+Basic syntax::
+
+ kapi [OPTIONS] [API_NAME]
+
+Options:
+
+- ``--vmlinux <PATH>``: Extract from compiled kernel binary
+- ``--source <PATH>``: Extract from kernel source code
+- ``--debugfs <PATH>``: Extract from debugfs (default: /sys/kernel/debug)
+- ``-f, --format <FORMAT>``: Output format (plain, json, rst)
+- ``-h, --help``: Display help information
+- ``-V, --version``: Display version information
+
+Input Modes
+-----------
+
+**1. Source Code Mode**
+
+Extract specifications directly from kernel source::
+
+ # Scan entire kernel source tree
+ $ kapi --source /path/to/linux
+
+ # Extract from specific file
+ $ kapi --source kernel/sched/core.c
+
+ # Get details for specific API
+ $ kapi --source /path/to/linux sys_sched_yield
+
+**2. Vmlinux Mode**
+
+Extract from compiled kernel with debug symbols::
+
+ # List all APIs in vmlinux
+ $ kapi --vmlinux /boot/vmlinux-5.15.0
+
+ # Get specific syscall details
+ $ kapi --vmlinux ./vmlinux sys_read
+
+**3. Debugfs Mode**
+
+Extract from running kernel via debugfs::
+
+ # Use default debugfs path
+ $ kapi
+
+ # Use custom debugfs mount
+ $ kapi --debugfs /mnt/debugfs
+
+ # Get specific API from running kernel
+ $ kapi sys_write
+
+Output Formats
+--------------
+
+**Plain Text Format** (default)::
+
+ $ kapi sys_read
+
+ Detailed information for sys_read:
+ ==================================
+ Description: Read from a file descriptor
+
+ Detailed Description:
+ Reads up to count bytes from file descriptor fd into the buffer starting at buf.
+
+ Execution Context:
+ - KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+
+ Parameters (3):
+
+ Available since: 1.0
+
+**JSON Format**::
+
+ $ kapi --format json sys_read
+ {
+ "api_details": {
+ "name": "sys_read",
+ "description": "Read from a file descriptor",
+ "long_description": "Reads up to count bytes...",
+ "context_flags": ["KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE"],
+ "since_version": "1.0"
+ }
+ }
+
+**ReStructuredText Format**::
+
+ $ kapi --format rst sys_read
+
+ sys_read
+ ========
+
+ **Read from a file descriptor**
+
+ Reads up to count bytes from file descriptor fd into the buffer...
+
+Usage Examples
+--------------
+
+**Generate complete API documentation**::
+
+ # Export all kernel APIs to JSON
+ $ kapi --source /path/to/linux --format json > kernel-apis.json
+
+ # Generate RST documentation for all syscalls
+ $ kapi --vmlinux ./vmlinux --format rst > syscalls.rst
+
+ # List APIs from specific subsystem
+ $ kapi --source drivers/gpu/drm/
+
+**Integration with other tools**::
+
+ # Find all APIs that can sleep
+ $ kapi --format json | jq '.apis[] | select(.context_flags[] | contains("SLEEPABLE"))'
+
+ # Generate markdown documentation
+ $ kapi --format rst sys_mmap | pandoc -f rst -t markdown
+
+**Debugging and analysis**::
+
+ # Compare API between kernel versions
+ $ diff <(kapi --vmlinux vmlinux-5.10) <(kapi --vmlinux vmlinux-5.15)
+
+ # Check if specific API exists
+ $ kapi --source . my_custom_api || echo "API not found"
+
+Implementation Details
+----------------------
+
+The tool extracts API specifications from three sources:
+
+1. **Source Code**: Parses KAPI specification macros using regular expressions
+2. **Vmlinux**: Reads the ``.kapi_specs`` ELF section from compiled kernels
+3. **Debugfs**: Reads from ``/sys/kernel/debug/kapi/`` filesystem interface
+
+The tool supports all KAPI specification types:
+
+- System calls (``DEFINE_KERNEL_API_SPEC``)
+- IOCTLs (``DEFINE_IOCTL_API_SPEC``)
+- Kernel functions (``KAPI_DEFINE_SPEC``)
IDE Integration
---------------
@@ -357,6 +535,11 @@ Modern IDEs can use the JSON export for:
- Context validation
- Error code documentation
+Example IDE integration::
+
+ # Generate IDE completion data
+ $ kapi --format json > .vscode/kernel-apis.json
+
Testing Framework
-----------------
@@ -367,6 +550,15 @@ The framework includes test helpers::
kapi_test_api("kmalloc", test_cases);
#endif
+The kapi tool can verify specifications against implementations::
+
+ # Run consistency tests
+ $ cd tools/kapi
+ $ ./test_consistency.sh
+
+ # Compare source vs binary specifications
+ $ ./compare_all_syscalls.sh
+
Best Practices
==============
diff --git a/tools/kapi/.gitignore b/tools/kapi/.gitignore
new file mode 100644
index 0000000000000..1390bfc12686c
--- /dev/null
+++ b/tools/kapi/.gitignore
@@ -0,0 +1,4 @@
+# Rust build artifacts
+/target/
+**/*.rs.bk
+
diff --git a/tools/kapi/Cargo.toml b/tools/kapi/Cargo.toml
new file mode 100644
index 0000000000000..4e6bcb10d132f
--- /dev/null
+++ b/tools/kapi/Cargo.toml
@@ -0,0 +1,19 @@
+[package]
+name = "kapi"
+version = "0.1.0"
+edition = "2024"
+authors = ["Sasha Levin <sashal@kernel.org>"]
+description = "Tool for extracting and displaying kernel API specifications"
+license = "GPL-2.0"
+
+[dependencies]
+goblin = "0.10"
+clap = { version = "4.4", features = ["derive"] }
+anyhow = "1.0"
+serde = { version = "1.0", features = ["derive"] }
+serde_json = "1.0"
+regex = "1.10"
+walkdir = "2.4"
+
+[dev-dependencies]
+tempfile = "3.8"
diff --git a/tools/kapi/src/extractor/debugfs.rs b/tools/kapi/src/extractor/debugfs.rs
new file mode 100644
index 0000000000000..91775dea223f5
--- /dev/null
+++ b/tools/kapi/src/extractor/debugfs.rs
@@ -0,0 +1,204 @@
+use anyhow::{Context, Result, bail};
+use std::fs;
+use std::io::Write;
+use std::path::PathBuf;
+use crate::formatter::OutputFormatter;
+
+use super::{ApiExtractor, ApiSpec, display_api_spec};
+
+/// Extractor for kernel API specifications from debugfs
+pub struct DebugfsExtractor {
+ debugfs_path: PathBuf,
+}
+
+impl DebugfsExtractor {
+ /// Create a new debugfs extractor with the specified debugfs path
+ pub fn new(debugfs_path: Option<String>) -> Result<Self> {
+ let path = match debugfs_path {
+ Some(p) => PathBuf::from(p),
+ None => PathBuf::from("/sys/kernel/debug"),
+ };
+
+ // Check if the debugfs path exists
+ if !path.exists() {
+ bail!("Debugfs path does not exist: {}", path.display());
+ }
+
+ // Check if kapi directory exists
+ let kapi_path = path.join("kapi");
+ if !kapi_path.exists() {
+ bail!("Kernel API debugfs interface not found at: {}", kapi_path.display());
+ }
+
+ Ok(Self {
+ debugfs_path: path,
+ })
+ }
+
+ /// Parse the list file to get all available API names
+ fn parse_list_file(&self) -> Result<Vec<String>> {
+ let list_path = self.debugfs_path.join("kapi/list");
+ let content = fs::read_to_string(&list_path)
+ .with_context(|| format!("Failed to read {}", list_path.display()))?;
+
+ let mut apis = Vec::new();
+ let mut in_list = false;
+
+ for line in content.lines() {
+ if line.contains("===") {
+ in_list = true;
+ continue;
+ }
+
+ if in_list && line.starts_with("Total:") {
+ break;
+ }
+
+ if in_list && !line.trim().is_empty() {
+ // Extract API name from lines like "sys_read - Read from a file descriptor"
+ if let Some(name) = line.split(" - ").next() {
+ apis.push(name.trim().to_string());
+ }
+ }
+ }
+
+ Ok(apis)
+ }
+
+ /// Parse a single API specification file
+ fn parse_spec_file(&self, api_name: &str) -> Result<ApiSpec> {
+ let spec_path = self.debugfs_path.join(format!("kapi/specs/{}", api_name));
+ let content = fs::read_to_string(&spec_path)
+ .with_context(|| format!("Failed to read {}", spec_path.display()))?;
+
+ let mut spec = ApiSpec {
+ name: api_name.to_string(),
+ api_type: "unknown".to_string(),
+ description: None,
+ long_description: None,
+ version: None,
+ context_flags: Vec::new(),
+ param_count: None,
+ error_count: None,
+ examples: None,
+ notes: None,
+ since_version: None,
+ };
+
+ // Parse the content
+ let mut collecting_multiline = false;
+ let mut multiline_buffer = String::new();
+ let mut multiline_field = "";
+
+ for line in content.lines() {
+ // Handle section headers
+ if line.starts_with("Parameters (") {
+ if let Some(count_str) = line.strip_prefix("Parameters (").and_then(|s| s.strip_suffix("):")) {
+ spec.param_count = count_str.parse().ok();
+ }
+ continue;
+ } else if line.starts_with("Errors (") {
+ if let Some(count_str) = line.strip_prefix("Errors (").and_then(|s| s.strip_suffix("):")) {
+ spec.error_count = count_str.parse().ok();
+ }
+ continue;
+ } else if line.starts_with("Examples:") {
+ collecting_multiline = true;
+ multiline_field = "examples";
+ multiline_buffer.clear();
+ continue;
+ } else if line.starts_with("Notes:") {
+ collecting_multiline = true;
+ multiline_field = "notes";
+ multiline_buffer.clear();
+ continue;
+ }
+
+ // Handle multiline sections
+ if collecting_multiline {
+ if line.trim().is_empty() && multiline_buffer.ends_with("\n\n") {
+ collecting_multiline = false;
+ match multiline_field {
+ "examples" => spec.examples = Some(multiline_buffer.trim().to_string()),
+ "notes" => spec.notes = Some(multiline_buffer.trim().to_string()),
+ _ => {}
+ }
+ multiline_buffer.clear();
+ } else {
+ if !multiline_buffer.is_empty() {
+ multiline_buffer.push('\n');
+ }
+ multiline_buffer.push_str(line);
+ }
+ continue;
+ }
+
+ // Parse regular fields
+ if let Some(desc) = line.strip_prefix("Description: ") {
+ spec.description = Some(desc.to_string());
+ } else if let Some(long_desc) = line.strip_prefix("Long description: ") {
+ spec.long_description = Some(long_desc.to_string());
+ } else if let Some(version) = line.strip_prefix("Version: ") {
+ spec.version = Some(version.to_string());
+ } else if let Some(since) = line.strip_prefix("Since: ") {
+ spec.since_version = Some(since.to_string());
+ } else if let Some(flags) = line.strip_prefix("Context flags: ") {
+ spec.context_flags = flags.split_whitespace()
+ .map(|s| s.to_string())
+ .collect();
+ }
+ }
+
+ // Determine API type based on name
+ if api_name.starts_with("sys_") {
+ spec.api_type = "syscall".to_string();
+ } else if api_name.contains("_ioctl") || api_name.starts_with("ioctl_") {
+ spec.api_type = "ioctl".to_string();
+ } else {
+ spec.api_type = "function".to_string();
+ }
+
+ Ok(spec)
+ }
+}
+
+impl ApiExtractor for DebugfsExtractor {
+ fn extract_all(&self) -> Result<Vec<ApiSpec>> {
+ let api_names = self.parse_list_file()?;
+ let mut specs = Vec::new();
+
+ for name in api_names {
+ match self.parse_spec_file(&name) {
+ Ok(spec) => specs.push(spec),
+ Err(e) => eprintln!("Warning: Failed to parse spec for {}: {}", name, e),
+ }
+ }
+
+ Ok(specs)
+ }
+
+ fn extract_by_name(&self, name: &str) -> Result<Option<ApiSpec>> {
+ let api_names = self.parse_list_file()?;
+
+ if api_names.iter().any(|n| n == name) {
+ Ok(Some(self.parse_spec_file(name)?))
+ } else {
+ Ok(None)
+ }
+ }
+
+ fn display_api_details(
+ &self,
+ api_name: &str,
+ formatter: &mut dyn OutputFormatter,
+ writer: &mut dyn Write,
+ ) -> Result<()> {
+ if let Some(spec) = self.extract_by_name(api_name)? {
+ display_api_spec(&spec, formatter, writer)?;
+ } else {
+ writeln!(writer, "API '{}' not found in debugfs", api_name)?;
+ }
+
+ Ok(())
+ }
+}
\ No newline at end of file
diff --git a/tools/kapi/src/extractor/mod.rs b/tools/kapi/src/extractor/mod.rs
new file mode 100644
index 0000000000000..bc55201152e3e
--- /dev/null
+++ b/tools/kapi/src/extractor/mod.rs
@@ -0,0 +1,95 @@
+use anyhow::Result;
+use std::io::Write;
+use crate::formatter::OutputFormatter;
+
+pub mod vmlinux;
+pub mod source_parser;
+pub mod debugfs;
+
+pub use vmlinux::VmlinuxExtractor;
+pub use source_parser::SourceExtractor;
+pub use debugfs::DebugfsExtractor;
+
+/// Common API specification information that all extractors should provide
+#[derive(Debug, Clone)]
+pub struct ApiSpec {
+ pub name: String,
+ pub api_type: String,
+ pub description: Option<String>,
+ pub long_description: Option<String>,
+ pub version: Option<String>,
+ pub context_flags: Vec<String>,
+ pub param_count: Option<u32>,
+ pub error_count: Option<u32>,
+ pub examples: Option<String>,
+ pub notes: Option<String>,
+ pub since_version: Option<String>,
+}
+
+/// Trait for extracting API specifications from different sources
+pub trait ApiExtractor {
+ /// Extract all API specifications from the source
+ fn extract_all(&self) -> Result<Vec<ApiSpec>>;
+
+ /// Extract a specific API specification by name
+ fn extract_by_name(&self, name: &str) -> Result<Option<ApiSpec>>;
+
+ /// Display detailed information about a specific API
+ fn display_api_details(
+ &self,
+ api_name: &str,
+ formatter: &mut dyn OutputFormatter,
+ writer: &mut dyn Write,
+ ) -> Result<()>;
+}
+
+/// Helper function to display an ApiSpec using a formatter
+pub fn display_api_spec(
+ spec: &ApiSpec,
+ formatter: &mut dyn OutputFormatter,
+ writer: &mut dyn Write,
+) -> Result<()> {
+ formatter.begin_api_details(writer, &spec.name)?;
+
+ if let Some(desc) = &spec.description {
+ formatter.description(writer, desc)?;
+ }
+
+ if let Some(long_desc) = &spec.long_description {
+ formatter.long_description(writer, long_desc)?;
+ }
+
+ if let Some(version) = &spec.since_version {
+ formatter.since_version(writer, version)?;
+ }
+
+ if !spec.context_flags.is_empty() {
+ formatter.begin_context_flags(writer)?;
+ for flag in &spec.context_flags {
+ formatter.context_flag(writer, flag)?;
+ }
+ formatter.end_context_flags(writer)?;
+ }
+
+ if let Some(param_count) = spec.param_count {
+ formatter.begin_parameters(writer, param_count)?;
+ formatter.end_parameters(writer)?;
+ }
+
+ if let Some(error_count) = spec.error_count {
+ formatter.begin_errors(writer, error_count)?;
+ formatter.end_errors(writer)?;
+ }
+
+ if let Some(notes) = &spec.notes {
+ formatter.notes(writer, notes)?;
+ }
+
+ if let Some(examples) = &spec.examples {
+ formatter.examples(writer, examples)?;
+ }
+
+ formatter.end_api_details(writer)?;
+
+ Ok(())
+}
\ No newline at end of file
diff --git a/tools/kapi/src/extractor/source_parser.rs b/tools/kapi/src/extractor/source_parser.rs
new file mode 100644
index 0000000000000..8de35f5a73916
--- /dev/null
+++ b/tools/kapi/src/extractor/source_parser.rs
@@ -0,0 +1,488 @@
+use anyhow::{Context, Result};
+use regex::Regex;
+use std::fs;
+use std::path::Path;
+use std::collections::HashMap;
+use walkdir::WalkDir;
+use std::io::Write;
+use crate::formatter::OutputFormatter;
+use super::{ApiExtractor, ApiSpec, display_api_spec};
+
+#[derive(Debug, Clone)]
+pub struct SourceApiSpec {
+ pub name: String,
+ pub api_type: ApiType,
+ pub parsed_fields: HashMap<String, String>,
+}
+
+#[derive(Debug, Clone, PartialEq)]
+pub enum ApiType {
+ Syscall,
+ Ioctl,
+ Function,
+ Unknown,
+}
+
+impl ApiType {
+ fn from_name(name: &str) -> Self {
+ if name.starts_with("sys_") {
+ ApiType::Syscall
+ } else if name.contains("ioctl") || name.contains("IOCTL") {
+ ApiType::Ioctl
+ } else if name.starts_with("do_") || name.starts_with("__") {
+ ApiType::Function
+ } else {
+ ApiType::Unknown
+ }
+ }
+}
+
+pub struct SourceParser {
+ // Regex patterns for matching KAPI specifications
+ spec_start_pattern: Regex,
+ spec_end_pattern: Regex,
+ ioctl_spec_pattern: Regex,
+}
+
+impl SourceParser {
+ pub fn new() -> Result<Self> {
+ Ok(SourceParser {
+ // Match DEFINE_KERNEL_API_SPEC(function_name)
+ spec_start_pattern: Regex::new(r"DEFINE_KERNEL_API_SPEC\s*\(\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*\)")?,
+ // Match KAPI_END_SPEC
+ spec_end_pattern: Regex::new(r"KAPI_END_SPEC")?,
+ // Match IOCTL specifications
+ ioctl_spec_pattern: Regex::new(r#"DEFINE_IOCTL_API_SPEC\s*\(\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*,\s*([^,]+)\s*,\s*"([^"]+)"\s*\)"#)?,
+ })
+ }
+
+ /// Parse a single source file for KAPI specifications
+ pub fn parse_file(&self, path: &Path) -> Result<Vec<SourceApiSpec>> {
+ let content = fs::read_to_string(path)
+ .with_context(|| format!("Failed to read file: {}", path.display()))?;
+
+ self.parse_content(&content, path)
+ }
+
+ /// Parse file content for KAPI specifications
+ pub fn parse_content(&self, content: &str, _file_path: &Path) -> Result<Vec<SourceApiSpec>> {
+ let mut specs = Vec::new();
+ let lines: Vec<&str> = content.lines().collect();
+
+ // First, look for standard KAPI specs
+ for (i, line) in lines.iter().enumerate() {
+ if let Some(captures) = self.spec_start_pattern.captures(line) {
+ let api_name = captures.get(1).unwrap().as_str().to_string();
+
+ // Find the end of this specification
+ if let Some(spec_content) = self.extract_spec_block(&lines, i) {
+ let mut spec = SourceApiSpec {
+ name: api_name.clone(),
+ api_type: ApiType::from_name(&api_name),
+ parsed_fields: HashMap::new(),
+ };
+
+ // Parse the fields
+ self.parse_spec_fields(&spec_content, &mut spec.parsed_fields)?;
+
+ specs.push(spec);
+ }
+ }
+
+ // Also look for IOCTL specs
+ if let Some(captures) = self.ioctl_spec_pattern.captures(line) {
+ let spec_name = captures.get(1).unwrap().as_str().to_string();
+ let cmd = captures.get(2).unwrap().as_str().to_string();
+ let cmd_name = captures.get(3).unwrap().as_str().to_string();
+
+ // Find the end of this IOCTL specification
+ if let Some(spec_content) = self.extract_ioctl_spec_block(&lines, i) {
+ let mut spec = SourceApiSpec {
+ name: spec_name,
+ api_type: ApiType::Ioctl,
+ parsed_fields: HashMap::new(),
+ };
+
+ // Add IOCTL-specific fields
+ spec.parsed_fields.insert("cmd".to_string(), cmd);
+ spec.parsed_fields.insert("cmd_name".to_string(), cmd_name);
+
+ // Parse other fields
+ self.parse_spec_fields(&spec_content, &mut spec.parsed_fields)?;
+
+ specs.push(spec);
+ }
+ }
+ }
+
+ Ok(specs)
+ }
+
+ /// Extract a complete KAPI specification block from the source
+ fn extract_spec_block(&self, lines: &[&str], start_idx: usize) -> Option<String> {
+ let mut spec_lines = Vec::new();
+ let mut brace_count = 0;
+ let mut in_spec = false;
+
+ for line in lines.iter().skip(start_idx) {
+ spec_lines.push(line.to_string());
+
+ // Count braces to handle nested structures
+ for ch in line.chars() {
+ match ch {
+ '{' => {
+ brace_count += 1;
+ in_spec = true;
+ }
+ '}' => {
+ brace_count -= 1;
+ }
+ _ => {}
+ }
+ }
+
+ // Check for end of spec
+ if self.spec_end_pattern.is_match(line) {
+ return Some(spec_lines.join("\n"));
+ }
+
+ // Alternative end: closing brace with semicolon
+ if in_spec && brace_count == 0 && line.contains("};") {
+ return Some(spec_lines.join("\n"));
+ }
+ }
+
+ None
+ }
+
+ /// Extract a complete IOCTL specification block
+ fn extract_ioctl_spec_block(&self, lines: &[&str], start_idx: usize) -> Option<String> {
+ let mut spec_lines = Vec::new();
+ let mut brace_count = 0;
+
+ for (i, line) in lines.iter().enumerate().skip(start_idx) {
+ spec_lines.push(line.to_string());
+
+ // Count braces
+ for ch in line.chars() {
+ match ch {
+ '{' => brace_count += 1,
+ '}' => brace_count -= 1,
+ _ => {}
+ }
+ }
+
+ // Check for end patterns
+ if line.contains("KAPI_END_IOCTL_SPEC") || line.contains("KAPI_IOCTL_END_SPEC") {
+ return Some(spec_lines.join("\n"));
+ }
+
+ // Alternative end: closing brace with semicolon at top level
+ if brace_count == 0 && line.contains("};") && i > start_idx {
+ return Some(spec_lines.join("\n"));
+ }
+ }
+
+ None
+ }
+
+ /// Parse individual KAPI fields from the specification
+ fn parse_spec_fields(&self, content: &str, fields: &mut HashMap<String, String>) -> Result<()> {
+ // Parse KAPI_DESCRIPTION
+ if let Some(captures) = Regex::new(r#"KAPI_DESCRIPTION\s*\(\s*"([^"]*)"\s*\)"#)?.captures(content) {
+ fields.insert("description".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+
+ // Parse KAPI_LONG_DESC (handle multi-line)
+ if let Some(captures) = Regex::new(r#"KAPI_LONG_DESC\s*\(\s*"([^"]*(?:\s*"[^"]*)*?)"\s*\)"#)?.captures(content) {
+ let long_desc = captures.get(1).unwrap().as_str()
+ .replace("\"\n\t\t \"", " ")
+ .replace("\"\n\t\t \"", " ")
+ .replace("\"\n\t\t \"", " ")
+ .replace("\"\n\t\t \"", " ")
+ .replace("\"\n\t\t \"", " ")
+ .replace("\"\n\t\t\"", " ");
+ fields.insert("long_description".to_string(), long_desc);
+ }
+
+ // Parse KAPI_CONTEXT
+ if let Some(captures) = Regex::new(r"KAPI_CONTEXT\s*\(([^)]+)\)")?.captures(content) {
+ fields.insert("context".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+
+ // Parse KAPI_NOTES (handle multi-line)
+ if let Some(captures) = Regex::new(r#"KAPI_NOTES\s*\(\s*"([^"]*(?:\s*"[^"]*)*?)"\s*\)"#)?.captures(content) {
+ let notes = captures.get(1).unwrap().as_str()
+ .replace("\"\n\t\t \"", " ")
+ .replace("\"\n\t\t \"", " ")
+ .replace("\"\n\t\t \"", " ")
+ .replace("\"\n\t\t \"", " ")
+ .replace("\"\n\t\t \"", " ")
+ .replace("\"\n\t\t\"", " ")
+ .trim()
+ .to_string();
+ fields.insert("notes".to_string(), notes);
+ }
+
+ // Parse KAPI_EXAMPLES (handle multi-line)
+ if let Some(captures) = Regex::new(r#"KAPI_EXAMPLES\s*\(\s*"([^"]*(?:\s*"[^"]*)*?)"\s*\)"#)?.captures(content) {
+ let examples = captures.get(1).unwrap().as_str()
+ .replace("\\n\"\n\t\t \"", "\n")
+ .replace("\\n\"\n\t\t \"", "\n")
+ .replace("\\n\"\n\t\t \"", "\n")
+ .replace("\\n\"\n\t\t \"", "\n")
+ .replace("\\n\"\n\t\t\"", "\n")
+ .replace("\\n", "\n")
+ .trim()
+ .to_string();
+ fields.insert("examples".to_string(), examples);
+ }
+
+ // Parse KAPI_SINCE_VERSION
+ if let Some(captures) = Regex::new(r#"KAPI_SINCE_VERSION\s*\(\s*"([^"]*)"\s*\)"#)?.captures(content) {
+ fields.insert("since_version".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+
+ // Parse parameter count
+ let param_regex = Regex::new(r"KAPI_PARAM\s*\(\s*(\d+)\s*,")?;
+ let mut max_param_idx = 0;
+ for captures in param_regex.captures_iter(content) {
+ if let Ok(idx) = captures.get(1).unwrap().as_str().parse::<usize>() {
+ max_param_idx = max_param_idx.max(idx + 1);
+ }
+ }
+ if max_param_idx > 0 {
+ fields.insert("param_count".to_string(), max_param_idx.to_string());
+ }
+
+ // Parse error count
+ let error_regex = Regex::new(r"KAPI_ERROR\s*\(\s*(\d+)\s*,")?;
+ let mut max_error_idx = 0;
+ for captures in error_regex.captures_iter(content) {
+ if let Ok(idx) = captures.get(1).unwrap().as_str().parse::<usize>() {
+ max_error_idx = max_error_idx.max(idx + 1);
+ }
+ }
+ if max_error_idx > 0 {
+ fields.insert("error_count".to_string(), max_error_idx.to_string());
+ }
+
+ // Parse other counts
+ if content.contains(".error_count =") {
+ if let Some(captures) = Regex::new(r"\.error_count\s*=\s*(\d+)")?.captures(content) {
+ fields.insert("error_count".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+ }
+
+ if content.contains(".param_count =") {
+ if let Some(captures) = Regex::new(r"\.param_count\s*=\s*(\d+)")?.captures(content) {
+ fields.insert("param_count".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+ }
+
+ // Parse .since_version
+ if let Some(captures) = Regex::new(r#"\.since_version\s*=\s*"([^"]*)""#)?.captures(content) {
+ fields.insert("since_version".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+
+ // Parse .notes (handle multi-line)
+ if let Some(captures) = Regex::new(r#"\.notes\s*=\s*"([^"]*(?:\s*"[^"]*)*?)""#)?.captures(content) {
+ let notes = captures.get(1).unwrap().as_str()
+ .replace("\"\n\t\t \"", " ")
+ .replace("\"\n\t\t\"", " ")
+ .replace("\"\n\t \"", " ") // Handle single tab + space
+ .trim()
+ .to_string();
+ fields.insert("notes".to_string(), notes);
+ }
+
+ // Parse .examples (handle multi-line)
+ if let Some(captures) = Regex::new(r#"\.examples\s*=\s*"([^"]*(?:\s*"[^"]*)*?)""#)?.captures(content) {
+ let examples = captures.get(1).unwrap().as_str()
+ .replace("\\n\"\n\t\t \"", "\n")
+ .replace("\\n", "\n");
+ fields.insert("examples".to_string(), examples);
+ }
+
+ Ok(())
+ }
+
+ /// Scan a directory tree for files containing KAPI specifications
+ pub fn scan_directory(&self, dir: &Path, extensions: &[&str]) -> Result<Vec<SourceApiSpec>> {
+ let mut all_specs = Vec::new();
+
+ for entry in WalkDir::new(dir)
+ .follow_links(true)
+ .into_iter()
+ .filter_map(|e| e.ok())
+ {
+ let path = entry.path();
+
+ // Skip non-files
+ if !path.is_file() {
+ continue;
+ }
+
+ // Check file extension
+ if let Some(ext) = path.extension() {
+ if extensions.iter().any(|&e| ext == e) {
+ // Try to parse the file
+ match self.parse_file(path) {
+ Ok(specs) => {
+ if !specs.is_empty() {
+ all_specs.extend(specs);
+ }
+ }
+ Err(e) => {
+ eprintln!("Warning: Failed to parse {}: {}", path.display(), e);
+ }
+ }
+ }
+ }
+ }
+
+ Ok(all_specs)
+ }
+
+}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+ use std::io::Write;
+ use tempfile::NamedTempFile;
+
+ #[test]
+ fn test_parse_syscall_spec() {
+ let parser = SourceParser::new().unwrap();
+
+ let content = r#"
+DEFINE_KERNEL_API_SPEC(sys_mlock)
+ KAPI_DESCRIPTION("Lock pages in memory")
+ KAPI_LONG_DESC("Locks pages in the specified address range into RAM")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "start", "unsigned long", "Starting address")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "len", "size_t", "Length of range")
+ KAPI_PARAM_END
+
+ .param_count = 2,
+ .error_count = 3,
+
+KAPI_END_SPEC
+"#;
+
+ let mut temp_file = NamedTempFile::new().unwrap();
+ write!(temp_file, "{}", content).unwrap();
+
+ let specs = parser.parse_content(content, temp_file.path()).unwrap();
+
+ assert_eq!(specs.len(), 1);
+ assert_eq!(specs[0].name, "sys_mlock");
+ assert_eq!(specs[0].api_type, ApiType::Syscall);
+ assert_eq!(specs[0].parsed_fields.get("description").unwrap(), "Lock pages in memory");
+ assert_eq!(specs[0].parsed_fields.get("param_count").unwrap(), "2");
+ }
+
+ #[test]
+ fn test_parse_ioctl_spec() {
+ let parser = SourceParser::new().unwrap();
+
+ let content = r#"
+DEFINE_IOCTL_API_SPEC(binder_write_read, BINDER_WRITE_READ, "BINDER_WRITE_READ")
+ KAPI_DESCRIPTION("Perform read/write operations on binder")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "write_size", "binder_size_t", "Bytes to write")
+ KAPI_PARAM_END
+
+KAPI_END_IOCTL_SPEC
+"#;
+
+ let mut temp_file = NamedTempFile::new().unwrap();
+ write!(temp_file, "{}", content).unwrap();
+
+ let specs = parser.parse_content(content, temp_file.path()).unwrap();
+
+ assert_eq!(specs.len(), 1);
+ assert_eq!(specs[0].name, "binder_write_read");
+ assert_eq!(specs[0].api_type, ApiType::Ioctl);
+ assert_eq!(specs[0].parsed_fields.get("cmd_name").unwrap(), "BINDER_WRITE_READ");
+ }
+}
+
+// SourceExtractor implementation
+pub struct SourceExtractor {
+ specs: Vec<SourceApiSpec>,
+}
+
+impl SourceExtractor {
+ pub fn new(path: String) -> Result<Self> {
+ let parser = SourceParser::new()?;
+ let path_obj = Path::new(&path);
+
+ let specs = if path_obj.is_file() {
+ parser.parse_file(path_obj)?
+ } else if path_obj.is_dir() {
+ parser.scan_directory(path_obj, &["c", "h"])?
+ } else {
+ anyhow::bail!("Path does not exist: {}", path_obj.display())
+ };
+
+ Ok(SourceExtractor { specs })
+ }
+
+ fn convert_to_api_spec(&self, source_spec: &SourceApiSpec) -> ApiSpec {
+ ApiSpec {
+ name: source_spec.name.clone(),
+ api_type: match source_spec.api_type {
+ ApiType::Syscall => "syscall".to_string(),
+ ApiType::Ioctl => "ioctl".to_string(),
+ ApiType::Function => "function".to_string(),
+ ApiType::Unknown => "unknown".to_string(),
+ },
+ description: source_spec.parsed_fields.get("description").cloned(),
+ long_description: source_spec.parsed_fields.get("long_description").cloned(),
+ version: source_spec.parsed_fields.get("version").cloned(),
+ context_flags: source_spec.parsed_fields.get("context")
+ .map(|c| vec![c.clone()])
+ .unwrap_or_default(),
+ param_count: source_spec.parsed_fields.get("param_count")
+ .and_then(|s| s.parse::<u32>().ok()),
+ error_count: source_spec.parsed_fields.get("error_count")
+ .and_then(|s| s.parse::<u32>().ok()),
+ examples: source_spec.parsed_fields.get("examples").cloned(),
+ notes: source_spec.parsed_fields.get("notes").cloned(),
+ since_version: source_spec.parsed_fields.get("since_version").cloned(),
+ }
+ }
+}
+
+impl ApiExtractor for SourceExtractor {
+ fn extract_all(&self) -> Result<Vec<ApiSpec>> {
+ Ok(self.specs.iter()
+ .map(|s| self.convert_to_api_spec(s))
+ .collect())
+ }
+
+ fn extract_by_name(&self, name: &str) -> Result<Option<ApiSpec>> {
+ Ok(self.specs.iter()
+ .find(|s| s.name == name)
+ .map(|s| self.convert_to_api_spec(s)))
+ }
+
+ fn display_api_details(
+ &self,
+ api_name: &str,
+ formatter: &mut dyn OutputFormatter,
+ writer: &mut dyn Write,
+ ) -> Result<()> {
+ if let Some(spec) = self.specs.iter().find(|s| s.name == api_name) {
+ let api_spec = self.convert_to_api_spec(spec);
+ display_api_spec(&api_spec, formatter, writer)?;
+ }
+ Ok(())
+ }
+}
\ No newline at end of file
diff --git a/tools/kapi/src/extractor/vmlinux/binary_utils.rs b/tools/kapi/src/extractor/vmlinux/binary_utils.rs
new file mode 100644
index 0000000000000..02c8e3b8eda77
--- /dev/null
+++ b/tools/kapi/src/extractor/vmlinux/binary_utils.rs
@@ -0,0 +1,130 @@
+use anyhow::Result;
+use std::io::Write;
+use crate::formatter::OutputFormatter;
+
+// Constants for all structure field sizes
+pub mod sizes {
+ pub const NAME: usize = 128;
+ pub const DESC: usize = 512;
+ pub const MAX_PARAMS: usize = 16;
+ pub const MAX_ERRORS: usize = 32;
+ pub const MAX_CONSTRAINTS: usize = 16;
+}
+
+// Helper for reading data at specific offsets
+pub struct DataReader<'a> {
+ data: &'a [u8],
+ pos: usize,
+}
+
+impl<'a> DataReader<'a> {
+ pub fn new(data: &'a [u8], offset: usize) -> Self {
+ Self { data, pos: offset }
+ }
+
+ pub fn read_bytes(&mut self, len: usize) -> Option<&'a [u8]> {
+ if self.pos + len <= self.data.len() {
+ let bytes = &self.data[self.pos..self.pos + len];
+ self.pos += len;
+ Some(bytes)
+ } else {
+ None
+ }
+ }
+
+ pub fn read_cstring(&mut self, max_len: usize) -> Option<String> {
+ let bytes = self.read_bytes(max_len)?;
+ if let Some(null_pos) = bytes.iter().position(|&b| b == 0) {
+ if null_pos > 0 {
+ if let Ok(s) = std::str::from_utf8(&bytes[..null_pos]) {
+ return Some(s.to_string());
+ }
+ }
+ }
+ None
+ }
+
+ pub fn read_u32(&mut self) -> Option<u32> {
+ let bytes = self.read_bytes(4)?;
+ Some(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
+ }
+
+ pub fn skip(&mut self, len: usize) {
+ self.pos = (self.pos + len).min(self.data.len());
+ }
+}
+
+#[allow(dead_code)]
+pub fn parse_context_flags(flags: u32, formatter: &mut dyn OutputFormatter, w: &mut dyn Write) -> Result<()> {
+ // Context flags from kernel headers
+ const KAPI_CTX_PROCESS: u32 = 1 << 0;
+ const KAPI_CTX_SOFTIRQ: u32 = 1 << 1;
+ const KAPI_CTX_HARDIRQ: u32 = 1 << 2;
+ const KAPI_CTX_NMI: u32 = 1 << 3;
+ const KAPI_CTX_USER: u32 = 1 << 4;
+ const KAPI_CTX_KERNEL: u32 = 1 << 5;
+ const KAPI_CTX_SLEEPABLE: u32 = 1 << 6;
+ const KAPI_CTX_ATOMIC: u32 = 1 << 7;
+ const KAPI_CTX_PREEMPTIBLE: u32 = 1 << 8;
+ const KAPI_CTX_MIGRATION_DISABLED: u32 = 1 << 9;
+
+ if flags & KAPI_CTX_PROCESS != 0 { formatter.context_flag(w, "Process context")?; }
+ if flags & KAPI_CTX_SOFTIRQ != 0 { formatter.context_flag(w, "Softirq context")?; }
+ if flags & KAPI_CTX_HARDIRQ != 0 { formatter.context_flag(w, "Hardirq context")?; }
+ if flags & KAPI_CTX_NMI != 0 { formatter.context_flag(w, "NMI context")?; }
+ if flags & KAPI_CTX_USER != 0 { formatter.context_flag(w, "User mode")?; }
+ if flags & KAPI_CTX_KERNEL != 0 { formatter.context_flag(w, "Kernel mode")?; }
+ if flags & KAPI_CTX_SLEEPABLE != 0 { formatter.context_flag(w, "May sleep")?; }
+ if flags & KAPI_CTX_ATOMIC != 0 { formatter.context_flag(w, "Atomic context")?; }
+ if flags & KAPI_CTX_PREEMPTIBLE != 0 { formatter.context_flag(w, "Preemptible")?; }
+ if flags & KAPI_CTX_MIGRATION_DISABLED != 0 { formatter.context_flag(w, "Migration disabled")?; }
+
+ Ok(())
+}
+
+// Structure layout definitions for calculating sizes
+pub fn param_spec_layout_size() -> usize {
+ // Packed structure
+ sizes::NAME * 2 + // name, type_name
+ 4 + 4 + // type, flags
+ 8 + 8 + // size, alignment
+ 8 + 8 + // min_value, max_value
+ 8 + // valid_mask
+ 8 + // enum_values pointer
+ 4 + 4 + // enum_count, constraint_type
+ 8 + // validate pointer
+ sizes::DESC * 2 + // description, constraints
+ 4 + 8 // size_param_idx, size_multiplier
+}
+
+pub fn return_spec_layout_size() -> usize {
+ // Packed structure
+ sizes::NAME + // type_name
+ 4 + 4 + // type, check_type
+ 8 + 8 + 8 + // success_value, success_min, success_max
+ 8 + // error_values pointer
+ 4 + // error_count
+ 8 + // is_success pointer
+ sizes::DESC // description
+}
+
+pub fn error_spec_layout_size() -> usize {
+ // Packed structure
+ 4 + // code
+ sizes::NAME + // name
+ sizes::DESC * 2 // condition, description
+}
+
+pub fn lock_spec_layout_size() -> usize {
+ // Packed structure
+ sizes::NAME + // name
+ 4 + // lock_type
+ 1 + 1 + 1 + 1 + // bools
+ sizes::DESC // description
+}
+
+pub fn constraint_spec_layout_size() -> usize {
+ // Packed structure
+ sizes::NAME + // name
+ sizes::DESC * 2 // description, expression
+}
\ No newline at end of file
diff --git a/tools/kapi/src/extractor/vmlinux/mod.rs b/tools/kapi/src/extractor/vmlinux/mod.rs
new file mode 100644
index 0000000000000..5d5ca413d77a2
--- /dev/null
+++ b/tools/kapi/src/extractor/vmlinux/mod.rs
@@ -0,0 +1,372 @@
+use anyhow::{Context, Result};
+use goblin::elf::Elf;
+use std::fs;
+use std::io::Write;
+use crate::formatter::OutputFormatter;
+use super::{ApiExtractor, ApiSpec};
+
+mod binary_utils;
+use binary_utils::{sizes, DataReader,
+ param_spec_layout_size, return_spec_layout_size, error_spec_layout_size,
+ lock_spec_layout_size, constraint_spec_layout_size};
+
+pub struct VmlinuxExtractor {
+ kapi_data: Vec<u8>,
+ specs: Vec<KapiSpec>,
+}
+
+#[derive(Debug)]
+struct KapiSpec {
+ name: String,
+ api_type: String,
+ offset: usize,
+}
+
+impl VmlinuxExtractor {
+ pub fn new(vmlinux_path: String) -> Result<Self> {
+ let vmlinux_data = fs::read(&vmlinux_path)
+ .with_context(|| format!("Failed to read vmlinux file: {}", vmlinux_path))?;
+
+ let elf = Elf::parse(&vmlinux_data)
+ .context("Failed to parse ELF file")?;
+
+ // Find the .kapi_specs section
+ let kapi_section = elf.section_headers
+ .iter()
+ .find(|sh| {
+ if let Some(name) = elf.shdr_strtab.get_at(sh.sh_name) {
+ name == ".kapi_specs"
+ } else {
+ false
+ }
+ })
+ .context("Could not find .kapi_specs section in vmlinux")?;
+
+ // Find __start_kapi_specs and __stop_kapi_specs symbols
+ let mut start_addr = None;
+ let mut stop_addr = None;
+
+ for sym in &elf.syms {
+ if let Some(name) = elf.strtab.get_at(sym.st_name) {
+ match name {
+ "__start_kapi_specs" => start_addr = Some(sym.st_value),
+ "__stop_kapi_specs" => stop_addr = Some(sym.st_value),
+ _ => {}
+ }
+ }
+ }
+
+ let start = start_addr.context("Could not find __start_kapi_specs symbol")?;
+ let stop = stop_addr.context("Could not find __stop_kapi_specs symbol")?;
+
+ if stop <= start {
+ anyhow::bail!("No kernel API specifications found in vmlinux");
+ }
+
+ // Calculate the offset within the file
+ let section_vaddr = kapi_section.sh_addr;
+ let file_offset = kapi_section.sh_offset + (start - section_vaddr);
+ let data_size = (stop - start) as usize;
+
+ if file_offset as usize + data_size > vmlinux_data.len() {
+ anyhow::bail!("Invalid offset/size for .kapi_specs data");
+ }
+
+ // Extract the raw data
+ let kapi_data = vmlinux_data[file_offset as usize..(file_offset as usize + data_size)].to_vec();
+
+ // Parse the specifications
+ let specs = parse_kapi_specs(&kapi_data)?;
+
+ Ok(VmlinuxExtractor {
+ kapi_data,
+ specs,
+ })
+ }
+
+}
+
+impl ApiExtractor for VmlinuxExtractor {
+ fn extract_all(&self) -> Result<Vec<ApiSpec>> {
+ // For vmlinux extractor, we return basic info only
+ // Detailed parsing happens in display_api_details
+ Ok(self.specs.iter().map(|spec| {
+ ApiSpec {
+ name: spec.name.clone(),
+ api_type: spec.api_type.clone(),
+ description: None,
+ long_description: None,
+ version: None,
+ context_flags: vec![],
+ param_count: None,
+ error_count: None,
+ examples: None,
+ notes: None,
+ since_version: None,
+ }
+ }).collect())
+ }
+
+ fn extract_by_name(&self, name: &str) -> Result<Option<ApiSpec>> {
+ Ok(self.specs.iter()
+ .find(|s| s.name == name)
+ .map(|spec| ApiSpec {
+ name: spec.name.clone(),
+ api_type: spec.api_type.clone(),
+ description: None,
+ long_description: None,
+ version: None,
+ context_flags: vec![],
+ param_count: None,
+ error_count: None,
+ examples: None,
+ notes: None,
+ since_version: None,
+ }))
+ }
+
+ fn display_api_details(
+ &self,
+ api_name: &str,
+ formatter: &mut dyn OutputFormatter,
+ writer: &mut dyn Write,
+ ) -> Result<()> {
+ if let Some(spec) = self.specs.iter().find(|s| s.name == api_name) {
+ // Parse the binary data into an ApiSpec
+ let api_spec = parse_binary_to_api_spec(&self.kapi_data, spec.offset)?;
+ // Use the common display function
+ super::display_api_spec(&api_spec, formatter, writer)?;
+ }
+ Ok(())
+ }
+}
+
+fn parse_kapi_specs(data: &[u8]) -> Result<Vec<KapiSpec>> {
+ let mut specs = Vec::new();
+
+ // The kernel_api_spec struct size in the kernel is 308064 bytes
+ // This is calculated as sizeof(struct kernel_api_spec) which includes:
+ // - Basic fields (name, version, description, etc.)
+ // - Arrays for parameters, errors, locks, constraints
+ // - Additional metadata fields
+ // TODO: This should ideally be read from kernel headers or made configurable
+ let struct_size = 308064;
+
+ let mut offset = 0;
+ while offset + struct_size <= data.len() {
+ // Try to read the name at this offset
+ if let Some(name) = read_cstring(data, offset, 128) {
+ if is_valid_api_name(&name) {
+ let api_type = if name.starts_with("sys_") {
+ "syscall"
+ } else if name.contains("ioctl") || name.contains("IOCTL") {
+ "ioctl"
+ } else {
+ "other"
+ };
+
+ specs.push(KapiSpec {
+ name: name.to_string(),
+ api_type: api_type.to_string(),
+ offset,
+ });
+ }
+ }
+
+ offset += struct_size;
+ }
+
+ // Handle any remaining data that might be a partial spec
+ if offset < data.len() && data.len() - offset >= 128 {
+ if let Some(name) = read_cstring(data, offset, 128) {
+ if is_valid_api_name(&name) {
+ let api_type = if name.starts_with("sys_") {
+ "syscall"
+ } else if name.contains("ioctl") || name.contains("IOCTL") {
+ "ioctl"
+ } else {
+ "other"
+ };
+
+ specs.push(KapiSpec {
+ name: name.to_string(),
+ api_type: api_type.to_string(),
+ offset,
+ });
+ }
+ }
+ }
+
+ Ok(specs)
+}
+
+fn read_cstring(data: &[u8], offset: usize, max_len: usize) -> Option<String> {
+ if offset + max_len > data.len() {
+ return None;
+ }
+
+ let bytes = &data[offset..offset + max_len];
+ if let Some(null_pos) = bytes.iter().position(|&b| b == 0) {
+ if null_pos > 0 {
+ if let Ok(s) = std::str::from_utf8(&bytes[..null_pos]) {
+ return Some(s.to_string());
+ }
+ }
+ }
+ None
+}
+
+fn is_valid_api_name(name: &str) -> bool {
+ if name.is_empty() || name.len() > 100 {
+ return false;
+ }
+
+ name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
+ && (name.starts_with("sys_")
+ || name.contains("ioctl")
+ || name.contains("IOCTL")
+ || name.starts_with("do_")
+ || name.starts_with("__"))
+}
+
+fn parse_binary_to_api_spec(data: &[u8], offset: usize) -> Result<ApiSpec> {
+ let mut reader = DataReader::new(data, offset);
+
+ // Read name
+ let name = reader.read_cstring(sizes::NAME)
+ .ok_or_else(|| anyhow::anyhow!("Failed to read API name"))?;
+
+ // Read version
+ let version = reader.read_u32()
+ .map(|v| v.to_string());
+
+ // Read description
+ let description = reader.read_cstring(sizes::DESC)
+ .filter(|s| !s.is_empty());
+
+ // Read long description
+ let long_description = reader.read_cstring(sizes::DESC * 4)
+ .filter(|s| !s.is_empty());
+
+ // Read context flags
+ let context_flags = if let Some(flags) = reader.read_u32() {
+ let mut flag_strings = Vec::new();
+
+ const KAPI_CTX_PROCESS: u32 = 1 << 0;
+ const KAPI_CTX_SOFTIRQ: u32 = 1 << 1;
+ const KAPI_CTX_HARDIRQ: u32 = 1 << 2;
+ const KAPI_CTX_NMI: u32 = 1 << 3;
+ const KAPI_CTX_USER: u32 = 1 << 4;
+ const KAPI_CTX_KERNEL: u32 = 1 << 5;
+ const KAPI_CTX_SLEEPABLE: u32 = 1 << 6;
+ const KAPI_CTX_ATOMIC: u32 = 1 << 7;
+ const KAPI_CTX_PREEMPTIBLE: u32 = 1 << 8;
+ const KAPI_CTX_MIGRATION_DISABLED: u32 = 1 << 9;
+
+ // Build the flag string similar to source format
+ let mut parts = Vec::new();
+ if flags & KAPI_CTX_PROCESS != 0 { parts.push("KAPI_CTX_PROCESS"); }
+ if flags & KAPI_CTX_SOFTIRQ != 0 { parts.push("KAPI_CTX_SOFTIRQ"); }
+ if flags & KAPI_CTX_HARDIRQ != 0 { parts.push("KAPI_CTX_HARDIRQ"); }
+ if flags & KAPI_CTX_NMI != 0 { parts.push("KAPI_CTX_NMI"); }
+ if flags & KAPI_CTX_USER != 0 { parts.push("KAPI_CTX_USER"); }
+ if flags & KAPI_CTX_KERNEL != 0 { parts.push("KAPI_CTX_KERNEL"); }
+ if flags & KAPI_CTX_SLEEPABLE != 0 { parts.push("KAPI_CTX_SLEEPABLE"); }
+ if flags & KAPI_CTX_ATOMIC != 0 { parts.push("KAPI_CTX_ATOMIC"); }
+ if flags & KAPI_CTX_PREEMPTIBLE != 0 { parts.push("KAPI_CTX_PREEMPTIBLE"); }
+ if flags & KAPI_CTX_MIGRATION_DISABLED != 0 { parts.push("KAPI_CTX_MIGRATION_DISABLED"); }
+
+ if !parts.is_empty() {
+ flag_strings.push(parts.join(" | "));
+ }
+ flag_strings
+ } else {
+ vec![]
+ };
+
+ // Read parameter count
+ let param_count = reader.read_u32();
+
+ // Skip parameters for now (to match source output)
+ if let Some(count) = param_count {
+ if count > 0 && count <= sizes::MAX_PARAMS as u32 {
+ reader.skip(param_spec_layout_size() * count as usize);
+ reader.skip(param_spec_layout_size() * (sizes::MAX_PARAMS - count as usize));
+ } else {
+ reader.skip(param_spec_layout_size() * sizes::MAX_PARAMS);
+ }
+ }
+
+ // Skip return spec
+ reader.skip(return_spec_layout_size());
+
+ // Read error count
+ let error_count = reader.read_u32();
+
+ // Skip errors
+ if let Some(count) = error_count {
+ if count > 0 && count <= sizes::MAX_ERRORS as u32 {
+ reader.skip(error_spec_layout_size() * count as usize);
+ reader.skip(error_spec_layout_size() * (sizes::MAX_ERRORS - count as usize));
+ } else {
+ reader.skip(error_spec_layout_size() * sizes::MAX_ERRORS);
+ }
+ }
+
+ // Skip locks
+ if let Some(lock_count) = reader.read_u32() {
+ if lock_count > 0 && lock_count <= sizes::MAX_CONSTRAINTS as u32 {
+ reader.skip(lock_spec_layout_size() * lock_count as usize);
+ reader.skip(lock_spec_layout_size() * (sizes::MAX_CONSTRAINTS - lock_count as usize));
+ } else {
+ reader.skip(lock_spec_layout_size() * sizes::MAX_CONSTRAINTS);
+ }
+ }
+
+ // Skip constraints
+ if let Some(constraint_count) = reader.read_u32() {
+ if constraint_count > 0 && constraint_count <= sizes::MAX_CONSTRAINTS as u32 {
+ reader.skip(constraint_spec_layout_size() * constraint_count as usize);
+ reader.skip(constraint_spec_layout_size() * (sizes::MAX_CONSTRAINTS - constraint_count as usize));
+ } else {
+ reader.skip(constraint_spec_layout_size() * sizes::MAX_CONSTRAINTS);
+ }
+ }
+
+ // Read examples
+ let examples = reader.read_cstring(sizes::DESC * 2)
+ .filter(|s| !s.is_empty());
+
+ // Read notes
+ let notes = reader.read_cstring(sizes::DESC)
+ .filter(|s| !s.is_empty());
+
+ // Read since_version
+ let since_version = reader.read_cstring(32)
+ .filter(|s| !s.is_empty());
+
+ // Determine API type from name
+ let api_type = if name.starts_with("sys_") {
+ "syscall"
+ } else if name.contains("ioctl") || name.contains("IOCTL") {
+ "ioctl"
+ } else {
+ "other"
+ }.to_string();
+
+ Ok(ApiSpec {
+ name,
+ api_type,
+ description,
+ long_description,
+ version,
+ context_flags,
+ param_count,
+ error_count,
+ examples,
+ notes,
+ since_version,
+ })
+}
+
+// Old display_api_details_from_binary function removed - now using parse_binary_to_api_spec + display_api_spec
\ No newline at end of file
diff --git a/tools/kapi/src/formatter/json.rs b/tools/kapi/src/formatter/json.rs
new file mode 100644
index 0000000000000..44d2bbfc91133
--- /dev/null
+++ b/tools/kapi/src/formatter/json.rs
@@ -0,0 +1,170 @@
+use super::OutputFormatter;
+use std::io::Write;
+use serde::Serialize;
+
+pub struct JsonFormatter {
+ data: JsonData,
+}
+
+#[derive(Serialize)]
+struct JsonData {
+ #[serde(skip_serializing_if = "Option::is_none")]
+ apis: Option<Vec<JsonApi>>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ api_details: Option<JsonApiDetails>,
+}
+
+#[derive(Serialize)]
+struct JsonApi {
+ name: String,
+ api_type: String,
+}
+
+#[derive(Serialize)]
+struct JsonApiDetails {
+ name: String,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ description: Option<String>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ long_description: Option<String>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ context_flags: Vec<String>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ examples: Option<String>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ notes: Option<String>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ since_version: Option<String>,
+}
+
+
+impl JsonFormatter {
+ pub fn new() -> Self {
+ JsonFormatter {
+ data: JsonData {
+ apis: None,
+ api_details: None,
+ }
+ }
+ }
+}
+
+impl OutputFormatter for JsonFormatter {
+ fn begin_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_document(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ let json = serde_json::to_string_pretty(&self.data)?;
+ writeln!(w, "{}", json)?;
+ Ok(())
+ }
+
+ fn begin_api_list(&mut self, _w: &mut dyn Write, _title: &str) -> std::io::Result<()> {
+ self.data.apis = Some(Vec::new());
+ Ok(())
+ }
+
+ fn api_item(&mut self, _w: &mut dyn Write, name: &str, api_type: &str) -> std::io::Result<()> {
+ if let Some(apis) = &mut self.data.apis {
+ apis.push(JsonApi {
+ name: name.to_string(),
+ api_type: api_type.to_string(),
+ });
+ }
+ Ok(())
+ }
+
+ fn end_api_list(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn total_specs(&mut self, _w: &mut dyn Write, _count: usize) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_api_details(&mut self, _w: &mut dyn Write, name: &str) -> std::io::Result<()> {
+ self.data.api_details = Some(JsonApiDetails {
+ name: name.to_string(),
+ description: None,
+ long_description: None,
+ context_flags: Vec::new(),
+ examples: None,
+ notes: None,
+ since_version: None,
+ });
+ Ok(())
+ }
+
+ fn end_api_details(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+
+ fn description(&mut self, _w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.description = Some(desc.to_string());
+ }
+ Ok(())
+ }
+
+ fn long_description(&mut self, _w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.long_description = Some(desc.to_string());
+ }
+ Ok(())
+ }
+
+ fn begin_context_flags(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn context_flag(&mut self, _w: &mut dyn Write, flag: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.context_flags.push(flag.to_string());
+ }
+ Ok(())
+ }
+
+ fn end_context_flags(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_parameters(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+
+ fn end_parameters(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_errors(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_errors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn examples(&mut self, _w: &mut dyn Write, examples: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.examples = Some(examples.to_string());
+ }
+ Ok(())
+ }
+
+ fn notes(&mut self, _w: &mut dyn Write, notes: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.notes = Some(notes.to_string());
+ }
+ Ok(())
+ }
+
+ fn since_version(&mut self, _w: &mut dyn Write, version: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.since_version = Some(version.to_string());
+ }
+ Ok(())
+ }
+}
\ No newline at end of file
diff --git a/tools/kapi/src/formatter/mod.rs b/tools/kapi/src/formatter/mod.rs
new file mode 100644
index 0000000000000..6eb42e8b404d0
--- /dev/null
+++ b/tools/kapi/src/formatter/mod.rs
@@ -0,0 +1,68 @@
+use std::io::Write;
+
+mod plain;
+mod json;
+mod rst;
+
+pub use plain::PlainFormatter;
+pub use json::JsonFormatter;
+pub use rst::RstFormatter;
+
+
+#[derive(Debug, Clone, Copy, PartialEq)]
+pub enum OutputFormat {
+ Plain,
+ Json,
+ Rst,
+}
+
+impl std::str::FromStr for OutputFormat {
+ type Err = String;
+
+ fn from_str(s: &str) -> Result<Self, Self::Err> {
+ match s.to_lowercase().as_str() {
+ "plain" => Ok(OutputFormat::Plain),
+ "json" => Ok(OutputFormat::Json),
+ "rst" => Ok(OutputFormat::Rst),
+ _ => Err(format!("Unknown output format: {}", s)),
+ }
+ }
+}
+
+pub trait OutputFormatter {
+ fn begin_document(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+ fn end_document(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn begin_api_list(&mut self, w: &mut dyn Write, title: &str) -> std::io::Result<()>;
+ fn api_item(&mut self, w: &mut dyn Write, name: &str, api_type: &str) -> std::io::Result<()>;
+ fn end_api_list(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn total_specs(&mut self, w: &mut dyn Write, count: usize) -> std::io::Result<()>;
+
+ fn begin_api_details(&mut self, w: &mut dyn Write, name: &str) -> std::io::Result<()>;
+ fn end_api_details(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+ fn description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()>;
+ fn long_description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()>;
+
+ fn begin_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+ fn context_flag(&mut self, w: &mut dyn Write, flag: &str) -> std::io::Result<()>;
+ fn end_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn begin_parameters(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+ fn end_parameters(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn begin_errors(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+ fn end_errors(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn examples(&mut self, w: &mut dyn Write, examples: &str) -> std::io::Result<()>;
+ fn notes(&mut self, w: &mut dyn Write, notes: &str) -> std::io::Result<()>;
+ fn since_version(&mut self, w: &mut dyn Write, version: &str) -> std::io::Result<()>;
+}
+
+pub fn create_formatter(format: OutputFormat) -> Box<dyn OutputFormatter> {
+ match format {
+ OutputFormat::Plain => Box::new(PlainFormatter::new()),
+ OutputFormat::Json => Box::new(JsonFormatter::new()),
+ OutputFormat::Rst => Box::new(RstFormatter::new()),
+ }
+}
\ No newline at end of file
diff --git a/tools/kapi/src/formatter/plain.rs b/tools/kapi/src/formatter/plain.rs
new file mode 100644
index 0000000000000..4ccbfcbbc8416
--- /dev/null
+++ b/tools/kapi/src/formatter/plain.rs
@@ -0,0 +1,99 @@
+use super::OutputFormatter;
+use std::io::Write;
+
+pub struct PlainFormatter;
+
+impl PlainFormatter {
+ pub fn new() -> Self {
+ PlainFormatter
+ }
+}
+
+impl OutputFormatter for PlainFormatter {
+ fn begin_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_api_list(&mut self, w: &mut dyn Write, title: &str) -> std::io::Result<()> {
+ writeln!(w, "\n{}:", title)?;
+ writeln!(w, "{}", "-".repeat(title.len() + 1))
+ }
+
+ fn api_item(&mut self, w: &mut dyn Write, name: &str, _api_type: &str) -> std::io::Result<()> {
+ writeln!(w, " {}", name)
+ }
+
+ fn end_api_list(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn total_specs(&mut self, w: &mut dyn Write, count: usize) -> std::io::Result<()> {
+ writeln!(w, "\nTotal specifications found: {}", count)
+ }
+
+ fn begin_api_details(&mut self, w: &mut dyn Write, name: &str) -> std::io::Result<()> {
+ writeln!(w, "\nDetailed information for {}:", name)?;
+ writeln!(w, "{}=", "=".repeat(25 + name.len()))
+ }
+
+ fn end_api_details(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+
+ fn description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ writeln!(w, "Description: {}", desc)
+ }
+
+ fn long_description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ writeln!(w, "\nDetailed Description:")?;
+ writeln!(w, "{}", desc)
+ }
+
+ fn begin_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ writeln!(w, "\nExecution Context:")
+ }
+
+ fn context_flag(&mut self, w: &mut dyn Write, flag: &str) -> std::io::Result<()> {
+ writeln!(w, " - {}", flag)
+ }
+
+ fn end_context_flags(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_parameters(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nParameters ({}):", count)
+ }
+
+
+ fn end_parameters(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_errors(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nPossible Errors ({}):", count)
+ }
+
+ fn end_errors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn examples(&mut self, w: &mut dyn Write, examples: &str) -> std::io::Result<()> {
+ writeln!(w, "\nExamples:")?;
+ writeln!(w, "{}", examples)
+ }
+
+ fn notes(&mut self, w: &mut dyn Write, notes: &str) -> std::io::Result<()> {
+ writeln!(w, "\nNotes:")?;
+ writeln!(w, "{}", notes)
+ }
+
+ fn since_version(&mut self, w: &mut dyn Write, version: &str) -> std::io::Result<()> {
+ writeln!(w, "\nAvailable since: {}", version)
+ }
+}
\ No newline at end of file
diff --git a/tools/kapi/src/formatter/rst.rs b/tools/kapi/src/formatter/rst.rs
new file mode 100644
index 0000000000000..96be83bf208dd
--- /dev/null
+++ b/tools/kapi/src/formatter/rst.rs
@@ -0,0 +1,144 @@
+use super::OutputFormatter;
+use std::io::Write;
+
+pub struct RstFormatter {
+ current_section_level: usize,
+}
+
+impl RstFormatter {
+ pub fn new() -> Self {
+ RstFormatter {
+ current_section_level: 0,
+ }
+ }
+
+ fn section_char(&self, level: usize) -> char {
+ match level {
+ 0 => '=',
+ 1 => '-',
+ 2 => '~',
+ 3 => '^',
+ _ => '"',
+ }
+ }
+}
+
+impl OutputFormatter for RstFormatter {
+ fn begin_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_api_list(&mut self, w: &mut dyn Write, title: &str) -> std::io::Result<()> {
+ writeln!(w, "\n{}", title)?;
+ writeln!(w, "{}", self.section_char(0).to_string().repeat(title.len()))?;
+ writeln!(w)
+ }
+
+ fn api_item(&mut self, w: &mut dyn Write, name: &str, api_type: &str) -> std::io::Result<()> {
+ writeln!(w, "* **{}** (*{}*)", name, api_type)
+ }
+
+ fn end_api_list(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn total_specs(&mut self, w: &mut dyn Write, count: usize) -> std::io::Result<()> {
+ writeln!(w, "\n**Total specifications found:** {}", count)
+ }
+
+ fn begin_api_details(&mut self, w: &mut dyn Write, name: &str) -> std::io::Result<()> {
+ self.current_section_level = 0;
+ writeln!(w, "\n{}", name)?;
+ writeln!(w, "{}", self.section_char(0).to_string().repeat(name.len()))?;
+ writeln!(w)
+ }
+
+ fn end_api_details(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+
+ fn description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ writeln!(w, "**{}**", desc)?;
+ writeln!(w)
+ }
+
+ fn long_description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ writeln!(w, "{}", desc)?;
+ writeln!(w)
+ }
+
+ fn begin_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = "Execution Context";
+ writeln!(w, "{}", title)?;
+ writeln!(w, "{}", self.section_char(1).to_string().repeat(title.len()))?;
+ writeln!(w)
+ }
+
+ fn context_flag(&mut self, w: &mut dyn Write, flag: &str) -> std::io::Result<()> {
+ writeln!(w, "* {}", flag)
+ }
+
+ fn end_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ writeln!(w)
+ }
+
+ fn begin_parameters(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = format!("Parameters ({})", count);
+ writeln!(w, "{}", title)?;
+ writeln!(w, "{}", self.section_char(1).to_string().repeat(title.len()))?;
+ writeln!(w)
+ }
+
+
+ fn end_parameters(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_errors(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = format!("Possible Errors ({})", count);
+ writeln!(w, "{}", title)?;
+ writeln!(w, "{}", self.section_char(1).to_string().repeat(title.len()))?;
+ writeln!(w)
+ }
+
+ fn end_errors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn examples(&mut self, w: &mut dyn Write, examples: &str) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = "Examples";
+ writeln!(w, "{}", title)?;
+ writeln!(w, "{}", self.section_char(1).to_string().repeat(title.len()))?;
+ writeln!(w)?;
+ writeln!(w, ".. code-block:: c")?;
+ writeln!(w)?;
+ for line in examples.lines() {
+ writeln!(w, " {}", line)?;
+ }
+ writeln!(w)
+ }
+
+ fn notes(&mut self, w: &mut dyn Write, notes: &str) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = "Notes";
+ writeln!(w, "{}", title)?;
+ writeln!(w, "{}", self.section_char(1).to_string().repeat(title.len()))?;
+ writeln!(w)?;
+ writeln!(w, "{}", notes)?;
+ writeln!(w)
+ }
+
+ fn since_version(&mut self, w: &mut dyn Write, version: &str) -> std::io::Result<()> {
+ writeln!(w, ":Available since: {}", version)?;
+ writeln!(w)
+ }
+}
\ No newline at end of file
diff --git a/tools/kapi/src/main.rs b/tools/kapi/src/main.rs
new file mode 100644
index 0000000000000..9d6533cbc7dd1
--- /dev/null
+++ b/tools/kapi/src/main.rs
@@ -0,0 +1,121 @@
+//! kapi - Kernel API Specification Tool
+//!
+//! This tool extracts and displays kernel API specifications from multiple sources:
+//! - Kernel source code (KAPI macros)
+//! - Compiled vmlinux binaries (.kapi_specs ELF section)
+//! - Running kernel via debugfs
+
+use anyhow::Result;
+use clap::Parser;
+use std::io::{self, Write};
+
+mod formatter;
+mod extractor;
+
+use formatter::{OutputFormat, create_formatter};
+use extractor::{ApiExtractor, VmlinuxExtractor, SourceExtractor, DebugfsExtractor};
+
+#[derive(Parser, Debug)]
+#[command(author, version, about, long_about = None)]
+struct Args {
+ /// Path to the vmlinux file
+ #[arg(long, value_name = "PATH", group = "input")]
+ vmlinux: Option<String>,
+
+ /// Path to kernel source directory or file
+ #[arg(long, value_name = "PATH", group = "input")]
+ source: Option<String>,
+
+ /// Path to debugfs (defaults to /sys/kernel/debug if not specified)
+ #[arg(long, value_name = "PATH", group = "input")]
+ debugfs: Option<String>,
+
+ /// Optional: Name of specific API to show details for
+ api_name: Option<String>,
+
+ /// Output format
+ #[arg(long, short = 'f', default_value = "plain")]
+ format: String,
+}
+
+fn main() -> Result<()> {
+ let args = Args::parse();
+
+ let output_format: OutputFormat = args.format.parse()
+ .map_err(|e: String| anyhow::anyhow!(e))?;
+
+ let extractor: Box<dyn ApiExtractor> = match (args.vmlinux, args.source, args.debugfs.clone()) {
+ (Some(vmlinux_path), None, None) => {
+ Box::new(VmlinuxExtractor::new(vmlinux_path)?)
+ }
+ (None, Some(source_path), None) => {
+ Box::new(SourceExtractor::new(source_path)?)
+ }
+ (None, None, Some(_)) | (None, None, None) => {
+ // If debugfs is specified or no input is provided, use debugfs
+ Box::new(DebugfsExtractor::new(args.debugfs)?)
+ }
+ _ => {
+ anyhow::bail!("Please specify only one of --vmlinux, --source, or --debugfs")
+ }
+ };
+
+ display_apis(extractor.as_ref(), args.api_name, output_format)
+}
+
+fn display_apis(extractor: &dyn ApiExtractor, api_name: Option<String>, output_format: OutputFormat) -> Result<()> {
+ let mut formatter = create_formatter(output_format);
+ let mut stdout = io::stdout();
+
+ formatter.begin_document(&mut stdout)?;
+
+ if let Some(api_name_req) = api_name {
+ // Use the extractor to display API details
+ if let Some(_spec) = extractor.extract_by_name(&api_name_req)? {
+ extractor.display_api_details(&api_name_req, &mut *formatter, &mut stdout)?;
+ } else if output_format == OutputFormat::Plain {
+ writeln!(stdout, "\nAPI '{}' not found.", api_name_req)?;
+ writeln!(stdout, "\nAvailable APIs:")?;
+ for spec in extractor.extract_all()? {
+ writeln!(stdout, " {} ({})", spec.name, spec.api_type)?;
+ }
+ }
+ } else {
+ // Display list of APIs using the extractor
+ let all_specs = extractor.extract_all()?;
+ let syscalls: Vec<_> = all_specs.iter().filter(|s| s.api_type == "syscall").collect();
+ let ioctls: Vec<_> = all_specs.iter().filter(|s| s.api_type == "ioctl").collect();
+ let functions: Vec<_> = all_specs.iter().filter(|s| s.api_type == "function").collect();
+
+ if !syscalls.is_empty() {
+ formatter.begin_api_list(&mut stdout, "System Calls")?;
+ for spec in syscalls {
+ formatter.api_item(&mut stdout, &spec.name, &spec.api_type)?;
+ }
+ formatter.end_api_list(&mut stdout)?;
+ }
+
+ if !ioctls.is_empty() {
+ formatter.begin_api_list(&mut stdout, "IOCTLs")?;
+ for spec in ioctls {
+ formatter.api_item(&mut stdout, &spec.name, &spec.api_type)?;
+ }
+ formatter.end_api_list(&mut stdout)?;
+ }
+
+ if !functions.is_empty() {
+ formatter.begin_api_list(&mut stdout, "Functions")?;
+ for spec in functions {
+ formatter.api_item(&mut stdout, &spec.name, &spec.api_type)?;
+ }
+ formatter.end_api_list(&mut stdout)?;
+ }
+
+ formatter.total_specs(&mut stdout, all_specs.len())?;
+ }
+
+ formatter.end_document(&mut stdout)?;
+
+ Ok(())
+}
+
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [RFC 08/19] exec: add API specification for execve
2025-06-14 13:48 ` [RFC 08/19] exec: add API specification for execve Sasha Levin
@ 2025-06-16 21:39 ` Florian Weimer
2025-06-17 1:51 ` Sasha Levin
0 siblings, 1 reply; 44+ messages in thread
From: Florian Weimer @ 2025-06-16 21:39 UTC (permalink / raw)
To: Sasha Levin; +Cc: linux-kernel, linux-api, workflows, tools
* Sasha Levin:
> + KAPI_RETURN("long", "Does not return on success; returns -1 on error")
> + .type = KAPI_TYPE_INT,
> + .check_type = KAPI_RETURN_ERROR_CHECK,
> + KAPI_RETURN_END
Is the -1 part correct?
Many later errors during execve are not recoverable and result in execve
succeeding (nominally) and a fatal signal being delivered to the process
instead. Not sure if the description covers that.
What about the effect of unblocking a parent thread that has vfork'ed?
Thanks,
Florian
* Re: [RFC 08/19] exec: add API specification for execve
2025-06-16 21:39 ` Florian Weimer
@ 2025-06-17 1:51 ` Sasha Levin
2025-06-17 7:13 ` Florian Weimer
0 siblings, 1 reply; 44+ messages in thread
From: Sasha Levin @ 2025-06-17 1:51 UTC (permalink / raw)
To: Florian Weimer; +Cc: linux-kernel, linux-api, workflows, tools
On Mon, Jun 16, 2025 at 11:39:31PM +0200, Florian Weimer wrote:
>* Sasha Levin:
>
>> + KAPI_RETURN("long", "Does not return on success; returns -1 on error")
>> + .type = KAPI_TYPE_INT,
>> + .check_type = KAPI_RETURN_ERROR_CHECK,
>> + KAPI_RETURN_END
>
>Is the -1 part correct?
Maybe :) That's one of the things I wasn't sure about: we're documenting
the execve syscall rather than the function itself. A user calling
execve() will end up with -1 on failure, and errno set with the error
code.
You could argue that it's libc that sets errno and we're trying to spec
the kernel here, not the userspace interface to it.
In the end I managed to lawyer myself into a decision that I liked: I
figured that since klibc is really a kernel library that is merely
packaged separately from the kernel, it is really a kernel interface,
and so I followed the libc convention.
Open for suggestions...
>Many later errors during execve are not recoverable and result in execve
>succeeding (nominally) and a fatal signal being delivered to the process
>instead. Not sure if the description covers that.
I was afraid of the "signals" rabbit hole: from what I recall, you can
have fatal signals pending past the point of no return but before
execve() completes from both execve() failures as well as external
sources.
There's definitely room for a longer explanation of how all of this
works together.
I'd suggest that we tackle signal specs in the near future, and see how
we can tie those into the rest of the API specs. Right now I'm pretty
unhappy with the vague KAPI_SIGNAL().
>What about the effect of unblocking a parent thread that has vfork'ed?
In my mind it's vfork() that is waiting for the execve to complete (via
wait_for_vfork_done()) rather than execve() actively waking up the
vfork() parent.
We can list it as a side effect of execve()? I suppose that it's similar
to something like read() in one process waking up a different process
from epoll_wait(), so we should probably be documenting those as well...
Thanks for the comments!
--
Thanks,
Sasha
* Re: [RFC 08/19] exec: add API specification for execve
2025-06-17 1:51 ` Sasha Levin
@ 2025-06-17 7:13 ` Florian Weimer
2025-06-17 22:58 ` Sasha Levin
0 siblings, 1 reply; 44+ messages in thread
From: Florian Weimer @ 2025-06-17 7:13 UTC (permalink / raw)
To: Sasha Levin; +Cc: linux-kernel, linux-api, workflows, tools
* Sasha Levin:
> On Mon, Jun 16, 2025 at 11:39:31PM +0200, Florian Weimer wrote:
>>* Sasha Levin:
>>
>>> + KAPI_RETURN("long", "Does not return on success; returns -1 on error")
>>> + .type = KAPI_TYPE_INT,
>>> + .check_type = KAPI_RETURN_ERROR_CHECK,
>>> + KAPI_RETURN_END
>>
>>Is the -1 part correct?
>
> Maybe :) That's one of the things I wasn't sure about: we're documenting
> the execve syscall rather than the function itself. A user calling
> execve() will end up with -1 on failure, and errno set with the error
> code.
Well, it doesn't say execve, it says sys_execve.
> You could argue that it's libc that sets errno and we're trying to spec
> the kernel here, not the userspace interface to it.
And I think this would be appropriate.
Note that in the future, the glibc version of execve will not be a
straightforward system call wrapper because we need to obtain a
consistent snapshot of the environment array. That is actually pretty
hard because we cannot atomically replace the process image, unblock
signals, and unmap a copy of the environment.
So I think it's best for the kernel to stick with the system call
interface and not try to document what libcs are doing.
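To make the distinction concrete: at the system call boundary the kernel
reports failure as a negative errno value, and it is the libc wrapper
that turns this into a return of -1 with errno set. Below is a minimal
sketch of that translation, assuming the usual Linux convention that raw
return values in the range [-4095, -1] denote errors. The helper name
raw_to_libc() and the KERNEL_MAX_ERRNO macro are made up for
illustration; this is not glibc's actual code.

```c
#include <errno.h>

/* Largest errno value the kernel encodes in a raw return value. */
#define KERNEL_MAX_ERRNO 4095L

/*
 * Hypothetical helper: translate a raw system call return value into
 * the libc convention. Raw values in [-KERNEL_MAX_ERRNO, -1] are
 * errors: errno is set and -1 is returned. Anything else is passed
 * through unchanged.
 */
static long raw_to_libc(long raw)
{
	if (raw < 0 && raw >= -KERNEL_MAX_ERRNO) {
		errno = (int)-raw;
		return -1;
	}
	return raw;
}
```

With this, a raw value of -ENOENT becomes a return of -1 with errno set
to ENOENT, while a raw value of 3 (say, a file descriptor) is returned
as is. A spec for sys_execve would describe the left-hand side of that
translation, not the right-hand side.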
An even thornier example is the setuid family of system calls, where
the kernel is extremely far away from what POSIX requires, and we have
to fix it in userspace.
Thanks,
Florian
* Re: [RFC 00/19] Kernel API Specification Framework
2025-06-14 13:48 [RFC 00/19] Kernel API Specification Framework Sasha Levin
` (18 preceding siblings ...)
2025-06-14 13:48 ` [RFC 19/19] tools/kapi: Add kernel API specification extraction tool Sasha Levin
@ 2025-06-17 12:08 ` David Laight
2025-06-18 21:29 ` Kees Cook
20 siblings, 0 replies; 44+ messages in thread
From: David Laight @ 2025-06-17 12:08 UTC (permalink / raw)
To: Sasha Levin; +Cc: linux-kernel, linux-api, workflows, tools
On Sat, 14 Jun 2025 09:48:39 -0400
Sasha Levin <sashal@kernel.org> wrote:
> This patch series introduces a framework for formally specifying kernel
> APIs, addressing the long-standing challenge of maintaining stable
> interfaces between the kernel and user-space programs. As outlined in
> previous discussions about kernel ABI stability, the lack of
> machine-readable API specifications has led to inadvertent breakages and
> inconsistent validation across system calls and IOCTLs.
Ugg, looks horrid.
Going to be worse than things like doxygen for getting out of step with
the actual code and grep searches are going to hit the comment blocks.
David
* Re: [RFC 08/19] exec: add API specification for execve
2025-06-17 7:13 ` Florian Weimer
@ 2025-06-17 22:58 ` Sasha Levin
0 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-06-17 22:58 UTC (permalink / raw)
To: Florian Weimer; +Cc: linux-kernel, linux-api, workflows, tools
On Tue, Jun 17, 2025 at 09:13:44AM +0200, Florian Weimer wrote:
>* Sasha Levin:
>
>> On Mon, Jun 16, 2025 at 11:39:31PM +0200, Florian Weimer wrote:
>>>* Sasha Levin:
>>>
>>>> + KAPI_RETURN("long", "Does not return on success; returns -1 on error")
>>>> + .type = KAPI_TYPE_INT,
>>>> + .check_type = KAPI_RETURN_ERROR_CHECK,
>>>> + KAPI_RETURN_END
>>>
>>>Is the -1 part correct?
>>
>> Maybe :) That's one of the things I wasn't sure about: we're documenting
>> the execve syscall rather than the function itself. A user calling
>> execve() will end up with -1 on failure, and errno set with the error
>> code.
>
>Well, it doesn't say execve, it says sys_execve.
>
>> You could argue that it's libc that sets errno and we're trying to spec
>> the kernel here, not the userspace interface to it.
>
>And I think this would be appropriate.
>
>Note that in the future, the glibc version of execve will not be a
>straightforward system call wrapper because we need to obtain a
>consistent snapshot of the environment array. That is actually pretty
>hard because we cannot atomically replace the process image, unblock
>signals, and unmap a copy of the environment.
>
>So I think it's best for the kernel to stick with the system call
>interface and not try to document what libcs are doing.
I hear you - it sounds like the "right" solution technically.
Switching back to signals, how does something like the below look as far
as expanding the execve() spec:
+ /* SIGSEGV sent on point of no return failure */
+ KAPI_SIGNAL(9, SIGSEGV, "SIGSEGV", KAPI_SIGNAL_SEND, KAPI_SIGNAL_ACTION_COREDUMP)
+ KAPI_SIGNAL_TARGET("Current process")
+ KAPI_SIGNAL_CONDITION("Exec fails after point of no return")
+ KAPI_SIGNAL_DESC("If exec fails after the point of no return (when the old "
+ "process image has been destroyed), force_fatal_sig(SIGSEGV) "
+ "is called to terminate the process since it cannot continue.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_EXIT)
+ KAPI_SIGNAL_PRIORITY(0)
+ KAPI_SIGNAL_STATE_FORBID(KAPI_SIGNAL_STATE_ZOMBIE | KAPI_SIGNAL_STATE_DEAD)
+ KAPI_SIGNAL_END
+
+ /* Signal mask preserved */
+ KAPI_SIGNAL(10, 0, "SIGNAL_MASK", KAPI_SIGNAL_HANDLE, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Process has blocked signals")
+ KAPI_SIGNAL_DESC("The signal mask (blocked signals) is preserved across exec. "
+ "This allows processes to block signals before exec and have "
+ "them remain blocked in the new program.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_END
+
+ /* Realtime signal queues cleared */
+ KAPI_SIGNAL(11, 0, "REALTIME_SIGNALS", KAPI_SIGNAL_HANDLE, KAPI_SIGNAL_ACTION_DISCARD)
+ KAPI_SIGNAL_CONDITION("Realtime signals queued")
+ KAPI_SIGNAL_DESC("All queued realtime signals (SIGRTMIN to SIGRTMAX) are "
+ "discarded during exec. The realtime signal queue is cleared.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_QUEUE(KAPI_SIGNAL_QUEUE_REALTIME)
+ KAPI_SIGNAL_END
What's missing for me is that while we now go into more detail, we
should also check this during runtime, but I'm still trying to come up
with something that is not ugly.
--
Thanks,
Sasha
* Re: [RFC 00/19] Kernel API Specification Framework
2025-06-14 13:48 [RFC 00/19] Kernel API Specification Framework Sasha Levin
` (19 preceding siblings ...)
2025-06-17 12:08 ` [RFC 00/19] Kernel API Specification Framework David Laight
@ 2025-06-18 21:29 ` Kees Cook
2025-06-19 0:22 ` Sasha Levin
20 siblings, 1 reply; 44+ messages in thread
From: Kees Cook @ 2025-06-18 21:29 UTC (permalink / raw)
To: Sasha Levin; +Cc: linux-kernel, linux-api, workflows, tools
On Sat, Jun 14, 2025 at 09:48:39AM -0400, Sasha Levin wrote:
> This patch series introduces a framework for formally specifying kernel
> APIs, addressing the long-standing challenge of maintaining stable
> interfaces between the kernel and user-space programs. As outlined in
> previous discussions about kernel ABI stability, the lack of
> machine-readable API specifications has led to inadvertent breakages and
> inconsistent validation across system calls and IOCTLs.
I'd much prefer this be more attached to the code in question, otherwise
we've got two things to update when changes happen. (Well, 3, since
kernel-doc already needs updating too.)
Can't we collect error codes programmatically through control flow
analysis? Argument mapping is already present in the SYSCALL macros,
etc. Let's not repeat this info.
-Kees
--
Kees Cook
* Re: [RFC 00/19] Kernel API Specification Framework
2025-06-18 21:29 ` Kees Cook
@ 2025-06-19 0:22 ` Sasha Levin
2025-06-23 13:28 ` Dmitry Vyukov
0 siblings, 1 reply; 44+ messages in thread
From: Sasha Levin @ 2025-06-19 0:22 UTC (permalink / raw)
To: Kees Cook; +Cc: linux-kernel, linux-api, workflows, tools
On Wed, Jun 18, 2025 at 02:29:37PM -0700, Kees Cook wrote:
>On Sat, Jun 14, 2025 at 09:48:39AM -0400, Sasha Levin wrote:
>> This patch series introduces a framework for formally specifying kernel
>> APIs, addressing the long-standing challenge of maintaining stable
>> interfaces between the kernel and user-space programs. As outlined in
>> previous discussions about kernel ABI stability, the lack of
>> machine-readable API specifications has led to inadvertent breakages and
>> inconsistent validation across system calls and IOCTLs.
>
>I'd much prefer this be more attached to the code in question, otherwise
>we've got two things to update when changes happen. (Well, 3, since
>kernel-doc already needs updating too.)
>
>Can't we collect error codes programmatically through control flow
>analysis? Argument mapping is already present in the SYSCALL macros,
I'm not sure what you meant by the control flow analysis part: we
have code to verify that the return value from the macro matches one of
the ones defined in the spec.
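As a rough illustration of that kind of check (not the code from this
series), the sketch below tests a raw return value against a list of
errno values that a spec allows. The struct ret_spec type and the
ret_matches_spec() helper are hypothetical names invented for this
example, and the model is deliberately simplified: any non-negative
value counts as success.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the error list of a spec. */
struct ret_spec {
	const int *errors;	/* positive errno values the spec allows */
	size_t nr_errors;	/* number of entries in errors[] */
};

/*
 * Return true if a raw syscall return value is consistent with the
 * spec: either a success value (>= 0) or the negation of one of the
 * listed errno values.
 */
static bool ret_matches_spec(const struct ret_spec *spec, long ret)
{
	size_t i;

	if (ret >= 0)
		return true;

	for (i = 0; i < spec->nr_errors; i++) {
		if (ret == -(long)spec->errors[i])
			return true;
	}
	return false;
}
```

A return value that fails this check means either the spec is
incomplete or the implementation returned something unexpected; which
of the two is the bug still has to be decided by a human.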
>etc. Let's not repeat this info.
I tried to come up with a way to get rid of the SYSCALL_DEFINEx() macro
right after the spec. I agree that it's duplication, but my macro-fu is
too weak to get rid of that SYSCALL_DEFINE() call.
Suggestions more than welcome here: I suspect that this might require a
bigger change in the code, but I'm still trying to figure it out.
--
Thanks,
Sasha
* Re: [RFC 00/19] Kernel API Specification Framework
2025-06-19 0:22 ` Sasha Levin
@ 2025-06-23 13:28 ` Dmitry Vyukov
2025-06-24 14:06 ` Cyril Hrubis
2025-06-24 20:04 ` Sasha Levin
0 siblings, 2 replies; 44+ messages in thread
From: Dmitry Vyukov @ 2025-06-23 13:28 UTC (permalink / raw)
To: sashal; +Cc: kees, elver, linux-api, linux-kernel, tools, workflows
Nice!
A bag of assorted comments:
1. I share the same concern of duplicating info.
If there is a lot of duplication, it may lead to failure of the whole effort
since folks won't update these and/or they will get out of sync.
If a syscall arg is e.g. umode_t, we already know that it's an integer
of that enum type, and that it's an input arg.
In syzkaller we have a Clang-tool:
https://github.com/google/syzkaller/blob/master/tools/syz-declextract/clangtool/declextract.cpp
that extracts a bunch of interfaces automatically:
https://raw.githubusercontent.com/google/syzkaller/refs/heads/master/sys/linux/auto.txt
Though, obviously that won't have user-readable string descriptions, can't be used as a source
of truth, and may be challenging to integrate into kernel build process.
Though, extracting some of that info automatically may be nice.
2. Does this framework ensure that the specified info about args is correct?
E.g. number of syscall args, and their types match the actual ones?
If such things are not tested/validated during build, I'm afraid they will be
riddled with bugs over time.
3. To reduce duplication we could use more type information, e.g. I was always
frustrated that close is just:
SYSCALL_DEFINE1(close, unsigned int, fd)
whereas if we would do:
typedef int fd_t;
SYSCALL_DEFINE1(close, fd_t, fd)
then all semantic info about the arg is already in the code.
4. If we specify e.g. error return values here with descriptions,
can that be used as the source of truth to generate man pages?
That would eliminate some duplication.
5. We have a long standing dream that kernel developers add fuzzing descriptions
along with new kernel interfaces. So far we got very few contributions to syzkaller
from kernel developers. This framework can serve as the way to do it, which is nice.
6. What's the goal of validation of the input arguments?
Kernel code must do this validation anyway, right?
Any non-trivial validation is hard, e.g. even for open the validation function
for file name would need to have access to flags and check file presence for
some flags combinations. That may add significant amount of non-trivial code
that duplicates main syscall logic, and that logic may also have bugs and
memory leaks.
7. One of the most useful uses of this framework that I see is testing kernel
behavior correctness. I wonder what properties we can test with these descriptions,
and if we can add more useful info for that purpose.
Argument validation does not help here (it's userspace bugs at best).
Return values potentially may be useful, e.g. if we see a return value that's
not specified, potentially it's a kernel bug.
Side-effects specification potentially can be used to detect logical kernel bugs,
e.g. if a syscall does not claim to change fs state, but it does, it's a bug.
Though, a more useful check should be failure/concurrency atomicity.
Namely, if a syscall claims to not alter state on failure, it shouldn't do so.
Concurrency atomicity means linearizability of concurrent syscalls
(side-effects match one of 2 possible orders of syscalls).
But for these we would need to add additional flags to the descriptions
that say that a syscall supports failure/concurrency atomicity.
8. It would be useful to have a mapping of file_operations to actual files in fs.
Otherwise the exposed info is not very actionable, since there is no way to understand
what actual file/fd the ioctl's can be applied to.
9. I see that syscalls and ioctls say:
KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
Can't we make this implicit? Are there any other options?
Similarly an ioctl description says it releases a mutex (.released = true,),
all ioctls/syscalls must release all acquired mutexes, no?
Generally, the less verbose the descriptions are, the higher chances of their survival.
+Marco also works on static compiler-enforced lock checking annotations,
I wonder if they can be used to describe this in a more useful way.
* Re: [RFC 00/19] Kernel API Specification Framework
2025-06-23 13:28 ` Dmitry Vyukov
@ 2025-06-24 14:06 ` Cyril Hrubis
2025-06-24 14:30 ` Dmitry Vyukov
2025-06-24 20:04 ` Sasha Levin
1 sibling, 1 reply; 44+ messages in thread
From: Cyril Hrubis @ 2025-06-24 14:06 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: sashal, kees, elver, linux-api, linux-kernel, tools, workflows
Hi!
> 6. What's the goal of validation of the input arguments?
> Kernel code must do this validation anyway, right?
> Any non-trivial validation is hard, e.g. even for open the validation function
> for file name would need to have access to flags and check file presence for
> some flags combinations. That may add significant amount of non-trivial code
> that duplicates main syscall logic, and that logic may also have bugs and
> memory leaks.
I was looking at that part and thinking that we could generate (at least
some) automated conformance tests based on this information. We could
make sure that invalid parameters are properly rejected. For open(),
some combinations would be difficult to model though, e.g. for
O_DIRECTORY the pathname is supposed to be a path to a directory and
also the file descriptor returned has different properties. Also O_CREAT
requires third parameter and changes which kinds of filepaths are
invalid. Demultiplexing syscalls like this is going to be difficult to
get right.
As for testing purposes, most of the time it would be enough just to say
something like "this parameter is an existing file". If we have this
information in a machine parseable format we can generate automatic
tests for various error conditions e.g. ELOOP, EACCES, ENAMETOOLONG,
ENOENT, ...
For paths we could have something like:
file:existing
file:nonexisting
file:replaced|nonexisting
file:nonexisting|existing
dir:existing
dir:nonexisting
Then for open() syscall we can do:
flags=O_DIRECTORY path=dir:existing
flags=O_CREAT path=file:nonexisting|existing
flags=O_CREAT|O_EXCL path=file:nonexisting
...
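A test generated from rows like these could look roughly like the
sketch below. It is written by hand here and only assumes standard
open(2) behaviour on Linux: O_DIRECTORY on a regular file fails with
ENOTDIR, O_CREAT|O_EXCL on an existing file fails with EEXIST, and
opening a missing file without O_CREAT fails with ENOENT. The helper
names open_errno() and check_open_rows() and the scratch directory
under /tmp are assumptions made for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for mkdtemp() */
#endif

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Call open() and return 0 on success or the errno value on failure.
 * Any descriptor obtained is closed again immediately.
 */
static int open_errno(const char *path, int flags)
{
	int fd = open(path, flags, 0600);

	if (fd < 0)
		return errno;
	close(fd);
	return 0;
}

/*
 * Check a few flag/path-state combinations in a scratch directory.
 * Returns the number of mismatches, or -1 if the setup failed.
 */
static int check_open_rows(void)
{
	char dir[] = "/tmp/kapi-open-XXXXXX";
	char file[64], missing[64];
	int failures = 0;

	if (!mkdtemp(dir))
		return -1;
	snprintf(file, sizeof(file), "%s/existing", dir);
	snprintf(missing, sizeof(missing), "%s/nonexisting", dir);
	if (open_errno(file, O_CREAT | O_WRONLY) != 0) {
		rmdir(dir);
		return -1;
	}

	/* flags=O_DIRECTORY path=dir:existing: must succeed */
	failures += open_errno(dir, O_RDONLY | O_DIRECTORY) != 0;
	/* flags=O_DIRECTORY path=file:existing: must fail with ENOTDIR */
	failures += open_errno(file, O_RDONLY | O_DIRECTORY) != ENOTDIR;
	/* flags=O_CREAT|O_EXCL path=file:existing: must fail with EEXIST */
	failures += open_errno(file, O_CREAT | O_EXCL | O_WRONLY) != EEXIST;
	/* no O_CREAT, path=file:nonexisting: must fail with ENOENT */
	failures += open_errno(missing, O_RDONLY) != ENOENT;

	unlink(file);
	rmdir(dir);
	return failures;
}
```

The first check covers a valid row from the table; the other three
check that combinations outside the table are rejected with a stable
error code. A generator would emit one such line per row instead of
having them typed in by hand.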
You may wonder if such tests are useful at all, since quite a
few of these errors are checked for and generated from common
functions. There are at least two cases I can think of. First of all it
makes sure that errors are stable when a particular function/subsystem is
rewritten. And it can also make sure that errors are consistent across
different implementations of the same functionality e.g. filesystems. I
remember that some of the less used FUSE filesystems returned puzzling
errors in certain corner cases.
Maybe it would be more useful to steer this towards a system that
better annotates the types for the syscall parameters and return values.
Something that would be an extension to C types with a description of
how a particular string or integer is interpreted.
> Side-effects specification potentially can be used to detect logical kernel bugs,
> e.g. if a syscall does not claim to change fs state, but it does, it's a bug.
> Though, a more useful check should be failure/concurrency atomicity.
> Namely, if a syscall claims to not alter state on failure, it shouldn't do so.
> Concurrency atomicity means linearizability of concurrent syscalls
> (side-effects match one of 2 possible orders of syscalls).
> But for these we would need to add additional flags to the descriptions
> that say that a syscall supports failure/concurrency atomicity.
>
> 8. It would be useful to have a mapping of file_operations to actual files in fs.
> Otherwise the exposed info is not very actionable, since there is no way to understand
> what actual file/fd the ioctl's can be applied to.
+1 There are many different kinds of file descriptors and they differ
wildly in what operations they support.
Maybe we would need a subclass for a file descriptor, something like:
fd:file
fd:timerfd
fd:pidfs
...
--
Cyril Hrubis
chrubis@suse.cz
* Re: [RFC 00/19] Kernel API Specification Framework
2025-06-24 14:06 ` Cyril Hrubis
@ 2025-06-24 14:30 ` Dmitry Vyukov
2025-06-24 15:27 ` Cyril Hrubis
0 siblings, 1 reply; 44+ messages in thread
From: Dmitry Vyukov @ 2025-06-24 14:30 UTC (permalink / raw)
To: Cyril Hrubis
Cc: sashal, kees, elver, linux-api, linux-kernel, tools, workflows
On Tue, 24 Jun 2025 at 16:05, Cyril Hrubis <chrubis@suse.cz> wrote:
>
> Hi!
> > 6. What's the goal of validation of the input arguments?
> > Kernel code must do this validation anyway, right?
> > Any non-trivial validation is hard, e.g. even for open the validation function
> > for file name would need to have access to flags and check file presence for
> > some flags combinations. That may add significant amount of non-trivial code
> > that duplicates main syscall logic, and that logic may also have bugs and
> > memory leaks.
>
> I was looking at that part and thinking that we could generate (at least
> some) automated conformance tests based on this information. We could
> make sure that invalid parameters are properly rejected. For open(),
> some combinations would be difficult to model though, e.g. for
> O_DIRECTORY the pathname is supposed to be a path to a directory and
> also the file descriptor returned has different properties. Also O_CREAT
> requires third parameter and changes which kinds of filepaths are
> invalid. Demultiplexing syscalls like this is going to be difficult to
> get right.
>
> As for testing purposes, most of the time it would be enough just to say
> something like "this parameter is an existing file". If we have this
> information in a machine parseable format we can generate automatic
> tests for various error conditions e.g. ELOOP, EACCES, ENAMETOOLONG,
> ENOENT, ...
>
> For paths we could have something like:
>
> file:existing
> file:nonexisting
> file:replaced|nonexisting
> file:nonexisting|existing
> dir:existing
> dir:nonexisting
>
> Then for open() syscall we can do:
>
> flags=O_DIRECTORY path=dir:existing
> flags=O_CREAT path=file:nonexisting|existing
> flags=O_CREAT|O_EXCL path=file:nonexisting
> ...
>
> You may wonder if such tests are useful at all, since quite a
> few of these errors are checked for and generated from common
> functions. There are at least two cases I can think of. First of all it
> makes sure that errors are stable when a particular function/subsystem is
> rewritten. And it can also make sure that errors are consistent across
> different implementations of the same functionality e.g. filesystems. I
> remember that some of the less used FUSE filesystems returned puzzling
> errors in certain corner cases.
I am not following how this is related to the validation part of the
patch series. Can you elaborate?
Generation of such conformance tests would need info about parameter
types and their semantic meaning, not the validation part.
The conformance tests should test the actual syscall checking of
arguments, not the validation added by this framework.
> Maybe it would be more useful to steer this towards a system that
> better annotates the types for the syscall parameters and return values.
> Something that would be an extension to C types with a description of
> how a particular string or integer is interpreted.
+1
> > Side-effects specification potentially can be used to detect logical kernel bugs,
> > e.g. if a syscall does not claim to change fs state, but it does, it's a bug.
> > Though, a more useful check should be failure/concurrency atomicity.
> > Namely, if a syscall claims to not alter state on failure, it shouldn't do so.
> > Concurrency atomicity means linearizability of concurrent syscalls
> > (side-effects match one of 2 possible orders of syscalls).
> > But for these we would need to add additional flags to the descriptions
> > that say that a syscall supports failure/concurrency atomicity.
> >
> > 8. It would be useful to have a mapping of file_operations to actual files in fs.
> > Otherwise the exposed info is not very actionable, since there is no way to understand
> > what actual file/fd the ioctl's can be applied to.
>
> +1 There are many different kinds of file descriptors and they differ
> wildly in what operations they support.
>
> Maybe we would need a subclass for a file descriptor, something as:
>
> fd:file
> fd:timerfd
> fd:pidfs
FWIW syzkaller has this for the purpose of automatic generation of test inputs.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC 00/19] Kernel API Specification Framework
2025-06-24 14:30 ` Dmitry Vyukov
@ 2025-06-24 15:27 ` Cyril Hrubis
0 siblings, 0 replies; 44+ messages in thread
From: Cyril Hrubis @ 2025-06-24 15:27 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: sashal, kees, elver, linux-api, linux-kernel, tools, workflows
Hi!
> > You may wonder if such tests are useful at all, since quite a
> > few of these errors are checked for and generated from common
> > functions. There are at least two cases I can think of. First of all it
> > makes sure that errors are stable when a particular function/subsystem is
> > rewritten. And it can also make sure that errors are consistent across
> > different implementations of the same functionality, e.g. filesystems. I
> > remember that some of the less used FUSE filesystems returned puzzling
> > errors in certain corner cases.
>
> I am not following how this is related to the validation part of the
> patch series. Can you elaborate?
This part is me trying to explain that generated conformance tests would
be useful for development as well.
> Generation of such conformance tests would need info about parameter
> types and their semantic meaning, not the validation part.
> The conformance tests should test that actual syscall checking of
> arguments, not the validation added by this framework.
Exactly.
I do not think that it makes sense to encode the argument ranges and
functions to generate valid syscall parameters into the kernel. Rather
than that, the information should be encoded in the extended types; if we do
that well enough we can generate combinations of different valid and
invalid parameters for the tests based on that.
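
A minimal userspace sketch of how extended types could drive generation of
valid and invalid parameters. Every name here (arg_kind, arg_desc, candidate,
gen_candidates) is hypothetical and not part of the series, and treating 0 as
a valid flags value is an assumption:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical extended type: how an integer argument is interpreted. */
enum arg_kind {
	ARG_INT_RANGE,	/* integer with inclusive bounds */
	ARG_FLAGS,	/* bitmask with a set of allowed bits */
};

struct arg_desc {
	enum arg_kind kind;
	long min, max;		/* ARG_INT_RANGE only */
	unsigned long mask;	/* ARG_FLAGS only: allowed bits */
};

/* One generated test input plus whether the syscall should accept it. */
struct candidate {
	long value;
	int valid;
};

/* Append one candidate if there is room; returns the new count. */
static size_t push(struct candidate *out, size_t n, size_t cap,
		   long value, int valid)
{
	assert(out != NULL);
	if (n < cap) {
		out[n].value = value;
		out[n].valid = valid;
		n++;
	}
	return n;
}

/* Derive boundary inputs from the description alone. */
static size_t gen_candidates(const struct arg_desc *d,
			     struct candidate *out, size_t cap)
{
	size_t n = 0;

	switch (d->kind) {
	case ARG_INT_RANGE:
		n = push(out, n, cap, d->min, 1);
		n = push(out, n, cap, d->max, 1);
		if (d->min > LONG_MIN)
			n = push(out, n, cap, d->min - 1, 0);
		if (d->max < LONG_MAX)
			n = push(out, n, cap, d->max + 1, 0);
		break;
	case ARG_FLAGS: {
		/* Lowest bit that is not in the allowed mask, or 0 if none. */
		unsigned long bad = ~d->mask & (d->mask + 1);

		n = push(out, n, cap, 0, 1);
		n = push(out, n, cap, (long)d->mask, 1);
		if (bad)
			n = push(out, n, cap, (long)(d->mask | bad), 0);
		break;
	}
	}
	return n;
}
```

A userspace test generator would call gen_candidates() once per argument and
combine the results; nothing in this sketch needs to live in the kernel.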
--
Cyril Hrubis
chrubis@suse.cz
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC 00/19] Kernel API Specification Framework
2025-06-23 13:28 ` Dmitry Vyukov
2025-06-24 14:06 ` Cyril Hrubis
@ 2025-06-24 20:04 ` Sasha Levin
2025-06-25 8:49 ` Dmitry Vyukov
` (2 more replies)
1 sibling, 3 replies; 44+ messages in thread
From: Sasha Levin @ 2025-06-24 20:04 UTC (permalink / raw)
To: Dmitry Vyukov; +Cc: kees, elver, linux-api, linux-kernel, tools, workflows
On Mon, Jun 23, 2025 at 03:28:03PM +0200, Dmitry Vyukov wrote:
>Nice!
>
>A bag of assorted comments:
>
>1. I share the same concern of duplicating info.
>If there are lots of duplication it may lead to failure of the whole effort
>since folks won't update these and/or they will get out of sync.
>If a syscall arg is e.g. umode_t, we already know that it's an integer
>of that enum type, and that it's an input arg.
>In syzkaller we have a Clang-tool:
>https://github.com/google/syzkaller/blob/master/tools/syz-declextract/clangtool/declextract.cpp
>that extracts a bunch of interfaces automatically:
>https://raw.githubusercontent.com/google/syzkaller/refs/heads/master/sys/linux/auto.txt
>Though, obviously that won't have user-readable string descriptions, can't be used as a source
>of truth, and may be challenging to integrate into kernel build process.
>Though, extracting some of that info automatically may be nice.
>
>2. Does this framework ensure that the specified info about args is correct?
>E.g. number of syscall args, and their types match the actual ones?
>If such things are not tested/validated during build, I am afraid they will be
>riddled with bugs over time.
This is an answer for both (1) and (2): yes! In my mind, whatever we
spec out needs to be enforced, because otherwise it will go out of sync.
In this RFC, take a look at the code guarded by
CONFIG_KAPI_RUNTIME_CHECKS: the idea is that we can enable runtime
checks that verify the things you've mentioned above (and more).
>3. To reduce duplication we could use more type information, e.g. I was always
>frustrated that close is just:
>
>SYSCALL_DEFINE1(close, unsigned int, fd)
>
>whereas if we would do:
>
>typedef int fd_t;
>SYSCALL_DEFINE1(close, fd_t, fd)
>
>then all semantic info about the arg is already in the code.
Yup. It would also be great if we completely drop the SYSCALL_DEFINE()
part and have it be automatically generated by the spec itself, but I
couldn't wrap my head around doing this in C macros just yet.
>4. If we specify e.g. error return values here with descriptions,
>can that be used as the source of truth to generate man pages?
>That would eliminate some duplication.
Ideally yes. One of the formatters that the kapi tool has (see the last
patch in this series) is the RST formatter that could be used to
generate documentation similar to man pages.
>5. We have a long-standing dream that kernel developers add fuzzing descriptions
>along with new kernel interfaces. So far we got very few contributions to syzkaller
>from kernel developers. This framework can serve as the way to do it, which is nice.
This was one of the main usecases I had in mind.
In return, we can get back from syzkaller a body of automatically
generated tests that we can embed into our testing CIs.
>6. What's the goal of validation of the input arguments?
>Kernel code must do this validation anyway, right.
>Any non-trivial validation is hard, e.g. even for open the validation function
>for file name would need to have access to flags and check file presence for
>some flags combinations. That may add significant amount of non-trivial code
>that duplicates main syscall logic, and that logic may also have bugs and
>memory leaks.
Mostly to catch divergence from the spec: think of a scenario where
someone added a new param/flag/etc but forgot to update the spec - this
will help catch it.
Ideally it would also prevent some of the issues that syzkaller is so
good at finding :)
>7. One of the most useful uses of this framework that I see is testing kernel
>behavior correctness. I wonder what properties we can test with these descriptions,
>and if we can add more useful info for that purpose.
>Argument validation does not help here (it's userspace bugs at best).
>Return values potentially may be useful, e.g. if we see a return value that's
>not specified, potentially it's a kernel bug.
>Side-effects specification potentially can be used to detect logical kernel bugs,
>e.g. if a syscall does not claim to change fs state, but it does, it's a bug.
>Though, a more useful check should be failure/concurrency atomicity.
>Namely, if a syscall claims to not alter state on failure, it shouldn't do so.
>Concurrency atomicity means linearizability of concurrent syscalls
>(side-effects match one of 2 possible orders of syscalls).
>But for these we would need to add additional flags to the descriptions
>that say that a syscall supports failure/concurrency atomicity.
I agree: being able to fuzz for more than just kernel splats will be
great.
>8. It would be useful to have a mapping of file_operations to actual files in fs.
>Otherwise the exposed info is not very actionable, since there is no way to understand
>what actual file/fd the ioctl's can be applied to.
Ack. The ioctl() part is a bit hand-wavy right now, and at the very
least we'd need to spec out ioctl() itself. It's more of a demonstration
of what it could look like rather than being too useful at this point.
>9. I see that syscalls and ioctls say:
>KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
>Can't we make this implicit? Are there any other options?
Maybe? I wasn't sure how we'd describe something like getpid() which
isn't supposed to sleep.
>Similarly an ioctl description says it releases a mutex (.released = true,),
>all ioctls/syscalls must release all acquired mutexes, no?
>Generally, the less verbose the descriptions are, the higher chances of their survival.
>+Marco also works on static compiler-enforced lock checking annotations,
>I wonder if they can be used to describe this in a more useful way.
I was thinking about stuff like futex or flock which can return with a
lock back to userspace.
--
Thanks,
Sasha
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC 00/19] Kernel API Specification Framework
2025-06-24 20:04 ` Sasha Levin
@ 2025-06-25 8:49 ` Dmitry Vyukov
2025-06-25 8:52 ` Dmitry Vyukov
2025-06-25 8:56 ` Dmitry Vyukov
2 siblings, 0 replies; 44+ messages in thread
From: Dmitry Vyukov @ 2025-06-25 8:49 UTC (permalink / raw)
To: Sasha Levin; +Cc: kees, elver, linux-api, linux-kernel, tools, workflows
On Tue, 24 Jun 2025 at 22:04, Sasha Levin <sashal@kernel.org> wrote:
> >3. To reduce duplication we could use more type information, e.g. I was always
> >frustrated that close is just:
> >
> >SYSCALL_DEFINE1(close, unsigned int, fd)
> >
> >whereas if we would do:
> >
> >typedef int fd_t;
> >SYSCALL_DEFINE1(close, fd_t, fd)
> >
> >then all semantic info about the arg is already in the code.
>
> Yup. It would also be great if we completely drop the SYSCALL_DEFINE()
> part and have it be automatically generated by the spec itself, but I
> couldn't wrap my head around doing this in C macros just yet.
At some point I was looking at boost.pp library as the source of info
on how to do things. It provides a set of containers and algorithms on
them:
https://www.boost.org/doc/libs/latest/libs/preprocessor/doc/index.html
Sequences may be the most appealing b/c they support variable number
of elements, and don't need specifying number of elements explicitly:
https://www.boost.org/doc/libs/latest/libs/preprocessor/doc/data/sequences.html
A sequence then allows generating multiple things from it using
foreach over elements.
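
As a rough illustration of generating several things from one list, here is a
sketch using the plain X-macro idiom rather than Boost.PP sequences (a simpler
swapped-in technique). The "demo_rw" call and all names below are made up for
the example:

```c
#include <assert.h>
#include <stddef.h>

/* Single source of truth: one X(type, name) entry per argument. */
#define DEMO_RW_ARGS(X) \
	X(unsigned int, fd) \
	X(size_t, count)

/* Expansion 1: a struct that holds the arguments. */
#define AS_FIELD(type, name) type name;
struct demo_rw_args {
	DEMO_RW_ARGS(AS_FIELD)
};

/* Expansion 2: a table that describes the same arguments. */
struct demo_arg_spec {
	const char *type;
	const char *name;
	size_t size;
};

#define AS_SPEC(type, name) { #type, #name, sizeof(type) },
static const struct demo_arg_spec demo_rw_spec[] = {
	DEMO_RW_ARGS(AS_SPEC)
};

#define DEMO_RW_NARGS (sizeof(demo_rw_spec) / sizeof(demo_rw_spec[0]))

/* The argument count falls out of the list; it is never written by hand. */
static_assert(DEMO_RW_NARGS == 2, "list and table must stay in sync");
```

Adding an argument to DEMO_RW_ARGS updates both the struct and the table, which
is the property a generated SYSCALL_DEFINE() would need as well.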
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC 00/19] Kernel API Specification Framework
2025-06-24 20:04 ` Sasha Levin
2025-06-25 8:49 ` Dmitry Vyukov
@ 2025-06-25 8:52 ` Dmitry Vyukov
2025-06-25 15:46 ` Cyril Hrubis
2025-06-25 15:55 ` Sasha Levin
2025-06-25 8:56 ` Dmitry Vyukov
2 siblings, 2 replies; 44+ messages in thread
From: Dmitry Vyukov @ 2025-06-25 8:52 UTC (permalink / raw)
To: Sasha Levin; +Cc: kees, elver, linux-api, linux-kernel, tools, workflows
On Tue, 24 Jun 2025 at 22:04, Sasha Levin <sashal@kernel.org> wrote:
> >6. What's the goal of validation of the input arguments?
> >Kernel code must do this validation anyway, right.
> >Any non-trivial validation is hard, e.g. even for open the validation function
> >for file name would need to have access to flags and check file presence for
> >some flags combinations. That may add significant amount of non-trivial code
> >that duplicates main syscall logic, and that logic may also have bugs and
> >memory leaks.
>
> Mostly to catch divergence from the spec: think of a scenario where
> someone added a new param/flag/etc but forgot to update the spec - this
> will help catch it.
How exactly is this supposed to work?
Even if we run with a unit test suite, a test suite may include some
incorrect inputs to check for error conditions. The framework will
report violations on these incorrect inputs. These are not bugs in the
API specifications, nor in the test suite (read false positives).
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC 00/19] Kernel API Specification Framework
2025-06-24 20:04 ` Sasha Levin
2025-06-25 8:49 ` Dmitry Vyukov
2025-06-25 8:52 ` Dmitry Vyukov
@ 2025-06-25 8:56 ` Dmitry Vyukov
2025-06-25 16:23 ` Sasha Levin
2 siblings, 1 reply; 44+ messages in thread
From: Dmitry Vyukov @ 2025-06-25 8:56 UTC (permalink / raw)
To: Sasha Levin; +Cc: kees, elver, linux-api, linux-kernel, tools, workflows
On Tue, 24 Jun 2025 at 22:04, Sasha Levin <sashal@kernel.org> wrote:
> >9. I see that syscalls and ioctls say:
> >KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
> >Can't we make this implicit? Are there any other options?
>
> Maybe? I wasn't sure how we'd describe something like getpid() which
> isn't supposed to sleep.
>
> >Similarly an ioctl description says it releases a mutex (.released = true,),
> >all ioctls/syscalls must release all acquired mutexes, no?
> >Generally, the less verbose the descriptions are, the higher chances of their survival.
> >+Marco also works on static compiler-enforced lock checking annotations,
> >I wonder if they can be used to describe this in a more useful way.
>
> I was thinking about stuff like futex or flock which can return with a
> lock back to userspace.
I see, this makes sense. Then I would go with explicitly specifying
rare uncommon cases instead, and require 99% of common cases be the
default that does not require saying anything.
E.g. KAPI_CTX_NON_SLEEPABLE, .not_released = true.
KAPI_CTX_NON_SLEEPABLE looks useful, since it allows easy validation:
set current flag, and BUG on any attempt to sleep when the flag is set
(lockdep probably already has required pieces for this).
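
A userspace toy model of that validation idea, for illustration only. The flag,
the counter and the handlers are all hypothetical; in the kernel the flag would
live on current and the check would sit next to the existing might_sleep()
annotations. The sketch counts violations instead of calling BUG():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-thread state; a stand-in for a flag stored on `current`. */
static _Thread_local bool demo_non_sleepable;
static _Thread_local unsigned int demo_violations;

/* Called by anything that may block; records a violation if disallowed. */
static void demo_might_sleep(void)
{
	if (demo_non_sleepable)
		demo_violations++;
}

/* Two made-up handlers: one never blocks, one may block. */
static long demo_getpid(void)
{
	return 42;
}

static long demo_wait(void)
{
	demo_might_sleep();
	return 0;
}

/* Run a handler with the flag set according to its (hypothetical) spec. */
static long demo_call(long (*fn)(void), bool non_sleepable)
{
	bool saved = demo_non_sleepable;
	long ret;

	assert(fn != NULL);
	demo_non_sleepable = non_sleepable;
	ret = fn();
	demo_non_sleepable = saved;
	return ret;
}
```

Only handlers marked non-sleepable pay for the check, which matches the idea of
annotating the rare case and leaving the common one implicit.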
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC 00/19] Kernel API Specification Framework
2025-06-25 8:52 ` Dmitry Vyukov
@ 2025-06-25 15:46 ` Cyril Hrubis
2025-06-25 15:55 ` Sasha Levin
1 sibling, 0 replies; 44+ messages in thread
From: Cyril Hrubis @ 2025-06-25 15:46 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: Sasha Levin, kees, elver, linux-api, linux-kernel, tools,
workflows
Hi!
> > >6. What's the goal of validation of the input arguments?
> > >Kernel code must do this validation anyway, right.
> > >Any non-trivial validation is hard, e.g. even for open the validation function
> > >for file name would need to have access to flags and check file presence for
> > >some flags combinations. That may add significant amount of non-trivial code
> > >that duplicates main syscall logic, and that logic may also have bugs and
> > >memory leaks.
> >
> > Mostly to catch divergence from the spec: think of a scenario where
> > someone added a new param/flag/etc but forgot to update the spec - this
> > will help catch it.
>
> How exactly is this supposed to work?
> Even if we run with a unit test suite, a test suite may include some
> incorrect inputs to check for error conditions. The framework will
> report violations on these incorrect inputs. These are not bugs in the
> API specifications, nor in the test suite (read false positives).
This is what I tried to respond to but I guess that it didn't go well.
Let me try to reiterate. In my opinion you shouldn't really put this part
into the kernel, but rather include more type and semantic
information into the data so that tests can be generated and executed in
userspace. I do not see how we can validate that we get proper errors
from a syscall if one of the input parameters is invalid other than by
generating and running a C test in userspace. For that part the syscall
description does not need to be built into the kernel either; it may be
just a build artifact that gets installed with the kernel image.
--
Cyril Hrubis
chrubis@suse.cz
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC 00/19] Kernel API Specification Framework
2025-06-25 8:52 ` Dmitry Vyukov
2025-06-25 15:46 ` Cyril Hrubis
@ 2025-06-25 15:55 ` Sasha Levin
2025-06-26 8:32 ` Dmitry Vyukov
1 sibling, 1 reply; 44+ messages in thread
From: Sasha Levin @ 2025-06-25 15:55 UTC (permalink / raw)
To: Dmitry Vyukov; +Cc: kees, elver, linux-api, linux-kernel, tools, workflows
On Wed, Jun 25, 2025 at 10:52:46AM +0200, Dmitry Vyukov wrote:
>On Tue, 24 Jun 2025 at 22:04, Sasha Levin <sashal@kernel.org> wrote:
>
>> >6. What's the goal of validation of the input arguments?
>> >Kernel code must do this validation anyway, right.
>> >Any non-trivial validation is hard, e.g. even for open the validation function
>> >for file name would need to have access to flags and check file presence for
>> >some flags combinations. That may add significant amount of non-trivial code
>> >that duplicates main syscall logic, and that logic may also have bugs and
>> >memory leaks.
>>
>> Mostly to catch divergence from the spec: think of a scenario where
>> someone added a new param/flag/etc but forgot to update the spec - this
>> will help catch it.
>
>How exactly is this supposed to work?
>Even if we run with a unit test suite, a test suite may include some
>incorrect inputs to check for error conditions. The framework will
>report violations on these incorrect inputs. These are not bugs in the
>API specifications, nor in the test suite (read false positives).
Right now it would be something along the lines of the test checking for
an expected failure message in dmesg, something along the lines of:
https://github.com/linux-test-project/ltp/blob/0c99c7915f029d32de893b15b0a213ff3de210af/testcases/commands/sysctl/sysctl02.sh#L67
I'm not opposed to coming up with a better story...
--
Thanks,
Sasha
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC 00/19] Kernel API Specification Framework
2025-06-25 8:56 ` Dmitry Vyukov
@ 2025-06-25 16:23 ` Sasha Levin
0 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-06-25 16:23 UTC (permalink / raw)
To: Dmitry Vyukov; +Cc: kees, elver, linux-api, linux-kernel, tools, workflows
On Wed, Jun 25, 2025 at 10:56:04AM +0200, Dmitry Vyukov wrote:
>On Tue, 24 Jun 2025 at 22:04, Sasha Levin <sashal@kernel.org> wrote:
>> >9. I see that syscalls and ioctls say:
>> >KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
>> >Can't we make this implicit? Are there any other options?
>>
>> Maybe? I wasn't sure how we'd describe something like getpid() which
>> isn't supposed to sleep.
>>
>> >Similarly an ioctl description says it releases a mutex (.released = true,),
>> >all ioctls/syscalls must release all acquired mutexes, no?
>> >Generally, the less verbose the descriptions are, the higher chances of their survival.
>> >+Marco also works on static compiler-enforced lock checking annotations,
>> >I wonder if they can be used to describe this in a more useful way.
>>
>> I was thinking about stuff like futex or flock which can return with a
>> lock back to userspace.
>
>I see, this makes sense. Then I would go with explicitly specifying
>rare uncommon cases instead, and require 99% of common cases be the
>default that does not require saying anything.
>
>E.g. KAPI_CTX_NON_SLEEPABLE, .not_released = true.
>
>KAPI_CTX_NON_SLEEPABLE looks useful, since it allows easy validation:
>set current flag, and BUG on any attempt to sleep when the flag is set
>(lockdep probably already has required pieces for this).
Yup, that makes sense. One of the reasons I wrapped all the field
assignments with macros is that we can easily customize them based on
usage, so instead of:
#define KAPI_LOCK_ACQUIRED \
.acquired = true,
#define KAPI_LOCK_RELEASED \
.released = true,
We can add:
#define KAPI_LOCK_USED \
.acquired = true, \
.released = true,
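
For illustration, a sketch of how such macros would expand inside a designated
initializer. The struct demo_lock_spec and the two example specs are
hypothetical stand-ins, not the structures from the series:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the lock part of a spec. */
struct demo_lock_spec {
	const char *name;
	bool acquired;
	bool released;
};

#define KAPI_LOCK_ACQUIRED \
	.acquired = true,
#define KAPI_LOCK_RELEASED \
	.released = true,
/* Combined form for the common acquire-and-release case. */
#define KAPI_LOCK_USED \
	.acquired = true, \
	.released = true,

/* Common case: taken and dropped within the call. */
static const struct demo_lock_spec common_case = {
	.name = "demo_mutex",
	KAPI_LOCK_USED
};

/* Rare case: still held when returning to userspace. */
static const struct demo_lock_spec held_on_return = {
	.name = "demo_lock",
	KAPI_LOCK_ACQUIRED
};

/* A spec is balanced when every acquired lock is also released. */
static bool demo_lock_balanced(const struct demo_lock_spec *s)
{
	assert(s != NULL);
	return s->acquired == s->released;
}
```

Fields left out of a designated initializer are zero-initialized, so
held_on_return.released is false without being spelled out.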
--
Thanks,
Sasha
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC 00/19] Kernel API Specification Framework
2025-06-25 15:55 ` Sasha Levin
@ 2025-06-26 8:32 ` Dmitry Vyukov
2025-06-26 8:37 ` Dmitry Vyukov
0 siblings, 1 reply; 44+ messages in thread
From: Dmitry Vyukov @ 2025-06-26 8:32 UTC (permalink / raw)
To: Sasha Levin; +Cc: kees, elver, linux-api, linux-kernel, tools, workflows
On Wed, 25 Jun 2025 at 17:55, Sasha Levin <sashal@kernel.org> wrote:
>
> On Wed, Jun 25, 2025 at 10:52:46AM +0200, Dmitry Vyukov wrote:
> >On Tue, 24 Jun 2025 at 22:04, Sasha Levin <sashal@kernel.org> wrote:
> >
> >> >6. What's the goal of validation of the input arguments?
> >> >Kernel code must do this validation anyway, right.
> >> >Any non-trivial validation is hard, e.g. even for open the validation function
> >> >for file name would need to have access to flags and check file presence for
> >> >some flags combinations. That may add significant amount of non-trivial code
> >> >that duplicates main syscall logic, and that logic may also have bugs and
> >> >memory leaks.
> >>
> >> Mostly to catch divergence from the spec: think of a scenario where
> >> someone added a new param/flag/etc but forgot to update the spec - this
> >> will help catch it.
> >
> >How exactly is this supposed to work?
> >Even if we run with a unit test suite, a test suite may include some
> >incorrect inputs to check for error conditions. The framework will
> >report violations on these incorrect inputs. These are not bugs in the
> >API specifications, nor in the test suite (read false positives).
>
> Right now it would be something along the lines of the test checking for
> an expected failure message in dmesg, something along the lines of:
>
> https://github.com/linux-test-project/ltp/blob/0c99c7915f029d32de893b15b0a213ff3de210af/testcases/commands/sysctl/sysctl02.sh#L67
>
> I'm not opposed to coming up with a better story...
Oh, you mean special tests for this framework (rather than existing tests).
I don't think this is going to work in practice. Besides writing all
these specifications, we will also need to write dozens of tests per
each specification (e.g. for each fd arg one needs at least 3 tests:
-1, valid fd, invalid fd; an enum may need 5 various inputs of
something; let alone netlink specifications).
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC 00/19] Kernel API Specification Framework
2025-06-26 8:32 ` Dmitry Vyukov
@ 2025-06-26 8:37 ` Dmitry Vyukov
2025-06-26 16:23 ` Sasha Levin
0 siblings, 1 reply; 44+ messages in thread
From: Dmitry Vyukov @ 2025-06-26 8:37 UTC (permalink / raw)
To: Sasha Levin; +Cc: kees, elver, linux-api, linux-kernel, tools, workflows
On Thu, 26 Jun 2025 at 10:32, Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Wed, 25 Jun 2025 at 17:55, Sasha Levin <sashal@kernel.org> wrote:
> >
> > On Wed, Jun 25, 2025 at 10:52:46AM +0200, Dmitry Vyukov wrote:
> > >On Tue, 24 Jun 2025 at 22:04, Sasha Levin <sashal@kernel.org> wrote:
> > >
> > >> >6. What's the goal of validation of the input arguments?
> > >> >Kernel code must do this validation anyway, right.
> > >> >Any non-trivial validation is hard, e.g. even for open the validation function
> > >> >for file name would need to have access to flags and check file presence for
> > >> >some flags combinations. That may add significant amount of non-trivial code
> > >> >that duplicates main syscall logic, and that logic may also have bugs and
> > >> >memory leaks.
> > >>
> > >> Mostly to catch divergence from the spec: think of a scenario where
> > >> someone added a new param/flag/etc but forgot to update the spec - this
> > >> will help catch it.
> > >
> > >How exactly is this supposed to work?
> > >Even if we run with a unit test suite, a test suite may include some
> > >incorrect inputs to check for error conditions. The framework will
> > >report violations on these incorrect inputs. These are not bugs in the
> > >API specifications, nor in the test suite (read false positives).
> >
> > Right now it would be something along the lines of the test checking for
> > an expected failure message in dmesg, something along the lines of:
> >
> > https://github.com/linux-test-project/ltp/blob/0c99c7915f029d32de893b15b0a213ff3de210af/testcases/commands/sysctl/sysctl02.sh#L67
> >
> > I'm not opposed to coming up with a better story...
If the goal of validation is just indirectly validating correctness of
the specification itself, then I would look for other ways of
validating correctness of the spec.
Either removing duplication between specification and actual code
(i.e. generating it from SYSCALL_DEFINE, or the other way around),
then spec is correct by construction. Or, cross-validating it with
info automatically extracted from the source (using
clang/dwarf/pahole).
This would be more scalable (O(1) work, rather than thousands more
manually written tests).
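
A lighter-weight variant of the cross-validation idea can be sketched with a
compile-time check instead of clang/dwarf/pahole extraction. The function and
the DEMO_CLOSE_SPEC_* macros are made up for the example, and
__builtin_types_compatible_p is a GCC/Clang extension:

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for a syscall implementation. */
static long demo_close(unsigned int fd)
{
	return fd < 1024 ? 0 : -EBADF;
}

/* Hypothetical hand-written spec: return type and argument type. */
#define DEMO_CLOSE_SPEC_RET	long
#define DEMO_CLOSE_SPEC_ARG0	unsigned int

/* Fails the build if the spec and the implementation disagree. */
static_assert(__builtin_types_compatible_p(__typeof__(&demo_close),
		DEMO_CLOSE_SPEC_RET (*)(DEMO_CLOSE_SPEC_ARG0)),
	      "demo_close spec does not match its signature");
```

Changing the argument type in either place breaks the build, so the check costs
nothing at runtime and needs no per-syscall test.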
> Oh, you mean special tests for this framework (rather than existing tests).
> I don't think this is going to work in practice. Besides writing all
> these specifications, we will also need to write dozens of tests per
> each specification (e.g. for each fd arg one needs at least 3 tests:
> -1, valid fd, invalid fd; an enum may need 5 various inputs of
> something; let alone netlink specifications).
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC 00/19] Kernel API Specification Framework
2025-06-26 8:37 ` Dmitry Vyukov
@ 2025-06-26 16:23 ` Sasha Levin
2025-06-27 6:23 ` Dmitry Vyukov
0 siblings, 1 reply; 44+ messages in thread
From: Sasha Levin @ 2025-06-26 16:23 UTC (permalink / raw)
To: Dmitry Vyukov; +Cc: kees, elver, linux-api, linux-kernel, tools, workflows
On Thu, Jun 26, 2025 at 10:37:33AM +0200, Dmitry Vyukov wrote:
>On Thu, 26 Jun 2025 at 10:32, Dmitry Vyukov <dvyukov@google.com> wrote:
>>
>> On Wed, 25 Jun 2025 at 17:55, Sasha Levin <sashal@kernel.org> wrote:
>> >
>> > On Wed, Jun 25, 2025 at 10:52:46AM +0200, Dmitry Vyukov wrote:
>> > >On Tue, 24 Jun 2025 at 22:04, Sasha Levin <sashal@kernel.org> wrote:
>> > >
>> > >> >6. What's the goal of validation of the input arguments?
>> > >> >Kernel code must do this validation anyway, right.
>> > >> >Any non-trivial validation is hard, e.g. even for open the validation function
>> > >> >for file name would need to have access to flags and check file presence for
>> > >> >some flags combinations. That may add significant amount of non-trivial code
>> > >> >that duplicates main syscall logic, and that logic may also have bugs and
>> > >> >memory leaks.
>> > >>
>> > >> Mostly to catch divergence from the spec: think of a scenario where
>> > >> someone added a new param/flag/etc but forgot to update the spec - this
>> > >> will help catch it.
>> > >
>> > >How exactly is this supposed to work?
>> > >Even if we run with a unit test suite, a test suite may include some
>> > >incorrect inputs to check for error conditions. The framework will
>> > >report violations on these incorrect inputs. These are not bugs in the
>> > >API specifications, nor in the test suite (read false positives).
>> >
>> > Right now it would be something along the lines of the test checking for
>> > an expected failure message in dmesg, something along the lines of:
>> >
>> > https://github.com/linux-test-project/ltp/blob/0c99c7915f029d32de893b15b0a213ff3de210af/testcases/commands/sysctl/sysctl02.sh#L67
>> >
>> > I'm not opposed to coming up with a better story...
>
>If the goal of validation is just indirectly validating correctness of
>the specification itself, then I would look for other ways of
>validating correctness of the spec.
>Either removing duplication between specification and actual code
>(i.e. generating it from SYSCALL_DEFINE, or the other way around) ,
>then spec is correct by construction. Or, cross-validating it with
>info automatically extracted from the source (using
>clang/dwarf/pahole).
>This would be more scalable (O(1) work, rather than thousands more
>manually written tests).
>
>> Oh, you mean special tests for this framework (rather than existing tests).
>> I don't think this is going to work in practice. Besides writing all
>> these specifications, we will also need to write dozens of tests per
>> each specification (e.g. for each fd arg one needs at least 3 tests:
>> -1, valid fd, invalid fd; an enum may need 5 various inputs of
>> something; let alone netlink specifications).
I didn't mean just for the framework: being able to specify the APIs in
machine readable format will enable us to automatically generate
exhaustive tests for each such API.
I've been playing with the kapi tool (see last patch) which already
supports different formatters. Right now it outputs human readable
output, but I have proof-of-concept code that outputs testcases for
specced APIs.
The dream here is to be able to automatically generate
hundreds/thousands of tests for each API in an automated fashion, and
verify the results with:
1. Simply checking expected return value.
2. Checking that the actual action happened (i.e. we called close(fd),
verify that `fd` is really closed).
3. Check for side effects (i.e. close(fd) isn't supposed to allocate
memory - verify that it didn't allocate memory).
4. Code coverage: our tests are supposed to cover 100% of the code in
that API's call chain; do we have code that didn't run (missing/incorrect
specs)?
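
Checks 1 and 2 from the list above can be sketched as a small userspace test
for close(). This is a hand-written illustration of what a generated test
might look like, not output of the kapi tool; the helper names and the use of
/dev/null are assumptions:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if fd refers to an open descriptor, 0 otherwise. */
static int fd_is_open(int fd)
{
	return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/*
 * Valid input: close() must return 0 (check 1) and the descriptor must
 * really be gone afterwards (check 2). Returns 0 on success, -1 otherwise.
 */
static int check_close_valid(void)
{
	int fd = open("/dev/null", O_RDONLY);

	if (fd < 0)
		return -1;
	if (!fd_is_open(fd))
		return -1;
	if (close(fd) != 0)
		return -1;
	return fd_is_open(fd) ? -1 : 0;
}

/* Invalid input: close(-1) must fail with EBADF. */
static int check_close_invalid(void)
{
	errno = 0;
	if (close(-1) != -1)
		return -1;
	return errno == EBADF ? 0 : -1;
}

/* Entry point a test runner would call. */
static void run_close_checks(void)
{
	assert(check_close_valid() == 0);
	assert(check_close_invalid() == 0);
}
```

Checks 3 and 4 (side effects and coverage) need kernel-side instrumentation and
are out of reach for a plain userspace test like this one.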
--
Thanks,
Sasha
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC 00/19] Kernel API Specification Framework
2025-06-26 16:23 ` Sasha Levin
@ 2025-06-27 6:23 ` Dmitry Vyukov
2025-06-30 14:27 ` Sasha Levin
0 siblings, 1 reply; 44+ messages in thread
From: Dmitry Vyukov @ 2025-06-27 6:23 UTC (permalink / raw)
To: Sasha Levin; +Cc: kees, elver, linux-api, linux-kernel, tools, workflows
On Thu, 26 Jun 2025 at 18:23, Sasha Levin <sashal@kernel.org> wrote:
>
> On Thu, Jun 26, 2025 at 10:37:33AM +0200, Dmitry Vyukov wrote:
> >On Thu, 26 Jun 2025 at 10:32, Dmitry Vyukov <dvyukov@google.com> wrote:
> >>
> >> On Wed, 25 Jun 2025 at 17:55, Sasha Levin <sashal@kernel.org> wrote:
> >> >
> >> > On Wed, Jun 25, 2025 at 10:52:46AM +0200, Dmitry Vyukov wrote:
> >> > >On Tue, 24 Jun 2025 at 22:04, Sasha Levin <sashal@kernel.org> wrote:
> >> > >
> >> > >> >6. What's the goal of validation of the input arguments?
> >> > >> >Kernel code must do this validation anyway, right.
> >> > >> >Any non-trivial validation is hard, e.g. even for open the validation function
> >> > >> >for file name would need to have access to flags and check file presence for
> >> > >> >some flags combinations. That may add significant amount of non-trivial code
> >> > >> >that duplicates main syscall logic, and that logic may also have bugs and
> >> > >> >memory leaks.
> >> > >>
> >> > >> Mostly to catch divergence from the spec: think of a scenario where
> >> > >> someone added a new param/flag/etc but forgot to update the spec - this
> >> > >> will help catch it.
> >> > >
> >> > >How exactly is this supposed to work?
> >> > >Even if we run with a unit test suite, a test suite may include some
> >> > >incorrect inputs to check for error conditions. The framework will
> >> > >report violations on these incorrect inputs. These are not bugs in the
> >> > >API specifications, nor in the test suite (read false positives).
> >> >
> >> > Right now it would be something along the lines of the test checking for
> >> > an expected failure message in dmesg, something along the lines of:
> >> >
> >> > https://github.com/linux-test-project/ltp/blob/0c99c7915f029d32de893b15b0a213ff3de210af/testcases/commands/sysctl/sysctl02.sh#L67
> >> >
> >> > I'm not opposed to coming up with a better story...
> >
> >If the goal of validation is just indirectly validating correctness of
> >the specification itself, then I would look for other ways of
> >validating correctness of the spec.
> >Either removing duplication between specification and actual code
> >(i.e. generating it from SYSCALL_DEFINE, or the other way around) ,
> >then spec is correct by construction. Or, cross-validating it with
> >info automatically extracted from the source (using
> >clang/dwarf/pahole).
> >This would be more scalable (O(1) work, rather than thousands more
> >manually written tests).
> >
> >> Oh, you mean special tests for this framework (rather than existing tests).
> >> I don't think this is going to work in practice. Besides writing all
> >> these specifications, we will also need to write dozens of tests per
> >> each specification (e.g. for each fd arg one needs at least 3 tests:
> >> -1, valid fd, invalid fd; an enum may need 5 various inputs of
> >> something; let alone netlink specifications).
>
> I didn't mean just for the framework: being able to specify the APIs in
> machine readable format will enable us to automatically generate
> exhaustive tests for each such API.
>
> I've been playing with the kapi tool (see last patch) which already
> supports different formatters. Right now it outputs human readable
> output, but I have proof-of-concept code that outputs testcases for
> specced APIs.
>
> The dream here is to be able to automatically generate
> hundreds/thousands of tests for each API in an automated fashion, and
> verify the results with:
>
> 1. Simply checking expected return value.
>
> 2. Checking that the actual action happened (i.e. we called close(fd),
> verify that `fd` is really closed).
>
> 3. Check for side effects (i.e. close(fd) isn't supposed to allocate
> memory - verify that it didn't allocate memory).
>
> 4. Code coverage: our tests are supposed to cover 100% of the code in
> that APIs call chain, do we have code that didn't run (missing/incorrect
> specs).
This is all good. I was asking about the argument verification part of
the framework. Is it required for any of this? How?
* Re: [RFC 00/19] Kernel API Specification Framework
2025-06-27 6:23 ` Dmitry Vyukov
@ 2025-06-30 14:27 ` Sasha Levin
2025-07-01 6:11 ` Dmitry Vyukov
0 siblings, 1 reply; 44+ messages in thread
From: Sasha Levin @ 2025-06-30 14:27 UTC (permalink / raw)
To: Dmitry Vyukov; +Cc: kees, elver, linux-api, linux-kernel, tools, workflows
On Fri, Jun 27, 2025 at 08:23:41AM +0200, Dmitry Vyukov wrote:
>On Thu, 26 Jun 2025 at 18:23, Sasha Levin <sashal@kernel.org> wrote:
>>
>> On Thu, Jun 26, 2025 at 10:37:33AM +0200, Dmitry Vyukov wrote:
>> >On Thu, 26 Jun 2025 at 10:32, Dmitry Vyukov <dvyukov@google.com> wrote:
>> >>
>> >> On Wed, 25 Jun 2025 at 17:55, Sasha Levin <sashal@kernel.org> wrote:
>> >> >
>> >> > On Wed, Jun 25, 2025 at 10:52:46AM +0200, Dmitry Vyukov wrote:
>> >> > >On Tue, 24 Jun 2025 at 22:04, Sasha Levin <sashal@kernel.org> wrote:
>> >> > >
>> >> > >> >6. What's the goal of validation of the input arguments?
>> >> > >> >Kernel code must do this validation anyway, right.
>> >> > >> >Any non-trivial validation is hard, e.g. even for open the validation function
>> >> > >> >for file name would need to have access to flags and check file presence for
>> >> > >> >some flags combinations. That may add significant amount of non-trivial code
>> >> > >> >that duplicates main syscall logic, and that logic may also have bugs and
>> >> > >> >memory leaks.
>> >> > >>
>> >> > >> Mostly to catch divergence from the spec: think of a scenario where
>> >> > >> someone added a new param/flag/etc but forgot to update the spec - this
>> >> > >> will help catch it.
>> >> > >
>> >> > >How exactly is this supposed to work?
>> >> > >Even if we run with a unit test suite, a test suite may include some
>> >> > >incorrect inputs to check for error conditions. The framework will
>> >> > >report violations on these incorrect inputs. These are not bugs in the
>> >> > >API specifications, nor in the test suite (read false positives).
>> >> >
>> >> > Right now it would be something along the lines of the test checking for
>> >> > an expected failure message in dmesg, something along the lines of:
>> >> >
>> >> > https://github.com/linux-test-project/ltp/blob/0c99c7915f029d32de893b15b0a213ff3de210af/testcases/commands/sysctl/sysctl02.sh#L67
>> >> >
>> >> > I'm not opposed to coming up with a better story...
>> >
>> >If the goal of validation is just indirectly validating correctness of
>> >the specification itself, then I would look for other ways of
>> >validating correctness of the spec.
>> >Either removing duplication between specification and actual code
>> >(i.e. generating it from SYSCALL_DEFINE, or the other way around) ,
>> >then spec is correct by construction. Or, cross-validating it with
>> >info automatically extracted from the source (using
>> >clang/dwarf/pahole).
>> >This would be more scalable (O(1) work, rather than thousands more
>> >manually written tests).
>> >
>> >> Oh, you mean special tests for this framework (rather than existing tests).
>> >> I don't think this is going to work in practice. Besides writing all
>> >> these specifications, we will also need to write dozens of tests per
>> >> each specification (e.g. for each fd arg one needs at least 3 tests:
>> >> -1, valid fd, invalid fd; an enum may need 5 various inputs of
>> >> something; let alone netlink specifications).
>>
>> I didn't mean just for the framework: being able to specify the APIs in
>> machine readable format will enable us to automatically generate
>> exhaustive tests for each such API.
>>
>> I've been playing with the kapi tool (see last patch) which already
>> supports different formatters. Right now it outputs human readable
>> output, but I have proof-of-concept code that outputs testcases for
>> specced APIs.
>>
>> The dream here is to be able to automatically generate
>> hundreds/thousands of tests for each API in an automated fashion, and
>> verify the results with:
>>
>> 1. Simply checking expected return value.
>>
>> 2. Checking that the actual action happened (i.e. we called close(fd),
>> verify that `fd` is really closed).
>>
>> 3. Check for side effects (i.e. close(fd) isn't supposed to allocate
>> memory - verify that it didn't allocate memory).
>>
>> 4. Code coverage: our tests are supposed to cover 100% of the code in
>> that APIs call chain, do we have code that didn't run (missing/incorrect
>> specs).
>
>
>This is all good. I was asking about the argument verification part of the
>framework. Is it required for any of this? How?
Specifications without enforcement are just documentation :)

In my mind, there are a few reasons we want this:

1. For folks coding against the kernel, it's a way for them to know that
the code they're writing fits within the spec of the kernel's API.

2. Enforcement around kernel changes: think of a scenario where a flag
is added to a syscall - the author of that change will have to also
update the spec because otherwise the verification layer will complain
about the new flag. This helps prevent divergence between the code and
the spec.

3. Extra layer of security: we can choose to enable this as an
additional layer to protect us from missing checks in our userspace
facing API.
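
As a rough illustration of point 2 only (this is not code from the
series; the flag names, the `demo_flag_spec` structure, and
`demo_spec_check_flags()` are all hypothetical), a spec that records the
set of known flag bits would reject a flag that was added to the code
but never added to the spec:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical flags for an illustrative syscall; not from the series. */
#define DEMO_FLAG_A   0x1u
#define DEMO_FLAG_B   0x2u
#define DEMO_FLAG_NEW 0x4u	/* added to the code, spec not yet updated */

/* Hypothetical spec entry: the flag bits the spec knows about. */
struct demo_flag_spec {
	const char *name;
	unsigned int valid_mask;
};

static const struct demo_flag_spec demo_spec = {
	.name = "demo_syscall",
	.valid_mask = DEMO_FLAG_A | DEMO_FLAG_B,
};

/* Return 0 if every set bit is covered by the spec, -EINVAL otherwise. */
static int demo_spec_check_flags(const struct demo_flag_spec *spec,
				 unsigned int flags)
{
	unsigned int unknown;

	assert(spec != NULL);
	unknown = flags & ~spec->valid_mask;
	if (unknown) {
		/* This is the "verification layer will complain" step. */
		fprintf(stderr, "%s: flags 0x%x are not in the spec\n",
			spec->name, unknown);
		return -EINVAL;
	}
	return 0;
}
```

Calling `demo_spec_check_flags(&demo_spec, DEMO_FLAG_A | DEMO_FLAG_NEW)`
fails until `DEMO_FLAG_NEW` is added to `valid_mask`, which is the
pressure to keep the spec in sync that point 2 describes.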
--
Thanks,
Sasha
* Re: [RFC 00/19] Kernel API Specification Framework
2025-06-30 14:27 ` Sasha Levin
@ 2025-07-01 6:11 ` Dmitry Vyukov
0 siblings, 0 replies; 44+ messages in thread
From: Dmitry Vyukov @ 2025-07-01 6:11 UTC (permalink / raw)
To: Sasha Levin; +Cc: kees, elver, linux-api, linux-kernel, tools, workflows
On Mon, 30 Jun 2025 at 16:27, Sasha Levin <sashal@kernel.org> wrote:
>
> On Fri, Jun 27, 2025 at 08:23:41AM +0200, Dmitry Vyukov wrote:
> >On Thu, 26 Jun 2025 at 18:23, Sasha Levin <sashal@kernel.org> wrote:
> >>
> >> On Thu, Jun 26, 2025 at 10:37:33AM +0200, Dmitry Vyukov wrote:
> >> >On Thu, 26 Jun 2025 at 10:32, Dmitry Vyukov <dvyukov@google.com> wrote:
> >> >>
> >> >> On Wed, 25 Jun 2025 at 17:55, Sasha Levin <sashal@kernel.org> wrote:
> >> >> >
> >> >> > On Wed, Jun 25, 2025 at 10:52:46AM +0200, Dmitry Vyukov wrote:
> >> >> > >On Tue, 24 Jun 2025 at 22:04, Sasha Levin <sashal@kernel.org> wrote:
> >> >> > >
> >> >> > >> >6. What's the goal of validation of the input arguments?
> >> >> > >> >Kernel code must do this validation anyway, right.
> >> >> > >> >Any non-trivial validation is hard, e.g. even for open the validation function
> >> >> > >> >for file name would need to have access to flags and check file presence for
> >> >> > >> >some flags combinations. That may add significant amount of non-trivial code
> >> >> > >> >that duplicates main syscall logic, and that logic may also have bugs and
> >> >> > >> >memory leaks.
> >> >> > >>
> >> >> > >> Mostly to catch divergence from the spec: think of a scenario where
> >> >> > >> someone added a new param/flag/etc but forgot to update the spec - this
> >> >> > >> will help catch it.
> >> >> > >
> >> >> > >How exactly is this supposed to work?
> >> >> > >Even if we run with a unit test suite, a test suite may include some
> >> >> > >incorrect inputs to check for error conditions. The framework will
> >> >> > >report violations on these incorrect inputs. These are not bugs in the
> >> >> > >API specifications, nor in the test suite (read false positives).
> >> >> >
> >> >> > Right now it would be something along the lines of the test checking for
> >> >> > an expected failure message in dmesg, something along the lines of:
> >> >> >
> >> >> > https://github.com/linux-test-project/ltp/blob/0c99c7915f029d32de893b15b0a213ff3de210af/testcases/commands/sysctl/sysctl02.sh#L67
> >> >> >
> >> >> > I'm not opposed to coming up with a better story...
> >> >
> >> >If the goal of validation is just indirectly validating correctness of
> >> >the specification itself, then I would look for other ways of
> >> >validating correctness of the spec.
> >> >Either removing duplication between specification and actual code
> >> >(i.e. generating it from SYSCALL_DEFINE, or the other way around) ,
> >> >then spec is correct by construction. Or, cross-validating it with
> >> >info automatically extracted from the source (using
> >> >clang/dwarf/pahole).
> >> >This would be more scalable (O(1) work, rather than thousands more
> >> >manually written tests).
> >> >
> >> >> Oh, you mean special tests for this framework (rather than existing tests).
> >> >> I don't think this is going to work in practice. Besides writing all
> >> >> these specifications, we will also need to write dozens of tests per
> >> >> each specification (e.g. for each fd arg one needs at least 3 tests:
> >> >> -1, valid fd, invalid fd; an enum may need 5 various inputs of
> >> >> something; let alone netlink specifications).
> >>
> >> I didn't mean just for the framework: being able to specify the APIs in
> >> machine readable format will enable us to automatically generate
> >> exhaustive tests for each such API.
> >>
> >> I've been playing with the kapi tool (see last patch) which already
> >> supports different formatters. Right now it outputs human readable
> >> output, but I have proof-of-concept code that outputs testcases for
> >> specced APIs.
> >>
> >> The dream here is to be able to automatically generate
> >> hundreds/thousands of tests for each API in an automated fashion, and
> >> verify the results with:
> >>
> >> 1. Simply checking expected return value.
> >>
> >> 2. Checking that the actual action happened (i.e. we called close(fd),
> >> verify that `fd` is really closed).
> >>
> >> 3. Check for side effects (i.e. close(fd) isn't supposed to allocate
> >> memory - verify that it didn't allocate memory).
> >>
> >> 4. Code coverage: our tests are supposed to cover 100% of the code in
> >> that APIs call chain, do we have code that didn't run (missing/incorrect
> >> specs).
> >
> >
> >This is all good. I was asking about the argument verification part of the
> >framework. Is it required for any of this? How?
>
> Specifications without enforcement are just documentation :)
>
> In my mind, there are a few reasons we want this:
>
> 1. For folks coding against the kernel, it's a way for them to know that
> the code they're writing fits within the spec of the kernel's API.
How is this different from just running the kernel normally? Running
the kernel normally is simpler, faster, and more precise.
> 2. Enforcement around kernel changes: think of a scenario where a flag
> is added to a syscall - the author of that change will have to also
> update the spec because otherwise the verification layer will complain
> about the new flag. This helps prevent divergence between the code and
> the spec.
It may be more useful to invoke verification without returning early on
verification errors: memorize the result and still always run the
actual syscall normally. Then, if verification produced an error but the
actual syscall did not return the same error, WARN loudly.

This should provide the same value, and it does not rely on correctly
marked, manually written tests to test the specification. It will work
automatically with any fuzzing/randomized testing, which I assume will
be more valuable for specification testing.

But then, as Cyril mentioned, this verification layer does not really
need to live in the kernel. Once the kernel has exported the
specification in machine-usable form, the same verification can be
done in user-space, which is always a good idea.
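
A minimal user-space sketch of the "verify, remember, run anyway,
compare" idea above. Everything here is an assumption made for
illustration: `spec_check_close()` stands in for a check derived from an
exported spec, and `shadow_compare()` / `shadow_close()` are hypothetical
helper names, not part of the series or of any existing tool.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Number of spec/implementation mismatches seen so far. */
static unsigned int spec_mismatches;

/* Hypothetical spec-derived check: the spec rejects negative fds. */
static int spec_check_close(int fd)
{
	return fd < 0 ? -EBADF : 0;
}

/*
 * Compare what the spec predicted with what the syscall returned.
 * Only a predicted error that the syscall did not reproduce counts as
 * a mismatch; a syscall may still fail for reasons the spec check
 * cannot see. Returns 1 on mismatch, 0 otherwise.
 */
static int shadow_compare(const char *name, int predicted, int actual)
{
	assert(name != NULL);
	if (predicted == 0 || predicted == actual)
		return 0;
	spec_mismatches++;
	fprintf(stderr, "WARN: %s: spec predicted %d, syscall returned %d\n",
		name, predicted, actual);
	return 1;
}

/* Never return early: always run the real syscall, then compare. */
static int shadow_close(int fd)
{
	int predicted = spec_check_close(fd);
	int actual = close(fd) < 0 ? -errno : 0;

	shadow_compare("close", predicted, actual);
	return actual;
}
```

Because the real syscall always runs, a fuzzer can drive
`shadow_close()` with arbitrary inputs, including deliberately invalid
ones, and only a disagreement between spec and kernel produces a
warning.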
> 3. Extra layer of security: we can choose to enable this as an
> additional layer to protect us from missing checks in our userspace
> facing API.
This will have additional risks and performance overhead. Such
mitigations are usually assessed by the % of past CVEs they could have
prevented. That would allow us to assess cost/benefit.
Intuitively, this does not look worth doing to me.
Thread overview: 44+ messages
2025-06-14 13:48 [RFC 00/19] Kernel API Specification Framework Sasha Levin
2025-06-14 13:48 ` [RFC 01/19] kernel/api: introduce kernel API specification framework Sasha Levin
2025-06-14 13:48 ` [RFC 02/19] eventpoll: add API specification for epoll_create1 Sasha Levin
2025-06-14 13:48 ` [RFC 03/19] eventpoll: add API specification for epoll_create Sasha Levin
2025-06-14 13:48 ` [RFC 04/19] eventpoll: add API specification for epoll_ctl Sasha Levin
2025-06-14 13:48 ` [RFC 05/19] eventpoll: add API specification for epoll_wait Sasha Levin
2025-06-14 13:48 ` [RFC 06/19] eventpoll: add API specification for epoll_pwait Sasha Levin
2025-06-14 13:48 ` [RFC 07/19] eventpoll: add API specification for epoll_pwait2 Sasha Levin
2025-06-14 13:48 ` [RFC 08/19] exec: add API specification for execve Sasha Levin
2025-06-16 21:39 ` Florian Weimer
2025-06-17 1:51 ` Sasha Levin
2025-06-17 7:13 ` Florian Weimer
2025-06-17 22:58 ` Sasha Levin
2025-06-14 13:48 ` [RFC 09/19] exec: add API specification for execveat Sasha Levin
2025-06-14 13:48 ` [RFC 10/19] mm/mlock: add API specification for mlock Sasha Levin
2025-06-14 13:48 ` [RFC 11/19] mm/mlock: add API specification for mlock2 Sasha Levin
2025-06-14 13:48 ` [RFC 12/19] mm/mlock: add API specification for mlockall Sasha Levin
2025-06-14 13:48 ` [RFC 13/19] mm/mlock: add API specification for munlock Sasha Levin
2025-06-14 13:48 ` [RFC 14/19] mm/mlock: add API specification for munlockall Sasha Levin
2025-06-14 13:48 ` [RFC 15/19] kernel/api: add debugfs interface for kernel API specifications Sasha Levin
2025-06-14 13:48 ` [RFC 16/19] kernel/api: add IOCTL specification infrastructure Sasha Levin
2025-06-14 13:48 ` [RFC 17/19] fwctl: add detailed IOCTL API specifications Sasha Levin
2025-06-14 13:48 ` [RFC 18/19] binder: " Sasha Levin
2025-06-14 13:48 ` [RFC 19/19] tools/kapi: Add kernel API specification extraction tool Sasha Levin
2025-06-17 12:08 ` [RFC 00/19] Kernel API Specification Framework David Laight
2025-06-18 21:29 ` Kees Cook
2025-06-19 0:22 ` Sasha Levin
2025-06-23 13:28 ` Dmitry Vyukov
2025-06-24 14:06 ` Cyril Hrubis
2025-06-24 14:30 ` Dmitry Vyukov
2025-06-24 15:27 ` Cyril Hrubis
2025-06-24 20:04 ` Sasha Levin
2025-06-25 8:49 ` Dmitry Vyukov
2025-06-25 8:52 ` Dmitry Vyukov
2025-06-25 15:46 ` Cyril Hrubis
2025-06-25 15:55 ` Sasha Levin
2025-06-26 8:32 ` Dmitry Vyukov
2025-06-26 8:37 ` Dmitry Vyukov
2025-06-26 16:23 ` Sasha Levin
2025-06-27 6:23 ` Dmitry Vyukov
2025-06-30 14:27 ` Sasha Levin
2025-07-01 6:11 ` Dmitry Vyukov
2025-06-25 8:56 ` Dmitry Vyukov
2025-06-25 16:23 ` Sasha Levin