* [RFC v2 00/22] Kernel API specification framework
@ 2025-06-24 18:07 Sasha Levin
From: Sasha Levin @ 2025-06-24 18:07 UTC
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
Hey folks,
This is a second attempt at a "Kernel API Specification" framework, addressing
the feedback from the initial RFC and expanding the scope to include sysfs
attribute specifications.
Motivation
==========
The Linux kernel has one fundamental promise to userspace: we don't
break it. This promise is the foundation of Linux's success, allowing
applications written decades ago to still run on modern kernels. Yet
despite this being our most important commitment, we lack reliable
mechanisms to detect when we're about to break this promise.
Currently, we rely on:
- Developer vigilance and review
- User reports after release (often too late)
- Limited testing that can't cover all API usage patterns
- Documentation that's often incomplete or outdated
This gap between our commitment and our tooling is a fundamental problem.
We have sophisticated tools to catch memory leaks, race conditions, and
other bugs, but no systematic way to catch API breakage before it impacts
users.
As the kernel continues to grow, so do the interfaces it exposes. This
applies to both the userspace API and the numerous internal APIs.
Over the years, we've accumulated a lot of documentation, but it is
scattered, incomplete in places, and often out of date.
In the same way that we have runtime and static checkers to validate that
code is correct, we need a way to validate that the *use* of the various
APIs is correct and that changes don't break existing contracts.
This work aims to provide:
1. A machine-readable format to describe APIs.
2. Runtime validation of API contracts.
3. Generation of documentation and other artifacts.
4. Improved tooling for API exploration and debugging.
5. Most importantly: automated detection of API breakage.
With formal API specifications, we can:
- Detect when patches change API behavior in incompatible ways
- Validate that error codes, parameter constraints, and return values
remain consistent across kernel versions
- Generate automated tests that verify API contracts
- Provide userspace with machine-readable guarantees about kernel behavior
- Catch subtle breakage (like removing error codes, changing semantics,
or tightening constraints) that manual review might miss
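To make the kinds of breakage listed above concrete, here is a minimal
userspace sketch of comparing two versions of a specification. It is not
code from this series: struct sketch_contract, sketch_compare() and the
SKETCH_* names are hypothetical, and the contract is reduced to a single
numeric range plus a list of error codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRORS 8

/* Hypothetical, trimmed-down contract for one API (not the series' struct). */
struct sketch_contract {
	int64_t min_value;              /* lowest accepted argument */
	int64_t max_value;              /* highest accepted argument */
	int errors[SKETCH_MAX_ERRORS];  /* documented error codes */
	size_t error_count;
};

enum sketch_breakage {
	SKETCH_OK            = 0,
	SKETCH_RANGE_TIGHTER = 1 << 0,  /* previously valid values now rejected */
	SKETCH_ERROR_REMOVED = 1 << 1,  /* a documented error code disappeared */
};

static bool sketch_has_error(const struct sketch_contract *c, int code)
{
	for (size_t i = 0; i < c->error_count; i++)
		if (c->errors[i] == code)
			return true;
	return false;
}

/* Return a bitmask of sketch_breakage flags for the old -> new transition. */
unsigned int sketch_compare(const struct sketch_contract *old_c,
			    const struct sketch_contract *new_c)
{
	unsigned int result = SKETCH_OK;

	assert(old_c && new_c);

	/* Widening a range is compatible; narrowing it is not. */
	if (new_c->min_value > old_c->min_value ||
	    new_c->max_value < old_c->max_value)
		result |= SKETCH_RANGE_TIGHTER;

	/* Every error code the old contract documented must still exist. */
	for (size_t i = 0; i < old_c->error_count; i++)
		if (!sketch_has_error(new_c, old_c->errors[i]))
			result |= SKETCH_ERROR_REMOVED;

	return result;
}
```

A CI job could run a comparison of this shape over the specifications
extracted from two kernel builds and flag any nonzero result for human
review. Changes in semantics are not captured by a check this simple.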
The idea is to have an in-kernel API specification format that can be used
by tools such as:
- Static analysis tools (checkpatch, sparse, Coccinelle) to detect API
contract violations at compile time
- CI/CD systems to automatically flag potential userspace breakage
- Runtime verification (API contract validation) during testing
- Tracing and debugging (better BPF/ftrace integration)
- Documentation generation (automated, always up-to-date)
- Userspace helpers (interceptors, mocking frameworks, etc.)
- Fuzzers (can detect API contract violations, not just kernel crashes)
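As an illustration of what runtime verification or a fuzzer could check
against a specification, the sketch below validates one argument against
the constraint kinds the header in patch 01 names (range, mask, enum). It
is a userspace approximation under stated assumptions: the sketch_* names
are hypothetical, only the field names (min_value, max_value, valid_mask,
enum_values, enum_count, constraint_type) are borrowed from struct
kapi_param_spec, and the series' own validation engine may work
differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the constraint kinds named in the series (names are hypothetical). */
enum sketch_constraint_type {
	SKETCH_CONSTRAINT_NONE = 0,
	SKETCH_CONSTRAINT_RANGE,
	SKETCH_CONSTRAINT_MASK,
	SKETCH_CONSTRAINT_ENUM,
};

/* Reduced parameter description; field names follow struct kapi_param_spec. */
struct sketch_param {
	enum sketch_constraint_type constraint_type;
	int64_t min_value;
	int64_t max_value;
	uint64_t valid_mask;
	const int64_t *enum_values;
	size_t enum_count;
};

/* Return true if value satisfies the constraint described by p. */
bool sketch_param_valid(const struct sketch_param *p, int64_t value)
{
	assert(p);

	switch (p->constraint_type) {
	case SKETCH_CONSTRAINT_NONE:
		return true;
	case SKETCH_CONSTRAINT_RANGE:
		return value >= p->min_value && value <= p->max_value;
	case SKETCH_CONSTRAINT_MASK:
		/* Reject any bit that is set outside the valid mask. */
		return ((uint64_t)value & ~p->valid_mask) == 0;
	case SKETCH_CONSTRAINT_ENUM:
		for (size_t i = 0; i < p->enum_count; i++)
			if (p->enum_values[i] == value)
				return true;
		return false;
	}
	return false;
}
```

A fuzzer with access to the same description can use it in both
directions: generate arguments that satisfy the constraint to reach deeper
code, or arguments that violate it to confirm the kernel rejects them.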
Where are we now?
=================
This series introduces a framework that allows developers to declare API
specifications directly in their subsystem code. These specifications are
then:
1. Exported via debugfs (making them runtime queryable)
2. Compiled into the kernel binary (accessible to tools)
3. Used for runtime validation (when enabled)
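For readers unfamiliar with how declarations scattered across subsystem
files end up in one queryable table, the linker-script changes in patch 01
collect them between __start_kapi_specs and __stop_kapi_specs. The
userspace sketch below shows the general pattern only. It makes several
assumptions that are not taken from the series: an ELF toolchain (GCC or
Clang with GNU ld or lld), a section named "sketch_specs" so that the
linker generates the start/stop symbols itself, and pointers stored in the
section rather than the structures themselves. All sketch_* names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, minimal stand-in for a full API specification. */
struct sketch_spec {
	const char *name;
	const char *description;
};

/*
 * Define a spec next to the code it describes and place a pointer to it
 * in the "sketch_specs" section. The section name is an assumption: the
 * linker only auto-generates __start_/__stop_ symbols for section names
 * that are valid C identifiers, whereas the series defines the symbols
 * for ".kapi_specs" explicitly in its linker scripts.
 */
#define SKETCH_DEFINE_SPEC(id, desc)					\
	static const struct sketch_spec sketch_spec_##id = {		\
		.name = #id, .description = desc };			\
	static const struct sketch_spec *const sketch_ptr_##id		\
	__attribute__((used, section("sketch_specs"))) = &sketch_spec_##id

/* Section bounds provided by the linker. */
extern const struct sketch_spec *const __start_sketch_specs[];
extern const struct sketch_spec *const __stop_sketch_specs[];

SKETCH_DEFINE_SPEC(epoll_create1, "Create an epoll instance");
SKETCH_DEFINE_SPEC(mlock, "Lock pages in memory");

/* Number of specifications the linker gathered into the section. */
size_t sketch_spec_count(void)
{
	return (size_t)(__stop_sketch_specs - __start_sketch_specs);
}

/* Walk the section and return the spec with a matching name, or NULL. */
const struct sketch_spec *sketch_find_spec(const char *name)
{
	const struct sketch_spec *const *p;

	assert(name);
	for (p = __start_sketch_specs; p < __stop_sketch_specs; p++)
		if (strcmp((*p)->name, name) == 0)
			return *p;
	return NULL;
}
```

No registration call is needed at runtime: adding another
SKETCH_DEFINE_SPEC() in any translation unit makes the entry visible to
the lookup, and an offline tool can read the same section from the binary.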
The `kapi` tool in tools/kapi/ can extract API specifications from:
- Source code (by parsing the KAPI macros)
- Running kernel (via debugfs)
- vmlinux binary (for offline analysis)
It produces output in multiple formats (plain text, JSON, RST) for easy
integration with existing workflows.
Changes since v1
================
- Added sysfs attribute validation support (patches 19-20)
- Added socket() syscall specification (patch 21)
- Enhanced signal handling with new actions (QUEUE, DISCARD, TRANSFORM)
- Expanded all API specifications with more detailed constraints
- Improved error handling documentation across all patches
- Added network/socket infrastructure to core framework
- Plumbed in syscall runtime validation
Sasha Levin (22):
kernel/api: introduce kernel API specification framework
eventpoll: add API specification for epoll_create1
eventpoll: add API specification for epoll_create
eventpoll: add API specification for epoll_ctl
eventpoll: add API specification for epoll_wait
eventpoll: add API specification for epoll_pwait
eventpoll: add API specification for epoll_pwait2
exec: add API specification for execve
exec: add API specification for execveat
mm/mlock: add API specification for mlock
mm/mlock: add API specification for mlock2
mm/mlock: add API specification for mlockall
mm/mlock: add API specification for munlock
mm/mlock: add API specification for munlockall
kernel/api: add debugfs interface for kernel API specifications
kernel/api: add IOCTL specification infrastructure
fwctl: add detailed IOCTL API specifications
binder: add detailed IOCTL API specifications
kernel/api: Add sysfs validation support to kernel API specification
framework
block: sysfs API specifications
net/socket: add API specification for socket()
tools/kapi: Add kernel API specification extraction tool
Documentation/admin-guide/kernel-api-spec.rst | 699 +++++++
MAINTAINERS | 9 +
arch/um/kernel/dyn.lds.S | 3 +
arch/um/kernel/uml.lds.S | 3 +
arch/x86/kernel/vmlinux.lds.S | 3 +
block/blk-integrity.c | 131 ++
block/blk-sysfs.c | 243 +++
block/genhd.c | 99 +
drivers/android/binder.c | 701 +++++++
drivers/fwctl/main.c | 285 ++-
fs/eventpoll.c | 1163 +++++++++++
fs/exec.c | 702 +++++++
include/asm-generic/vmlinux.lds.h | 20 +
include/linux/kernel_api_spec.h | 1841 +++++++++++++++++
include/linux/syscall_api_spec.h | 137 ++
include/linux/syscalls.h | 38 +
init/Kconfig | 2 +
kernel/Makefile | 1 +
kernel/api/Kconfig | 55 +
kernel/api/Makefile | 13 +
kernel/api/ioctl_validation.c | 355 ++++
kernel/api/kapi_debugfs.c | 340 +++
kernel/api/kernel_api_spec.c | 1531 ++++++++++++++
mm/mlock.c | 774 +++++++
net/socket.c | 489 +++++
tools/kapi/.gitignore | 4 +
tools/kapi/Cargo.toml | 19 +
tools/kapi/src/extractor/debugfs.rs | 415 ++++
tools/kapi/src/extractor/mod.rs | 411 ++++
tools/kapi/src/extractor/source_parser.rs | 1625 +++++++++++++++
.../src/extractor/vmlinux/binary_utils.rs | 283 +++
tools/kapi/src/extractor/vmlinux/mod.rs | 989 +++++++++
tools/kapi/src/formatter/json.rs | 420 ++++
tools/kapi/src/formatter/mod.rs | 130 ++
tools/kapi/src/formatter/plain.rs | 465 +++++
tools/kapi/src/formatter/rst.rs | 468 +++++
tools/kapi/src/formatter/shall.rs | 605 ++++++
tools/kapi/src/main.rs | 130 ++
38 files changed, 15598 insertions(+), 3 deletions(-)
create mode 100644 Documentation/admin-guide/kernel-api-spec.rst
create mode 100644 include/linux/kernel_api_spec.h
create mode 100644 include/linux/syscall_api_spec.h
create mode 100644 kernel/api/Kconfig
create mode 100644 kernel/api/Makefile
create mode 100644 kernel/api/ioctl_validation.c
create mode 100644 kernel/api/kapi_debugfs.c
create mode 100644 kernel/api/kernel_api_spec.c
create mode 100644 tools/kapi/.gitignore
create mode 100644 tools/kapi/Cargo.toml
create mode 100644 tools/kapi/src/extractor/debugfs.rs
create mode 100644 tools/kapi/src/extractor/mod.rs
create mode 100644 tools/kapi/src/extractor/source_parser.rs
create mode 100644 tools/kapi/src/extractor/vmlinux/binary_utils.rs
create mode 100644 tools/kapi/src/extractor/vmlinux/mod.rs
create mode 100644 tools/kapi/src/formatter/json.rs
create mode 100644 tools/kapi/src/formatter/mod.rs
create mode 100644 tools/kapi/src/formatter/plain.rs
create mode 100644 tools/kapi/src/formatter/rst.rs
create mode 100644 tools/kapi/src/formatter/shall.rs
create mode 100644 tools/kapi/src/main.rs
--
2.39.5
* [RFC v2 01/22] kernel/api: introduce kernel API specification framework
From: Sasha Levin @ 2025-06-24 18:07 UTC
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
Add a comprehensive framework for formally documenting kernel APIs with
inline specifications. This framework provides:
- Structured API documentation with parameter specifications, return
values, error conditions, and execution context requirements
- Runtime validation capabilities for debugging (CONFIG_KAPI_RUNTIME_CHECKS)
- Export of specifications via debugfs for tooling integration
- Support for both internal kernel APIs and system calls
The framework stores specifications in a dedicated ELF section and
provides infrastructure for:
- Compile-time validation of specifications
- Runtime querying of API documentation
- Machine-readable export formats
- Integration with existing SYSCALL_DEFINE macros
This commit introduces the core infrastructure without modifying any
existing APIs. Subsequent patches will add specifications to individual
subsystems.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
Documentation/admin-guide/kernel-api-spec.rst | 507 ++++++
MAINTAINERS | 9 +
arch/um/kernel/dyn.lds.S | 3 +
arch/um/kernel/uml.lds.S | 3 +
arch/x86/kernel/vmlinux.lds.S | 3 +
include/asm-generic/vmlinux.lds.h | 20 +
include/linux/kernel_api_spec.h | 1513 +++++++++++++++++
include/linux/syscall_api_spec.h | 137 ++
include/linux/syscalls.h | 38 +
init/Kconfig | 2 +
kernel/Makefile | 1 +
kernel/api/Kconfig | 35 +
kernel/api/Makefile | 7 +
kernel/api/kernel_api_spec.c | 1122 ++++++++++++
14 files changed, 3400 insertions(+)
create mode 100644 Documentation/admin-guide/kernel-api-spec.rst
create mode 100644 include/linux/kernel_api_spec.h
create mode 100644 include/linux/syscall_api_spec.h
create mode 100644 kernel/api/Kconfig
create mode 100644 kernel/api/Makefile
create mode 100644 kernel/api/kernel_api_spec.c
diff --git a/Documentation/admin-guide/kernel-api-spec.rst b/Documentation/admin-guide/kernel-api-spec.rst
new file mode 100644
index 0000000000000..3a63f6711e27b
--- /dev/null
+++ b/Documentation/admin-guide/kernel-api-spec.rst
@@ -0,0 +1,507 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================
+Kernel API Specification Framework
+======================================
+
+:Author: Sasha Levin <sashal@kernel.org>
+:Date: June 2025
+
+.. contents:: Table of Contents
+ :depth: 3
+ :local:
+
+Introduction
+============
+
+The Kernel API Specification Framework (KAPI) provides a comprehensive system for
+formally documenting, validating, and introspecting kernel APIs. This framework
+addresses the long-standing challenge of maintaining accurate, machine-readable
+documentation for the thousands of internal kernel APIs and system calls.
+
+Purpose and Goals
+-----------------
+
+The framework aims to:
+
+1. **Improve API Documentation**: Provide structured, inline documentation that
+ lives alongside the code and is maintained as part of the development process.
+
+2. **Enable Runtime Validation**: Optionally validate API usage at runtime to catch
+ common programming errors during development and testing.
+
+3. **Support Tooling**: Export API specifications in machine-readable formats for
+ use by static analyzers, documentation generators, and development tools.
+
+4. **Enhance Debugging**: Provide detailed API information at runtime through debugfs
+ for debugging and introspection.
+
+5. **Formalize Contracts**: Explicitly document API contracts including parameter
+ constraints, execution contexts, locking requirements, and side effects.
+
+Architecture Overview
+=====================
+
+Components
+----------
+
+The framework consists of several key components:
+
+1. **Core Framework** (``kernel/api/kernel_api_spec.c``)
+
+ - API specification registration and storage
+ - Runtime validation engine
+ - Specification lookup and querying
+
+2. **DebugFS Interface** (``kernel/api/kapi_debugfs.c``)
+
+ - Runtime introspection via ``/sys/kernel/debug/kapi/``
+ - JSON and XML export formats
+ - Per-API detailed information
+
+3. **IOCTL Support** (``kernel/api/ioctl_validation.c``)
+
+ - Extended framework for IOCTL specifications
+ - Automatic validation wrappers
+ - Structure field validation
+
+4. **Specification Macros** (``include/linux/kernel_api_spec.h``)
+
+ - Declarative macros for API documentation
+ - Type-safe parameter specifications
+ - Context and constraint definitions
+
+Data Model
+----------
+
+The framework uses a hierarchical data model::
+
+ kernel_api_spec
+ ├── Basic Information
+ │ ├── name (API function name)
+ │ ├── version (specification version)
+ │ ├── description (human-readable description)
+ │ └── kernel_version (when API was introduced)
+ │
+ ├── Parameters (up to 16)
+ │ └── kapi_param_spec
+ │ ├── name
+ │ ├── type (int, pointer, string, etc.)
+ │ ├── direction (in, out, inout)
+ │ ├── constraints (range, mask, enum values)
+ │ └── validation rules
+ │
+ ├── Return Value
+ │ └── kapi_return_spec
+ │ ├── type
+ │ ├── success conditions
+ │ └── validation rules
+ │
+ ├── Error Conditions (up to 32)
+ │ └── kapi_error_spec
+ │ ├── error code
+ │ ├── condition description
+ │ └── recovery advice
+ │
+ ├── Execution Context
+ │ ├── allowed contexts (process, interrupt, etc.)
+ │ ├── locking requirements
+ │ └── preemption/interrupt state
+ │
+ └── Side Effects
+ ├── memory allocation
+ ├── state changes
+ └── signal handling
+
+Usage Guide
+===========
+
+Basic API Specification
+-----------------------
+
+To document a kernel API, use the specification macros in the implementation file:
+
+.. code-block:: c
+
+ #include <linux/kernel_api_spec.h>
+
+ KAPI_DEFINE_SPEC(kmalloc_spec, kmalloc, "3.0")
+ KAPI_DESCRIPTION("Allocate kernel memory")
+ KAPI_PARAM(0, size, KAPI_TYPE_SIZE_T, KAPI_DIR_IN,
+ "Number of bytes to allocate")
+ KAPI_PARAM_RANGE(0, 0, KMALLOC_MAX_SIZE)
+ KAPI_PARAM(1, flags, KAPI_TYPE_FLAGS, KAPI_DIR_IN,
+ "Allocation flags (GFP_*)")
+ KAPI_PARAM_MASK(1, __GFP_BITS_MASK)
+ KAPI_RETURN(KAPI_TYPE_POINTER, "Pointer to allocated memory or NULL")
+ KAPI_ERROR(ENOMEM, "Out of memory")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SOFTIRQ | KAPI_CTX_HARDIRQ)
+ KAPI_SIDE_EFFECT("Allocates memory from kernel heap")
+ KAPI_LOCK_NOT_REQUIRED("Any lock")
+ KAPI_END_SPEC
+
+ void *kmalloc(size_t size, gfp_t flags)
+ {
+ /* Implementation */
+ }
+
+System Call Specification
+-------------------------
+
+System calls use specialized macros:
+
+.. code-block:: c
+
+ KAPI_DEFINE_SYSCALL_SPEC(open_spec, open, "1.0")
+ KAPI_DESCRIPTION("Open a file")
+ KAPI_PARAM(0, pathname, KAPI_TYPE_USER_STRING, KAPI_DIR_IN,
+ "Path to file")
+ KAPI_PARAM_PATH(0, PATH_MAX)
+ KAPI_PARAM(1, flags, KAPI_TYPE_FLAGS, KAPI_DIR_IN,
+ "Open flags (O_*)")
+ KAPI_PARAM(2, mode, KAPI_TYPE_MODE_T, KAPI_DIR_IN,
+ "File permissions (if creating)")
+ KAPI_RETURN(KAPI_TYPE_INT, "File descriptor or -1")
+ KAPI_ERROR(EACCES, "Permission denied")
+ KAPI_ERROR(ENOENT, "File does not exist")
+ KAPI_ERROR(EMFILE, "Too many open files")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+ KAPI_SIGNAL(EINTR, "Open can be interrupted by signal")
+ KAPI_END_SYSCALL_SPEC
+
+IOCTL Specification
+-------------------
+
+IOCTLs have extended support for structure validation:
+
+.. code-block:: c
+
+ KAPI_DEFINE_IOCTL_SPEC(vidioc_querycap_spec, VIDIOC_QUERYCAP,
+ "VIDIOC_QUERYCAP",
+ sizeof(struct v4l2_capability),
+ sizeof(struct v4l2_capability),
+ "video_fops")
+ KAPI_DESCRIPTION("Query device capabilities")
+ KAPI_IOCTL_FIELD(driver, KAPI_TYPE_CHAR_ARRAY, KAPI_DIR_OUT,
+ "Driver name", 16)
+ KAPI_IOCTL_FIELD(card, KAPI_TYPE_CHAR_ARRAY, KAPI_DIR_OUT,
+ "Device name", 32)
+ KAPI_IOCTL_FIELD(version, KAPI_TYPE_U32, KAPI_DIR_OUT,
+ "Driver version")
+ KAPI_IOCTL_FIELD(capabilities, KAPI_TYPE_FLAGS, KAPI_DIR_OUT,
+ "Device capabilities")
+ KAPI_END_IOCTL_SPEC
+
+Runtime Validation
+==================
+
+Enabling Validation
+-------------------
+
+Runtime validation is controlled by kernel configuration:
+
+1. Enable ``CONFIG_KAPI_SPEC`` to build the framework
+2. Enable ``CONFIG_KAPI_RUNTIME_CHECKS`` for runtime validation
+3. Optionally enable ``CONFIG_KAPI_SPEC_DEBUGFS`` for debugfs interface
+
+Validation Modes
+----------------
+
+The framework supports several validation modes:
+
+.. code-block:: c
+
+ /* Enable validation for specific API */
+ kapi_enable_validation("kmalloc");
+
+ /* Enable validation for all APIs */
+ kapi_enable_all_validation();
+
+ /* Set validation level */
+ kapi_set_validation_level(KAPI_VALIDATE_FULL);
+
+Validation Levels:
+
+- ``KAPI_VALIDATE_NONE``: No validation
+- ``KAPI_VALIDATE_BASIC``: Type and NULL checks only
+- ``KAPI_VALIDATE_NORMAL``: Basic + range and constraint checks
+- ``KAPI_VALIDATE_FULL``: All checks including custom validators
+
+Custom Validators
+-----------------
+
+APIs can register custom validation functions:
+
+.. code-block:: c
+
+ static bool validate_buffer_size(const struct kapi_param_spec *spec,
+ const void *value, void *context)
+ {
+ size_t size = *(size_t *)value;
+ struct my_context *ctx = context;
+
+ return size > 0 && size <= ctx->max_buffer_size;
+ }
+
+ KAPI_PARAM_CUSTOM_VALIDATOR(0, validate_buffer_size)
+
+DebugFS Interface
+=================
+
+The debugfs interface provides runtime access to API specifications:
+
+Directory Structure
+-------------------
+
+::
+
+ /sys/kernel/debug/kapi/
+ ├── apis/ # All registered APIs
+ │ ├── kmalloc/
+ │ │ ├── specification # Human-readable spec
+ │ │ ├── json # JSON format
+ │ │ └── xml # XML format
+ │ └── open/
+ │ └── ...
+ ├── summary # Overview of all APIs
+ ├── validation/ # Validation controls
+ │ ├── enabled # Global enable/disable
+ │ ├── level # Validation level
+ │ └── stats # Validation statistics
+ └── export/ # Bulk export options
+ ├── all.json # All specs in JSON
+ └── all.xml # All specs in XML
+
+Usage Examples
+--------------
+
+Query specific API::
+
+ $ cat /sys/kernel/debug/kapi/apis/kmalloc/specification
+ API: kmalloc
+ Version: 3.0
+ Description: Allocate kernel memory
+
+ Parameters:
+ [0] size (size_t, in): Number of bytes to allocate
+ Range: 0 - 4194304
+ [1] flags (flags, in): Allocation flags (GFP_*)
+ Mask: 0x1ffffff
+
+ Returns: pointer - Pointer to allocated memory or NULL
+
+ Errors:
+ ENOMEM: Out of memory
+
+ Context: process, softirq, hardirq
+
+ Side Effects:
+ - Allocates memory from kernel heap
+
+Export all specifications::
+
+ $ cat /sys/kernel/debug/kapi/export/all.json > kernel-apis.json
+
+Enable validation for specific API::
+
+ $ echo 1 > /sys/kernel/debug/kapi/apis/kmalloc/validate
+
+Performance Considerations
+==========================
+
+Memory Overhead
+---------------
+
+Each API specification consumes approximately 2-4KB of memory. With thousands
+of kernel APIs, this can add up to several megabytes. Consider:
+
+1. Building with ``CONFIG_KAPI_SPEC=n`` for production kernels
+2. Using ``__init`` annotations for APIs only used during boot
+3. Implementing lazy loading for rarely used specifications
+
+Runtime Overhead
+----------------
+
+When ``CONFIG_KAPI_RUNTIME_CHECKS`` is enabled:
+
+- Each validated API call adds 50-200ns overhead
+- Complex validations (custom validators) may add more
+- Use validation only in development/testing kernels
+
+Optimization Strategies
+-----------------------
+
+1. **Compile-time optimization**: When validation is disabled, all
+ validation code is optimized away by the compiler.
+
+2. **Selective validation**: Enable validation only for specific APIs
+ or subsystems under test.
+
+3. **Caching**: The framework caches validation results for repeated
+ calls with identical parameters.
+
+Documentation Generation
+------------------------
+
+The framework exports specifications via debugfs that can be used
+to generate documentation. Tools for automatic documentation generation
+from specifications are planned for future development.
+
+IDE Integration
+---------------
+
+Modern IDEs can use the JSON export for:
+
+- Parameter hints
+- Type checking
+- Context validation
+- Error code documentation
+
+Testing Framework
+-----------------
+
+The framework includes test helpers::
+
+ #ifdef CONFIG_KAPI_TESTING
+ /* Verify API behaves according to specification */
+ kapi_test_api("kmalloc", test_cases);
+ #endif
+
+Best Practices
+==============
+
+Writing Specifications
+----------------------
+
+1. **Be Comprehensive**: Document all parameters, errors, and side effects
+2. **Keep Updated**: Update specs when API behavior changes
+3. **Use Examples**: Include usage examples in descriptions
+4. **Validate Constraints**: Define realistic constraints for parameters
+5. **Document Context**: Clearly specify allowed execution contexts
+
+Maintenance
+-----------
+
+1. **Version Specifications**: Increment version when API changes
+2. **Deprecation**: Mark deprecated APIs and suggest replacements
+3. **Cross-reference**: Link related APIs in descriptions
+4. **Test Specifications**: Verify specs match implementation
+
+Common Patterns
+---------------
+
+**Optional Parameters**::
+
+ KAPI_PARAM(2, optional_arg, KAPI_TYPE_POINTER, KAPI_DIR_IN,
+ "Optional argument (may be NULL)")
+ KAPI_PARAM_OPTIONAL(2)
+
+**Variable Arguments**::
+
+ KAPI_PARAM(1, fmt, KAPI_TYPE_FORMAT_STRING, KAPI_DIR_IN,
+ "Printf-style format string")
+ KAPI_PARAM_VARIADIC(2, "Format arguments")
+
+**Callback Functions**::
+
+ KAPI_PARAM(1, callback, KAPI_TYPE_FUNCTION_PTR, KAPI_DIR_IN,
+ "Callback function")
+ KAPI_PARAM_CALLBACK(1, "int (*)(void *data)", "data")
+
+Troubleshooting
+===============
+
+Common Issues
+-------------
+
+**Specification Not Found**::
+
+ kernel: KAPI: Specification for 'my_api' not found
+
+ Solution: Ensure KAPI_DEFINE_SPEC is in the same translation unit
+ as the function implementation.
+
+**Validation Failures**::
+
+ kernel: KAPI: Validation failed for kmalloc parameter 'size':
+ value 5242880 exceeds maximum 4194304
+
+ Solution: Check parameter constraints or adjust specification if
+ the constraint is incorrect.
+
+**Build Errors**::
+
+ error: 'KAPI_TYPE_UNKNOWN' undeclared
+
+ Solution: Include <linux/kernel_api_spec.h> and ensure
+ CONFIG_KAPI_SPEC is enabled.
+
+Debug Options
+-------------
+
+Enable verbose debugging::
+
+ echo 8 > /proc/sys/kernel/printk
+ echo 1 > /sys/kernel/debug/kapi/debug/verbose
+
+Future Directions
+=================
+
+Planned Features
+----------------
+
+1. **Automatic Extraction**: Tool to extract specifications from existing
+ kernel-doc comments
+
+2. **Contract Verification**: Static analysis to verify implementation
+ matches specification
+
+3. **Performance Profiling**: Measure actual API performance against
+ documented expectations
+
+4. **Fuzzing Integration**: Use specifications to guide intelligent
+ fuzzing of kernel APIs
+
+5. **Version Compatibility**: Track API changes across kernel versions
+
+Research Areas
+--------------
+
+1. **Formal Verification**: Use specifications for mathematical proofs
+ of correctness
+
+2. **Runtime Monitoring**: Detect specification violations in production
+ with minimal overhead
+
+3. **API Evolution**: Analyze how kernel APIs change over time
+
+4. **Security Applications**: Use specifications for security policy
+ enforcement
+
+Contributing
+============
+
+Submitting Specifications
+-------------------------
+
+1. Add specifications to the same file as the API implementation
+2. Follow existing patterns and naming conventions
+3. Test with CONFIG_KAPI_RUNTIME_CHECKS enabled
+4. Verify debugfs output is correct
+5. Run scripts/checkpatch.pl on your changes
+
+Review Criteria
+---------------
+
+Specifications will be reviewed for:
+
+1. **Completeness**: All parameters and errors documented
+2. **Accuracy**: Specification matches implementation
+3. **Clarity**: Descriptions are clear and helpful
+4. **Consistency**: Follows framework conventions
+5. **Performance**: No unnecessary runtime overhead
+
+Contact
+-------
+
+- Maintainer: Sasha Levin <sashal@kernel.org>
diff --git a/MAINTAINERS b/MAINTAINERS
index c3f7fbd0d67af..759f6c0b9a4dd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13047,6 +13047,15 @@ W: https://linuxtv.org
T: git git://linuxtv.org/media.git
F: drivers/media/radio/radio-keene*
+KERNEL API SPECIFICATION FRAMEWORK (KAPI)
+M: Sasha Levin <sashal@kernel.org>
+L: linux-api@vger.kernel.org
+S: Maintained
+F: Documentation/admin-guide/kernel-api-spec.rst
+F: include/linux/kernel_api_spec.h
+F: kernel/api/
+F: scripts/extract-kapi-spec.sh
+
KERNEL AUTOMOUNTER
M: Ian Kent <raven@themaw.net>
L: autofs@vger.kernel.org
diff --git a/arch/um/kernel/dyn.lds.S b/arch/um/kernel/dyn.lds.S
index a36b7918a011a..283ab11788d8c 100644
--- a/arch/um/kernel/dyn.lds.S
+++ b/arch/um/kernel/dyn.lds.S
@@ -102,6 +102,9 @@ SECTIONS
init.data : { INIT_DATA }
__init_end = .;
+ /* Kernel API specifications in dedicated section */
+ KAPI_SPECS_SECTION()
+
/* Ensure the __preinit_array_start label is properly aligned. We
could instead move the label definition inside the section, but
the linker would then create the section even if it turns out to
diff --git a/arch/um/kernel/uml.lds.S b/arch/um/kernel/uml.lds.S
index a409d4b66114f..e3850d8293436 100644
--- a/arch/um/kernel/uml.lds.S
+++ b/arch/um/kernel/uml.lds.S
@@ -74,6 +74,9 @@ SECTIONS
init.data : { INIT_DATA }
__init_end = .;
+ /* Kernel API specifications in dedicated section */
+ KAPI_SPECS_SECTION()
+
.data :
{
INIT_TASK_DATA(KERNEL_STACK_SIZE)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 4fa0be732af10..8cc508adc9d51 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -173,6 +173,9 @@ SECTIONS
RO_DATA(PAGE_SIZE)
X86_ALIGN_RODATA_END
+ /* Kernel API specifications in dedicated section */
+ KAPI_SPECS_SECTION()
+
/* Data */
.data : AT(ADDR(.data) - LOAD_OFFSET) {
/* Start of data section */
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index fa5f19b8d53a0..7b47736057e01 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -279,6 +279,26 @@ defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
#define TRACE_SYSCALLS()
#endif
+#ifdef CONFIG_KAPI_SPEC
+#define KAPI_SPECS() \
+ . = ALIGN(8); \
+ __start_kapi_specs = .; \
+ KEEP(*(.kapi_specs)) \
+ __stop_kapi_specs = .;
+
+/* For placing KAPI specs in a dedicated section */
+#define KAPI_SPECS_SECTION() \
+ .kapi_specs : AT(ADDR(.kapi_specs) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __start_kapi_specs = .; \
+ KEEP(*(.kapi_specs)) \
+ __stop_kapi_specs = .; \
+ }
+#else
+#define KAPI_SPECS()
+#define KAPI_SPECS_SECTION()
+#endif
+
#ifdef CONFIG_BPF_EVENTS
#define BPF_RAW_TP() STRUCT_ALIGN(); \
BOUNDED_SECTION_BY(__bpf_raw_tp_map, __bpf_raw_tp)
diff --git a/include/linux/kernel_api_spec.h b/include/linux/kernel_api_spec.h
new file mode 100644
index 0000000000000..d8439d411f41e
--- /dev/null
+++ b/include/linux/kernel_api_spec.h
@@ -0,0 +1,1513 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * kernel_api_spec.h - Kernel API Formal Specification Framework
+ *
+ * This framework provides structures and macros to formally specify kernel APIs
+ * in both human and machine-readable formats. It supports comprehensive documentation
+ * of function signatures, parameters, return values, error conditions, and constraints.
+ */
+
+#ifndef _LINUX_KERNEL_API_SPEC_H
+#define _LINUX_KERNEL_API_SPEC_H
+
+#include <linux/types.h>
+#include <linux/stringify.h>
+#include <linux/compiler.h>
+
+struct sigaction;
+
+#define KAPI_MAX_PARAMS 16
+#define KAPI_MAX_ERRORS 32
+#define KAPI_MAX_CONSTRAINTS 16
+#define KAPI_MAX_SIGNALS 32
+#define KAPI_MAX_NAME_LEN 128
+#define KAPI_MAX_DESC_LEN 512
+#define KAPI_MAX_CAPABILITIES 8
+#define KAPI_MAX_SOCKET_STATES 16
+#define KAPI_MAX_PROTOCOL_BEHAVIORS 8
+#define KAPI_MAX_NET_ERRORS 16
+#define KAPI_MAX_SOCKOPTS 16
+#define KAPI_MAX_ADDR_FAMILIES 8
+
+/**
+ * enum kapi_param_type - Parameter type classification
+ * @KAPI_TYPE_VOID: void type
+ * @KAPI_TYPE_INT: Integer types (int, long, etc.)
+ * @KAPI_TYPE_UINT: Unsigned integer types
+ * @KAPI_TYPE_PTR: Pointer types
+ * @KAPI_TYPE_STRUCT: Structure types
+ * @KAPI_TYPE_UNION: Union types
+ * @KAPI_TYPE_ENUM: Enumeration types
+ * @KAPI_TYPE_FUNC_PTR: Function pointer types
+ * @KAPI_TYPE_ARRAY: Array types
+ * @KAPI_TYPE_FD: File descriptor - validated in process context
+ * @KAPI_TYPE_USER_PTR: User space pointer - validated for access and size
+ * @KAPI_TYPE_PATH: Pathname - validated for access and path limits
+ * @KAPI_TYPE_CUSTOM: Custom/complex types
+ */
+enum kapi_param_type {
+ KAPI_TYPE_VOID = 0,
+ KAPI_TYPE_INT,
+ KAPI_TYPE_UINT,
+ KAPI_TYPE_PTR,
+ KAPI_TYPE_STRUCT,
+ KAPI_TYPE_UNION,
+ KAPI_TYPE_ENUM,
+ KAPI_TYPE_FUNC_PTR,
+ KAPI_TYPE_ARRAY,
+ KAPI_TYPE_FD, /* File descriptor - validated in process context */
+ KAPI_TYPE_USER_PTR, /* User space pointer - validated for access and size */
+ KAPI_TYPE_PATH, /* Pathname - validated for access and path limits */
+ KAPI_TYPE_CUSTOM,
+};
+
+/**
+ * enum kapi_param_flags - Parameter attribute flags
+ * @KAPI_PARAM_IN: Input parameter
+ * @KAPI_PARAM_OUT: Output parameter
+ * @KAPI_PARAM_INOUT: Input/output parameter
+ * @KAPI_PARAM_OPTIONAL: Optional parameter (can be NULL)
+ * @KAPI_PARAM_CONST: Const qualified parameter
+ * @KAPI_PARAM_VOLATILE: Volatile qualified parameter
+ * @KAPI_PARAM_USER: User space pointer
+ * @KAPI_PARAM_DMA: DMA-capable memory required
+ * @KAPI_PARAM_ALIGNED: Alignment requirements
+ */
+enum kapi_param_flags {
+ KAPI_PARAM_IN = (1 << 0),
+ KAPI_PARAM_OUT = (1 << 1),
+ KAPI_PARAM_INOUT = (1 << 2),
+ KAPI_PARAM_OPTIONAL = (1 << 3),
+ KAPI_PARAM_CONST = (1 << 4),
+ KAPI_PARAM_VOLATILE = (1 << 5),
+ KAPI_PARAM_USER = (1 << 6),
+ KAPI_PARAM_DMA = (1 << 7),
+ KAPI_PARAM_ALIGNED = (1 << 8),
+};
+
+/**
+ * enum kapi_context_flags - Function execution context flags
+ * @KAPI_CTX_PROCESS: Can be called from process context
+ * @KAPI_CTX_SOFTIRQ: Can be called from softirq context
+ * @KAPI_CTX_HARDIRQ: Can be called from hardirq context
+ * @KAPI_CTX_NMI: Can be called from NMI context
+ * @KAPI_CTX_ATOMIC: Must be called in atomic context
+ * @KAPI_CTX_SLEEPABLE: May sleep
+ * @KAPI_CTX_PREEMPT_DISABLED: Requires preemption disabled
+ * @KAPI_CTX_IRQ_DISABLED: Requires interrupts disabled
+ */
+enum kapi_context_flags {
+ KAPI_CTX_PROCESS = (1 << 0),
+ KAPI_CTX_SOFTIRQ = (1 << 1),
+ KAPI_CTX_HARDIRQ = (1 << 2),
+ KAPI_CTX_NMI = (1 << 3),
+ KAPI_CTX_ATOMIC = (1 << 4),
+ KAPI_CTX_SLEEPABLE = (1 << 5),
+ KAPI_CTX_PREEMPT_DISABLED = (1 << 6),
+ KAPI_CTX_IRQ_DISABLED = (1 << 7),
+};
+
+/**
+ * enum kapi_lock_type - Lock types used/required by the function
+ * @KAPI_LOCK_NONE: No locking requirements
+ * @KAPI_LOCK_MUTEX: Mutex lock
+ * @KAPI_LOCK_SPINLOCK: Spinlock
+ * @KAPI_LOCK_RWLOCK: Read-write lock
+ * @KAPI_LOCK_SEQLOCK: Sequence lock
+ * @KAPI_LOCK_RCU: RCU lock
+ * @KAPI_LOCK_SEMAPHORE: Semaphore
+ * @KAPI_LOCK_CUSTOM: Custom locking mechanism
+ */
+enum kapi_lock_type {
+ KAPI_LOCK_NONE = 0,
+ KAPI_LOCK_MUTEX,
+ KAPI_LOCK_SPINLOCK,
+ KAPI_LOCK_RWLOCK,
+ KAPI_LOCK_SEQLOCK,
+ KAPI_LOCK_RCU,
+ KAPI_LOCK_SEMAPHORE,
+ KAPI_LOCK_CUSTOM,
+};
+
+/**
+ * enum kapi_constraint_type - Types of parameter constraints
+ * @KAPI_CONSTRAINT_NONE: No constraint
+ * @KAPI_CONSTRAINT_RANGE: Numeric range constraint
+ * @KAPI_CONSTRAINT_MASK: Bitmask constraint
+ * @KAPI_CONSTRAINT_ENUM: Enumerated values constraint
+ * @KAPI_CONSTRAINT_CUSTOM: Custom validation function
+ */
+enum kapi_constraint_type {
+ KAPI_CONSTRAINT_NONE = 0,
+ KAPI_CONSTRAINT_RANGE,
+ KAPI_CONSTRAINT_MASK,
+ KAPI_CONSTRAINT_ENUM,
+ KAPI_CONSTRAINT_CUSTOM,
+};
+
+/**
+ * struct kapi_param_spec - Parameter specification
+ * @name: Parameter name
+ * @type_name: Type name as string
+ * @type: Parameter type classification
+ * @flags: Parameter attribute flags
+ * @size: Size in bytes (for arrays/buffers)
+ * @alignment: Required alignment
+ * @min_value: Minimum valid value (for numeric types)
+ * @max_value: Maximum valid value (for numeric types)
+ * @valid_mask: Valid bits mask (for flag parameters)
+ * @enum_values: Array of valid enumerated values
+ * @enum_count: Number of valid enumerated values
+ * @constraint_type: Type of constraint applied
+ * @validate: Custom validation function
+ * @description: Human-readable description
+ * @constraints: Additional constraints description
+ * @size_param_idx: Index of parameter that determines size (-1 if fixed size)
+ * @size_multiplier: Multiplier for size calculation (e.g., sizeof(struct))
+ */
+struct kapi_param_spec {
+ char name[KAPI_MAX_NAME_LEN];
+ char type_name[KAPI_MAX_NAME_LEN];
+ enum kapi_param_type type;
+ u32 flags;
+ size_t size;
+ size_t alignment;
+ s64 min_value;
+ s64 max_value;
+ u64 valid_mask;
+ const s64 *enum_values;
+ u32 enum_count;
+ enum kapi_constraint_type constraint_type;
+ bool (*validate)(s64 value);
+ char description[KAPI_MAX_DESC_LEN];
+ char constraints[KAPI_MAX_DESC_LEN];
+ int size_param_idx; /* Index of param that determines size, -1 if N/A */
+ size_t size_multiplier; /* Size per unit (e.g., sizeof(struct epoll_event)) */
+} __attribute__((packed));
+
+/**
+ * struct kapi_error_spec - Error condition specification
+ * @error_code: Error code value
+ * @name: Error code name (e.g., "EINVAL")
+ * @condition: Condition that triggers this error
+ * @description: Detailed error description
+ */
+struct kapi_error_spec {
+ int error_code;
+ char name[KAPI_MAX_NAME_LEN];
+ char condition[KAPI_MAX_DESC_LEN];
+ char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * enum kapi_return_check_type - Return value check types
+ * @KAPI_RETURN_EXACT: Success is an exact value
+ * @KAPI_RETURN_RANGE: Success is within a range
+ * @KAPI_RETURN_ERROR_CHECK: Success is when NOT in error list
+ * @KAPI_RETURN_FD: Return value is a file descriptor (>= 0 is success)
+ * @KAPI_RETURN_CUSTOM: Custom validation function
+ */
+enum kapi_return_check_type {
+ KAPI_RETURN_EXACT,
+ KAPI_RETURN_RANGE,
+ KAPI_RETURN_ERROR_CHECK,
+ KAPI_RETURN_FD,
+ KAPI_RETURN_CUSTOM,
+};
+
+/**
+ * struct kapi_return_spec - Return value specification
+ * @type_name: Return type name
+ * @type: Return type classification
+ * @check_type: Type of success check to perform
+ * @success_value: Exact value indicating success (for EXACT)
+ * @success_min: Minimum success value (for RANGE)
+ * @success_max: Maximum success value (for RANGE)
+ * @error_values: Array of error values (for ERROR_CHECK)
+ * @error_count: Number of error values
+ * @is_success: Custom function to check success
+ * @description: Return value description
+ */
+struct kapi_return_spec {
+ char type_name[KAPI_MAX_NAME_LEN];
+ enum kapi_param_type type;
+ enum kapi_return_check_type check_type;
+ s64 success_value;
+ s64 success_min;
+ s64 success_max;
+ const s64 *error_values;
+ u32 error_count;
+ bool (*is_success)(s64 retval);
+ char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_lock_spec - Lock requirement specification
+ * @lock_name: Name of the lock
+ * @lock_type: Type of lock
+ * @acquired: Whether function acquires this lock
+ * @released: Whether function releases this lock
+ * @held_on_entry: Whether lock must be held on entry
+ * @held_on_exit: Whether lock is held on exit
+ * @description: Additional lock requirements
+ */
+struct kapi_lock_spec {
+ char lock_name[KAPI_MAX_NAME_LEN];
+ enum kapi_lock_type lock_type;
+ bool acquired;
+ bool released;
+ bool held_on_entry;
+ bool held_on_exit;
+ char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_constraint_spec - Additional constraint specification
+ * @name: Constraint name
+ * @description: Constraint description
+ * @expression: Formal expression (if applicable)
+ */
+struct kapi_constraint_spec {
+ char name[KAPI_MAX_NAME_LEN];
+ char description[KAPI_MAX_DESC_LEN];
+ char expression[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * enum kapi_signal_direction - Signal flow direction
+ * @KAPI_SIGNAL_RECEIVE: Function may receive this signal
+ * @KAPI_SIGNAL_SEND: Function may send this signal
+ * @KAPI_SIGNAL_HANDLE: Function handles this signal specially
+ * @KAPI_SIGNAL_BLOCK: Function blocks this signal
+ * @KAPI_SIGNAL_IGNORE: Function ignores this signal
+ */
+enum kapi_signal_direction {
+ KAPI_SIGNAL_RECEIVE = (1 << 0),
+ KAPI_SIGNAL_SEND = (1 << 1),
+ KAPI_SIGNAL_HANDLE = (1 << 2),
+ KAPI_SIGNAL_BLOCK = (1 << 3),
+ KAPI_SIGNAL_IGNORE = (1 << 4),
+};
+
+/**
+ * enum kapi_signal_action - What the function does with the signal
+ * @KAPI_SIGNAL_ACTION_DEFAULT: Default signal action applies
+ * @KAPI_SIGNAL_ACTION_TERMINATE: Causes termination
+ * @KAPI_SIGNAL_ACTION_COREDUMP: Causes termination with core dump
+ * @KAPI_SIGNAL_ACTION_STOP: Stops the process
+ * @KAPI_SIGNAL_ACTION_CONTINUE: Continues a stopped process
+ * @KAPI_SIGNAL_ACTION_CUSTOM: Custom handling described in notes
+ * @KAPI_SIGNAL_ACTION_RETURN: Returns from syscall with EINTR
+ * @KAPI_SIGNAL_ACTION_RESTART: Restarts the syscall
+ * @KAPI_SIGNAL_ACTION_QUEUE: Queues the signal for later delivery
+ * @KAPI_SIGNAL_ACTION_DISCARD: Discards the signal
+ * @KAPI_SIGNAL_ACTION_TRANSFORM: Transforms to another signal
+ */
+enum kapi_signal_action {
+ KAPI_SIGNAL_ACTION_DEFAULT = 0,
+ KAPI_SIGNAL_ACTION_TERMINATE,
+ KAPI_SIGNAL_ACTION_COREDUMP,
+ KAPI_SIGNAL_ACTION_STOP,
+ KAPI_SIGNAL_ACTION_CONTINUE,
+ KAPI_SIGNAL_ACTION_CUSTOM,
+ KAPI_SIGNAL_ACTION_RETURN,
+ KAPI_SIGNAL_ACTION_RESTART,
+ KAPI_SIGNAL_ACTION_QUEUE,
+ KAPI_SIGNAL_ACTION_DISCARD,
+ KAPI_SIGNAL_ACTION_TRANSFORM,
+};
+
+/**
+ * struct kapi_signal_spec - Signal specification
+ * @signal_num: Signal number (e.g., SIGKILL, SIGTERM)
+ * @signal_name: Signal name as string
+ * @direction: Direction flags (OR of kapi_signal_direction)
+ * @action: What happens when signal is received
+ * @target: Description of target process/thread for sent signals
+ * @condition: Condition under which signal is sent/received/handled
+ * @description: Detailed description of signal handling
+ * @restartable: Whether syscall is restartable after this signal
+ * @sa_flags_required: Required signal action flags (SA_*)
+ * @sa_flags_forbidden: Forbidden signal action flags
+ * @error_on_signal: Error code returned when signal occurs (-EINTR, etc)
+ * @transform_to: Signal number to transform to (if action is TRANSFORM)
+ * @timing: When signal can occur ("entry", "during", "exit", "anytime")
+ * @priority: Signal handling priority (lower processed first)
+ * @interruptible: Whether this operation is interruptible by this signal
+ * @queue_behavior: How signal is queued ("realtime", "standard", "coalesce")
+ * @state_required: Required process state for signal to be delivered
+ * @state_forbidden: Forbidden process state for signal delivery
+ */
+struct kapi_signal_spec {
+ int signal_num;
+ char signal_name[32];
+ u32 direction;
+ enum kapi_signal_action action;
+ char target[KAPI_MAX_DESC_LEN];
+ char condition[KAPI_MAX_DESC_LEN];
+ char description[KAPI_MAX_DESC_LEN];
+ bool restartable;
+ u32 sa_flags_required;
+ u32 sa_flags_forbidden;
+ int error_on_signal;
+ int transform_to;
+ char timing[32];
+ u8 priority;
+ bool interruptible;
+ char queue_behavior[128];
+ u32 state_required;
+ u32 state_forbidden;
+} __attribute__((packed));
+
+/**
+ * struct kapi_signal_mask_spec - Signal mask specification
+ * @mask_name: Name of the signal mask
+ * @signals: Array of signal numbers in the mask
+ * @signal_count: Number of signals in the mask
+ * @description: Description of what this mask represents
+ */
+struct kapi_signal_mask_spec {
+ char mask_name[KAPI_MAX_NAME_LEN];
+ int signals[KAPI_MAX_SIGNALS];
+ u32 signal_count;
+ char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_struct_field - Structure field specification
+ * @name: Field name
+ * @type: Field type classification
+ * @type_name: Type name as string
+ * @offset: Offset within structure
+ * @size: Size of field in bytes
+ * @flags: Field attribute flags
+ * @constraint_type: Type of constraint applied
+ * @min_value: Minimum valid value (for numeric types)
+ * @max_value: Maximum valid value (for numeric types)
+ * @valid_mask: Valid bits mask (for flag fields)
+ * @description: Field description
+ */
+struct kapi_struct_field {
+ char name[KAPI_MAX_NAME_LEN];
+ enum kapi_param_type type;
+ char type_name[KAPI_MAX_NAME_LEN];
+ size_t offset;
+ size_t size;
+ u32 flags;
+ enum kapi_constraint_type constraint_type;
+ s64 min_value;
+ s64 max_value;
+ u64 valid_mask;
+ char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_struct_spec - Structure type specification
+ * @name: Structure name
+ * @size: Total size of structure
+ * @alignment: Required alignment
+ * @field_count: Number of fields
+ * @fields: Field specifications
+ * @description: Structure description
+ */
+struct kapi_struct_spec {
+ char name[KAPI_MAX_NAME_LEN];
+ size_t size;
+ size_t alignment;
+ u32 field_count;
+ struct kapi_struct_field fields[KAPI_MAX_PARAMS];
+ char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * enum kapi_capability_action - What the capability allows
+ * @KAPI_CAP_BYPASS_CHECK: Bypasses a check entirely
+ * @KAPI_CAP_INCREASE_LIMIT: Increases or removes a limit
+ * @KAPI_CAP_OVERRIDE_RESTRICTION: Overrides a restriction
+ * @KAPI_CAP_GRANT_PERMISSION: Grants permission that would otherwise be denied
+ * @KAPI_CAP_MODIFY_BEHAVIOR: Changes the behavior of the operation
+ * @KAPI_CAP_ACCESS_RESOURCE: Allows access to restricted resources
+ * @KAPI_CAP_PERFORM_OPERATION: Allows performing a privileged operation
+ */
+enum kapi_capability_action {
+ KAPI_CAP_BYPASS_CHECK = 0,
+ KAPI_CAP_INCREASE_LIMIT,
+ KAPI_CAP_OVERRIDE_RESTRICTION,
+ KAPI_CAP_GRANT_PERMISSION,
+ KAPI_CAP_MODIFY_BEHAVIOR,
+ KAPI_CAP_ACCESS_RESOURCE,
+ KAPI_CAP_PERFORM_OPERATION,
+};
+
+/**
+ * struct kapi_capability_spec - Capability requirement specification
+ * @capability: The capability constant (e.g., CAP_IPC_LOCK)
+ * @cap_name: Capability name as string
+ * @action: What the capability allows (kapi_capability_action)
+ * @allows: Description of what the capability allows
+ * @without_cap: What happens without the capability
+ * @check_condition: Condition when capability is checked
+ * @priority: Check priority (lower checked first)
+ * @alternative: Alternative capabilities that can be used
+ * @alternative_count: Number of alternative capabilities
+ */
+struct kapi_capability_spec {
+ int capability;
+ char cap_name[KAPI_MAX_NAME_LEN];
+ enum kapi_capability_action action;
+ char allows[KAPI_MAX_DESC_LEN];
+ char without_cap[KAPI_MAX_DESC_LEN];
+ char check_condition[KAPI_MAX_DESC_LEN];
+ u8 priority;
+ int alternative[KAPI_MAX_CAPABILITIES];
+ u32 alternative_count;
+} __attribute__((packed));
+
+/**
+ * enum kapi_side_effect_type - Types of side effects
+ * @KAPI_EFFECT_NONE: No side effects
+ * @KAPI_EFFECT_ALLOC_MEMORY: Allocates memory
+ * @KAPI_EFFECT_FREE_MEMORY: Frees memory
+ * @KAPI_EFFECT_MODIFY_STATE: Modifies global/shared state
+ * @KAPI_EFFECT_SIGNAL_SEND: Sends signals
+ * @KAPI_EFFECT_FILE_POSITION: Modifies file position
+ * @KAPI_EFFECT_LOCK_ACQUIRE: Acquires locks
+ * @KAPI_EFFECT_LOCK_RELEASE: Releases locks
+ * @KAPI_EFFECT_RESOURCE_CREATE: Creates system resources (FDs, PIDs, etc)
+ * @KAPI_EFFECT_RESOURCE_DESTROY: Destroys system resources
+ * @KAPI_EFFECT_SCHEDULE: May cause scheduling/context switch
+ * @KAPI_EFFECT_HARDWARE: Interacts with hardware
+ * @KAPI_EFFECT_NETWORK: Network I/O operation
+ * @KAPI_EFFECT_FILESYSTEM: Filesystem modification
+ * @KAPI_EFFECT_PROCESS_STATE: Modifies process state
+ */
+enum kapi_side_effect_type {
+ KAPI_EFFECT_NONE = 0,
+ KAPI_EFFECT_ALLOC_MEMORY = (1 << 0),
+ KAPI_EFFECT_FREE_MEMORY = (1 << 1),
+ KAPI_EFFECT_MODIFY_STATE = (1 << 2),
+ KAPI_EFFECT_SIGNAL_SEND = (1 << 3),
+ KAPI_EFFECT_FILE_POSITION = (1 << 4),
+ KAPI_EFFECT_LOCK_ACQUIRE = (1 << 5),
+ KAPI_EFFECT_LOCK_RELEASE = (1 << 6),
+ KAPI_EFFECT_RESOURCE_CREATE = (1 << 7),
+ KAPI_EFFECT_RESOURCE_DESTROY = (1 << 8),
+ KAPI_EFFECT_SCHEDULE = (1 << 9),
+ KAPI_EFFECT_HARDWARE = (1 << 10),
+ KAPI_EFFECT_NETWORK = (1 << 11),
+ KAPI_EFFECT_FILESYSTEM = (1 << 12),
+ KAPI_EFFECT_PROCESS_STATE = (1 << 13),
+};
+
+/**
+ * struct kapi_side_effect - Side effect specification
+ * @type: Bitmask of effect types
+ * @target: What is affected (e.g., "process memory", "file descriptor table")
+ * @condition: Condition under which effect occurs
+ * @description: Detailed description of the effect
+ * @reversible: Whether the effect can be undone
+ */
+struct kapi_side_effect {
+ u32 type;
+ char target[KAPI_MAX_NAME_LEN];
+ char condition[KAPI_MAX_DESC_LEN];
+ char description[KAPI_MAX_DESC_LEN];
+ bool reversible;
+} __attribute__((packed));
+
+/**
+ * struct kapi_state_transition - State transition specification
+ * @from_state: Starting state description
+ * @to_state: Ending state description
+ * @condition: Condition for transition
+ * @object: Object whose state changes
+ * @description: Detailed description
+ */
+struct kapi_state_transition {
+ char from_state[KAPI_MAX_NAME_LEN];
+ char to_state[KAPI_MAX_NAME_LEN];
+ char condition[KAPI_MAX_DESC_LEN];
+ char object[KAPI_MAX_NAME_LEN];
+ char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+#define KAPI_MAX_STRUCT_SPECS 8
+#define KAPI_MAX_SIDE_EFFECTS 16
+#define KAPI_MAX_STATE_TRANS 8
+
+#ifdef CONFIG_NET
+/**
+ * enum kapi_socket_state - Socket states for state machine
+ */
+enum kapi_socket_state {
+ KAPI_SOCK_STATE_UNSPEC = 0,
+ KAPI_SOCK_STATE_CLOSED,
+ KAPI_SOCK_STATE_OPEN,
+ KAPI_SOCK_STATE_BOUND,
+ KAPI_SOCK_STATE_LISTEN,
+ KAPI_SOCK_STATE_SYN_SENT,
+ KAPI_SOCK_STATE_SYN_RECV,
+ KAPI_SOCK_STATE_ESTABLISHED,
+ KAPI_SOCK_STATE_FIN_WAIT1,
+ KAPI_SOCK_STATE_FIN_WAIT2,
+ KAPI_SOCK_STATE_CLOSE_WAIT,
+ KAPI_SOCK_STATE_CLOSING,
+ KAPI_SOCK_STATE_LAST_ACK,
+ KAPI_SOCK_STATE_TIME_WAIT,
+ KAPI_SOCK_STATE_CONNECTED,
+ KAPI_SOCK_STATE_DISCONNECTED,
+};
+
+/**
+ * enum kapi_socket_protocol - Socket protocol types
+ */
+enum kapi_socket_protocol {
+ KAPI_PROTO_TCP = (1 << 0),
+ KAPI_PROTO_UDP = (1 << 1),
+ KAPI_PROTO_UNIX = (1 << 2),
+ KAPI_PROTO_RAW = (1 << 3),
+ KAPI_PROTO_PACKET = (1 << 4),
+ KAPI_PROTO_NETLINK = (1 << 5),
+ KAPI_PROTO_SCTP = (1 << 6),
+ KAPI_PROTO_DCCP = (1 << 7),
+ KAPI_PROTO_ALL = 0xFFFFFFFF,
+};
+
+/**
+ * enum kapi_buffer_behavior - Network buffer handling behaviors
+ */
+enum kapi_buffer_behavior {
+ KAPI_BUF_PEEK = (1 << 0),
+ KAPI_BUF_TRUNCATE = (1 << 1),
+ KAPI_BUF_SCATTER = (1 << 2),
+ KAPI_BUF_ZERO_COPY = (1 << 3),
+ KAPI_BUF_KERNEL_ALLOC = (1 << 4),
+ KAPI_BUF_DMA_CAPABLE = (1 << 5),
+ KAPI_BUF_FRAGMENT = (1 << 6),
+};
+
+/**
+ * enum kapi_async_behavior - Asynchronous operation behaviors
+ */
+enum kapi_async_behavior {
+ KAPI_ASYNC_BLOCK = 0,
+ KAPI_ASYNC_NONBLOCK = (1 << 0),
+ KAPI_ASYNC_POLL_READY = (1 << 1),
+ KAPI_ASYNC_SIGNAL_DRIVEN = (1 << 2),
+ KAPI_ASYNC_AIO = (1 << 3),
+ KAPI_ASYNC_IO_URING = (1 << 4),
+ KAPI_ASYNC_EPOLL = (1 << 5),
+};
+
+/**
+ * struct kapi_socket_state_spec - Socket state requirement/transition
+ */
+struct kapi_socket_state_spec {
+ enum kapi_socket_state required_states[KAPI_MAX_SOCKET_STATES];
+ u32 required_state_count;
+ enum kapi_socket_state forbidden_states[KAPI_MAX_SOCKET_STATES];
+ u32 forbidden_state_count;
+ enum kapi_socket_state resulting_state;
+ char state_condition[KAPI_MAX_DESC_LEN];
+ u32 applicable_protocols;
+} __attribute__((packed));
+
+/**
+ * struct kapi_protocol_behavior - Protocol-specific behavior
+ */
+struct kapi_protocol_behavior {
+ u32 applicable_protocols;
+ char behavior[KAPI_MAX_DESC_LEN];
+ s64 protocol_flags;
+ char flag_description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_buffer_spec - Network buffer specification
+ */
+struct kapi_buffer_spec {
+ u32 buffer_behaviors;
+ size_t min_buffer_size;
+ size_t max_buffer_size;
+ size_t optimal_buffer_size;
+ char fragmentation_rules[KAPI_MAX_DESC_LEN];
+ bool can_partial_transfer;
+ char partial_transfer_rules[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_async_spec - Asynchronous behavior specification
+ */
+struct kapi_async_spec {
+ enum kapi_async_behavior supported_modes;
+ int nonblock_errno;
+ u32 poll_events_in;
+ u32 poll_events_out;
+ char completion_condition[KAPI_MAX_DESC_LEN];
+ bool supports_timeout;
+ char timeout_behavior[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_addr_family_spec - Address family specification
+ */
+struct kapi_addr_family_spec {
+ int family;
+ char family_name[32];
+ size_t addr_struct_size;
+ size_t min_addr_len;
+ size_t max_addr_len;
+ char addr_format[KAPI_MAX_DESC_LEN];
+ bool supports_wildcard;
+ bool supports_multicast;
+ bool supports_broadcast;
+ char special_addresses[KAPI_MAX_DESC_LEN];
+ u32 port_range_min;
+ u32 port_range_max;
+} __attribute__((packed));
+#endif /* CONFIG_NET */
+
+/**
+ * struct kernel_api_spec - Complete kernel API specification
+ * @name: Function name
+ * @version: API version
+ * @description: Brief description
+ * @long_description: Detailed description
+ * @context_flags: Execution context flags
+ * @param_count: Number of parameters
+ * @params: Parameter specifications
+ * @return_spec: Return value specification
+ * @error_count: Number of possible errors
+ * @errors: Error specifications
+ * @lock_count: Number of lock specifications
+ * @locks: Lock requirement specifications
+ * @constraint_count: Number of additional constraints
+ * @constraints: Additional constraint specifications
+ * @examples: Usage examples
+ * @notes: Additional notes
+ * @since_version: Kernel version when introduced
+ * @deprecated: Whether API is deprecated
+ * @replacement: Replacement API if deprecated
+ * @signal_count: Number of signal specifications
+ * @signals: Signal handling specifications
+ * @signal_mask_count: Number of signal mask specifications
+ * @signal_masks: Signal mask specifications
+ * @struct_spec_count: Number of structure specifications
+ * @struct_specs: Structure type specifications
+ * @side_effect_count: Number of side effect specifications
+ * @side_effects: Side effect specifications
+ * @state_trans_count: Number of state transition specifications
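+ * @state_transitions: State transition specifications
+ * @capability_count: Number of capability specifications
+ * @capabilities: Capability requirement specifications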
+ */
+struct kernel_api_spec {
+ char name[KAPI_MAX_NAME_LEN];
+ u32 version;
+ char description[KAPI_MAX_DESC_LEN];
+ char long_description[KAPI_MAX_DESC_LEN * 4];
+ u32 context_flags;
+
+ /* Parameters */
+ u32 param_count;
+ struct kapi_param_spec params[KAPI_MAX_PARAMS];
+
+ /* Return value */
+ struct kapi_return_spec return_spec;
+
+ /* Errors */
+ u32 error_count;
+ struct kapi_error_spec errors[KAPI_MAX_ERRORS];
+
+ /* Locking */
+ u32 lock_count;
+ struct kapi_lock_spec locks[KAPI_MAX_CONSTRAINTS];
+
+ /* Constraints */
+ u32 constraint_count;
+ struct kapi_constraint_spec constraints[KAPI_MAX_CONSTRAINTS];
+
+ /* Additional information */
+ char examples[KAPI_MAX_DESC_LEN * 2];
+ char notes[KAPI_MAX_DESC_LEN * 2];
+ char since_version[32];
+ bool deprecated;
+ char replacement[KAPI_MAX_NAME_LEN];
+
+ /* Signal specifications */
+ u32 signal_count;
+ struct kapi_signal_spec signals[KAPI_MAX_SIGNALS];
+
+ /* Signal mask specifications */
+ u32 signal_mask_count;
+ struct kapi_signal_mask_spec signal_masks[KAPI_MAX_SIGNALS];
+
+ /* Structure specifications */
+ u32 struct_spec_count;
+ struct kapi_struct_spec struct_specs[KAPI_MAX_STRUCT_SPECS];
+
+ /* Side effects */
+ u32 side_effect_count;
+ struct kapi_side_effect side_effects[KAPI_MAX_SIDE_EFFECTS];
+
+ /* State transitions */
+ u32 state_trans_count;
+ struct kapi_state_transition state_transitions[KAPI_MAX_STATE_TRANS];
+
+ /* Capability specifications */
+ u32 capability_count;
+ struct kapi_capability_spec capabilities[KAPI_MAX_CAPABILITIES];
+
+#ifdef CONFIG_NET
+ /* Networking-specific fields */
+ struct kapi_socket_state_spec socket_state;
+ struct kapi_protocol_behavior protocol_behaviors[KAPI_MAX_PROTOCOL_BEHAVIORS];
+ u32 protocol_behavior_count;
+ struct kapi_buffer_spec buffer_spec;
+ struct kapi_async_spec async_spec;
+ struct kapi_addr_family_spec addr_families[KAPI_MAX_ADDR_FAMILIES];
+ u32 addr_family_count;
+
+ /* Network operation characteristics */
+ bool is_connection_oriented;
+ bool is_message_oriented;
+ bool supports_oob_data;
+ bool supports_peek;
+ bool supports_select_poll;
+ bool is_reentrant;
+
+ /* Network semantic descriptions */
+ char connection_establishment[KAPI_MAX_DESC_LEN];
+ char connection_termination[KAPI_MAX_DESC_LEN];
+ char data_transfer_semantics[KAPI_MAX_DESC_LEN];
+#endif /* CONFIG_NET */
+} __attribute__((packed));
+
+/* Macros for defining API specifications */
+
+/**
+ * DEFINE_KERNEL_API_SPEC - Define a kernel API specification
+ * @func_name: Function name to specify
+ */
+#define DEFINE_KERNEL_API_SPEC(func_name) \
+ static struct kernel_api_spec __kapi_spec_##func_name \
+ __used __section(".kapi_specs") = { \
+ .name = __stringify(func_name), \
+ .version = 1,
+
+#define KAPI_END_SPEC };
+
+/**
+ * KAPI_DESCRIPTION - Set API description
+ * @desc: Description string
+ */
+#define KAPI_DESCRIPTION(desc) \
+ .description = desc,
+
+/**
+ * KAPI_LONG_DESC - Set detailed API description
+ * @desc: Detailed description string
+ */
+#define KAPI_LONG_DESC(desc) \
+ .long_description = desc,
+
+/**
+ * KAPI_CONTEXT - Set execution context flags
+ * @flags: Context flags (OR'ed KAPI_CTX_* values)
+ */
+#define KAPI_CONTEXT(flags) \
+ .context_flags = flags,
+
+/**
+ * KAPI_PARAM - Define a parameter specification
+ * @idx: Parameter index (0-based)
+ * @pname: Parameter name
+ * @ptype: Type name string
+ * @pdesc: Parameter description
+ */
+#define KAPI_PARAM(idx, pname, ptype, pdesc) \
+ .params[idx] = { \
+ .name = pname, \
+ .type_name = ptype, \
+ .description = pdesc, \
+ .size_param_idx = -1, /* Default: no dynamic sizing */
+
+#define KAPI_PARAM_TYPE(ptype) \
+ .type = ptype,
+
+#define KAPI_PARAM_FLAGS(pflags) \
+ .flags = pflags,
+
+#define KAPI_PARAM_SIZE(psize) \
+ .size = psize,
+
+#define KAPI_PARAM_RANGE(pmin, pmax) \
+ .min_value = pmin, \
+ .max_value = pmax,
+
+#define KAPI_PARAM_CONSTRAINT_TYPE(ctype) \
+ .constraint_type = ctype,
+
+#define KAPI_PARAM_CONSTRAINT(desc) \
+ .constraints = desc,
+
+#define KAPI_PARAM_VALID_MASK(mask) \
+ .valid_mask = mask,
+
+#define KAPI_PARAM_ENUM_VALUES(values) \
+ .enum_values = values, \
+ .enum_count = ARRAY_SIZE(values),
+
+#define KAPI_PARAM_END },
+
+/**
+ * KAPI_RETURN - Define return value specification
+ * @rtype: Return type name
+ * @rdesc: Return value description
+ */
+#define KAPI_RETURN(rtype, rdesc) \
+ .return_spec = { \
+ .type_name = rtype, \
+ .description = rdesc,
+
+#define KAPI_RETURN_SUCCESS(val) \
+ .success_value = val,
+
+#define KAPI_RETURN_TYPE(rtype) \
+ .type = rtype,
+
+#define KAPI_RETURN_CHECK_TYPE(ctype) \
+ .check_type = ctype,
+
+#define KAPI_RETURN_ERROR_VALUES(values) \
+ .error_values = values,
+
+#define KAPI_RETURN_ERROR_COUNT(count) \
+ .error_count = count,
+
+#define KAPI_RETURN_SUCCESS_RANGE(min, max) \
+ .success_min = min, \
+ .success_max = max,
+
+#define KAPI_RETURN_END },
+
+/**
+ * KAPI_ERROR - Define an error condition
+ * @idx: Error index
+ * @ecode: Error code value
+ * @ename: Error name
+ * @econd: Error condition
+ * @edesc: Error description
+ */
+#define KAPI_ERROR(idx, ecode, ename, econd, edesc) \
+ .errors[idx] = { \
+ .error_code = ecode, \
+ .name = ename, \
+ .condition = econd, \
+ .description = edesc, \
+ },
+
+/**
+ * KAPI_LOCK - Define a lock requirement
+ * @idx: Lock index
+ * @lname: Lock name
+ * @ltype: Lock type
+ */
+#define KAPI_LOCK(idx, lname, ltype) \
+ .locks[idx] = { \
+ .lock_name = lname, \
+ .lock_type = ltype,
+
+#define KAPI_LOCK_ACQUIRED \
+ .acquired = true,
+
+#define KAPI_LOCK_RELEASED \
+ .released = true,
+
+#define KAPI_LOCK_HELD_ENTRY \
+ .held_on_entry = true,
+
+#define KAPI_LOCK_HELD_EXIT \
+ .held_on_exit = true,
+
+#define KAPI_LOCK_DESC(ldesc) \
+ .description = ldesc,
+
+#define KAPI_LOCK_END },
+
+/**
+ * KAPI_CONSTRAINT - Define an additional constraint
+ * @idx: Constraint index
+ * @cname: Constraint name
+ * @cdesc: Constraint description
+ */
+#define KAPI_CONSTRAINT(idx, cname, cdesc) \
+ .constraints[idx] = { \
+ .name = cname, \
+ .description = cdesc,
+
+#define KAPI_CONSTRAINT_EXPR(expr) \
+ .expression = expr,
+
+#define KAPI_CONSTRAINT_END },
+
+/**
+ * KAPI_EXAMPLES - Set API usage examples
+ * @ex: Examples string
+ */
+#define KAPI_EXAMPLES(ex) \
+ .examples = ex,
+
+/**
+ * KAPI_NOTES - Set API notes
+ * @n: Notes string
+ */
+#define KAPI_NOTES(n) \
+ .notes = n,
+
+/**
+ * KAPI_SIGNAL - Define a signal specification
+ * @idx: Signal index
+ * @signum: Signal number (e.g., SIGKILL)
+ * @signame: Signal name string
+ * @dir: Direction flags
+ * @act: Action taken
+ */
+#define KAPI_SIGNAL(idx, signum, signame, dir, act) \
+ .signals[idx] = { \
+ .signal_num = signum, \
+ .signal_name = signame, \
+ .direction = dir, \
+ .action = act,
+
+#define KAPI_SIGNAL_TARGET(tgt) \
+ .target = tgt,
+
+#define KAPI_SIGNAL_CONDITION(cond) \
+ .condition = cond,
+
+#define KAPI_SIGNAL_DESC(desc) \
+ .description = desc,
+
+#define KAPI_SIGNAL_RESTARTABLE \
+ .restartable = true,
+
+#define KAPI_SIGNAL_SA_FLAGS_REQ(flags) \
+ .sa_flags_required = flags,
+
+#define KAPI_SIGNAL_SA_FLAGS_FORBID(flags) \
+ .sa_flags_forbidden = flags,
+
+#define KAPI_SIGNAL_ERROR(err) \
+ .error_on_signal = err,
+
+#define KAPI_SIGNAL_TRANSFORM(sig) \
+ .transform_to = sig,
+
+#define KAPI_SIGNAL_TIMING(when) \
+ .timing = when,
+
+#define KAPI_SIGNAL_PRIORITY(prio) \
+ .priority = prio,
+
+#define KAPI_SIGNAL_INTERRUPTIBLE \
+ .interruptible = true,
+
+#define KAPI_SIGNAL_QUEUE(behavior) \
+ .queue_behavior = behavior,
+
+#define KAPI_SIGNAL_STATE_REQ(state) \
+ .state_required = state,
+
+#define KAPI_SIGNAL_STATE_FORBID(state) \
+ .state_forbidden = state,
+
+#define KAPI_SIGNAL_END },
+
+#define KAPI_SIGNAL_COUNT(n) \
+ .signal_count = n,
+
+/**
+ * KAPI_SIGNAL_MASK - Define a signal mask specification
+ * @idx: Mask index
+ * @name: Mask name
+ * @desc: Mask description
+ */
+#define KAPI_SIGNAL_MASK(idx, name, desc) \
+ .signal_masks[idx] = { \
+ .mask_name = name, \
+ .description = desc,
+
+#define KAPI_SIGNAL_MASK_ADD(idx, signum) \
+ .signals[idx] = signum,
+
+#define KAPI_SIGNAL_MASK_END },
+
+/**
+ * KAPI_STRUCT_SPEC - Define a structure specification
+ * @idx: Structure spec index
+ * @sname: Structure name
+ * @sdesc: Structure description
+ */
+#define KAPI_STRUCT_SPEC(idx, sname, sdesc) \
+ .struct_specs[idx] = { \
+ .name = #sname, \
+ .description = sdesc,
+
+#define KAPI_STRUCT_SIZE(ssize, salign) \
+ .size = ssize, \
+ .alignment = salign,
+
+#define KAPI_STRUCT_FIELD_COUNT(n) \
+ .field_count = n,
+
+/**
+ * KAPI_STRUCT_FIELD - Define a structure field
+ * @fidx: Field index
+ * @fname: Field name
+ * @ftype: Field type (KAPI_TYPE_*)
+ * @ftype_name: Type name as string
+ * @fdesc: Field description
+ */
+#define KAPI_STRUCT_FIELD(fidx, fname, ftype, ftype_name, fdesc) \
+ .fields[fidx] = { \
+ .name = fname, \
+ .type = ftype, \
+ .type_name = ftype_name, \
+ .description = fdesc,
+
+#define KAPI_FIELD_OFFSET(foffset) \
+ .offset = foffset,
+
+#define KAPI_FIELD_SIZE(fsize) \
+ .size = fsize,
+
+#define KAPI_FIELD_FLAGS(fflags) \
+ .flags = fflags,
+
+#define KAPI_FIELD_CONSTRAINT_RANGE(min, max) \
+ .constraint_type = KAPI_CONSTRAINT_RANGE, \
+ .min_value = min, \
+ .max_value = max,
+
+#define KAPI_FIELD_CONSTRAINT_MASK(mask) \
+ .constraint_type = KAPI_CONSTRAINT_MASK, \
+ .valid_mask = mask,
+
+#define KAPI_FIELD_CONSTRAINT_ENUM(values, count) \
+ .constraint_type = KAPI_CONSTRAINT_ENUM, \
+ .enum_values = values, \
+ .enum_count = count,
+
+#define KAPI_STRUCT_FIELD_END },
+
+#define KAPI_STRUCT_SPEC_END },
+
+/* Counter for structure specifications */
+#define KAPI_STRUCT_SPEC_COUNT(n) \
+ .struct_spec_count = n,
+
+/* Additional lock-related macros */
+#define KAPI_LOCK_COUNT(n) \
+ .lock_count = n,
+
+/**
+ * KAPI_SIDE_EFFECT - Define a side effect
+ * @idx: Side effect index
+ * @etype: Effect type bitmask (OR'ed KAPI_EFFECT_* values)
+ * @etarget: What is affected
+ * @edesc: Effect description
+ */
+#define KAPI_SIDE_EFFECT(idx, etype, etarget, edesc) \
+ .side_effects[idx] = { \
+ .type = etype, \
+ .target = etarget, \
+ .description = edesc, \
+ .reversible = false, /* Default to non-reversible */
+
+#define KAPI_EFFECT_CONDITION(cond) \
+ .condition = cond,
+
+#define KAPI_EFFECT_REVERSIBLE \
+ .reversible = true,
+
+#define KAPI_SIDE_EFFECT_END },
+
+/**
+ * KAPI_STATE_TRANS - Define a state transition
+ * @idx: State transition index
+ * @obj: Object whose state changes
+ * @from: From state
+ * @to: To state
+ * @desc: Transition description
+ */
+#define KAPI_STATE_TRANS(idx, obj, from, to, desc) \
+ .state_transitions[idx] = { \
+ .object = obj, \
+ .from_state = from, \
+ .to_state = to, \
+ .description = desc,
+
+#define KAPI_STATE_TRANS_COND(cond) \
+ .condition = cond,
+
+#define KAPI_STATE_TRANS_END },
+
+/* Counters for side effects and state transitions */
+#define KAPI_SIDE_EFFECT_COUNT(n) \
+ .side_effect_count = n,
+
+#define KAPI_STATE_TRANS_COUNT(n) \
+ .state_trans_count = n,
+
+/* Helper macros for common side effect patterns */
+#define KAPI_EFFECTS_MEMORY (KAPI_EFFECT_ALLOC_MEMORY | KAPI_EFFECT_FREE_MEMORY)
+#define KAPI_EFFECTS_LOCKING (KAPI_EFFECT_LOCK_ACQUIRE | KAPI_EFFECT_LOCK_RELEASE)
+#define KAPI_EFFECTS_RESOURCES (KAPI_EFFECT_RESOURCE_CREATE | KAPI_EFFECT_RESOURCE_DESTROY)
+#define KAPI_EFFECTS_IO (KAPI_EFFECT_NETWORK | KAPI_EFFECT_FILESYSTEM)
+
+/* Helper macros for common patterns */
+
+#define KAPI_PARAM_IN (KAPI_PARAM_IN)
+#define KAPI_PARAM_OUT (KAPI_PARAM_OUT)
+#define KAPI_PARAM_INOUT (KAPI_PARAM_IN | KAPI_PARAM_OUT)
+#define KAPI_PARAM_OPTIONAL (KAPI_PARAM_OPTIONAL)
+#define KAPI_PARAM_USER_PTR (KAPI_PARAM_USER | KAPI_PARAM_PTR)
+
+/* Common signal timing constants */
+#define KAPI_SIGNAL_TIME_ENTRY "entry"
+#define KAPI_SIGNAL_TIME_DURING "during"
+#define KAPI_SIGNAL_TIME_EXIT "exit"
+#define KAPI_SIGNAL_TIME_ANYTIME "anytime"
+#define KAPI_SIGNAL_TIME_BLOCKING "while_blocked"
+#define KAPI_SIGNAL_TIME_SLEEPING "while_sleeping"
+
+/* Common signal queue behaviors */
+#define KAPI_SIGNAL_QUEUE_STANDARD "standard"
+#define KAPI_SIGNAL_QUEUE_REALTIME "realtime"
+#define KAPI_SIGNAL_QUEUE_COALESCE "coalesce"
+#define KAPI_SIGNAL_QUEUE_REPLACE "replace"
+#define KAPI_SIGNAL_QUEUE_DISCARD "discard"
+
+/* Process state flags for signal delivery */
+#define KAPI_SIGNAL_STATE_RUNNING (1 << 0)
+#define KAPI_SIGNAL_STATE_SLEEPING (1 << 1)
+#define KAPI_SIGNAL_STATE_STOPPED (1 << 2)
+#define KAPI_SIGNAL_STATE_TRACED (1 << 3)
+#define KAPI_SIGNAL_STATE_ZOMBIE (1 << 4)
+#define KAPI_SIGNAL_STATE_DEAD (1 << 5)
+
+/* Capability specification macros */
+
+/**
+ * KAPI_CAPABILITY - Define a capability requirement
+ * @idx: Capability index
+ * @cap: Capability constant (e.g., CAP_IPC_LOCK)
+ * @name: Capability name string
+ * @act: Action type (kapi_capability_action)
+ */
+#define KAPI_CAPABILITY(idx, cap, name, act) \
+ .capabilities[idx] = { \
+ .capability = cap, \
+ .cap_name = name, \
+ .action = act,
+
+#define KAPI_CAP_ALLOWS(desc) \
+ .allows = desc,
+
+#define KAPI_CAP_WITHOUT(desc) \
+ .without_cap = desc,
+
+#define KAPI_CAP_CONDITION(cond) \
+ .check_condition = cond,
+
+#define KAPI_CAP_PRIORITY(prio) \
+ .priority = prio,
+
+#define KAPI_CAP_ALTERNATIVE(caps, count) \
+ .alternative = caps, \
+ .alternative_count = count,
+
+#define KAPI_CAPABILITY_END },
+
+/* Counter for capability specifications */
+#define KAPI_CAPABILITY_COUNT(n) \
+ .capability_count = n,
+
+/* Common signal patterns for syscalls */
+#define KAPI_SIGNAL_INTERRUPTIBLE_SLEEP \
+ KAPI_SIGNAL(0, SIGINT, "SIGINT", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN) \
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_SLEEPING) \
+ KAPI_SIGNAL_ERROR(-EINTR) \
+ KAPI_SIGNAL_RESTARTABLE \
+ KAPI_SIGNAL_DESC("Interrupts sleep, returns -EINTR") \
+ KAPI_SIGNAL_END, \
+ KAPI_SIGNAL(1, SIGTERM, "SIGTERM", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN) \
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_SLEEPING) \
+ KAPI_SIGNAL_ERROR(-EINTR) \
+ KAPI_SIGNAL_RESTARTABLE \
+ KAPI_SIGNAL_DESC("Interrupts sleep, returns -EINTR") \
+ KAPI_SIGNAL_END
+
+#define KAPI_SIGNAL_FATAL_DEFAULT \
+ KAPI_SIGNAL(2, SIGKILL, "SIGKILL", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_TERMINATE) \
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ANYTIME) \
+ KAPI_SIGNAL_PRIORITY(0) \
+ KAPI_SIGNAL_DESC("Process terminated immediately") \
+ KAPI_SIGNAL_END
+
+#define KAPI_SIGNAL_STOP_CONT \
+ KAPI_SIGNAL(3, SIGSTOP, "SIGSTOP", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_STOP) \
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ANYTIME) \
+ KAPI_SIGNAL_DESC("Process stopped") \
+ KAPI_SIGNAL_END, \
+ KAPI_SIGNAL(4, SIGCONT, "SIGCONT", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_CONTINUE) \
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ANYTIME) \
+ KAPI_SIGNAL_DESC("Process continued") \
+ KAPI_SIGNAL_END
+
+/* Validation and runtime checking */
+
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+bool kapi_validate_params(const struct kernel_api_spec *spec, ...);
+bool kapi_validate_param(const struct kapi_param_spec *param_spec, s64 value);
+bool kapi_validate_param_with_context(const struct kapi_param_spec *param_spec,
+ s64 value, const s64 *all_params, int param_count);
+int kapi_validate_syscall_param(const struct kernel_api_spec *spec,
+ int param_idx, s64 value);
+int kapi_validate_syscall_params(const struct kernel_api_spec *spec,
+ const s64 *params, int param_count);
+bool kapi_check_return_success(const struct kapi_return_spec *return_spec, s64 retval);
+bool kapi_validate_return_value(const struct kernel_api_spec *spec, s64 retval);
+int kapi_validate_syscall_return(const struct kernel_api_spec *spec, s64 retval);
+void kapi_check_context(const struct kernel_api_spec *spec);
+void kapi_check_locks(const struct kernel_api_spec *spec);
+bool kapi_check_signal_allowed(const struct kernel_api_spec *spec, int signum);
+bool kapi_validate_signal_action(const struct kernel_api_spec *spec, int signum,
+ struct sigaction *act);
+int kapi_get_signal_error(const struct kernel_api_spec *spec, int signum);
+bool kapi_is_signal_restartable(const struct kernel_api_spec *spec, int signum);
+#else
+static inline bool kapi_validate_params(const struct kernel_api_spec *spec, ...)
+{
+ return true;
+}
+static inline bool kapi_validate_param(const struct kapi_param_spec *param_spec, s64 value)
+{
+ return true;
+}
+static inline bool kapi_validate_param_with_context(const struct kapi_param_spec *param_spec,
+ s64 value, const s64 *all_params, int param_count)
+{
+ return true;
+}
+static inline int kapi_validate_syscall_param(const struct kernel_api_spec *spec,
+ int param_idx, s64 value)
+{
+ return 0;
+}
+static inline int kapi_validate_syscall_params(const struct kernel_api_spec *spec,
+ const s64 *params, int param_count)
+{
+ return 0;
+}
+static inline bool kapi_check_return_success(const struct kapi_return_spec *return_spec, s64 retval)
+{
+ return true;
+}
+static inline bool kapi_validate_return_value(const struct kernel_api_spec *spec, s64 retval)
+{
+ return true;
+}
+static inline int kapi_validate_syscall_return(const struct kernel_api_spec *spec, s64 retval)
+{
+ return 0;
+}
+static inline void kapi_check_context(const struct kernel_api_spec *spec) {}
+static inline void kapi_check_locks(const struct kernel_api_spec *spec) {}
+static inline bool kapi_check_signal_allowed(const struct kernel_api_spec *spec, int signum)
+{
+ return true;
+}
+static inline bool kapi_validate_signal_action(const struct kernel_api_spec *spec, int signum,
+ struct sigaction *act)
+{
+ return true;
+}
+static inline int kapi_get_signal_error(const struct kernel_api_spec *spec, int signum)
+{
+ return -EINTR;
+}
+static inline bool kapi_is_signal_restartable(const struct kernel_api_spec *spec, int signum)
+{
+ return false;
+}
+#endif
+
+/* Export/query functions */
+const struct kernel_api_spec *kapi_get_spec(const char *name);
+int kapi_export_json(const struct kernel_api_spec *spec, char *buf, size_t size);
+void kapi_print_spec(const struct kernel_api_spec *spec);
+
+/* Registration for dynamic APIs */
+int kapi_register_spec(struct kernel_api_spec *spec);
+void kapi_unregister_spec(const char *name);
+
+/* Helper to get parameter constraint info */
+static inline bool kapi_get_param_constraint(const char *api_name, int param_idx,
+ enum kapi_constraint_type *type,
+ u64 *valid_mask, s64 *min_val, s64 *max_val)
+{
+ const struct kernel_api_spec *spec = kapi_get_spec(api_name);
+
+ if (!spec || param_idx >= spec->param_count)
+ return false;
+
+ if (type)
+ *type = spec->params[param_idx].constraint_type;
+ if (valid_mask)
+ *valid_mask = spec->params[param_idx].valid_mask;
+ if (min_val)
+ *min_val = spec->params[param_idx].min_value;
+ if (max_val)
+ *max_val = spec->params[param_idx].max_value;
+
+ return true;
+}
+
+#ifdef CONFIG_NET
+/* Networking-specific macros */
+
+/* Socket state requirement macros */
+#define KAPI_SOCKET_STATE_REQ(...) \
+ .socket_state = { \
+ .required_states = { __VA_ARGS__ }, \
+ .required_state_count = sizeof((enum kapi_socket_state[]){__VA_ARGS__})/sizeof(enum kapi_socket_state),
+
+#define KAPI_SOCKET_STATE_FORBID(...) \
+ .forbidden_states = { __VA_ARGS__ }, \
+ .forbidden_state_count = sizeof((enum kapi_socket_state[]){__VA_ARGS__})/sizeof(enum kapi_socket_state),
+
+#define KAPI_SOCKET_STATE_RESULT(state) \
+ .resulting_state = state,
+
+#define KAPI_SOCKET_STATE_COND(cond) \
+ .state_condition = cond,
+
+#define KAPI_SOCKET_STATE_PROTOS(protos) \
+ .applicable_protocols = protos,
+
+#define KAPI_SOCKET_STATE_END },
+
+/* Protocol behavior macros */
+#define KAPI_PROTOCOL_BEHAVIOR(idx, protos, desc) \
+ .protocol_behaviors[idx] = { \
+ .applicable_protocols = protos, \
+ .behavior = desc,
+
+#define KAPI_PROTOCOL_FLAGS(flags, desc) \
+ .protocol_flags = flags, \
+ .flag_description = desc,
+
+#define KAPI_PROTOCOL_BEHAVIOR_END },
+
+/* Async behavior macros */
+#define KAPI_ASYNC_SPEC(modes, errno) \
+ .async_spec = { \
+ .supported_modes = modes, \
+ .nonblock_errno = errno,
+
+#define KAPI_ASYNC_POLL(in, out) \
+ .poll_events_in = in, \
+ .poll_events_out = out,
+
+#define KAPI_ASYNC_COMPLETION(cond) \
+ .completion_condition = cond,
+
+#define KAPI_ASYNC_TIMEOUT(supported, desc) \
+ .supports_timeout = supported, \
+ .timeout_behavior = desc,
+
+#define KAPI_ASYNC_END },
+
+/* Buffer behavior macros */
+#define KAPI_BUFFER_SPEC(behaviors) \
+ .buffer_spec = { \
+ .buffer_behaviors = behaviors,
+
+#define KAPI_BUFFER_SIZE(min, max, optimal) \
+ .min_buffer_size = min, \
+ .max_buffer_size = max, \
+ .optimal_buffer_size = optimal,
+
+#define KAPI_BUFFER_PARTIAL(allowed, rules) \
+ .can_partial_transfer = allowed, \
+ .partial_transfer_rules = rules,
+
+#define KAPI_BUFFER_FRAGMENT(rules) \
+ .fragmentation_rules = rules,
+
+#define KAPI_BUFFER_END },
+
+/* Address family macros */
+#define KAPI_ADDR_FAMILY(idx, fam, name, struct_sz, min_len, max_len) \
+ .addr_families[idx] = { \
+ .family = fam, \
+ .family_name = name, \
+ .addr_struct_size = struct_sz, \
+ .min_addr_len = min_len, \
+ .max_addr_len = max_len,
+
+#define KAPI_ADDR_FORMAT(fmt) \
+ .addr_format = fmt,
+
+#define KAPI_ADDR_FEATURES(wildcard, multicast, broadcast) \
+ .supports_wildcard = wildcard, \
+ .supports_multicast = multicast, \
+ .supports_broadcast = broadcast,
+
+#define KAPI_ADDR_SPECIAL(addrs) \
+ .special_addresses = addrs,
+
+#define KAPI_ADDR_PORTS(min, max) \
+ .port_range_min = min, \
+ .port_range_max = max,
+
+#define KAPI_ADDR_FAMILY_END },
+
+#define KAPI_ADDR_FAMILY_COUNT(n) \
+ .addr_family_count = n,
+
+#define KAPI_PROTOCOL_BEHAVIOR_COUNT(n) \
+ .protocol_behavior_count = n,
+
+#define KAPI_CONSTRAINT_COUNT(n) \
+ .constraint_count = n,
+
+/* Network operation characteristics macros */
+#define KAPI_NET_CONNECTION_ORIENTED \
+ .is_connection_oriented = true,
+
+#define KAPI_NET_MESSAGE_ORIENTED \
+ .is_message_oriented = true,
+
+#define KAPI_NET_SUPPORTS_OOB \
+ .supports_oob_data = true,
+
+#define KAPI_NET_SUPPORTS_PEEK \
+ .supports_peek = true,
+
+#define KAPI_NET_REENTRANT \
+ .is_reentrant = true,
+
+/* Semantic description macros */
+#define KAPI_NET_CONN_ESTABLISH(desc) \
+ .connection_establishment = desc,
+
+#define KAPI_NET_CONN_TERMINATE(desc) \
+ .connection_termination = desc,
+
+#define KAPI_NET_DATA_TRANSFER(desc) \
+ .data_transfer_semantics = desc,
+
+#endif /* CONFIG_NET */
+
+#endif /* _LINUX_KERNEL_API_SPEC_H */
diff --git a/include/linux/syscall_api_spec.h b/include/linux/syscall_api_spec.h
new file mode 100644
index 0000000000000..0a813ad89ea15
--- /dev/null
+++ b/include/linux/syscall_api_spec.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * syscall_api_spec.h - System Call API Specification Integration
+ *
+ * This header extends the SYSCALL_DEFINEx macros to support inline API specifications,
+ * allowing syscall documentation to be written alongside the implementation in a
+ * human-readable and machine-parseable format.
+ */
+
+#ifndef _LINUX_SYSCALL_API_SPEC_H
+#define _LINUX_SYSCALL_API_SPEC_H
+
+#include <linux/kernel_api_spec.h>
+
+
+
+/* Automatic syscall validation infrastructure */
+/*
+ * The validation is now integrated directly into the SYSCALL_DEFINEx macros
+ * in syscalls.h when CONFIG_KAPI_RUNTIME_CHECKS is enabled.
+ *
+ * The validation happens in the __do_kapi_sys##name wrapper function which:
+ * 1. Validates all parameters before calling the actual syscall
+ * 2. Calls the real syscall implementation
+ * 3. Validates the return value
+ * 4. Returns the result
+ */
+
+
+/*
+ * Helper macros for common syscall patterns
+ */
+
+/* For syscalls that can sleep */
+#define KAPI_SYSCALL_SLEEPABLE \
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+/* For syscalls that must be atomic */
+#define KAPI_SYSCALL_ATOMIC \
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_ATOMIC)
+
+/* Common parameter specifications */
+#define KAPI_PARAM_FD(idx, desc) \
+ KAPI_PARAM(idx, "fd", "int", desc) \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN) \
+ .type = KAPI_TYPE_FD, \
+ .constraint_type = KAPI_CONSTRAINT_NONE, \
+ KAPI_PARAM_END
+
+#define KAPI_PARAM_USER_BUF(idx, name, desc) \
+ KAPI_PARAM(idx, name, "void __user *", desc) \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_USER_PTR | KAPI_PARAM_IN) \
+ KAPI_PARAM_END
+
+#define KAPI_PARAM_USER_STRUCT(idx, name, struct_type, desc) \
+ KAPI_PARAM(idx, name, #struct_type " __user *", desc) \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_USER | KAPI_PARAM_IN) \
+ .type = KAPI_TYPE_USER_PTR, \
+ .size = sizeof(struct_type), \
+ .constraint_type = KAPI_CONSTRAINT_NONE, \
+ KAPI_PARAM_END
+
+#define KAPI_PARAM_SIZE_T(idx, name, desc) \
+ KAPI_PARAM(idx, name, "size_t", desc) \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN) \
+ KAPI_PARAM_RANGE(0, SIZE_MAX) \
+ KAPI_PARAM_END
+
+/* Common error specifications */
+#define KAPI_ERROR_EBADF(idx) \
+ KAPI_ERROR(idx, -EBADF, "EBADF", "Invalid file descriptor", \
+ "The file descriptor is not valid or has been closed")
+
+#define KAPI_ERROR_EINVAL(idx, condition) \
+ KAPI_ERROR(idx, -EINVAL, "EINVAL", condition, \
+ "Invalid argument provided")
+
+#define KAPI_ERROR_ENOMEM(idx) \
+ KAPI_ERROR(idx, -ENOMEM, "ENOMEM", "Insufficient memory", \
+ "Cannot allocate memory for the operation")
+
+#define KAPI_ERROR_EPERM(idx) \
+ KAPI_ERROR(idx, -EPERM, "EPERM", "Operation not permitted", \
+ "The calling process does not have the required permissions")
+
+#define KAPI_ERROR_EFAULT(idx) \
+ KAPI_ERROR(idx, -EFAULT, "EFAULT", "Bad address", \
+ "Invalid user space address provided")
+
+/* Standard return value specifications */
+#define KAPI_RETURN_SUCCESS_ZERO \
+ KAPI_RETURN("long", "0 on success, negative error code on failure") \
+ KAPI_RETURN_SUCCESS(0, "== 0") \
+ KAPI_RETURN_END
+
+#define KAPI_RETURN_FD_SPEC \
+ KAPI_RETURN("long", "File descriptor on success, negative error code on failure") \
+ .check_type = KAPI_RETURN_FD, \
+ KAPI_RETURN_END
+
+#define KAPI_RETURN_COUNT \
+ KAPI_RETURN("long", "Number of bytes processed on success, negative error code on failure") \
+ KAPI_RETURN_SUCCESS(0, ">= 0") \
+ KAPI_RETURN_END
+
+/**
+ * KAPI_ERROR_COUNT - Set the error count
+ * @count: Number of errors defined
+ */
+#define KAPI_ERROR_COUNT(count) \
+ .error_count = count,
+
+/**
+ * KAPI_PARAM_COUNT - Set the parameter count
+ * @count: Number of parameters defined
+ */
+#define KAPI_PARAM_COUNT(count) \
+ .param_count = count,
+
+/**
+ * KAPI_SINCE_VERSION - Set the since version
+ * @version: Version string when the API was introduced
+ */
+#define KAPI_SINCE_VERSION(version) \
+ .since_version = version,
+
+
+/**
+ * KAPI_SIGNAL_MASK_COUNT - Set the signal mask count
+ * @count: Number of signal masks defined
+ */
+#define KAPI_SIGNAL_MASK_COUNT(count) \
+ .signal_mask_count = count,
+
+
+
+#endif /* _LINUX_SYSCALL_API_SPEC_H */
\ No newline at end of file
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index e5603cc91963d..62a8edc14ec87 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -87,6 +87,7 @@ struct xattr_args;
#include <linux/bug.h>
#include <linux/sem.h>
#include <asm/siginfo.h>
+#include <linux/syscall_api_spec.h>
#include <linux/unistd.h>
#include <linux/quota.h>
#include <linux/key.h>
@@ -132,6 +133,7 @@ struct xattr_args;
#define __SC_TYPE(t, a) t
#define __SC_ARGS(t, a) a
#define __SC_TEST(t, a) (void)BUILD_BUG_ON_ZERO(!__TYPE_IS_LL(t) && sizeof(t) > sizeof(long))
+#define __SC_CAST_TO_S64(t, a) (s64)(a)
#ifdef CONFIG_FTRACE_SYSCALLS
#define __SC_STR_ADECL(t, a) #a
@@ -242,6 +244,41 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
* done within __do_sys_*().
*/
#ifndef __SYSCALL_DEFINEx
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+#define __SYSCALL_DEFINEx(x, name, ...) \
+ __diag_push(); \
+ __diag_ignore(GCC, 8, "-Wattribute-alias", \
+ "Type aliasing is used to sanitize syscall arguments");\
+ asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
+ __attribute__((alias(__stringify(__se_sys##name)))); \
+ ALLOW_ERROR_INJECTION(sys##name, ERRNO); \
+ static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));\
+ static inline long __do_kapi_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \
+ asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \
+ asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
+ { \
+ long ret = __do_kapi_sys##name(__MAP(x,__SC_CAST,__VA_ARGS__));\
+ __MAP(x,__SC_TEST,__VA_ARGS__); \
+ __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__)); \
+ return ret; \
+ } \
+ __diag_pop(); \
+ static inline long __do_kapi_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))\
+ { \
+ const struct kernel_api_spec *__spec = kapi_get_spec("sys_" #name); \
+ if (__spec) { \
+ s64 __params[x] = { __MAP(x,__SC_CAST_TO_S64,__VA_ARGS__) }; \
+ int __ret = kapi_validate_syscall_params(__spec, __params, x); \
+ if (__ret) return __ret; \
+ } \
+ long ret = __do_sys##name(__MAP(x,__SC_ARGS,__VA_ARGS__)); \
+ if (__spec) { \
+ kapi_validate_syscall_return(__spec, (s64)ret); \
+ } \
+ return ret; \
+ } \
+ static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))
+#else /* !CONFIG_KAPI_RUNTIME_CHECKS */
#define __SYSCALL_DEFINEx(x, name, ...) \
__diag_push(); \
__diag_ignore(GCC, 8, "-Wattribute-alias", \
@@ -260,6 +297,7 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
} \
__diag_pop(); \
static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))
+#endif /* CONFIG_KAPI_RUNTIME_CHECKS */
#endif /* __SYSCALL_DEFINEx */
/* For split 64-bit arguments on 32-bit architectures */
diff --git a/init/Kconfig b/init/Kconfig
index af4c2f0854554..7a15248933895 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2079,6 +2079,8 @@ config TRACEPOINTS
source "kernel/Kconfig.kexec"
+source "kernel/api/Kconfig"
+
endmenu # General setup
source "arch/Kconfig"
diff --git a/kernel/Makefile b/kernel/Makefile
index 32e80dd626af0..ba94ee4bb2292 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -56,6 +56,7 @@ obj-y += livepatch/
obj-y += dma/
obj-y += entry/
obj-$(CONFIG_MODULES) += module/
+obj-$(CONFIG_KAPI_SPEC) += api/
obj-$(CONFIG_KCMP) += kcmp.o
obj-$(CONFIG_FREEZER) += freezer.o
diff --git a/kernel/api/Kconfig b/kernel/api/Kconfig
new file mode 100644
index 0000000000000..fde25ec70e134
--- /dev/null
+++ b/kernel/api/Kconfig
@@ -0,0 +1,35 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Kernel API Specification Framework Configuration
+#
+
+config KAPI_SPEC
+ bool "Kernel API Specification Framework"
+ help
+ This option enables the kernel API specification framework,
+ which provides formal documentation of kernel APIs in both
+ human- and machine-readable formats.
+
+ The framework allows developers to document APIs inline with
+ their implementation, including parameter specifications,
+ return values, error conditions, locking requirements, and
+ execution context constraints.
+
+ When enabled, API specifications can be queried at runtime
+ and exported in various formats (JSON, XML) through debugfs.
+
+ If unsure, say N.
+
+config KAPI_RUNTIME_CHECKS
+ bool "Runtime API specification checks"
+ depends on KAPI_SPEC
+ depends on DEBUG_KERNEL
+ help
+ Enable runtime validation of API usage against specifications.
+ This includes checking execution context requirements, validating
+ parameters, and verifying lock state.
+
+ This adds overhead and should only be used for debugging and
+ development. The checks use WARN_ONCE to report violations.
+
+ If unsure, say N.
diff --git a/kernel/api/Makefile b/kernel/api/Makefile
new file mode 100644
index 0000000000000..4120ded7e5cf1
--- /dev/null
+++ b/kernel/api/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for the Kernel API Specification Framework
+#
+
+# Core API specification framework
+obj-$(CONFIG_KAPI_SPEC) += kernel_api_spec.o
\ No newline at end of file
diff --git a/kernel/api/kernel_api_spec.c b/kernel/api/kernel_api_spec.c
new file mode 100644
index 0000000000000..8827e9f96c111
--- /dev/null
+++ b/kernel/api/kernel_api_spec.c
@@ -0,0 +1,1122 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * kernel_api_spec.c - Kernel API Specification Framework Implementation
+ *
+ * Provides runtime support for kernel API specifications including validation,
+ * export to various formats, and querying capabilities.
+ */
+
+#include <linux/kernel.h>
+#include <linux/kernel_api_spec.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/seq_file.h>
+#include <linux/debugfs.h>
+#include <linux/export.h>
+#include <linux/preempt.h>
+#include <linux/hardirq.h>
+#include <linux/file.h>
+#include <linux/fdtable.h>
+#include <linux/uaccess.h>
+#include <linux/limits.h>
+#include <linux/fcntl.h>
+
+/* Section where API specifications are stored */
+extern struct kernel_api_spec __start_kapi_specs[];
+extern struct kernel_api_spec __stop_kapi_specs[];
+
+/* Dynamic API registration */
+static LIST_HEAD(dynamic_api_specs);
+static DEFINE_MUTEX(api_spec_mutex);
+
+struct dynamic_api_spec {
+ struct list_head list;
+ struct kernel_api_spec *spec;
+};
+
+/**
+ * kapi_get_spec - Get API specification by name
+ * @name: Function name to look up
+ *
+ * Return: Pointer to API specification or NULL if not found
+ */
+const struct kernel_api_spec *kapi_get_spec(const char *name)
+{
+ struct kernel_api_spec *spec;
+ struct dynamic_api_spec *dyn_spec;
+
+ /* Search static specifications */
+ for (spec = __start_kapi_specs; spec < __stop_kapi_specs; spec++) {
+ if (strcmp(spec->name, name) == 0)
+ return spec;
+ }
+
+ /* Search dynamic specifications */
+ mutex_lock(&api_spec_mutex);
+ list_for_each_entry(dyn_spec, &dynamic_api_specs, list) {
+ if (strcmp(dyn_spec->spec->name, name) == 0) {
+ mutex_unlock(&api_spec_mutex);
+ return dyn_spec->spec;
+ }
+ }
+ mutex_unlock(&api_spec_mutex);
+
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(kapi_get_spec);
+
+/**
+ * kapi_register_spec - Register a dynamic API specification
+ * @spec: API specification to register
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int kapi_register_spec(struct kernel_api_spec *spec)
+{
+ struct dynamic_api_spec *dyn_spec;
+
+ if (!spec || !spec->name[0])
+ return -EINVAL;
+
+ /* Check if already exists */
+ if (kapi_get_spec(spec->name))
+ return -EEXIST;
+
+ dyn_spec = kzalloc(sizeof(*dyn_spec), GFP_KERNEL);
+ if (!dyn_spec)
+ return -ENOMEM;
+
+ dyn_spec->spec = spec;
+
+ mutex_lock(&api_spec_mutex);
+ list_add_tail(&dyn_spec->list, &dynamic_api_specs);
+ mutex_unlock(&api_spec_mutex);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_register_spec);
+
+/**
+ * kapi_unregister_spec - Unregister a dynamic API specification
+ * @name: Name of API to unregister
+ */
+void kapi_unregister_spec(const char *name)
+{
+ struct dynamic_api_spec *dyn_spec, *tmp;
+
+ mutex_lock(&api_spec_mutex);
+ list_for_each_entry_safe(dyn_spec, tmp, &dynamic_api_specs, list) {
+ if (strcmp(dyn_spec->spec->name, name) == 0) {
+ list_del(&dyn_spec->list);
+ kfree(dyn_spec);
+ break;
+ }
+ }
+ mutex_unlock(&api_spec_mutex);
+}
+EXPORT_SYMBOL_GPL(kapi_unregister_spec);
+
+/**
+ * param_type_to_string - Convert parameter type to string
+ * @type: Parameter type
+ *
+ * Return: String representation of type
+ */
+static const char *param_type_to_string(enum kapi_param_type type)
+{
+ static const char * const type_names[] = {
+ [KAPI_TYPE_VOID] = "void",
+ [KAPI_TYPE_INT] = "int",
+ [KAPI_TYPE_UINT] = "uint",
+ [KAPI_TYPE_PTR] = "pointer",
+ [KAPI_TYPE_STRUCT] = "struct",
+ [KAPI_TYPE_UNION] = "union",
+ [KAPI_TYPE_ENUM] = "enum",
+ [KAPI_TYPE_FUNC_PTR] = "function_pointer",
+ [KAPI_TYPE_ARRAY] = "array",
+ [KAPI_TYPE_FD] = "file_descriptor",
+ [KAPI_TYPE_USER_PTR] = "user_pointer",
+ [KAPI_TYPE_PATH] = "pathname",
+ [KAPI_TYPE_CUSTOM] = "custom",
+ };
+
+ if (type >= ARRAY_SIZE(type_names))
+ return "unknown";
+
+ return type_names[type];
+}
+
+/**
+ * lock_type_to_string - Convert lock type to string
+ * @type: Lock type
+ *
+ * Return: String representation of lock type
+ */
+static const char *lock_type_to_string(enum kapi_lock_type type)
+{
+ static const char * const lock_names[] = {
+ [KAPI_LOCK_NONE] = "none",
+ [KAPI_LOCK_MUTEX] = "mutex",
+ [KAPI_LOCK_SPINLOCK] = "spinlock",
+ [KAPI_LOCK_RWLOCK] = "rwlock",
+ [KAPI_LOCK_SEQLOCK] = "seqlock",
+ [KAPI_LOCK_RCU] = "rcu",
+ [KAPI_LOCK_SEMAPHORE] = "semaphore",
+ [KAPI_LOCK_CUSTOM] = "custom",
+ };
+
+ if (type >= ARRAY_SIZE(lock_names))
+ return "unknown";
+
+ return lock_names[type];
+}
+
+/**
+ * return_check_type_to_string - Convert return check type to string
+ * @type: Return check type
+ *
+ * Return: String representation of return check type
+ */
+static const char *return_check_type_to_string(enum kapi_return_check_type type)
+{
+ static const char * const check_names[] = {
+ [KAPI_RETURN_EXACT] = "exact",
+ [KAPI_RETURN_RANGE] = "range",
+ [KAPI_RETURN_ERROR_CHECK] = "error_check",
+ [KAPI_RETURN_FD] = "file_descriptor",
+ [KAPI_RETURN_CUSTOM] = "custom",
+ };
+
+ if (type >= ARRAY_SIZE(check_names))
+ return "unknown";
+
+ return check_names[type];
+}
+
+/**
+ * capability_action_to_string - Convert capability action to string
+ * @action: Capability action
+ *
+ * Return: String representation of capability action
+ */
+static const char *capability_action_to_string(enum kapi_capability_action action)
+{
+ static const char * const action_names[] = {
+ [KAPI_CAP_BYPASS_CHECK] = "bypass_check",
+ [KAPI_CAP_INCREASE_LIMIT] = "increase_limit",
+ [KAPI_CAP_OVERRIDE_RESTRICTION] = "override_restriction",
+ [KAPI_CAP_GRANT_PERMISSION] = "grant_permission",
+ [KAPI_CAP_MODIFY_BEHAVIOR] = "modify_behavior",
+ [KAPI_CAP_ACCESS_RESOURCE] = "access_resource",
+ [KAPI_CAP_PERFORM_OPERATION] = "perform_operation",
+ };
+
+ if (action >= ARRAY_SIZE(action_names))
+ return "unknown";
+
+ return action_names[action];
+}
+
+/**
+ * kapi_export_json - Export API specification to JSON format
+ * @spec: API specification to export
+ * @buf: Buffer to write JSON to
+ * @size: Size of buffer
+ *
+ * Return: Number of bytes written or negative error
+ */
+int kapi_export_json(const struct kernel_api_spec *spec, char *buf, size_t size)
+{
+ int ret = 0;
+ int i;
+
+ if (!spec || !buf || size == 0)
+ return -EINVAL;
+
+ ret = scnprintf(buf, size,
+ "{\n"
+ " \"name\": \"%s\",\n"
+ " \"version\": %u,\n"
+ " \"description\": \"%s\",\n"
+ " \"long_description\": \"%s\",\n"
+ " \"context_flags\": \"0x%x\",\n",
+ spec->name,
+ spec->version,
+ spec->description,
+ spec->long_description,
+ spec->context_flags);
+
+ /* Parameters */
+ ret += scnprintf(buf + ret, size - ret,
+ " \"parameters\": [\n");
+
+ for (i = 0; i < spec->param_count && i < KAPI_MAX_PARAMS; i++) {
+ const struct kapi_param_spec *param = &spec->params[i];
+
+ ret += scnprintf(buf + ret, size - ret,
+ " {\n"
+ " \"name\": \"%s\",\n"
+ " \"type\": \"%s\",\n"
+ " \"type_class\": \"%s\",\n"
+ " \"flags\": \"0x%x\",\n"
+ " \"description\": \"%s\"\n"
+ " }%s\n",
+ param->name,
+ param->type_name,
+ param_type_to_string(param->type),
+ param->flags,
+ param->description,
+ (i < spec->param_count - 1) ? "," : "");
+ }
+
+ ret += scnprintf(buf + ret, size - ret, " ],\n");
+
+ /* Return value */
+ ret += scnprintf(buf + ret, size - ret,
+ " \"return\": {\n"
+ " \"type\": \"%s\",\n"
+ " \"type_class\": \"%s\",\n"
+ " \"check_type\": \"%s\",\n",
+ spec->return_spec.type_name,
+ param_type_to_string(spec->return_spec.type),
+ return_check_type_to_string(spec->return_spec.check_type));
+
+ switch (spec->return_spec.check_type) {
+ case KAPI_RETURN_EXACT:
+ ret += scnprintf(buf + ret, size - ret,
+ " \"success_value\": %lld,\n",
+ spec->return_spec.success_value);
+ break;
+ case KAPI_RETURN_RANGE:
+ ret += scnprintf(buf + ret, size - ret,
+ " \"success_min\": %lld,\n"
+ " \"success_max\": %lld,\n",
+ spec->return_spec.success_min,
+ spec->return_spec.success_max);
+ break;
+ case KAPI_RETURN_ERROR_CHECK:
+ ret += scnprintf(buf + ret, size - ret,
+ " \"error_count\": %u,\n",
+ spec->return_spec.error_count);
+ break;
+ default:
+ break;
+ }
+
+ ret += scnprintf(buf + ret, size - ret,
+ " \"description\": \"%s\"\n"
+ " },\n",
+ spec->return_spec.description);
+
+ /* Errors */
+ ret += scnprintf(buf + ret, size - ret,
+ " \"errors\": [\n");
+
+ for (i = 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+ const struct kapi_error_spec *error = &spec->errors[i];
+
+ ret += scnprintf(buf + ret, size - ret,
+ " {\n"
+ " \"code\": %d,\n"
+ " \"name\": \"%s\",\n"
+ " \"condition\": \"%s\",\n"
+ " \"description\": \"%s\"\n"
+ " }%s\n",
+ error->error_code,
+ error->name,
+ error->condition,
+ error->description,
+ (i < spec->error_count - 1) ? "," : "");
+ }
+
+ ret += scnprintf(buf + ret, size - ret, " ],\n");
+
+ /* Locks */
+ ret += scnprintf(buf + ret, size - ret,
+ " \"locks\": [\n");
+
+ for (i = 0; i < spec->lock_count && i < KAPI_MAX_CONSTRAINTS; i++) {
+ const struct kapi_lock_spec *lock = &spec->locks[i];
+
+ ret += scnprintf(buf + ret, size - ret,
+ " {\n"
+ " \"name\": \"%s\",\n"
+ " \"type\": \"%s\",\n"
+ " \"acquired\": %s,\n"
+ " \"released\": %s,\n"
+ " \"held_on_entry\": %s,\n"
+ " \"held_on_exit\": %s,\n"
+ " \"description\": \"%s\"\n"
+ " }%s\n",
+ lock->lock_name,
+ lock_type_to_string(lock->lock_type),
+ lock->acquired ? "true" : "false",
+ lock->released ? "true" : "false",
+ lock->held_on_entry ? "true" : "false",
+ lock->held_on_exit ? "true" : "false",
+ lock->description,
+ (i < spec->lock_count - 1) ? "," : "");
+ }
+
+ ret += scnprintf(buf + ret, size - ret, " ],\n");
+
+ /* Capabilities */
+ ret += scnprintf(buf + ret, size - ret,
+ " \"capabilities\": [\n");
+
+ for (i = 0; i < spec->capability_count && i < KAPI_MAX_CAPABILITIES; i++) {
+ const struct kapi_capability_spec *cap = &spec->capabilities[i];
+
+ ret += scnprintf(buf + ret, size - ret,
+ " {\n"
+ " \"capability\": %d,\n"
+ " \"name\": \"%s\",\n"
+ " \"action\": \"%s\",\n"
+ " \"allows\": \"%s\",\n"
+ " \"without_cap\": \"%s\",\n"
+ " \"check_condition\": \"%s\",\n"
+ " \"priority\": %u",
+ cap->capability,
+ cap->cap_name,
+ capability_action_to_string(cap->action),
+ cap->allows,
+ cap->without_cap,
+ cap->check_condition,
+ cap->priority);
+
+ if (cap->alternative_count > 0) {
+ int j;
+ ret += scnprintf(buf + ret, size - ret,
+ ",\n \"alternatives\": [");
+ for (j = 0; j < cap->alternative_count; j++) {
+ ret += scnprintf(buf + ret, size - ret,
+ "%d%s", cap->alternative[j],
+ (j < cap->alternative_count - 1) ? ", " : "");
+ }
+ ret += scnprintf(buf + ret, size - ret, "]");
+ }
+
+ ret += scnprintf(buf + ret, size - ret,
+ "\n }%s\n",
+ (i < spec->capability_count - 1) ? "," : "");
+ }
+
+ ret += scnprintf(buf + ret, size - ret, " ],\n");
+
+ /* Additional info */
+ ret += scnprintf(buf + ret, size - ret,
+ " \"since_version\": \"%s\",\n"
+ " \"deprecated\": %s,\n"
+ " \"replacement\": \"%s\",\n"
+ " \"examples\": \"%s\",\n"
+ " \"notes\": \"%s\"\n"
+ "}\n",
+ spec->since_version,
+ spec->deprecated ? "true" : "false",
+ spec->replacement,
+ spec->examples,
+ spec->notes);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kapi_export_json);
+
+
+/**
+ * kapi_print_spec - Print API specification to kernel log
+ * @spec: API specification to print
+ */
+void kapi_print_spec(const struct kernel_api_spec *spec)
+{
+ int i;
+
+ if (!spec)
+ return;
+
+ pr_info("=== Kernel API Specification ===\n");
+ pr_info("Name: %s\n", spec->name);
+ pr_info("Version: %u\n", spec->version);
+ pr_info("Description: %s\n", spec->description);
+
+ if (spec->long_description[0])
+ pr_info("Long Description: %s\n", spec->long_description);
+
+ pr_info("Context Flags: 0x%x\n", spec->context_flags);
+
+ /* Parameters */
+ if (spec->param_count > 0) {
+ pr_info("Parameters:\n");
+ for (i = 0; i < spec->param_count && i < KAPI_MAX_PARAMS; i++) {
+ const struct kapi_param_spec *param = &spec->params[i];
+ pr_info(" [%d] %s: %s (flags: 0x%x)\n",
+ i, param->name, param->type_name, param->flags);
+ if (param->description[0])
+ pr_info(" Description: %s\n", param->description);
+ }
+ }
+
+ /* Return value */
+ pr_info("Return: %s\n", spec->return_spec.type_name);
+ if (spec->return_spec.description[0])
+ pr_info(" Description: %s\n", spec->return_spec.description);
+
+ /* Errors */
+ if (spec->error_count > 0) {
+ pr_info("Possible Errors:\n");
+ for (i = 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+ const struct kapi_error_spec *error = &spec->errors[i];
+ pr_info(" %s (%d): %s\n",
+ error->name, error->error_code, error->condition);
+ }
+ }
+
+ /* Capabilities */
+ if (spec->capability_count > 0) {
+ pr_info("Capabilities:\n");
+ for (i = 0; i < spec->capability_count && i < KAPI_MAX_CAPABILITIES; i++) {
+ const struct kapi_capability_spec *cap = &spec->capabilities[i];
+ pr_info(" %s (%d):\n", cap->cap_name, cap->capability);
+ pr_info(" Action: %s\n", capability_action_to_string(cap->action));
+ pr_info(" Allows: %s\n", cap->allows);
+ pr_info(" Without: %s\n", cap->without_cap);
+ if (cap->check_condition[0])
+ pr_info(" Condition: %s\n", cap->check_condition);
+ }
+ }
+
+ pr_info("================================\n");
+}
+EXPORT_SYMBOL_GPL(kapi_print_spec);
+
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+
+/**
+ * kapi_validate_fd - Validate that a file descriptor is valid in current context
+ * @fd: File descriptor to validate
+ *
+ * Return: true if fd is valid in current process context, false otherwise
+ */
+static bool kapi_validate_fd(int fd)
+{
+ struct fd f;
+
+ /* Special case: AT_FDCWD is always valid */
+ if (fd == AT_FDCWD)
+ return true;
+
+ /* Check basic range */
+ if (fd < 0)
+ return false;
+
+ /* Check if fd is valid in current process context */
+ f = fdget(fd);
+ if (fd_empty(f)) {
+ return false;
+ }
+
+ /* fd is valid, release reference */
+ fdput(f);
+ return true;
+}
+
+/**
+ * kapi_validate_user_ptr - Validate that a user pointer is accessible
+ * @ptr: User pointer to validate
+ * @size: Size in bytes to validate
+ * @write: Whether write access is required
+ *
+ * Return: true if user memory is accessible, false otherwise
+ */
+static bool kapi_validate_user_ptr(const void __user *ptr, size_t size, bool write)
+{
+ /* NULL is never valid here; callers handle KAPI_PARAM_OPTIONAL first */
+ if (!ptr)
+ return false;
+
+ /*
+ * access_ok() only checks that the range lies within the user address
+ * space. It does not distinguish read from write access, so @write is
+ * currently unused.
+ */
+ return access_ok(ptr, size);
+}
+
+/**
+ * kapi_validate_user_ptr_with_params - Validate user pointer with dynamic size
+ * @param_spec: Parameter specification
+ * @ptr: User pointer to validate
+ * @all_params: Array of all parameter values
+ * @param_count: Number of parameters
+ *
+ * Return: true if user memory is accessible, false otherwise
+ */
+static bool kapi_validate_user_ptr_with_params(const struct kapi_param_spec *param_spec,
+ const void __user *ptr,
+ const s64 *all_params,
+ int param_count)
+{
+ size_t actual_size;
+ bool write;
+
+ /* NULL is allowed for optional parameters */
+ if (!ptr && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+ return true;
+
+ /* Calculate actual size based on related parameter */
+ if (param_spec->size_param_idx >= 0 &&
+ param_spec->size_param_idx < param_count) {
+ s64 count = all_params[param_spec->size_param_idx];
+
+ /* Validate count is positive */
+ if (count <= 0) {
+ pr_warn("Parameter %s: size determinant is non-positive (%lld)\n",
+ param_spec->name, count);
+ return false;
+ }
+
+ /* Check for multiplication overflow */
+ if (param_spec->size_multiplier > 0 &&
+ count > SIZE_MAX / param_spec->size_multiplier) {
+ pr_warn("Parameter %s: size calculation overflow\n",
+ param_spec->name);
+ return false;
+ }
+
+ actual_size = count * param_spec->size_multiplier;
+ } else {
+ /* Use fixed size */
+ actual_size = param_spec->size;
+ }
+
+ write = (param_spec->flags & KAPI_PARAM_OUT) ||
+ (param_spec->flags & KAPI_PARAM_INOUT);
+
+ return kapi_validate_user_ptr(ptr, actual_size, write);
+}
+
+/**
+ * kapi_validate_path - Validate that a pathname is accessible and within limits
+ * @path: User pointer to pathname
+ * @param_spec: Parameter specification
+ *
+ * Return: true if path is valid, false otherwise
+ */
+static bool kapi_validate_path(const char __user *path,
+ const struct kapi_param_spec *param_spec)
+{
+ size_t len;
+
+ /* NULL is allowed for optional parameters */
+ if (!path && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+ return true;
+
+ if (!path) {
+ pr_warn("Parameter %s: NULL path not allowed\n", param_spec->name);
+ return false;
+ }
+
+ /* Check if the path is accessible */
+ if (!access_ok(path, 1)) {
+ pr_warn("Parameter %s: path pointer %p not accessible\n",
+ param_spec->name, path);
+ return false;
+ }
+
+ /* Use strnlen_user to get the length and validate accessibility */
+ len = strnlen_user(path, PATH_MAX + 1);
+ if (len == 0) {
+ pr_warn("Parameter %s: invalid path pointer %p\n",
+ param_spec->name, path);
+ return false;
+ }
+
+ /* Check path length limit */
+ if (len > PATH_MAX) {
+ pr_warn("Parameter %s: path too long (exceeds PATH_MAX)\n",
+ param_spec->name);
+ return false;
+ }
+
+ return true;
+}
+
+/**
+ * kapi_validate_param - Validate a parameter against its specification
+ * @param_spec: Parameter specification
+ * @value: Parameter value to validate
+ *
+ * Return: true if valid, false otherwise
+ */
+bool kapi_validate_param(const struct kapi_param_spec *param_spec, s64 value)
+{
+ int i;
+
+ /* Special handling for file descriptor type */
+ if (param_spec->type == KAPI_TYPE_FD) {
+ if (!kapi_validate_fd((int)value)) {
+ pr_warn("Parameter %s: invalid file descriptor %lld\n",
+ param_spec->name, value);
+ return false;
+ }
+ /* Continue with additional constraint checks if needed */
+ }
+
+ /* Special handling for user pointer type */
+ if (param_spec->type == KAPI_TYPE_USER_PTR) {
+ const void __user *ptr = (const void __user *)value;
+ bool write = (param_spec->flags & KAPI_PARAM_OUT) ||
+ (param_spec->flags & KAPI_PARAM_INOUT);
+
+ /* NULL is allowed for optional parameters */
+ if (!ptr && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+ return true;
+
+ if (!kapi_validate_user_ptr(ptr, param_spec->size, write)) {
+ pr_warn("Parameter %s: invalid user pointer %p (size: %zu, %s)\n",
+ param_spec->name, ptr, param_spec->size,
+ write ? "write" : "read");
+ return false;
+ }
+ /* Continue with additional constraint checks if needed */
+ }
+
+ /* Special handling for path type */
+ if (param_spec->type == KAPI_TYPE_PATH) {
+ const char __user *path = (const char __user *)value;
+
+ if (!kapi_validate_path(path, param_spec)) {
+ return false;
+ }
+ /* Continue with additional constraint checks if needed */
+ }
+
+ switch (param_spec->constraint_type) {
+ case KAPI_CONSTRAINT_NONE:
+ return true;
+
+ case KAPI_CONSTRAINT_RANGE:
+ if (value < param_spec->min_value || value > param_spec->max_value) {
+ pr_warn("Parameter %s value %lld out of range [%lld, %lld]\n",
+ param_spec->name, value,
+ param_spec->min_value, param_spec->max_value);
+ return false;
+ }
+ return true;
+
+ case KAPI_CONSTRAINT_MASK:
+ if (value & ~param_spec->valid_mask) {
+ pr_warn("Parameter %s value 0x%llx contains invalid bits (valid mask: 0x%llx)\n",
+ param_spec->name, value, param_spec->valid_mask);
+ return false;
+ }
+ return true;
+
+ case KAPI_CONSTRAINT_ENUM:
+ if (!param_spec->enum_values || param_spec->enum_count == 0)
+ return true;
+
+ for (i = 0; i < param_spec->enum_count; i++) {
+ if (value == param_spec->enum_values[i])
+ return true;
+ }
+ pr_warn("Parameter %s value %lld not in valid enumeration\n",
+ param_spec->name, value);
+ return false;
+
+ case KAPI_CONSTRAINT_CUSTOM:
+ if (param_spec->validate)
+ return param_spec->validate(value);
+ return true;
+
+ default:
+ return true;
+ }
+}
+EXPORT_SYMBOL_GPL(kapi_validate_param);
+
+/**
+ * kapi_validate_param_with_context - Validate parameter with access to all params
+ * @param_spec: Parameter specification
+ * @value: Parameter value to validate
+ * @all_params: Array of all parameter values
+ * @param_count: Number of parameters
+ *
+ * Return: true if valid, false otherwise
+ */
+bool kapi_validate_param_with_context(const struct kapi_param_spec *param_spec,
+ s64 value, const s64 *all_params, int param_count)
+{
+ /* Special handling for user pointer type with dynamic sizing */
+ if (param_spec->type == KAPI_TYPE_USER_PTR) {
+ const void __user *ptr = (const void __user *)value;
+
+ /* NULL is allowed for optional parameters */
+ if (!ptr && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+ return true;
+
+ if (!kapi_validate_user_ptr_with_params(param_spec, ptr, all_params, param_count)) {
+ pr_warn("Parameter %s: invalid user pointer %p\n",
+ param_spec->name, ptr);
+ return false;
+ }
+ /* Continue with additional constraint checks if needed */
+ }
+
+ /* All types then go through the regular validation and constraint checks */
+ return kapi_validate_param(param_spec, value);
+}
+EXPORT_SYMBOL_GPL(kapi_validate_param_with_context);
+
+/**
+ * kapi_validate_syscall_param - Validate syscall parameter with enforcement
+ * @spec: API specification
+ * @param_idx: Parameter index
+ * @value: Parameter value
+ *
+ * Return: -EINVAL if invalid and @spec is a syscall ("sys_" prefix), 0 otherwise
+ */
+int kapi_validate_syscall_param(const struct kernel_api_spec *spec,
+ int param_idx, s64 value)
+{
+ const struct kapi_param_spec *param_spec;
+
+ if (!spec || param_idx >= spec->param_count)
+ return 0;
+
+ param_spec = &spec->params[param_idx];
+
+ if (!kapi_validate_param(param_spec, value)) {
+ if (strncmp(spec->name, "sys_", 4) == 0) {
+ /* For syscalls, we can return EINVAL to userspace */
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_syscall_param);
+
+/**
+ * kapi_validate_syscall_params - Validate all syscall parameters together
+ * @spec: API specification
+ * @params: Array of parameter values
+ * @param_count: Number of parameters
+ *
+ * Return: -EINVAL on count mismatch or invalid syscall parameter, 0 otherwise
+ */
+int kapi_validate_syscall_params(const struct kernel_api_spec *spec,
+ const s64 *params, int param_count)
+{
+ int i;
+
+ if (!spec || !params)
+ return 0;
+
+ /* Validate that we have the expected number of parameters */
+ if (param_count != spec->param_count) {
+ pr_warn("API %s: parameter count mismatch (expected %u, got %d)\n",
+ spec->name, spec->param_count, param_count);
+ return -EINVAL;
+ }
+
+ /* Validate each parameter with context */
+ for (i = 0; i < spec->param_count && i < KAPI_MAX_PARAMS; i++) {
+ const struct kapi_param_spec *param_spec = &spec->params[i];
+
+ if (!kapi_validate_param_with_context(param_spec, params[i], params, param_count)) {
+ if (strncmp(spec->name, "sys_", 4) == 0) {
+ /* For syscalls, we can return EINVAL to userspace */
+ return -EINVAL;
+ }
+ }
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_syscall_params);
+
+/**
+ * kapi_check_return_success - Check if return value indicates success
+ * @return_spec: Return specification
+ * @retval: Return value to check
+ *
+ * Returns true if the return value indicates success according to the spec.
+ */
+bool kapi_check_return_success(const struct kapi_return_spec *return_spec, s64 retval)
+{
+ u32 i;
+
+ if (!return_spec)
+ return true; /* No spec means we can't validate */
+
+ switch (return_spec->check_type) {
+ case KAPI_RETURN_EXACT:
+ return retval == return_spec->success_value;
+
+ case KAPI_RETURN_RANGE:
+ return retval >= return_spec->success_min &&
+ retval <= return_spec->success_max;
+
+ case KAPI_RETURN_ERROR_CHECK:
+ /* Success if NOT in error list */
+ if (return_spec->error_values) {
+ for (i = 0; i < return_spec->error_count; i++) {
+ if (retval == return_spec->error_values[i])
+ return false; /* Found in error list */
+ }
+ }
+ return true; /* Not in error list = success */
+
+ case KAPI_RETURN_FD:
+ /* File descriptors: >= 0 is success, < 0 is error */
+ return retval >= 0;
+
+ case KAPI_RETURN_CUSTOM:
+ if (return_spec->is_success)
+ return return_spec->is_success(retval);
+ fallthrough;
+
+ default:
+ return true; /* Unknown check type, assume success */
+ }
+}
+EXPORT_SYMBOL_GPL(kapi_check_return_success);
+
+/**
+ * kapi_validate_return_value - Validate that return value matches spec
+ * @spec: API specification
+ * @retval: Return value to validate
+ *
+ * Return: true if return value is valid according to spec, false otherwise.
+ *
+ * This function checks:
+ * 1. If the value indicates success, it must match the success criteria
+ * 2. If the value indicates error, it must be one of the specified error codes
+ */
+bool kapi_validate_return_value(const struct kernel_api_spec *spec, s64 retval)
+{
+ int i;
+ bool is_success;
+
+ if (!spec)
+ return true; /* No spec means we can't validate */
+
+ /* First check if this is a success return */
+ is_success = kapi_check_return_success(&spec->return_spec, retval);
+
+ if (is_success) {
+ /*
+ * Success case. For FD returns, additionally check that the
+ * returned descriptor is actually open in the current context.
+ */
+ if (spec->return_spec.check_type == KAPI_RETURN_FD &&
+ !kapi_validate_fd((int)retval)) {
+ pr_warn("API %s returned invalid file descriptor %lld\n",
+ spec->name, retval);
+ return false;
+ }
+
+ /* All other success values were validated above */
+ return true;
+ }
+
+ /* Error case - check if it's one of the specified errors */
+ if (spec->error_count == 0) {
+ /* No errors specified, so any error is potentially valid */
+ pr_debug("API %s returned unspecified error %lld\n",
+ spec->name, retval);
+ return true;
+ }
+
+ /* Check if the error is in our list of specified errors */
+ for (i = 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+ if (retval == spec->errors[i].error_code)
+ return true;
+ }
+
+ /* Error not in spec */
+ pr_warn("API %s returned unspecified error code %lld. Valid errors are:\n",
+ spec->name, retval);
+ for (i = 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+ pr_warn(" %s (%d): %s\n",
+ spec->errors[i].name,
+ spec->errors[i].error_code,
+ spec->errors[i].condition);
+ }
+
+ return false;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_return_value);
+
+/**
+ * kapi_validate_syscall_return - Validate syscall return value with enforcement
+ * @spec: API specification
+ * @retval: Return value
+ *
+ * Return: always 0; a mismatch is reported via WARN_ONCE() but not enforced
+ *
+ * For syscalls, this can help detect kernel bugs where unspecified error
+ * codes are returned to userspace.
+ */
+int kapi_validate_syscall_return(const struct kernel_api_spec *spec, s64 retval)
+{
+ if (!spec)
+ return 0;
+
+ if (!kapi_validate_return_value(spec, retval)) {
+ /* Log the violation but don't change the return value */
+ WARN_ONCE(1, "Syscall %s returned unspecified value %lld\n",
+ spec->name, retval);
+ /* Could return -EINVAL here to enforce, but that might break userspace */
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_syscall_return);
+
+/**
+ * kapi_check_context - Check if current context matches API requirements
+ * @spec: API specification to check against
+ */
+void kapi_check_context(const struct kernel_api_spec *spec)
+{
+ u32 ctx = spec->context_flags;
+ bool valid = false;
+
+ if (!ctx)
+ return;
+
+ /* Check if we're in an allowed context */
+ if ((ctx & KAPI_CTX_PROCESS) && !in_interrupt())
+ valid = true;
+
+ if ((ctx & KAPI_CTX_SOFTIRQ) && in_softirq())
+ valid = true;
+
+ if ((ctx & KAPI_CTX_HARDIRQ) && in_hardirq())
+ valid = true;
+
+ if ((ctx & KAPI_CTX_NMI) && in_nmi())
+ valid = true;
+
+ if (!valid) {
+ WARN_ONCE(1, "API %s called from invalid context\n", spec->name);
+ }
+
+ /* Check specific requirements */
+ if ((ctx & KAPI_CTX_ATOMIC) && preemptible()) {
+ WARN_ONCE(1, "API %s requires atomic context\n", spec->name);
+ }
+
+ if ((ctx & KAPI_CTX_SLEEPABLE) && !preemptible()) {
+ WARN_ONCE(1, "API %s requires sleepable context\n", spec->name);
+ }
+}
+EXPORT_SYMBOL_GPL(kapi_check_context);
+
+#endif /* CONFIG_KAPI_RUNTIME_CHECKS */
+
+/* DebugFS interface */
+#ifdef CONFIG_DEBUG_FS
+
+static struct dentry *kapi_debugfs_root;
+
+static int kapi_spec_show(struct seq_file *s, void *v)
+{
+ struct kernel_api_spec *spec = s->private;
+ char *buf;
+ int ret;
+
+ buf = kmalloc(PAGE_SIZE * 4, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ ret = kapi_export_json(spec, buf, PAGE_SIZE * 4);
+ if (ret > 0)
+ seq_puts(s, buf);
+
+ kfree(buf);
+ return 0;
+}
+
+static int kapi_spec_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, kapi_spec_show, inode->i_private);
+}
+
+static const struct file_operations kapi_spec_fops = {
+ .open = kapi_spec_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static int kapi_list_show(struct seq_file *s, void *v)
+{
+ struct kernel_api_spec *spec;
+ struct dynamic_api_spec *dyn_spec;
+
+ seq_puts(s, "Kernel API Specifications:\n\n");
+
+ /* List static specifications */
+ seq_puts(s, "Static APIs:\n");
+ for (spec = __start_kapi_specs; spec < __stop_kapi_specs; spec++) {
+ seq_printf(s, " %s (v%u): %s\n",
+ spec->name, spec->version, spec->description);
+ }
+
+ /* List dynamic specifications */
+ seq_puts(s, "\nDynamic APIs:\n");
+ mutex_lock(&api_spec_mutex);
+ list_for_each_entry(dyn_spec, &dynamic_api_specs, list) {
+ spec = dyn_spec->spec;
+ seq_printf(s, " %s (v%u): %s\n",
+ spec->name, spec->version, spec->description);
+ }
+ mutex_unlock(&api_spec_mutex);
+
+ return 0;
+}
+
+static int kapi_list_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, kapi_list_show, NULL);
+}
+
+static const struct file_operations kapi_list_fops = {
+ .open = kapi_list_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static int __init kapi_debugfs_init(void)
+{
+ struct kernel_api_spec *spec;
+ struct dentry *spec_dir;
+
+ kapi_debugfs_root = debugfs_create_dir("kapi", NULL);
+ if (IS_ERR(kapi_debugfs_root))
+ return PTR_ERR(kapi_debugfs_root);
+
+ /* Create list file */
+ debugfs_create_file("list", 0444, kapi_debugfs_root, NULL,
+ &kapi_list_fops);
+
+ /* Create directory for specifications */
+ spec_dir = debugfs_create_dir("specs", kapi_debugfs_root);
+
+ /* Create files for each static specification */
+ for (spec = __start_kapi_specs; spec < __stop_kapi_specs; spec++) {
+ debugfs_create_file(spec->name, 0444, spec_dir, spec,
+ &kapi_spec_fops);
+ }
+
+ return 0;
+}
+
+late_initcall(kapi_debugfs_init);
+
+#endif /* CONFIG_DEBUG_FS */
\ No newline at end of file
--
2.39.5
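Note on the dynamic-size check in kapi_validate_user_ptr_with_params() above: it amounts to an overflow-guarded multiplication of a count parameter by a per-element size. The following is a minimal userspace sketch of that one step, not the patch's code. dyn_size() is a hypothetical name, and the explicit unsigned cast is a choice made for this sketch so the comparison does not mix signed and unsigned operands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace helper (not part of the series): compute
 * count * multiplier into *out. Rejects non-positive counts and
 * products that would not fit in size_t.
 */
static bool dyn_size(int64_t count, size_t multiplier, size_t *out)
{
	/* Mirrors the "size determinant is non-positive" rejection */
	if (count <= 0)
		return false;

	/* count is positive here, so the cast to unsigned is lossless */
	if (multiplier > 0 && (uint64_t)count > SIZE_MAX / multiplier)
		return false;

	*out = (size_t)count * multiplier;
	return true;
}
```

With a count of 4 and a multiplier of 16 the helper reports 64 bytes; a count of 0 or a product beyond SIZE_MAX is rejected before any pointer check would run.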
* [RFC v2 02/22] eventpoll: add API specification for epoll_create1
From: Sasha Levin @ 2025-06-24 18:07 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
Add kernel API specification for the epoll_create1() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
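Note: the flags constraint in this spec is an enumeration of exactly two values, 0 and EPOLL_CLOEXEC. A rough userspace sketch of how such an enum constraint can be evaluated is shown below. The names enum_ok() and epoll_create1_flags_ok() are hypothetical and not part of the series. One difference from the KAPI_CONSTRAINT_ENUM branch in patch 01: this sketch rejects everything when the list is empty, whereas the kernel code accepts any value in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Same two values as epoll_create1_valid_values in the patch */
static const int64_t valid_flags[] = { 0, EPOLL_CLOEXEC };

/* Hypothetical helper: true if value appears in allowed[0..n-1] */
static bool enum_ok(int64_t value, const int64_t *allowed, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (value == allowed[i])
			return true;
	}
	return false;
}

/* Hypothetical wrapper applying the list to a flags argument */
static bool epoll_create1_flags_ok(int64_t flags)
{
	return enum_ok(flags, valid_flags,
		       sizeof(valid_flags) / sizeof(valid_flags[0]));
}
```

Because the check is an exact match rather than a mask test, a value such as EPOLL_CLOEXEC | 1 is rejected, which matches the "Must be 0 or EPOLL_CLOEXEC" wording of the constraint.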
fs/eventpoll.c | 124 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 124 insertions(+)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index d4dbffdedd08e..620de3ccc7708 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -21,6 +21,7 @@
#include <linux/hash.h>
#include <linux/spinlock.h>
#include <linux/syscalls.h>
+#include <linux/syscall_api_spec.h>
#include <linux/rbtree.h>
#include <linux/wait.h>
#include <linux/eventpoll.h>
@@ -2265,6 +2266,129 @@ static int do_epoll_create(int flags)
return error;
}
+
+/* Valid values for epoll_create1 flags parameter */
+static const s64 epoll_create1_valid_values[] = { 0, EPOLL_CLOEXEC };
+
+DEFINE_KERNEL_API_SPEC(sys_epoll_create1)
+ KAPI_DESCRIPTION("Create an epoll instance")
+ KAPI_LONG_DESC("Creates a new epoll instance and returns a file descriptor "
+ "referring to that instance. The file descriptor is used for all "
+ "subsequent calls to the epoll interface.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "flags", "int", "Creation flags for the epoll instance")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_ENUM)
+ KAPI_PARAM_ENUM_VALUES(epoll_create1_valid_values)
+ KAPI_PARAM_CONSTRAINT("Must be 0 or EPOLL_CLOEXEC")
+ KAPI_PARAM_END
+
+ KAPI_RETURN("long", "File descriptor on success, negative error code on failure")
+ KAPI_RETURN_TYPE(KAPI_TYPE_INT)
+ KAPI_RETURN_CHECK_TYPE(KAPI_RETURN_FD)
+ KAPI_RETURN_END
+
+ KAPI_ERROR(0, -EINVAL, "EINVAL", "Invalid flags specified",
+ "The flags parameter contains invalid values. Only EPOLL_CLOEXEC is allowed.")
+ KAPI_ERROR(1, -EMFILE, "EMFILE", "Per-process file descriptor limit reached",
+ "The per-process limit on the number of open file descriptors has been reached.")
+ KAPI_ERROR(2, -ENFILE, "ENFILE", "System file table overflow",
+ "The system-wide limit on the total number of open files has been reached.")
+ KAPI_ERROR(3, -ENOMEM, "ENOMEM", "Insufficient kernel memory",
+ "There was insufficient kernel memory to create the epoll instance.")
+ KAPI_ERROR(4, -EINTR, "EINTR", "Interrupted by signal",
+ "The system call was interrupted by a signal before the epoll instance could be created.")
+
+ .error_count = 5,
+ .param_count = 1,
+ .since_version = "2.6.27",
+ KAPI_EXAMPLES("int epfd = epoll_create1(EPOLL_CLOEXEC);")
+ KAPI_NOTES("EPOLL_CLOEXEC sets the close-on-exec (FD_CLOEXEC) flag on the new file descriptor. "
+ "When all file descriptors referring to an epoll instance are closed, the kernel "
+ "destroys the instance and releases associated resources. "
+ "Memory consumption: Each registered fd uses approximately 90 bytes on 32-bit kernels "
+ "and 160 bytes on 64-bit kernels. The total number of file descriptors registered "
+ "across all epoll instances is limited by /proc/sys/fs/epoll/max_user_watches. "
+ "When using dup() or fork(), multiple file descriptors may refer to the same epoll "
+ "instance and all will receive events.")
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_RESOURCE_CREATE | KAPI_EFFECT_ALLOC_MEMORY,
+ "epoll instance",
+ "Creates a new epoll instance and allocates kernel memory for it")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_RESOURCE_CREATE,
+ "file descriptor",
+ "Allocates a new file descriptor in the process's file descriptor table")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(2)
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "epoll instance", "non-existent", "created and empty",
+ "A new epoll instance is created with no monitored file descriptors")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS_COUNT(1)
+
+ /* Signal specifications */
+ KAPI_SIGNAL(0, SIGINT, "SIGINT", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("During creation if process receives SIGINT")
+ KAPI_SIGNAL_DESC("If interrupted during kernel memory allocation, returns -EINTR")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_PRIORITY(1)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_ERROR(-EINTR)
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING | KAPI_SIGNAL_STATE_SLEEPING)
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(1, SIGTERM, "SIGTERM", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("During creation if process receives SIGTERM")
+ KAPI_SIGNAL_DESC("If interrupted during kernel memory allocation, returns -EINTR")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_PRIORITY(1)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_ERROR(-EINTR)
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING | KAPI_SIGNAL_STATE_SLEEPING)
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(2, SIGKILL, "SIGKILL", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_TERMINATE)
+ KAPI_SIGNAL_CONDITION("At any point during the syscall")
+ KAPI_SIGNAL_DESC("Process is terminated immediately, epoll instance creation may be incomplete")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ANYTIME)
+ KAPI_SIGNAL_PRIORITY(0)
+ KAPI_SIGNAL_QUEUE("uncatchable")
+ KAPI_SIGNAL_END
+
+ .signal_count = 3,
+
+ /* Additional constraints */
+ KAPI_CONSTRAINT(0, "User Watch Limit",
+ "Although epoll_create1() itself doesn't register any watches, the "
+ "user is subject to a global limit on total watches across all epoll "
+ "instances. This limit is configured via /proc/sys/fs/epoll/max_user_watches "
+ "(default: 1/25 of lowmem or 1/32 of total memory). Each registered "
+ "file descriptor counts against this limit.")
+ KAPI_CONSTRAINT_EXPR("current_user_watches < max_user_watches")
+ KAPI_CONSTRAINT_END
+
+ KAPI_CONSTRAINT(1, "Memory Accounting",
+ "Each epoll instance consumes kernel memory that is not swappable. "
+ "The instance itself uses approximately 1KB, plus additional memory "
+ "for each registered file descriptor (90 bytes on 32-bit, 160 bytes "
+ "on 64-bit systems). This memory is charged to the user's locked "
+ "memory limit if memory cgroups are enabled.")
+ KAPI_CONSTRAINT_END
+
+ KAPI_CONSTRAINT_COUNT(2)
+
+KAPI_END_SPEC;
SYSCALL_DEFINE1(epoll_create1, int, flags)
{
return do_epoll_create(flags);
--
2.39.5
* [RFC v2 03/22] eventpoll: add API specification for epoll_create
From: Sasha Levin @ 2025-06-24 18:07 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
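Add kernel API specification for the epoll_create() system call.
While at it, add the KAPI_DEPRECATED and KAPI_REPLACEMENT helper macros
and switch the epoll_create1() spec to the KAPI_ERROR_COUNT(),
KAPI_PARAM_COUNT() and KAPI_SIGNAL_COUNT() macros.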
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
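Note: this spec combines a range constraint on size (1 to INT_MAX) with a KAPI_RETURN_FD return check and five listed error codes. The sketch below shows, in plain userspace C, how those two checks could be evaluated together. It is an illustration under the assumption that an error return is only considered valid when it appears in the listed errors; size_ok() and fd_return_matches_spec() are hypothetical names, not functions from the series.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Error codes listed by the epoll_create spec in this patch */
static const int64_t epoll_create_errors[] = {
	-EINVAL, -EMFILE, -ENFILE, -ENOMEM, -EINTR,
};

/* Range constraint from the spec: 1 <= size <= INT_MAX */
static bool size_ok(int64_t size)
{
	return size >= 1 && size <= INT_MAX;
}

/*
 * FD-style return check: any value >= 0 counts as success, a negative
 * value must be one of the listed error codes.
 */
static bool fd_return_matches_spec(int64_t retval)
{
	size_t n = sizeof(epoll_create_errors) / sizeof(epoll_create_errors[0]);

	if (retval >= 0)
		return true;

	for (size_t i = 0; i < n; i++) {
		if (retval == epoll_create_errors[i])
			return true;
	}
	return false;
}
```

Under these assumptions a return of -EPERM would be flagged as unspecified, which is the kind of mismatch kapi_validate_return_value() in patch 01 reports with pr_warn().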
fs/eventpoll.c | 122 +++++++++++++++++++++++++++++++-
include/linux/kernel_api_spec.h | 13 ++++
2 files changed, 132 insertions(+), 3 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 620de3ccc7708..c3c16f8e6ac7d 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2300,8 +2300,8 @@ DEFINE_KERNEL_API_SPEC(sys_epoll_create1)
KAPI_ERROR(4, -EINTR, "EINTR", "Interrupted by signal",
"The system call was interrupted by a signal before the epoll instance could be created.")
- .error_count = 5,
- .param_count = 1,
+ KAPI_ERROR_COUNT(5)
+ KAPI_PARAM_COUNT(1)
.since_version = "2.6.27",
KAPI_EXAMPLES("int epfd = epoll_create1(EPOLL_CLOEXEC);")
KAPI_NOTES("EPOLL_CLOEXEC sets the close-on-exec (FD_CLOEXEC) flag on the new file descriptor. "
@@ -2366,7 +2366,7 @@ DEFINE_KERNEL_API_SPEC(sys_epoll_create1)
KAPI_SIGNAL_QUEUE("uncatchable")
KAPI_SIGNAL_END
- .signal_count = 3,
+ KAPI_SIGNAL_COUNT(3)
/* Additional constraints */
KAPI_CONSTRAINT(0, "User Watch Limit",
@@ -2394,6 +2394,122 @@ SYSCALL_DEFINE1(epoll_create1, int, flags)
return do_epoll_create(flags);
}
+
+DEFINE_KERNEL_API_SPEC(sys_epoll_create)
+ KAPI_DESCRIPTION("Create an epoll instance (obsolete)")
+ KAPI_LONG_DESC("Creates a new epoll instance and returns a file descriptor "
+ "referring to that instance. This is the obsolete interface; "
+ "new applications should use epoll_create1() instead.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "size", "int", "Ignored hint about expected number of file descriptors")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_RANGE(1, INT_MAX)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ KAPI_PARAM_CONSTRAINT("Must be greater than zero (ignored since Linux 2.6.8)")
+ KAPI_PARAM_END
+
+ KAPI_RETURN("long", "File descriptor on success, negative error code on failure")
+ KAPI_RETURN_TYPE(KAPI_TYPE_INT)
+ KAPI_RETURN_CHECK_TYPE(KAPI_RETURN_FD)
+ KAPI_RETURN_END
+
+ KAPI_ERROR(0, -EINVAL, "EINVAL", "size <= 0",
+ "The size parameter must be greater than zero.")
+ KAPI_ERROR(1, -EMFILE, "EMFILE", "Per-process file descriptor limit reached",
+ "The per-process limit on the number of open file descriptors has been reached.")
+ KAPI_ERROR(2, -ENFILE, "ENFILE", "System file table overflow",
+ "The system-wide limit on the total number of open files has been reached.")
+ KAPI_ERROR(3, -ENOMEM, "ENOMEM", "Insufficient kernel memory",
+ "There was insufficient kernel memory to create the epoll instance.")
+ KAPI_ERROR(4, -EINTR, "EINTR", "Interrupted by signal",
+ "The system call was interrupted by a signal before the epoll instance could be created.")
+
+ KAPI_ERROR_COUNT(5)
+ KAPI_PARAM_COUNT(1)
+ KAPI_SINCE_VERSION("2.6")
+ KAPI_DEPRECATED
+ KAPI_REPLACEMENT("epoll_create1")
+ KAPI_EXAMPLES("int epfd = epoll_create(1024); // size is ignored since Linux 2.6.8")
+ KAPI_NOTES("Since Linux 2.6.8, the size argument is ignored but must be greater than zero. "
+ "The kernel dynamically sizes the data structures as needed. "
+ "For new applications, epoll_create1() should be preferred as it allows "
+ "setting close-on-exec flag atomically. "
+ "Memory consumption: Each registered fd uses approximately 90 bytes on 32-bit kernels "
+ "and 160 bytes on 64-bit kernels. The total number of file descriptors registered "
+ "across all epoll instances is limited by /proc/sys/fs/epoll/max_user_watches. "
+ "When using dup() or fork(), multiple file descriptors may refer to the same epoll "
+ "instance and all will receive events.")
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_RESOURCE_CREATE | KAPI_EFFECT_ALLOC_MEMORY,
+ "epoll instance",
+ "Creates a new epoll instance and allocates kernel memory for it")
+ KAPI_EFFECT_CONDITION("Always when successful")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_RESOURCE_CREATE,
+ "file descriptor",
+ "Allocates a new file descriptor in the process's file descriptor table")
+ KAPI_EFFECT_CONDITION("Always when successful")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_MODIFY_STATE,
+ "kernel file table",
+ "Adds new file structure to system-wide file table")
+ KAPI_EFFECT_CONDITION("Always when successful")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(3)
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "epoll instance", "non-existent", "created and empty",
+ "A new epoll instance is created with no monitored file descriptors")
+ KAPI_STATE_TRANS_COND("On successful creation")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "file descriptor", "unallocated", "allocated and open",
+ "A new file descriptor is allocated in the process's fd table")
+ KAPI_STATE_TRANS_COND("On successful creation")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS_COUNT(2)
+
+ /* Signal specifications */
+ KAPI_SIGNAL(0, SIGINT, "SIGINT", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("During kernel memory allocation")
+ KAPI_SIGNAL_DESC("If interrupted during memory allocation or fd allocation, returns -EINTR")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(1, SIGTERM, "SIGTERM", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("During kernel memory allocation")
+ KAPI_SIGNAL_DESC("If interrupted during memory allocation or fd allocation, returns -EINTR")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(2, SIGKILL, "SIGKILL", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_TERMINATE)
+ KAPI_SIGNAL_CONDITION("At any point during the syscall")
+ KAPI_SIGNAL_DESC("Process is terminated immediately, epoll instance creation may be incomplete")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(3, SIGHUP, "SIGHUP", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("During kernel operations")
+ KAPI_SIGNAL_DESC("If process is being terminated due to terminal hangup, may return -EINTR")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(4, SIGPIPE, "SIGPIPE", KAPI_SIGNAL_IGNORE, KAPI_SIGNAL_ACTION_DEFAULT)
+ KAPI_SIGNAL_CONDITION("Never generated by epoll_create")
+ KAPI_SIGNAL_DESC("This signal is not relevant to epoll_create as it doesn't involve pipes")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL_COUNT(5)
+KAPI_END_SPEC;
+
SYSCALL_DEFINE1(epoll_create, int, size)
{
if (size <= 0)
diff --git a/include/linux/kernel_api_spec.h b/include/linux/kernel_api_spec.h
index d8439d411f41e..1ee76a5f3ee1f 100644
--- a/include/linux/kernel_api_spec.h
+++ b/include/linux/kernel_api_spec.h
@@ -963,6 +963,19 @@ struct kernel_api_spec {
#define KAPI_NOTES(n) \
.notes = n,
+/**
+ * KAPI_DEPRECATED - Mark API as deprecated
+ */
+#define KAPI_DEPRECATED \
+ .deprecated = true,
+
+/**
+ * KAPI_REPLACEMENT - Set replacement API for deprecated function
+ * @repl: Replacement API name
+ */
+#define KAPI_REPLACEMENT(repl) \
+ .replacement = repl,
+
/**
* KAPI_SIGNAL - Define a signal specification
* @idx: Signal index
--
2.39.5
* [RFC v2 04/22] eventpoll: add API specification for epoll_ctl
From: Sasha Levin @ 2025-06-24 18:07 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
Add kernel API specification for the epoll_ctl() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
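Not part of the commit: a small userspace sketch of the state transitions and
error codes the spec below describes (non-existent -> monitored -> modified
-> non-existent, plus EEXIST, ENOENT and EINVAL for epfd == fd). It assumes
the glibc wrappers and uses a pipe as the monitored file; the function name
epoll_ctl_lifecycle() is made up for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if every step behaves as described. */
int epoll_ctl_lifecycle(void)
{
	struct epoll_event ev = { .events = EPOLLIN };
	int pipefd[2];
	int epfd, ret = -1;

	if (pipe(pipefd) == -1)
		return -1;
	epfd = epoll_create1(0);
	if (epfd == -1)
		goto out_pipe;
	ev.data.fd = pipefd[0];

	/* non-existent -> monitored */
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == -1)
		goto out;
	/* a second ADD of the same fd is expected to fail with EEXIST */
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) != -1 ||
	    errno != EEXIST)
		goto out;
	/* monitored with events A -> monitored with events B */
	ev.events = EPOLLIN | EPOLLET;
	if (epoll_ctl(epfd, EPOLL_CTL_MOD, pipefd[0], &ev) == -1)
		goto out;
	/* monitored -> non-existent; event may be NULL for DEL */
	if (epoll_ctl(epfd, EPOLL_CTL_DEL, pipefd[0], NULL) == -1)
		goto out;
	/* DEL of an unregistered fd is expected to fail with ENOENT */
	if (epoll_ctl(epfd, EPOLL_CTL_DEL, pipefd[0], NULL) != -1 ||
	    errno != ENOENT)
		goto out;
	/* an epoll instance cannot monitor itself: EINVAL */
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, epfd, &ev) != -1 ||
	    errno != EINVAL)
		goto out;
	ret = 0;
out:
	close(epfd);
out_pipe:
	close(pipefd[0]);
	close(pipefd[1]);
	return ret;
}
```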
fs/eventpoll.c | 254 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 254 insertions(+)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index c3c16f8e6ac7d..b0334f1af0875 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2690,6 +2690,260 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
* the eventpoll file that enables the insertion/removal/change of
* file descriptors inside the interest set.
*/
+
+/* Valid values for epoll_ctl op parameter */
+static const s64 epoll_ctl_valid_ops[] = {
+ EPOLL_CTL_ADD,
+ EPOLL_CTL_DEL,
+ EPOLL_CTL_MOD,
+};
+
+DEFINE_KERNEL_API_SPEC(sys_epoll_ctl)
+ KAPI_DESCRIPTION("Control interface for an epoll file descriptor")
+ KAPI_LONG_DESC("Performs control operations on the epoll instance referred to by epfd. "
+ "It requests that the operation op be performed for the target file "
+ "descriptor fd. Valid operations are adding, modifying, or deleting "
+ "file descriptors from the interest set.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "epfd", "int", "File descriptor referring to the epoll instance")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_FD)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "op", "int", "Operation to be performed")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_INT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_ENUM)
+ .enum_values = epoll_ctl_valid_ops,
+ .enum_count = ARRAY_SIZE(epoll_ctl_valid_ops),
+ KAPI_PARAM_CONSTRAINT("Must be EPOLL_CTL_ADD, EPOLL_CTL_DEL, or EPOLL_CTL_MOD")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "fd", "int", "File descriptor to be monitored")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_FD)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_CONSTRAINT("Must refer to a file that supports poll operations")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(3, "event", "struct epoll_event __user *", "Settings for the file descriptor")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL)
+ KAPI_PARAM_TYPE(KAPI_TYPE_USER_PTR)
+ KAPI_PARAM_SIZE(sizeof(struct epoll_event))
+ KAPI_PARAM_CONSTRAINT("Required for ADD and MOD operations, ignored for DEL")
+ KAPI_PARAM_END
+
+ KAPI_RETURN("long", "0 on success, negative error code on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_EXACT,
+ KAPI_RETURN_SUCCESS(0)
+ KAPI_RETURN_END
+
+ KAPI_ERROR(0, -EBADF, "EBADF", "epfd or fd is not a valid file descriptor",
+ "One of the file descriptors is invalid or has been closed.")
+ KAPI_ERROR(1, -EEXIST, "EEXIST", "op is EPOLL_CTL_ADD and fd is already registered",
+ "The file descriptor is already present in the epoll instance.")
+ KAPI_ERROR(2, -EINVAL, "EINVAL", "Invalid operation or parameters",
+ "epfd is not an epoll file descriptor, epfd == fd, op is not valid, "
+ "or EPOLLEXCLUSIVE was specified with invalid events.")
+ KAPI_ERROR(3, -ENOENT, "ENOENT", "op is EPOLL_CTL_MOD or EPOLL_CTL_DEL and fd is not registered",
+ "The file descriptor is not registered with this epoll instance.")
+ KAPI_ERROR(4, -ENOMEM, "ENOMEM", "Insufficient kernel memory",
+ "There was insufficient memory to handle the requested operation.")
+ KAPI_ERROR(5, -EPERM, "EPERM", "Target file does not support epoll",
+ "The target file fd does not support poll operations.")
+ KAPI_ERROR(6, -ELOOP, "ELOOP", "Circular monitoring detected",
+ "fd refers to an epoll instance and this operation would result "
+ "in a circular loop of epoll instances monitoring one another.")
+ KAPI_ERROR(7, -EFAULT, "EFAULT", "event points outside accessible address space",
+ "The memory area pointed to by event is not accessible with read permissions.")
+ KAPI_ERROR(8, -EAGAIN, "EAGAIN", "Nonblocking mode and lock not available",
+ "The operation was requested in nonblocking mode (e.g. via io_uring) and could not acquire the necessary locks.")
+ KAPI_ERROR(9, -ENOSPC, "ENOSPC", "User epoll watch limit exceeded",
+ "The limit on the total number of epoll watches was exceeded. "
+ "See /proc/sys/fs/epoll/max_user_watches.")
+ KAPI_ERROR(10, -EINTR, "EINTR", "Interrupted by signal",
+ "The system call was interrupted by a signal before completion.")
+
+ .error_count = 11,
+ .param_count = 4,
+ .since_version = "2.6",
+
+ /* Locking specifications */
+ KAPI_LOCK(0, "ep->mtx", KAPI_LOCK_MUTEX)
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Protects the epoll instance during control operations")
+ KAPI_LOCK_END
+
+ KAPI_LOCK(1, "epnested_mutex", KAPI_LOCK_MUTEX)
+ KAPI_LOCK_DESC("Global mutex to prevent circular epoll structures (acquired for nested epoll)")
+ KAPI_LOCK_END
+
+ .lock_count = 2,
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY,
+ "epoll interest list",
+ "Adds new epitem structure to the epoll interest list")
+ KAPI_EFFECT_CONDITION("When op is EPOLL_CTL_ADD")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_FREE_MEMORY,
+ "epoll interest list",
+ "Removes epitem structure from the epoll interest list")
+ KAPI_EFFECT_CONDITION("When op is EPOLL_CTL_DEL")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_MODIFY_STATE,
+ "epoll event mask",
+ "Modifies the event mask for an existing epitem")
+ KAPI_EFFECT_CONDITION("When op is EPOLL_CTL_MOD")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_MODIFY_STATE,
+ "file reference count",
+ "Increases reference count on the monitored file")
+ KAPI_EFFECT_CONDITION("When op is EPOLL_CTL_ADD")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(4, KAPI_EFFECT_MODIFY_STATE,
+ "file reference count",
+ "Decreases reference count on the monitored file")
+ KAPI_EFFECT_CONDITION("When op is EPOLL_CTL_DEL")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(5, KAPI_EFFECT_SCHEDULE,
+ "process state",
+ "May wake up processes waiting on the epoll instance if events become available")
+ KAPI_EFFECT_CONDITION("When adding or modifying entries that match current events")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(6)
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "epoll entry", "non-existent", "monitored",
+ "File descriptor is added to epoll interest list")
+ KAPI_STATE_TRANS_COND("When op is EPOLL_CTL_ADD")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "epoll entry", "monitored", "non-existent",
+ "File descriptor is removed from epoll interest list")
+ KAPI_STATE_TRANS_COND("When op is EPOLL_CTL_DEL")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "epoll entry", "monitored with events A", "monitored with events B",
+ "Event mask for file descriptor is modified")
+ KAPI_STATE_TRANS_COND("When op is EPOLL_CTL_MOD")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(3, "epoll ready list", "empty or partial", "contains new events",
+ "Ready list may be updated if new/modified entry has pending events")
+ KAPI_STATE_TRANS_COND("When monitored fd has events matching the mask")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS_COUNT(4)
+
+ /* Signal specifications */
+ KAPI_SIGNAL(0, SIGINT, "SIGINT", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("During mutex acquisition or memory allocation")
+ KAPI_SIGNAL_DESC("Returns -EINTR if interrupted before completing the operation")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(1, SIGTERM, "SIGTERM", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("During mutex acquisition or memory allocation")
+ KAPI_SIGNAL_DESC("Returns -EINTR if interrupted before completing the operation")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(2, SIGKILL, "SIGKILL", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_TERMINATE)
+ KAPI_SIGNAL_CONDITION("At any point during the syscall")
+ KAPI_SIGNAL_DESC("Process is terminated immediately, operation may be partially completed")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(3, SIGHUP, "SIGHUP", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("During blocking operations")
+ KAPI_SIGNAL_DESC("Returns -EINTR if terminal hangup occurs")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(4, SIGURG, "SIGURG", KAPI_SIGNAL_IGNORE, KAPI_SIGNAL_ACTION_DEFAULT)
+ KAPI_SIGNAL_CONDITION("May be generated by monitored sockets")
+ KAPI_SIGNAL_DESC("Urgent data signals from monitored sockets do not affect epoll_ctl")
+ KAPI_SIGNAL_END
+
+ .signal_count = 5,
+
+ /* Capability specifications */
+ KAPI_CAPABILITY(0, CAP_BLOCK_SUSPEND, "CAP_BLOCK_SUSPEND", KAPI_CAP_GRANT_PERMISSION)
+ KAPI_CAP_ALLOWS("Use EPOLLWAKEUP flag to prevent system suspend while epoll events are pending")
+ KAPI_CAP_WITHOUT("EPOLLWAKEUP flag is silently ignored/stripped from events mask")
+ KAPI_CAP_CONDITION("Checked when EPOLLWAKEUP flag is set in epoll_event.events")
+ KAPI_CAP_PRIORITY(0)
+ KAPI_CAPABILITY_END
+
+ KAPI_CAPABILITY_COUNT(1)
+
+ KAPI_EXAMPLES("struct epoll_event ev;\n"
+ "ev.events = EPOLLIN | EPOLLOUT;\n"
+ "ev.data.fd = fd;\n"
+ "if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)\n"
+ " handle_error();\n")
+ KAPI_NOTES("EPOLL_CTL_DEL ignores the event parameter (can be NULL). "
+ "EPOLLEXCLUSIVE flag (Linux 4.5+) can only be used with EPOLL_CTL_ADD and may be "
+ "combined only with EPOLLIN, EPOLLOUT, EPOLLERR, EPOLLHUP, EPOLLWAKEUP and EPOLLET; "
+ "it requests that at least one waiter, rather than all of them, is woken for the event. "
+ "The epoll instance maintains a reference to registered files until they are "
+ "explicitly removed with EPOLL_CTL_DEL or the epoll instance is closed. "
+ "Memory consumption: Each registered fd uses approximately 90 bytes on 32-bit "
+ "kernels and 160 bytes on 64-bit kernels. "
+ "When using dup() or fork(), multiple epoll instances may monitor the same "
+ "underlying file and all will receive events for that file descriptor.")
+
+ /* Additional constraints */
+ KAPI_CONSTRAINT(0, "Nested Epoll Depth Limit",
+ "Epoll file descriptors can monitor other epoll fds, but nesting is "
+ "limited to EP_MAX_NESTS (4) levels deep to prevent kernel stack "
+ "overflow and excessive resource consumption. The limit is checked "
+ "during EPOLL_CTL_ADD operations.")
+ KAPI_CONSTRAINT_EXPR("epoll_nesting_depth <= 4")
+ KAPI_CONSTRAINT_END
+
+ KAPI_CONSTRAINT(1, "Circular Dependency Detection",
+ "The kernel detects and prevents circular epoll references where "
+ "epoll instances would monitor each other in a loop. This check "
+ "involves traversing the epoll tree and has O(N) complexity where "
+ "N is the number of epoll fds in the potential loop.")
+ KAPI_CONSTRAINT_END
+
+ KAPI_CONSTRAINT(2, "User Watch Limit",
+ "Total number of file descriptors a user can watch across all epoll "
+ "instances is limited by /proc/sys/fs/epoll/max_user_watches "
+ "(default: 4% of low memory divided by the per-watch cost). This prevents "
+ "memory exhaustion attacks.")
+ KAPI_CONSTRAINT_EXPR("user_watches + 1 <= max_user_watches")
+ KAPI_CONSTRAINT_END
+
+ KAPI_CONSTRAINT(3, "EPOLLEXCLUSIVE Restrictions",
+ "EPOLLEXCLUSIVE can only be used with EPOLL_CTL_ADD and may be "
+ "combined only with EPOLLIN, EPOLLOUT, EPOLLERR, EPOLLHUP, EPOLLWAKEUP "
+ "and EPOLLET. It cannot be combined with EPOLLONESHOT, and the target "
+ "fd must not itself be an epoll instance.")
+ KAPI_CONSTRAINT_EXPR("EPOLLEXCLUSIVE => (op == EPOLL_CTL_ADD && !(events & ~EPOLLEXCLUSIVE_OK_BITS))")
+ KAPI_CONSTRAINT_END
+
+ KAPI_CONSTRAINT_COUNT(4)
+
+KAPI_END_SPEC;
+
SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
struct epoll_event __user *, event)
{
--
2.39.5
* [RFC v2 05/22] eventpoll: add API specification for epoll_wait
2025-06-24 18:07 [RFC v2 00/22] Kernel API specification framework Sasha Levin
` (3 preceding siblings ...)
2025-06-24 18:07 ` [RFC v2 04/22] eventpoll: add API specification for epoll_ctl Sasha Levin
@ 2025-06-24 18:07 ` Sasha Levin
2025-06-24 18:07 ` [RFC v2 06/22] eventpoll: add API specification for epoll_pwait Sasha Levin
` (17 subsequent siblings)
22 siblings, 0 replies; 33+ messages in thread
From: Sasha Levin @ 2025-06-24 18:07 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
Add kernel API specification for the epoll_wait() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
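Not part of the commit: a userspace sketch of the behaviour specified below,
assuming the glibc wrappers and a pipe as the monitored file. It shows a
timeout of 0 returning immediately, EINVAL for maxevents <= 0, and a
level-triggered entry being reported on every call until the data is
drained. The names epoll_wait_retry() and epoll_wait_demo() are made up for
this example.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: call epoll_wait() again when a signal handler
 * interrupted it. Note that each retry starts the full timeout again.
 */
int epoll_wait_retry(int epfd, struct epoll_event *events, int maxevents,
		     int timeout_ms)
{
	int n;

	do {
		n = epoll_wait(epfd, events, maxevents, timeout_ms);
	} while (n == -1 && errno == EINTR);
	return n;
}

/* Hypothetical demo: returns 0 if every step behaves as described. */
int epoll_wait_demo(void)
{
	struct epoll_event ev = { .events = EPOLLIN };
	struct epoll_event out[4];
	int pipefd[2];
	int epfd, ret = -1;

	if (pipe(pipefd) == -1)
		return -1;
	epfd = epoll_create1(0);
	if (epfd == -1)
		goto out_pipe;
	ev.data.fd = pipefd[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == -1)
		goto out;

	/* nothing readable yet: timeout 0 returns at once with 0 events */
	if (epoll_wait_retry(epfd, out, 4, 0) != 0)
		goto out;
	/* maxevents must be greater than zero */
	if (epoll_wait(epfd, out, 0, 0) != -1 || errno != EINVAL)
		goto out;

	if (write(pipefd[1], "x", 1) != 1)
		goto out;
	/* level-triggered: the undrained pipe is reported on each call */
	for (int i = 0; i < 2; i++) {
		if (epoll_wait_retry(epfd, out, 4, 1000) != 1 ||
		    out[0].data.fd != pipefd[0] ||
		    !(out[0].events & EPOLLIN))
			goto out;
	}
	ret = 0;
out:
	close(epfd);
out_pipe:
	close(pipefd[0]);
	close(pipefd[1]);
	return ret;
}
```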
fs/eventpoll.c | 187 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 187 insertions(+)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index b0334f1af0875..dc2c7d7e777f3 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -3026,6 +3026,193 @@ static int do_epoll_wait(int epfd, struct epoll_event __user *events,
return ep_poll(ep, events, maxevents, to);
}
+
+DEFINE_KERNEL_API_SPEC(sys_epoll_wait)
+ KAPI_DESCRIPTION("Wait for events on an epoll instance")
+ KAPI_LONG_DESC("Waits for events on the epoll instance referred to by epfd. "
+ "The function blocks the calling thread until at least one of the "
+ "file descriptors registered with epfd becomes ready for some I/O operation, "
+ "the call is interrupted by a signal handler, or the timeout expires.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "epfd", "int", "File descriptor referring to the epoll instance")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_FD)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "events", "struct epoll_event __user *", "Buffer where ready events will be stored")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT | KAPI_PARAM_USER)
+ KAPI_PARAM_TYPE(KAPI_TYPE_USER_PTR)
+ KAPI_PARAM_SIZE(sizeof(struct epoll_event)) /* Base size of single element */
+ .size_param_idx = 2, /* Size determined by maxevents parameter */
+ .size_multiplier = sizeof(struct epoll_event),
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_CONSTRAINT("Must point to an array of at least maxevents epoll_event structures")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "maxevents", "int", "Maximum number of events to return")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_INT)
+ KAPI_PARAM_RANGE(1, INT_MAX / sizeof(struct epoll_event)) /* EP_MAX_EVENTS */
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ KAPI_PARAM_CONSTRAINT("Must be greater than zero and not exceed system limits")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(3, "timeout", "int", "Timeout in milliseconds")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_INT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_CONSTRAINT("-1 blocks indefinitely, 0 returns immediately, >0 specifies milliseconds to wait")
+ KAPI_PARAM_END
+
+ KAPI_RETURN("long", "Number of ready file descriptors on success, negative error code on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_RANGE,
+ .success_min = 0,
+ .success_max = INT_MAX,
+ KAPI_RETURN_END
+
+ KAPI_ERROR(0, -EBADF, "EBADF", "epfd is not a valid file descriptor",
+ "The epoll file descriptor is invalid or has been closed.")
+ KAPI_ERROR(1, -EFAULT, "EFAULT", "events points outside accessible address space",
+ "The memory area pointed to by events is not accessible with write permissions.")
+ KAPI_ERROR(2, -EINTR, "EINTR", "Call interrupted by signal handler",
+ "The call was interrupted by a signal handler before any events "
+ "became ready or the timeout expired.")
+ KAPI_ERROR(3, -EINVAL, "EINVAL", "Invalid parameters",
+ "epfd is not an epoll file descriptor, or maxevents is less than or equal to zero.")
+
+ .error_count = 4,
+ .param_count = 4,
+ .since_version = "2.6",
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_MODIFY_STATE,
+ "ready list",
+ "Removes events from the epoll ready list as they are reported")
+ KAPI_EFFECT_CONDITION("When events are available; level-triggered entries are re-queued after being reported")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_SCHEDULE,
+ "process state",
+ "Blocks the calling thread until events are available or timeout")
+ KAPI_EFFECT_CONDITION("When timeout != 0 and no events are immediately available")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_MODIFY_STATE,
+ "user memory",
+ "Writes event data to user-provided buffer")
+ KAPI_EFFECT_CONDITION("When events are available")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_PROCESS_STATE,
+ "signal state",
+ "Clears TIF_SIGPENDING if a signal was pending")
+ KAPI_EFFECT_CONDITION("When returning due to signal interruption")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(4)
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "process", "running", "blocked",
+ "Process blocks waiting for events")
+ KAPI_STATE_TRANS_COND("When no events available and timeout != 0")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "process", "blocked", "running",
+ "Process wakes up due to events, timeout, or signal")
+ KAPI_STATE_TRANS_COND("When wait condition is satisfied")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "epoll ready list", "has events", "events consumed",
+ "Ready events are consumed from the epoll instance")
+ KAPI_STATE_TRANS_COND("When returning events to userspace")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(3, "events buffer", "uninitialized", "contains event data",
+ "User buffer is populated with ready events")
+ KAPI_STATE_TRANS_COND("When events are available")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS_COUNT(4)
+
+ /* Signal specifications */
+ KAPI_SIGNAL(0, 0, "ANY", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("Any pending signal")
+ KAPI_SIGNAL_DESC("Any signal delivered to the thread will interrupt epoll_wait() "
+ "and cause it to return -EINTR. This is checked via signal_pending() "
+ "after checking for available events.")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(1, SIGKILL, "SIGKILL", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_TERMINATE)
+ KAPI_SIGNAL_CONDITION("Always delivered, cannot be blocked")
+ KAPI_SIGNAL_DESC("SIGKILL will terminate the process. The epoll_wait call will "
+ "not return as the process is terminated immediately.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(2, SIGSTOP, "SIGSTOP", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_STOP)
+ KAPI_SIGNAL_CONDITION("Always delivered, cannot be blocked")
+ KAPI_SIGNAL_DESC("SIGSTOP will stop the process. When continued with SIGCONT, "
+ "epoll_wait may return -EINTR if the timeout has not expired.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(3, SIGCONT, "SIGCONT", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_CONTINUE)
+ KAPI_SIGNAL_CONDITION("When process is stopped")
+ KAPI_SIGNAL_DESC("SIGCONT resumes a stopped process. If epoll_wait was interrupted "
+ "by SIGSTOP, it may return -EINTR when continued.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(4, SIGALRM, "SIGALRM", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("Timer expiration")
+ KAPI_SIGNAL_DESC("SIGALRM from timer expiration will interrupt epoll_wait with -EINTR")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ .signal_count = 5,
+ .signal_mask_count = 0, /* No signal mask manipulation in epoll_wait */
+
+ /* Locking specifications */
+ KAPI_LOCK(0, "ep->lock", KAPI_LOCK_SPINLOCK)
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Protects the ready list while checking for and consuming events")
+ KAPI_LOCK_END
+
+ KAPI_LOCK(1, "ep->mtx", KAPI_LOCK_MUTEX)
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Protects against concurrent epoll_ctl operations during wait")
+ KAPI_LOCK_END
+
+ .lock_count = 2,
+
+ KAPI_EXAMPLES("struct epoll_event events[10];\n"
+ "int nfds = epoll_wait(epfd, events, 10, 1000);\n"
+ "if (nfds == -1) {\n"
+ " perror(\"epoll_wait\");\n"
+ " exit(EXIT_FAILURE);\n"
+ "}\n"
+ "for (int n = 0; n < nfds; ++n) {\n"
+ " if (events[n].data.fd == listen_sock) {\n"
+ " accept_new_connection();\n"
+ " } else {\n"
+ " handle_io(events[n].data.fd);\n"
+ " }\n"
+ "}")
+ KAPI_NOTES("The timeout uses CLOCK_MONOTONIC and may be rounded up to system clock granularity. "
+ "A timeout of -1 causes epoll_wait to block indefinitely, while a timeout of 0 "
+ "causes it to return immediately even if no events are available. "
+ "The struct epoll_event is defined as containing events (uint32_t) and data (epoll_data_t union). "
+ "Edge-triggered mode (EPOLLET) can cause starvation if not all available data is "
+ "drained when an event is received - new events are only generated on transitions "
+ "from no data to data available. Always read/write until EAGAIN to avoid missing events. "
+ "When using dup() or fork(), events may be delivered to multiple epoll instances "
+ "monitoring the same file descriptor.")
+KAPI_END_SPEC;
+
SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events,
int, maxevents, int, timeout)
{
--
2.39.5
* [RFC v2 06/22] eventpoll: add API specification for epoll_pwait
2025-06-24 18:07 [RFC v2 00/22] Kernel API specification framework Sasha Levin
` (4 preceding siblings ...)
2025-06-24 18:07 ` [RFC v2 05/22] eventpoll: add API specification for epoll_wait Sasha Levin
@ 2025-06-24 18:07 ` Sasha Levin
2025-06-24 18:07 ` [RFC v2 07/22] eventpoll: add API specification for epoll_pwait2 Sasha Levin
` (16 subsequent siblings)
22 siblings, 0 replies; 33+ messages in thread
From: Sasha Levin @ 2025-06-24 18:07 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
Add kernel API specification for the epoll_pwait() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
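Not part of the commit: a userspace sketch of the mask handling specified
below. SIGUSR1 is kept blocked, made pending, and only unblocked for the
duration of the wait, so epoll_pwait() is expected to fail with EINTR after
the handler has run, with the original mask back in place on return. This
assumes the five-argument glibc wrapper (which supplies sigsetsize itself);
epoll_pwait_demo() and on_sigusr1() are names made up for this example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigusr1;

static void on_sigusr1(int signo)
{
	(void)signo;
	got_sigusr1 = 1;
}

/* Hypothetical demo: returns 0 if every step behaves as described. */
int epoll_pwait_demo(void)
{
	struct sigaction sa;
	struct epoll_event out[1];
	sigset_t block, wait_mask, now;
	int epfd, n, saved_errno, ret = -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigusr1;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) == -1)
		return -1;

	/* Keep SIGUSR1 blocked outside of the wait. */
	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	if (sigprocmask(SIG_BLOCK, &block, &wait_mask) == -1)
		return -1;
	/* Mask applied only while waiting: SIGUSR1 is deliverable. */
	sigdelset(&wait_mask, SIGUSR1);

	epfd = epoll_create1(0);
	if (epfd == -1)
		return -1;

	raise(SIGUSR1);		/* blocked, so it stays pending */
	if (got_sigusr1)
		goto out;

	n = epoll_pwait(epfd, out, 1, 1000, &wait_mask);
	saved_errno = errno;
	if (n != -1 || saved_errno != EINTR || !got_sigusr1)
		goto out;

	/* The caller's mask is restored: SIGUSR1 is blocked again. */
	if (sigprocmask(SIG_SETMASK, NULL, &now) == -1 ||
	    sigismember(&now, SIGUSR1) != 1)
		goto out;
	ret = 0;
out:
	close(epfd);
	return ret;
}
```

Doing the same with sigprocmask() followed by epoll_wait() would leave a
window in which the handler could run before the wait starts, and the wait
would then sleep for the full timeout.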
fs/eventpoll.c | 234 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 234 insertions(+)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index dc2c7d7e777f3..07477643b9380 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -3247,6 +3247,240 @@ static int do_epoll_pwait(int epfd, struct epoll_event __user *events,
return error;
}
+
+DEFINE_KERNEL_API_SPEC(sys_epoll_pwait)
+ KAPI_DESCRIPTION("Wait for events on an epoll instance with signal handling")
+ KAPI_LONG_DESC("Similar to epoll_wait(), but allows the caller to safely wait for "
+ "either events on the epoll instance or the delivery of a signal. "
+ "The sigmask argument specifies a signal mask which is atomically "
+ "set during the wait, allowing signals to be blocked while not waiting "
+ "and ensuring no signal is lost between checking for events and blocking.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "epfd", "int", "File descriptor referring to the epoll instance")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_FD)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "events", "struct epoll_event __user *", "Buffer where ready events will be stored")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT | KAPI_PARAM_USER)
+ KAPI_PARAM_TYPE(KAPI_TYPE_USER_PTR)
+ KAPI_PARAM_SIZE(sizeof(struct epoll_event))
+ .size_param_idx = 2, /* Size determined by maxevents parameter */
+ .size_multiplier = sizeof(struct epoll_event),
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_CONSTRAINT("Must point to an array of at least maxevents epoll_event structures")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "maxevents", "int", "Maximum number of events to return")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_INT)
+ KAPI_PARAM_RANGE(1, INT_MAX / sizeof(struct epoll_event)) /* EP_MAX_EVENTS */
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ KAPI_PARAM_CONSTRAINT("Must be greater than zero and not exceed system limits")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(3, "timeout", "int", "Timeout in milliseconds")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_INT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_CONSTRAINT("-1 blocks indefinitely, 0 returns immediately, >0 specifies milliseconds to wait")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(4, "sigmask", "const sigset_t __user *", "Signal mask to atomically set during wait")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL)
+ KAPI_PARAM_TYPE(KAPI_TYPE_USER_PTR)
+ KAPI_PARAM_SIZE(sizeof(sigset_t))
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_CONSTRAINT("Can be NULL if no signal mask change is desired")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(5, "sigsetsize", "size_t", "Size of the signal set in bytes")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PARAM_RANGE(sizeof(sigset_t), sizeof(sigset_t))
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ KAPI_PARAM_CONSTRAINT("Must be sizeof(sigset_t)")
+ KAPI_PARAM_END
+
+ KAPI_RETURN("long", "Number of ready file descriptors on success, negative error code on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_RANGE,
+ .success_min = 0,
+ .success_max = INT_MAX,
+ KAPI_RETURN_END
+
+ KAPI_ERROR(0, -EBADF, "EBADF", "epfd is not a valid file descriptor",
+ "The epoll file descriptor is invalid or has been closed.")
+ KAPI_ERROR(1, -EFAULT, "EFAULT", "Memory area not accessible",
+ "The memory area pointed to by events or sigmask is not accessible.")
+ KAPI_ERROR(2, -EINTR, "EINTR", "Call interrupted by signal handler",
+ "The call was interrupted by a signal handler before any events "
+ "became ready or the timeout expired; see signal(7).")
+ KAPI_ERROR(3, -EINVAL, "EINVAL", "Invalid parameters",
+ "epfd is not an epoll file descriptor, maxevents is less than or equal to zero, "
+ "or sigsetsize is not equal to sizeof(sigset_t).")
+
+ .error_count = 4,
+ .param_count = 6,
+ .since_version = "2.6.19",
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_MODIFY_STATE,
+ "signal mask",
+ "Atomically sets the signal mask for the calling thread")
+ KAPI_EFFECT_CONDITION("When sigmask is not NULL")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE,
+ "ready list",
+ "Removes events from the epoll ready list as they are reported")
+ KAPI_EFFECT_CONDITION("When events are available; level-triggered entries are re-queued after being reported")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_SCHEDULE,
+ "process state",
+ "Blocks the calling thread until events are available, timeout, or signal")
+ KAPI_EFFECT_CONDITION("When timeout != 0 and no events are immediately available")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_MODIFY_STATE,
+ "user memory",
+ "Writes event data to user-provided buffer")
+ KAPI_EFFECT_CONDITION("When events are available")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(4, KAPI_EFFECT_PROCESS_STATE,
+ "saved signal mask",
+ "Saves and restores the original signal mask")
+ KAPI_EFFECT_CONDITION("When sigmask is not NULL")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(5)
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "signal mask", "original mask", "user-specified mask",
+ "Thread's signal mask is atomically changed to the provided mask")
+ KAPI_STATE_TRANS_COND("When sigmask is not NULL")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "process", "running", "blocked",
+ "Process blocks waiting for events with specified signal mask")
+ KAPI_STATE_TRANS_COND("When no events available and timeout != 0")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "process", "blocked", "running",
+ "Process wakes up due to events, timeout, or unblocked signal")
+ KAPI_STATE_TRANS_COND("When wait condition is satisfied")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(3, "signal mask", "user-specified mask", "original mask",
+ "Thread's signal mask is restored to its original value")
+ KAPI_STATE_TRANS_COND("When returning from epoll_pwait")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(4, "pending signals", "blocked", "deliverable",
+ "Signals that were blocked by the temporary mask become deliverable")
+ KAPI_STATE_TRANS_COND("When signal mask is restored and signals were pending")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS_COUNT(5)
+
+ /* Signal specifications */
+ KAPI_SIGNAL(0, 0, "ANY_UNBLOCKED", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("Signal not blocked by provided sigmask")
+ KAPI_SIGNAL_DESC("Any signal not blocked by the sigmask parameter will interrupt "
+ "epoll_pwait() and cause it to return -EINTR. The signal mask is "
+ "atomically set via set_user_sigmask() and restored via "
+ "restore_saved_sigmask_unless() before returning.")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(1, SIGKILL, "SIGKILL", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_TERMINATE)
+ KAPI_SIGNAL_CONDITION("Cannot be blocked by sigmask")
+ KAPI_SIGNAL_DESC("SIGKILL cannot be blocked and will terminate the process immediately. "
+ "The epoll_pwait call will not return.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(2, SIGSTOP, "SIGSTOP", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_STOP)
+ KAPI_SIGNAL_CONDITION("Cannot be blocked by sigmask")
+ KAPI_SIGNAL_DESC("SIGSTOP cannot be blocked and will stop the process. When continued "
+ "with SIGCONT, epoll_pwait may return -EINTR.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(3, 0, "BLOCKED_SIGNALS", KAPI_SIGNAL_BLOCK, KAPI_SIGNAL_ACTION_DEFAULT)
+ KAPI_SIGNAL_CONDITION("Signals in provided sigmask")
+ KAPI_SIGNAL_DESC("Signals specified in the sigmask parameter are blocked for the "
+ "duration of the epoll_pwait call. They remain pending and will be "
+ "delivered after the signal mask is restored.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(4, SIGCONT, "SIGCONT", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_CONTINUE)
+ KAPI_SIGNAL_CONDITION("When process is stopped")
+ KAPI_SIGNAL_DESC("SIGCONT resumes a stopped process. If epoll_pwait was interrupted "
+ "by SIGSTOP, it may return -EINTR when continued.")
+ KAPI_SIGNAL_END
+
+ .signal_count = 5,
+
+ /* Signal mask specifications */
+ KAPI_SIGNAL_MASK(0, "user_sigmask", "User-provided signal mask atomically applied")
+ .description = "The signal mask provided in the sigmask parameter is atomically "
+ "set for the duration of the wait operation. This prevents race "
+ "conditions between checking for events and blocking. The original "
+ "signal mask is restored before epoll_pwait returns, unless the "
+ "return value is -EINTR (in which case the mask is restored by "
+ "the signal delivery mechanism)."
+ KAPI_SIGNAL_MASK_END
+
+ .signal_mask_count = 1,
+
+ /* Locking specifications */
+ KAPI_LOCK(0, "ep->lock", KAPI_LOCK_SPINLOCK)
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Protects the ready list while checking for and consuming events")
+ KAPI_LOCK_END
+
+ KAPI_LOCK(1, "ep->mtx", KAPI_LOCK_MUTEX)
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Protects against concurrent epoll_ctl operations during wait")
+ KAPI_LOCK_END
+
+ .lock_count = 2,
+
+ KAPI_EXAMPLES("sigset_t sigmask;\n"
+ "struct epoll_event events[10];\n\n"
+ "/* Block SIGINT during epoll_pwait */\n"
+ "sigemptyset(&sigmask);\n"
+ "sigaddset(&sigmask, SIGINT);\n\n"
+ "int nfds = epoll_pwait(epfd, events, 10, 1000, &sigmask);\n"
+ "if (nfds == -1) {\n"
+ " if (errno != EINTR) {\n"
+ " perror(\"epoll_pwait\");\n"
+ " exit(EXIT_FAILURE);\n"
+ " }\n"
+ " /* EINTR: a signal handler ran; handle it and retry */\n"
+ "}")
+ KAPI_NOTES("epoll_pwait() is equivalent to atomically executing:\n"
+ " sigset_t origmask;\n"
+ " pthread_sigmask(SIG_SETMASK, &sigmask, &origmask);\n"
+ " ready = epoll_wait(epfd, events, maxevents, timeout);\n"
+ " pthread_sigmask(SIG_SETMASK, &origmask, NULL);\n"
+ "This atomicity prevents race conditions where a signal could be delivered "
+ "after checking for events but before blocking in epoll_wait(). "
+ "The signal mask is always restored before epoll_pwait() returns. "
+ "Edge-triggered mode (EPOLLET) can cause starvation if not all available data is "
+ "drained when an event is received. Always read/write until EAGAIN. "
+ "When using dup() or fork(), events may be delivered to multiple epoll instances "
+ "monitoring the same file descriptor.")
+KAPI_END_SPEC;
+
SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
int, maxevents, int, timeout, const sigset_t __user *, sigmask,
size_t, sigsetsize)
--
2.39.5
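For readers following the epoll_pwait() notes above from userspace, here is a minimal sketch of the same call made through the glibc wrapper. It is an illustration only and not part of the patch: wait_sigint_blocked() and demo_pipe_ready() are hypothetical helper names. The sketch assumes the five-argument glibc wrapper, which supplies sigsetsize to the kernel on the caller's behalf. Note that the retry loop restarts the full timeout after EINTR, which may or may not be what a real caller wants.

```c
#include <errno.h>
#include <signal.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait on epfd with SIGINT blocked for the duration
 * of the wait. The mask is installed and restored by the kernel around
 * the wait, so no signal can slip in between "set mask" and "block".
 * Retries when another (unblocked) signal interrupts the call.
 */
int wait_sigint_blocked(int epfd, struct epoll_event *events,
                        int maxevents, int timeout_ms)
{
    sigset_t mask;
    int n;

    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);

    do {
        /* glibc wrapper: five arguments, no explicit sigsetsize */
        n = epoll_pwait(epfd, events, maxevents, timeout_ms, &mask);
    } while (n == -1 && errno == EINTR);

    return n; /* >= 0: number of ready descriptors, -1: error in errno */
}

/* Hypothetical self-check: a pipe with one byte queued reports one event. */
int demo_pipe_ready(void)
{
    struct epoll_event ev = { .events = EPOLLIN };
    struct epoll_event out[4];
    int fds[2], epfd, n = -1;

    if (pipe(fds) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto out_pipe;

    ev.data.fd = fds[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
        write(fds[1], "x", 1) == 1)
        n = wait_sigint_blocked(epfd, out, 4, 1000);

    close(epfd);
out_pipe:
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Calling demo_pipe_ready() from a test program should report one ready descriptor, since the read end of the pipe already has data queued when the wait starts.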
* [RFC v2 07/22] eventpoll: add API specification for epoll_pwait2
From: Sasha Levin @ 2025-06-24 18:07 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
Add kernel API specification for the epoll_pwait2() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/eventpoll.c | 248 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 248 insertions(+)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 07477643b9380..438551d3e13fd 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -3492,6 +3492,254 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
sigmask, sigsetsize);
}
+
+DEFINE_KERNEL_API_SPEC(sys_epoll_pwait2)
+ KAPI_DESCRIPTION("Wait for events on an epoll instance with nanosecond precision timeout")
+ KAPI_LONG_DESC("Similar to epoll_pwait(), but takes a timespec structure that allows "
+ "nanosecond precision for the timeout value. This provides more accurate "
+ "timeout control compared to the millisecond precision of epoll_pwait(). "
+ "Like epoll_pwait(), it atomically sets a signal mask during the wait.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "epfd", "int", "File descriptor referring to the epoll instance")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_FD)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "events", "struct epoll_event __user *", "Buffer where ready events will be stored")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT | KAPI_PARAM_USER)
+ KAPI_PARAM_TYPE(KAPI_TYPE_USER_PTR)
+ KAPI_PARAM_SIZE(sizeof(struct epoll_event))
+ .size_param_idx = 2, /* Size determined by maxevents parameter */
+ .size_multiplier = sizeof(struct epoll_event),
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_CONSTRAINT("Must point to an array of at least maxevents epoll_event structures")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "maxevents", "int", "Maximum number of events to return")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_INT)
+ KAPI_PARAM_RANGE(1, INT_MAX / sizeof(struct epoll_event)) /* EP_MAX_EVENTS */
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ KAPI_PARAM_CONSTRAINT("Must be greater than zero and not exceed system limits")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(3, "timeout", "const struct __kernel_timespec __user *", "Timeout with nanosecond precision")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL)
+ KAPI_PARAM_TYPE(KAPI_TYPE_USER_PTR)
+ KAPI_PARAM_SIZE(sizeof(struct __kernel_timespec))
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_CONSTRAINT("NULL means block indefinitely, {0, 0} returns immediately, "
+ "negative values are invalid")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(4, "sigmask", "const sigset_t __user *", "Signal mask to atomically set during wait")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL)
+ KAPI_PARAM_TYPE(KAPI_TYPE_USER_PTR)
+ KAPI_PARAM_SIZE(sizeof(sigset_t))
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_CONSTRAINT("Can be NULL if no signal mask change is desired")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(5, "sigsetsize", "size_t", "Size of the signal set in bytes")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PARAM_RANGE(sizeof(sigset_t), sizeof(sigset_t))
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ KAPI_PARAM_CONSTRAINT("Must be sizeof(sigset_t)")
+ KAPI_PARAM_END
+
+ KAPI_RETURN("long", "Number of ready file descriptors on success, negative error code on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_RANGE,
+ .success_min = 0,
+ .success_max = INT_MAX,
+ KAPI_RETURN_END
+
+ KAPI_ERROR(0, -EBADF, "EBADF", "epfd is not a valid file descriptor",
+ "The epoll file descriptor is invalid or has been closed.")
+ KAPI_ERROR(1, -EFAULT, "EFAULT", "Memory area not accessible",
+ "The memory area pointed to by events, timeout, or sigmask is not accessible.")
+ KAPI_ERROR(2, -EINTR, "EINTR", "Call interrupted by signal handler",
+ "The call was interrupted by a signal handler before any events "
+ "became ready or the timeout expired.")
+ KAPI_ERROR(3, -EINVAL, "EINVAL", "Invalid parameters",
+ "epfd is not an epoll file descriptor, maxevents is less than or equal to zero, "
+ "sigsetsize is not equal to sizeof(sigset_t), or timeout values are invalid.")
+
+ .error_count = 4,
+ .param_count = 6,
+ .since_version = "5.11",
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_MODIFY_STATE,
+ "signal mask",
+ "Atomically sets the signal mask for the calling thread")
+ KAPI_EFFECT_CONDITION("When sigmask is not NULL")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE,
+ "ready list",
+ "Removes events from the epoll ready list as they are reported")
+ KAPI_EFFECT_CONDITION("When events are available; level-triggered items are re-queued")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_SCHEDULE,
+ "process state",
+ "Blocks the calling thread until events, timeout, or signal")
+ KAPI_EFFECT_CONDITION("When no events are ready and timeout is NULL or non-zero")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_MODIFY_STATE,
+ "user memory",
+ "Writes event data to user-provided buffer")
+ KAPI_EFFECT_CONDITION("When events are available")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(4, KAPI_EFFECT_PROCESS_STATE,
+ "saved signal mask",
+ "Saves and restores the original signal mask")
+ KAPI_EFFECT_CONDITION("When sigmask is not NULL")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(5, KAPI_EFFECT_MODIFY_STATE,
+ "timer precision",
+ "Timeout may be rounded up to system timer granularity")
+ KAPI_EFFECT_CONDITION("When timeout is specified")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(6)
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "signal mask", "original mask", "user-specified mask",
+ "Thread's signal mask is atomically changed to the provided mask")
+ KAPI_STATE_TRANS_COND("When sigmask is not NULL")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "process", "running", "blocked",
+ "Process blocks waiting for events with specified signal mask")
+ KAPI_STATE_TRANS_COND("When no events available and not immediate return")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "process", "blocked", "running",
+ "Process wakes up due to events, timeout expiry, or unblocked signal")
+ KAPI_STATE_TRANS_COND("When wait condition is satisfied")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(3, "signal mask", "user-specified mask", "original mask",
+ "Thread's signal mask is restored to its original value")
+ KAPI_STATE_TRANS_COND("When returning from epoll_pwait2")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(4, "pending signals", "blocked", "deliverable",
+ "Signals that were blocked by the temporary mask become deliverable")
+ KAPI_STATE_TRANS_COND("When signal mask is restored and signals were pending")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(5, "timeout timer", "not started", "armed with nanosecond precision",
+ "High resolution timer is armed with the specified timeout")
+ KAPI_STATE_TRANS_COND("When timeout is specified and > 0")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS_COUNT(6)
+
+ /* Signal specifications */
+ KAPI_SIGNAL(0, 0, "ANY_UNBLOCKED", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("Signal not blocked by provided sigmask")
+ KAPI_SIGNAL_DESC("Any signal not blocked by the sigmask parameter will interrupt "
+ "epoll_pwait2() and cause it to return -EINTR. Signal handling is "
+ "identical to epoll_pwait().")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(1, SIGKILL, "SIGKILL", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_TERMINATE)
+ KAPI_SIGNAL_CONDITION("Cannot be blocked by sigmask")
+ KAPI_SIGNAL_DESC("SIGKILL cannot be blocked and will terminate the process immediately.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(2, SIGSTOP, "SIGSTOP", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_STOP)
+ KAPI_SIGNAL_CONDITION("Cannot be blocked by sigmask")
+ KAPI_SIGNAL_DESC("SIGSTOP cannot be blocked and will stop the process.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(3, 0, "BLOCKED_SIGNALS", KAPI_SIGNAL_BLOCK, KAPI_SIGNAL_ACTION_DEFAULT)
+ KAPI_SIGNAL_CONDITION("Signals in provided sigmask")
+ KAPI_SIGNAL_DESC("Signals specified in the sigmask parameter are blocked during "
+ "the epoll_pwait2 call.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(4, SIGCONT, "SIGCONT", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_CONTINUE)
+ KAPI_SIGNAL_CONDITION("When process is stopped")
+ KAPI_SIGNAL_DESC("SIGCONT resumes a stopped process. If epoll_pwait2 was interrupted "
+ "by SIGSTOP, it may return -EINTR when continued.")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(5, SIGALRM, "SIGALRM", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("Timer expiration")
+ KAPI_SIGNAL_DESC("SIGALRM or other timer signals will interrupt epoll_pwait2 with -EINTR "
+ "if not blocked by sigmask")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ .signal_count = 6,
+
+ /* Signal mask specifications */
+ KAPI_SIGNAL_MASK(0, "user_sigmask", "User-provided signal mask atomically applied")
+ .description = "The signal mask is atomically set and restored exactly as in "
+ "epoll_pwait(), providing the same race-condition prevention."
+ KAPI_SIGNAL_MASK_END
+
+ .signal_mask_count = 1,
+
+ /* Locking specifications */
+ KAPI_LOCK(0, "ep->lock", KAPI_LOCK_SPINLOCK)
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Protects the ready list while checking for and consuming events")
+ KAPI_LOCK_END
+
+ KAPI_LOCK(1, "ep->mtx", KAPI_LOCK_MUTEX)
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Protects against concurrent epoll_ctl operations during wait")
+ KAPI_LOCK_END
+
+ .lock_count = 2,
+
+ KAPI_EXAMPLES("sigset_t sigmask;\n"
+ "struct epoll_event events[10];\n"
+ "struct __kernel_timespec ts;\n\n"
+ "/* Block SIGINT during epoll_pwait2 */\n"
+ "sigemptyset(&sigmask);\n"
+ "sigaddset(&sigmask, SIGINT);\n\n"
+ "/* Wait for 1.5 seconds */\n"
+ "ts.tv_sec = 1;\n"
+ "ts.tv_nsec = 500000000; /* 500 milliseconds */\n\n"
+ "int nfds = epoll_pwait2(epfd, events, 10, &ts, &sigmask, sizeof(sigmask));\n"
+ "if (nfds == -1) {\n"
+ "    if (errno != EINTR) {\n"
+ "        perror(\"epoll_pwait2\");\n"
+ "        exit(EXIT_FAILURE);\n"
+ "    }\n"
+ "    /* EINTR: a signal handler ran; handle it and retry */\n"
+ "}\n\n"
+ "/* Example with infinite timeout */\n"
+ "nfds = epoll_pwait2(epfd, events, 10, NULL, &sigmask, sizeof(sigmask));")
+ KAPI_NOTES("epoll_pwait2() provides nanosecond precision timeouts, addressing the limitation "
+ "of epoll_pwait() which only supports millisecond precision. The timeout parameter "
+ "uses struct __kernel_timespec which is compatible with 64-bit time values, making "
+ "it Y2038-safe. Like epoll_pwait(), the signal mask operation is atomic. "
+ "The timeout is still subject to system timer granularity and may be rounded up. "
+ "Edge-triggered mode (EPOLLET) can cause starvation if not all available data is "
+ "drained when an event is received. Always read/write until EAGAIN. "
+ "When using dup() or fork(), events may be delivered to multiple epoll instances "
+ "monitoring the same file descriptor.")
+KAPI_END_SPEC;
+
SYSCALL_DEFINE6(epoll_pwait2, int, epfd, struct epoll_event __user *, events,
int, maxevents, const struct __kernel_timespec __user *, timeout,
const sigset_t __user *, sigmask, size_t, sigsetsize)
--
2.39.5
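As a companion to the epoll_pwait2() specification above, here is a hedged sketch of invoking the raw system call from userspace. It is not part of the patch. The helper names ns_to_kts(), raw_epoll_pwait2() and demo_pwait2() are hypothetical. The sketch assumes that the kernel's sigset_t is _NSIG / 8 bytes wide, which is smaller than glibc's userspace sigset_t, so passing sizeof(sigset_t) from userspace to the raw syscall would be rejected with EINVAL. On kernels older than 5.11, or where the headers lack SYS_epoll_pwait2, the helper reports -ENOSYS.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <linux/time_types.h>
#include <sys/epoll.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: split a nanosecond count into seconds + nanoseconds. */
struct __kernel_timespec ns_to_kts(uint64_t ns)
{
    struct __kernel_timespec ts = {
        .tv_sec = (long long)(ns / 1000000000ULL),
        .tv_nsec = (long long)(ns % 1000000000ULL),
    };
    return ts;
}

/*
 * Hypothetical wrapper around the six-argument raw syscall.
 * Returns the number of ready descriptors, or a negative errno value.
 */
long raw_epoll_pwait2(int epfd, struct epoll_event *events, int maxevents,
                      const struct __kernel_timespec *timeout,
                      const sigset_t *mask)
{
#ifdef SYS_epoll_pwait2
    /* Assumption: the kernel sigset is _NSIG / 8 bytes on this platform. */
    long n = syscall(SYS_epoll_pwait2, epfd, events, maxevents,
                     timeout, mask, (size_t)(_NSIG / 8));
    return n == -1 ? -errno : n;
#else
    (void)epfd; (void)events; (void)maxevents; (void)timeout; (void)mask;
    return -ENOSYS;
#endif
}

/* Hypothetical self-check: wait up to 1.5 s on a pipe that is already readable. */
long demo_pwait2(void)
{
    struct epoll_event ev = { .events = EPOLLIN };
    struct epoll_event out[4];
    struct __kernel_timespec ts = ns_to_kts(1500000000ULL);
    sigset_t mask;
    int fds[2], epfd;
    long n = -EIO;

    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);

    if (pipe(fds) == -1)
        return -errno;
    epfd = epoll_create1(0);
    if (epfd == -1) {
        n = -errno;
        goto out_pipe;
    }

    ev.data.fd = fds[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
        write(fds[1], "x", 1) == 1)
        n = raw_epoll_pwait2(epfd, out, 4, &ts, &mask);

    close(epfd);
out_pipe:
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Passing NULL instead of &ts would request an indefinite wait, matching the timeout parameter constraint in the specification. Recent glibc versions also ship an epoll_pwait2() wrapper that takes a struct timespec and no sigsetsize argument; the raw form is used here only to mirror the six parameters the specification describes.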
* [RFC v2 08/22] exec: add API specification for execve
From: Sasha Levin @ 2025-06-24 18:07 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
Add comprehensive kernel API specification for the execve() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/exec.c | 364 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 364 insertions(+)
diff --git a/fs/exec.c b/fs/exec.c
index 1f5fdd2e096e3..fd0c88f7be33b 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -52,6 +52,7 @@
#include <linux/mount.h>
#include <linux/security.h>
#include <linux/syscalls.h>
+#include <linux/syscall_api_spec.h>
#include <linux/tsacct_kern.h>
#include <linux/cn_proc.h>
#include <linux/audit.h>
@@ -1997,7 +1998,370 @@ void set_dumpable(struct mm_struct *mm, int value)
set_mask_bits(&mm->flags, MMF_DUMPABLE_MASK, value);
}
+
+DEFINE_KERNEL_API_SPEC(sys_execve)
+ KAPI_DESCRIPTION("Execute a new program")
+ KAPI_LONG_DESC("Executes the program referred to by filename. This causes the program "
+ "that is currently being run by the calling process to be replaced with "
+ "a new program, with newly initialized stack, heap, and (initialized and "
+ "uninitialized) data segments. The process ID remains the same.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "filename", "const char __user *", "Pathname of the program to execute")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER)
+ KAPI_PARAM_TYPE(KAPI_TYPE_PATH)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_CONSTRAINT("Must be a valid pathname to an executable file or script")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "argv", "const char __user *const __user *", "Array of argument strings passed to the new program")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER)
+ KAPI_PARAM_TYPE(KAPI_TYPE_USER_PTR)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_CONSTRAINT("NULL-terminated array of pointers to null-terminated strings")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "envp", "const char __user *const __user *", "Array of environment strings for the new program")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER)
+ KAPI_PARAM_TYPE(KAPI_TYPE_USER_PTR)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_CONSTRAINT("NULL-terminated array of pointers to null-terminated strings in form key=value")
+ KAPI_PARAM_END
+
+ KAPI_RETURN("long", "Does not return on success; negative error code on error")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ KAPI_RETURN_END
+
+ KAPI_ERROR(0, -E2BIG, "E2BIG", "Argument list too long",
+ "The total size of argv and envp exceeds the system limit.")
+ KAPI_ERROR(1, -EACCES, "EACCES", "Permission denied",
+ "Search permission denied on a component of the path, file is not regular, "
+ "or execute permission denied for file or interpreter.")
+ KAPI_ERROR(2, -EFAULT, "EFAULT", "Bad address",
+ "filename, argv, or envp points outside accessible address space.")
+ KAPI_ERROR(3, -EINVAL, "EINVAL", "Invalid executable format",
+ "An ELF executable has more than one PT_INTERP segment.")
+ KAPI_ERROR(4, -EIO, "EIO", "I/O error",
+ "An I/O error occurred while reading from the file system.")
+ KAPI_ERROR(5, -EISDIR, "EISDIR", "Is a directory",
+ "An ELF interpreter was a directory.")
+ KAPI_ERROR(6, -ELIBBAD, "ELIBBAD", "Invalid ELF interpreter",
+ "An ELF interpreter was not in a recognized format.")
+ KAPI_ERROR(7, -ELOOP, "ELOOP", "Too many symbolic links",
+ "Too many symbolic links encountered while resolving filename or interpreter.")
+ KAPI_ERROR(8, -EMFILE, "EMFILE", "Too many open files",
+ "The per-process limit on open file descriptors has been reached.")
+ KAPI_ERROR(9, -ENAMETOOLONG, "ENAMETOOLONG", "Filename too long",
+ "filename or one of the strings in argv or envp is too long.")
+ KAPI_ERROR(10, -ENFILE, "ENFILE", "System file table overflow",
+ "The system-wide limit on open files has been reached.")
+ KAPI_ERROR(11, -ENOENT, "ENOENT", "File not found",
+ "The file filename or an interpreter does not exist.")
+ KAPI_ERROR(12, -ENOEXEC, "ENOEXEC", "Exec format error",
+ "An executable is not in a recognized format, is for wrong architecture, "
+ "or has other format errors preventing execution.")
+ KAPI_ERROR(13, -ENOMEM, "ENOMEM", "Out of memory",
+ "Insufficient kernel memory available.")
+ KAPI_ERROR(14, -ENOTDIR, "ENOTDIR", "Not a directory",
+ "A component of the path prefix is not a directory.")
+ KAPI_ERROR(15, -EPERM, "EPERM", "Operation not permitted",
+ "The filesystem is mounted nosuid, the user is not root, and the file has "
+ "set-user-ID or set-group-ID bit set.")
+ KAPI_ERROR(16, -ETXTBSY, "ETXTBSY", "Text file busy",
+ "The executable was open for writing by one or more processes.")
+ KAPI_ERROR(17, -EAGAIN, "EAGAIN", "Resource temporarily unavailable",
+ "RLIMIT_NPROC limit exceeded - too many processes for this user.")
+
+ KAPI_ERROR_COUNT(18)
+ KAPI_PARAM_COUNT(3)
+ KAPI_SINCE_VERSION("1.0")
+ KAPI_EXAMPLES("char *argv[] = { \"echo\", \"hello\", \"world\", NULL };\n"
+ "char *envp[] = { \"PATH=/bin\", NULL };\n"
+ "execve(\"/bin/echo\", argv, envp);\n"
+ "/* This point is only reached on error */\n"
+ "perror(\"execve failed\");\n"
+ "exit(EXIT_FAILURE);")
+ KAPI_NOTES("On success, execve() does not return; the new program is executed. "
+ "File descriptors remain open unless marked close-on-exec. "
+ "Signal handling undergoes major changes: handlers reset to default (except SIG_IGN), "
+ "pending signals cleared, alternate signal stack removed, but signal mask preserved. "
+ "Fatal signals can interrupt exec causing process termination. "
+ "Multi-threaded programs: all threads except caller are killed with SIGKILL. "
+ "Traced processes receive SIGTRAP after successful exec. "
+ "Point of no return: if exec fails after destroying old image, SIGSEGV is forced.")
+
+ /* Fatal signals can interrupt exec */
+ KAPI_SIGNAL(0, 0, "FATAL_SIGNALS", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_TERMINATE)
+ KAPI_SIGNAL_CONDITION("Fatal signal pending during exec setup")
+ KAPI_SIGNAL_DESC("Fatal signals (checked via fatal_signal_pending()) can interrupt "
+ "exec during setup phases like de_thread(). This causes exec to fail "
+ "and the process to exit.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_PRIORITY(0)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_ERROR(-EINTR)
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING)
+ KAPI_SIGNAL_END
+
+ /* SIGKILL sent to other threads */
+ KAPI_SIGNAL(1, SIGKILL, "SIGKILL", KAPI_SIGNAL_SEND, KAPI_SIGNAL_ACTION_TERMINATE)
+ KAPI_SIGNAL_TARGET("All other threads in the thread group")
+ KAPI_SIGNAL_CONDITION("Multi-threaded process doing exec")
+ KAPI_SIGNAL_DESC("During de_thread(), zap_other_threads() sends SIGKILL to all "
+ "other threads in the thread group to ensure only the execing "
+ "thread survives.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_PRIORITY(1)
+ KAPI_SIGNAL_QUEUE(KAPI_SIGNAL_QUEUE_STANDARD)
+ KAPI_SIGNAL_END
+
+ /* Signal handlers reset */
+ KAPI_SIGNAL(2, 0, "ALL_HANDLERS", KAPI_SIGNAL_HANDLE, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Signal has a handler installed")
+ KAPI_SIGNAL_DESC("flush_signal_handlers() resets all signal handlers to SIG_DFL "
+ "except for signals that are ignored (SIG_IGN). This happens "
+ "after de_thread() completes.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_PRIORITY(1)
+ KAPI_SIGNAL_QUEUE("Handler information cleared from signal struct")
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING)
+ KAPI_SIGNAL_SA_FLAGS_FORBID(SA_RESTART | SA_NODEFER | SA_RESETHAND | SA_SIGINFO | SA_ONSTACK)
+ KAPI_SIGNAL_END
+
+ /* Ignored signals preserved */
+ KAPI_SIGNAL(3, 0, "IGNORED_SIGNALS", KAPI_SIGNAL_IGNORE, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Signal disposition is SIG_IGN")
+ KAPI_SIGNAL_DESC("Signals set to SIG_IGN are preserved across exec. This is "
+ "POSIX-compliant behavior allowing parent processes to ignore "
+ "signals in children.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_PRIORITY(1)
+ KAPI_SIGNAL_QUEUE("Ignored signals remain ignored")
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_END
+
+ /* Pending signals cleared */
+ KAPI_SIGNAL(4, 0, "PENDING_SIGNALS", KAPI_SIGNAL_HANDLE, KAPI_SIGNAL_ACTION_DISCARD)
+ KAPI_SIGNAL_CONDITION("Any pending signals")
+ KAPI_SIGNAL_DESC("All pending signals are cleared during exec. This includes "
+ "both thread-specific and process-wide pending signals.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_PRIORITY(1)
+ KAPI_SIGNAL_QUEUE("All pending signals discarded")
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_END
+
+ /* Timer signals cleared */
+ KAPI_SIGNAL(5, 0, "TIMER_SIGNALS", KAPI_SIGNAL_HANDLE, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Timer-generated signals pending")
+ KAPI_SIGNAL_DESC("flush_itimer_signals() clears any pending timer signals "
+ "(SIGALRM, SIGVTALRM, SIGPROF) to prevent confusion in the "
+ "new program.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_PRIORITY(1)
+ KAPI_SIGNAL_QUEUE("Timer signals cleared from pending queue")
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_END
+
+ /* Exit signal set to SIGCHLD */
+ KAPI_SIGNAL(6, SIGCHLD, "SIGCHLD", KAPI_SIGNAL_SEND, KAPI_SIGNAL_ACTION_DEFAULT)
+ KAPI_SIGNAL_TARGET("Parent process when this process exits")
+ KAPI_SIGNAL_CONDITION("Process exit after exec")
+ KAPI_SIGNAL_DESC("The exit_signal is set to SIGCHLD during exec, ensuring the "
+ "parent will receive SIGCHLD when this process terminates.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_EXIT)
+ KAPI_SIGNAL_PRIORITY(2)
+ KAPI_SIGNAL_QUEUE("Standard signal delivery to parent")
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_END
+
+ /* Alternate signal stack cleared */
+ KAPI_SIGNAL(7, 0, "SIGALTSTACK", KAPI_SIGNAL_HANDLE, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Process had alternate signal stack")
+ KAPI_SIGNAL_DESC("Any alternate signal stack (sigaltstack) is not preserved "
+ "across exec. The new program starts with no alternate stack.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_PRIORITY(1)
+ KAPI_SIGNAL_QUEUE("Alternate stack configuration cleared")
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING)
+ KAPI_SIGNAL_SA_FLAGS_FORBID(SA_ONSTACK)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_END
+
+ /* SIGTRAP sent when process is being traced */
+ KAPI_SIGNAL(8, SIGTRAP, "SIGTRAP", KAPI_SIGNAL_SEND, KAPI_SIGNAL_ACTION_DEFAULT)
+ KAPI_SIGNAL_TARGET("Current process")
+ KAPI_SIGNAL_CONDITION("Process is being traced (PTRACE_ATTACH or PTRACE_TRACEME)")
+ KAPI_SIGNAL_DESC("If the process is being traced, a SIGTRAP is sent to the "
+ "process after successful exec. This allows debuggers to gain "
+ "control after the new program is loaded but before it starts executing.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_EXIT)
+ KAPI_SIGNAL_PRIORITY(10)
+ KAPI_SIGNAL_QUEUE(KAPI_SIGNAL_QUEUE_STANDARD)
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_TRACED)
+ KAPI_SIGNAL_END
+
+ /* SIGSEGV sent on point of no return failure */
+ KAPI_SIGNAL(9, SIGSEGV, "SIGSEGV", KAPI_SIGNAL_SEND, KAPI_SIGNAL_ACTION_COREDUMP)
+ KAPI_SIGNAL_TARGET("Current process")
+ KAPI_SIGNAL_CONDITION("Exec fails after point of no return")
+ KAPI_SIGNAL_DESC("If exec fails after the point of no return (when the old "
+ "process image has been destroyed), force_fatal_sig(SIGSEGV) "
+ "is called to terminate the process since it cannot continue.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_EXIT)
+ KAPI_SIGNAL_PRIORITY(0)
+ KAPI_SIGNAL_STATE_FORBID(KAPI_SIGNAL_STATE_ZOMBIE | KAPI_SIGNAL_STATE_DEAD)
+ KAPI_SIGNAL_END
+
+ /* Signal mask preserved */
+ KAPI_SIGNAL(10, 0, "SIGNAL_MASK", KAPI_SIGNAL_HANDLE, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Process has blocked signals")
+ KAPI_SIGNAL_DESC("The signal mask (blocked signals) is preserved across exec. "
+ "This allows processes to block signals before exec and have "
+ "them remain blocked in the new program.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_PRIORITY(2)
+ KAPI_SIGNAL_QUEUE("Signal mask preserved across exec")
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_END
+
+ /* Realtime signal queues cleared */
+ KAPI_SIGNAL(11, 0, "REALTIME_SIGNALS", KAPI_SIGNAL_HANDLE, KAPI_SIGNAL_ACTION_DISCARD)
+ KAPI_SIGNAL_CONDITION("Realtime signals queued")
+ KAPI_SIGNAL_DESC("All queued realtime signals (SIGRTMIN to SIGRTMAX) are "
+ "discarded during exec. The realtime signal queue is cleared.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_PRIORITY(1)
+ KAPI_SIGNAL_QUEUE(KAPI_SIGNAL_QUEUE_REALTIME)
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_END
+
+ .signal_count = 12,
+
+ /* Signal masks */
+ KAPI_SIGNAL_MASK(0, "critical_section_mask", "Signals blocked during critical exec sections")
+ /* During de_thread and other critical sections, certain signals may be blocked */
+ KAPI_SIGNAL_MASK_END
+
+ KAPI_SIGNAL_MASK_COUNT(1)
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_PROCESS_STATE | KAPI_EFFECT_FREE_MEMORY | KAPI_EFFECT_ALLOC_MEMORY,
+ "process image",
+ "Replaces entire process image including code, data, heap, and stack")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_RESOURCE_DESTROY,
+ "file descriptors",
+ "Closes all file descriptors with close-on-exec flag set")
+ KAPI_EFFECT_CONDITION("FD_CLOEXEC flag set")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_MODIFY_STATE,
+ "signal handlers",
+ "Resets all signal handlers to default, preserves ignored signals")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_PROCESS_STATE | KAPI_EFFECT_SIGNAL_SEND,
+ "thread group",
+ "Kills all other threads in the thread group with SIGKILL")
+ KAPI_EFFECT_CONDITION("Multi-threaded process")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(4, KAPI_EFFECT_MODIFY_STATE,
+ "process attributes",
+ "Clears pending signals, timers, alternate signal stack, and various process attributes")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(5, KAPI_EFFECT_FILESYSTEM,
+ "executable file",
+ "Opens and reads the executable file, may trigger filesystem operations")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(6, KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_FREE_MEMORY,
+ "shared memory",
+ "Detaches and unmaps all POSIX shared memory regions (shm_open/mmap)")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(7)
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "process memory",
+ "old program image", "new program image",
+ "Complete replacement of process address space with new program")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "process credentials",
+ "current credentials", "potentially modified credentials",
+ "May change effective UID/GID based on file permissions")
+ KAPI_STATE_TRANS_COND("setuid/setgid binary")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "thread state",
+ "multi-threaded", "single-threaded",
+ "Process becomes single-threaded after killing other threads")
+ KAPI_STATE_TRANS_COND("Multi-threaded process")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(3, "signal state",
+ "custom handlers and pending signals", "default handlers, no pending signals",
+ "Signal handling reset to clean state for new program")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS_COUNT(4)
+
+ /* Additional constraints */
+ KAPI_CONSTRAINT(0, "Argument Size Limits",
+ "Total size of argv[] and envp[] is limited by ARG_MAX (usually 2MB on "
+ "modern systems). This includes both the string data and the pointer "
+ "arrays. The limit is checked early to prevent excessive memory allocation. "
+ "Individual argument strings are limited to MAX_ARG_STRLEN (131072 bytes).")
+ KAPI_CONSTRAINT_EXPR("total_arg_size <= ARG_MAX && each_arg_len <= MAX_ARG_STRLEN")
+ KAPI_CONSTRAINT_END
+
+ KAPI_CONSTRAINT(1, "RLIMIT_NPROC Process Limit",
+ "RLIMIT_NPROC is re-checked during execve() to close a set*uid()-then-exec loophole. "
+ "If an earlier set*uid() call left the user over their process limit, the "
+ "PF_NPROC_EXCEEDED flag is set, and exec fails with EAGAIN if the limit is still "
+ "exceeded. This check happens even though exec doesn't create new processes.")
+ KAPI_CONSTRAINT_EXPR("!PF_NPROC_EXCEEDED || user_processes <= RLIMIT_NPROC")
+ KAPI_CONSTRAINT_END
+
+ KAPI_CONSTRAINT(2, "Binary Format Handlers",
+ "The kernel tries binary format handlers in order: built-in formats "
+ "(ELF, script, etc.), then modular formats. Script interpreters can "
+ "recurse up to 5 levels deep (BINPRM_MAX_RECURSION). Each recursion "
+ "consumes one argv slot for the interpreter path.")
+ KAPI_CONSTRAINT_EXPR("recursion_depth <= 5")
+ KAPI_CONSTRAINT_END
+
+ KAPI_CONSTRAINT(3, "Security Transitions",
+ "Security modules (SELinux, AppArmor, etc.) may deny exec based on "
+ "policy. Capabilities are recalculated based on file capabilities "
+ "and setuid/setgid bits. The dumpable flag is cleared for security "
+ "transitions. AT_SECURE is set in auxiliary vector for secure execs.")
+ KAPI_CONSTRAINT_END
+
+ KAPI_CONSTRAINT(4, "Stack Size Calculation",
+ "Stack size is limited by RLIMIT_STACK but adjusted based on "
+ "personality flags and architecture requirements. The stack must "
+ "accommodate argv, envp, auxiliary vector, and initial program stack. "
+ "On some architectures, executable stack is determined by PT_GNU_STACK.")
+ KAPI_CONSTRAINT_EXPR("stack_size >= ARG_MAX && stack_size <= RLIMIT_STACK")
+ KAPI_CONSTRAINT_END
+
+ KAPI_CONSTRAINT_COUNT(5)
+
+KAPI_END_SPEC;
+
SYSCALL_DEFINE3(execve,
const char __user *, filename,
const char __user *const __user *, argv,
const char __user *const __user *, envp)
--
2.39.5
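To make the "does not return on success" contract in the execve() specification above concrete, here is a small userspace sketch. It is not part of the patch. run_and_wait() is a hypothetical helper name, and the exit code 127 for a failed exec is a shell convention adopted here as an assumption, not something execve() itself defines.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork, execve() the given program in the child,
 * and return the child's exit status to the parent.
 * Returns -1 if fork/waitpid fail or the child did not exit normally.
 */
int run_and_wait(const char *path, char *const argv[], char *const envp[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;

    if (pid == 0) {
        execve(path, argv, envp);
        /* Only reached if execve() failed; errno holds the reason. */
        _exit(127);
    }

    /* Parent: wait for the child, retrying if a signal interrupts us. */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, with argv = { "sh", "-c", "exit 7", NULL } and path "/bin/sh" (assuming that shell exists on the system), the helper returns 7. With a path that does not exist, execve() fails with ENOENT in the child and the helper returns 127.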
* [RFC v2 09/22] exec: add API specification for execveat
From: Sasha Levin @ 2025-06-24 18:07 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
Add comprehensive kernel API specification for the execveat() system
call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/exec.c | 342 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 340 insertions(+), 2 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index fd0c88f7be33b..078781d4510f0 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -2029,8 +2029,8 @@ DEFINE_KERNEL_API_SPEC(sys_execve)
KAPI_PARAM_END
KAPI_RETURN("long", "Does not return on success; negative error code on error")
- .type = KAPI_TYPE_INT,
- .check_type = KAPI_RETURN_ERROR_CHECK,
+ KAPI_RETURN_TYPE(KAPI_TYPE_INT)
+ KAPI_RETURN_CHECK_TYPE(KAPI_RETURN_ERROR_CHECK)
KAPI_RETURN_END
KAPI_ERROR(0, -E2BIG, "E2BIG", "Argument list too long",
@@ -2369,6 +2369,344 @@ SYSCALL_DEFINE3(execve,
return do_execve(getname(filename), argv, envp);
}
+
+/* Valid flag combinations for execveat */
+static const s64 execveat_valid_flags[] = {
+ 0,
+ AT_EMPTY_PATH,
+ AT_SYMLINK_NOFOLLOW,
+ AT_EXECVE_CHECK,
+ AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW,
+ AT_EMPTY_PATH | AT_EXECVE_CHECK,
+ AT_SYMLINK_NOFOLLOW | AT_EXECVE_CHECK,
+ AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW | AT_EXECVE_CHECK,
+};
+
+DEFINE_KERNEL_API_SPEC(sys_execveat)
+ KAPI_DESCRIPTION("Execute a new program relative to a directory file descriptor")
+ KAPI_LONG_DESC("Executes the program referred to by the combination of fd and filename. "
+ "This system call is useful when implementing a secure execution environment "
+ "or when the calling process has an open file descriptor but no access to "
+ "the corresponding pathname. Like execve(), it replaces the current process "
+ "image with a new process image.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "fd", "int", "Directory file descriptor")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_FD)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_CONSTRAINT("AT_FDCWD for current directory, or valid directory file descriptor")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "filename", "const char __user *", "Pathname of the program to execute")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL)
+ KAPI_PARAM_TYPE(KAPI_TYPE_PATH)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_CONSTRAINT("Relative or absolute path; empty string with AT_EMPTY_PATH to use fd directly")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "argv", "const char __user *const __user *", "Array of argument strings passed to the new program")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER)
+ KAPI_PARAM_TYPE(KAPI_TYPE_USER_PTR)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_CONSTRAINT("NULL-terminated array of pointers to null-terminated strings")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(3, "envp", "const char __user *const __user *", "Array of environment strings for the new program")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER)
+ KAPI_PARAM_TYPE(KAPI_TYPE_USER_PTR)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_CONSTRAINT("NULL-terminated array of pointers to null-terminated strings in form key=value")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(4, "flags", "int", "Execution flags")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_INT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_MASK)
+ KAPI_PARAM_VALID_MASK(AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW | AT_EXECVE_CHECK)
+ KAPI_PARAM_CONSTRAINT("0 or combination of AT_EMPTY_PATH, AT_SYMLINK_NOFOLLOW, and AT_EXECVE_CHECK")
+ KAPI_PARAM_END
+
+ /* Return specification */
+ KAPI_RETURN("long", "Does not return on success (except with AT_EXECVE_CHECK which returns 0); negative error code on error")
+ KAPI_RETURN_TYPE(KAPI_TYPE_INT)
+ KAPI_RETURN_CHECK_TYPE(KAPI_RETURN_ERROR_CHECK)
+ KAPI_RETURN_END
+
+ /* Error codes */
+ KAPI_ERROR(0, -E2BIG, "E2BIG", "Argument list too long", "The total size of argv and envp exceeds the system limit.")
+ KAPI_ERROR(1, -EACCES, "EACCES", "Permission denied", "Search permission denied on a component of the path, file is not regular, or execute permission denied for file or interpreter.")
+ KAPI_ERROR(2, -EBADF, "EBADF", "Bad file descriptor", "fd is not a valid file descriptor, or (rare bug) all file descriptors are closed when executing a script.")
+ KAPI_ERROR(3, -EFAULT, "EFAULT", "Bad address", "filename, argv, or envp points outside accessible address space.")
+ KAPI_ERROR(4, -EINVAL, "EINVAL", "Invalid flags or executable format", "Invalid flags specified, or ELF executable has more than one PT_INTERP segment.")
+ KAPI_ERROR(5, -EIO, "EIO", "I/O error", "An I/O error occurred while reading from the file system.")
+ KAPI_ERROR(6, -EISDIR, "EISDIR", "Is a directory", "An ELF interpreter was a directory.")
+ KAPI_ERROR(7, -ELIBBAD, "ELIBBAD", "Invalid ELF interpreter", "An ELF interpreter was not in a recognized format.")
+ KAPI_ERROR(8, -ELOOP, "ELOOP", "Too many symbolic links", "Too many symbolic links encountered, or AT_SYMLINK_NOFOLLOW was specified but filename refers to a symbolic link.")
+ KAPI_ERROR(9, -EMFILE, "EMFILE", "Too many open files", "The per-process limit on open file descriptors has been reached.")
+ KAPI_ERROR(10, -ENAMETOOLONG, "ENAMETOOLONG", "Filename too long", "filename or one of the strings in argv or envp is too long.")
+ KAPI_ERROR(11, -ENFILE, "ENFILE", "System file table overflow", "The system-wide limit on open files has been reached.")
+ KAPI_ERROR(12, -ENOENT, "ENOENT", "File not found", "The file filename or an interpreter does not exist, or filename is empty and AT_EMPTY_PATH was not specified in flags.")
+ KAPI_ERROR(13, -ENOEXEC, "ENOEXEC", "Exec format error", "An executable is not in a recognized format, is for wrong architecture, or has other format errors preventing execution.")
+ KAPI_ERROR(14, -ENOMEM, "ENOMEM", "Out of memory", "Insufficient kernel memory available.")
+ KAPI_ERROR(15, -ENOTDIR, "ENOTDIR", "Not a directory", "A component of the path prefix is not a directory, or fd is not a directory when a relative path is given.")
+ KAPI_ERROR(16, -EPERM, "EPERM", "Operation not permitted", "The filesystem is mounted nosuid, the user is not root, and the file has set-user-ID or set-group-ID bit set.")
+ KAPI_ERROR(17, -ETXTBSY, "ETXTBSY", "Text file busy", "The executable was open for writing by one or more processes.")
+ KAPI_ERROR(18, -EAGAIN, "EAGAIN", "Resource temporarily unavailable", "RLIMIT_NPROC limit exceeded - too many processes for this user.")
+ KAPI_ERROR(19, -EINTR, "EINTR", "Interrupted by signal", "The exec was interrupted by a signal during setup phase.")
+
+ /* Signal specifications */
+ KAPI_SIGNAL(0, 0, "FATAL_SIGNALS", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_TERMINATE)
+ KAPI_SIGNAL_CONDITION("Fatal signal pending during exec setup")
+ KAPI_SIGNAL_DESC("Fatal signals (checked via fatal_signal_pending()) can interrupt exec during setup phases like de_thread(). This causes exec to fail and the process to exit.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_PRIORITY(0)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_ERROR(-EINTR)
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING)
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(1, SIGKILL, "SIGKILL", KAPI_SIGNAL_SEND, KAPI_SIGNAL_ACTION_TERMINATE)
+ KAPI_SIGNAL_TARGET("All other threads in the thread group")
+ KAPI_SIGNAL_CONDITION("Multi-threaded process doing exec")
+ KAPI_SIGNAL_DESC("During de_thread(), zap_other_threads() sends SIGKILL to all other threads in the thread group to ensure only the execing thread survives.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_PRIORITY(0)
+ KAPI_SIGNAL_QUEUE("Cannot be blocked, caught, or ignored")
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(2, 0, "ALL_HANDLERS", KAPI_SIGNAL_HANDLE, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Signal has a handler installed")
+ KAPI_SIGNAL_DESC("flush_signal_handlers() resets all signal handlers to SIG_DFL except for signals that are ignored (SIG_IGN). This happens after de_thread() completes.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_PRIORITY(1)
+ KAPI_SIGNAL_QUEUE("Handler information cleared from signal struct")
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING)
+ KAPI_SIGNAL_SA_FLAGS_FORBID(SA_SIGINFO | SA_ONSTACK | SA_RESTART | SA_NODEFER | SA_RESETHAND)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(3, 0, "IGNORED_SIGNALS", KAPI_SIGNAL_IGNORE, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Signal disposition is SIG_IGN")
+ KAPI_SIGNAL_DESC("Signals set to SIG_IGN are preserved across exec. This is POSIX-compliant behavior allowing parent processes to ignore signals in children.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_PRIORITY(1)
+ KAPI_SIGNAL_QUEUE("Ignored signals remain ignored")
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(4, 0, "PENDING_SIGNALS", KAPI_SIGNAL_HANDLE, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Any pending signals")
+ KAPI_SIGNAL_DESC("Pending signals are preserved across exec (see signal(7)). Only pending timer-generated signals are flushed, via flush_itimer_signals().")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_PRIORITY(1)
+ KAPI_SIGNAL_QUEUE("Pending signals remain queued for the new program")
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(5, 0, "TIMER_SIGNALS", KAPI_SIGNAL_HANDLE, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Timer-generated signals pending")
+ KAPI_SIGNAL_DESC("flush_itimer_signals() clears any pending timer signals (SIGALRM, SIGVTALRM, SIGPROF) to prevent confusion in the new program.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_PRIORITY(1)
+ KAPI_SIGNAL_QUEUE("Timer signals cleared from pending queue")
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(6, SIGCHLD, "SIGCHLD", KAPI_SIGNAL_SEND, KAPI_SIGNAL_ACTION_DEFAULT)
+ KAPI_SIGNAL_TARGET("Parent process when this process exits")
+ KAPI_SIGNAL_CONDITION("Process exit after exec")
+ KAPI_SIGNAL_DESC("The exit_signal is set to SIGCHLD during exec, ensuring the parent will receive SIGCHLD when this process terminates.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_EXIT)
+ KAPI_SIGNAL_PRIORITY(2)
+ KAPI_SIGNAL_QUEUE("Standard signal delivery to parent")
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(7, 0, "SIGALTSTACK", KAPI_SIGNAL_HANDLE, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Process had alternate signal stack")
+ KAPI_SIGNAL_DESC("Any alternate signal stack (sigaltstack) is not preserved across exec. The new program starts with no alternate stack.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_PRIORITY(1)
+ KAPI_SIGNAL_QUEUE("Alternate stack configuration cleared")
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING)
+ KAPI_SIGNAL_SA_FLAGS_FORBID(SA_ONSTACK)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(8, SIGTRAP, "SIGTRAP", KAPI_SIGNAL_SEND, KAPI_SIGNAL_ACTION_DEFAULT)
+ KAPI_SIGNAL_TARGET("Current process")
+ KAPI_SIGNAL_CONDITION("Process is being traced (PTRACE_ATTACH or PTRACE_TRACEME)")
+ KAPI_SIGNAL_DESC("If the process is being traced, a SIGTRAP is sent to the process after successful exec. This allows debuggers to gain control after the new program is loaded but before it starts executing.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_EXIT)
+ KAPI_SIGNAL_PRIORITY(1)
+ KAPI_SIGNAL_QUEUE("Standard signal delivery")
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_TRACED)
+ KAPI_SIGNAL_SA_FLAGS_REQ(SA_SIGINFO)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(9, SIGSEGV, "SIGSEGV", KAPI_SIGNAL_SEND, KAPI_SIGNAL_ACTION_COREDUMP)
+ KAPI_SIGNAL_TARGET("Current process")
+ KAPI_SIGNAL_CONDITION("Past point-of-no-return and exec fails")
+ KAPI_SIGNAL_DESC("If exec fails after the point-of-no-return (after de_thread() or exec_mmap()), SIGSEGV is sent to terminate the process with a core dump.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_PRIORITY(0)
+ KAPI_SIGNAL_ERROR(-EINTR)
+ KAPI_SIGNAL_QUEUE("Cannot be blocked")
+ KAPI_SIGNAL_STATE_FORBID(KAPI_SIGNAL_STATE_ZOMBIE | KAPI_SIGNAL_STATE_DEAD)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(10, 0, "BLOCKED_SIGNALS", KAPI_SIGNAL_BLOCK, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Signal mask inherited from caller")
+ KAPI_SIGNAL_DESC("The signal mask is preserved across exec. Blocked signals remain blocked in the new program unless it explicitly changes the mask.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ANYTIME)
+ KAPI_SIGNAL_PRIORITY(2)
+ KAPI_SIGNAL_QUEUE("Signal mask preserved across exec")
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(11, 0, "REALTIME_SIGNALS", KAPI_SIGNAL_HANDLE, KAPI_SIGNAL_ACTION_CUSTOM)
+ KAPI_SIGNAL_CONDITION("Realtime signals (SIGRTMIN-SIGRTMAX)")
+ KAPI_SIGNAL_DESC("Realtime signals are handled like standard signals - handlers are reset to default, while pending signals and the signal mask are preserved.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_PRIORITY(1)
+ KAPI_SIGNAL_QUEUE("Pending realtime signals remain queued")
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_END
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_PROCESS_STATE | KAPI_EFFECT_FREE_MEMORY | KAPI_EFFECT_ALLOC_MEMORY,
+ "process image",
+ "Replaces entire process image including code, data, heap, and stack")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_RESOURCE_DESTROY,
+ "file descriptors",
+ "Closes all file descriptors with close-on-exec flag set")
+ KAPI_EFFECT_CONDITION("FD_CLOEXEC flag set")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_MODIFY_STATE,
+ "signal handlers",
+ "Resets all signal handlers to default, preserves ignored signals")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_PROCESS_STATE | KAPI_EFFECT_SIGNAL_SEND,
+ "thread group",
+ "Kills all other threads in the thread group with SIGKILL")
+ KAPI_EFFECT_CONDITION("Multi-threaded process")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(4, KAPI_EFFECT_MODIFY_STATE,
+ "process attributes",
+ "Clears timers, pending timer signals, the alternate signal stack, and various process attributes")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(5, KAPI_EFFECT_FILESYSTEM,
+ "executable file",
+ "Opens and reads the executable file, may trigger filesystem operations")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(6, KAPI_EFFECT_MODIFY_STATE,
+ "security context",
+ "May change SELinux/AppArmor context based on file labels and transitions")
+ KAPI_EFFECT_CONDITION("LSM enabled")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(7, KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_FREE_MEMORY,
+ "shared memory",
+ "Detaches and unmaps all POSIX shared memory regions (shm_open/mmap)")
+ KAPI_SIDE_EFFECT_END
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "process memory",
+ "old program image", "new program image",
+ "Complete replacement of process address space with new program")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "process credentials",
+ "current credentials", "potentially modified credentials",
+ "May change effective UID/GID based on file permissions")
+ KAPI_STATE_TRANS_COND("setuid/setgid binary")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "thread state",
+ "multi-threaded", "single-threaded",
+ "Process becomes single-threaded after killing other threads")
+ KAPI_STATE_TRANS_COND("Multi-threaded process")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(3, "signal state",
+ "custom signal handlers", "default handlers, ignored signals still ignored",
+ "Caught signals are reset to their default disposition for the new program")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(4, "file descriptor table",
+ "contains close-on-exec FDs", "close-on-exec FDs closed",
+ "All file descriptors marked FD_CLOEXEC are closed during exec")
+ KAPI_STATE_TRANS_COND("FDs with FD_CLOEXEC")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(5, "working directory",
+ "fd-relative operations", "resolved to absolute paths",
+ "Directory fd operations resolved before exec completes")
+ KAPI_STATE_TRANS_COND("Using dirfd != AT_FDCWD")
+ KAPI_STATE_TRANS_END
+
+ /* Locking information */
+ KAPI_LOCK(0, "cred_guard_mutex", KAPI_LOCK_MUTEX)
+ KAPI_LOCK_DESC("Protects against concurrent credential changes during exec")
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_DESC("Ensures atomic credential transition during exec process")
+ KAPI_LOCK_END
+
+ KAPI_LOCK(1, "sighand->siglock", KAPI_LOCK_SPINLOCK)
+ KAPI_LOCK_DESC("Protects signal handler modifications")
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Taken during signal handler reset and pending signal clearing")
+ KAPI_LOCK_END
+
+ KAPI_SIDE_EFFECT_COUNT(8)
+ KAPI_STATE_TRANS_COUNT(6)
+
+ KAPI_ERROR_COUNT(20)
+ KAPI_PARAM_COUNT(5)
+ KAPI_SINCE_VERSION("3.19")
+ .examples = "/* Execute /bin/echo using AT_FDCWD */\n"
+ "char *argv[] = { \"echo\", \"hello\", NULL };\n"
+ "char *envp[] = { \"PATH=/bin\", NULL };\n"
+ "execveat(AT_FDCWD, \"/bin/echo\", argv, envp, 0);\n\n"
+ "/* Execute via file descriptor */\n"
+ "int fd = open(\"/bin/echo\", O_PATH);\n"
+ "execveat(fd, \"\", argv, envp, AT_EMPTY_PATH);\n\n"
+ "/* Execute relative to directory fd */\n"
+ "int dirfd = open(\"/bin\", O_RDONLY | O_DIRECTORY);\n"
+ "execveat(dirfd, \"echo\", argv, envp, 0);",
+ KAPI_NOTES("execveat() was added to allow fexecve() to be implemented on systems that "
+ "do not have /proc mounted. When filename is an empty string and AT_EMPTY_PATH "
+ "is specified, the file descriptor fd specifies the file to be executed. "
+ "AT_SYMLINK_NOFOLLOW prevents following symbolic links. "
+ "AT_EXECVE_CHECK (since Linux 6.14) only checks if execution would be allowed "
+ "without actually executing. Like execve(), on success execveat() does not return "
+ "(except with AT_EXECVE_CHECK which returns 0).\n\n"
+ "Known bug: If a script is executed and the close-on-exec flag is set for all "
+ "file descriptors (including 0, 1, and 2), the execveat() call can fail with "
+ "EBADF when the kernel tries to open the interpreter, as it needs an available "
+ "file descriptor slot.")
+ KAPI_SIGNAL_COUNT(12)
+ KAPI_LOCK_COUNT(2)
+KAPI_END_SPEC;
+
SYSCALL_DEFINE5(execveat,
int, fd, const char __user *, filename,
const char __user *const __user *, argv,
--
2.39.5
* [RFC v2 10/22] mm/mlock: add API specification for mlock
From: Sasha Levin @ 2025-06-24 18:07 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
Add kernel API specification for the mlock() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
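Reviewer note on the parameter and RLIMIT_MEMLOCK constraints: the spec below
says start is rounded down and len is rounded up to a page boundary, and that
unprivileged callers must satisfy "locked_memory + request_size <=
RLIMIT_MEMLOCK". The following is a minimal userspace sketch of both rules,
under the assumption (from reading do_mlock()) that the in-page offset of start
is added to len before rounding up. The names lock_range,
mlock_effective_range() and mlock_within_limit() are hypothetical and exist
only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

struct lock_range {
	unsigned long start;	/* page-aligned start address */
	size_t len;		/* length, a multiple of page_size */
};

/*
 * Compute the range that would actually be locked.
 * page_size must be a power of two.
 */
static struct lock_range mlock_effective_range(unsigned long start,
					       size_t len, size_t page_size)
{
	unsigned long offset = start & (page_size - 1);
	struct lock_range r;

	r.start = start - offset;				/* round down */
	r.len = (len + offset + page_size - 1) & ~(page_size - 1); /* round up */
	return r;
}

/*
 * Mirror of the constraint expression in the spec, in bytes.
 * Written to avoid overflow in locked + request.
 */
static bool mlock_within_limit(size_t locked, size_t request, size_t limit,
			       bool has_cap_ipc_lock)
{
	if (has_cap_ipc_lock)
		return true;
	return request <= limit && locked <= limit - request;
}
```

With 4096-byte pages, a two-byte request starting at 0x1fff straddles a page
boundary and therefore covers two full pages.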
mm/mlock.c | 142 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 142 insertions(+)
diff --git a/mm/mlock.c b/mm/mlock.c
index 3cb72b579ffd3..b97768b1cfa60 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -25,6 +25,7 @@
#include <linux/memcontrol.h>
#include <linux/mm_inline.h>
#include <linux/secretmem.h>
+#include <linux/syscall_api_spec.h>
#include "internal.h"
@@ -658,6 +659,147 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla
return 0;
}
+
+DEFINE_KERNEL_API_SPEC(sys_mlock)
+ KAPI_DESCRIPTION("Lock pages in memory")
+ KAPI_LONG_DESC("Locks pages in the specified address range into RAM, "
+ "preventing them from being paged to swap. Callers without "
+ "CAP_IPC_LOCK are limited by the RLIMIT_MEMLOCK resource limit.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "start", "unsigned long", "Starting address of memory range to lock")
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_CONSTRAINT("Rounded down to page boundary")
+ KAPI_PARAM_END
+ KAPI_PARAM(1, "len", "size_t", "Length of memory range to lock in bytes")
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ KAPI_PARAM_RANGE(0, LONG_MAX)
+ KAPI_PARAM_CONSTRAINT("Rounded up to page boundary")
+ KAPI_PARAM_END
+
+ KAPI_RETURN("long", "0 on success, negative error code on failure")
+ KAPI_RETURN_TYPE(KAPI_TYPE_INT)
+ KAPI_RETURN_CHECK_TYPE(KAPI_RETURN_ERROR_CHECK)
+ KAPI_RETURN_SUCCESS(0)
+ KAPI_RETURN_END
+
+ KAPI_ERROR(0, -ENOMEM, "ENOMEM", "Address range issue",
+ "Some of the specified range is not mapped, has unmapped gaps, "
+ "or the lock would cause the number of mapped regions to exceed the limit.")
+ KAPI_ERROR(1, -EPERM, "EPERM", "Insufficient privileges",
+ "The caller is not privileged (no CAP_IPC_LOCK) and RLIMIT_MEMLOCK is 0.")
+ KAPI_ERROR(2, -EINVAL, "EINVAL", "Address overflow",
+ "The result of the addition start+len was less than start (arithmetic overflow).")
+ KAPI_ERROR(3, -EAGAIN, "EAGAIN", "Some or all memory could not be locked",
+ "Some or all of the specified address range could not be locked.")
+ KAPI_ERROR(4, -EINTR, "EINTR", "Interrupted by signal",
+ "The operation was interrupted by a fatal signal before completion.")
+
+ KAPI_ERROR_COUNT(5)
+ KAPI_PARAM_COUNT(2)
+ KAPI_SINCE_VERSION("2.0")
+
+ KAPI_LOCK(0, "mmap_lock", KAPI_LOCK_RWLOCK)
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Process memory map write lock")
+ KAPI_LOCK_END
+
+ KAPI_LOCK_COUNT(1)
+
+ /* Signal specifications */
+ KAPI_SIGNAL_COUNT(1)
+
+ /* Fatal signals can interrupt mmap_write_lock_killable */
+ KAPI_SIGNAL(0, 0, "FATAL", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("Fatal signal pending")
+ KAPI_SIGNAL_DESC("Fatal signals (SIGKILL, etc.) can interrupt the operation "
+ "when acquiring mmap_write_lock_killable(), causing -EINTR return")
+ KAPI_SIGNAL_END
+
+ KAPI_EXAMPLES("mlock(addr, 4096); // Lock one page\n"
+ "mlock(addr, len); // Lock range of pages")
+ KAPI_NOTES("Memory locks do not stack - multiple calls on the same range can be "
+ "undone by a single munlock. Locks are not inherited by child processes. "
+ "Pages are locked on whole page boundaries. Commonly used by real-time "
+ "applications to prevent page faults during time-critical operations. "
+ "Also used for security to prevent sensitive data (e.g., cryptographic keys) "
+ "from being written to swap. Note: locked pages may still be saved to "
+ "swap during system suspend/hibernate.")
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY,
+ "process memory",
+ "Locks pages into physical memory, preventing swapping")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE,
+ "mm->locked_vm",
+ "Increases process locked memory counter")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_ALLOC_MEMORY,
+ "physical pages",
+ "May allocate and populate page table entries")
+ KAPI_EFFECT_CONDITION("Pages not already present")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(3)
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "memory pages", "swappable", "locked in RAM",
+ "Pages become non-swappable and pinned in physical memory")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "VMA flags", "unlocked", "VM_LOCKED set",
+ "Virtual memory area marked as locked")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS_COUNT(2)
+
+ /* Capability specifications */
+ KAPI_CAPABILITY(0, CAP_IPC_LOCK, "CAP_IPC_LOCK", KAPI_CAP_BYPASS_CHECK)
+ KAPI_CAP_ALLOWS("Lock unlimited amount of memory (no RLIMIT_MEMLOCK enforcement)")
+ KAPI_CAP_WITHOUT("Must respect RLIMIT_MEMLOCK resource limit")
+ KAPI_CAP_CONDITION("Checked when RLIMIT_MEMLOCK is 0 or locking would exceed limit")
+ KAPI_CAP_PRIORITY(0)
+ KAPI_CAPABILITY_END
+
+ KAPI_CAPABILITY_COUNT(1)
+
+ /* Additional constraints */
+ KAPI_CONSTRAINT(0, "RLIMIT_MEMLOCK Resource Limit",
+ "The RLIMIT_MEMLOCK soft resource limit specifies the maximum bytes "
+ "of memory that may be locked into RAM. Unprivileged processes are "
+ "restricted to this limit. CAP_IPC_LOCK capability allows bypassing "
+ "this limit entirely. The limit is enforced per-process, not per-user.")
+ KAPI_CONSTRAINT_EXPR("locked_memory + request_size <= RLIMIT_MEMLOCK || CAP_IPC_LOCK")
+ KAPI_CONSTRAINT_END
+
+ KAPI_CONSTRAINT(1, "Memory Pressure and OOM",
+ "Locking large amounts of memory can cause system-wide memory pressure "
+ "and potentially trigger the OOM killer. The kernel does not prevent "
+ "locking memory that would destabilize the system.")
+ KAPI_CONSTRAINT_END
+
+ KAPI_CONSTRAINT(2, "Special Memory Areas",
+ "Some memory types cannot be locked or behave specially: "
+ "VM_IO/VM_PFNMAP areas are skipped without being locked; "
+ "Hugetlb pages are inherently pinned; "
+ "DAX mappings are always present in memory; "
+ "VM_LOCKED areas are already locked.")
+ KAPI_CONSTRAINT_END
+
+ KAPI_CONSTRAINT_COUNT(3)
+
+KAPI_END_SPEC;
+
SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
{
return do_mlock(start, len, VM_LOCKED);
--
2.39.5
* [RFC v2 11/22] mm/mlock: add API specification for mlock2
From: Sasha Levin @ 2025-06-24 18:07 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
Add kernel API specification for the mlock2() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
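Reviewer note on the flags parameter: the spec below accepts only
MLOCK_ONFAULT, reports EINVAL for "flags & ~MLOCK_ONFAULT", and describes
VM_LOCKED being set always and VM_LOCKONFAULT being set optionally. The
following is a minimal userspace sketch of that mapping. The helper
mlock2_vm_flags() and the DEMO_VM_* constants are hypothetical stand-ins, not
the kernel's definitions, and the MLOCK_ONFAULT fallback value is an assumption
for older headers.

```c
#include <errno.h>
#include <sys/mman.h>

/* Fallback for older userspace headers; value assumed to match the uapi. */
#ifndef MLOCK_ONFAULT
#define MLOCK_ONFAULT 0x01
#endif

/* Illustrative stand-ins for the kernel's VM_LOCKED and VM_LOCKONFAULT bits. */
#define DEMO_VM_LOCKED		(1L << 0)
#define DEMO_VM_LOCKONFAULT	(1L << 1)

/*
 * Hypothetical helper: map mlock2() flags to the VMA flags that would be
 * applied, or return -EINVAL when an unknown flag bit is set.
 */
static long mlock2_vm_flags(int flags)
{
	long vm_flags = DEMO_VM_LOCKED;

	if (flags & ~MLOCK_ONFAULT)
		return -EINVAL;
	if (flags & MLOCK_ONFAULT)
		vm_flags |= DEMO_VM_LOCKONFAULT;
	return vm_flags;
}
```

With flags set to 0 the result carries only the locked bit, which is why the
examples in the spec describe mlock2(addr, len, 0) as equivalent to mlock().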
mm/mlock.c | 163 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 163 insertions(+)
diff --git a/mm/mlock.c b/mm/mlock.c
index b97768b1cfa60..869c6ba0a7ec8 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -805,6 +805,169 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
return do_mlock(start, len, VM_LOCKED);
}
+
+DEFINE_KERNEL_API_SPEC(sys_mlock2)
+ KAPI_DESCRIPTION("Lock pages in memory with flags")
+ KAPI_LONG_DESC("Enhanced version of mlock() that supports flags. "
+ "MLOCK_ONFAULT flag allows locking pages on fault rather than immediately.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "start", "unsigned long", "Starting address of memory range to lock")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_CONSTRAINT("Rounded down to page boundary")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "len", "size_t", "Length of memory range to lock in bytes")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ KAPI_PARAM_RANGE(0, LONG_MAX)
+ KAPI_PARAM_CONSTRAINT("Rounded up to page boundary")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "flags", "int", "Flags controlling lock behavior")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_INT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_MASK)
+ KAPI_PARAM_VALID_MASK(MLOCK_ONFAULT)
+ KAPI_PARAM_CONSTRAINT("Only MLOCK_ONFAULT flag is currently supported")
+ KAPI_PARAM_END
+
+ /* Return specification */
+ KAPI_RETURN("long", "0 on success, negative error code on failure")
+ KAPI_RETURN_TYPE(KAPI_TYPE_INT)
+ KAPI_RETURN_CHECK_TYPE(KAPI_RETURN_ERROR_CHECK)
+ KAPI_RETURN_SUCCESS(0)
+ KAPI_RETURN_END
+
+ /* Error codes */
+ KAPI_ERROR(0, -EINVAL, "EINVAL", "Invalid flags", "Unknown flags were specified (flags & ~MLOCK_ONFAULT).")
+ KAPI_ERROR(1, -ENOMEM, "ENOMEM", "Address range issue", "Some of the specified range is not mapped, has unmapped gaps, or the lock would cause the number of mapped regions to exceed the limit.")
+ KAPI_ERROR(2, -EPERM, "EPERM", "Insufficient privileges", "The caller is not privileged (no CAP_IPC_LOCK) and RLIMIT_MEMLOCK is 0.")
+ KAPI_ERROR(3, -EAGAIN, "EAGAIN", "Some or all memory could not be locked", "Some or all of the specified address range could not be locked.")
+ KAPI_ERROR(4, -EINTR, "EINTR", "Interrupted by signal", "The operation was interrupted by a fatal signal before completion.")
+
+ /* Signal specifications */
+ KAPI_SIGNAL(0, 0, "FATAL_SIGNALS", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("Fatal signal pending during mmap_write_lock_killable")
+ KAPI_SIGNAL_DESC("Fatal signals (SIGKILL, SIGTERM, etc.) can interrupt the operation when acquiring mmap_write_lock_killable(), causing -EINTR return")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(1, SIGBUS, "SIGBUS", KAPI_SIGNAL_SEND, KAPI_SIGNAL_ACTION_DEFAULT)
+ KAPI_SIGNAL_TARGET("Current process")
+ KAPI_SIGNAL_CONDITION("Memory access to locked page fails")
+ KAPI_SIGNAL_DESC("Can be generated if accessing a locked page that cannot be brought into memory (e.g., truncated file mapping)")
+ KAPI_SIGNAL_END
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY,
+ "process memory",
+ "Locks pages into physical memory, preventing swapping")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_EFFECT_CONDITION("Pages become resident in RAM")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE,
+ "mm->locked_vm",
+ "Increases process locked memory counter")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_EFFECT_CONDITION("Counted against RLIMIT_MEMLOCK")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_ALLOC_MEMORY,
+ "page tables",
+ "May allocate and populate page table entries")
+ KAPI_EFFECT_CONDITION("Pages not already present")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_MODIFY_STATE,
+ "VMA flags",
+ "Sets VM_LOCKED and optionally VM_LOCKONFAULT on affected VMAs")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(4, KAPI_EFFECT_MODIFY_STATE,
+ "page fault behavior",
+ "With MLOCK_ONFAULT, changes how future page faults are handled")
+ KAPI_EFFECT_CONDITION("MLOCK_ONFAULT flag specified")
+ KAPI_SIDE_EFFECT_END
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "memory pages",
+ "swappable", "locked in RAM",
+ "Pages become non-swappable and pinned in physical memory")
+ KAPI_STATE_TRANS_COND("Without MLOCK_ONFAULT")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "VMA flags",
+ "unlocked", "VM_LOCKED set",
+ "Virtual memory area marked as locked")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "VMA flags",
+ "normal fault", "VM_LOCKONFAULT set",
+ "VMA marked to lock pages on future faults")
+ KAPI_STATE_TRANS_COND("MLOCK_ONFAULT flag specified")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(3, "page residency",
+ "may be swapped", "resident in memory",
+ "Pages brought into RAM and kept there")
+ KAPI_STATE_TRANS_COND("Without MLOCK_ONFAULT")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(4, "process statistics",
+ "normal memory accounting", "locked memory accounting",
+ "Memory counted against RLIMIT_MEMLOCK")
+ KAPI_STATE_TRANS_END
+
+ /* Locking information */
+ KAPI_LOCK(0, "mmap_lock", KAPI_LOCK_RWLOCK)
+ KAPI_LOCK_DESC("Process memory map write lock")
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Protects VMA modifications during lock operation")
+ KAPI_LOCK_END
+
+ KAPI_LOCK(1, "lru_lock", KAPI_LOCK_SPINLOCK)
+ KAPI_LOCK_DESC("Per-memcg LRU list lock")
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Taken when moving pages to unevictable list when locking pages")
+ KAPI_LOCK_END
+
+ KAPI_ERROR_COUNT(5)
+ KAPI_PARAM_COUNT(3)
+ KAPI_SINCE_VERSION("4.4")
+ KAPI_SIGNAL_COUNT(2)
+ KAPI_SIDE_EFFECT_COUNT(5)
+ KAPI_STATE_TRANS_COUNT(5)
+ KAPI_LOCK_COUNT(2)
+
+ /* Capability specifications */
+ KAPI_CAPABILITY(0, CAP_IPC_LOCK, "CAP_IPC_LOCK", KAPI_CAP_BYPASS_CHECK)
+ KAPI_CAP_ALLOWS("Lock unlimited amount of memory (no RLIMIT_MEMLOCK enforcement)")
+ KAPI_CAP_WITHOUT("Must respect RLIMIT_MEMLOCK resource limit")
+ KAPI_CAP_CONDITION("Checked when RLIMIT_MEMLOCK is 0 or locking would exceed limit")
+ KAPI_CAP_PRIORITY(0)
+ KAPI_CAPABILITY_END
+
+ KAPI_CAPABILITY_COUNT(1)
+
+ KAPI_EXAMPLES("mlock2(addr, len, 0); // Same as mlock()\n"
+ "mlock2(addr, len, MLOCK_ONFAULT); // Lock on fault")
+ KAPI_NOTES("MLOCK_ONFAULT flag defers actual page locking until pages are accessed. "
+ "Memory locks do not stack. Locks are not inherited by child processes. "
+ "Commonly used by real-time applications to prevent page faults. Also used "
+ "for security to prevent sensitive data (e.g., cryptographic keys) from being "
+ "written to swap. Note: locked pages may still be saved to swap during "
+ "system suspend/hibernate.")
+KAPI_END_SPEC;
+
SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
{
vm_flags_t vm_flags = VM_LOCKED;
--
2.39.5
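
The capability and limit notes in the mlock2 spec above (CAP_IPC_LOCK bypasses RLIMIT_MEMLOCK; otherwise the limit is enforced) can be sketched in userspace C roughly as follows. This is a simplified model for illustration, not the kernel code: model_mlock_check() and its parameters are hypothetical names, all sizes are in pages, and the sketch ignores details such as pages in the range that are already locked.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the permission/limit decision described by the
 * spec. Returns 0 if the lock request would be allowed, -EPERM if the
 * caller is unprivileged and the limit is 0, -ENOMEM if the request
 * would push the locked total past the limit.
 */
static int model_mlock_check(size_t already_locked, size_t request,
			     size_t limit, bool has_cap_ipc_lock)
{
	if (has_cap_ipc_lock)
		return 0;		/* capability bypasses the limit */
	if (limit == 0)
		return -EPERM;		/* unprivileged, RLIMIT_MEMLOCK == 0 */
	/* Overflow-safe form of: already_locked + request > limit */
	if (request > limit || already_locked > limit - request)
		return -ENOMEM;		/* would exceed RLIMIT_MEMLOCK */
	return 0;
}
```

A model like this could serve as a starting point for a test that compares the documented error codes against what the syscall actually returns.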
* [RFC v2 12/22] mm/mlock: add API specification for mlockall
2025-06-24 18:07 [RFC v2 00/22] Kernel API specification framework Sasha Levin
` (10 preceding siblings ...)
2025-06-24 18:07 ` [RFC v2 11/22] mm/mlock: add API specification for mlock2 Sasha Levin
@ 2025-06-24 18:07 ` Sasha Levin
2025-06-24 18:07 ` [RFC v2 13/22] mm/mlock: add API specification for munlock Sasha Levin
` (10 subsequent siblings)
22 siblings, 0 replies; 33+ messages in thread
From: Sasha Levin @ 2025-06-24 18:07 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
Add kernel API specification for the mlockall() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/mlock.c | 186 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 186 insertions(+)
diff --git a/mm/mlock.c b/mm/mlock.c
index 869c6ba0a7ec8..8f24a31ac5934 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -1049,6 +1049,192 @@ static int apply_mlockall_flags(int flags)
return 0;
}
+
+DEFINE_KERNEL_API_SPEC(sys_mlockall)
+ KAPI_DESCRIPTION("Lock all process pages in memory")
+ KAPI_LONG_DESC("Locks all pages mapped into the process address space. "
+ "MCL_CURRENT locks current pages, MCL_FUTURE locks future mappings, "
+ "MCL_ONFAULT defers locking until page fault.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "flags", "int", "Flags controlling which pages to lock")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_INT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_MASK)
+ .valid_mask = MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT,
+ KAPI_PARAM_CONSTRAINT("Must specify MCL_CURRENT and/or MCL_FUTURE; MCL_ONFAULT can be OR'd")
+ KAPI_PARAM_END
+
+ /* Return specification */
+ KAPI_RETURN("long", "0 on success, negative error code on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ .success_value = 0,
+ KAPI_RETURN_END
+
+ /* Error codes */
+ KAPI_ERROR(0, -EINVAL, "EINVAL", "Invalid flags", "Unknown flag bits were specified, no flags were set, or MCL_ONFAULT was specified without MCL_CURRENT or MCL_FUTURE.")
+ KAPI_ERROR(1, -EPERM, "EPERM", "Insufficient privileges", "The caller is not privileged (no CAP_IPC_LOCK) and RLIMIT_MEMLOCK is 0.")
+ KAPI_ERROR(2, -ENOMEM, "ENOMEM", "Insufficient resources", "MCL_CURRENT is set and total VM size exceeds RLIMIT_MEMLOCK and caller lacks CAP_IPC_LOCK.")
+ KAPI_ERROR(3, -EINTR, "EINTR", "Interrupted by signal", "The operation was interrupted by a signal before completion.")
+ KAPI_ERROR(4, -EAGAIN, "EAGAIN", "Some memory could not be locked", "Some pages could not be locked, possibly due to memory pressure.")
+
+ /* Signal specifications */
+ KAPI_SIGNAL(0, 0, "FATAL_SIGNALS", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("Fatal signal pending during mmap_write_lock_killable")
+ KAPI_SIGNAL_DESC("Fatal signals (SIGKILL, SIGTERM, etc.) can interrupt the operation when acquiring mmap_write_lock_killable(), causing -EINTR return")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL(1, SIGBUS, "SIGBUS", KAPI_SIGNAL_SEND, KAPI_SIGNAL_ACTION_DEFAULT)
+ KAPI_SIGNAL_TARGET("Current process")
+ KAPI_SIGNAL_CONDITION("Memory access to locked page fails")
+ KAPI_SIGNAL_DESC("Can be generated later if accessing a locked page that cannot be brought into memory (e.g., truncated file mapping)")
+ KAPI_SIGNAL_END
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY,
+ "all process memory",
+ "Locks all current pages into physical memory, preventing swapping")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_EFFECT_CONDITION("MCL_CURRENT flag set")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE,
+ "mm->def_flags",
+ "Sets VM_LOCKED in default flags for future mappings")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_EFFECT_CONDITION("MCL_FUTURE flag set")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_MODIFY_STATE,
+ "mm->locked_vm",
+ "Increases process locked memory counter for entire address space")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_EFFECT_CONDITION("MCL_CURRENT flag set")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_ALLOC_MEMORY,
+ "page tables",
+ "May allocate and populate page table entries for all mappings")
+ KAPI_EFFECT_CONDITION("MCL_CURRENT without MCL_ONFAULT")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(4, KAPI_EFFECT_MODIFY_STATE,
+ "VMA flags",
+ "Sets VM_LOCKED on all existing VMAs")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_EFFECT_CONDITION("MCL_CURRENT flag set")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(5, KAPI_EFFECT_SCHEDULE,
+ "mm_populate",
+ "Triggers population of entire address space")
+ KAPI_EFFECT_CONDITION("MCL_CURRENT without MCL_ONFAULT")
+ KAPI_SIDE_EFFECT_END
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "all memory pages",
+ "swappable", "locked in RAM",
+ "All pages in process become non-swappable")
+ KAPI_STATE_TRANS_COND("MCL_CURRENT flag set")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "future mappings",
+ "normal", "auto-locked",
+ "New mappings will be automatically locked")
+ KAPI_STATE_TRANS_COND("MCL_FUTURE flag set")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "VMA flags",
+ "varied", "all VM_LOCKED",
+ "All virtual memory areas marked as locked")
+ KAPI_STATE_TRANS_COND("MCL_CURRENT flag set")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(3, "page fault behavior",
+ "normal faulting", "lock on fault",
+ "Pages locked when faulted in rather than immediately")
+ KAPI_STATE_TRANS_COND("MCL_ONFAULT flag set")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(4, "process statistics",
+ "partial locked memory", "all memory locked",
+ "Entire VM size counted against RLIMIT_MEMLOCK")
+ KAPI_STATE_TRANS_COND("MCL_CURRENT flag set")
+ KAPI_STATE_TRANS_END
+
+ /* Locking information */
+ KAPI_LOCK(0, "mmap_lock", KAPI_LOCK_RWLOCK)
+ KAPI_LOCK_DESC("Process memory map write lock")
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Protects VMA modifications during mlockall operation")
+ KAPI_LOCK_END
+
+ KAPI_LOCK(1, "lru_lock", KAPI_LOCK_SPINLOCK)
+ KAPI_LOCK_DESC("Per-memcg LRU list lock")
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Taken when moving all locked pages to the unevictable list")
+ KAPI_LOCK_END
+
+ KAPI_ERROR_COUNT(5)
+ KAPI_PARAM_COUNT(1)
+ KAPI_SINCE_VERSION("2.0")
+ KAPI_SIGNAL_COUNT(2)
+ KAPI_SIDE_EFFECT_COUNT(6)
+ KAPI_STATE_TRANS_COUNT(5)
+ KAPI_LOCK_COUNT(2)
+
+ /* Capability specifications */
+ KAPI_CAPABILITY(0, CAP_IPC_LOCK, "CAP_IPC_LOCK", KAPI_CAP_BYPASS_CHECK)
+ KAPI_CAP_ALLOWS("Lock entire process memory exceeding RLIMIT_MEMLOCK")
+ KAPI_CAP_WITHOUT("Total VM size must not exceed RLIMIT_MEMLOCK when MCL_CURRENT is set")
+ KAPI_CAP_CONDITION("Checked when MCL_CURRENT is set and total VM size exceeds RLIMIT_MEMLOCK")
+ KAPI_CAP_PRIORITY(0)
+ KAPI_CAPABILITY_END
+
+ KAPI_CAPABILITY_COUNT(1)
+
+ KAPI_EXAMPLES("mlockall(MCL_CURRENT); // Lock current mappings\n"
+ "mlockall(MCL_CURRENT | MCL_FUTURE); // Lock current and future\n"
+ "mlockall(MCL_CURRENT | MCL_ONFAULT); // Lock current on fault")
+ KAPI_NOTES("Affects all current VMAs and optionally future mappings via mm->def_flags. "
+ "Memory locks are not inherited by child processes after fork(). Commonly used "
+ "by real-time applications to prevent page faults. Also used for security to "
+ "prevent sensitive data (e.g., cryptographic keys) from being written to swap. "
+ "Note: locked pages may still be saved to swap during system suspend/hibernate.")
+
+ /* Additional constraints */
+ KAPI_CONSTRAINT(0, "MCL_FUTURE Persistence",
+ "The MCL_FUTURE flag sets VM_LOCKED in mm->def_flags, so every "
+ "later mapping in the process is locked as it is created. The "
+ "setting is not inherited across fork() and is cleared by execve(). "
+ "Care must be taken as this can cause unexpected memory exhaustion.")
+ KAPI_CONSTRAINT_EXPR("MCL_FUTURE => mm->def_flags |= VM_LOCKED")
+ KAPI_CONSTRAINT_END
+
+ KAPI_CONSTRAINT(1, "Total VM Size Limit",
+ "When MCL_CURRENT is set, the total virtual memory size of the "
+ "process is checked against RLIMIT_MEMLOCK. This differs from "
+ "mlock() which only counts actually locked pages. CAP_IPC_LOCK "
+ "bypasses this check entirely.")
+ KAPI_CONSTRAINT_EXPR("(flags & MCL_CURRENT) => total_vm <= RLIMIT_MEMLOCK || CAP_IPC_LOCK")
+ KAPI_CONSTRAINT_END
+
+ KAPI_CONSTRAINT(2, "Memory Accounting",
+ "mlockall() with MCL_CURRENT can lock significantly more memory "
+ "than expected, including all shared libraries, heap, stack, and "
+ "mapped files. This can easily exhaust memory limits or cause "
+ "system-wide memory pressure.")
+ KAPI_CONSTRAINT_END
+
+ KAPI_CONSTRAINT_COUNT(3)
+
+KAPI_END_SPEC;
+
SYSCALL_DEFINE1(mlockall, int, flags)
{
unsigned long lock_limit;
--
2.39.5
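
The flag rules stated in the mlockall spec above (valid_mask plus the EINVAL conditions) can be sketched as a small userspace check. This is only an illustration of the documented contract, not the kernel implementation: model_mlockall_check_flags() is a hypothetical name, and the MCL_ONFAULT fallback value is an assumption for libc headers that do not define it.

```c
#include <errno.h>
#include <sys/mman.h>

/* Assumption: fallback for older headers; 4 is the asm-generic value. */
#ifndef MCL_ONFAULT
#define MCL_ONFAULT 4
#endif

/* Returns 0 if `flags` satisfies the documented constraints, else -EINVAL. */
static int model_mlockall_check_flags(int flags)
{
	const int valid = MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT;

	if (flags == 0)
		return -EINVAL;		/* no flags set */
	if (flags & ~valid)
		return -EINVAL;		/* bits outside valid_mask */
	if (flags == MCL_ONFAULT)
		return -EINVAL;		/* needs MCL_CURRENT and/or MCL_FUTURE */
	return 0;
}
```

The three examples in KAPI_EXAMPLES all pass this check, while MCL_ONFAULT on its own does not.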
* [RFC v2 13/22] mm/mlock: add API specification for munlock
2025-06-24 18:07 [RFC v2 00/22] Kernel API specification framework Sasha Levin
` (11 preceding siblings ...)
2025-06-24 18:07 ` [RFC v2 12/22] mm/mlock: add API specification for mlockall Sasha Levin
@ 2025-06-24 18:07 ` Sasha Levin
2025-06-24 18:07 ` [RFC v2 14/22] mm/mlock: add API specification for munlockall Sasha Levin
` (9 subsequent siblings)
22 siblings, 0 replies; 33+ messages in thread
From: Sasha Levin @ 2025-06-24 18:07 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
Add kernel API specification for the munlock() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/mlock.c | 130 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 130 insertions(+)
diff --git a/mm/mlock.c b/mm/mlock.c
index 8f24a31ac5934..1c9328ec8485c 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -981,6 +981,136 @@ SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
return do_mlock(start, len, vm_flags);
}
+
+DEFINE_KERNEL_API_SPEC(sys_munlock)
+ KAPI_DESCRIPTION("Unlock pages in memory")
+ KAPI_LONG_DESC("Unlocks pages in the specified address range, allowing them "
+ "to be paged out to swap if needed.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "start", "unsigned long", "Starting address of memory range to unlock")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_CONSTRAINT("Rounded down to page boundary")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "len", "size_t", "Length of memory range to unlock in bytes")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ KAPI_PARAM_RANGE(0, LONG_MAX)
+ KAPI_PARAM_CONSTRAINT("Rounded up to page boundary")
+ KAPI_PARAM_END
+
+ /* Return specification */
+ KAPI_RETURN("long", "0 on success, negative error code on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ .success_value = 0,
+ KAPI_RETURN_END
+
+ /* Error codes */
+ KAPI_ERROR(0, -ENOMEM, "ENOMEM", "Memory range not mapped", "(Linux 2.6.9 and later) Some of the specified address range does not correspond to mapped pages in the process address space.")
+ KAPI_ERROR(1, -EINTR, "EINTR", "Interrupted by signal", "The operation was interrupted by a signal before completion.")
+ KAPI_ERROR(2, -EINVAL, "EINVAL", "Address overflow", "The result of the addition start+len was less than start (arithmetic overflow).")
+
+ /* Signal specifications */
+ KAPI_SIGNAL(0, 0, "FATAL_SIGNALS", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("Fatal signal pending during mmap_write_lock_killable")
+ KAPI_SIGNAL_DESC("Fatal signals (SIGKILL, SIGTERM, etc.) can interrupt the operation when acquiring mmap_write_lock_killable(), causing -EINTR return")
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_MODIFY_STATE,
+ "process memory",
+ "Unlocks pages, making them eligible for swapping")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_EFFECT_CONDITION("Pages were previously locked")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE,
+ "mm->locked_vm",
+ "Decreases process locked memory counter")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_EFFECT_CONDITION("Pages were counted in locked_vm")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_MODIFY_STATE,
+ "VMA flags",
+ "Clears VM_LOCKED and VM_LOCKONFAULT from affected VMAs")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_MODIFY_STATE,
+ "page flags",
+ "Clears PG_mlocked flag from unlocked pages")
+ KAPI_EFFECT_CONDITION("Pages had PG_mlocked set")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(4, KAPI_EFFECT_MODIFY_STATE,
+ "LRU lists",
+ "Moves pages from unevictable to appropriate LRU list")
+ KAPI_EFFECT_CONDITION("Pages were on unevictable list")
+ KAPI_SIDE_EFFECT_END
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "memory pages",
+ "locked in RAM", "swappable",
+ "Pages become eligible for swap out")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "VMA flags",
+ "VM_LOCKED set", "VM_LOCKED cleared",
+ "Virtual memory areas no longer marked as locked")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "page residency",
+ "guaranteed resident", "may be swapped",
+ "Pages can now be evicted under memory pressure")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(3, "process statistics",
+ "locked memory accounted", "normal memory accounting",
+ "Memory no longer counted against RLIMIT_MEMLOCK")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(4, "page LRU status",
+ "unevictable list", "active/inactive list",
+ "Pages moved to normal LRU lists for reclaim")
+ KAPI_STATE_TRANS_COND("Pages were mlocked")
+ KAPI_STATE_TRANS_END
+
+ /* Locking information */
+ KAPI_LOCK(0, "mmap_lock", KAPI_LOCK_RWLOCK)
+ KAPI_LOCK_DESC("Process memory map write lock")
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Protects VMA modifications during unlock operation")
+ KAPI_LOCK_END
+
+ KAPI_LOCK(1, "lru_lock", KAPI_LOCK_SPINLOCK)
+ KAPI_LOCK_DESC("Per-memcg LRU list lock")
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Taken when moving pages from unevictable to normal LRU lists")
+ KAPI_LOCK_END
+
+ KAPI_ERROR_COUNT(3)
+ KAPI_PARAM_COUNT(2)
+ KAPI_SINCE_VERSION("2.0")
+ KAPI_SIGNAL_COUNT(1)
+ KAPI_SIDE_EFFECT_COUNT(5)
+ KAPI_STATE_TRANS_COUNT(5)
+ KAPI_LOCK_COUNT(2)
+ KAPI_EXAMPLES("munlock(addr, 4096); // Unlock one page\n"
+ "munlock(addr, len); // Unlock range of pages")
+ KAPI_NOTES("No special permissions required to unlock memory. A single munlock() "
+ "can undo multiple mlock() calls on the same range since locks don't stack.")
+KAPI_END_SPEC;
+
SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
{
int ret;
--
2.39.5
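
The parameter constraints in the munlock spec above (start rounded down to a page boundary, len rounded up, EINVAL when start+len overflows) can be illustrated with a short userspace sketch. The helper name model_lock_range() and the explicit page_size argument are hypothetical and exist only for this example; the kernel uses its own page macros.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: compute the page-aligned range [*out_start, *out_end)
 * covered by (start, len). page_size must be a power of two.
 * Returns 0 on success, -EINVAL if the range overflows.
 */
static int model_lock_range(unsigned long start, size_t len,
			    unsigned long page_size,
			    unsigned long *out_start, unsigned long *out_end)
{
	unsigned long mask = page_size - 1;
	unsigned long end = start + len;

	if (end < start)
		return -EINVAL;		/* start + len overflowed */
	if (end > ULONG_MAX - mask)
		return -EINVAL;		/* rounding up would overflow */

	*out_start = start & ~mask;		/* round start down */
	*out_end = (end + mask) & ~mask;	/* round end up */
	return 0;
}
```

With a 4096-byte page, a one-byte request at address 4097 covers the whole page from 4096 to 8192, which is why unlocking a single byte affects the entire page.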
* [RFC v2 14/22] mm/mlock: add API specification for munlockall
2025-06-24 18:07 [RFC v2 00/22] Kernel API specification framework Sasha Levin
` (12 preceding siblings ...)
2025-06-24 18:07 ` [RFC v2 13/22] mm/mlock: add API specification for munlock Sasha Levin
@ 2025-06-24 18:07 ` Sasha Levin
2025-06-24 18:07 ` [RFC v2 15/22] kernel/api: add debugfs interface for kernel API specifications Sasha Levin
` (8 subsequent siblings)
22 siblings, 0 replies; 33+ messages in thread
From: Sasha Levin @ 2025-06-24 18:07 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
Add kernel API specification for the munlockall() system call.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/mlock.c | 153 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 153 insertions(+)
diff --git a/mm/mlock.c b/mm/mlock.c
index 1c9328ec8485c..cf537103ebbc6 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -719,6 +719,12 @@ DEFINE_KERNEL_API_SPEC(sys_mlock)
KAPI_SIGNAL_CONDITION("Fatal signal pending")
KAPI_SIGNAL_DESC("Fatal signals (SIGKILL, etc.) can interrupt the operation "
"when acquiring mmap_write_lock_killable(), causing -EINTR return")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ENTRY)
+ KAPI_SIGNAL_PRIORITY(0)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_ERROR(-EINTR)
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING | KAPI_SIGNAL_STATE_SLEEPING)
+ KAPI_SIGNAL_RESTARTABLE
KAPI_SIGNAL_END
KAPI_EXAMPLES("mlock(addr, 4096); // Lock one page\n"
@@ -854,6 +860,11 @@ DEFINE_KERNEL_API_SPEC(sys_mlock2)
KAPI_SIGNAL(0, 0, "FATAL_SIGNALS", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
KAPI_SIGNAL_CONDITION("Fatal signal pending during mmap_write_lock_killable")
KAPI_SIGNAL_DESC("Fatal signals (SIGKILL, SIGTERM, etc.) can interrupt the operation when acquiring mmap_write_lock_killable(), causing -EINTR return")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ENTRY)
+ KAPI_SIGNAL_PRIORITY(0)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_ERROR(-EINTR)
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING | KAPI_SIGNAL_STATE_SLEEPING)
KAPI_SIGNAL_RESTARTABLE
KAPI_SIGNAL_END
@@ -861,6 +872,9 @@ DEFINE_KERNEL_API_SPEC(sys_mlock2)
KAPI_SIGNAL_TARGET("Current process")
KAPI_SIGNAL_CONDITION("Memory access to locked page fails")
KAPI_SIGNAL_DESC("Can be generated if accessing a locked page that cannot be brought into memory (e.g., truncated file mapping)")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ANYTIME)
+ KAPI_SIGNAL_PRIORITY(1)
+ KAPI_SIGNAL_SA_FLAGS_REQ(SA_SIGINFO)
KAPI_SIGNAL_END
/* Side effects */
@@ -1020,6 +1034,11 @@ DEFINE_KERNEL_API_SPEC(sys_munlock)
KAPI_SIGNAL(0, 0, "FATAL_SIGNALS", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
KAPI_SIGNAL_CONDITION("Fatal signal pending during mmap_write_lock_killable")
KAPI_SIGNAL_DESC("Fatal signals (SIGKILL, SIGTERM, etc.) can interrupt the operation when acquiring mmap_write_lock_killable(), causing -EINTR return")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ENTRY)
+ KAPI_SIGNAL_PRIORITY(0)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_ERROR(-EINTR)
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING | KAPI_SIGNAL_STATE_SLEEPING)
KAPI_SIGNAL_RESTARTABLE
KAPI_SIGNAL_END
@@ -1214,6 +1233,11 @@ DEFINE_KERNEL_API_SPEC(sys_mlockall)
KAPI_SIGNAL(0, 0, "FATAL_SIGNALS", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
KAPI_SIGNAL_CONDITION("Fatal signal pending during mmap_write_lock_killable")
KAPI_SIGNAL_DESC("Fatal signals (SIGKILL, SIGTERM, etc.) can interrupt the operation when acquiring mmap_write_lock_killable(), causing -EINTR return")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ENTRY)
+ KAPI_SIGNAL_PRIORITY(0)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_ERROR(-EINTR)
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING | KAPI_SIGNAL_STATE_SLEEPING)
KAPI_SIGNAL_RESTARTABLE
KAPI_SIGNAL_END
@@ -1221,6 +1245,9 @@ DEFINE_KERNEL_API_SPEC(sys_mlockall)
KAPI_SIGNAL_TARGET("Current process")
KAPI_SIGNAL_CONDITION("Memory access to locked page fails")
KAPI_SIGNAL_DESC("Can be generated later if accessing a locked page that cannot be brought into memory (e.g., truncated file mapping)")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ANYTIME)
+ KAPI_SIGNAL_PRIORITY(1)
+ KAPI_SIGNAL_SA_FLAGS_REQ(SA_SIGINFO)
KAPI_SIGNAL_END
/* Side effects */
@@ -1394,6 +1421,132 @@ SYSCALL_DEFINE1(mlockall, int, flags)
return ret;
}
+
+DEFINE_KERNEL_API_SPEC(sys_munlockall)
+ KAPI_DESCRIPTION("Unlock all process pages")
+ KAPI_LONG_DESC("Unlocks all pages mapped into the process address space and "
+ "clears the MCL_FUTURE flag if set.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* No parameters - this is a SYSCALL_DEFINE0 */
+ .param_count = 0,
+
+ /* Return specification */
+ KAPI_RETURN("long", "0 on success, negative error code on failure")
+ .type = KAPI_TYPE_INT,
+ .check_type = KAPI_RETURN_ERROR_CHECK,
+ .success_value = 0,
+ KAPI_RETURN_END
+
+ /* Error codes */
+ KAPI_ERROR(0, -EINTR, "EINTR", "Interrupted by signal", "The operation was interrupted by a signal before completion.")
+ KAPI_ERROR(1, -ENOMEM, "ENOMEM", "Memory operation failed", "Failed to modify memory mappings (should not normally occur).")
+
+ /* Signal specifications */
+ KAPI_SIGNAL(0, 0, "FATAL_SIGNALS", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN)
+ KAPI_SIGNAL_CONDITION("Fatal signal pending during mmap_write_lock_killable")
+ KAPI_SIGNAL_DESC("Fatal signals (SIGKILL, SIGTERM, etc.) can interrupt the operation when acquiring mmap_write_lock_killable(), causing -EINTR return")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ENTRY)
+ KAPI_SIGNAL_PRIORITY(0)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_ERROR(-EINTR)
+ KAPI_SIGNAL_STATE_REQ(KAPI_SIGNAL_STATE_RUNNING | KAPI_SIGNAL_STATE_SLEEPING)
+ KAPI_SIGNAL_RESTARTABLE
+ KAPI_SIGNAL_END
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_MODIFY_STATE,
+ "all process memory",
+ "Unlocks all pages, making entire address space swappable")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_EFFECT_CONDITION("Process had locked pages")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE,
+ "mm->def_flags",
+ "Clears VM_LOCKED from default flags for future mappings")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_EFFECT_CONDITION("MCL_FUTURE was previously set")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_MODIFY_STATE,
+ "mm->locked_vm",
+ "Resets process locked memory counter to zero")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_MODIFY_STATE,
+ "all VMA flags",
+ "Clears VM_LOCKED and VM_LOCKONFAULT from all VMAs")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(4, KAPI_EFFECT_MODIFY_STATE,
+ "page flags",
+ "Clears PG_mlocked flag from all locked pages")
+ KAPI_EFFECT_CONDITION("Pages had PG_mlocked set")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(5, KAPI_EFFECT_MODIFY_STATE,
+ "LRU lists",
+ "Moves all pages from unevictable to normal LRU lists")
+ KAPI_EFFECT_CONDITION("Pages were on unevictable list")
+ KAPI_SIDE_EFFECT_END
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "all memory pages",
+ "locked in RAM", "swappable",
+ "All pages in process become eligible for swap out")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "future mappings",
+ "auto-locked", "normal",
+ "New mappings will no longer be automatically locked")
+ KAPI_STATE_TRANS_COND("MCL_FUTURE was set")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "all VMA flags",
+ "VM_LOCKED set", "VM_LOCKED cleared",
+ "All virtual memory areas no longer marked as locked")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(3, "process statistics",
+ "all memory locked", "no memory locked",
+ "Entire locked memory accounting reset to zero")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(4, "page LRU status",
+ "unevictable list", "active/inactive list",
+ "All pages moved to normal LRU lists for reclaim")
+ KAPI_STATE_TRANS_COND("Pages were mlocked")
+ KAPI_STATE_TRANS_END
+
+ /* Locking information */
+ KAPI_LOCK(0, "mmap_lock", KAPI_LOCK_RWLOCK)
+ KAPI_LOCK_DESC("Process memory map write lock")
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Protects VMA modifications during munlockall operation")
+ KAPI_LOCK_END
+
+ KAPI_LOCK(1, "lru_lock", KAPI_LOCK_SPINLOCK)
+ KAPI_LOCK_DESC("Per-memcg LRU list lock")
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Taken when moving all pages from unevictable to normal LRU lists")
+ KAPI_LOCK_END
+
+ KAPI_ERROR_COUNT(2)
+ KAPI_SINCE_VERSION("2.0")
+ KAPI_SIGNAL_COUNT(1)
+ KAPI_SIDE_EFFECT_COUNT(6)
+ KAPI_STATE_TRANS_COUNT(5)
+ KAPI_LOCK_COUNT(2)
+ KAPI_EXAMPLES("munlockall(); // Unlock all pages")
+ KAPI_NOTES("Clears VM_LOCKED and VM_LOCKONFAULT from all VMAs and mm->def_flags. "
+ "A single munlockall() can undo multiple mlockall() calls since locks don't stack.")
+KAPI_END_SPEC;
+
SYSCALL_DEFINE0(munlockall)
{
int ret;
--
2.39.5
* [RFC v2 15/22] kernel/api: add debugfs interface for kernel API specifications
2025-06-24 18:07 [RFC v2 00/22] Kernel API specification framework Sasha Levin
` (13 preceding siblings ...)
2025-06-24 18:07 ` [RFC v2 14/22] mm/mlock: add API specification for munlockall Sasha Levin
@ 2025-06-24 18:07 ` Sasha Levin
2025-06-24 18:07 ` [RFC v2 16/22] kernel/api: add IOCTL specification infrastructure Sasha Levin
` (7 subsequent siblings)
22 siblings, 0 replies; 33+ messages in thread
From: Sasha Levin @ 2025-06-24 18:07 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
Add a debugfs interface to expose kernel API specifications at runtime.
This allows tools and users to query the complete API specifications
through the debugfs filesystem.
The interface provides:
- /sys/kernel/debug/kapi/list - lists all available API specifications
- /sys/kernel/debug/kapi/specs/<name> - detailed info for each API
Each specification file includes:
- Function name, version, and descriptions
- Execution context requirements and flags
- Parameter details with types, flags, and constraints
- Return value specifications and success conditions
- Error codes with descriptions and conditions
- Locking requirements and constraints
- Signal handling specifications
- Examples, notes, and deprecation status
This enables runtime introspection of kernel APIs for documentation
tools, static analyzers, and debugging purposes.
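
As an illustration of how a userspace tool might consume this interface, here is a minimal sketch that builds the path of a spec file and copies it to a stream. The helper names kapi_spec_path() and kapi_dump_spec() are hypothetical, and the sketch assumes that the file name under specs/ matches the spec name (for example sys_mlockall); it also assumes debugfs is mounted at /sys/kernel/debug.

```c
#include <stdio.h>

#define KAPI_DEBUGFS_DIR "/sys/kernel/debug/kapi"

/* Build the path of a spec file; returns 0 on success, -1 if it does not fit. */
static int kapi_spec_path(char *buf, size_t len, const char *name)
{
	int n = snprintf(buf, len, "%s/specs/%s", KAPI_DEBUGFS_DIR, name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Copy one spec file to `out`; returns 0 on success, -1 on failure. */
static int kapi_dump_spec(const char *name, FILE *out)
{
	char path[256];
	char line[512];
	FILE *f;

	if (kapi_spec_path(path, sizeof(path), name) < 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;	/* debugfs not mounted, option off, or no such spec */

	while (fgets(line, sizeof(line), f))
		fputs(line, out);

	fclose(f);
	return 0;
}
```

Calling kapi_dump_spec("sys_mlockall", stdout) would then print the spec, typically only when run as root on a kernel built with CONFIG_KAPI_SPEC_DEBUGFS.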
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/api/Kconfig | 20 +++
kernel/api/Makefile | 5 +-
kernel/api/kapi_debugfs.c | 340 ++++++++++++++++++++++++++++++++++++++
3 files changed, 364 insertions(+), 1 deletion(-)
create mode 100644 kernel/api/kapi_debugfs.c
diff --git a/kernel/api/Kconfig b/kernel/api/Kconfig
index fde25ec70e134..d2754b21acc43 100644
--- a/kernel/api/Kconfig
+++ b/kernel/api/Kconfig
@@ -33,3 +33,23 @@ config KAPI_RUNTIME_CHECKS
development. The checks use WARN_ONCE to report violations.
If unsure, say N.
+
+config KAPI_SPEC_DEBUGFS
+ bool "Export kernel API specifications via debugfs"
+ depends on KAPI_SPEC
+ depends on DEBUG_FS
+ help
+ This option enables exporting kernel API specifications through
+ the debugfs filesystem. When enabled, specifications can be
+ accessed at /sys/kernel/debug/kapi/.
+
+ The debugfs interface provides:
+ - A list of all available API specifications
+ - Detailed information for each API including parameters,
+ return values, errors, locking requirements, and constraints
+ - Complete machine-readable representation of the specs
+
+ This is useful for documentation tools, static analyzers, and
+ runtime introspection of kernel APIs.
+
+ If unsure, say N.
diff --git a/kernel/api/Makefile b/kernel/api/Makefile
index 4120ded7e5cf1..07b8c007ec156 100644
--- a/kernel/api/Makefile
+++ b/kernel/api/Makefile
@@ -4,4 +4,7 @@
#
# Core API specification framework
-obj-$(CONFIG_KAPI_SPEC) += kernel_api_spec.o
\ No newline at end of file
+obj-$(CONFIG_KAPI_SPEC) += kernel_api_spec.o
+
+# Debugfs interface for kernel API specs
+obj-$(CONFIG_KAPI_SPEC_DEBUGFS) += kapi_debugfs.o
\ No newline at end of file
diff --git a/kernel/api/kapi_debugfs.c b/kernel/api/kapi_debugfs.c
new file mode 100644
index 0000000000000..bf65ea6a49205
--- /dev/null
+++ b/kernel/api/kapi_debugfs.c
@@ -0,0 +1,340 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Kernel API specification debugfs interface
+ *
+ * This provides a debugfs interface to expose kernel API specifications
+ * at runtime, allowing tools and users to query the complete API specs.
+ */
+
+#include <linux/debugfs.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/seq_file.h>
+#include <linux/kernel_api_spec.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+/* External symbols for kernel API spec section */
+extern struct kernel_api_spec __start_kapi_specs[];
+extern struct kernel_api_spec __stop_kapi_specs[];
+
+static struct dentry *kapi_debugfs_root;
+
+/* Helper function to print parameter type as string */
+static const char *param_type_str(enum kapi_param_type type)
+{
+ switch (type) {
+ case KAPI_TYPE_INT: return "int";
+ case KAPI_TYPE_UINT: return "uint";
+ case KAPI_TYPE_PTR: return "ptr";
+ case KAPI_TYPE_STRUCT: return "struct";
+ case KAPI_TYPE_UNION: return "union";
+ case KAPI_TYPE_ARRAY: return "array";
+ case KAPI_TYPE_FD: return "fd";
+ case KAPI_TYPE_ENUM: return "enum";
+ case KAPI_TYPE_USER_PTR: return "user_ptr";
+ case KAPI_TYPE_PATH: return "path";
+ case KAPI_TYPE_FUNC_PTR: return "func_ptr";
+ case KAPI_TYPE_CUSTOM: return "custom";
+ default: return "unknown";
+ }
+}
+
+/* Helper to print parameter flags */
+static void print_param_flags(struct seq_file *m, u32 flags)
+{
+ seq_puts(m, " flags: ");
+ if (flags & KAPI_PARAM_IN) seq_puts(m, "IN ");
+ if (flags & KAPI_PARAM_OUT) seq_puts(m, "OUT ");
+ if (flags & KAPI_PARAM_INOUT) seq_puts(m, "INOUT ");
+ if (flags & KAPI_PARAM_OPTIONAL) seq_puts(m, "OPTIONAL ");
+ if (flags & KAPI_PARAM_CONST) seq_puts(m, "CONST ");
+ if (flags & KAPI_PARAM_USER) seq_puts(m, "USER ");
+ if (flags & KAPI_PARAM_VOLATILE) seq_puts(m, "VOLATILE ");
+ if (flags & KAPI_PARAM_DMA) seq_puts(m, "DMA ");
+ if (flags & KAPI_PARAM_ALIGNED) seq_puts(m, "ALIGNED ");
+ seq_putc(m, '\n');
+}
+
+/* Helper to print context flags */
+static void print_context_flags(struct seq_file *m, u32 flags)
+{
+ seq_puts(m, "Context flags: ");
+ if (flags & KAPI_CTX_PROCESS) seq_puts(m, "PROCESS ");
+ if (flags & KAPI_CTX_HARDIRQ) seq_puts(m, "HARDIRQ ");
+ if (flags & KAPI_CTX_SOFTIRQ) seq_puts(m, "SOFTIRQ ");
+ if (flags & KAPI_CTX_NMI) seq_puts(m, "NMI ");
+ if (flags & KAPI_CTX_SLEEPABLE) seq_puts(m, "SLEEPABLE ");
+ if (flags & KAPI_CTX_ATOMIC) seq_puts(m, "ATOMIC ");
+ if (flags & KAPI_CTX_PREEMPT_DISABLED) seq_puts(m, "PREEMPT_DISABLED ");
+ if (flags & KAPI_CTX_IRQ_DISABLED) seq_puts(m, "IRQ_DISABLED ");
+ seq_putc(m, '\n');
+}
+
+/* Show function for individual API spec */
+static int kapi_spec_show(struct seq_file *m, void *v)
+{
+ struct kernel_api_spec *spec = m->private;
+ int i;
+
+ seq_printf(m, "Kernel API Specification\n");
+ seq_printf(m, "========================\n\n");
+
+ /* Basic info */
+ seq_printf(m, "Name: %s\n", spec->name);
+ seq_printf(m, "Version: %u\n", spec->version);
+ seq_printf(m, "Description: %s\n", spec->description);
+ if (strlen(spec->long_description) > 0)
+ seq_printf(m, "Long description: %s\n", spec->long_description);
+
+ /* Context */
+ print_context_flags(m, spec->context_flags);
+ seq_printf(m, "\n");
+
+ /* Parameters */
+ if (spec->param_count > 0) {
+ seq_printf(m, "Parameters (%u):\n", spec->param_count);
+ for (i = 0; i < spec->param_count; i++) {
+ struct kapi_param_spec *param = &spec->params[i];
+ seq_printf(m, " [%d] %s:\n", i, param->name);
+ seq_printf(m, " type: %s (%s)\n",
+ param_type_str(param->type), param->type_name);
+ print_param_flags(m, param->flags);
+ if (strlen(param->description) > 0)
+ seq_printf(m, " description: %s\n", param->description);
+ if (param->size > 0)
+ seq_printf(m, " size: %zu\n", param->size);
+ if (param->alignment > 0)
+ seq_printf(m, " alignment: %zu\n", param->alignment);
+
+ /* Print constraints if any */
+ if (param->constraint_type != KAPI_CONSTRAINT_NONE) {
+ seq_printf(m, " constraints:\n");
+ switch (param->constraint_type) {
+ case KAPI_CONSTRAINT_RANGE:
+ seq_printf(m, " type: range\n");
+ seq_printf(m, " min: %lld\n", param->min_value);
+ seq_printf(m, " max: %lld\n", param->max_value);
+ break;
+ case KAPI_CONSTRAINT_MASK:
+ seq_printf(m, " type: mask\n");
+ seq_printf(m, " valid_bits: 0x%llx\n", param->valid_mask);
+ break;
+ case KAPI_CONSTRAINT_ENUM:
+ seq_printf(m, " type: enum\n");
+ seq_printf(m, " count: %u\n", param->enum_count);
+ break;
+ case KAPI_CONSTRAINT_CUSTOM:
+ seq_printf(m, " type: custom\n");
+ if (strlen(param->constraints) > 0)
+ seq_printf(m, " description: %s\n",
+ param->constraints);
+ break;
+ default:
+ break;
+ }
+ }
+ seq_printf(m, "\n");
+ }
+ }
+
+ /* Return value */
+ seq_printf(m, "Return value:\n");
+ seq_printf(m, " type: %s\n", spec->return_spec.type_name);
+ if (strlen(spec->return_spec.description) > 0)
+ seq_printf(m, " description: %s\n", spec->return_spec.description);
+
+ switch (spec->return_spec.check_type) {
+ case KAPI_RETURN_EXACT:
+ seq_printf(m, " success: == %lld\n", spec->return_spec.success_value);
+ break;
+ case KAPI_RETURN_RANGE:
+ seq_printf(m, " success: [%lld, %lld]\n",
+ spec->return_spec.success_min,
+ spec->return_spec.success_max);
+ break;
+ case KAPI_RETURN_FD:
+ seq_printf(m, " success: valid file descriptor (>= 0)\n");
+ break;
+ case KAPI_RETURN_ERROR_CHECK:
+ seq_printf(m, " success: error check\n");
+ break;
+ case KAPI_RETURN_CUSTOM:
+ seq_printf(m, " success: custom check\n");
+ break;
+ default:
+ break;
+ }
+ seq_printf(m, "\n");
+
+ /* Errors */
+ if (spec->error_count > 0) {
+ seq_printf(m, "Errors (%u):\n", spec->error_count);
+ for (i = 0; i < spec->error_count; i++) {
+ struct kapi_error_spec *err = &spec->errors[i];
+ seq_printf(m, " %s (%d): %s\n",
+ err->name, err->error_code, err->description);
+ if (strlen(err->condition) > 0)
+ seq_printf(m, " condition: %s\n", err->condition);
+ }
+ seq_printf(m, "\n");
+ }
+
+ /* Locks */
+ if (spec->lock_count > 0) {
+ seq_printf(m, "Locks (%u):\n", spec->lock_count);
+ for (i = 0; i < spec->lock_count; i++) {
+ struct kapi_lock_spec *lock = &spec->locks[i];
+ const char *type_str;
+ switch (lock->lock_type) {
+ case KAPI_LOCK_MUTEX: type_str = "mutex"; break;
+ case KAPI_LOCK_SPINLOCK: type_str = "spinlock"; break;
+ case KAPI_LOCK_RWLOCK: type_str = "rwlock"; break;
+ case KAPI_LOCK_SEMAPHORE: type_str = "semaphore"; break;
+ case KAPI_LOCK_RCU: type_str = "rcu"; break;
+ case KAPI_LOCK_SEQLOCK: type_str = "seqlock"; break;
+ default: type_str = "unknown"; break;
+ }
+ seq_printf(m, " %s (%s): %s\n",
+ lock->lock_name, type_str, lock->description);
+ if (lock->acquired)
+ seq_printf(m, " acquired by function\n");
+ if (lock->released)
+ seq_printf(m, " released by function\n");
+ }
+ seq_printf(m, "\n");
+ }
+
+ /* Constraints */
+ if (spec->constraint_count > 0) {
+ seq_printf(m, "Additional constraints (%u):\n", spec->constraint_count);
+ for (i = 0; i < spec->constraint_count; i++) {
+ seq_printf(m, " - %s\n", spec->constraints[i].description);
+ }
+ seq_printf(m, "\n");
+ }
+
+ /* Signals */
+ if (spec->signal_count > 0) {
+ seq_printf(m, "Signal handling (%u):\n", spec->signal_count);
+ for (i = 0; i < spec->signal_count; i++) {
+ struct kapi_signal_spec *sig = &spec->signals[i];
+ seq_printf(m, " %s (%d):\n", sig->signal_name, sig->signal_num);
+ seq_printf(m, " direction: ");
+ if (sig->direction & KAPI_SIGNAL_SEND) seq_printf(m, "send ");
+ if (sig->direction & KAPI_SIGNAL_RECEIVE) seq_printf(m, "receive ");
+ if (sig->direction & KAPI_SIGNAL_HANDLE) seq_printf(m, "handle ");
+ if (sig->direction & KAPI_SIGNAL_BLOCK) seq_printf(m, "block ");
+ if (sig->direction & KAPI_SIGNAL_IGNORE) seq_printf(m, "ignore ");
+ seq_printf(m, "\n");
+ seq_printf(m, " action: ");
+ switch (sig->action) {
+ case KAPI_SIGNAL_ACTION_DEFAULT: seq_printf(m, "default"); break;
+ case KAPI_SIGNAL_ACTION_TERMINATE: seq_printf(m, "terminate"); break;
+ case KAPI_SIGNAL_ACTION_COREDUMP: seq_printf(m, "coredump"); break;
+ case KAPI_SIGNAL_ACTION_STOP: seq_printf(m, "stop"); break;
+ case KAPI_SIGNAL_ACTION_CONTINUE: seq_printf(m, "continue"); break;
+ case KAPI_SIGNAL_ACTION_CUSTOM: seq_printf(m, "custom"); break;
+ case KAPI_SIGNAL_ACTION_RETURN: seq_printf(m, "return"); break;
+ case KAPI_SIGNAL_ACTION_RESTART: seq_printf(m, "restart"); break;
+ default: seq_printf(m, "unknown"); break;
+ }
+ seq_printf(m, "\n");
+ if (strlen(sig->description) > 0)
+ seq_printf(m, " description: %s\n", sig->description);
+ }
+ seq_printf(m, "\n");
+ }
+
+ /* Additional info */
+ if (strlen(spec->examples) > 0) {
+ seq_printf(m, "Examples:\n%s\n\n", spec->examples);
+ }
+ if (strlen(spec->notes) > 0) {
+ seq_printf(m, "Notes:\n%s\n\n", spec->notes);
+ }
+ if (strlen(spec->since_version) > 0) {
+ seq_printf(m, "Since: %s\n", spec->since_version);
+ }
+ if (spec->deprecated) {
+ seq_printf(m, "DEPRECATED");
+ if (strlen(spec->replacement) > 0)
+ seq_printf(m, " - use %s instead", spec->replacement);
+ seq_printf(m, "\n");
+ }
+
+ return 0;
+}
+
+static int kapi_spec_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, kapi_spec_show, inode->i_private);
+}
+
+static const struct file_operations kapi_spec_fops = {
+ .open = kapi_spec_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+/* Show all available API specs */
+static int kapi_list_show(struct seq_file *m, void *v)
+{
+ struct kernel_api_spec *spec;
+ int count = 0;
+
+ seq_printf(m, "Available Kernel API Specifications\n");
+ seq_printf(m, "===================================\n\n");
+
+ for (spec = __start_kapi_specs; spec < __stop_kapi_specs; spec++) {
+ seq_printf(m, "%s - %s\n", spec->name, spec->description);
+ count++;
+ }
+
+ seq_printf(m, "\nTotal: %d specifications\n", count);
+ return 0;
+}
+
+static int kapi_list_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, kapi_list_show, NULL);
+}
+
+static const struct file_operations kapi_list_fops = {
+ .open = kapi_list_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static int __init kapi_debugfs_init(void)
+{
+ struct kernel_api_spec *spec;
+ struct dentry *spec_dir;
+
+ /* Create main directory */
+ kapi_debugfs_root = debugfs_create_dir("kapi", NULL);
+
+ /* Create list file */
+ debugfs_create_file("list", 0444, kapi_debugfs_root, NULL, &kapi_list_fops);
+
+ /* Create specs subdirectory */
+ spec_dir = debugfs_create_dir("specs", kapi_debugfs_root);
+
+ /* Create a file for each API spec */
+ for (spec = __start_kapi_specs; spec < __stop_kapi_specs; spec++) {
+ debugfs_create_file(spec->name, 0444, spec_dir, spec, &kapi_spec_fops);
+ }
+
+ pr_info("Kernel API debugfs interface initialized\n");
+ return 0;
+}
+
+static void __exit kapi_debugfs_exit(void)
+{
+ debugfs_remove_recursive(kapi_debugfs_root);
+}
+
+/* Initialize as part of kernel, not as a module */
+fs_initcall(kapi_debugfs_init);
\ No newline at end of file
--
2.39.5
* [RFC v2 16/22] kernel/api: add IOCTL specification infrastructure
2025-06-24 18:07 [RFC v2 00/22] Kernel API specification framework Sasha Levin
` (14 preceding siblings ...)
2025-06-24 18:07 ` [RFC v2 15/22] kernel/api: add debugfs interface for kernel API specifications Sasha Levin
@ 2025-06-24 18:07 ` Sasha Levin
2025-06-24 18:07 ` [RFC v2 17/22] fwctl: add detailed IOCTL API specifications Sasha Levin
` (6 subsequent siblings)
22 siblings, 0 replies; 33+ messages in thread
From: Sasha Levin @ 2025-06-24 18:07 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
Add IOCTL API specification support to the kernel API specification
framework. This enables detailed documentation and runtime validation of
IOCTL interfaces.
Key features:
- IOCTL specification structure with command info and parameter details
- Registration/unregistration functions for IOCTL specs
- Helper macros for defining IOCTL specifications
- KAPI_IOCTL_SPEC_DRIVER macro for simplified driver integration
- Runtime validation support with KAPI_DEFINE_FOPS wrapper
- Validation of IOCTL parameters and return values
- Integration with existing kernel API spec infrastructure
The validation framework checks:
- Parameter constraints (ranges, enums, masks)
- User pointers (rejected when they point at kernel addresses)
- Buffer size constraints
- Returned error codes (warns when an errno is missing from the spec)
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
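Note for readers: the constraint kinds named in the commit message
(ranges, masks, enums) can be pictured with the userspace sketch below.
It is only an illustration under assumed semantics, not the series'
kapi_validate_param(), whose body is not part of this patch. Every
sketch_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the KAPI_CONSTRAINT_* kinds (not kernel types). */
enum sketch_constraint { SKETCH_NONE, SKETCH_RANGE, SKETCH_MASK, SKETCH_ENUM };

struct sketch_param {
	enum sketch_constraint kind;
	int64_t min_value, max_value;	/* SKETCH_RANGE: inclusive bounds */
	uint64_t valid_mask;		/* SKETCH_MASK: bits allowed to be set */
	const int64_t *enum_values;	/* SKETCH_ENUM: accepted values */
	size_t enum_count;
};

/* Return true when value satisfies the constraint described by p. */
bool sketch_validate_param(const struct sketch_param *p, int64_t value)
{
	size_t i;

	switch (p->kind) {
	case SKETCH_RANGE:
		return value >= p->min_value && value <= p->max_value;
	case SKETCH_MASK:
		/* Any bit outside valid_mask is rejected. */
		return ((uint64_t)value & ~p->valid_mask) == 0;
	case SKETCH_ENUM:
		for (i = 0; i < p->enum_count; i++)
			if (p->enum_values[i] == value)
				return true;
		return false;
	default:
		return true;	/* no constraint declared */
	}
}

/* Exercise each constraint kind once with an accepted and a rejected value. */
void sketch_selfcheck(void)
{
	static const int64_t types[] = { 0, 1, 2, 3 };
	struct sketch_param range = { .kind = SKETCH_RANGE,
				      .min_value = 0, .max_value = 1048576 };
	struct sketch_param flags = { .kind = SKETCH_MASK, .valid_mask = 0 };
	struct sketch_param type = { .kind = SKETCH_ENUM,
				     .enum_values = types, .enum_count = 4 };

	assert(sketch_validate_param(&range, 4096));
	assert(!sketch_validate_param(&range, 1048577));
	assert(sketch_validate_param(&flags, 0));
	assert(!sketch_validate_param(&flags, 1));
	assert(sketch_validate_param(&type, 2));
	assert(!sketch_validate_param(&type, 9));
}
```

Under these assumed semantics, a mask constraint with valid_mask = 0,
as KAPI_IOCTL_PARAM_FLAGS declares, accepts only flags == 0.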
include/linux/kernel_api_spec.h | 199 +++++++++++++++++-
kernel/api/Makefile | 5 +-
kernel/api/ioctl_validation.c | 355 ++++++++++++++++++++++++++++++++
kernel/api/kernel_api_spec.c | 89 +++++++-
4 files changed, 642 insertions(+), 6 deletions(-)
create mode 100644 kernel/api/ioctl_validation.c
diff --git a/include/linux/kernel_api_spec.h b/include/linux/kernel_api_spec.h
index 1ee76a5f3ee1f..4be9636b19158 100644
--- a/include/linux/kernel_api_spec.h
+++ b/include/linux/kernel_api_spec.h
@@ -779,6 +779,13 @@ struct kernel_api_spec {
char connection_termination[KAPI_MAX_DESC_LEN];
char data_transfer_semantics[KAPI_MAX_DESC_LEN];
#endif /* CONFIG_NET */
+
+ /* IOCTL-specific fields */
+ unsigned int cmd; /* IOCTL command number */
+ char cmd_name[KAPI_MAX_NAME_LEN]; /* Human-readable command name */
+ size_t input_size; /* Size of input structure (0 if none) */
+ size_t output_size; /* Size of output structure (0 if none) */
+ char file_ops_name[KAPI_MAX_NAME_LEN]; /* Name of the file_operations structure */
} __attribute__((packed));
/* Macros for defining API specifications */
@@ -963,6 +970,13 @@ struct kernel_api_spec {
#define KAPI_NOTES(n) \
.notes = n,
+/**
+ * KAPI_SINCE_VERSION - Set the since version
+ * @version: Version string when the API was introduced
+ */
+#define KAPI_SINCE_VERSION(version) \
+ .since_version = version,
+
/**
* KAPI_DEPRECATED - Mark API as deprecated
*/
@@ -1105,10 +1119,10 @@ struct kernel_api_spec {
.constraint_type = KAPI_CONSTRAINT_MASK, \
.valid_mask = mask,
-#define KAPI_FIELD_CONSTRAINT_ENUM(values, count) \
+#define KAPI_FIELD_CONSTRAINT_ENUM(...) \
.constraint_type = KAPI_CONSTRAINT_ENUM, \
- .enum_values = values, \
- .enum_count = count,
+ .enum_values = __VA_ARGS__, \
+ .enum_count = ARRAY_SIZE(__VA_ARGS__),
#define KAPI_STRUCT_FIELD_END },
@@ -1171,6 +1185,20 @@ struct kernel_api_spec {
#define KAPI_STATE_TRANS_COUNT(n) \
.state_trans_count = n,
+/**
+ * KAPI_ERROR_COUNT - Set the error count
+ * @count: Number of errors defined
+ */
+#define KAPI_ERROR_COUNT(count) \
+ .error_count = count,
+
+/**
+ * KAPI_PARAM_COUNT - Set the parameter count
+ * @count: Number of parameters defined
+ */
+#define KAPI_PARAM_COUNT(count) \
+ .param_count = count,
+
/* Helper macros for common side effect patterns */
#define KAPI_EFFECTS_MEMORY (KAPI_EFFECT_ALLOC_MEMORY | KAPI_EFFECT_FREE_MEMORY)
#define KAPI_EFFECTS_LOCKING (KAPI_EFFECT_LOCK_ACQUIRE | KAPI_EFFECT_LOCK_RELEASE)
@@ -1183,7 +1211,7 @@ struct kernel_api_spec {
#define KAPI_PARAM_OUT (KAPI_PARAM_OUT)
#define KAPI_PARAM_INOUT (KAPI_PARAM_IN | KAPI_PARAM_OUT)
#define KAPI_PARAM_OPTIONAL (KAPI_PARAM_OPTIONAL)
-#define KAPI_PARAM_USER_PTR (KAPI_PARAM_USER | KAPI_PARAM_PTR)
+#define KAPI_PARAM_USER_PTR (KAPI_PARAM_USER)
/* Common signal timing constants */
#define KAPI_SIGNAL_TIME_ENTRY "entry"
@@ -1495,6 +1523,169 @@ static inline bool kapi_get_param_constraint(const char *api_name, int param_idx
#define KAPI_CONSTRAINT_COUNT(n) \
.constraint_count = n,
+/* IOCTL-specific functions */
+#ifdef CONFIG_KAPI_SPEC
+int kapi_register_ioctl_spec(const struct kernel_api_spec *spec);
+void kapi_unregister_ioctl_spec(unsigned int cmd);
+const struct kernel_api_spec *kapi_get_ioctl_spec(unsigned int cmd);
+
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+struct file;
+int kapi_validate_ioctl(struct file *filp, unsigned int cmd, void __user *arg);
+int kapi_validate_ioctl_struct(const struct kernel_api_spec *spec,
+ const void *data, size_t size);
+
+/* IOCTL validation wrapper support */
+struct kapi_fops_wrapper {
+ const struct file_operations *real_fops;
+ struct file_operations *wrapped_fops;
+ long (*real_ioctl)(struct file *, unsigned int, unsigned long);
+};
+
+void kapi_register_wrapper(struct kapi_fops_wrapper *wrapper);
+long kapi_ioctl_validation_wrapper(struct file *filp, unsigned int cmd,
+ unsigned long arg);
+
+/* Macro for defining file operations with automatic IOCTL validation */
+#define KAPI_DEFINE_FOPS(name, ...) \
+static const struct file_operations __kapi_real_##name = { \
+ __VA_ARGS__ \
+}; \
+static struct file_operations __kapi_wrapped_##name; \
+static struct kapi_fops_wrapper __kapi_wrapper_##name; \
+static const struct file_operations *name; \
+static void kapi_init_fops_##name(void) \
+{ \
+ if (__kapi_real_##name.unlocked_ioctl) { \
+ __kapi_wrapped_##name = __kapi_real_##name; \
+ __kapi_wrapper_##name.real_fops = &__kapi_real_##name; \
+ __kapi_wrapper_##name.wrapped_fops = &__kapi_wrapped_##name; \
+ __kapi_wrapper_##name.real_ioctl = \
+ __kapi_real_##name.unlocked_ioctl; \
+ __kapi_wrapped_##name.unlocked_ioctl = \
+ kapi_ioctl_validation_wrapper; \
+ kapi_register_wrapper(&__kapi_wrapper_##name); \
+ name = &__kapi_wrapped_##name; \
+ } else { \
+ name = &__kapi_real_##name; \
+ } \
+}
+
+#else /* !CONFIG_KAPI_RUNTIME_CHECKS */
+
+/* When runtime checks are disabled, no wrapping occurs */
+#define KAPI_DEFINE_FOPS(name, ...) \
+static const struct file_operations name = { __VA_ARGS__ }; \
+static inline void kapi_init_fops_##name(void) {}
+
+#endif /* CONFIG_KAPI_RUNTIME_CHECKS */
+#else /* !CONFIG_KAPI_SPEC */
+static inline int kapi_register_ioctl_spec(const struct kernel_api_spec *spec)
+{
+ return 0;
+}
+static inline void kapi_unregister_ioctl_spec(unsigned int cmd) {}
+static inline const struct kernel_api_spec *kapi_get_ioctl_spec(unsigned int cmd)
+{
+ return NULL;
+}
+#endif /* CONFIG_KAPI_SPEC */
+
+/* IOCTL-specific macros */
+
+/**
+ * DEFINE_KAPI_IOCTL_SPEC - Define an IOCTL API specification using kernel_api_spec
+ * @ioctl_name: IOCTL command name/identifier
+ */
+#define DEFINE_KAPI_IOCTL_SPEC(ioctl_name) \
+ static struct kernel_api_spec __kapi_ioctl_spec_##ioctl_name \
+ __used __section(".kapi_specs") = { \
+ .name = __stringify(ioctl_name), \
+ .api_type = KAPI_API_IOCTL, \
+ .version = 1,
+
+/**
+ * KAPI_IOCTL_CMD - Set the IOCTL command number
+ * @cmd_val: The IOCTL command value
+ */
+#define KAPI_IOCTL_CMD(cmd_val) \
+ .cmd = cmd_val,
+
+/**
+ * KAPI_IOCTL_CMD_NAME - Set the IOCTL command name
+ * @name_str: String name of the command
+ */
+#define KAPI_IOCTL_CMD_NAME(name_str) \
+ .cmd_name = name_str,
+
+/**
+ * KAPI_IOCTL_INPUT_SIZE - Set the input structure size
+ * @size: Size of the input structure
+ */
+#define KAPI_IOCTL_INPUT_SIZE(size) \
+ .input_size = size,
+
+/**
+ * KAPI_IOCTL_OUTPUT_SIZE - Set the output structure size
+ * @size: Size of the output structure
+ */
+#define KAPI_IOCTL_OUTPUT_SIZE(size) \
+ .output_size = size,
+
+/**
+ * KAPI_IOCTL_FILE_OPS_NAME - Set the file operations name
+ * @ops_name: Name of the file_operations structure
+ */
+#define KAPI_IOCTL_FILE_OPS_NAME(ops_name) \
+ .file_ops_name = ops_name,
+
+/**
+ * Common IOCTL parameter specifications
+ */
+#define KAPI_IOCTL_PARAM_SIZE \
+ KAPI_PARAM(0, "size", "__u32", "Size of the structure") \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN) \
+ .type = KAPI_TYPE_UINT, \
+ .constraint_type = KAPI_CONSTRAINT_CUSTOM, \
+ .constraints = "Must match sizeof(struct)", \
+ KAPI_PARAM_END
+
+#define KAPI_IOCTL_PARAM_FLAGS \
+ KAPI_PARAM(1, "flags", "__u32", "Feature flags") \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN) \
+ .type = KAPI_TYPE_UINT, \
+ .constraint_type = KAPI_CONSTRAINT_MASK, \
+ .valid_mask = 0, /* 0 means no flags currently */ \
+ KAPI_PARAM_END
+
+/**
+ * KAPI_IOCTL_PARAM_USER_BUF - User buffer parameter
+ * @idx: Parameter index
+ * @name: Parameter name
+ * @desc: Parameter description
+ * @len_idx: Index of the length parameter
+ */
+#define KAPI_IOCTL_PARAM_USER_BUF(idx, name, desc, len_idx) \
+ KAPI_PARAM(idx, name, "__aligned_u64", desc) \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN | KAPI_PARAM_USER_PTR) \
+ .type = KAPI_TYPE_USER_PTR, \
+ .size_param_idx = len_idx, \
+ KAPI_PARAM_END
+
+/**
+ * KAPI_IOCTL_PARAM_USER_OUT_BUF - User output buffer parameter
+ * @idx: Parameter index
+ * @name: Parameter name
+ * @desc: Parameter description
+ * @len_idx: Index of the length parameter
+ */
+#define KAPI_IOCTL_PARAM_USER_OUT_BUF(idx, name, desc, len_idx) \
+ KAPI_PARAM(idx, name, "__aligned_u64", desc) \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT | KAPI_PARAM_USER_PTR) \
+ .type = KAPI_TYPE_USER_PTR, \
+ .size_param_idx = len_idx, \
+ KAPI_PARAM_END
+
/* Network operation characteristics macros */
#define KAPI_NET_CONNECTION_ORIENTED \
.is_connection_oriented = true,
diff --git a/kernel/api/Makefile b/kernel/api/Makefile
index 07b8c007ec156..9d2daf38f0029 100644
--- a/kernel/api/Makefile
+++ b/kernel/api/Makefile
@@ -6,5 +6,8 @@
# Core API specification framework
obj-$(CONFIG_KAPI_SPEC) += kernel_api_spec.o
+# IOCTL validation framework
+obj-$(CONFIG_KAPI_SPEC) += ioctl_validation.o
+
# Debugfs interface for kernel API specs
-obj-$(CONFIG_KAPI_SPEC_DEBUGFS) += kapi_debugfs.o
\ No newline at end of file
+obj-$(CONFIG_KAPI_SPEC_DEBUGFS) += kapi_debugfs.o
diff --git a/kernel/api/ioctl_validation.c b/kernel/api/ioctl_validation.c
new file mode 100644
index 0000000000000..cf3aa761eec2b
--- /dev/null
+++ b/kernel/api/ioctl_validation.c
@@ -0,0 +1,355 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ioctl_validation.c - Runtime validation for IOCTL API specifications
+ *
+ * Provides functions to validate ioctl parameters against their specifications
+ * at runtime when CONFIG_KAPI_RUNTIME_CHECKS is enabled.
+ */
+
+#include <linux/kernel.h>
+#include <linux/kernel_api_spec.h>
+#include <linux/uaccess.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/container_of.h>
+#include <linux/export.h>
+#include <uapi/fwctl/fwctl.h>
+
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+
+/**
+ * kapi_validate_ioctl - Validate an ioctl call against its specification
+ * @filp: File pointer
+ * @cmd: IOCTL command
+ * @arg: IOCTL argument
+ *
+ * Return: 0 if valid, negative errno if validation fails
+ */
+int kapi_validate_ioctl(struct file *filp, unsigned int cmd, void __user *arg)
+{
+ const struct kernel_api_spec *spec;
+ void *data = NULL;
+ size_t copy_size;
+ int ret = 0;
+ int i;
+
+ spec = kapi_get_ioctl_spec(cmd);
+ if (!spec)
+ return 0; /* No spec, can't validate */
+
+ pr_debug("kapi: validating ioctl %s (0x%x)\n", spec->cmd_name, cmd);
+
+ /* Check if this ioctl requires specific capabilities */
+ if (spec->param_count > 0) {
+ for (i = 0; i < spec->param_count; i++) {
+ const struct kapi_param_spec *param = &spec->params[i];
+
+ /* Check for capability requirements in constraints */
+ if (param->constraint_type == KAPI_CONSTRAINT_CUSTOM &&
+ param->constraints[0] && strstr(param->constraints, "CAP_")) {
+ /* Could add capability checks here if needed */
+ }
+ }
+ }
+
+ /* For ioctls with input/output structures, copy and validate */
+ if (spec->input_size > 0 || spec->output_size > 0) {
+ copy_size = max(spec->input_size, spec->output_size);
+
+ /* Allocate temporary buffer for validation */
+ data = kzalloc(copy_size, GFP_KERNEL);
+ if (!data)
+ return -ENOMEM;
+
+ /* Copy input data from user */
+ if (spec->input_size > 0) {
+ ret = copy_from_user(data, arg, spec->input_size);
+ if (ret) {
+ ret = -EFAULT;
+ goto out;
+ }
+ }
+
+ /* Validate structure fields */
+ ret = kapi_validate_ioctl_struct(spec, data, copy_size);
+ if (ret)
+ goto out;
+ }
+
+out:
+ kfree(data);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_ioctl);
+
+/**
+ * struct field_offset - Maps structure fields to their offsets
+ * @field_idx: Parameter index
+ * @offset: Offset in structure
+ * @size: Size of field
+ */
+struct field_offset {
+ int field_idx;
+ size_t offset;
+ size_t size;
+};
+
+/* Common ioctl structure layouts */
+static const struct field_offset fwctl_info_offsets[] = {
+ {0, 0, sizeof(u32)}, /* size */
+ {1, 4, sizeof(u32)}, /* flags */
+ {2, 8, sizeof(u32)}, /* out_device_type */
+ {3, 12, sizeof(u32)}, /* device_data_len */
+ {4, 16, sizeof(u64)}, /* out_device_data */
+};
+
+static const struct field_offset fwctl_rpc_offsets[] = {
+ {0, 0, sizeof(u32)}, /* size */
+ {1, 4, sizeof(u32)}, /* scope */
+ {2, 8, sizeof(u32)}, /* in_len */
+ {3, 12, sizeof(u32)}, /* out_len */
+ {4, 16, sizeof(u64)}, /* in */
+ {5, 24, sizeof(u64)}, /* out */
+};
+
+/**
+ * get_field_offsets - Get field offset information for an ioctl
+ * @cmd: IOCTL command
+ * @count: Returns number of fields
+ *
+ * Return: Array of field offsets or NULL
+ */
+static const struct field_offset *get_field_offsets(unsigned int cmd, int *count)
+{
+ switch (cmd) {
+ case FWCTL_INFO:
+ *count = ARRAY_SIZE(fwctl_info_offsets);
+ return fwctl_info_offsets;
+ case FWCTL_RPC:
+ *count = ARRAY_SIZE(fwctl_rpc_offsets);
+ return fwctl_rpc_offsets;
+ default:
+ *count = 0;
+ return NULL;
+ }
+}
+
+/**
+ * extract_field_value - Extract a field value from structure
+ * @data: Structure data
+ * @param: Parameter specification
+ * @offset_info: Field offset information
+ *
+ * Return: Field value or 0 on error
+ */
+static s64 extract_field_value(const void *data,
+ const struct kapi_param_spec *param,
+ const struct field_offset *offset_info)
+{
+ const void *field = data + offset_info->offset;
+
+ switch (param->type) {
+ case KAPI_TYPE_UINT:
+ if (offset_info->size == sizeof(u32))
+ return *(u32 *)field;
+ else if (offset_info->size == sizeof(u64))
+ return *(u64 *)field;
+ break;
+ case KAPI_TYPE_INT:
+ if (offset_info->size == sizeof(s32))
+ return *(s32 *)field;
+ else if (offset_info->size == sizeof(s64))
+ return *(s64 *)field;
+ break;
+ case KAPI_TYPE_USER_PTR:
+ /* User pointers are typically u64 in ioctl structures */
+ return (s64)(*(u64 *)field);
+ default:
+ break;
+ }
+
+ return 0;
+}
+
+/**
+ * kapi_validate_ioctl_struct - Validate an ioctl structure against specification
+ * @spec: IOCTL specification
+ * @data: Structure data
+ * @size: Size of the structure
+ *
+ * Return: 0 if valid, negative errno if validation fails
+ */
+int kapi_validate_ioctl_struct(const struct kernel_api_spec *spec,
+ const void *data, size_t size)
+{
+ const struct field_offset *offsets;
+ int offset_count;
+ int i, j;
+
+ if (!spec || !data)
+ return -EINVAL;
+
+ /* Get field offset information for this ioctl */
+ offsets = get_field_offsets(spec->cmd, &offset_count);
+
+ /* Validate each parameter in the structure */
+ for (i = 0; i < spec->param_count && i < KAPI_MAX_PARAMS; i++) {
+ const struct kapi_param_spec *param = &spec->params[i];
+ const struct field_offset *offset_info = NULL;
+ s64 value;
+
+ /* Find offset information for this parameter */
+ if (offsets) {
+ for (j = 0; j < offset_count; j++) {
+ if (offsets[j].field_idx == i) {
+ offset_info = &offsets[j];
+ break;
+ }
+ }
+ }
+
+ if (!offset_info) {
+ pr_debug("kapi: no offset info for param %d\n", i);
+ continue;
+ }
+
+ /* Extract field value */
+ value = extract_field_value(data, param, offset_info);
+
+ /* Special handling for user pointers */
+ if (param->type == KAPI_TYPE_USER_PTR) {
+ /* Check if pointer looks valid (non-kernel address) */
+ if (value && (value >= TASK_SIZE)) {
+ pr_warn("ioctl %s: parameter %s has kernel pointer %llx\n",
+ spec->cmd_name, param->name, value);
+ return -EINVAL;
+ }
+
+ /* For size validation, check against size_param_idx */
+ if (param->size_param_idx >= 0 &&
+ param->size_param_idx < offset_count) {
+ const struct field_offset *size_offset = NULL;
+
+ for (j = 0; j < offset_count; j++) {
+ if (offsets[j].field_idx == param->size_param_idx) {
+ size_offset = &offsets[j];
+ break;
+ }
+ }
+
+ if (size_offset) {
+ s64 buf_size = extract_field_value(data,
+ &spec->params[param->size_param_idx],
+ size_offset);
+
+ /* Validate buffer size constraints */
+ if (buf_size > 0 &&
+ !kapi_validate_param(&spec->params[param->size_param_idx],
+ buf_size)) {
+ pr_warn("ioctl %s: buffer size %lld invalid for %s\n",
+ spec->cmd_name, buf_size, param->name);
+ return -EINVAL;
+ }
+ }
+ }
+ } else {
+ /* Validate using the standard parameter validation */
+ if (!kapi_validate_param(param, value)) {
+ pr_warn("ioctl %s: parameter %s validation failed (value=%lld)\n",
+ spec->cmd_name, param->name, value);
+ return -EINVAL;
+ }
+ }
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_ioctl_struct);
+
+/* Global registry of wrappers - in a real implementation this would be per-module */
+static struct kapi_fops_wrapper *kapi_global_wrapper;
+
+/**
+ * kapi_register_wrapper - Register a wrapper (called from macro)
+ * @wrapper: Wrapper to register
+ */
+void kapi_register_wrapper(struct kapi_fops_wrapper *wrapper)
+{
+ /* Simple implementation - just store the last one */
+ kapi_global_wrapper = wrapper;
+}
+EXPORT_SYMBOL_GPL(kapi_register_wrapper);
+
+/**
+ * kapi_find_wrapper - Find wrapper for given file_operations
+ * @fops: File operations structure to check
+ *
+ * Return: Wrapper structure or NULL if not wrapped
+ */
+static struct kapi_fops_wrapper *kapi_find_wrapper(const struct file_operations *fops)
+{
+ /* Simple implementation - just return the global one if it matches */
+ if (kapi_global_wrapper && kapi_global_wrapper->wrapped_fops == fops)
+ return kapi_global_wrapper;
+ return NULL;
+}
+
+/**
+ * kapi_ioctl_validation_wrapper - Wrapper function for transparent validation
+ * @filp: File pointer
+ * @cmd: IOCTL command
+ * @arg: User argument
+ *
+ * This function is called instead of the real ioctl handler when validation
+ * is enabled. It performs pre-validation, calls the real handler, then does
+ * post-validation.
+ *
+ * Return: Result from the real ioctl handler or error
+ */
+long kapi_ioctl_validation_wrapper(struct file *filp, unsigned int cmd,
+ unsigned long arg)
+{
+ struct kapi_fops_wrapper *wrapper;
+ const struct kernel_api_spec *spec;
+ long ret;
+
+ wrapper = kapi_find_wrapper(filp->f_op);
+ if (!wrapper || !wrapper->real_ioctl)
+ return -EINVAL;
+
+ /* Pre-validation */
+ spec = kapi_get_ioctl_spec(cmd);
+ if (spec) {
+ ret = kapi_validate_ioctl(filp, cmd, (void __user *)arg);
+ if (ret)
+ return ret;
+ }
+
+ /* Call the real ioctl handler */
+ ret = wrapper->real_ioctl(filp, cmd, arg);
+
+ /* Post-validation - check return value against spec */
+ if (spec && spec->error_count > 0) {
+ /* Validate that returned error is in the spec */
+ if (ret < 0) {
+ int i;
+ bool found = false;
+ for (i = 0; i < spec->error_count; i++) {
+ if (ret == spec->errors[i].error_code) {
+ found = true;
+ break;
+ }
+ }
+ if (!found) {
+ pr_warn("IOCTL %s returned unexpected error %ld\n",
+ spec->cmd_name, ret);
+ }
+ }
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kapi_ioctl_validation_wrapper);
+
+#endif /* CONFIG_KAPI_RUNTIME_CHECKS */
diff --git a/kernel/api/kernel_api_spec.c b/kernel/api/kernel_api_spec.c
index 8827e9f96c111..7be653ac2333b 100644
--- a/kernel/api/kernel_api_spec.c
+++ b/kernel/api/kernel_api_spec.c
@@ -1119,4 +1119,91 @@ static int __init kapi_debugfs_init(void)
late_initcall(kapi_debugfs_init);
-#endif /* CONFIG_DEBUG_FS */
\ No newline at end of file
+#endif /* CONFIG_DEBUG_FS */
+
+/* IOCTL specification registry */
+#ifdef CONFIG_KAPI_SPEC
+
+
+static DEFINE_MUTEX(ioctl_spec_mutex);
+static LIST_HEAD(ioctl_specs);
+
+struct ioctl_spec_entry {
+ struct list_head list;
+ const struct kernel_api_spec *spec;
+};
+
+/**
+ * kapi_register_ioctl_spec - Register an IOCTL API specification
+ * @spec: IOCTL specification to register
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int kapi_register_ioctl_spec(const struct kernel_api_spec *spec)
+{
+ struct ioctl_spec_entry *entry;
+
+ if (!spec || spec->cmd_name[0] == '\0')
+ return -EINVAL;
+
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry)
+ return -ENOMEM;
+
+ entry->spec = spec;
+
+ mutex_lock(&ioctl_spec_mutex);
+ list_add_tail(&entry->list, &ioctl_specs);
+ mutex_unlock(&ioctl_spec_mutex);
+
+ pr_debug("Registered IOCTL spec: %s (0x%x)\n", spec->cmd_name, spec->cmd);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_register_ioctl_spec);
+
+/**
+ * kapi_unregister_ioctl_spec - Unregister an IOCTL API specification
+ * @cmd: IOCTL command number to unregister
+ */
+void kapi_unregister_ioctl_spec(unsigned int cmd)
+{
+ struct ioctl_spec_entry *entry, *tmp;
+
+ mutex_lock(&ioctl_spec_mutex);
+ list_for_each_entry_safe(entry, tmp, &ioctl_specs, list) {
+ if (entry->spec->cmd == cmd) {
+ list_del(&entry->list);
+ kfree(entry);
+ pr_debug("Unregistered IOCTL spec for cmd 0x%x\n", cmd);
+ break;
+ }
+ }
+ mutex_unlock(&ioctl_spec_mutex);
+}
+EXPORT_SYMBOL_GPL(kapi_unregister_ioctl_spec);
+
+/**
+ * kapi_get_ioctl_spec - Retrieve IOCTL specification by command number
+ * @cmd: IOCTL command number
+ *
+ * Return: Pointer to the specification or NULL if not found
+ */
+const struct kernel_api_spec *kapi_get_ioctl_spec(unsigned int cmd)
+{
+ struct ioctl_spec_entry *entry;
+ const struct kernel_api_spec *spec = NULL;
+
+ mutex_lock(&ioctl_spec_mutex);
+ list_for_each_entry(entry, &ioctl_specs, list) {
+ if (entry->spec->cmd == cmd) {
+ spec = entry->spec;
+ break;
+ }
+ }
+ mutex_unlock(&ioctl_spec_mutex);
+
+ return spec;
+}
+EXPORT_SYMBOL_GPL(kapi_get_ioctl_spec);
+
+#endif /* CONFIG_KAPI_SPEC */
--
2.39.5
* [RFC v2 17/22] fwctl: add detailed IOCTL API specifications
2025-06-24 18:07 [RFC v2 00/22] Kernel API specification framework Sasha Levin
` (15 preceding siblings ...)
2025-06-24 18:07 ` [RFC v2 16/22] kernel/api: add IOCTL specification infrastructure Sasha Levin
@ 2025-06-24 18:07 ` Sasha Levin
2025-06-24 18:07 ` [RFC v2 18/22] binder: " Sasha Levin
` (5 subsequent siblings)
22 siblings, 0 replies; 33+ messages in thread
From: Sasha Levin @ 2025-06-24 18:07 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
Add kernel API specifications to the fwctl driver using the IOCTL
specification framework. This provides detailed documentation and
enables runtime validation of the fwctl IOCTL interface.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
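Note for readers: once a spec lists its errors, the wrapper's
post-validation step reduces to a membership test on the returned
errno. The userspace sketch below mirrors that idea using the three
errors this patch documents for FWCTL_INFO. The sketch_* names are
hypothetical and the code is an illustration, not the kernel
implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of one entry in a spec's error table. */
struct sketch_error {
	int code;		/* negative errno, as the handler returns it */
	const char *name;
};

/* The errors documented for FWCTL_INFO in this patch. */
const struct sketch_error sketch_fwctl_info_errors[] = {
	{ -EFAULT,     "EFAULT" },
	{ -EOPNOTSUPP, "EOPNOTSUPP" },
	{ -ENODEV,     "ENODEV" },
};

/*
 * Return true when ret is a success value (>= 0) or a negative errno
 * that appears in the table. An undocumented errno returns false,
 * which is the case the wrapper reports with pr_warn().
 */
bool sketch_return_is_documented(const struct sketch_error *errs,
				 size_t count, long ret)
{
	size_t i;

	if (ret >= 0)
		return true;
	for (i = 0; i < count; i++)
		if (errs[i].code == ret)
			return true;
	return false;
}

/* Check one success value, one documented errno, one undocumented errno. */
void sketch_errors_selfcheck(void)
{
	const size_t n = sizeof(sketch_fwctl_info_errors) /
			 sizeof(sketch_fwctl_info_errors[0]);

	assert(sketch_return_is_documented(sketch_fwctl_info_errors, n, 0));
	assert(sketch_return_is_documented(sketch_fwctl_info_errors, n, -ENODEV));
	assert(!sketch_return_is_documented(sketch_fwctl_info_errors, n, -EINVAL));
}
```

In the sketch, as in the wrapper, an undocumented errno is only
reported. The handler's return value is passed through unchanged.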
drivers/fwctl/main.c | 285 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 282 insertions(+), 3 deletions(-)
diff --git a/drivers/fwctl/main.c b/drivers/fwctl/main.c
index bc6378506296c..cc43b7270c9f8 100644
--- a/drivers/fwctl/main.c
+++ b/drivers/fwctl/main.c
@@ -10,6 +10,7 @@
#include <linux/module.h>
#include <linux/sizes.h>
#include <linux/slab.h>
+#include <linux/kernel_api_spec.h>
#include <uapi/fwctl/fwctl.h>
@@ -261,12 +262,279 @@ static int fwctl_fops_release(struct inode *inode, struct file *filp)
return 0;
}
-static const struct file_operations fwctl_fops = {
+/* Use KAPI_DEFINE_FOPS for automatic validation wrapping */
+KAPI_DEFINE_FOPS(fwctl_fops,
.owner = THIS_MODULE,
.open = fwctl_fops_open,
.release = fwctl_fops_release,
.unlocked_ioctl = fwctl_fops_ioctl,
-};
+);
+
+/* IOCTL API Specifications */
+
+DEFINE_KAPI_IOCTL_SPEC(fwctl_info)
+ KAPI_IOCTL_CMD(FWCTL_INFO)
+ KAPI_IOCTL_CMD_NAME("FWCTL_INFO")
+ KAPI_IOCTL_INPUT_SIZE(sizeof(struct fwctl_info))
+ KAPI_IOCTL_OUTPUT_SIZE(sizeof(struct fwctl_info))
+ KAPI_IOCTL_FILE_OPS_NAME("fwctl_fops")
+ KAPI_DESCRIPTION("Query device information and capabilities")
+ KAPI_LONG_DESC("Returns basic information about the fwctl instance, "
+ "including the device type and driver-specific data. "
+ "The driver-specific data format depends on the device type.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_IOCTL_PARAM_SIZE
+ KAPI_IOCTL_PARAM_FLAGS
+
+ KAPI_PARAM(2, "out_device_type", "__u32", "Device type from enum fwctl_device_type")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_ENUM)
+ KAPI_PARAM_ENUM_VALUES(((const s64[]){FWCTL_DEVICE_TYPE_ERROR,
+ FWCTL_DEVICE_TYPE_MLX5,
+ FWCTL_DEVICE_TYPE_CXL,
+ FWCTL_DEVICE_TYPE_PDS}))
+ KAPI_PARAM_END
+
+ KAPI_PARAM(3, "device_data_len", "__u32", "Length of device data buffer")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_INOUT)
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ KAPI_PARAM_RANGE(0, SZ_1M) /* Reasonable limit for device info */
+ KAPI_PARAM_END
+
+ KAPI_IOCTL_PARAM_USER_OUT_BUF(4, "out_device_data",
+ "Driver-specific device data", 3)
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ KAPI_RETURN_TYPE(KAPI_TYPE_INT)
+ KAPI_RETURN_CHECK_TYPE(KAPI_RETURN_ERROR_CHECK)
+ KAPI_RETURN_ERROR_VALUES(((const s64[]){-EFAULT, -EOPNOTSUPP, -ENODEV}))
+ KAPI_RETURN_ERROR_COUNT(3)
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EFAULT, "EFAULT", "Failed to copy data to/from user space",
+ "Check that provided pointers are valid user space addresses")
+ KAPI_ERROR(1, -EOPNOTSUPP, "EOPNOTSUPP", "Invalid flags provided",
+ "Currently flags must be 0")
+ KAPI_ERROR(2, -ENODEV, "ENODEV", "Device has been hot-unplugged",
+ "The underlying device is no longer available")
+
+ KAPI_ERROR_COUNT(3)
+ KAPI_PARAM_COUNT(5)
+ KAPI_SINCE_VERSION("6.15")
+
+ /* Structure specifications */
+ KAPI_STRUCT_SPEC(0, fwctl_info, "Device information query structure")
+ KAPI_STRUCT_SIZE(sizeof(struct fwctl_info), __alignof__(struct fwctl_info))
+ KAPI_STRUCT_FIELD_COUNT(4)
+
+ KAPI_STRUCT_FIELD(0, "size", KAPI_TYPE_UINT, "__u32",
+ "Structure size for versioning")
+ KAPI_FIELD_OFFSET(offsetof(struct fwctl_info, size))
+ KAPI_FIELD_SIZE(sizeof(__u32))
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(1, "flags", KAPI_TYPE_UINT, "__u32",
+ "Must be 0, reserved for future use")
+ KAPI_FIELD_OFFSET(offsetof(struct fwctl_info, flags))
+ KAPI_FIELD_SIZE(sizeof(__u32))
+ KAPI_FIELD_CONSTRAINT_RANGE(0, 0)
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(2, "out_device_type", KAPI_TYPE_UINT, "__u32",
+ "Device type identifier")
+ KAPI_FIELD_OFFSET(offsetof(struct fwctl_info, out_device_type))
+ KAPI_FIELD_SIZE(sizeof(__u32))
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(3, "device_data_len", KAPI_TYPE_UINT, "__u32",
+ "Length of device-specific data")
+ KAPI_FIELD_OFFSET(offsetof(struct fwctl_info, device_data_len))
+ KAPI_FIELD_SIZE(sizeof(__u32))
+ KAPI_STRUCT_FIELD_END
+ KAPI_STRUCT_SPEC_END
+
+ KAPI_STRUCT_SPEC_COUNT(1)
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_NONE,
+ "none",
+ "Read-only operation with no side effects")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(1)
+
+ /* State transitions */
+ KAPI_STATE_TRANS_COUNT(0) /* No state transitions for query operation */
+KAPI_END_SPEC;
+
+DEFINE_KAPI_IOCTL_SPEC(fwctl_rpc)
+ KAPI_IOCTL_CMD(FWCTL_RPC)
+ KAPI_IOCTL_CMD_NAME("FWCTL_RPC")
+ KAPI_IOCTL_INPUT_SIZE(sizeof(struct fwctl_rpc))
+ KAPI_IOCTL_OUTPUT_SIZE(sizeof(struct fwctl_rpc))
+ KAPI_IOCTL_FILE_OPS_NAME("fwctl_fops")
+ KAPI_DESCRIPTION("Execute a Remote Procedure Call to device firmware")
+ KAPI_LONG_DESC("Delivers an RPC to the device firmware and returns the response. "
+ "The RPC format is device-specific and determined by out_device_type "
+ "from FWCTL_INFO. Different scopes have different permission requirements.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_IOCTL_PARAM_SIZE
+
+ KAPI_PARAM(1, "scope", "__u32", "Access scope from enum fwctl_rpc_scope")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_ENUM)
+ KAPI_PARAM_ENUM_VALUES(((const s64[]){FWCTL_RPC_CONFIGURATION,
+ FWCTL_RPC_DEBUG_READ_ONLY,
+ FWCTL_RPC_DEBUG_WRITE,
+ FWCTL_RPC_DEBUG_WRITE_FULL}))
+ KAPI_PARAM_CONSTRAINT("FWCTL_RPC_DEBUG_WRITE_FULL requires CAP_SYS_RAWIO")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "in_len", "__u32", "Length of input buffer")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ KAPI_PARAM_RANGE(0, MAX_RPC_LEN)
+ KAPI_PARAM_END
+
+ KAPI_PARAM(3, "out_len", "__u32", "Length of output buffer")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_INOUT)
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ KAPI_PARAM_RANGE(0, MAX_RPC_LEN)
+ KAPI_PARAM_END
+
+ KAPI_IOCTL_PARAM_USER_BUF(4, "in", "RPC request in device-specific format", 2)
+ KAPI_IOCTL_PARAM_USER_OUT_BUF(5, "out", "RPC response in device-specific format", 3)
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ KAPI_RETURN_TYPE(KAPI_TYPE_INT)
+ KAPI_RETURN_CHECK_TYPE(KAPI_RETURN_ERROR_CHECK)
+ KAPI_RETURN_ERROR_VALUES(((const s64[]){-EMSGSIZE, -EOPNOTSUPP, -EPERM,
+ -ENOMEM, -EFAULT, -ENODEV}))
+ KAPI_RETURN_ERROR_COUNT(6)
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EMSGSIZE, "EMSGSIZE", "RPC message too large",
+ "in_len or out_len exceeds MAX_RPC_LEN (2MB)")
+ KAPI_ERROR(1, -EOPNOTSUPP, "EOPNOTSUPP", "Invalid scope value",
+ "scope must be one of the defined fwctl_rpc_scope values")
+ KAPI_ERROR(2, -EPERM, "EPERM", "Insufficient permissions",
+ "FWCTL_RPC_DEBUG_WRITE_FULL requires CAP_SYS_RAWIO")
+ KAPI_ERROR(3, -ENOMEM, "ENOMEM", "Memory allocation failed",
+ "Unable to allocate buffers for RPC")
+ KAPI_ERROR(4, -EFAULT, "EFAULT", "Failed to copy data to/from user space",
+ "Check that provided pointers are valid user space addresses")
+ KAPI_ERROR(5, -ENODEV, "ENODEV", "Device has been hot-unplugged",
+ "The underlying device is no longer available")
+
+ KAPI_ERROR_COUNT(6)
+ KAPI_PARAM_COUNT(6)
+ KAPI_SINCE_VERSION("6.15")
+ KAPI_NOTES("FWCTL_RPC_DEBUG_WRITE and FWCTL_RPC_DEBUG_WRITE_FULL will "
+ "taint the kernel with TAINT_FWCTL on first use")
+
+ /* Structure specifications */
+ KAPI_STRUCT_SPEC(0, fwctl_rpc, "RPC request/response structure")
+ KAPI_STRUCT_SIZE(sizeof(struct fwctl_rpc), __alignof__(struct fwctl_rpc))
+ KAPI_STRUCT_FIELD_COUNT(6)
+
+ KAPI_STRUCT_FIELD(0, "size", KAPI_TYPE_UINT, "__u32",
+ "Structure size for versioning")
+ KAPI_FIELD_OFFSET(offsetof(struct fwctl_rpc, size))
+ KAPI_FIELD_SIZE(sizeof(__u32))
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(1, "scope", KAPI_TYPE_UINT, "__u32",
+ "Access scope level")
+ KAPI_FIELD_OFFSET(offsetof(struct fwctl_rpc, scope))
+ KAPI_FIELD_SIZE(sizeof(__u32))
+ KAPI_FIELD_CONSTRAINT_RANGE(FWCTL_RPC_CONFIGURATION, FWCTL_RPC_DEBUG_WRITE_FULL)
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(2, "in_len", KAPI_TYPE_UINT, "__u32",
+ "Input data length")
+ KAPI_FIELD_OFFSET(offsetof(struct fwctl_rpc, in_len))
+ KAPI_FIELD_SIZE(sizeof(__u32))
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(3, "out_len", KAPI_TYPE_UINT, "__u32",
+ "Output buffer length")
+ KAPI_FIELD_OFFSET(offsetof(struct fwctl_rpc, out_len))
+ KAPI_FIELD_SIZE(sizeof(__u32))
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(4, "in", KAPI_TYPE_PTR, "__aligned_u64",
+ "Pointer to input data")
+ KAPI_FIELD_OFFSET(offsetof(struct fwctl_rpc, in))
+ KAPI_FIELD_SIZE(sizeof(__aligned_u64))
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(5, "out", KAPI_TYPE_PTR, "__aligned_u64",
+ "Pointer to output buffer")
+ KAPI_FIELD_OFFSET(offsetof(struct fwctl_rpc, out))
+ KAPI_FIELD_SIZE(sizeof(__aligned_u64))
+ KAPI_STRUCT_FIELD_END
+ KAPI_STRUCT_SPEC_END
+
+ KAPI_STRUCT_SPEC_COUNT(1)
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_HARDWARE | KAPI_EFFECT_MODIFY_STATE,
+ "device firmware",
+ "May modify device configuration or firmware state")
+ KAPI_EFFECT_CONDITION("scope >= FWCTL_RPC_DEBUG_WRITE")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE,
+ "kernel taint",
+ "Taints kernel with TAINT_FWCTL on first debug write")
+ KAPI_EFFECT_CONDITION("scope >= FWCTL_RPC_DEBUG_WRITE && first use")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_SCHEDULE,
+ "process",
+ "May block while firmware processes the RPC")
+ KAPI_EFFECT_CONDITION("firmware operation takes time")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(3)
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "device state",
+ "current configuration", "modified configuration",
+ "Device configuration changed by RPC command")
+ KAPI_STATE_TRANS_COND("RPC modifies device settings")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "kernel taint state",
+ "untainted", "TAINT_FWCTL set",
+ "Kernel marked as tainted due to firmware modification")
+ KAPI_STATE_TRANS_COND("First debug write operation")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS_COUNT(2)
+KAPI_END_SPEC;
+
+static int kapi_ioctl_specs_init(void)
+{
+ return 0;
+}
+
+static void kapi_ioctl_specs_exit(void)
+{
+}
static void fwctl_device_release(struct device *device)
{
@@ -325,7 +593,7 @@ struct fwctl_device *_fwctl_alloc_device(struct device *parent,
if (!fwctl)
return NULL;
- cdev_init(&fwctl->cdev, &fwctl_fops);
+ cdev_init(&fwctl->cdev, fwctl_fops);
/*
* The driver module is protected by fwctl_register/unregister(),
* unregister won't complete until we are done with the driver's module.
@@ -395,6 +663,9 @@ static int __init fwctl_init(void)
{
int ret;
+ /* Initialize the wrapped file_operations */
+ kapi_init_fops_fwctl_fops();
+
ret = alloc_chrdev_region(&fwctl_dev, 0, FWCTL_MAX_DEVICES, "fwctl");
if (ret)
return ret;
@@ -402,8 +673,15 @@ static int __init fwctl_init(void)
ret = class_register(&fwctl_class);
if (ret)
goto err_chrdev;
+
+ ret = kapi_ioctl_specs_init();
+ if (ret)
+ goto err_class;
+
return 0;
+err_class:
+ class_unregister(&fwctl_class);
err_chrdev:
unregister_chrdev_region(fwctl_dev, FWCTL_MAX_DEVICES);
return ret;
@@ -411,6 +689,7 @@ static int __init fwctl_init(void)
static void __exit fwctl_exit(void)
{
+ kapi_ioctl_specs_exit();
class_unregister(&fwctl_class);
unregister_chrdev_region(fwctl_dev, FWCTL_MAX_DEVICES);
}
--
2.39.5
* [RFC v2 18/22] binder: add detailed IOCTL API specifications
From: Sasha Levin @ 2025-06-24 18:07 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
Add kernel API specifications to the binder driver using the IOCTL
specification framework. This provides detailed documentation and
enables runtime validation of all binder IOCTL interfaces.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
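Note for reviewers (illustration only, not part of the patch): the
kapi_init_fops_binder_fops() helper added below copies binder_fops into a
second table and replaces only the unlocked_ioctl pointer with the validation
wrapper, keeping the original handler for forwarding. A minimal userspace
sketch of that pattern follows. All sketch_* names, the reduced ioctl
signature, and the "command 0 is invalid" rule are assumptions made for the
example; they are not the framework's API.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file_operations: only the ioctl hook. */
struct sketch_fops {
	long (*unlocked_ioctl)(unsigned int cmd, unsigned long arg);
};

/* Hypothetical stand-in for struct kapi_fops_wrapper. */
struct sketch_wrapper {
	const struct sketch_fops *real_fops;
	struct sketch_fops *wrapped_fops;
	long (*real_ioctl)(unsigned int cmd, unsigned long arg);
};

static struct sketch_wrapper *sketch_registered; /* one-entry "registry" */
static int sketch_validated_calls;

/* The driver's own handler; here it just combines its arguments. */
static long sketch_real_ioctl(unsigned int cmd, unsigned long arg)
{
	return (long)(cmd + arg);
}

/* Validate first, then forward to the saved handler. */
static long sketch_validation_wrapper(unsigned int cmd, unsigned long arg)
{
	sketch_validated_calls++;
	if (cmd == 0)	/* assumed-invalid command, for illustration */
		return -EINVAL;
	return sketch_registered->real_ioctl(cmd, arg);
}

static const struct sketch_fops sketch_real_fops = {
	.unlocked_ioctl = sketch_real_ioctl,
};
static struct sketch_fops sketch_wrapped_fops;
static struct sketch_wrapper sketch_wrapper_state;

/* Same sequence of steps as kapi_init_fops_binder_fops() in the patch. */
static void sketch_init_fops(void)
{
	if (!sketch_real_fops.unlocked_ioctl)
		return;
	sketch_wrapped_fops = sketch_real_fops;
	sketch_wrapper_state.real_fops = &sketch_real_fops;
	sketch_wrapper_state.wrapped_fops = &sketch_wrapped_fops;
	sketch_wrapper_state.real_ioctl = sketch_real_fops.unlocked_ioctl;
	sketch_wrapped_fops.unlocked_ioctl = sketch_validation_wrapper;
	sketch_registered = &sketch_wrapper_state;
}

/* Returns 0 when the wrapped table validates and forwards as expected. */
int sketch_fops_selftest(void)
{
	sketch_validated_calls = 0;
	sketch_init_fops();
	if (sketch_wrapped_fops.unlocked_ioctl(1, 2) != 3)
		return 1;	/* valid call must reach the real handler */
	if (sketch_wrapped_fops.unlocked_ioctl(0, 2) != -EINVAL)
		return 2;	/* invalid call must be rejected */
	if (sketch_validated_calls != 2)
		return 3;	/* both calls went through the wrapper */
	if (sketch_real_fops.unlocked_ioctl(0, 2) != 2)
		return 4;	/* the original table is left untouched */
	return 0;
}
```

The point of the sketch is that validation only applies to callers that go
through the wrapped table; the original table keeps calling the handler
directly.
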
drivers/android/binder.c | 701 ++++++++++++++++++++++++++++++++
include/linux/kernel_api_spec.h | 3 +
2 files changed, 704 insertions(+)
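
The specs in this patch attach range and enum constraints to individual
parameters, for example "enable" in [0, 1] and "timeout_ms" in [0, 60000].
As a rough illustration of what checking such constraints amounts to, here
is a self-contained sketch. The sketch_* types and names are hypothetical,
the bounds are assumed to be inclusive, and the enum values are placeholders;
none of this is taken from the framework's implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constraint kinds, loosely modelled on KAPI_CONSTRAINT_*. */
enum sketch_constraint {
	SKETCH_CONSTRAINT_NONE,
	SKETCH_CONSTRAINT_RANGE,
	SKETCH_CONSTRAINT_ENUM,
};

/* Hypothetical per-parameter description. */
struct sketch_param {
	const char *name;
	enum sketch_constraint constraint;
	int64_t min_value;		/* used by RANGE, inclusive */
	int64_t max_value;		/* used by RANGE, inclusive */
	const int64_t *enum_values;	/* used by ENUM */
	size_t enum_count;		/* used by ENUM */
};

/* Returns true when value satisfies the parameter's constraint. */
bool sketch_param_valid(const struct sketch_param *p, int64_t value)
{
	size_t i;

	switch (p->constraint) {
	case SKETCH_CONSTRAINT_RANGE:
		return value >= p->min_value && value <= p->max_value;
	case SKETCH_CONSTRAINT_ENUM:
		for (i = 0; i < p->enum_count; i++)
			if (p->enum_values[i] == value)
				return true;
		return false;
	default:
		return true;	/* unconstrained parameters always pass */
	}
}

/* Placeholder enum members; real specs would list the uAPI constants. */
static const int64_t sketch_scope_values[] = { 0, 1, 2, 3 };

const struct sketch_param sketch_enable = {
	.name = "enable",
	.constraint = SKETCH_CONSTRAINT_RANGE,
	.min_value = 0,
	.max_value = 1,
};

const struct sketch_param sketch_timeout_ms = {
	.name = "timeout_ms",
	.constraint = SKETCH_CONSTRAINT_RANGE,
	.min_value = 0,
	.max_value = 60000,
};

const struct sketch_param sketch_scope = {
	.name = "scope",
	.constraint = SKETCH_CONSTRAINT_ENUM,
	.enum_values = sketch_scope_values,
	.enum_count = sizeof(sketch_scope_values) / sizeof(sketch_scope_values[0]),
};

/* Returns 0 when all of the example checks behave as described. */
int sketch_constraint_selftest(void)
{
	if (!sketch_param_valid(&sketch_enable, 0))
		return 1;
	if (sketch_param_valid(&sketch_enable, 2))
		return 2;
	if (!sketch_param_valid(&sketch_timeout_ms, 60000))
		return 3;
	if (sketch_param_valid(&sketch_timeout_ms, 60001))
		return 4;
	if (!sketch_param_valid(&sketch_scope, 3))
		return 5;
	if (sketch_param_valid(&sketch_scope, 4))
		return 6;
	return 0;
}
```

A table-driven check like this keeps the policy in data, so adding a
parameter means adding a descriptor rather than another branch.
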
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index c463ca4a8fff8..8879263cdb061 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -67,6 +67,7 @@
#include <linux/task_work.h>
#include <linux/sizes.h>
#include <linux/ktime.h>
+#include <linux/kernel_api_spec.h>
#include <uapi/linux/android/binder.h>
@@ -6930,6 +6931,7 @@ static int transaction_log_show(struct seq_file *m, void *unused)
return 0;
}
+/* Define the actual binder_fops structure */
const struct file_operations binder_fops = {
.owner = THIS_MODULE,
.poll = binder_poll,
@@ -6941,6 +6943,695 @@ const struct file_operations binder_fops = {
.release = binder_release,
};
+/* Define wrapper for KAPI validation */
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+static struct file_operations __kapi_wrapped_binder_fops;
+static struct kapi_fops_wrapper __kapi_wrapper_binder_fops;
+
+static void kapi_init_fops_binder_fops(void)
+{
+ if (binder_fops.unlocked_ioctl) {
+ __kapi_wrapped_binder_fops = binder_fops;
+ __kapi_wrapper_binder_fops.real_fops = &binder_fops;
+ __kapi_wrapper_binder_fops.wrapped_fops = &__kapi_wrapped_binder_fops;
+ __kapi_wrapper_binder_fops.real_ioctl = binder_fops.unlocked_ioctl;
+ __kapi_wrapped_binder_fops.unlocked_ioctl = kapi_ioctl_validation_wrapper;
+ kapi_register_wrapper(&__kapi_wrapper_binder_fops);
+ }
+}
+#else
+static inline void kapi_init_fops_binder_fops(void) {}
+#endif
+
+/* IOCTL API Specifications for Binder */
+
+DEFINE_KAPI_IOCTL_SPEC(binder_write_read)
+ KAPI_IOCTL_CMD(BINDER_WRITE_READ)
+ KAPI_IOCTL_CMD_NAME("BINDER_WRITE_READ")
+ KAPI_IOCTL_INPUT_SIZE(sizeof(struct binder_write_read))
+ KAPI_IOCTL_OUTPUT_SIZE(sizeof(struct binder_write_read))
+ KAPI_IOCTL_FILE_OPS_NAME("binder_fops")
+ KAPI_DESCRIPTION("Perform read/write operations on binder")
+ KAPI_LONG_DESC("Main workhorse of binder IPC. Allows writing commands to "
+ "binder driver and reading responses. Commands are encoded "
+ "in a special protocol format. Both read and write operations "
+ "can be performed in a single ioctl call.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "write_size", "binder_size_t", "Bytes to write")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ KAPI_PARAM_RANGE(0, SZ_4M) /* Reasonable limit for IPC */
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "write_consumed", "binder_size_t", "Bytes consumed by driver")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ KAPI_PARAM_RANGE(0, SZ_4M)
+ KAPI_PARAM_END
+
+ KAPI_IOCTL_PARAM_USER_BUF(2, "write_buffer", "User buffer with commands", 0)
+
+ KAPI_PARAM(3, "read_size", "binder_size_t", "Bytes to read")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ KAPI_PARAM_RANGE(0, SZ_4M)
+ KAPI_PARAM_END
+
+ KAPI_PARAM(4, "read_consumed", "binder_size_t", "Bytes consumed by driver")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ KAPI_PARAM_RANGE(0, SZ_4M)
+ KAPI_PARAM_END
+
+ KAPI_IOCTL_PARAM_USER_OUT_BUF(5, "read_buffer", "User buffer for responses", 3)
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ KAPI_RETURN_TYPE(KAPI_TYPE_INT)
+ KAPI_RETURN_CHECK_TYPE(KAPI_RETURN_ERROR_CHECK)
+ KAPI_RETURN_ERROR_VALUES(((const s64[]){-EFAULT, -EINVAL, -EAGAIN, -EINTR,
+ -ENOMEM, -ECONNREFUSED}))
+ KAPI_RETURN_ERROR_COUNT(6)
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EFAULT, "EFAULT", "Failed to copy data to/from user space",
+ "Check buffer pointers are valid user space addresses")
+ KAPI_ERROR(1, -EINVAL, "EINVAL", "Invalid parameters",
+ "Buffer sizes or commands are invalid")
+ KAPI_ERROR(2, -EAGAIN, "EAGAIN", "Try again",
+ "Non-blocking read with no data available")
+ KAPI_ERROR(3, -EINTR, "EINTR", "Interrupted by signal",
+ "Operation interrupted, should be retried")
+ KAPI_ERROR(4, -ENOMEM, "ENOMEM", "Out of memory",
+ "Unable to allocate memory for operation")
+ KAPI_ERROR(5, -ECONNREFUSED, "ECONNREFUSED", "Connection refused",
+ "Process is being destroyed, no further operations allowed")
+
+ KAPI_ERROR_COUNT(6)
+ KAPI_PARAM_COUNT(6)
+ KAPI_SINCE_VERSION("3.0")
+ KAPI_NOTES("This is the primary interface for binder IPC. Most other "
+ "ioctls are for configuration and management.")
+
+ /* Structure specifications */
+ KAPI_STRUCT_SPEC(0, binder_write_read, "Read/write operation structure")
+ KAPI_STRUCT_SIZE(sizeof(struct binder_write_read), __alignof__(struct binder_write_read))
+ KAPI_STRUCT_FIELD_COUNT(6)
+
+ KAPI_STRUCT_FIELD(0, "write_size", KAPI_TYPE_UINT, "binder_size_t",
+ "Number of bytes to write")
+ KAPI_FIELD_OFFSET(offsetof(struct binder_write_read, write_size))
+ KAPI_FIELD_SIZE(sizeof(binder_size_t))
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(1, "write_consumed", KAPI_TYPE_UINT, "binder_size_t",
+ "Number of bytes consumed by driver")
+ KAPI_FIELD_OFFSET(offsetof(struct binder_write_read, write_consumed))
+ KAPI_FIELD_SIZE(sizeof(binder_size_t))
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(2, "write_buffer", KAPI_TYPE_PTR, "binder_uintptr_t",
+ "Pointer to write buffer")
+ KAPI_FIELD_OFFSET(offsetof(struct binder_write_read, write_buffer))
+ KAPI_FIELD_SIZE(sizeof(binder_uintptr_t))
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(3, "read_size", KAPI_TYPE_UINT, "binder_size_t",
+ "Number of bytes to read")
+ KAPI_FIELD_OFFSET(offsetof(struct binder_write_read, read_size))
+ KAPI_FIELD_SIZE(sizeof(binder_size_t))
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(4, "read_consumed", KAPI_TYPE_UINT, "binder_size_t",
+ "Number of bytes consumed by driver")
+ KAPI_FIELD_OFFSET(offsetof(struct binder_write_read, read_consumed))
+ KAPI_FIELD_SIZE(sizeof(binder_size_t))
+ KAPI_STRUCT_FIELD_END
+
+ KAPI_STRUCT_FIELD(5, "read_buffer", KAPI_TYPE_PTR, "binder_uintptr_t",
+ "Pointer to read buffer")
+ KAPI_FIELD_OFFSET(offsetof(struct binder_write_read, read_buffer))
+ KAPI_FIELD_SIZE(sizeof(binder_uintptr_t))
+ KAPI_STRUCT_FIELD_END
+ KAPI_STRUCT_SPEC_END
+
+ KAPI_STRUCT_SPEC_COUNT(1)
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_NETWORK,
+ "binder transaction queue",
+ "Enqueues transactions or commands to target process")
+ KAPI_EFFECT_CONDITION("write_size > 0")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_SCHEDULE,
+ "process state",
+ "May block waiting for incoming transactions")
+ KAPI_EFFECT_CONDITION("read_size > 0 && no data available")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_RESOURCE_CREATE,
+ "binder nodes/refs",
+ "May create or destroy binder nodes and references")
+ KAPI_EFFECT_CONDITION("specific commands")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_SIGNAL_SEND,
+ "target process",
+ "May trigger death notifications to linked processes")
+ KAPI_EFFECT_CONDITION("death notification")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(4)
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "transaction",
+ "pending in sender", "queued in target",
+ "Transaction moves from sender to target's queue")
+ KAPI_STATE_TRANS_COND("BC_TRANSACTION command")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "thread state",
+ "running", "waiting for work",
+ "Thread blocks waiting for incoming transactions")
+ KAPI_STATE_TRANS_COND("read with no work available")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "binder ref",
+ "active", "released",
+ "Reference count decremented, may trigger cleanup")
+ KAPI_STATE_TRANS_COND("BC_RELEASE command")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS_COUNT(3)
+KAPI_END_SPEC;
+
+DEFINE_KAPI_IOCTL_SPEC(binder_set_max_threads)
+ KAPI_IOCTL_CMD(BINDER_SET_MAX_THREADS)
+ KAPI_IOCTL_CMD_NAME("BINDER_SET_MAX_THREADS")
+ KAPI_IOCTL_INPUT_SIZE(sizeof(__u32))
+ KAPI_IOCTL_OUTPUT_SIZE(0)
+ KAPI_IOCTL_FILE_OPS_NAME("binder_fops")
+ KAPI_DESCRIPTION("Set maximum number of binder threads")
+ KAPI_LONG_DESC("Sets the maximum number of threads that the binder driver "
+ "will request this process to spawn for handling incoming "
+ "transactions. The driver sends BR_SPAWN_LOOPER when it needs "
+ "more threads.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "max_threads", "__u32", "Maximum number of threads")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ KAPI_PARAM_RANGE(0, INT_MAX)
+ KAPI_PARAM_END
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ KAPI_RETURN_TYPE(KAPI_TYPE_INT)
+ KAPI_RETURN_CHECK_TYPE(KAPI_RETURN_ERROR_CHECK)
+ KAPI_RETURN_ERROR_VALUES(((const s64[]){-EINVAL, -EFAULT}))
+ KAPI_RETURN_ERROR_COUNT(2)
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EINVAL, "EINVAL", "Invalid thread count",
+ "Thread count exceeds system limits")
+ KAPI_ERROR(1, -EFAULT, "EFAULT", "Failed to copy from user",
+ "Invalid user pointer provided")
+
+ KAPI_ERROR_COUNT(2)
+ KAPI_PARAM_COUNT(1)
+ KAPI_SINCE_VERSION("3.0")
+KAPI_END_SPEC;
+
+DEFINE_KAPI_IOCTL_SPEC(binder_set_context_mgr)
+ KAPI_IOCTL_CMD(BINDER_SET_CONTEXT_MGR)
+ KAPI_IOCTL_CMD_NAME("BINDER_SET_CONTEXT_MGR")
+ KAPI_IOCTL_INPUT_SIZE(0)
+ KAPI_IOCTL_OUTPUT_SIZE(0)
+ KAPI_IOCTL_FILE_OPS_NAME("binder_fops")
+ KAPI_DESCRIPTION("Become the context manager (handle 0)")
+ KAPI_LONG_DESC("Registers the calling process as the context manager for "
+ "this binder domain. The context manager has special handle 0 "
+ "and typically implements the service manager. Only one process "
+ "per binder domain can be the context manager.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ KAPI_RETURN_TYPE(KAPI_TYPE_INT)
+ KAPI_RETURN_CHECK_TYPE(KAPI_RETURN_ERROR_CHECK)
+ KAPI_RETURN_ERROR_VALUES(((const s64[]){-EBUSY, -EPERM, -ENOMEM}))
+ KAPI_RETURN_ERROR_COUNT(3)
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EBUSY, "EBUSY", "Context manager already set",
+ "Another process is already the context manager")
+ KAPI_ERROR(1, -EPERM, "EPERM", "Permission denied",
+ "Caller lacks permission or wrong UID")
+ KAPI_ERROR(2, -ENOMEM, "ENOMEM", "Out of memory",
+ "Unable to allocate context manager node")
+
+ KAPI_ERROR_COUNT(3)
+ KAPI_PARAM_COUNT(0)
+ KAPI_SINCE_VERSION("3.0")
+ KAPI_NOTES("Subject to the LSM set_context_mgr check (e.g. SELinux policy) and the context manager UID check")
+KAPI_END_SPEC;
+
+DEFINE_KAPI_IOCTL_SPEC(binder_set_context_mgr_ext)
+ KAPI_IOCTL_CMD(BINDER_SET_CONTEXT_MGR_EXT)
+ KAPI_IOCTL_CMD_NAME("BINDER_SET_CONTEXT_MGR_EXT")
+ KAPI_IOCTL_INPUT_SIZE(sizeof(struct flat_binder_object))
+ KAPI_IOCTL_OUTPUT_SIZE(0)
+ KAPI_IOCTL_FILE_OPS_NAME("binder_fops")
+ KAPI_DESCRIPTION("Become context manager with extended info")
+ KAPI_LONG_DESC("Extended version of BINDER_SET_CONTEXT_MGR that allows "
+ "specifying additional properties of the context manager "
+ "through a flat_binder_object structure.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "object", "struct flat_binder_object", "Context manager properties")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_STRUCT,
+ .size = sizeof(struct flat_binder_object),
+ KAPI_PARAM_END
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ KAPI_RETURN_TYPE(KAPI_TYPE_INT)
+ KAPI_RETURN_CHECK_TYPE(KAPI_RETURN_ERROR_CHECK)
+ KAPI_RETURN_ERROR_VALUES(((const s64[]){-EINVAL, -EFAULT, -EBUSY, -EPERM, -ENOMEM}))
+ KAPI_RETURN_ERROR_COUNT(5)
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EINVAL, "EINVAL", "Invalid parameters",
+ "Invalid flat_binder_object structure")
+ KAPI_ERROR(1, -EFAULT, "EFAULT", "Failed to copy from user",
+ "Invalid user pointer provided")
+ KAPI_ERROR(2, -EBUSY, "EBUSY", "Context manager already set",
+ "Another process is already the context manager")
+ KAPI_ERROR(3, -EPERM, "EPERM", "Permission denied",
+ "Caller lacks permission or wrong UID")
+ KAPI_ERROR(4, -ENOMEM, "ENOMEM", "Out of memory",
+ "Unable to allocate context manager node")
+
+ KAPI_ERROR_COUNT(5)
+ KAPI_PARAM_COUNT(1)
+ KAPI_SINCE_VERSION("5.0")
+KAPI_END_SPEC;
+
+DEFINE_KAPI_IOCTL_SPEC(binder_thread_exit)
+ KAPI_IOCTL_CMD(BINDER_THREAD_EXIT)
+ KAPI_IOCTL_CMD_NAME("BINDER_THREAD_EXIT")
+ KAPI_IOCTL_INPUT_SIZE(0)
+ KAPI_IOCTL_OUTPUT_SIZE(0)
+ KAPI_IOCTL_FILE_OPS_NAME("binder_fops")
+ KAPI_DESCRIPTION("Exit binder thread")
+ KAPI_LONG_DESC("Notifies the binder driver that this thread is exiting. "
+ "The driver will clean up any pending transactions and "
+ "remove the thread from the thread pool.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ KAPI_RETURN_TYPE(KAPI_TYPE_INT)
+ KAPI_RETURN_CHECK_TYPE(KAPI_RETURN_ERROR_CHECK)
+ KAPI_RETURN_ERROR_VALUES((const s64[]){})
+ KAPI_RETURN_ERROR_COUNT(0)
+ KAPI_RETURN_END
+
+ KAPI_ERROR_COUNT(0)
+ KAPI_PARAM_COUNT(0)
+ KAPI_SINCE_VERSION("3.0")
+ KAPI_NOTES("Should be called before thread termination to ensure clean shutdown")
+KAPI_END_SPEC;
+
+DEFINE_KAPI_IOCTL_SPEC(binder_version)
+ KAPI_IOCTL_CMD(BINDER_VERSION)
+ KAPI_IOCTL_CMD_NAME("BINDER_VERSION")
+ KAPI_IOCTL_INPUT_SIZE(0)
+ KAPI_IOCTL_OUTPUT_SIZE(sizeof(struct binder_version))
+ KAPI_IOCTL_FILE_OPS_NAME("binder_fops")
+ KAPI_DESCRIPTION("Get binder protocol version")
+ KAPI_LONG_DESC("Returns the current binder protocol version supported "
+ "by the driver. Used for compatibility checking.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "protocol_version", "__s32", "Binder protocol version")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ KAPI_PARAM_TYPE(KAPI_TYPE_INT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_ENUM)
+ .enum_values = (const s64[]){BINDER_CURRENT_PROTOCOL_VERSION},
+ .enum_count = 1,
+ KAPI_PARAM_END
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ KAPI_RETURN_TYPE(KAPI_TYPE_INT)
+ KAPI_RETURN_CHECK_TYPE(KAPI_RETURN_ERROR_CHECK)
+ KAPI_RETURN_ERROR_VALUES(((const s64[]){-EINVAL, -EFAULT}))
+ KAPI_RETURN_ERROR_COUNT(2)
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EINVAL, "EINVAL", "Invalid version structure",
+ "Invalid user pointer for version structure")
+ KAPI_ERROR(1, -EFAULT, "EFAULT", "Failed to copy to user",
+ "Unable to write version to user space")
+
+ KAPI_ERROR_COUNT(2)
+ KAPI_PARAM_COUNT(1)
+ KAPI_SINCE_VERSION("3.0")
+KAPI_END_SPEC;
+
+DEFINE_KAPI_IOCTL_SPEC(binder_get_node_info_for_ref)
+ KAPI_IOCTL_CMD(BINDER_GET_NODE_INFO_FOR_REF)
+ KAPI_IOCTL_CMD_NAME("BINDER_GET_NODE_INFO_FOR_REF")
+ KAPI_IOCTL_INPUT_SIZE(sizeof(struct binder_node_info_for_ref))
+ KAPI_IOCTL_OUTPUT_SIZE(sizeof(struct binder_node_info_for_ref))
+ KAPI_IOCTL_FILE_OPS_NAME("binder_fops")
+ KAPI_DESCRIPTION("Get node information for a reference")
+ KAPI_LONG_DESC("Retrieves information about a binder node given its handle. "
+ "Returns the current strong and weak reference counts.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "handle", "__u32", "Binder handle")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "strong_count", "__u32", "Strong reference count")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "weak_count", "__u32", "Weak reference count")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_END
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ KAPI_RETURN_TYPE(KAPI_TYPE_INT)
+ KAPI_RETURN_CHECK_TYPE(KAPI_RETURN_ERROR_CHECK)
+ KAPI_RETURN_ERROR_VALUES(((const s64[]){-EINVAL, -EFAULT, -ENOENT}))
+ KAPI_RETURN_ERROR_COUNT(3)
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EINVAL, "EINVAL", "Invalid parameters",
+ "Reserved fields must be zero")
+ KAPI_ERROR(1, -EFAULT, "EFAULT", "Failed to copy data",
+ "Invalid user pointer provided")
+ KAPI_ERROR(2, -ENOENT, "ENOENT", "Handle not found",
+ "No node exists for the given handle")
+
+ KAPI_ERROR_COUNT(3)
+ KAPI_PARAM_COUNT(3)
+ KAPI_SINCE_VERSION("4.20")
+KAPI_END_SPEC;
+
+DEFINE_KAPI_IOCTL_SPEC(binder_get_node_debug_info)
+ KAPI_IOCTL_CMD(BINDER_GET_NODE_DEBUG_INFO)
+ KAPI_IOCTL_CMD_NAME("BINDER_GET_NODE_DEBUG_INFO")
+ KAPI_IOCTL_INPUT_SIZE(sizeof(struct binder_node_debug_info))
+ KAPI_IOCTL_OUTPUT_SIZE(sizeof(struct binder_node_debug_info))
+ KAPI_IOCTL_FILE_OPS_NAME("binder_fops")
+ KAPI_DESCRIPTION("Get debug info for binder nodes")
+ KAPI_LONG_DESC("Iterates through all binder nodes in the process. "
+ "Start with ptr=NULL to get first node, then use "
+ "returned ptr for next call. Returns ptr=0 when done.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "ptr", "binder_uintptr_t", "Node pointer (NULL for first)")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_INOUT)
+ .type = KAPI_TYPE_PTR,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "cookie", "binder_uintptr_t", "Node cookie value")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "has_strong_ref", "__u32", "Has strong references")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ .min_value = 0,
+ .max_value = 1,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(3, "has_weak_ref", "__u32", "Has weak references")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ .min_value = 0,
+ .max_value = 1,
+ KAPI_PARAM_END
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ KAPI_RETURN_TYPE(KAPI_TYPE_INT)
+ KAPI_RETURN_CHECK_TYPE(KAPI_RETURN_ERROR_CHECK)
+ KAPI_RETURN_ERROR_VALUES(((const s64[]){-EFAULT, -EINVAL}))
+ KAPI_RETURN_ERROR_COUNT(2)
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EFAULT, "EFAULT", "Failed to copy data",
+ "Invalid user pointer provided")
+ KAPI_ERROR(1, -EINVAL, "EINVAL", "Invalid node pointer",
+ "Provided ptr is not a valid node")
+
+ KAPI_ERROR_COUNT(2)
+ KAPI_PARAM_COUNT(4)
+ KAPI_SINCE_VERSION("4.14")
+KAPI_END_SPEC;
+
+DEFINE_KAPI_IOCTL_SPEC(binder_freeze)
+ KAPI_IOCTL_CMD(BINDER_FREEZE)
+ KAPI_IOCTL_CMD_NAME("BINDER_FREEZE")
+ KAPI_IOCTL_INPUT_SIZE(sizeof(struct binder_freeze_info))
+ KAPI_IOCTL_OUTPUT_SIZE(0)
+ KAPI_IOCTL_FILE_OPS_NAME("binder_fops")
+ KAPI_DESCRIPTION("Freeze or unfreeze a binder process")
+ KAPI_LONG_DESC("Controls whether a process can receive binder transactions. "
+ "When frozen, new transactions are blocked. Can wait for "
+ "existing transactions to complete with timeout.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "pid", "__u32", "Process ID to freeze/unfreeze")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ .min_value = 1,
+ .max_value = PID_MAX_LIMIT,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "enable", "__u32", "1 to freeze, 0 to unfreeze")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ .min_value = 0,
+ .max_value = 1,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "timeout_ms", "__u32", "Timeout in milliseconds (0 = no wait)")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ .min_value = 0,
+ .max_value = 60000, /* 1 minute max */
+ KAPI_PARAM_END
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ KAPI_RETURN_TYPE(KAPI_TYPE_INT)
+ KAPI_RETURN_CHECK_TYPE(KAPI_RETURN_ERROR_CHECK)
+ KAPI_RETURN_ERROR_VALUES(((const s64[]){-EINVAL, -EAGAIN, -EFAULT, -ENOMEM}))
+ KAPI_RETURN_ERROR_COUNT(4)
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EINVAL, "EINVAL", "Invalid process",
+ "Process not found or invalid parameters")
+ KAPI_ERROR(1, -EAGAIN, "EAGAIN", "Timeout waiting for transactions",
+ "Existing transactions did not complete within timeout")
+ KAPI_ERROR(2, -EFAULT, "EFAULT", "Failed to copy from user",
+ "Invalid user pointer provided")
+ KAPI_ERROR(3, -ENOMEM, "ENOMEM", "Out of memory",
+ "Unable to allocate memory for freeze operation")
+
+ KAPI_ERROR_COUNT(4)
+ KAPI_PARAM_COUNT(3)
+ KAPI_SINCE_VERSION("5.13")
+ KAPI_NOTES("Requires appropriate permissions to freeze other processes")
+KAPI_END_SPEC;
+
+DEFINE_KAPI_IOCTL_SPEC(binder_get_frozen_info)
+ KAPI_IOCTL_CMD(BINDER_GET_FROZEN_INFO)
+ KAPI_IOCTL_CMD_NAME("BINDER_GET_FROZEN_INFO")
+ KAPI_IOCTL_INPUT_SIZE(sizeof(struct binder_frozen_status_info))
+ KAPI_IOCTL_OUTPUT_SIZE(sizeof(struct binder_frozen_status_info))
+ KAPI_IOCTL_FILE_OPS_NAME("binder_fops")
+ KAPI_DESCRIPTION("Get frozen status of a process")
+ KAPI_LONG_DESC("Queries whether a process is frozen and if it has "
+ "received transactions while frozen. Useful for "
+ "debugging frozen process issues.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "pid", "__u32", "Process ID to query")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ .min_value = 1,
+ .max_value = PID_MAX_LIMIT,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "sync_recv", "__u32", "Sync transactions received while frozen")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_CONSTRAINT("Bit 0: received after frozen, Bit 1: pending during freeze")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "async_recv", "__u32", "Async transactions received while frozen")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_END
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ KAPI_RETURN_TYPE(KAPI_TYPE_INT)
+ KAPI_RETURN_CHECK_TYPE(KAPI_RETURN_ERROR_CHECK)
+ KAPI_RETURN_ERROR_VALUES(((const s64[]){-EINVAL, -EFAULT}))
+ KAPI_RETURN_ERROR_COUNT(2)
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EINVAL, "EINVAL", "Process not found",
+ "No binder process found with given PID")
+ KAPI_ERROR(1, -EFAULT, "EFAULT", "Failed to copy data",
+ "Invalid user pointer provided")
+
+ KAPI_ERROR_COUNT(2)
+ KAPI_PARAM_COUNT(3)
+ KAPI_SINCE_VERSION("5.9")
+KAPI_END_SPEC;
+
+DEFINE_KAPI_IOCTL_SPEC(binder_enable_oneway_spam_detection)
+ KAPI_IOCTL_CMD(BINDER_ENABLE_ONEWAY_SPAM_DETECTION)
+ KAPI_IOCTL_CMD_NAME("BINDER_ENABLE_ONEWAY_SPAM_DETECTION")
+ KAPI_IOCTL_INPUT_SIZE(sizeof(__u32))
+ KAPI_IOCTL_OUTPUT_SIZE(0)
+ KAPI_IOCTL_FILE_OPS_NAME("binder_fops")
+ KAPI_DESCRIPTION("Enable/disable oneway spam detection")
+ KAPI_LONG_DESC("Controls whether the driver monitors for excessive "
+ "oneway transactions that might indicate spam or abuse. "
+ "When enabled, BR_ONEWAY_SPAM_SUSPECT is sent when threshold exceeded.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "enable", "__u32", "1 to enable, 0 to disable")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ .min_value = 0,
+ .max_value = 1,
+ KAPI_PARAM_END
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ KAPI_RETURN_TYPE(KAPI_TYPE_INT)
+ KAPI_RETURN_CHECK_TYPE(KAPI_RETURN_ERROR_CHECK)
+ KAPI_RETURN_ERROR_VALUES(((const s64[]){-EFAULT}))
+ KAPI_RETURN_ERROR_COUNT(1)
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EFAULT, "EFAULT", "Failed to copy from user",
+ "Invalid user pointer provided")
+
+ KAPI_ERROR_COUNT(1)
+ KAPI_PARAM_COUNT(1)
+ KAPI_SINCE_VERSION("5.13")
+KAPI_END_SPEC;
+
+DEFINE_KAPI_IOCTL_SPEC(binder_get_extended_error)
+ KAPI_IOCTL_CMD(BINDER_GET_EXTENDED_ERROR)
+ KAPI_IOCTL_CMD_NAME("BINDER_GET_EXTENDED_ERROR")
+ KAPI_IOCTL_INPUT_SIZE(0)
+ KAPI_IOCTL_OUTPUT_SIZE(sizeof(struct binder_extended_error))
+ KAPI_IOCTL_FILE_OPS_NAME("binder_fops")
+ KAPI_DESCRIPTION("Get extended error information")
+ KAPI_LONG_DESC("Retrieves detailed error information from the last "
+ "failed binder operation on this thread. Clears the "
+ "error after reading.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ /* Parameters */
+ KAPI_PARAM(0, "id", "__u32", "Error identifier")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "command", "__u32", "Binder command that failed")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ .type = KAPI_TYPE_UINT,
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "param", "__s32", "Error parameter (negative errno)")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_OUT)
+ KAPI_PARAM_TYPE(KAPI_TYPE_INT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ .min_value = -MAX_ERRNO,
+ .max_value = 0,
+ KAPI_PARAM_END
+
+ /* Return value */
+ KAPI_RETURN("int", "0 on success, negative errno on failure")
+ KAPI_RETURN_TYPE(KAPI_TYPE_INT)
+ KAPI_RETURN_CHECK_TYPE(KAPI_RETURN_ERROR_CHECK)
+ KAPI_RETURN_ERROR_VALUES(((const s64[]){-EFAULT}))
+ KAPI_RETURN_ERROR_COUNT(1)
+ KAPI_RETURN_END
+
+ /* Errors */
+ KAPI_ERROR(0, -EFAULT, "EFAULT", "Failed to copy to user",
+ "Invalid user pointer provided")
+
+ KAPI_ERROR_COUNT(1)
+ KAPI_PARAM_COUNT(3)
+ KAPI_SINCE_VERSION("5.16")
+ KAPI_NOTES("Error is cleared after reading, subsequent calls return BR_OK")
+KAPI_END_SPEC;
+
+static int kapi_ioctl_specs_init(void)
+{
+ return 0;
+}
+
+static void kapi_ioctl_specs_exit(void)
+{
+}
+
DEFINE_SHOW_ATTRIBUTE(state);
DEFINE_SHOW_ATTRIBUTE(state_hashed);
DEFINE_SHOW_ATTRIBUTE(stats);
@@ -7050,6 +7741,13 @@ static int __init binder_init(void)
if (ret)
return ret;
+ /* Initialize the wrapped file_operations */
+ kapi_init_fops_binder_fops();
+
+ ret = kapi_ioctl_specs_init();
+ if (ret)
+ goto err_kapi_init;
+
atomic_set(&binder_transaction_log.cur, ~0U);
atomic_set(&binder_transaction_log_failed.cur, ~0U);
@@ -7102,6 +7800,9 @@ static int __init binder_init(void)
err_alloc_device_names_failed:
debugfs_remove_recursive(binder_debugfs_dir_entry_root);
+ kapi_ioctl_specs_exit();
+
+err_kapi_init:
binder_alloc_shrinker_exit();
return ret;
diff --git a/include/linux/kernel_api_spec.h b/include/linux/kernel_api_spec.h
index 4be9636b19158..ee7371909d0e4 100644
--- a/include/linux/kernel_api_spec.h
+++ b/include/linux/kernel_api_spec.h
@@ -863,6 +863,9 @@ struct kernel_api_spec {
.enum_values = values, \
.enum_count = ARRAY_SIZE(values),
+#define KAPI_PARAM_SIZE_PARAM_IDX(idx) \
+ .size_param_idx = idx,
+
#define KAPI_PARAM_END },
/**
--
2.39.5
* [RFC v2 19/22] kernel/api: Add sysfs validation support to kernel API specification framework
2025-06-24 18:07 [RFC v2 00/22] Kernel API specification framework Sasha Levin
` (17 preceding siblings ...)
2025-06-24 18:07 ` [RFC v2 18/22] binder: " Sasha Levin
@ 2025-06-24 18:07 ` Sasha Levin
2025-06-24 18:07 ` [RFC v2 20/22] block: sysfs API specifications Sasha Levin
` (3 subsequent siblings)
22 siblings, 0 replies; 33+ messages in thread
From: Sasha Levin @ 2025-06-24 18:07 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
Extend the kernel API specification infrastructure to support sysfs attributes,
enabling runtime validation and comprehensive documentation of sysfs interfaces.
This patch integrates sysfs support into the existing KAPI framework,
maintaining consistency across different kernel API types. The
implementation adds new parameter types (STRING, BOOL, HEX, BINARY,
BITMAP) specifically for sysfs attributes, along with sysfs-specific
fields in the kapi_param_spec structure including path, permissions,
default values, units, and allowed string values.
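Add runtime validation functions that check sysfs read and write
operations, validate parameter types and ranges, and reject accesses that
conflict with an attribute's read-only or write-only flag. A mode that
differs from the specified permissions is only logged, not rejected.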
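The write-side checks can be modeled in userspace roughly as follows. This is
a minimal sketch under stated assumptions, not the kernel implementation:
`struct model_param`, `model_validate_write()` and the `MODEL_*` constants are
hypothetical stand-ins for `struct kapi_param_spec`,
`kapi_validate_sysfs_write()` and the `KAPI_*` enums, and `strtoll()` stands in
for the kernel's `kstrtoll()`.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the KAPI access flags and parameter types. */
enum { MODEL_RO = 1 << 0, MODEL_WO = 1 << 1 };
enum model_type { MODEL_TYPE_INT, MODEL_TYPE_STRING };

struct model_param {
	const char *name;
	enum model_type type;
	unsigned int flags;
	long long min_value, max_value;
	const char *const *allowed;	/* optional list of accepted strings */
	size_t allowed_count;
};

/* Copy the value into a NUL-terminated buffer, dropping one trailing newline. */
static int model_copy_value(char *dst, size_t dst_size,
			    const char *buf, size_t count)
{
	if (count > 0 && buf[count - 1] == '\n')
		count--;
	if (count >= dst_size)
		return -EINVAL;
	memcpy(dst, buf, count);
	dst[count] = '\0';
	return 0;
}

/* Returns 0 if the write is acceptable, a negative errno otherwise. */
static int model_validate_write(const struct model_param *p,
				const char *buf, size_t count)
{
	char val[64];
	char *end;
	long long n;
	size_t i;

	/* Access-mode check comes first, before any parsing. */
	if (p->flags & MODEL_RO)
		return -EPERM;
	if (model_copy_value(val, sizeof(val), buf, count))
		return -EINVAL;

	switch (p->type) {
	case MODEL_TYPE_INT:
		errno = 0;
		n = strtoll(val, &end, 0);
		if (errno || end == val || *end != '\0')
			return -EINVAL;
		if (n < p->min_value || n > p->max_value)
			return -EINVAL;
		return 0;
	case MODEL_TYPE_STRING:
		for (i = 0; i < p->allowed_count; i++)
			if (strcmp(val, p->allowed[i]) == 0)
				return 0;
		/* An empty allowed list means any string passes. */
		return p->allowed_count ? -EINVAL : 0;
	}
	return -EINVAL;
}
```

A store handler would call such a validator before acting on the buffer. The
ordering mirrors the patch: the access flag is checked first, then the value
is dispatched to a per-type check.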
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/linux/kernel_api_spec.h | 123 +++++++++++-
kernel/api/kernel_api_spec.c | 336 +++++++++++++++++++++++++++++++-
2 files changed, 451 insertions(+), 8 deletions(-)
diff --git a/include/linux/kernel_api_spec.h b/include/linux/kernel_api_spec.h
index ee7371909d0e4..fd8ff6ec85d99 100644
--- a/include/linux/kernel_api_spec.h
+++ b/include/linux/kernel_api_spec.h
@@ -43,6 +43,11 @@ struct sigaction;
* @KAPI_TYPE_FD: File descriptor - validated in process context
* @KAPI_TYPE_USER_PTR: User space pointer - validated for access and size
* @KAPI_TYPE_PATH: Pathname - validated for access and path limits
+ * @KAPI_TYPE_STRING: String type - for sysfs and other string attributes
+ * @KAPI_TYPE_BOOL: Boolean type - for sysfs and other boolean attributes
+ * @KAPI_TYPE_HEX: Hexadecimal type - for sysfs hex values
+ * @KAPI_TYPE_BINARY: Binary data type - for sysfs binary attributes
+ * @KAPI_TYPE_BITMAP: Bitmap type - for sysfs bitmap attributes
* @KAPI_TYPE_CUSTOM: Custom/complex types
*/
enum kapi_param_type {
@@ -58,6 +63,11 @@ enum kapi_param_type {
KAPI_TYPE_FD, /* File descriptor - validated in process context */
KAPI_TYPE_USER_PTR, /* User space pointer - validated for access and size */
KAPI_TYPE_PATH, /* Pathname - validated for access and path limits */
+ KAPI_TYPE_STRING, /* String type - for sysfs and other string attributes */
+ KAPI_TYPE_BOOL, /* Boolean type - for sysfs and other boolean attributes */
+ KAPI_TYPE_HEX, /* Hexadecimal type - for sysfs hex values */
+ KAPI_TYPE_BINARY, /* Binary data type - for sysfs binary attributes */
+ KAPI_TYPE_BITMAP, /* Bitmap type - for sysfs bitmap attributes */
KAPI_TYPE_CUSTOM,
};
@@ -72,6 +82,10 @@ enum kapi_param_type {
* @KAPI_PARAM_USER: User space pointer
* @KAPI_PARAM_DMA: DMA-capable memory required
* @KAPI_PARAM_ALIGNED: Alignment requirements
+ * @KAPI_PARAM_SYSFS_READONLY: Sysfs read-only attribute
+ * @KAPI_PARAM_SYSFS_WRITEONLY: Sysfs write-only attribute
+ * @KAPI_PARAM_SYSFS_RW: Sysfs read-write attribute
+ * @KAPI_PARAM_SYSFS_BINARY: Sysfs binary attribute
*/
enum kapi_param_flags {
KAPI_PARAM_IN = (1 << 0),
@@ -83,6 +97,10 @@ enum kapi_param_flags {
KAPI_PARAM_USER = (1 << 6),
KAPI_PARAM_DMA = (1 << 7),
KAPI_PARAM_ALIGNED = (1 << 8),
+ KAPI_PARAM_SYSFS_READONLY = (1 << 9),
+ KAPI_PARAM_SYSFS_WRITEONLY = (1 << 10),
+ KAPI_PARAM_SYSFS_RW = (1 << 11),
+ KAPI_PARAM_SYSFS_BINARY = (1 << 12),
};
/**
@@ -164,6 +182,13 @@ enum kapi_constraint_type {
* @constraints: Additional constraints description
* @size_param_idx: Index of parameter that determines size (-1 if fixed size)
* @size_multiplier: Multiplier for size calculation (e.g., sizeof(struct))
+ * @sysfs_path: Path in sysfs (for sysfs attributes)
+ * @sysfs_permissions: Sysfs file permissions (e.g., 0644)
+ * @default_value: Default value as string (for sysfs)
+ * @units: Units of measurement (e.g., "ms", "bytes")
+ * @step: Step value for numeric types
+ * @allowed_strings: Array of allowed string values
+ * @allowed_string_count: Number of allowed string values
*/
struct kapi_param_spec {
char name[KAPI_MAX_NAME_LEN];
@@ -183,6 +208,14 @@ struct kapi_param_spec {
char constraints[KAPI_MAX_DESC_LEN];
int size_param_idx; /* Index of param that determines size, -1 if N/A */
size_t size_multiplier; /* Size per unit (e.g., sizeof(struct epoll_event)) */
+ /* Sysfs-specific fields */
+ char sysfs_path[KAPI_MAX_NAME_LEN];
+ umode_t sysfs_permissions;
+ char default_value[KAPI_MAX_NAME_LEN];
+ char units[32];
+ s64 step;
+ const char **allowed_strings;
+ u32 allowed_string_count;
} __attribute__((packed));
/**
@@ -667,9 +700,22 @@ struct kapi_addr_family_spec {
} __attribute__((packed));
#endif /* CONFIG_NET */
+/**
+ * enum kapi_api_type - Type of kernel API
+ * @KAPI_API_FUNCTION: Function/syscall API
+ * @KAPI_API_IOCTL: IOCTL API
+ * @KAPI_API_SYSFS: Sysfs attribute API
+ */
+enum kapi_api_type {
+ KAPI_API_FUNCTION = 0,
+ KAPI_API_IOCTL,
+ KAPI_API_SYSFS,
+};
+
/**
* struct kernel_api_spec - Complete kernel API specification
- * @name: Function name
+ * @name: Function/attribute name
+ * @api_type: Type of API (function, ioctl, sysfs)
* @version: API version
* @description: Brief description
* @long_description: Detailed description
@@ -698,9 +744,12 @@ struct kapi_addr_family_spec {
* @side_effects: Side effect specifications
* @state_trans_count: Number of state transition specifications
* @state_transitions: State transition specifications
+ * @subsystem: Subsystem name (for sysfs)
+ * @device_type: Device type (for sysfs)
*/
struct kernel_api_spec {
char name[KAPI_MAX_NAME_LEN];
+ enum kapi_api_type api_type;
u32 version;
char description[KAPI_MAX_DESC_LEN];
char long_description[KAPI_MAX_DESC_LEN * 4];
@@ -786,6 +835,10 @@ struct kernel_api_spec {
size_t input_size; /* Size of input structure (0 if none) */
size_t output_size; /* Size of output structure (0 if none) */
char file_ops_name[KAPI_MAX_NAME_LEN]; /* Name of the file_operations structure */
+
+ /* Sysfs-specific fields */
+ char subsystem[KAPI_MAX_NAME_LEN];
+ char device_type[KAPI_MAX_NAME_LEN];
} __attribute__((packed));
/* Macros for defining API specifications */
@@ -1208,6 +1261,47 @@ struct kernel_api_spec {
#define KAPI_EFFECTS_RESOURCES (KAPI_EFFECT_RESOURCE_CREATE | KAPI_EFFECT_RESOURCE_DESTROY)
#define KAPI_EFFECTS_IO (KAPI_EFFECT_NETWORK | KAPI_EFFECT_FILESYSTEM)
+/* Sysfs-specific macros */
+
+/**
+ * DEFINE_SYSFS_API_SPEC - Define a sysfs attribute API specification
+ * @attr_name: Sysfs attribute name
+ */
+#define DEFINE_SYSFS_API_SPEC(attr_name) \
+ static struct kernel_api_spec __kapi_sysfs_spec_##attr_name \
+ __used __section(".kapi_specs") = { \
+ .name = __stringify(attr_name), \
+ .api_type = KAPI_API_SYSFS, \
+ .version = 1,
+
+/*
+ * For sysfs attributes, use KAPI_PARAM with sysfs-specific fields
+ */
+#define KAPI_PATH(path) \
+ .sysfs_path = path,
+
+#define KAPI_PERMISSIONS(perms) \
+ .sysfs_permissions = perms,
+
+#define KAPI_DEFAULT(defval) \
+ .default_value = defval,
+
+#define KAPI_UNITS(unit) \
+ .units = unit,
+
+#define KAPI_STEP(s) \
+ .step = s,
+
+#define KAPI_ALLOWED_STRINGS(strings, count) \
+ .allowed_strings = strings, \
+ .allowed_string_count = count,
+
+#define KAPI_SUBSYSTEM(subsys) \
+ .subsystem = subsys,
+
+#define KAPI_DEVICE_TYPE(dtype) \
+ .device_type = dtype,
+
/* Helper macros for common patterns */
#define KAPI_PARAM_IN (KAPI_PARAM_IN)
@@ -1329,6 +1423,13 @@ bool kapi_validate_signal_action(const struct kernel_api_spec *spec, int signum,
struct sigaction *act);
int kapi_get_signal_error(const struct kernel_api_spec *spec, int signum);
bool kapi_is_signal_restartable(const struct kernel_api_spec *spec, int signum);
+
+/* Sysfs validation functions */
+int kapi_validate_sysfs_write(const char *attr_name, const char *buf, size_t count);
+int kapi_validate_sysfs_read(const char *attr_name);
+int kapi_validate_sysfs_permission(const char *attr_name, umode_t mode);
+bool kapi_validate_sysfs_string(const struct kapi_param_spec *param, const char *buf, size_t count);
+bool kapi_validate_sysfs_number(const struct kapi_param_spec *param, const char *buf);
#else
static inline bool kapi_validate_params(const struct kernel_api_spec *spec, ...)
{
@@ -1384,6 +1485,26 @@ static inline bool kapi_is_signal_restartable(const struct kernel_api_spec *spec
{
return false;
}
+static inline int kapi_validate_sysfs_write(const char *attr_name, const char *buf, size_t count)
+{
+ return 0;
+}
+static inline int kapi_validate_sysfs_read(const char *attr_name)
+{
+ return 0;
+}
+static inline int kapi_validate_sysfs_permission(const char *attr_name, umode_t mode)
+{
+ return 0;
+}
+static inline bool kapi_validate_sysfs_string(const struct kapi_param_spec *param, const char *buf, size_t count)
+{
+ return true;
+}
+static inline bool kapi_validate_sysfs_number(const struct kapi_param_spec *param, const char *buf)
+{
+ return true;
+}
#endif
/* Export/query functions */
diff --git a/kernel/api/kernel_api_spec.c b/kernel/api/kernel_api_spec.c
index 7be653ac2333b..9b4d3e4fa9f5f 100644
--- a/kernel/api/kernel_api_spec.c
+++ b/kernel/api/kernel_api_spec.c
@@ -139,6 +139,11 @@ static const char *param_type_to_string(enum kapi_param_type type)
[KAPI_TYPE_FD] = "file_descriptor",
[KAPI_TYPE_USER_PTR] = "user_pointer",
[KAPI_TYPE_PATH] = "pathname",
+ [KAPI_TYPE_STRING] = "string",
+ [KAPI_TYPE_BOOL] = "bool",
+ [KAPI_TYPE_HEX] = "hex",
+ [KAPI_TYPE_BINARY] = "binary",
+ [KAPI_TYPE_BITMAP] = "bitmap",
[KAPI_TYPE_CUSTOM] = "custom",
};
@@ -238,11 +243,15 @@ int kapi_export_json(const struct kernel_api_spec *spec, char *buf, size_t size)
ret = scnprintf(buf, size,
"{\n"
" \"name\": \"%s\",\n"
+ " \"api_type\": \"%s\",\n"
" \"version\": %u,\n"
" \"description\": \"%s\",\n"
" \"long_description\": \"%s\",\n"
" \"context_flags\": \"0x%x\",\n",
spec->name,
+ spec->api_type == KAPI_API_FUNCTION ? "function" :
+ spec->api_type == KAPI_API_IOCTL ? "ioctl" :
+ spec->api_type == KAPI_API_SYSFS ? "sysfs" : "unknown",
spec->version,
spec->description,
spec->long_description,
@@ -261,13 +270,38 @@ int kapi_export_json(const struct kernel_api_spec *spec, char *buf, size_t size)
" \"type\": \"%s\",\n"
" \"type_class\": \"%s\",\n"
" \"flags\": \"0x%x\",\n"
- " \"description\": \"%s\"\n"
- " }%s\n",
+ " \"description\": \"%s\"",
param->name,
param->type_name,
param_type_to_string(param->type),
param->flags,
- param->description,
+ param->description);
+
+ /* Add sysfs-specific fields if this is a sysfs API */
+ if (spec->api_type == KAPI_API_SYSFS) {
+ if (param->sysfs_path[0])
+ ret += scnprintf(buf + ret, size - ret,
+ ",\n \"sysfs_path\": \"%s\"", param->sysfs_path);
+ if (param->sysfs_permissions)
+ ret += scnprintf(buf + ret, size - ret,
+ ",\n \"permissions\": \"0%o\"", param->sysfs_permissions);
+ if (param->default_value[0])
+ ret += scnprintf(buf + ret, size - ret,
+ ",\n \"default_value\": \"%s\"", param->default_value);
+ if (param->units[0])
+ ret += scnprintf(buf + ret, size - ret,
+ ",\n \"units\": \"%s\"", param->units);
+ if (param->step)
+ ret += scnprintf(buf + ret, size - ret,
+ ",\n \"step\": %lld", param->step);
+ if (param->min_value != 0 || param->max_value != 0)
+ ret += scnprintf(buf + ret, size - ret,
+ ",\n \"range\": [%lld, %lld]",
+ param->min_value, param->max_value);
+ }
+
+ ret += scnprintf(buf + ret, size - ret,
+ "\n }%s\n",
(i < spec->param_count - 1) ? "," : "");
}
@@ -409,13 +443,25 @@ int kapi_export_json(const struct kernel_api_spec *spec, char *buf, size_t size)
ret += scnprintf(buf + ret, size - ret,
" \"since_version\": \"%s\",\n"
" \"deprecated\": %s,\n"
- " \"replacement\": \"%s\",\n"
+ " \"replacement\": \"%s\",\n",
+ spec->since_version,
+ spec->deprecated ? "true" : "false",
+ spec->replacement);
+
+ /* Sysfs-specific fields */
+ if (spec->api_type == KAPI_API_SYSFS) {
+ if (spec->subsystem[0])
+ ret += scnprintf(buf + ret, size - ret,
+ " \"subsystem\": \"%s\",\n", spec->subsystem);
+ if (spec->device_type[0])
+ ret += scnprintf(buf + ret, size - ret,
+ " \"device_type\": \"%s\",\n", spec->device_type);
+ }
+
+ ret += scnprintf(buf + ret, size - ret,
" \"examples\": \"%s\",\n"
" \"notes\": \"%s\"\n"
"}\n",
- spec->since_version,
- spec->deprecated ? "true" : "false",
- spec->replacement,
spec->examples,
spec->notes);
@@ -492,6 +538,282 @@ EXPORT_SYMBOL_GPL(kapi_print_spec);
#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+/**
+ * kapi_validate_sysfs_string - Validate a string value for sysfs
+ * @param: Parameter specification
+ * @buf: Buffer containing the string
+ * @count: Size of buffer
+ *
+ * Return: true if valid, false otherwise
+ */
+bool kapi_validate_sysfs_string(const struct kapi_param_spec *param,
+ const char *buf, size_t count)
+{
+ size_t len = count;
+ int i;
+
+ if (!param || param->type != KAPI_TYPE_STRING)
+ return false;
+
+ /* Remove trailing newline if present */
+ if (len > 0 && buf[len - 1] == '\n')
+ len--;
+
+ /* Check length constraints */
+ if (param->size > 0 && len > param->size) {
+ pr_warn("Sysfs %s: string too long (max: %zu, got: %zu)\n",
+ param->name, param->size, len);
+ return false;
+ }
+
+ /* Check against allowed values if specified */
+ if (param->allowed_strings && param->allowed_string_count > 0) {
+ char *str = kstrndup(buf, len, GFP_KERNEL);
+ bool found = false;
+
+ if (!str)
+ return false;
+
+ for (i = 0; i < param->allowed_string_count; i++) {
+ if (strcmp(str, param->allowed_strings[i]) == 0) {
+ found = true;
+ break;
+ }
+ }
+
+ kfree(str);
+
+ if (!found) {
+ pr_warn("Sysfs %s: value not in allowed list\n", param->name);
+ return false;
+ }
+ }
+
+ return true;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_sysfs_string);
+
+/**
+ * kapi_validate_sysfs_number - Validate a numeric value for sysfs
+ * @param: Parameter specification
+ * @buf: Buffer containing the value
+ *
+ * Return: true if valid, false otherwise
+ */
+bool kapi_validate_sysfs_number(const struct kapi_param_spec *param,
+ const char *buf)
+{
+ s64 int_val;
+ u64 uint_val;
+ int ret;
+
+ if (!param)
+ return false;
+
+ switch (param->type) {
+ case KAPI_TYPE_INT:
+ ret = kstrtoll(buf, 0, &int_val);
+ if (ret) {
+ pr_warn("Sysfs %s: invalid integer format\n", param->name);
+ return false;
+ }
+
+ /* Check range constraints */
+ if (int_val < param->min_value || int_val > param->max_value) {
+ pr_warn("Sysfs %s: value %lld out of range [%lld, %lld]\n",
+ param->name, int_val, param->min_value, param->max_value);
+ return false;
+ }
+
+ /* Check step constraint */
+ if (param->step > 0) {
+ s64 offset = int_val - param->min_value;
+ if (offset % param->step != 0) {
+ pr_warn("Sysfs %s: value %lld not aligned to step %lld\n",
+ param->name, int_val, param->step);
+ return false;
+ }
+ }
+ break;
+
+ case KAPI_TYPE_UINT:
+ case KAPI_TYPE_HEX:
+ ret = kstrtoull(buf, 0, &uint_val);
+ if (ret) {
+ pr_warn("Sysfs %s: invalid unsigned integer format\n", param->name);
+ return false;
+ }
+
+ /* Check range constraints */
+ if (uint_val < (u64)param->min_value || uint_val > (u64)param->max_value) {
+ pr_warn("Sysfs %s: value %llu out of range [%llu, %llu]\n",
+ param->name, uint_val, (u64)param->min_value, (u64)param->max_value);
+ return false;
+ }
+
+ /* Check valid bits mask */
+ if (param->valid_mask && (uint_val & ~param->valid_mask)) {
+ pr_warn("Sysfs %s: value 0x%llx contains invalid bits (mask: 0x%llx)\n",
+ param->name, uint_val, param->valid_mask);
+ return false;
+ }
+ break;
+
+ case KAPI_TYPE_BOOL:
+ {
+ bool val;
+ ret = kstrtobool(buf, &val);
+ if (ret) {
+ pr_warn("Sysfs %s: invalid boolean value\n", param->name);
+ return false;
+ }
+ }
+ break;
+
+ default:
+ pr_warn("Sysfs %s: unsupported type %d for numeric validation\n",
+ param->name, param->type);
+ return false;
+ }
+
+ return true;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_sysfs_number);
+
+/**
+ * kapi_validate_sysfs_write - Validate a write operation to sysfs attribute
+ * @attr_name: Name of the sysfs attribute
+ * @buf: Buffer containing the value to write
+ * @count: Size of buffer
+ *
+ * Return: 0 if valid, negative error code otherwise
+ */
+int kapi_validate_sysfs_write(const char *attr_name, const char *buf, size_t count)
+{
+ const struct kernel_api_spec *spec;
+ const struct kapi_param_spec *param;
+ int ret;
+
+ spec = kapi_get_spec(attr_name);
+ if (!spec || spec->api_type != KAPI_API_SYSFS)
+ return 0; /* No spec or not a sysfs spec, allow operation */
+
+ if (spec->param_count == 0)
+ return 0; /* No parameters defined */
+
+ param = &spec->params[0]; /* Sysfs attributes have single parameter */
+
+ /* Check access permissions */
+ if (param->flags & KAPI_PARAM_SYSFS_READONLY) {
+ pr_warn("Sysfs %s: write to read-only attribute\n", attr_name);
+ return -EPERM;
+ }
+
+ /* Validate based on type */
+ switch (param->type) {
+ case KAPI_TYPE_STRING:
+ if (!kapi_validate_sysfs_string(param, buf, count))
+ return -EINVAL;
+ break;
+
+ case KAPI_TYPE_INT:
+ case KAPI_TYPE_UINT:
+ case KAPI_TYPE_HEX:
+ case KAPI_TYPE_BOOL:
+ if (!kapi_validate_sysfs_number(param, buf))
+ return -EINVAL;
+ break;
+
+ case KAPI_TYPE_BINARY:
+ /* Binary attributes have their own validation */
+ if (param->size > 0 && count > param->size) {
+ pr_warn("Sysfs %s: binary data too large (max: %zu)\n",
+ attr_name, param->size);
+ return -EINVAL;
+ }
+ break;
+
+ case KAPI_TYPE_CUSTOM:
+ if (param->validate) {
+ ret = param->validate((s64)(unsigned long)buf);
+ if (!ret) {
+ pr_warn("Sysfs %s: custom validation failed\n", attr_name);
+ return -EINVAL;
+ }
+ }
+ break;
+
+ default:
+ pr_warn("Sysfs %s: unknown type %d\n", attr_name, param->type);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_sysfs_write);
+
+/**
+ * kapi_validate_sysfs_read - Validate a read operation from sysfs attribute
+ * @attr_name: Name of the sysfs attribute
+ *
+ * Return: 0 if valid, negative error code otherwise
+ */
+int kapi_validate_sysfs_read(const char *attr_name)
+{
+ const struct kernel_api_spec *spec;
+ const struct kapi_param_spec *param;
+
+ spec = kapi_get_spec(attr_name);
+ if (!spec || spec->api_type != KAPI_API_SYSFS)
+ return 0; /* No spec or not a sysfs spec, allow operation */
+
+ if (spec->param_count == 0)
+ return 0; /* No parameters defined */
+
+ param = &spec->params[0]; /* Sysfs attributes have single parameter */
+
+ /* Check access permissions */
+ if (param->flags & KAPI_PARAM_SYSFS_WRITEONLY) {
+ pr_warn("Sysfs %s: read from write-only attribute\n", attr_name);
+ return -EPERM;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_sysfs_read);
+
+/**
+ * kapi_validate_sysfs_permission - Validate permission change for sysfs attribute
+ * @attr_name: Name of the sysfs attribute
+ * @mode: New permission mode
+ *
+ * Return: 0 if valid, negative error code otherwise
+ */
+int kapi_validate_sysfs_permission(const char *attr_name, umode_t mode)
+{
+ const struct kernel_api_spec *spec;
+ const struct kapi_param_spec *param;
+
+ spec = kapi_get_spec(attr_name);
+ if (!spec || spec->api_type != KAPI_API_SYSFS)
+ return 0; /* No spec or not a sysfs spec, allow operation */
+
+ if (spec->param_count == 0)
+ return 0; /* No parameters defined */
+
+ param = &spec->params[0]; /* Sysfs attributes have single parameter */
+
+ /* Check if permissions match specification */
+ if (param->sysfs_permissions && param->sysfs_permissions != mode) {
+ pr_warn("Sysfs %s: permission mismatch (expected: 0%o, got: 0%o)\n",
+ attr_name, param->sysfs_permissions, mode);
+ /* We warn but don't fail - this might be intentional */
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_sysfs_permission);
+
/**
* kapi_validate_fd - Validate that a file descriptor is valid in current context
* @fd: File descriptor to validate
--
2.39.5
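The KAPI_TYPE_INT range and step checks in kapi_validate_sysfs_number() can be
illustrated with a small userspace model. This is a sketch only:
`struct model_range` and `model_in_range()` are hypothetical names, not part of
the patch. It shows two properties of the logic as written in the hunk above:
step alignment is measured from `min_value` rather than from zero, and the
range test is unconditional, so a spec that leaves both bounds at their zero
default would appear to accept only 0.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the KAPI_TYPE_INT range and step checks. */
struct model_range {
	long long min_value;
	long long max_value;
	long long step;		/* 0 means "no step constraint" */
};

static bool model_in_range(const struct model_range *r, long long v)
{
	/* Unconditional bounds test, as in the patch. */
	if (v < r->min_value || v > r->max_value)
		return false;
	/* Alignment is relative to min_value, not to zero. */
	if (r->step > 0 && (v - r->min_value) % r->step != 0)
		return false;
	return true;
}
```

With `{ .min_value = 4, .max_value = 64, .step = 8 }` the accepted values are
4, 12, 20 and so on, while 8 is rejected even though it is a multiple of the
step. Whether an unset range should instead mean "unconstrained" is a design
question the series may want to settle explicitly.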
* [RFC v2 20/22] block: sysfs API specifications
2025-06-24 18:07 [RFC v2 00/22] Kernel API specification framework Sasha Levin
` (18 preceding siblings ...)
2025-06-24 18:07 ` [RFC v2 19/22] kernel/api: Add sysfs validation support to kernel API specification framework Sasha Levin
@ 2025-06-24 18:07 ` Sasha Levin
2025-06-24 18:07 ` [RFC v2 21/22] net/socket: add API specification for socket() Sasha Levin
` (2 subsequent siblings)
22 siblings, 0 replies; 33+ messages in thread
From: Sasha Levin @ 2025-06-24 18:07 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
Add KAPI specifications for the block layer sysfs attributes in
blk-integrity.c, blk-sysfs.c, and genhd.c.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
block/blk-integrity.c | 131 +++++++++++++++++++++++
block/blk-sysfs.c | 243 ++++++++++++++++++++++++++++++++++++++++++
block/genhd.c | 99 +++++++++++++++++
3 files changed, 473 insertions(+)
diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index e4e2567061f9d..bfe08c8fab91b 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -13,6 +13,8 @@
#include <linux/scatterlist.h>
#include <linux/export.h>
#include <linux/slab.h>
+#include <linux/kernel_api_spec.h>
+#include <linux/syscall_api_spec.h>
#include "blk.h"
@@ -234,6 +236,29 @@ static ssize_t flag_show(struct device *dev, char *page, unsigned char flag)
return sysfs_emit(page, "%d\n", !(bi->flags & flag));
}
+/*
+ * Sysfs API specifications for integrity attributes
+ */
+DEFINE_SYSFS_API_SPEC(format)
+ KAPI_DESCRIPTION("Metadata format for integrity")
+ KAPI_LONG_DESC("Metadata format for integrity capable block device. "
+ "E.g. T10-DIF-TYPE1-CRC. This field describes the type of T10 "
+ "Protection Information that the block device can send and receive. "
+ "If the device can store application integrity metadata but "
+ "no T10 Protection Information profile is used, this field "
+ "contains 'nop'. If the device does not support integrity "
+ "metadata, this field contains 'none'.")
+ KAPI_PARAM_COUNT(1)
+ KAPI_PARAM(0, "format", "string", "Integrity metadata format")
+ KAPI_PARAM_TYPE(KAPI_TYPE_STRING)
+ KAPI_PERMISSIONS(0444)
+ KAPI_PATH("/sys/block/<disk>/integrity/format")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_SYSFS_READONLY)
+ KAPI_PARAM_END
+ KAPI_SUBSYSTEM("block")
+ KAPI_EXAMPLES("cat /sys/block/sda/integrity/format")
+KAPI_END_SPEC;
+
static ssize_t format_show(struct device *dev, struct device_attribute *attr,
char *page)
{
@@ -244,6 +269,33 @@ static ssize_t format_show(struct device *dev, struct device_attribute *attr,
return sysfs_emit(page, "%s\n", blk_integrity_profile_name(bi));
}
+DEFINE_SYSFS_API_SPEC(tag_size)
+ KAPI_DESCRIPTION("Integrity tag size")
+ KAPI_LONG_DESC("Number of bytes of integrity tag space available per "
+ "protection_interval_bytes, which is typically "
+ "the device's logical block size. "
+ "This field describes the size of the application tag "
+ "if the storage device is formatted with T10 Protection "
+ "Information and permits use of the application tag.")
+ KAPI_PARAM_COUNT(1)
+ KAPI_PARAM(0, "tag_size", "unsigned int", "Tag size in bytes")
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PERMISSIONS(0444)
+ KAPI_PATH("/sys/block/<disk>/integrity/tag_size")
+ KAPI_PARAM_RANGE(0, 65535)
+ KAPI_UNITS("bytes")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_SYSFS_READONLY)
+ KAPI_PARAM_END
+ KAPI_SUBSYSTEM("block")
+ KAPI_EXAMPLES("cat /sys/block/sda/integrity/tag_size")
+ KAPI_NOTES("If the device does not support T10 Protection Information (even if the "
+ "device provides application integrity metadata space), this field is set to 0. "
+ "The owner of this tag space is the owner of the block device. The filesystem "
+ "can use this extra space to tag sectors as they see fit. Because the tag space "
+ "is limited, the block interface allows tagging bigger chunks by way of interleaving. "
+ "This way, 8*16 bits of information can be attached to a typical 4KB filesystem block.")
+KAPI_END_SPEC;
+
static ssize_t tag_size_show(struct device *dev, struct device_attribute *attr,
char *page)
{
@@ -252,6 +304,26 @@ static ssize_t tag_size_show(struct device *dev, struct device_attribute *attr,
return sysfs_emit(page, "%u\n", bi->tag_size);
}
+DEFINE_SYSFS_API_SPEC(protection_interval_bytes)
+ KAPI_DESCRIPTION("Protection interval size")
+ KAPI_LONG_DESC("Describes the number of data bytes which are protected by one "
+ "integrity tuple. Typically the device's logical block size. "
+ "For example, a 512-byte sector with 8-byte integrity metadata "
+ "would have a protection interval of 512 bytes.")
+ KAPI_PARAM_COUNT(1)
+ KAPI_PARAM(0, "protection_interval_bytes", "unsigned int", "Protection interval in bytes")
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PERMISSIONS(0444)
+ KAPI_PATH("/sys/block/<disk>/integrity/protection_interval_bytes")
+ KAPI_PARAM_RANGE(0, 65536)
+ KAPI_UNITS("bytes")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_SYSFS_READONLY)
+ KAPI_PARAM_END
+ KAPI_SUBSYSTEM("block")
+ KAPI_EXAMPLES("cat /sys/block/sda/integrity/protection_interval_bytes")
+ KAPI_NOTES("This is typically the same as the device's logical block size")
+KAPI_END_SPEC;
+
static ssize_t protection_interval_bytes_show(struct device *dev,
struct device_attribute *attr,
char *page)
@@ -275,6 +347,25 @@ static ssize_t read_verify_show(struct device *dev,
return flag_show(dev, page, BLK_INTEGRITY_NOVERIFY);
}
+DEFINE_SYSFS_API_SPEC(read_verify)
+ KAPI_DESCRIPTION("Read request integrity verification")
+ KAPI_LONG_DESC("Indicates whether the block layer should verify the integrity "
+ "of read requests serviced by devices that support sending "
+ "integrity metadata. A value of 1 enables verification, while "
+ "0 disables it. When enabled, the block layer will check "
+ "integrity metadata on read operations.")
+ KAPI_PARAM_COUNT(1)
+ KAPI_PARAM(0, "read_verify", "bool", "Enable read integrity verification")
+ KAPI_PARAM_TYPE(KAPI_TYPE_BOOL)
+ KAPI_PERMISSIONS(0644)
+ KAPI_PATH("/sys/block/<disk>/integrity/read_verify")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_SYSFS_RW)
+ KAPI_PARAM_END
+ KAPI_SUBSYSTEM("block")
+ KAPI_EXAMPLES("echo 1 > /sys/block/sda/integrity/read_verify")
+ KAPI_NOTES("This attribute only has effect if the device supports integrity metadata")
+KAPI_END_SPEC;
+
static ssize_t write_generate_store(struct device *dev,
struct device_attribute *attr,
const char *page, size_t count)
@@ -288,6 +379,46 @@ static ssize_t write_generate_show(struct device *dev,
return flag_show(dev, page, BLK_INTEGRITY_NOGENERATE);
}
+DEFINE_SYSFS_API_SPEC(write_generate)
+ KAPI_DESCRIPTION("Write request integrity generation")
+ KAPI_LONG_DESC("Indicates whether the block layer should automatically generate "
+ "checksums for write requests bound for devices that support "
+ "receiving integrity metadata. A value of 1 enables generation, "
+ "while 0 disables it. When enabled, the block layer will compute "
+ "and attach integrity metadata to write operations.")
+ KAPI_PARAM_COUNT(1)
+ KAPI_PARAM(0, "write_generate", "bool", "Enable write integrity generation")
+ KAPI_PARAM_TYPE(KAPI_TYPE_BOOL)
+ KAPI_PERMISSIONS(0644)
+ KAPI_PATH("/sys/block/<disk>/integrity/write_generate")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_SYSFS_RW)
+ KAPI_PARAM_END
+ KAPI_SUBSYSTEM("block")
+ KAPI_EXAMPLES("echo 1 > /sys/block/sda/integrity/write_generate")
+ KAPI_NOTES("This attribute only has effect if the device supports integrity metadata")
+KAPI_END_SPEC;
+
+DEFINE_SYSFS_API_SPEC(device_is_integrity_capable)
+ KAPI_DESCRIPTION("Device integrity capability")
+ KAPI_LONG_DESC("Indicates whether a storage device is capable of storing "
+ "integrity metadata. Set if the device is T10 PI-capable. "
+ "This flag is set to 1 if the storage media is formatted "
+ "with T10 Protection Information. If the storage media is "
+ "not formatted with T10 Protection Information, this flag "
+ "is set to 0. This is a key indicator for whether the device "
+ "supports end-to-end data protection using standards like "
+ "T10 DIF (Data Integrity Field) for SCSI devices.")
+ KAPI_PARAM_COUNT(1)
+ KAPI_PARAM(0, "device_is_integrity_capable", "bool", "Device integrity capability flag")
+ KAPI_PARAM_TYPE(KAPI_TYPE_BOOL)
+ KAPI_PERMISSIONS(0444)
+ KAPI_PATH("/sys/block/<disk>/integrity/device_is_integrity_capable")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_SYSFS_READONLY)
+ KAPI_PARAM_END
+ KAPI_SUBSYSTEM("block")
+ KAPI_EXAMPLES("cat /sys/block/sda/integrity/device_is_integrity_capable")
+KAPI_END_SPEC;
+
static ssize_t device_is_integrity_capable_show(struct device *dev,
struct device_attribute *attr,
char *page)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index b2b9b89d6967c..8446ed4fc63d8 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -10,6 +10,8 @@
#include <linux/backing-dev.h>
#include <linux/blktrace_api.h>
#include <linux/debugfs.h>
+#include <linux/kernel_api_spec.h>
+#include <linux/syscall_api_spec.h>
#include "blk.h"
#include "blk-mq.h"
@@ -51,6 +53,31 @@ queue_var_store(unsigned long *var, const char *page, size_t count)
return count;
}
+DEFINE_SYSFS_API_SPEC(nr_requests)
+ KAPI_DESCRIPTION("Number of allocatable requests")
+ KAPI_LONG_DESC("This controls how many requests may be allocated in the "
+ "block layer for read or write requests. Note that the total "
+ "allocated number may be twice this amount, since it applies only "
+ "to reads or writes (not the accumulated sum). "
+ "When CONFIG_BLK_CGROUP is enabled, each request queue may have "
+ "up to N request pools for N block cgroups.")
+ KAPI_PARAM_COUNT(1)
+ KAPI_PARAM(0, "nr_requests", "unsigned int", "Number of allocatable requests")
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PERMISSIONS(0644)
+ KAPI_PATH("/sys/block/<disk>/queue/nr_requests")
+ KAPI_PARAM_RANGE(BLKDEV_MIN_RQ, INT_MAX)
+ KAPI_PARAM_FLAGS(KAPI_PARAM_SYSFS_RW)
+ KAPI_PARAM_END
+ KAPI_SUBSYSTEM("block")
+ KAPI_EXAMPLES("echo 256 > /sys/block/sda/queue/nr_requests")
+ KAPI_NOTES("To avoid priority inversion through request starvation, a request queue "
+ "maintains a separate request pool for each cgroup when CONFIG_BLK_CGROUP "
+ "is enabled, and this parameter applies to each such per-block-cgroup "
+ "request pool. IOW, if there are N block cgroups, each request queue may "
+ "have up to N request pools, each independently regulated by nr_requests.")
+KAPI_END_SPEC;
+
static ssize_t queue_requests_show(struct gendisk *disk, char *page)
{
ssize_t ret;
@@ -89,6 +116,29 @@ queue_requests_store(struct gendisk *disk, const char *page, size_t count)
return ret;
}
+DEFINE_SYSFS_API_SPEC(read_ahead_kb)
+ KAPI_DESCRIPTION("Read-ahead size")
+ KAPI_LONG_DESC("Maximum number of kilobytes to read-ahead for filesystems "
+ "on this block device. For MADV_HUGEPAGE, the readahead size "
+ "may exceed this setting since its granularity is based on the "
+ "hugepage size.")
+ KAPI_PARAM_COUNT(1)
+ KAPI_PARAM(0, "read_ahead_kb", "unsigned int", "Read-ahead size in kilobytes")
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PERMISSIONS(0644)
+ KAPI_PATH("/sys/block/<disk>/queue/read_ahead_kb")
+ KAPI_PARAM_RANGE(0, ULONG_MAX >> (PAGE_SHIFT - 10))
+ KAPI_UNITS("kilobytes")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_SYSFS_RW)
+ KAPI_PARAM_END
+ KAPI_SUBSYSTEM("block")
+ KAPI_EXAMPLES("echo 128 > /sys/block/sda/queue/read_ahead_kb")
+ KAPI_NOTES("128 KB for each device is a good starting point, but increasing to "
+ "4-8 MB might improve performance in environments where sequential "
+ "reading of large files takes place. Changes are not persistent "
+ "across reboots unless saved in startup scripts.")
+KAPI_END_SPEC;
+
static ssize_t queue_ra_show(struct gendisk *disk, char *page)
{
ssize_t ret;
@@ -124,6 +174,62 @@ queue_ra_store(struct gendisk *disk, const char *page, size_t count)
return ret;
}
+/*
+ * Sysfs API specifications for queue attributes
+ */
+DEFINE_SYSFS_API_SPEC(logical_block_size)
+ KAPI_DESCRIPTION("Logical block size")
+ KAPI_LONG_DESC("This is the smallest unit the storage device can address. "
+ "It is typically 512 bytes.")
+ KAPI_PARAM_COUNT(1)
+ KAPI_PARAM(0, "logical_block_size", "unsigned int", "Logical block size in bytes")
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PATH("/sys/block/<disk>/queue/logical_block_size")
+ KAPI_PERMISSIONS(0444)
+ KAPI_PARAM_RANGE(512, 4096)
+ KAPI_UNITS("bytes")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_SYSFS_READONLY)
+ KAPI_PARAM_END
+ KAPI_SUBSYSTEM("block")
+ KAPI_EXAMPLES("cat /sys/block/sda/queue/logical_block_size")
+KAPI_END_SPEC;
+
+DEFINE_SYSFS_API_SPEC(physical_block_size)
+ KAPI_DESCRIPTION("Physical block size")
+ KAPI_LONG_DESC("This is the smallest unit a physical storage device can "
+ "write atomically. It is usually the same as the logical block "
+ "size but may be bigger. One example is SATA drives with 4KB "
+ "sectors that expose a 512-byte logical block size to the "
+ "operating system.")
+ KAPI_PARAM_COUNT(1)
+ KAPI_PARAM(0, "physical_block_size", "unsigned int", "Physical block size in bytes")
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PERMISSIONS(0444)
+ KAPI_PATH("/sys/block/<disk>/queue/physical_block_size")
+ KAPI_PARAM_RANGE(512, 4194304)
+ KAPI_UNITS("bytes")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_SYSFS_READONLY)
+ KAPI_PARAM_END
+ KAPI_SUBSYSTEM("block")
+ KAPI_EXAMPLES("cat /sys/block/sda/queue/physical_block_size")
+KAPI_END_SPEC;
+
+DEFINE_SYSFS_API_SPEC(hw_sector_size)
+ KAPI_DESCRIPTION("Hardware sector size")
+ KAPI_LONG_DESC("This is the hardware sector size of the device, in bytes.")
+ KAPI_PARAM_COUNT(1)
+ KAPI_PARAM(0, "hw_sector_size", "unsigned int", "Hardware sector size in bytes")
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PERMISSIONS(0444)
+ KAPI_PATH("/sys/block/<disk>/queue/hw_sector_size")
+ KAPI_PARAM_RANGE(512, 4194304)
+ KAPI_UNITS("bytes")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_SYSFS_READONLY)
+ KAPI_PARAM_END
+ KAPI_SUBSYSTEM("block")
+ KAPI_EXAMPLES("cat /sys/block/sda/queue/hw_sector_size")
+KAPI_END_SPEC;
+
#define QUEUE_SYSFS_LIMIT_SHOW(_field) \
static ssize_t queue_##_field##_show(struct gendisk *disk, char *page) \
{ \
@@ -147,7 +253,53 @@ QUEUE_SYSFS_LIMIT_SHOW(virt_boundary_mask)
QUEUE_SYSFS_LIMIT_SHOW(dma_alignment)
QUEUE_SYSFS_LIMIT_SHOW(max_open_zones)
QUEUE_SYSFS_LIMIT_SHOW(max_active_zones)
+DEFINE_SYSFS_API_SPEC(atomic_write_unit_min_bytes)
+ KAPI_DESCRIPTION("Minimum atomic write unit size")
+ KAPI_LONG_DESC("This parameter specifies the smallest block which can "
+ "be written atomically with an atomic write operation. All "
+ "atomic write operations must begin at an "
+ "atomic_write_unit_min boundary and must be multiples of "
+ "atomic_write_unit_min. This value must be a power-of-two.")
+ KAPI_PARAM_COUNT(1)
+ KAPI_PARAM(0, "atomic_write_unit_min_bytes", "unsigned int", "Minimum atomic write unit size in bytes")
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PERMISSIONS(0444)
+ KAPI_PATH("/sys/block/<disk>/queue/atomic_write_unit_min_bytes")
+ KAPI_PARAM_RANGE(0, ULLONG_MAX)
+ KAPI_UNITS("bytes")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_SYSFS_READONLY)
+ KAPI_PARAM_END
+ KAPI_SUBSYSTEM("block")
+ KAPI_EXAMPLES("cat /sys/block/nvme0n1/queue/atomic_write_unit_min_bytes")
+ KAPI_NOTES("This value must be a power-of-two. All atomic write operations must "
+ "begin at an atomic_write_unit_min boundary and must be multiples of "
+ "atomic_write_unit_min.")
+KAPI_END_SPEC;
+
QUEUE_SYSFS_LIMIT_SHOW(atomic_write_unit_min)
+
+DEFINE_SYSFS_API_SPEC(atomic_write_unit_max_bytes)
+ KAPI_DESCRIPTION("Maximum atomic write unit size")
+ KAPI_LONG_DESC("This parameter defines the largest block which can be "
+ "written atomically with an atomic write operation. This "
+ "value must be a multiple of atomic_write_unit_min and must "
+ "be a power-of-two. This value will not be larger than "
+ "atomic_write_max_bytes.")
+ KAPI_PARAM_COUNT(1)
+ KAPI_PARAM(0, "atomic_write_unit_max_bytes", "unsigned int", "Maximum atomic write unit size in bytes")
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PERMISSIONS(0444)
+ KAPI_PATH("/sys/block/<disk>/queue/atomic_write_unit_max_bytes")
+ KAPI_PARAM_RANGE(0, ULLONG_MAX)
+ KAPI_UNITS("bytes")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_SYSFS_READONLY)
+ KAPI_PARAM_END
+ KAPI_SUBSYSTEM("block")
+ KAPI_EXAMPLES("cat /sys/block/nvme0n1/queue/atomic_write_unit_max_bytes")
+ KAPI_NOTES("This value must be a multiple of atomic_write_unit_min and must be a "
+ "power-of-two. This value will not be larger than atomic_write_max_bytes.")
+KAPI_END_SPEC;
+
QUEUE_SYSFS_LIMIT_SHOW(atomic_write_unit_max)
#define QUEUE_SYSFS_LIMIT_SHOW_SECTORS_TO_BYTES(_field) \
@@ -161,7 +313,60 @@ static ssize_t queue_##_field##_show(struct gendisk *disk, char *page) \
QUEUE_SYSFS_LIMIT_SHOW_SECTORS_TO_BYTES(max_discard_sectors)
QUEUE_SYSFS_LIMIT_SHOW_SECTORS_TO_BYTES(max_hw_discard_sectors)
QUEUE_SYSFS_LIMIT_SHOW_SECTORS_TO_BYTES(max_write_zeroes_sectors)
+
+DEFINE_SYSFS_API_SPEC(atomic_write_max_bytes)
+ KAPI_DESCRIPTION("Maximum atomic write size")
+ KAPI_LONG_DESC("This parameter specifies the maximum atomic write "
+ "size reported by the device. This parameter is relevant "
+ "for merging of writes, where a merged atomic write "
+ "operation must not exceed this number of bytes. "
+ "This parameter may be greater than atomic_write_unit_max_bytes.")
+ KAPI_PARAM_COUNT(1)
+ KAPI_PARAM(0, "atomic_write_max_bytes", "unsigned int", "Maximum atomic write size in bytes")
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PERMISSIONS(0444)
+ KAPI_PATH("/sys/block/<disk>/queue/atomic_write_max_bytes")
+ KAPI_PARAM_RANGE(0, ULLONG_MAX)
+ KAPI_UNITS("bytes")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_SYSFS_READONLY)
+ KAPI_PARAM_END
+ KAPI_SUBSYSTEM("block")
+ KAPI_EXAMPLES("cat /sys/block/nvme0n1/queue/atomic_write_max_bytes")
+ KAPI_NOTES("This parameter is relevant for merging of writes, where a merged atomic "
+ "write operation must not exceed this number of bytes. May be greater than "
+ "atomic_write_unit_max_bytes as atomic_write_unit_max_bytes will be rounded "
+ "down to a power-of-two and may also be limited by other queue limits such "
+ "as max_segments. Will not be larger than max_hw_sectors_kb.")
+KAPI_END_SPEC;
+
QUEUE_SYSFS_LIMIT_SHOW_SECTORS_TO_BYTES(atomic_write_max_sectors)
+
+DEFINE_SYSFS_API_SPEC(atomic_write_boundary_bytes)
+ KAPI_DESCRIPTION("Atomic write boundary size")
+ KAPI_LONG_DESC("A device may need to internally split an atomic write I/O "
+ "which straddles a given logical block address boundary. This "
+ "parameter specifies the size in bytes of the atomic boundary if "
+ "one is reported by the device. This value must be a "
+ "power-of-two and at least as large as "
+ "atomic_write_unit_max_bytes.")
+ KAPI_PARAM_COUNT(1)
+ KAPI_PARAM(0, "atomic_write_boundary_bytes", "unsigned int", "Atomic write boundary size in bytes")
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PERMISSIONS(0444)
+ KAPI_PATH("/sys/block/<disk>/queue/atomic_write_boundary_bytes")
+ KAPI_PARAM_RANGE(0, ULLONG_MAX)
+ KAPI_UNITS("bytes")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_SYSFS_READONLY)
+ KAPI_PARAM_END
+ KAPI_SUBSYSTEM("block")
+ KAPI_EXAMPLES("cat /sys/block/nvme0n1/queue/atomic_write_boundary_bytes")
+ KAPI_NOTES("A device may need to internally split an atomic write I/O which straddles "
+ "a given logical block address boundary. This specifies the size in bytes of "
+ "the atomic boundary if one is reported by the device. Must be a power-of-two "
+ "and at least as large as atomic_write_unit_max_bytes. Any attempt to merge "
+ "atomic write I/Os must not result in a merged I/O which crosses this boundary.")
+KAPI_END_SPEC;
+
QUEUE_SYSFS_LIMIT_SHOW_SECTORS_TO_BYTES(atomic_write_boundary_sectors)
QUEUE_SYSFS_LIMIT_SHOW_SECTORS_TO_BYTES(max_zone_append_sectors)
@@ -171,7 +376,45 @@ static ssize_t queue_##_field##_show(struct gendisk *disk, char *page) \
return queue_var_show(disk->queue->limits._field >> 1, page); \
}
+DEFINE_SYSFS_API_SPEC(max_sectors_kb)
+ KAPI_DESCRIPTION("Maximum request size (software limit)")
+ KAPI_LONG_DESC("This is the maximum number of kilobytes that the block "
+ "layer will allow for a filesystem request. Must be smaller than "
+ "or equal to the maximum size allowed by the hardware. Write 0 "
+ "to use default kernel settings.")
+ KAPI_PARAM_COUNT(1)
+ KAPI_PARAM(0, "max_sectors_kb", "unsigned int", "Maximum request size in kilobytes")
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PERMISSIONS(0644)
+ KAPI_PATH("/sys/block/<disk>/queue/max_sectors_kb")
+ KAPI_PARAM_RANGE(0, UINT_MAX)
+ KAPI_UNITS("kilobytes")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_SYSFS_RW)
+ KAPI_PARAM_END
+ KAPI_SUBSYSTEM("block")
+ KAPI_EXAMPLES("echo 512 > /sys/block/sda/queue/max_sectors_kb")
+ KAPI_NOTES("Must be <= max_hw_sectors_kb")
+KAPI_END_SPEC;
+
QUEUE_SYSFS_LIMIT_SHOW_SECTORS_TO_KB(max_sectors)
+
+DEFINE_SYSFS_API_SPEC(max_hw_sectors_kb)
+ KAPI_DESCRIPTION("Maximum request size (hardware limit)")
+ KAPI_LONG_DESC("This is the maximum number of kilobytes supported in a "
+ "single data transfer.")
+ KAPI_PARAM_COUNT(1)
+ KAPI_PARAM(0, "max_hw_sectors_kb", "unsigned int", "Maximum hardware request size in kilobytes")
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PERMISSIONS(0444)
+ KAPI_PATH("/sys/block/<disk>/queue/max_hw_sectors_kb")
+ KAPI_PARAM_RANGE(0, UINT_MAX)
+ KAPI_UNITS("kilobytes")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_SYSFS_READONLY)
+ KAPI_PARAM_END
+ KAPI_SUBSYSTEM("block")
+ KAPI_EXAMPLES("cat /sys/block/sda/queue/max_hw_sectors_kb")
+KAPI_END_SPEC;
+
QUEUE_SYSFS_LIMIT_SHOW_SECTORS_TO_KB(max_hw_sectors)
#define QUEUE_SYSFS_SHOW_CONST(_name, _val) \
diff --git a/block/genhd.c b/block/genhd.c
index 8171a6bc3210f..3cbc5418825f0 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -26,6 +26,8 @@
#include <linux/badblocks.h>
#include <linux/part_stat.h>
#include <linux/blktrace_api.h>
+#include <linux/kernel_api_spec.h>
+#include <linux/syscall_api_spec.h>
#include "blk-throttle.h"
#include "blk.h"
@@ -1104,6 +1106,25 @@ ssize_t part_stat_show(struct device *dev,
* For bio-based device, started from bdev_start_io_acct();
* For rq-based device, started from blk_mq_start_request();
*/
+DEFINE_SYSFS_API_SPEC(inflight)
+ KAPI_DESCRIPTION("I/O requests in progress")
+ KAPI_LONG_DESC("Reports the number of I/O requests currently in progress "
+ "(pending / in flight) in a device driver. This can be less "
+ "than the number of requests queued in the block device queue. "
+ "The report contains 2 fields: one for read requests "
+ "and one for write requests.")
+ KAPI_PARAM_COUNT(1)
+ KAPI_PARAM(0, "inflight", "string", "Two unsigned integers: read and write requests in flight")
+ KAPI_PARAM_TYPE(KAPI_TYPE_STRING)
+ KAPI_PATH("/sys/block/<disk>/inflight")
+ KAPI_PERMISSIONS(0444)
+ KAPI_PARAM_FLAGS(KAPI_PARAM_SYSFS_READONLY)
+ KAPI_PARAM_END
+ KAPI_SUBSYSTEM("block")
+ KAPI_EXAMPLES("cat /sys/block/sda/inflight")
+ KAPI_NOTES("The value type is unsigned int. Related to /sys/block/<disk>/queue/nr_requests")
+KAPI_END_SPEC;
+
ssize_t part_inflight_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
@@ -1123,6 +1144,28 @@ static ssize_t disk_capability_show(struct device *dev,
return sysfs_emit(buf, "0\n");
}
+/*
+ * Sysfs API specifications for disk attributes
+ */
+DEFINE_SYSFS_API_SPEC(alignment_offset)
+ KAPI_DESCRIPTION("Physical block alignment offset")
+ KAPI_LONG_DESC("Storage devices may report a physical block size that is "
+ "bigger than the logical block size. This parameter "
+ "indicates how many bytes the beginning of the device is "
+ "offset from the disk's natural alignment.")
+ KAPI_PARAM_COUNT(1)
+ KAPI_PARAM(0, "alignment_offset", "int", "Alignment offset in bytes")
+ KAPI_PARAM_TYPE(KAPI_TYPE_INT)
+ KAPI_PATH("/sys/block/<disk>/alignment_offset")
+ KAPI_PERMISSIONS(0444)
+ KAPI_PARAM_RANGE(0, INT_MAX)
+ KAPI_UNITS("bytes")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_SYSFS_READONLY)
+ KAPI_PARAM_END
+ KAPI_SUBSYSTEM("block")
+ KAPI_EXAMPLES("cat /sys/block/sda/alignment_offset")
+KAPI_END_SPEC;
+
static ssize_t disk_alignment_offset_show(struct device *dev,
struct device_attribute *attr,
char *buf)
@@ -1132,6 +1175,27 @@ static ssize_t disk_alignment_offset_show(struct device *dev,
return sysfs_emit(buf, "%d\n", bdev_alignment_offset(disk->part0));
}
+DEFINE_SYSFS_API_SPEC(discard_alignment)
+ KAPI_DESCRIPTION("Discard alignment offset")
+ KAPI_LONG_DESC("Devices that support discard functionality may "
+ "internally allocate space in units that are bigger than "
+ "the exported logical block size. The discard_alignment "
+ "parameter indicates how many bytes the beginning of the "
+ "device is offset from the internal allocation unit's "
+ "natural alignment.")
+ KAPI_PARAM_COUNT(1)
+ KAPI_PARAM(0, "discard_alignment", "int", "Discard alignment offset in bytes")
+ KAPI_PARAM_TYPE(KAPI_TYPE_INT)
+ KAPI_PATH("/sys/block/<disk>/discard_alignment")
+ KAPI_PERMISSIONS(0444)
+ KAPI_PARAM_RANGE(0, INT_MAX)
+ KAPI_UNITS("bytes")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_SYSFS_READONLY)
+ KAPI_PARAM_END
+ KAPI_SUBSYSTEM("block")
+ KAPI_EXAMPLES("cat /sys/block/sda/discard_alignment")
+KAPI_END_SPEC;
+
static ssize_t disk_discard_alignment_show(struct device *dev,
struct device_attribute *attr,
char *buf)
@@ -1141,6 +1205,25 @@ static ssize_t disk_discard_alignment_show(struct device *dev,
return sysfs_emit(buf, "%d\n", bdev_alignment_offset(disk->part0));
}
+DEFINE_SYSFS_API_SPEC(diskseq)
+ KAPI_DESCRIPTION("Disk sequence number")
+ KAPI_LONG_DESC("The diskseq attribute reports the disk sequence number, "
+ "which is a monotonically increasing number assigned to "
+ "every drive. Some devices, like the loop device, refresh "
+ "this number every time the backing file is changed.")
+ KAPI_PARAM_COUNT(1)
+ KAPI_PARAM(0, "diskseq", "uint64_t", "64-bit disk sequence number")
+ KAPI_PARAM_TYPE(KAPI_TYPE_UINT)
+ KAPI_PERMISSIONS(0444)
+ KAPI_PATH("/sys/block/<disk>/diskseq")
+ KAPI_PARAM_RANGE(0, ULLONG_MAX)
+ KAPI_PARAM_FLAGS(KAPI_PARAM_SYSFS_READONLY)
+ KAPI_PARAM_END
+ KAPI_SUBSYSTEM("block")
+ KAPI_EXAMPLES("cat /sys/block/sda/diskseq")
+ KAPI_NOTES("Value type is 64 bit unsigned")
+KAPI_END_SPEC;
+
static ssize_t diskseq_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -1149,6 +1232,22 @@ static ssize_t diskseq_show(struct device *dev,
return sysfs_emit(buf, "%llu\n", disk->diskseq);
}
+DEFINE_SYSFS_API_SPEC(partscan)
+ KAPI_DESCRIPTION("Partition scanning status")
+ KAPI_LONG_DESC("Reports if partition scanning is enabled for the disk. "
+ "Returns '1' if partition scanning is enabled, or '0' if not.")
+ KAPI_PARAM_COUNT(1)
+ KAPI_PARAM(0, "partscan", "bool", "Partition scanning enabled flag")
+ KAPI_PARAM_TYPE(KAPI_TYPE_BOOL)
+ KAPI_PERMISSIONS(0444)
+ KAPI_PATH("/sys/block/<disk>/partscan")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_SYSFS_READONLY)
+ KAPI_PARAM_END
+ KAPI_SUBSYSTEM("block")
+ KAPI_EXAMPLES("cat /sys/block/sda/partscan")
+ KAPI_NOTES("The value type is a 32-bit unsigned integer, but only '0' and '1' are valid values")
+KAPI_END_SPEC;
+
static ssize_t partscan_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
--
2.39.5
* [RFC v2 21/22] net/socket: add API specification for socket()
From: Sasha Levin @ 2025-06-24 18:07 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
Add a kernel API specification for the socket() system call. The
specification documents its parameters, return value, error codes, locking,
signal handling, side effects, and state transitions.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/socket.c | 489 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 489 insertions(+)
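
A few of the behaviours this specification describes can be observed from
userspace. The sketch below is illustrative only and is not part of the
patch: check_socket_flags(), socket_errno() and self_check() are
hypothetical helper names, and the expected errno values assume a Linux
system where the chosen family number (1000) is outside the supported range.
It checks that SOCK_CLOEXEC and SOCK_NONBLOCK end up as FD_CLOEXEC and
O_NONBLOCK, that an out-of-range family fails with EAFNOSUPPORT, and that an
unknown flag bit in "type" fails with EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an AF_UNIX datagram socket with both
 * creation-time flags and report which of them took effect.
 * Returns a bitmask (bit 0: FD_CLOEXEC set, bit 1: O_NONBLOCK set),
 * or -1 if socket() or fcntl() failed.
 */
static int check_socket_flags(void)
{
	int fd, fdflags, flflags, result = 0;

	fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);
	if (fd < 0)
		return -1;

	fdflags = fcntl(fd, F_GETFD);	/* descriptor flags (FD_CLOEXEC) */
	flflags = fcntl(fd, F_GETFL);	/* file status flags (O_NONBLOCK) */
	close(fd);

	if (fdflags < 0 || flflags < 0)
		return -1;
	if (fdflags & FD_CLOEXEC)
		result |= 1;
	if (flflags & O_NONBLOCK)
		result |= 2;
	return result;
}

/*
 * Hypothetical helper: return 0 if socket() succeeds (the descriptor is
 * closed again), otherwise the errno value it failed with.
 */
static int socket_errno(int family, int type, int protocol)
{
	int fd = socket(family, type, protocol);

	if (fd >= 0) {
		close(fd);
		return 0;
	}
	return errno;
}

/* Exercise the documented cases; aborts on the first mismatch. */
static void self_check(void)
{
	/* Both creation-time flags should be reflected on the new fd. */
	assert(check_socket_flags() == 3);
	/* Family far beyond the supported range (assumed unsupported). */
	assert(socket_errno(1000, SOCK_STREAM, 0) == EAFNOSUPPORT);
	/* 0x10 lies outside SOCK_TYPE_MASK and is not a valid flag bit. */
	assert(socket_errno(AF_UNIX, SOCK_STREAM | 0x10, 0) == EINVAL);
}
```

Calling self_check() from a small test program's main() is enough to run all
three checks. AF_UNIX is used for the flag check so that the sketch does not
depend on any network configuration.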
diff --git a/net/socket.c b/net/socket.c
index 9a0e720f08598..fa42497d72af2 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -78,6 +78,7 @@
#include <linux/pseudo_fs.h>
#include <linux/security.h>
#include <linux/syscalls.h>
+#include <linux/syscall_api_spec.h>
#include <linux/compat.h>
#include <linux/kmod.h>
#include <linux/audit.h>
@@ -89,6 +90,7 @@
#include <linux/nospec.h>
#include <linux/indirect_call_wrapper.h>
#include <linux/io_uring/net.h>
+#include <linux/un.h>
#include <linux/uaccess.h>
#include <asm/unistd.h>
@@ -1692,6 +1694,493 @@ int __sys_socket(int family, int type, int protocol)
return sock_map_fd(sock, flags & (O_CLOEXEC | O_NONBLOCK));
}
+DEFINE_KERNEL_API_SPEC(sys_socket)
+ KAPI_DESCRIPTION("Create an endpoint for communication")
+ KAPI_LONG_DESC("Creates an endpoint for communication and returns a file descriptor "
+ "that refers to that endpoint. The file descriptor returned by a successful "
+ "call will be the lowest-numbered file descriptor not currently open for "
+ "the process. The socket has the indicated type, which specifies the "
+ "communication semantics. The socket() system call is the foundation of "
+ "all network programming in Linux, providing access to various network "
+ "protocols and communication mechanisms.")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "family", "int", "Protocol/address family (domain)")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_INT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+ KAPI_PARAM_RANGE(0, 45) /* AF_UNSPEC to AF_MCTP */
+ KAPI_PARAM_CONSTRAINT("Common families: AF_UNIX (1), AF_INET (2), AF_INET6 (10), "
+ "AF_NETLINK (16), AF_PACKET (17). Others: AF_BLUETOOTH (31), AF_CAN (29), "
+ "AF_TIPC (30), AF_VSOCK (40), AF_XDP (44). Range: 0-45 (AF_MCTP). "
+ "PF_* are aliases. Negative or >= 46 returns EAFNOSUPPORT.")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "type", "int", "Socket type with optional flags")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_INT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_MASK)
+ KAPI_PARAM_VALID_MASK(SOCK_TYPE_MASK | SOCK_CLOEXEC | SOCK_NONBLOCK)
+ KAPI_PARAM_CONSTRAINT("Types: SOCK_STREAM (1), SOCK_DGRAM (2), SOCK_RAW (3), "
+ "SOCK_RDM (4), SOCK_SEQPACKET (5), SOCK_DCCP (6), SOCK_PACKET (10-obsolete). "
+ "Flags (since 2.6.27): SOCK_NONBLOCK, SOCK_CLOEXEC. Range: 0-10.")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(2, "protocol", "int", "Protocol within the family")
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+ KAPI_PARAM_TYPE(KAPI_TYPE_INT)
+ KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+ KAPI_PARAM_CONSTRAINT("Usually 0 to select the default protocol for the given family and type. "
+ "For AF_INET/AF_INET6: IPPROTO_TCP (6), IPPROTO_UDP (17), IPPROTO_ICMP (1), "
+ "IPPROTO_RAW (255), etc. Must be >= 0 and < IPPROTO_MAX. "
+ "For AF_UNIX: only 0 or PF_UNIX (1) accepted. "
+ "For AF_PACKET: network byte order Ethernet protocol (e.g., ETH_P_IP). "
+ "For AF_NETLINK: NETLINK_ROUTE, NETLINK_AUDIT, etc. (0-31). "
+ "Protocol value passed through update_socket_protocol() BPF hook which may modify it.")
+ KAPI_PARAM_END
+
+ KAPI_RETURN("long", "File descriptor on success; negative error code on failure. "
+ "On success, returns the lowest available file descriptor. "
+ "The descriptor is automatically placed in the process's file descriptor table. "
+ "If SOCK_CLOEXEC is set, FD_CLOEXEC is set on the descriptor. "
+ "If SOCK_NONBLOCK is set, O_NONBLOCK is set on the file.")
+ KAPI_RETURN_TYPE(KAPI_TYPE_FD)
+ KAPI_RETURN_CHECK_TYPE(KAPI_RETURN_ERROR_CHECK)
+ KAPI_RETURN_SUCCESS(0)
+ KAPI_RETURN_END
+
+ /* Core error codes from __sock_create() and __sys_socket() */
+ KAPI_ERROR(0, -EAFNOSUPPORT, "EAFNOSUPPORT", "Address family not supported",
+ "The implementation does not support the specified address family. "
+ "Returned when: family < 0 || family >= NPROTO (46); "
+ "protocol family not registered in net_families[]; "
+ "protocol family module cannot be loaded; "
+ "try_module_get() fails on protocol family owner.")
+ KAPI_ERROR(1, -EINVAL, "EINVAL", "Invalid argument",
+ "Invalid argument specified. Returned when: "
+ "type < 0 || type >= SOCK_MAX (11); "
+ "invalid flags in type ((type & ~SOCK_TYPE_MASK) & ~(SOCK_CLOEXEC | SOCK_NONBLOCK)); "
+ "other protocol-specific validation failures.")
+ KAPI_ERROR(2, -ENFILE, "ENFILE", "File table overflow",
+ "The system-wide limit on the total number of open files has been reached. "
+ "Returned when sock_alloc() fails due to new_inode_pseudo() failure.")
+ KAPI_ERROR(3, -EMFILE, "EMFILE", "Too many open files",
+ "The per-process limit on the number of open file descriptors has been reached. "
+ "Returned when sock_map_fd() cannot allocate a new file descriptor.")
+ KAPI_ERROR(4, -ENOMEM, "ENOMEM", "Out of memory",
+ "Insufficient kernel memory available. Can occur in: "
+ "sk_alloc() when allocating sock structure; "
+ "protocol-specific init functions; "
+ "security_sk_alloc() in LSM hooks; "
+ "various kmalloc()/kmem_cache_alloc() calls.")
+ KAPI_ERROR(5, -ENOBUFS, "ENOBUFS", "No buffer space available",
+ "Insufficient resources to create socket. Similar to ENOMEM but used by "
+ "some protocol families (e.g., AF_PACKET) to indicate resource exhaustion.")
+ KAPI_ERROR(6, -EPROTONOSUPPORT, "EPROTONOSUPPORT", "Protocol not supported",
+ "The protocol is not supported within this domain. Returned when: "
+ "AF_UNIX: protocol != 0 && protocol != PF_UNIX; "
+ "AF_INET/AF_INET6: protocol not found in inetsw[] array; "
+ "AF_NETLINK: protocol < 0 || protocol >= MAX_LINKS (32).")
+ KAPI_ERROR(7, -ESOCKTNOSUPPORT, "ESOCKTNOSUPPORT", "Socket type not supported",
+ "The socket type is not supported within this domain. Returned when: "
+ "AF_UNIX: type not in {STREAM, DGRAM, SEQPACKET, RAW}; "
+ "AF_INET/AF_INET6: no matching (type, protocol) in inetsw[]; "
+ "AF_PACKET: type not in {DGRAM, RAW, PACKET}; "
+ "AF_NETLINK: type not in {RAW, DGRAM}.")
+ KAPI_ERROR(8, -EPERM, "EPERM", "Operation not permitted",
+ "Permission denied due to insufficient privileges. Returned when: "
+ "AF_INET/AF_INET6 with SOCK_RAW: missing CAP_NET_RAW; "
+ "AF_PACKET: missing CAP_NET_RAW; "
+ "Some protocol families may have additional restrictions.")
+ KAPI_ERROR(9, -EACCES, "EACCES", "Permission denied",
+ "Permission denied by Linux Security Module (SELinux, AppArmor, etc.). "
+ "Returned by security_socket_create() or security_socket_post_create() hooks.")
+ KAPI_ERROR(10, -EAGAIN, "EAGAIN", "Resource temporarily unavailable",
+ "Transient resource shortage. Can be returned by some protocol families "
+ "during initialization when resources are temporarily exhausted.")
+ KAPI_ERROR(11, -EINTR, "EINTR", "Interrupted system call",
+ "Operation interrupted by signal. Rare for socket() but possible if "
+ "module loading is interrupted or during memory allocation with GFP_KERNEL.")
+ KAPI_ERROR(12, -EFAULT, "EFAULT", "Bad address",
+ "Not directly returned by socket() since all parameters are values, not pointers. "
+ "Listed for completeness as it appears in documentation.")
+ KAPI_ERROR(13, -ENOSYS, "ENOSYS", "Function not implemented",
+ "Can occur in containers using alt-syscall where socket() is not whitelisted, "
+ "or on architectures where socket() is not implemented.")
+
+ KAPI_ERROR_COUNT(14)
+ KAPI_PARAM_COUNT(3)
+ KAPI_SINCE_VERSION("4.2BSD")
+
+ KAPI_EXAMPLES("/* Create a TCP socket */\n"
+ "int tcp_sock = socket(AF_INET, SOCK_STREAM, 0);\n"
+ "if (tcp_sock < 0) {\n"
+ " perror(\"socket\");\n"
+ " exit(EXIT_FAILURE);\n"
+ "}\n\n"
+ "/* Create a non-blocking UDP socket with close-on-exec */\n"
+ "int udp_sock = socket(AF_INET6, SOCK_DGRAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);\n\n"
+ "/* Create a raw ICMP socket (requires CAP_NET_RAW) */\n"
+ "int raw_sock = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);\n\n"
+ "/* Create a Unix domain datagram socket */\n"
+ "int unix_sock = socket(AF_UNIX, SOCK_DGRAM, 0);\n\n"
+ "/* Create a netlink socket for routing information */\n"
+ "int nl_sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);\n\n"
+ "/* Create a packet socket for raw Ethernet frames (requires CAP_NET_RAW) */\n"
+ "int packet_sock = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));\n\n"
+ "/* Create a Bluetooth L2CAP socket */\n"
+ "int bt_sock = socket(AF_BLUETOOTH, SOCK_SEQPACKET, BTPROTO_L2CAP);")
+
+ KAPI_NOTES("Implementation details:\n"
+ "- Uses RCU to safely access net_families[] array\n"
+ "- May trigger automatic module loading via request_module(\"net-pf-%d\", family)\n"
+ "- Allocates inode from sock_inode_cache via new_inode_pseudo()\n"
+ "- Each protocol family registers via sock_register() with unique family number\n"
+ "- Socket creation involves: sock_alloc() -> pf->create() -> sock_map_fd()\n"
+ "- The update_socket_protocol() BPF hook can modify the protocol parameter\n"
+ "- LSM hooks called: security_socket_create() and security_socket_post_create()\n"
+ "- Creates struct socket (VFS layer) and struct sock (network layer)\n"
+ "- Socket state initialized to SS_UNCONNECTED\n"
+ "- File operations set to socket_file_ops\n"
+ "- The (PF_INET, SOCK_PACKET) combination is deprecated since Linux 2.0\n"
+ "- Build-time checks ensure SOCK_CLOEXEC == O_CLOEXEC and flag consistency")
+
+ /* Lock specifications */
+ KAPI_LOCK(0, "rcu_read_lock", KAPI_LOCK_RCU)
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Protects net_families[] array access during protocol family lookup. "
+ "Acquired before rcu_dereference(net_families[family]), "
+ "released after pf->create() call or on error path.")
+ KAPI_LOCK_END
+
+ KAPI_LOCK(1, "pf->owner module refcount", KAPI_LOCK_CUSTOM)
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_RELEASED
+ KAPI_LOCK_DESC("Prevents protocol family module unload during socket creation. "
+ "try_module_get(pf->owner) before pf->create(), "
+ "module_put(pf->owner) after completion.")
+ KAPI_LOCK_END
+
+ KAPI_LOCK(2, "sock->ops->owner module refcount", KAPI_LOCK_CUSTOM)
+ KAPI_LOCK_ACQUIRED
+ KAPI_LOCK_DESC("Prevents socket operations module unload during socket lifetime. "
+ "try_module_get(sock->ops->owner) after successful creation, "
+ "released only on sock_release() when socket is closed.")
+ KAPI_LOCK_END
+
+ KAPI_LOCK_COUNT(3)
+
+ /* Signal handling */
+ KAPI_SIGNAL(0, 0, "Module loading", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RESTART)
+ KAPI_SIGNAL_CONDITION("CONFIG_MODULES && request_module() called")
+ KAPI_SIGNAL_DESC("Module loading via request_module() is interruptible. "
+ "Signal delivery causes -EINTR from modprobe execution.")
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+ KAPI_SIGNAL_INTERRUPTIBLE
+ KAPI_SIGNAL_END
+
+ KAPI_SIGNAL_COUNT(1)
+
+ /* Side effects */
+ KAPI_SIDE_EFFECT(0, KAPI_EFFECT_ALLOC_MEMORY | KAPI_EFFECT_RESOURCE_CREATE,
+ "socket structures",
+ "Allocates struct socket (VFS), struct sock (network), and protocol-specific data. "
+ "Memory from: sock_inode_cache, protocol's slab cache, and general kmalloc.")
+ KAPI_EFFECT_CONDITION("Always occurs on successful socket creation")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(1, KAPI_EFFECT_RESOURCE_CREATE,
+ "file descriptor",
+ "Allocates new file descriptor at lowest available index. "
+ "Creates struct file with socket_file_ops. Sets up file->private_data = socket.")
+ KAPI_EFFECT_CONDITION("Always occurs on successful socket creation")
+ KAPI_EFFECT_REVERSIBLE
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(2, KAPI_EFFECT_FILESYSTEM,
+ "protocol module",
+ "May trigger request_module(\"net-pf-%d\", family) to load protocol module. "
+ "Executes /sbin/modprobe in userspace context.")
+ KAPI_EFFECT_CONDITION("CONFIG_MODULES=y && !net_families[family] && first attempt")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(3, KAPI_EFFECT_MODIFY_STATE,
+ "LSM and audit",
+ "Calls security_socket_create() pre-creation and security_socket_post_create() "
+ "post-creation. May generate audit events. SELinux/AppArmor may deny.")
+ KAPI_EFFECT_CONDITION("CONFIG_SECURITY=y or CONFIG_AUDIT=y")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(4, KAPI_EFFECT_MODIFY_STATE,
+ "BPF programs",
+ "update_socket_protocol() hook can modify protocol parameter. "
+ "BPF_CGROUP_RUN_PROG_INET_SOCK() may run for AF_INET/AF_INET6.")
+ KAPI_EFFECT_CONDITION("BPF programs attached to cgroup or socket hooks")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(5, KAPI_EFFECT_NETWORK | KAPI_EFFECT_HARDWARE,
+ "network stack",
+ "Initializes protocol-specific state. May interact with network hardware "
+ "(e.g., AF_PACKET binds to network interface).")
+ KAPI_EFFECT_CONDITION("Protocol family specific")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT(6, KAPI_EFFECT_MODIFY_STATE,
+ "resource accounting",
+ "Updates task and memory cgroup accounting. Charges socket memory to owner. "
+ "Increments global socket counters.")
+ KAPI_EFFECT_CONDITION("CONFIG_MEMCG=y or other accounting enabled")
+ KAPI_SIDE_EFFECT_END
+
+ KAPI_SIDE_EFFECT_COUNT(7)
+
+ /* State transitions */
+ KAPI_STATE_TRANS(0, "file descriptor table",
+ "n open descriptors", "n+1 open descriptors",
+ "New fd allocated at min(available). Updates current->files->fd_array[]")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(1, "socket state machine",
+ "non-existent", "SS_UNCONNECTED",
+ "Socket created in unconnected state, ready for bind() or connect()")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(2, "network namespace",
+ "no socket", "socket registered",
+ "Socket associated with current->nsproxy->net_ns network namespace")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS(3, "memory accounting",
+ "pre-allocation", "memory charged",
+ "Socket memory charged to owner's memcg and rlimits")
+ KAPI_STATE_TRANS_END
+
+ KAPI_STATE_TRANS_COUNT(4)
+
+ /* Networking-specific specifications */
+
+ /* Socket state specification */
+ KAPI_SOCKET_STATE_REQ(KAPI_SOCK_STATE_UNSPEC)
+ KAPI_SOCKET_STATE_RESULT(KAPI_SOCK_STATE_OPEN)
+ KAPI_SOCKET_STATE_COND("Successful socket creation")
+ KAPI_SOCKET_STATE_PROTOS(KAPI_PROTO_ALL)
+ KAPI_SOCKET_STATE_END
+
+ /* Protocol-specific behaviors - detailed specifications */
+ KAPI_PROTOCOL_BEHAVIOR(0, KAPI_PROTO_TCP,
+ "TCP (Transmission Control Protocol) creates reliable, ordered, connection-oriented "
+ "byte streams. Features: 3-way handshake connection establishment; sequence numbers "
+ "for ordering; acknowledgments and retransmissions for reliability; flow control "
+ "via sliding window; congestion control (Reno/CUBIC/BBR); Nagle algorithm for "
+ "small packet aggregation; keep-alive probes; urgent data via MSG_OOB. "
+ "Socket combines (AF_INET/AF_INET6, SOCK_STREAM, IPPROTO_TCP).")
+ KAPI_PROTOCOL_FLAGS(0, "TCP-specific socket options via SOL_TCP level")
+ KAPI_PROTOCOL_BEHAVIOR_END
+
+ KAPI_PROTOCOL_BEHAVIOR(1, KAPI_PROTO_UDP,
+ "UDP (User Datagram Protocol) creates unreliable, connectionless datagram service. "
+ "Features: no connection establishment; best-effort delivery; message boundaries "
+ "preserved; no flow/congestion control; optional checksums; multicast/broadcast "
+ "capable; lower overhead than TCP. Maximum datagram size 65507 bytes (65535 - "
+ "IP header - UDP header). connect() on UDP socket sets default destination. "
+ "Socket combines (AF_INET/AF_INET6, SOCK_DGRAM, IPPROTO_UDP).")
+ KAPI_PROTOCOL_FLAGS(0, "UDP-specific options like UDP_CORK via SOL_UDP")
+ KAPI_PROTOCOL_BEHAVIOR_END
+
+ KAPI_PROTOCOL_BEHAVIOR(2, KAPI_PROTO_UNIX,
+ "Unix domain sockets provide high-performance local IPC with filesystem-based "
+ "addressing or Linux abstract namespace. Features: reliable delivery; in-order "
+ "semantics for SOCK_STREAM; message boundaries for SOCK_DGRAM/SOCK_SEQPACKET; "
+ "credential passing via SCM_CREDENTIALS; file descriptor passing via SCM_RIGHTS; "
+ "no network overhead; kernel-only data path. SOCK_RAW mapped to SOCK_DGRAM. "
+ "Maximum datagram size 130688 bytes by default (net.core.wmem_max).")
+ KAPI_PROTOCOL_FLAGS(0, "No Unix-specific socket level; uses SOL_SOCKET only")
+ KAPI_PROTOCOL_BEHAVIOR_END
+
+ KAPI_PROTOCOL_BEHAVIOR(3, KAPI_PROTO_RAW,
+ "Raw sockets provide direct access to network layer (IP) or link layer (Ethernet). "
+ "Features: receive/send raw IP packets; implement custom protocols; packet "
+ "sniffing; bypass transport layer. IP header included based on IP_HDRINCL option. "
+ "Protocol field specifies which protocol to receive (IPPROTO_ICMP, etc.) or "
+ "IPPROTO_RAW to send any. Link layer access via AF_PACKET. Requires CAP_NET_RAW "
+ "capability. Used by ping, traceroute, nmap, tcpdump.")
+ KAPI_PROTOCOL_FLAGS(0, "IP_HDRINCL and raw-specific options via SOL_RAW")
+ KAPI_PROTOCOL_BEHAVIOR_END
+
+ KAPI_PROTOCOL_BEHAVIOR(4, KAPI_PROTO_PACKET,
+ "Packet sockets provide direct access to link layer (Layer 2). Features: "
+ "send/receive raw Ethernet frames; implement network protocols in userspace; "
+ "packet capture and injection; access to all packets on interface. SOCK_RAW "
+ "provides full Layer 2 header; SOCK_DGRAM provides cooked packets without "
+ "Layer 2 header. Protocol specifies Ethernet protocol (ETH_P_IP, ETH_P_ALL). "
+ "High-performance variants: PACKET_MMAP, PACKET_FANOUT. Requires CAP_NET_RAW.")
+ KAPI_PROTOCOL_FLAGS(0, "Extensive options via SOL_PACKET level")
+ KAPI_PROTOCOL_BEHAVIOR_END
+
+ KAPI_PROTOCOL_BEHAVIOR(5, KAPI_PROTO_NETLINK,
+ "Netlink sockets provide kernel/user-space communication interface. Features: "
+ "reliable datagram service; multicast groups; message-based; TLV attributes; "
+ "async notifications; used for routing, netfilter, audit, SELinux, etc. "
+ "Protocol specifies subsystem: NETLINK_ROUTE (routing/link), NETLINK_FIREWALL, "
+ "NETLINK_NETFILTER, NETLINK_AUDIT, etc. No special capabilities for most "
+ "protocols except administrative operations.")
+ KAPI_PROTOCOL_FLAGS(0, "Netlink-specific options and attributes")
+ KAPI_PROTOCOL_BEHAVIOR_END
+
+ KAPI_PROTOCOL_BEHAVIOR(6, KAPI_PROTO_SCTP,
+ "SCTP (Stream Control Transmission Protocol) provides reliable, message-oriented "
+ "service with multi-streaming and multi-homing. Features: message boundaries; "
+ "ordered/unordered delivery; multi-streaming prevents head-of-line blocking; "
+ "multi-homing for redundancy; heartbeats; partial reliability extension. "
+ "4-way handshake with cookie mechanism prevents SYN floods. "
+ "Socket combines (AF_INET/AF_INET6, SOCK_STREAM/SOCK_SEQPACKET, IPPROTO_SCTP).")
+ KAPI_PROTOCOL_FLAGS(0, "SCTP-specific options via SOL_SCTP level")
+ KAPI_PROTOCOL_BEHAVIOR_END
+
+ KAPI_PROTOCOL_BEHAVIOR_COUNT(7)
+
+ /* Buffer specification - not applicable for socket creation */
+ KAPI_BUFFER_SPEC(0)
+ KAPI_BUFFER_SIZE(0, 0, 0)
+ KAPI_BUFFER_END
+
+ /* Async specification - socket creation is synchronous */
+ KAPI_ASYNC_SPEC(KAPI_ASYNC_BLOCK, 0)
+ KAPI_ASYNC_END
+
+ /* Network-specific errors are already covered in main error list */
+
+ /* Address families supported - comprehensive list */
+ KAPI_ADDR_FAMILY(0, AF_UNIX, "AF_UNIX/AF_LOCAL", sizeof(struct sockaddr_un), 2, 110)
+ KAPI_ADDR_FORMAT("struct sockaddr_un { sa_family_t sun_family; char sun_path[108]; }")
+ KAPI_ADDR_FEATURES(false, false, false)
+ KAPI_ADDR_SPECIAL("Abstract namespace: sun_path[0] == '\\0'; "
+ "Autobind: empty sun_path gets random abstract address; "
+ "Filesystem: normal paths follow filesystem permissions")
+ KAPI_ADDR_PORTS(0, 0) /* No port concept */
+ KAPI_ADDR_FAMILY_END
+
+ KAPI_ADDR_FAMILY(1, AF_INET, "AF_INET", sizeof(struct sockaddr_in), 16, 16)
+ KAPI_ADDR_FORMAT("struct sockaddr_in { sa_family_t sin_family; __be16 sin_port; "
+ "struct in_addr sin_addr; char sin_zero[8]; }")
+ KAPI_ADDR_FEATURES(true, true, true)
+ KAPI_ADDR_SPECIAL("INADDR_ANY (0.0.0.0) - wildcard; "
+ "INADDR_LOOPBACK (127.0.0.1) - loopback; "
+ "INADDR_BROADCAST (255.255.255.255) - broadcast; "
+ "224.0.0.0/4 - multicast range")
+ KAPI_ADDR_PORTS(0, 65535) /* 0 = ephemeral port assignment */
+ KAPI_ADDR_FAMILY_END
+
+ KAPI_ADDR_FAMILY(2, AF_INET6, "AF_INET6", sizeof(struct sockaddr_in6), 28, 28)
+ KAPI_ADDR_FORMAT("struct sockaddr_in6 { sa_family_t sin6_family; __be16 sin6_port; "
+ "__be32 sin6_flowinfo; struct in6_addr sin6_addr; __u32 sin6_scope_id; }")
+ KAPI_ADDR_FEATURES(true, true, false) /* No broadcast in IPv6 */
+ KAPI_ADDR_SPECIAL("in6addr_any (::) - wildcard; "
+ "in6addr_loopback (::1) - loopback; "
+ "ff00::/8 - multicast range; "
+ "fe80::/10 - link-local; "
+ "::ffff:0:0/96 - IPv4-mapped addresses")
+ KAPI_ADDR_PORTS(0, 65535)
+ KAPI_ADDR_FAMILY_END
+
+ KAPI_ADDR_FAMILY(3, AF_NETLINK, "AF_NETLINK", sizeof(struct sockaddr_nl), 12, 12)
+ KAPI_ADDR_FORMAT("struct sockaddr_nl { sa_family_t nl_family; __u16 nl_pad; "
+ "__u32 nl_pid; __u32 nl_groups; }")
+ KAPI_ADDR_FEATURES(false, true, false) /* Multicast via nl_groups */
+ KAPI_ADDR_SPECIAL("nl_pid: 0 = kernel; getpid() = this process; "
+ "nl_groups: bitmask of multicast groups")
+ KAPI_ADDR_PORTS(0, 0) /* Uses nl_pid instead */
+ KAPI_ADDR_FAMILY_END
+
+ KAPI_ADDR_FAMILY(4, AF_PACKET, "AF_PACKET", sizeof(struct sockaddr_ll), 20, 20)
+ KAPI_ADDR_FORMAT("struct sockaddr_ll { sa_family_t sll_family; __be16 sll_protocol; "
+ "int sll_ifindex; __u16 sll_hatype; __u8 sll_pkttype; "
+ "__u8 sll_halen; __u8 sll_addr[8]; }")
+ KAPI_ADDR_FEATURES(true, true, true) /* Via sll_pkttype */
+ KAPI_ADDR_SPECIAL("sll_ifindex: 0 = any interface; "
+ "sll_protocol: ETH_P_ALL = all protocols; "
+ "sll_pkttype: PACKET_HOST/BROADCAST/MULTICAST/OTHERHOST")
+ KAPI_ADDR_PORTS(0, 0) /* Layer 2, no ports */
+ KAPI_ADDR_FAMILY_END
+
+ KAPI_ADDR_FAMILY(5, AF_BLUETOOTH, "AF_BLUETOOTH", sizeof(struct sockaddr), 14, 258)
+ KAPI_ADDR_FORMAT("Varies by protocol: sockaddr_l2 (L2CAP), sockaddr_rc (RFCOMM), "
+ "sockaddr_hci (HCI), sockaddr_sco (SCO)")
+ KAPI_ADDR_FEATURES(false, false, false)
+ KAPI_ADDR_SPECIAL("BDADDR_ANY (00:00:00:00:00:00) - any device; "
+ "BDADDR_LOCAL (00:00:00:ff:ff:ff) - local adapter")
+ KAPI_ADDR_PORTS(1, 30) /* PSM for L2CAP, channel for RFCOMM */
+ KAPI_ADDR_FAMILY_END
+
+ KAPI_ADDR_FAMILY_COUNT(6)
+
+ /* Security specification - use existing capability mechanism */
+ KAPI_CAPABILITY(0, CAP_NET_RAW, "CAP_NET_RAW", KAPI_CAP_GRANT_PERMISSION)
+ KAPI_CAP_CONDITION("family == AF_PACKET || type == SOCK_RAW")
+ KAPI_CAP_ALLOWS("Raw socket creation and packet injection")
+ KAPI_CAP_WITHOUT("Permission denied (EPERM)")
+ KAPI_CAPABILITY_END
+
+ KAPI_CAPABILITY_COUNT(1)
+
+ /* Operation characteristics */
+ .is_connection_oriented = false,
+ .is_message_oriented = false,
+ .supports_oob_data = false,
+ .supports_peek = false,
+ .supports_select_poll = false,
+ .is_reentrant = true,
+
+ /* Semantic descriptions */
+ KAPI_NET_DATA_TRANSFER("Not applicable - socket() only creates the endpoint")
+
+ /* Additional constraints and validation rules */
+ KAPI_CONSTRAINT(0, "Protocol/Type Compatibility",
+ "Not all (family, type, protocol) combinations are valid. "
+ "Common valid combinations: "
+ "(AF_INET, SOCK_STREAM, IPPROTO_TCP); "
+ "(AF_INET, SOCK_DGRAM, IPPROTO_UDP); "
+ "(AF_INET, SOCK_RAW, IPPROTO_ICMP); "
+ "(AF_UNIX, SOCK_STREAM, 0); "
+ "(AF_UNIX, SOCK_DGRAM, 0); "
+ "(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)); "
+ "(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE)")
+ KAPI_CONSTRAINT_END
+
+ KAPI_CONSTRAINT(1, "Module Loading",
+ "If protocol family module not loaded, socket() may block during "
+ "request_module() execution. This is interruptible and may take "
+ "significant time. Modules loaded: net-pf-N where N is family number.")
+ KAPI_CONSTRAINT_END
+
+ KAPI_CONSTRAINT(2, "Capability Requirements",
+ "CAP_NET_RAW required for: "
+ "- AF_INET/AF_INET6 with SOCK_RAW "
+ "- AF_PACKET with any socket type "
+ "- Some AF_NETLINK operations require CAP_NET_ADMIN "
+ "- AF_BLUETOOTH may require CAP_NET_ADMIN for some operations")
+ KAPI_CONSTRAINT_END
+
+ KAPI_CONSTRAINT(3, "Network Namespace",
+ "Socket is created in current->nsproxy->net_ns network namespace. "
+ "Socket is bound to this namespace for its lifetime. "
+ "Different namespaces have independent network stacks.")
+ KAPI_CONSTRAINT_END
+
+ KAPI_CONSTRAINT(4, "Memory Limits",
+ "Socket creation respects: "
+ "- RLIMIT_NOFILE for file descriptor limits "
+ "- Memory cgroup limits for socket memory "
+ "- System-wide socket memory limits (net.ipv4.tcp_mem, etc.) "
+ "- Per-protocol memory limits")
+ KAPI_CONSTRAINT_END
+
+ KAPI_CONSTRAINT_COUNT(5)
+
+KAPI_END_SPEC;
+
SYSCALL_DEFINE3(socket, int, family, int, type, int, protocol)
{
return __sys_socket(family, type, protocol);
--
2.39.5
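A note on the hand-maintained counts in the socket() specification above:
each KAPI_*_COUNT(n) has to agree with the number of entries that precede it
(three KAPI_LOCK entries, seven KAPI_SIDE_EFFECT entries, and so on). The
following is a minimal sketch of a sanity check for that, assuming plain
substring matching on the macro names is sufficient. check_count() is a
hypothetical helper for illustration and is not part of this series.

```rust
/// Compare the number of `KAPI_<kind>(` entries in a spec body against the
/// value declared by `KAPI_<kind>_COUNT(n)`.
/// Hypothetical helper: substring matching only, no real macro parsing.
fn check_count(spec: &str, kind: &str) -> Result<usize, String> {
    let entry = format!("KAPI_{kind}(");
    let count_macro = format!("KAPI_{kind}_COUNT(");

    // "KAPI_LOCK(" does not match "KAPI_LOCK_DESC(" or "KAPI_LOCK_COUNT(",
    // because the opening parenthesis must follow the kind directly.
    let found = spec.matches(entry.as_str()).count();

    let start = spec
        .find(count_macro.as_str())
        .ok_or_else(|| format!("{kind}: no count macro found"))?
        + count_macro.len();
    let rest = &spec[start..];
    let end = rest
        .find(')')
        .ok_or_else(|| format!("{kind}: unterminated count macro"))?;
    let declared: usize = rest[..end]
        .trim()
        .parse()
        .map_err(|e| format!("{kind}: bad count: {e}"))?;

    if found == declared {
        Ok(found)
    } else {
        Err(format!("{kind}: declared {declared}, found {found}"))
    }
}

fn main() {
    let spec = r#"
        KAPI_LOCK(0, "rcu_read_lock", KAPI_LOCK_RCU)
        KAPI_LOCK_END
        KAPI_LOCK(1, "pf->owner module refcount", KAPI_LOCK_CUSTOM)
        KAPI_LOCK_END
        KAPI_LOCK_COUNT(2)
    "#;
    match check_count(spec, "LOCK") {
        Ok(n) => println!("LOCK: {n} entries, count matches"),
        Err(e) => eprintln!("{e}"),
    }
}
```

Run against the full specification, the same call would be repeated for
SIGNAL, SIDE_EFFECT, STATE_TRANS, PROTOCOL_BEHAVIOR, ADDR_FAMILY, CAPABILITY
and CONSTRAINT.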
* [RFC v2 22/22] tools/kapi: Add kernel API specification extraction tool
2025-06-24 18:07 [RFC v2 00/22] Kernel API specification framework Sasha Levin
` (20 preceding siblings ...)
2025-06-24 18:07 ` [RFC v2 21/22] net/socket: add API specification for socket() Sasha Levin
@ 2025-06-24 18:07 ` Sasha Levin
2025-07-01 2:43 ` [RFC v2 00/22] Kernel API specification framework Jake Edge
22 siblings, 0 replies; 33+ messages in thread
From: Sasha Levin @ 2025-06-24 18:07 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
The kapi tool extracts and displays kernel API specifications.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
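Note (not part of the commit): the debugfs extractor in this patch decodes
the numeric context_flags field into flag names in parse_context_flags().
The following is a minimal, table-driven sketch of the same decoding,
assuming the bit positions listed in that function match the kernel's
KAPI_CTX_* definitions. The names CONTEXT_FLAGS and decode_context_flags are
illustrative and do not appear in the patch.

```rust
/// Bit-to-name table, assumed to mirror the KAPI_CTX_* flags as listed in
/// parse_context_flags() in tools/kapi/src/extractor/debugfs.rs.
const CONTEXT_FLAGS: [(u32, &str); 8] = [
    (1 << 0, "PROCESS"),
    (1 << 1, "SOFTIRQ"),
    (1 << 2, "HARDIRQ"),
    (1 << 3, "NMI"),
    (1 << 4, "ATOMIC"),
    (1 << 5, "SLEEPABLE"),
    (1 << 6, "PREEMPT_DISABLED"),
    (1 << 7, "IRQ_DISABLED"),
];

/// Decode a context-flags bitmask into flag names, lowest bit first.
/// Bits that are not in the table are ignored.
fn decode_context_flags(flags: u32) -> Vec<&'static str> {
    CONTEXT_FLAGS
        .iter()
        .filter(|(bit, _)| flags & bit != 0)
        .map(|(_, name)| *name)
        .collect()
}

fn main() {
    // Process context that may sleep: bits 0 and 5.
    let names = decode_context_flags((1 << 0) | (1 << 5));
    println!("{}", names.join(" | ")); // prints "PROCESS | SLEEPABLE"
}
```

A table keeps the bit assignments in one place, which may make it easier to
keep them in sync with the kernel header if the flag set changes.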
Documentation/admin-guide/kernel-api-spec.rst | 198 +-
tools/kapi/.gitignore | 4 +
tools/kapi/Cargo.toml | 19 +
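tools/kapi/src/extractor/debugfs.rs | 415 +++++
tools/kapi/src/extractor/mod.rs | 411 +++++
tools/kapi/src/extractor/source_parser.rs | 1625 +++++++++++++++++
tools/kapi/src/extractor/vmlinux/binary_utils.rs | 283 +++
tools/kapi/src/extractor/vmlinux/mod.rs | 989 ++++++++++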
tools/kapi/src/formatter/json.rs | 420 +++++
tools/kapi/src/formatter/mod.rs | 130 ++
tools/kapi/src/formatter/plain.rs | 465 +++++
tools/kapi/src/formatter/rst.rs | 468 +++++
tools/kapi/src/formatter/shall.rs | 605 ++++++
tools/kapi/src/main.rs | 130 ++
14 files changed, 6159 insertions(+), 3 deletions(-)
create mode 100644 tools/kapi/.gitignore
create mode 100644 tools/kapi/Cargo.toml
create mode 100644 tools/kapi/src/extractor/debugfs.rs
create mode 100644 tools/kapi/src/extractor/mod.rs
create mode 100644 tools/kapi/src/extractor/source_parser.rs
create mode 100644 tools/kapi/src/extractor/vmlinux/binary_utils.rs
create mode 100644 tools/kapi/src/extractor/vmlinux/mod.rs
create mode 100644 tools/kapi/src/formatter/json.rs
create mode 100644 tools/kapi/src/formatter/mod.rs
create mode 100644 tools/kapi/src/formatter/plain.rs
create mode 100644 tools/kapi/src/formatter/rst.rs
create mode 100644 tools/kapi/src/formatter/shall.rs
create mode 100644 tools/kapi/src/main.rs
diff --git a/Documentation/admin-guide/kernel-api-spec.rst b/Documentation/admin-guide/kernel-api-spec.rst
index 3a63f6711e27b..9b452753111ad 100644
--- a/Documentation/admin-guide/kernel-api-spec.rst
+++ b/Documentation/admin-guide/kernel-api-spec.rst
@@ -31,7 +31,9 @@ The framework aims to:
common programming errors during development and testing.
3. **Support Tooling**: Export API specifications in machine-readable formats for
- use by static analyzers, documentation generators, and development tools.
+ use by static analyzers, documentation generators, and development tools. The
+ ``kapi`` tool (see `The kapi Tool`_) provides comprehensive extraction and
+ formatting capabilities.
4. **Enhance Debugging**: Provide detailed API information at runtime through debugfs
for debugging and introspection.
@@ -71,6 +73,13 @@ The framework consists of several key components:
- Type-safe parameter specifications
- Context and constraint definitions
+5. **kapi Tool** (``tools/kapi/``)
+
+ - Userspace utility for extracting specifications
+ - Multiple input sources (source, binary, debugfs)
+ - Multiple output formats (plain, JSON, RST)
+ - Testing and validation utilities
+
Data Model
----------
@@ -344,8 +353,177 @@ Documentation Generation
------------------------
The framework exports specifications via debugfs that can be used
-to generate documentation. Tools for automatic documentation generation
-from specifications are planned for future development.
+to generate documentation. The ``kapi`` tool provides comprehensive
+extraction and formatting capabilities for kernel API specifications.
+
+The kapi Tool
+=============
+
+Overview
+--------
+
+The ``kapi`` tool is a userspace utility that extracts and displays kernel API
+specifications from multiple sources. It provides a unified interface to API
+documentation from source code, compiled kernels, and running systems.
+
+Installation
+------------
+
+Build the tool from the kernel source tree::
+
+ $ cd tools/kapi
+ $ cargo build --release
+
+ # Optional: Install system-wide
+ $ cargo install --path .
+
+The tool requires Rust and Cargo to build. The binary will be available at
+``tools/kapi/target/release/kapi``.
+
+Command-Line Usage
+------------------
+
+Basic syntax::
+
+ kapi [OPTIONS] [API_NAME]
+
+Options:
+
+- ``--vmlinux <PATH>``: Extract from compiled kernel binary
+- ``--source <PATH>``: Extract from kernel source code
+- ``--debugfs <PATH>``: Extract from debugfs (default: /sys/kernel/debug)
+- ``-f, --format <FORMAT>``: Output format (plain, json, rst)
+- ``-h, --help``: Display help information
+- ``-V, --version``: Display version information
+
+Input Modes
+-----------
+
+**1. Source Code Mode**
+
+Extract specifications directly from kernel source::
+
+ # Scan entire kernel source tree
+ $ kapi --source /path/to/linux
+
+ # Extract from specific file
+ $ kapi --source kernel/sched/core.c
+
+ # Get details for specific API
+ $ kapi --source /path/to/linux sys_sched_yield
+
+**2. Vmlinux Mode**
+
+Extract from compiled kernel with debug symbols::
+
+ # List all APIs in vmlinux
+ $ kapi --vmlinux /boot/vmlinux-5.15.0
+
+ # Get specific syscall details
+ $ kapi --vmlinux ./vmlinux sys_read
+
+**3. Debugfs Mode**
+
+Extract from running kernel via debugfs::
+
+ # Use default debugfs path
+ $ kapi
+
+ # Use custom debugfs mount
+ $ kapi --debugfs /mnt/debugfs
+
+ # Get specific API from running kernel
+ $ kapi sys_write
+
+Output Formats
+--------------
+
+**Plain Text Format** (default)::
+
+ $ kapi sys_read
+
+ Detailed information for sys_read:
+ ==================================
+ Description: Read from a file descriptor
+
+ Detailed Description:
+ Reads up to count bytes from file descriptor fd into the buffer starting at buf.
+
+ Execution Context:
+ - KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+
+ Parameters (3):
+
+ Available since: 1.0
+
+**JSON Format**::
+
+ $ kapi --format json sys_read
+ {
+ "api_details": {
+ "name": "sys_read",
+ "description": "Read from a file descriptor",
+ "long_description": "Reads up to count bytes...",
+ "context_flags": ["KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE"],
+ "since_version": "1.0"
+ }
+ }
+
+**reStructuredText Format**::
+
+ $ kapi --format rst sys_read
+
+ sys_read
+ ========
+
+ **Read from a file descriptor**
+
+ Reads up to count bytes from file descriptor fd into the buffer...
+
+Usage Examples
+--------------
+
+**Generate complete API documentation**::
+
+ # Export all kernel APIs to JSON
+ $ kapi --source /path/to/linux --format json > kernel-apis.json
+
+ # Generate RST documentation for all syscalls
+ $ kapi --vmlinux ./vmlinux --format rst > syscalls.rst
+
+ # List APIs from specific subsystem
+ $ kapi --source drivers/gpu/drm/
+
+**Integration with other tools**::
+
+ # Find all APIs that can sleep
+ $ kapi --format json | jq '.apis[] | select(.context_flags[] | contains("SLEEPABLE"))'
+
+ # Generate markdown documentation
+ $ kapi --format rst sys_mmap | pandoc -f rst -t markdown
+
+**Debugging and analysis**::
+
+ # Compare API between kernel versions
+ $ diff <(kapi --vmlinux vmlinux-5.10) <(kapi --vmlinux vmlinux-5.15)
+
+ # Check if specific API exists
+ $ kapi --source . my_custom_api || echo "API not found"
+
+Implementation Details
+----------------------
+
+The tool extracts API specifications from three sources:
+
+1. **Source Code**: Parses KAPI specification macros using regular expressions
+2. **Vmlinux**: Reads the ``.kapi_specs`` ELF section from compiled kernels
+3. **Debugfs**: Reads from ``/sys/kernel/debug/kapi/`` filesystem interface
+
+The tool supports all KAPI specification types:
+
+- System calls (``DEFINE_KERNEL_API_SPEC``)
+- IOCTLs (``DEFINE_IOCTL_API_SPEC``)
+- Kernel functions (``KAPI_DEFINE_SPEC``)
IDE Integration
---------------
@@ -357,6 +535,11 @@ Modern IDEs can use the JSON export for:
- Context validation
- Error code documentation
+Example IDE integration::
+
+ # Generate IDE completion data
+ $ kapi --format json > .vscode/kernel-apis.json
+
Testing Framework
-----------------
@@ -367,6 +550,15 @@ The framework includes test helpers::
kapi_test_api("kmalloc", test_cases);
#endif
+The kapi tool can verify specifications against implementations::
+
+ # Run consistency tests
+ $ cd tools/kapi
+ $ ./test_consistency.sh
+
+ # Compare source vs binary specifications
+ $ ./compare_all_syscalls.sh
+
Best Practices
==============
diff --git a/tools/kapi/.gitignore b/tools/kapi/.gitignore
new file mode 100644
index 0000000000000..1390bfc12686c
--- /dev/null
+++ b/tools/kapi/.gitignore
@@ -0,0 +1,4 @@
+# Rust build artifacts
+/target/
+**/*.rs.bk
+
diff --git a/tools/kapi/Cargo.toml b/tools/kapi/Cargo.toml
new file mode 100644
index 0000000000000..4e6bcb10d132f
--- /dev/null
+++ b/tools/kapi/Cargo.toml
@@ -0,0 +1,19 @@
+[package]
+name = "kapi"
+version = "0.1.0"
+edition = "2024"
+authors = ["Sasha Levin <sashal@kernel.org>"]
+description = "Tool for extracting and displaying kernel API specifications"
+license = "GPL-2.0"
+
+[dependencies]
+goblin = "0.10"
+clap = { version = "4.4", features = ["derive"] }
+anyhow = "1.0"
+serde = { version = "1.0", features = ["derive"] }
+serde_json = "1.0"
+regex = "1.10"
+walkdir = "2.4"
+
+[dev-dependencies]
+tempfile = "3.8"
diff --git a/tools/kapi/src/extractor/debugfs.rs b/tools/kapi/src/extractor/debugfs.rs
new file mode 100644
index 0000000000000..a7e12052b96bf
--- /dev/null
+++ b/tools/kapi/src/extractor/debugfs.rs
@@ -0,0 +1,415 @@
+use anyhow::{Context, Result, bail};
+use std::fs;
+use std::io::Write;
+use std::path::PathBuf;
+use crate::formatter::OutputFormatter;
+use serde::Deserialize;
+
+use super::{ApiExtractor, ApiSpec, CapabilitySpec, display_api_spec};
+
+#[derive(Deserialize)]
+struct KernelApiJson {
+ name: String,
+ api_type: Option<String>,
+ version: Option<u32>,
+ description: Option<String>,
+ long_description: Option<String>,
+ context_flags: Option<u32>,
+ since_version: Option<String>,
+ examples: Option<String>,
+ notes: Option<String>,
+ capabilities: Option<Vec<KernelCapabilityJson>>,
+}
+
+#[derive(Deserialize)]
+struct KernelCapabilityJson {
+ capability: i32,
+ name: String,
+ action: String,
+ allows: String,
+ without_cap: String,
+ check_condition: Option<String>,
+ priority: Option<u8>,
+ alternatives: Option<Vec<i32>>,
+}
+
+/// Extractor for kernel API specifications from debugfs
+pub struct DebugfsExtractor {
+ debugfs_path: PathBuf,
+}
+
+impl DebugfsExtractor {
+ /// Create a new debugfs extractor with the specified debugfs path
+ pub fn new(debugfs_path: Option<String>) -> Result<Self> {
+ let path = match debugfs_path {
+ Some(p) => PathBuf::from(p),
+ None => PathBuf::from("/sys/kernel/debug"),
+ };
+
+ // Check if the debugfs path exists
+ if !path.exists() {
+ bail!("Debugfs path does not exist: {}", path.display());
+ }
+
+ // Check if kapi directory exists
+ let kapi_path = path.join("kapi");
+ if !kapi_path.exists() {
+ bail!("Kernel API debugfs interface not found at: {}", kapi_path.display());
+ }
+
+ Ok(Self {
+ debugfs_path: path,
+ })
+ }
+
+ /// Parse the list file to get all available API names
+ fn parse_list_file(&self) -> Result<Vec<String>> {
+ let list_path = self.debugfs_path.join("kapi/list");
+ let content = fs::read_to_string(&list_path)
+ .with_context(|| format!("Failed to read {}", list_path.display()))?;
+
+ let mut apis = Vec::new();
+ let mut in_list = false;
+
+ for line in content.lines() {
+ if line.contains("===") {
+ in_list = true;
+ continue;
+ }
+
+ if in_list && line.starts_with("Total:") {
+ break;
+ }
+
+ if in_list && !line.trim().is_empty() {
+ // Extract API name from lines like "sys_read - Read from a file descriptor"
+ if let Some(name) = line.split(" - ").next() {
+ apis.push(name.trim().to_string());
+ }
+ }
+ }
+
+ Ok(apis)
+ }
+
+ /// Convert context flags from a u32 bitmask to their string representations
+ fn parse_context_flags(flags: u32) -> Vec<String> {
+ let mut result = Vec::new();
+
+ // These values should match KAPI_CTX_* flags from kernel
+ if flags & (1 << 0) != 0 { result.push("PROCESS".to_string()); }
+ if flags & (1 << 1) != 0 { result.push("SOFTIRQ".to_string()); }
+ if flags & (1 << 2) != 0 { result.push("HARDIRQ".to_string()); }
+ if flags & (1 << 3) != 0 { result.push("NMI".to_string()); }
+ if flags & (1 << 4) != 0 { result.push("ATOMIC".to_string()); }
+ if flags & (1 << 5) != 0 { result.push("SLEEPABLE".to_string()); }
+ if flags & (1 << 6) != 0 { result.push("PREEMPT_DISABLED".to_string()); }
+ if flags & (1 << 7) != 0 { result.push("IRQ_DISABLED".to_string()); }
+
+ result
+ }
+
+ /// Convert capability action from kernel representation
+ fn parse_capability_action(action: &str) -> String {
+ match action {
+ "bypass_check" => "Bypasses check".to_string(),
+ "increase_limit" => "Increases limit".to_string(),
+ "override_restriction" => "Overrides restriction".to_string(),
+ "grant_permission" => "Grants permission".to_string(),
+ "modify_behavior" => "Modifies behavior".to_string(),
+ "access_resource" => "Allows resource access".to_string(),
+ "perform_operation" => "Allows operation".to_string(),
+ _ => action.to_string(),
+ }
+ }
+
+ /// Try to parse as JSON first
+ fn try_parse_json(&self, content: &str) -> Option<ApiSpec> {
+ let json_data: KernelApiJson = serde_json::from_str(content).ok()?;
+
+ let mut spec = ApiSpec {
+ name: json_data.name,
+ api_type: json_data.api_type.unwrap_or_else(|| "unknown".to_string()),
+ description: json_data.description,
+ long_description: json_data.long_description,
+ version: json_data.version.map(|v| v.to_string()),
+ context_flags: json_data.context_flags.map_or_else(Vec::new, Self::parse_context_flags),
+ param_count: None,
+ error_count: None,
+ examples: json_data.examples,
+ notes: json_data.notes,
+ since_version: json_data.since_version,
+ subsystem: None, // Not in current JSON format
+ sysfs_path: None, // Not in current JSON format
+ permissions: None, // Not in current JSON format
+ socket_state: None,
+ protocol_behaviors: vec![],
+ addr_families: vec![],
+ buffer_spec: None,
+ async_spec: None,
+ net_data_transfer: None,
+ capabilities: vec![],
+ parameters: vec![],
+ return_spec: None,
+ errors: vec![],
+ signals: vec![],
+ signal_masks: vec![],
+ side_effects: vec![],
+ state_transitions: vec![],
+ constraints: vec![],
+ locks: vec![],
+ };
+
+ // Convert capabilities
+ if let Some(caps) = json_data.capabilities {
+ for cap in caps {
+ spec.capabilities.push(CapabilitySpec {
+ capability: cap.capability,
+ name: cap.name,
+ action: Self::parse_capability_action(&cap.action),
+ allows: cap.allows,
+ without_cap: cap.without_cap,
+ check_condition: cap.check_condition,
+ priority: cap.priority,
+ alternatives: cap.alternatives.unwrap_or_default(),
+ });
+ }
+ }
+
+ Some(spec)
+ }
+
+ /// Parse a single API specification file
+ fn parse_spec_file(&self, api_name: &str) -> Result<ApiSpec> {
+ let spec_path = self.debugfs_path.join(format!("kapi/specs/{}", api_name));
+ let content = fs::read_to_string(&spec_path)
+ .with_context(|| format!("Failed to read {}", spec_path.display()))?;
+
+ // Try JSON parsing first
+ if let Some(spec) = self.try_parse_json(&content) {
+ return Ok(spec);
+ }
+
+ // Fall back to plain text parsing
+ let mut spec = ApiSpec {
+ name: api_name.to_string(),
+ api_type: "unknown".to_string(),
+ description: None,
+ long_description: None,
+ version: None,
+ context_flags: Vec::new(),
+ param_count: None,
+ error_count: None,
+ examples: None,
+ notes: None,
+ since_version: None,
+ subsystem: None,
+ sysfs_path: None,
+ permissions: None,
+ socket_state: None,
+ protocol_behaviors: vec![],
+ addr_families: vec![],
+ buffer_spec: None,
+ async_spec: None,
+ net_data_transfer: None,
+ capabilities: vec![],
+ parameters: vec![],
+ return_spec: None,
+ errors: vec![],
+ signals: vec![],
+ signal_masks: vec![],
+ side_effects: vec![],
+ state_transitions: vec![],
+ constraints: vec![],
+ locks: vec![],
+ };
+
+ // Parse the content
+ let mut collecting_multiline = false;
+ let mut multiline_buffer = String::new();
+ let mut multiline_field = "";
+ let mut parsing_capability = false;
+ let mut current_capability: Option<CapabilitySpec> = None;
+
+ for line in content.lines() {
+ // Handle capability sections
+ if line.starts_with("Capabilities (") {
+ continue; // Skip the header
+ }
+ if line.starts_with(" ") && line.contains(" (") && line.ends_with("):") {
+ // Start of a capability entry like " CAP_IPC_LOCK (14):"
+ if let Some(cap) = current_capability.take() {
+ spec.capabilities.push(cap);
+ }
+
+ let parts: Vec<&str> = line.trim().split(" (").collect();
+ if parts.len() == 2 {
+ let cap_name = parts[0].to_string();
+ let cap_id = parts[1].trim_end_matches("):").parse().unwrap_or(0);
+ current_capability = Some(CapabilitySpec {
+ capability: cap_id,
+ name: cap_name,
+ action: String::new(),
+ allows: String::new(),
+ without_cap: String::new(),
+ check_condition: None,
+ priority: None,
+ alternatives: Vec::new(),
+ });
+ parsing_capability = true;
+ }
+ continue;
+ }
+ if parsing_capability && line.starts_with(" ") {
+ // Parse capability fields
+ if let Some(ref mut cap) = current_capability {
+ if let Some(action) = line.strip_prefix(" Action: ") {
+ cap.action = action.to_string();
+ } else if let Some(allows) = line.strip_prefix(" Allows: ") {
+ cap.allows = allows.to_string();
+ } else if let Some(without) = line.strip_prefix(" Without: ") {
+ cap.without_cap = without.to_string();
+ } else if let Some(cond) = line.strip_prefix(" Condition: ") {
+ cap.check_condition = Some(cond.to_string());
+ } else if let Some(prio) = line.strip_prefix(" Priority: ") {
+ cap.priority = prio.parse().ok();
+ } else if let Some(alts) = line.strip_prefix(" Alternatives: ") {
+ cap.alternatives = alts.split(", ")
+ .filter_map(|s| s.parse().ok())
+ .collect();
+ }
+ }
+ continue;
+ }
+ if parsing_capability && !line.starts_with(" ") {
+ // End of capabilities section
+ if let Some(cap) = current_capability.take() {
+ spec.capabilities.push(cap);
+ }
+ parsing_capability = false;
+ }
+
+ // Handle section headers
+ if line.starts_with("Parameters (") {
+ if let Some(count_str) = line.strip_prefix("Parameters (").and_then(|s| s.strip_suffix("):")) {
+ spec.param_count = count_str.parse().ok();
+ }
+ continue;
+ } else if line.starts_with("Errors (") {
+ if let Some(count_str) = line.strip_prefix("Errors (").and_then(|s| s.strip_suffix("):")) {
+ spec.error_count = count_str.parse().ok();
+ }
+ continue;
+ } else if line.starts_with("Examples:") {
+ collecting_multiline = true;
+ multiline_field = "examples";
+ multiline_buffer.clear();
+ continue;
+ } else if line.starts_with("Notes:") {
+ collecting_multiline = true;
+ multiline_field = "notes";
+ multiline_buffer.clear();
+ continue;
+ }
+
+ // Handle multiline sections
+ if collecting_multiline {
+ if line.trim().is_empty() && multiline_buffer.ends_with("\n\n") {
+ collecting_multiline = false;
+ match multiline_field {
+ "examples" => spec.examples = Some(multiline_buffer.trim().to_string()),
+ "notes" => spec.notes = Some(multiline_buffer.trim().to_string()),
+ _ => {}
+ }
+ multiline_buffer.clear();
+ } else {
+ if !multiline_buffer.is_empty() {
+ multiline_buffer.push('\n');
+ }
+ multiline_buffer.push_str(line);
+ }
+ continue;
+ }
+
+ // Parse regular fields
+ if let Some(desc) = line.strip_prefix("Description: ") {
+ spec.description = Some(desc.to_string());
+ } else if let Some(long_desc) = line.strip_prefix("Long description: ") {
+ spec.long_description = Some(long_desc.to_string());
+ } else if let Some(version) = line.strip_prefix("Version: ") {
+ spec.version = Some(version.to_string());
+ } else if let Some(since) = line.strip_prefix("Since: ") {
+ spec.since_version = Some(since.to_string());
+ } else if let Some(flags) = line.strip_prefix("Context flags: ") {
+ spec.context_flags = flags.split_whitespace()
+ .map(str::to_string)
+ .collect();
+ } else if let Some(subsys) = line.strip_prefix("Subsystem: ") {
+ spec.subsystem = Some(subsys.to_string());
+ } else if let Some(path) = line.strip_prefix("Sysfs Path: ") {
+ spec.sysfs_path = Some(path.to_string());
+ } else if let Some(perms) = line.strip_prefix("Permissions: ") {
+ spec.permissions = Some(perms.to_string());
+ }
+ }
+
+ // Handle any remaining capability
+ if let Some(cap) = current_capability.take() {
+ spec.capabilities.push(cap);
+ }
+
+ // Determine API type based on name
+ if api_name.starts_with("sys_") {
+ spec.api_type = "syscall".to_string();
+ } else if api_name.contains("_ioctl") || api_name.starts_with("ioctl_") {
+ spec.api_type = "ioctl".to_string();
+ } else if api_name.contains("sysfs") || api_name.ends_with("_show") || api_name.ends_with("_store") {
+ spec.api_type = "sysfs".to_string();
+ } else {
+ spec.api_type = "function".to_string();
+ }
+
+ Ok(spec)
+ }
+}
+
+impl ApiExtractor for DebugfsExtractor {
+ fn extract_all(&self) -> Result<Vec<ApiSpec>> {
+ let api_names = self.parse_list_file()?;
+ let mut specs = Vec::new();
+
+ for name in api_names {
+ match self.parse_spec_file(&name) {
+ Ok(spec) => specs.push(spec),
+ Err(_e) => {}, // Silently skip files that fail to parse
+ }
+ }
+
+ Ok(specs)
+ }
+
+ fn extract_by_name(&self, name: &str) -> Result<Option<ApiSpec>> {
+ let api_names = self.parse_list_file()?;
+
+ if api_names.iter().any(|n| n == name) {
+ Ok(Some(self.parse_spec_file(name)?))
+ } else {
+ Ok(None)
+ }
+ }
+
+ fn display_api_details(
+ &self,
+ api_name: &str,
+ formatter: &mut dyn OutputFormatter,
+ writer: &mut dyn Write,
+ ) -> Result<()> {
+ if let Some(spec) = self.extract_by_name(api_name)? {
+ display_api_spec(&spec, formatter, writer)?;
+ } else {
+ writeln!(writer, "API '{api_name}' not found in debugfs")?;
+ }
+
+ Ok(())
+ }
+}
\ No newline at end of file
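
Note on the debugfs text format: the extractor above recognises count
headers such as "Parameters (N):" by stripping a fixed prefix and suffix,
and it infers the API type purely from the API name. A minimal, stdlib-only
sketch of those two steps follows. The helper names `parse_count_header`
and `classify` are hypothetical and not part of this patch; only the
matching rules are taken from the code above.

```rust
/// Parse a section header of the form `<label> (<count>):`.
/// Returns `None` if the line has another shape or the count is not a number.
/// (Hypothetical helper, mirroring the "Parameters (" / "Errors (" handling.)
fn parse_count_header(line: &str, label: &str) -> Option<u32> {
    line.strip_prefix(label)?
        .strip_prefix(" (")?
        .strip_suffix("):")?
        .parse()
        .ok()
}

/// Classify an API by name, using the same rules and the same order of
/// checks as the debugfs extractor. (Hypothetical helper name.)
fn classify(api_name: &str) -> &'static str {
    if api_name.starts_with("sys_") {
        "syscall"
    } else if api_name.contains("_ioctl") || api_name.starts_with("ioctl_") {
        "ioctl"
    } else if api_name.contains("sysfs")
        || api_name.ends_with("_show")
        || api_name.ends_with("_store")
    {
        "sysfs"
    } else {
        // Anything that matches no naming convention is treated as a function.
        "function"
    }
}
```

With these rules, parse_count_header("Parameters (3):", "Parameters") yields
Some(3), and a bare name such as "ioctl" is classified as "function".
ApiType::from_name in source_parser.rs instead treats any name containing
"ioctl" as an ioctl, so the two extractors can disagree on the same name.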
diff --git a/tools/kapi/src/extractor/mod.rs b/tools/kapi/src/extractor/mod.rs
new file mode 100644
index 0000000000000..644eb7cf64fd9
--- /dev/null
+++ b/tools/kapi/src/extractor/mod.rs
@@ -0,0 +1,411 @@
+use anyhow::Result;
+use std::io::Write;
+use std::convert::TryInto;
+use crate::formatter::OutputFormatter;
+
+pub mod vmlinux;
+pub mod source_parser;
+pub mod debugfs;
+
+pub use vmlinux::VmlinuxExtractor;
+pub use source_parser::SourceExtractor;
+pub use debugfs::DebugfsExtractor;
+
+/// Socket state specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct SocketStateSpec {
+ pub required_states: Vec<String>,
+ pub forbidden_states: Vec<String>,
+ pub resulting_state: Option<String>,
+ pub condition: Option<String>,
+ pub applicable_protocols: Option<String>,
+}
+
+/// Protocol behavior specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct ProtocolBehaviorSpec {
+ pub applicable_protocols: String,
+ pub behavior: String,
+ pub protocol_flags: Option<String>,
+ pub flag_description: Option<String>,
+}
+
+/// Address family specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct AddrFamilySpec {
+ pub family: i32,
+ pub family_name: String,
+ pub addr_struct_size: usize,
+ pub min_addr_len: usize,
+ pub max_addr_len: usize,
+ pub addr_format: Option<String>,
+ pub supports_wildcard: bool,
+ pub supports_multicast: bool,
+ pub supports_broadcast: bool,
+ pub special_addresses: Option<String>,
+ pub port_range_min: u32,
+ pub port_range_max: u32,
+}
+
+/// Buffer specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct BufferSpec {
+ pub buffer_behaviors: Option<String>,
+ pub min_buffer_size: Option<usize>,
+ pub max_buffer_size: Option<usize>,
+ pub optimal_buffer_size: Option<usize>,
+}
+
+/// Async specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct AsyncSpec {
+ pub supported_modes: Option<String>,
+ pub nonblock_errno: Option<i32>,
+}
+
+/// Capability specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct CapabilitySpec {
+ pub capability: i32,
+ pub name: String,
+ pub action: String,
+ pub allows: String,
+ pub without_cap: String,
+ pub check_condition: Option<String>,
+ pub priority: Option<u8>,
+ pub alternatives: Vec<i32>,
+}
+
+/// Parameter specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct ParamSpec {
+ pub index: u32,
+ pub name: String,
+ pub type_name: String,
+ pub description: String,
+ pub flags: u32,
+ pub param_type: u32,
+ pub constraint_type: u32,
+ pub constraint: Option<String>,
+ pub min_value: Option<i64>,
+ pub max_value: Option<i64>,
+ pub valid_mask: Option<u64>,
+ pub enum_values: Vec<String>,
+ pub size: Option<u32>,
+ pub alignment: Option<u32>,
+}
+
+/// Return value specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct ReturnSpec {
+ pub type_name: String,
+ pub description: String,
+ pub return_type: u32,
+ pub check_type: u32,
+ pub success_value: Option<i64>,
+ pub success_min: Option<i64>,
+ pub success_max: Option<i64>,
+ pub error_values: Vec<i32>,
+}
+
+/// Error specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct ErrorSpec {
+ pub error_code: i32,
+ pub name: String,
+ pub condition: String,
+ pub description: String,
+}
+
+/// Signal specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct SignalSpec {
+ pub signal_num: i32,
+ pub signal_name: String,
+ pub direction: u32,
+ pub action: u32,
+ pub target: Option<String>,
+ pub condition: Option<String>,
+ pub description: Option<String>,
+ pub timing: u32,
+ pub priority: u32,
+ pub restartable: bool,
+ pub interruptible: bool,
+ pub queue: Option<String>,
+ pub sa_flags: u32,
+ pub sa_flags_required: u32,
+ pub sa_flags_forbidden: u32,
+ pub state_required: u32,
+ pub state_forbidden: u32,
+ pub error_on_signal: Option<i32>,
+}
+
+/// Signal mask specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct SignalMaskSpec {
+ pub name: String,
+ pub description: String,
+}
+
+/// Side effect specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct SideEffectSpec {
+ pub effect_type: u32,
+ pub target: String,
+ pub condition: Option<String>,
+ pub description: String,
+ pub reversible: bool,
+}
+
+/// State transition specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct StateTransitionSpec {
+ pub object: String,
+ pub from_state: String,
+ pub to_state: String,
+ pub condition: Option<String>,
+ pub description: String,
+}
+
+/// Constraint specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct ConstraintSpec {
+ pub name: String,
+ pub description: String,
+ pub expression: Option<String>,
+}
+
+/// Lock specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct LockSpec {
+ pub lock_name: String,
+ pub lock_type: u32,
+ pub acquired: bool,
+ pub released: bool,
+ pub held_on_entry: bool,
+ pub held_on_exit: bool,
+ pub description: String,
+}
+
+/// Common API specification information that all extractors should provide
+#[derive(Debug, Clone)]
+pub struct ApiSpec {
+ pub name: String,
+ pub api_type: String,
+ pub description: Option<String>,
+ pub long_description: Option<String>,
+ pub version: Option<String>,
+ pub context_flags: Vec<String>,
+ pub param_count: Option<u32>,
+ pub error_count: Option<u32>,
+ pub examples: Option<String>,
+ pub notes: Option<String>,
+ pub since_version: Option<String>,
+ // Sysfs-specific fields
+ pub subsystem: Option<String>,
+ pub sysfs_path: Option<String>,
+ pub permissions: Option<String>,
+ // Networking-specific fields
+ pub socket_state: Option<SocketStateSpec>,
+ pub protocol_behaviors: Vec<ProtocolBehaviorSpec>,
+ pub addr_families: Vec<AddrFamilySpec>,
+ pub buffer_spec: Option<BufferSpec>,
+ pub async_spec: Option<AsyncSpec>,
+ pub net_data_transfer: Option<String>,
+ pub capabilities: Vec<CapabilitySpec>,
+ pub parameters: Vec<ParamSpec>,
+ pub return_spec: Option<ReturnSpec>,
+ pub errors: Vec<ErrorSpec>,
+ pub signals: Vec<SignalSpec>,
+ pub signal_masks: Vec<SignalMaskSpec>,
+ pub side_effects: Vec<SideEffectSpec>,
+ pub state_transitions: Vec<StateTransitionSpec>,
+ pub constraints: Vec<ConstraintSpec>,
+ pub locks: Vec<LockSpec>,
+}
+
+/// Trait for extracting API specifications from different sources
+pub trait ApiExtractor {
+ /// Extract all API specifications from the source
+ fn extract_all(&self) -> Result<Vec<ApiSpec>>;
+
+ /// Extract a specific API specification by name
+ fn extract_by_name(&self, name: &str) -> Result<Option<ApiSpec>>;
+
+ /// Display detailed information about a specific API
+ fn display_api_details(
+ &self,
+ api_name: &str,
+ formatter: &mut dyn OutputFormatter,
+ writer: &mut dyn Write,
+ ) -> Result<()>;
+}
+
+/// Helper function to display an ApiSpec using a formatter
+pub fn display_api_spec(
+ spec: &ApiSpec,
+ formatter: &mut dyn OutputFormatter,
+ writer: &mut dyn Write,
+) -> Result<()> {
+ formatter.begin_api_details(writer, &spec.name)?;
+
+ if let Some(desc) = &spec.description {
+ formatter.description(writer, desc)?;
+ }
+
+ if let Some(long_desc) = &spec.long_description {
+ formatter.long_description(writer, long_desc)?;
+ }
+
+ if let Some(version) = &spec.since_version {
+ formatter.since_version(writer, version)?;
+ }
+
+ if !spec.context_flags.is_empty() {
+ formatter.begin_context_flags(writer)?;
+ for flag in &spec.context_flags {
+ formatter.context_flag(writer, flag)?;
+ }
+ formatter.end_context_flags(writer)?;
+ }
+
+ if !spec.parameters.is_empty() {
+ formatter.begin_parameters(writer, spec.parameters.len().try_into().unwrap_or(u32::MAX))?;
+ for param in &spec.parameters {
+ formatter.parameter(writer, param)?;
+ }
+ formatter.end_parameters(writer)?;
+ }
+
+ if let Some(ret) = &spec.return_spec {
+ formatter.return_spec(writer, ret)?;
+ }
+
+ if !spec.errors.is_empty() {
+ formatter.begin_errors(writer, spec.errors.len().try_into().unwrap_or(u32::MAX))?;
+ for error in &spec.errors {
+ formatter.error(writer, error)?;
+ }
+ formatter.end_errors(writer)?;
+ }
+
+ if let Some(notes) = &spec.notes {
+ formatter.notes(writer, notes)?;
+ }
+
+ if let Some(examples) = &spec.examples {
+ formatter.examples(writer, examples)?;
+ }
+
+ // Display sysfs-specific fields
+ if spec.api_type == "sysfs" {
+ if let Some(subsystem) = &spec.subsystem {
+ formatter.sysfs_subsystem(writer, subsystem)?;
+ }
+ if let Some(path) = &spec.sysfs_path {
+ formatter.sysfs_path(writer, path)?;
+ }
+ if let Some(perms) = &spec.permissions {
+ formatter.sysfs_permissions(writer, perms)?;
+ }
+ }
+
+ // Display networking-specific fields
+ if let Some(socket_state) = &spec.socket_state {
+ formatter.socket_state(writer, socket_state)?;
+ }
+
+ if !spec.protocol_behaviors.is_empty() {
+ formatter.begin_protocol_behaviors(writer)?;
+ for behavior in &spec.protocol_behaviors {
+ formatter.protocol_behavior(writer, behavior)?;
+ }
+ formatter.end_protocol_behaviors(writer)?;
+ }
+
+ if !spec.addr_families.is_empty() {
+ formatter.begin_addr_families(writer)?;
+ for family in &spec.addr_families {
+ formatter.addr_family(writer, family)?;
+ }
+ formatter.end_addr_families(writer)?;
+ }
+
+ if let Some(buffer_spec) = &spec.buffer_spec {
+ formatter.buffer_spec(writer, buffer_spec)?;
+ }
+
+ if let Some(async_spec) = &spec.async_spec {
+ formatter.async_spec(writer, async_spec)?;
+ }
+
+ if let Some(net_data_transfer) = &spec.net_data_transfer {
+ formatter.net_data_transfer(writer, net_data_transfer)?;
+ }
+
+ if !spec.capabilities.is_empty() {
+ formatter.begin_capabilities(writer)?;
+ for cap in &spec.capabilities {
+ formatter.capability(writer, cap)?;
+ }
+ formatter.end_capabilities(writer)?;
+ }
+
+ // Display signals
+ if !spec.signals.is_empty() {
+ formatter.begin_signals(writer, spec.signals.len().try_into().unwrap_or(u32::MAX))?;
+ for signal in &spec.signals {
+ formatter.signal(writer, signal)?;
+ }
+ formatter.end_signals(writer)?;
+ }
+
+ // Display signal masks
+ if !spec.signal_masks.is_empty() {
+ formatter.begin_signal_masks(writer, spec.signal_masks.len().try_into().unwrap_or(u32::MAX))?;
+ for mask in &spec.signal_masks {
+ formatter.signal_mask(writer, mask)?;
+ }
+ formatter.end_signal_masks(writer)?;
+ }
+
+ // Display side effects
+ if !spec.side_effects.is_empty() {
+ formatter.begin_side_effects(writer, spec.side_effects.len().try_into().unwrap_or(u32::MAX))?;
+ for effect in &spec.side_effects {
+ formatter.side_effect(writer, effect)?;
+ }
+ formatter.end_side_effects(writer)?;
+ }
+
+ // Display state transitions
+ if !spec.state_transitions.is_empty() {
+ formatter.begin_state_transitions(writer, spec.state_transitions.len().try_into().unwrap_or(u32::MAX))?;
+ for trans in &spec.state_transitions {
+ formatter.state_transition(writer, trans)?;
+ }
+ formatter.end_state_transitions(writer)?;
+ }
+
+ // Display constraints
+ if !spec.constraints.is_empty() {
+ formatter.begin_constraints(writer, spec.constraints.len().try_into().unwrap_or(u32::MAX))?;
+ for constraint in &spec.constraints {
+ formatter.constraint(writer, constraint)?;
+ }
+ formatter.end_constraints(writer)?;
+ }
+
+ // Display locks
+ if !spec.locks.is_empty() {
+ formatter.begin_locks(writer, spec.locks.len().try_into().unwrap_or(u32::MAX))?;
+ for lock in &spec.locks {
+ formatter.lock(writer, lock)?;
+ }
+ formatter.end_locks(writer)?;
+ }
+
+ formatter.end_api_details(writer)?;
+
+ Ok(())
+}
\ No newline at end of file
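
Note on display_api_spec(): every list-valued field goes through the same
steps: skip the section if the list is empty, call a "begin" hook with a
count saturated to u32::MAX, emit each item, then call an "end" hook. A
rough sketch of that pattern as one generic helper is below. It is only an
illustration: `saturating_count` and `write_section` are hypothetical
names, and the plain-text output stands in for the OutputFormatter
callbacks, whose definitions are not shown in this file.

```rust
use std::convert::TryFrom;
use std::io::{self, Write};

/// Convert a collection length to the `u32` count passed to the "begin"
/// hooks, saturating at `u32::MAX` rather than panicking or wrapping.
fn saturating_count(len: usize) -> u32 {
    u32::try_from(len).unwrap_or(u32::MAX)
}

/// Emit one begin/item/end section, or nothing at all if `items` is empty.
/// `item` plays the role of the per-element formatter callback.
fn write_section<W: Write, T>(
    writer: &mut W,
    title: &str,
    items: &[T],
    mut item: impl FnMut(&mut W, &T) -> io::Result<()>,
) -> io::Result<()> {
    if items.is_empty() {
        return Ok(());
    }
    // "begin": section title plus the saturated element count.
    writeln!(writer, "{} ({}):", title, saturating_count(items.len()))?;
    for it in items {
        item(&mut *writer, it)?;
    }
    // "end": a blank line closes the section.
    writeln!(writer)
}
```

A caller would pass one closure per section, for example
write_section(&mut out, "Errors", &errors, |w, e| writeln!(w, "  {}", e)),
which keeps the empty check and the count conversion in a single place.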
diff --git a/tools/kapi/src/extractor/source_parser.rs b/tools/kapi/src/extractor/source_parser.rs
new file mode 100644
index 0000000000000..bec036a56e40f
--- /dev/null
+++ b/tools/kapi/src/extractor/source_parser.rs
@@ -0,0 +1,1625 @@
+use anyhow::{Context, Result};
+use regex::Regex;
+use std::fs;
+use std::path::Path;
+use std::collections::HashMap;
+use walkdir::WalkDir;
+use std::io::Write;
+use crate::formatter::OutputFormatter;
+use super::{ApiExtractor, ApiSpec, CapabilitySpec, display_api_spec,
+ SocketStateSpec, ProtocolBehaviorSpec, AddrFamilySpec, BufferSpec, AsyncSpec,
+ StateTransitionSpec, SideEffectSpec, ParamSpec, ReturnSpec, ErrorSpec, LockSpec, ConstraintSpec};
+
+#[derive(Debug, Clone)]
+pub struct SourceApiSpec {
+ pub name: String,
+ pub api_type: ApiType,
+ pub parsed_fields: HashMap<String, String>,
+}
+
+#[derive(Debug, Clone, PartialEq)]
+pub enum ApiType {
+ Syscall,
+ Ioctl,
+ Function,
+ Sysfs,
+ Unknown,
+}
+
+impl ApiType {
+ fn from_name(name: &str) -> Self {
+ if name.starts_with("sys_") {
+ ApiType::Syscall
+ } else if name.contains("ioctl") || name.contains("IOCTL") {
+ ApiType::Ioctl
+ } else if name.starts_with("do_") || name.starts_with("__") {
+ ApiType::Function
+ } else {
+ ApiType::Unknown
+ }
+ }
+}
+
+pub struct SourceParser {
+ // Regex patterns for matching KAPI specifications
+ spec_start_pattern: Regex,
+ spec_end_pattern: Regex,
+ ioctl_spec_pattern: Regex,
+ sysfs_spec_pattern: Regex,
+ // Networking-specific patterns
+ socket_state_req_pattern: Regex,
+ socket_state_result_pattern: Regex,
+ socket_state_cond_pattern: Regex,
+ socket_state_protos_pattern: Regex,
+ protocol_behavior_pattern: Regex,
+ protocol_flags_pattern: Regex,
+ addr_family_pattern: Regex,
+ addr_format_pattern: Regex,
+ addr_features_pattern: Regex,
+ addr_special_pattern: Regex,
+ addr_ports_pattern: Regex,
+ buffer_spec_pattern: Regex,
+ async_spec_pattern: Regex,
+ net_data_transfer_pattern: Regex,
+}
+
+impl SourceParser {
+ pub fn new() -> Result<Self> {
+ Ok(SourceParser {
+ // Match DEFINE_KERNEL_API_SPEC(function_name)
+ spec_start_pattern: Regex::new(r"DEFINE_KERNEL_API_SPEC\s*\(\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*\)")?,
+ // Match KAPI_END_SPEC
+ spec_end_pattern: Regex::new(r"KAPI_END_SPEC")?,
+ // Match IOCTL specifications
+ ioctl_spec_pattern: Regex::new(r#"DEFINE_IOCTL_API_SPEC\s*\(\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*,\s*([^,]+)\s*,\s*"([^"]+)"\s*\)"#)?,
+ // Match SYSFS specifications
+ sysfs_spec_pattern: Regex::new(r"DEFINE_SYSFS_API_SPEC\s*\(\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*\)")?,
+ // Networking-specific patterns
+ socket_state_req_pattern: Regex::new(r"KAPI_SOCKET_STATE_REQ\s*\(\s*([^)]+)\s*\)")?,
+ socket_state_result_pattern: Regex::new(r"KAPI_SOCKET_STATE_RESULT\s*\(\s*([^)]+)\s*\)")?,
+ socket_state_cond_pattern: Regex::new(r#"KAPI_SOCKET_STATE_COND\s*\(\s*"([^"]*)"\s*\)"#)?,
+ socket_state_protos_pattern: Regex::new(r"KAPI_SOCKET_STATE_PROTOS\s*\(\s*([^)]+)\s*\)")?,
+ protocol_behavior_pattern: Regex::new(r#"KAPI_PROTOCOL_BEHAVIOR\s*\(\s*(\d+)\s*,\s*([^,]+)\s*,\s*"([^"]*(?:\s*"[^"]*)*?)"\s*\)"#)?,
+ protocol_flags_pattern: Regex::new(r#"KAPI_PROTOCOL_FLAGS\s*\(\s*(\d+)\s*,\s*"([^"]*)"\s*\)"#)?,
+ addr_family_pattern: Regex::new(r#"KAPI_ADDR_FAMILY\s*\(\s*(\d+)\s*,\s*([^,]+)\s*,\s*"([^"]+)"\s*,\s*([^,]+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)"#)?,
+ addr_format_pattern: Regex::new(r#"KAPI_ADDR_FORMAT\s*\(\s*"([^"]*)"\s*\)"#)?,
+ addr_features_pattern: Regex::new(r"KAPI_ADDR_FEATURES\s*\(\s*(true|false)\s*,\s*(true|false)\s*,\s*(true|false)\s*\)")?,
+ addr_special_pattern: Regex::new(r#"KAPI_ADDR_SPECIAL\s*\(\s*"([^"]*(?:\s*"[^"]*)*?)"\s*\)"#)?,
+ addr_ports_pattern: Regex::new(r"KAPI_ADDR_PORTS\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)")?,
+ buffer_spec_pattern: Regex::new(r"KAPI_BUFFER_SPEC\s*\(\s*(\d+)\s*\)")?,
+ async_spec_pattern: Regex::new(r"KAPI_ASYNC_SPEC\s*\(\s*([^,]+)\s*,\s*(\d+)\s*\)")?,
+ net_data_transfer_pattern: Regex::new(r#"KAPI_NET_DATA_TRANSFER\s*\(\s*"([^"]*)"\s*\)"#)?,
+ })
+ }
+
+ /// Parse a single source file for KAPI specifications
+ pub fn parse_file(&self, path: &Path) -> Result<Vec<SourceApiSpec>> {
+ let content = fs::read_to_string(path)
+ .with_context(|| format!("Failed to read file: {}", path.display()))?;
+
+ self.parse_content(&content, path)
+ }
+
+ /// Parse file content for KAPI specifications
+ pub fn parse_content(&self, content: &str, _file_path: &Path) -> Result<Vec<SourceApiSpec>> {
+ let mut specs = Vec::new();
+ let lines: Vec<&str> = content.lines().collect();
+
+ // First, look for standard KAPI specs
+ for (i, line) in lines.iter().enumerate() {
+ if let Some(captures) = self.spec_start_pattern.captures(line) {
+ let api_name = captures.get(1).unwrap().as_str().to_string();
+
+ // Find the end of this specification
+ if let Some(spec_content) = self.extract_spec_block(&lines, i) {
+ let mut spec = SourceApiSpec {
+ name: api_name.clone(),
+ api_type: ApiType::from_name(&api_name),
+ parsed_fields: HashMap::new(),
+ };
+
+ // Parse the fields
+ self.parse_spec_fields(&spec_content, &mut spec.parsed_fields)?;
+
+ specs.push(spec);
+ }
+ }
+
+ // Also look for IOCTL specs
+ if let Some(captures) = self.ioctl_spec_pattern.captures(line) {
+ let spec_name = captures.get(1).unwrap().as_str().to_string();
+ let cmd = captures.get(2).unwrap().as_str().to_string();
+ let cmd_name = captures.get(3).unwrap().as_str().to_string();
+
+ // Find the end of this IOCTL specification
+ if let Some(spec_content) = self.extract_ioctl_spec_block(&lines, i) {
+ let mut spec = SourceApiSpec {
+ name: spec_name,
+ api_type: ApiType::Ioctl,
+ parsed_fields: HashMap::new(),
+ };
+
+ // Add IOCTL-specific fields
+ spec.parsed_fields.insert("cmd".to_string(), cmd);
+ spec.parsed_fields.insert("cmd_name".to_string(), cmd_name);
+
+ // Parse other fields
+ self.parse_spec_fields(&spec_content, &mut spec.parsed_fields)?;
+
+ specs.push(spec);
+ }
+ }
+
+ // Also look for SYSFS specs
+ if let Some(captures) = self.sysfs_spec_pattern.captures(line) {
+ let attr_name = captures.get(1).unwrap().as_str().to_string();
+
+ // Find the end of this specification
+ if let Some(spec_content) = self.extract_spec_block(&lines, i) {
+ let mut spec = SourceApiSpec {
+ name: attr_name,
+ api_type: ApiType::Sysfs,
+ parsed_fields: HashMap::new(),
+ };
+
+ // Parse the fields
+ self.parse_spec_fields(&spec_content, &mut spec.parsed_fields)?;
+
+ specs.push(spec);
+ }
+ }
+ }
+
+ Ok(specs)
+ }
+
+ /// Extract a complete KAPI specification block from the source
+ fn extract_spec_block(&self, lines: &[&str], start_idx: usize) -> Option<String> {
+ let mut spec_lines = Vec::new();
+
+ for line in lines.iter().skip(start_idx) {
+ spec_lines.push((*line).to_string());
+
+ // Check for end of spec
+ if self.spec_end_pattern.is_match(line) {
+ return Some(spec_lines.join("\n"));
+ }
+ }
+
+ None
+ }
+
+ /// Extract a complete IOCTL specification block
+ fn extract_ioctl_spec_block(&self, lines: &[&str], start_idx: usize) -> Option<String> {
+ let mut spec_lines = Vec::new();
+ let mut brace_count = 0;
+
+ for (i, line) in lines.iter().enumerate().skip(start_idx) {
+ spec_lines.push((*line).to_string());
+
+ // Count braces
+ for ch in line.chars() {
+ match ch {
+ '{' => brace_count += 1,
+ '}' => brace_count -= 1,
+ _ => {}
+ }
+ }
+
+ // Check for end patterns
+ if line.contains("KAPI_END_IOCTL_SPEC") || line.contains("KAPI_IOCTL_END_SPEC") {
+ return Some(spec_lines.join("\n"));
+ }
+
+ // Alternative end: closing brace with semicolon at top level
+ if brace_count == 0 && line.contains("};") && i > start_idx {
+ return Some(spec_lines.join("\n"));
+ }
+ }
+
+ None
+ }
+
+ /// Parse individual KAPI fields from the specification
+ fn parse_spec_fields(&self, content: &str, fields: &mut HashMap<String, String>) -> Result<()> {
+ // Parse KAPI_DESCRIPTION
+ if let Some(captures) = Regex::new(r#"KAPI_DESCRIPTION\s*\(\s*"([^"]*)"\s*\)"#)?.captures(content) {
+ fields.insert("description".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+
+ // Parse KAPI_LONG_DESC (handle multi-line)
+ if let Some(captures) = Regex::new(r#"KAPI_LONG_DESC\s*\(\s*"([^"]*(?:\s*"[^"]*)*?)"\s*\)"#)?.captures(content) {
+ let long_desc = captures.get(1).unwrap().as_str()
+ .replace("\"\n\t\t \"", " ")
+ .replace("\"\n\t\t \"", " ")
+ .replace("\"\n\t\t \"", " ")
+ .replace("\"\n\t\t \"", " ")
+ .replace("\"\n\t\t \"", " ")
+ .replace("\"\n\t\t\"", " ");
+ fields.insert("long_description".to_string(), long_desc);
+ }
+
+ // Parse KAPI_CONTEXT
+ if let Some(captures) = Regex::new(r"KAPI_CONTEXT\s*\(([^)]+)\)")?.captures(content) {
+ fields.insert("context".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+
+ // Parse KAPI_NOTES (handle multi-line)
+ if let Some(captures) = Regex::new(r#"KAPI_NOTES\s*\(\s*"([^"]*(?:\s*"[^"]*)*?)"\s*\)"#)?.captures(content) {
+ let notes = captures.get(1).unwrap().as_str()
+ .replace("\"\n\t\t \"", "\n")
+ .replace("\"\n\t\t \"", "\n")
+ .replace("\"\n\t\t \"", "\n")
+ .replace("\"\n\t\t\"", "\n")
+ .replace("\\n", "\n")
+ .replace("\\\"", "\"")
+ .trim()
+ .to_string();
+ fields.insert("notes".to_string(), notes);
+ }
+
+ // Parse KAPI_EXAMPLES (handle multi-line)
+ if let Some(captures) = Regex::new(r#"KAPI_EXAMPLES\s*\(\s*"([^"]*(?:\s*"[^"]*)*?)"\s*\)"#)?.captures(content) {
+ let examples = captures.get(1).unwrap().as_str()
+ .replace("\"\n\t\t \"", "")
+ .replace("\"\n\t\t \"", "")
+ .replace("\"\n\t\t \"", "")
+ .replace("\"\n\t\t \"", "")
+ .replace("\"\n\t\t \"", "")
+ .replace("\"\n\t\t \"", "")
+ .replace("\"\n\t\t\"", "")
+ .replace("\\n\\n", "\n\n")
+ .replace("\\n", "\n")
+ .replace("\\\"", "\"")
+ .replace("\\\\", "\\")
+ .trim()
+ .to_string();
+ fields.insert("examples".to_string(), examples);
+ }
+
+ // Parse KAPI_SINCE_VERSION
+ if let Some(captures) = Regex::new(r#"KAPI_SINCE_VERSION\s*\(\s*"([^"]*)"\s*\)"#)?.captures(content) {
+ fields.insert("since_version".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+
+ // Parse parameter count
+ let param_regex = Regex::new(r"KAPI_PARAM\s*\(\s*(\d+)\s*,")?;
+ let mut max_param_idx = 0;
+ for captures in param_regex.captures_iter(content) {
+ if let Ok(idx) = captures.get(1).unwrap().as_str().parse::<usize>() {
+ max_param_idx = max_param_idx.max(idx + 1);
+ }
+ }
+ if max_param_idx > 0 {
+ fields.insert("param_count".to_string(), max_param_idx.to_string());
+ }
+
+ // Parse error count
+ let error_regex = Regex::new(r"KAPI_ERROR\s*\(\s*(\d+)\s*,")?;
+ let mut max_error_idx = 0;
+ for captures in error_regex.captures_iter(content) {
+ if let Ok(idx) = captures.get(1).unwrap().as_str().parse::<usize>() {
+ max_error_idx = max_error_idx.max(idx + 1);
+ }
+ }
+ if max_error_idx > 0 {
+ fields.insert("error_count".to_string(), max_error_idx.to_string());
+ }
+
+ // An explicit `.error_count = N` initializer overrides the count inferred above
+ if content.contains(".error_count =") {
+ if let Some(captures) = Regex::new(r"\.error_count\s*=\s*(\d+)")?.captures(content) {
+ fields.insert("error_count".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+ }
+
+ // Parse capability count
+ if let Some(captures) = Regex::new(r"KAPI_CAPABILITY_COUNT\s*\(\s*(\d+)\s*\)")?.captures(content) {
+ fields.insert("capability_count".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+
+ // Also check for .capability_count = N
+ if content.contains(".capability_count =") {
+ if let Some(captures) = Regex::new(r"\.capability_count\s*=\s*(\d+)")?.captures(content) {
+ fields.insert("capability_count".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+ }
+
+ // Parse capabilities
+ let cap_regex = Regex::new(r#"KAPI_CAPABILITY\s*\(\s*(\d+)\s*,\s*(\d+)\s*,\s*"([^"]+)"\s*,\s*([A-Z_]+)\s*\)"#)?;
+ let mut capabilities = Vec::new();
+ for captures in cap_regex.captures_iter(content) {
+ let idx = captures.get(1).unwrap().as_str().parse::<usize>().unwrap_or(0);
+ let cap_id = captures.get(2).unwrap().as_str();
+ let cap_name = captures.get(3).unwrap().as_str();
+ let cap_action = captures.get(4).unwrap().as_str();
+
+ // Store capability info - we'll parse the details separately
+ let cap_key = format!("capability_{}", idx);
+ fields.insert(format!("{}_id", cap_key), cap_id.to_string());
+ fields.insert(format!("{}_name", cap_key), cap_name.to_string());
+ fields.insert(format!("{}_action", cap_key), cap_action.to_string());
+ capabilities.push(idx);
+ }
+
+ // Pre-compile capability regex patterns
+ let cap_allows_pattern = Regex::new(r#"KAPI_CAP_ALLOWS\s*\(\s*"([^"]*)"\s*\)"#)?;
+ let cap_without_pattern = Regex::new(r#"KAPI_CAP_WITHOUT\s*\(\s*"([^"]*)"\s*\)"#)?;
+ let cap_condition_pattern = Regex::new(r#"KAPI_CAP_CONDITION\s*\(\s*"([^"]*)"\s*\)"#)?;
+ let cap_priority_pattern = Regex::new(r"KAPI_CAP_PRIORITY\s*\(\s*(\d+)\s*\)")?;
+
+ // Parse capability details for each found capability
+ for idx in capabilities {
+ let cap_key = format!("capability_{}", idx);
+
+ // Find the capability block and parse its fields
+ if let Some(cap_start) = content.find(&format!("KAPI_CAPABILITY({},", idx)) {
+ if let Some(cap_end) = content[cap_start..].find("KAPI_CAPABILITY_END") {
+ let cap_content = &content[cap_start..cap_start + cap_end];
+
+ // Parse KAPI_CAP_ALLOWS
+ if let Some(captures) = cap_allows_pattern.captures(cap_content) {
+ fields.insert(format!("{}_allows", cap_key), captures.get(1).unwrap().as_str().to_string());
+ }
+
+ // Parse KAPI_CAP_WITHOUT
+ if let Some(captures) = cap_without_pattern.captures(cap_content) {
+ fields.insert(format!("{}_without", cap_key), captures.get(1).unwrap().as_str().to_string());
+ }
+
+ // Parse KAPI_CAP_CONDITION
+ if let Some(captures) = cap_condition_pattern.captures(cap_content) {
+ fields.insert(format!("{}_condition", cap_key), captures.get(1).unwrap().as_str().to_string());
+ }
+
+ // Parse KAPI_CAP_PRIORITY
+ if let Some(captures) = cap_priority_pattern.captures(cap_content) {
+ fields.insert(format!("{}_priority", cap_key), captures.get(1).unwrap().as_str().to_string());
+ }
+ }
+ }
+ }
+
+ if content.contains(".param_count =") {
+ if let Some(captures) = Regex::new(r"\.param_count\s*=\s*(\d+)")?.captures(content) {
+ fields.insert("param_count".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+ }
+
+ // Parse .since_version
+ if let Some(captures) = Regex::new(r#"\.since_version\s*=\s*"([^"]*)""#)?.captures(content) {
+ fields.insert("since_version".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+
+ // Parse .notes (handle multi-line)
+ if let Some(captures) = Regex::new(r#"\.notes\s*=\s*"([^"]*(?:\s*"[^"]*)*?)""#)?.captures(content) {
+ let notes = captures.get(1).unwrap().as_str()
+ .replace("\"\n\t\t \"", " ")
+ .replace("\"\n\t\t\"", " ")
+ .replace("\"\n\t \"", " ") // Handle single tab + space
+ .trim()
+ .to_string();
+ fields.insert("notes".to_string(), notes);
+ }
+
+ // Parse .examples (handle multi-line)
+ if let Some(captures) = Regex::new(r#"\.examples\s*=\s*"([^"]*(?:\s*"[^"]*)*?)""#)?.captures(content) {
+ let examples = captures.get(1).unwrap().as_str()
+ .replace("\\n\"\n\t\t \"", "\n")
+ .replace("\\n", "\n");
+ fields.insert("examples".to_string(), examples);
+ }
+
+ // Parse sysfs-specific fields
+ // Parse KAPI_SUBSYSTEM
+ if let Some(captures) = Regex::new(r#"KAPI_SUBSYSTEM\s*\(\s*"([^"]*)"\s*\)"#)?.captures(content) {
+ fields.insert("subsystem".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+
+ // Parse .subsystem =
+ if let Some(captures) = Regex::new(r#"\.subsystem\s*=\s*"([^"]*)""#)?.captures(content) {
+ fields.insert("subsystem".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+
+ // Parse KAPI_PATH (for sysfs path)
+ if let Some(captures) = Regex::new(r#"KAPI_PATH\s*\(\s*"([^"]*)"\s*\)"#)?.captures(content) {
+ fields.insert("sysfs_path".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+
+ // Parse KAPI_PERMISSIONS
+ if let Some(captures) = Regex::new(r"KAPI_PERMISSIONS\s*\(\s*(\d+)\s*\)")?.captures(content) {
+ fields.insert("permissions".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+
+ // Parse networking-specific fields
+
+ // Parse socket state fields
+ if let Some(captures) = self.socket_state_req_pattern.captures(content) {
+ fields.insert("socket_state_req".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+ if let Some(captures) = self.socket_state_result_pattern.captures(content) {
+ fields.insert("socket_state_result".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+ if let Some(captures) = self.socket_state_cond_pattern.captures(content) {
+ fields.insert("socket_state_cond".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+ if let Some(captures) = self.socket_state_protos_pattern.captures(content) {
+ fields.insert("socket_state_protos".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+
+ // Parse protocol behaviors
+ let mut protocol_behaviors = Vec::new();
+ for captures in self.protocol_behavior_pattern.captures_iter(content) {
+ let idx = captures.get(1).unwrap().as_str().parse::<usize>().unwrap_or(0);
+ let protos = captures.get(2).unwrap().as_str();
+ let behavior = captures.get(3).unwrap().as_str()
+ .replace("\"\n\t\t\"", " ")
+ .replace("\"\n\t\"", " ");
+
+ fields.insert(format!("protocol_behavior_{}_protos", idx), protos.to_string());
+ fields.insert(format!("protocol_behavior_{}_desc", idx), behavior);
+ protocol_behaviors.push(idx);
+ }
+ if !protocol_behaviors.is_empty() {
+ fields.insert("protocol_behavior_indices".to_string(),
+ protocol_behaviors.iter().map(ToString::to_string).collect::<Vec<_>>().join(","));
+ }
+
+ // Parse protocol flags (associated with behaviors)
+ for captures in self.protocol_flags_pattern.captures_iter(content) {
+ let idx = captures.get(1).unwrap().as_str().parse::<usize>().unwrap_or(0);
+ let flags = captures.get(2).unwrap().as_str();
+ fields.insert(format!("protocol_behavior_{}_flags", idx), flags.to_string());
+ }
+
+ // Parse address families
+ let mut addr_families = Vec::new();
+ for captures in self.addr_family_pattern.captures_iter(content) {
+ let idx = captures.get(1).unwrap().as_str().parse::<usize>().unwrap_or(0);
+ let family = captures.get(2).unwrap().as_str();
+ let name = captures.get(3).unwrap().as_str();
+ let struct_size = captures.get(4).unwrap().as_str();
+ let min_len = captures.get(5).unwrap().as_str();
+ let max_len = captures.get(6).unwrap().as_str();
+
+ fields.insert(format!("addr_family_{}_id", idx), family.to_string());
+ fields.insert(format!("addr_family_{}_name", idx), name.to_string());
+ fields.insert(format!("addr_family_{}_struct_size", idx), struct_size.to_string());
+ fields.insert(format!("addr_family_{}_min_len", idx), min_len.to_string());
+ fields.insert(format!("addr_family_{}_max_len", idx), max_len.to_string());
+ addr_families.push(idx);
+ }
+ if !addr_families.is_empty() {
+ fields.insert("addr_family_indices".to_string(),
+ addr_families.iter().map(ToString::to_string).collect::<Vec<_>>().join(","));
+ }
+
+ // Parse address family details - these appear after KAPI_ADDR_FAMILY within the block
+ for idx in &addr_families {
+ // Find the KAPI_ADDR_FAMILY block for this index
+ if let Some(family_start) = content.find(&format!("KAPI_ADDR_FAMILY({},", idx)) {
+ if let Some(family_end) = content[family_start..].find("KAPI_ADDR_FAMILY_END") {
+ let family_content = &content[family_start..family_start + family_end];
+
+ // Parse KAPI_ADDR_FORMAT
+ if let Some(captures) = self.addr_format_pattern.captures(family_content) {
+ fields.insert(format!("addr_family_{}_format", idx), captures.get(1).unwrap().as_str().to_string());
+ }
+
+ // Parse KAPI_ADDR_FEATURES
+ if let Some(captures) = self.addr_features_pattern.captures(family_content) {
+ fields.insert(format!("addr_family_{}_wildcard", idx), captures.get(1).unwrap().as_str().to_string());
+ fields.insert(format!("addr_family_{}_multicast", idx), captures.get(2).unwrap().as_str().to_string());
+ fields.insert(format!("addr_family_{}_broadcast", idx), captures.get(3).unwrap().as_str().to_string());
+ }
+
+ // Parse KAPI_ADDR_SPECIAL
+ if let Some(captures) = self.addr_special_pattern.captures(family_content) {
+ let special = captures.get(1).unwrap().as_str()
+ .replace("\"\n\t\t\t \"", " ")
+ .replace("\"\n\t\t\t\"", " ");
+ fields.insert(format!("addr_family_{}_special", idx), special);
+ }
+
+ // Parse KAPI_ADDR_PORTS
+ if let Some(captures) = self.addr_ports_pattern.captures(family_content) {
+ fields.insert(format!("addr_family_{}_port_min", idx), captures.get(1).unwrap().as_str().to_string());
+ fields.insert(format!("addr_family_{}_port_max", idx), captures.get(2).unwrap().as_str().to_string());
+ }
+ }
+ }
+ }
+
+ // Parse KAPI_ADDR_FAMILY_COUNT
+ if let Some(captures) = Regex::new(r"KAPI_ADDR_FAMILY_COUNT\s*\(\s*(\d+)\s*\)")?.captures(content) {
+ fields.insert("addr_family_count".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+
+ // Parse KAPI_PROTOCOL_BEHAVIOR_COUNT
+ if let Some(captures) = Regex::new(r"KAPI_PROTOCOL_BEHAVIOR_COUNT\s*\(\s*(\d+)\s*\)")?.captures(content) {
+ fields.insert("protocol_behavior_count".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+
+ // Parse buffer spec
+ if let Some(captures) = self.buffer_spec_pattern.captures(content) {
+ fields.insert("buffer_spec_behaviors".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+
+ // Parse async spec
+ if let Some(captures) = self.async_spec_pattern.captures(content) {
+ fields.insert("async_spec_modes".to_string(), captures.get(1).unwrap().as_str().to_string());
+ fields.insert("async_spec_errno".to_string(), captures.get(2).unwrap().as_str().to_string());
+ }
+
+ // Parse net data transfer
+ if let Some(captures) = self.net_data_transfer_pattern.captures(content) {
+ fields.insert("net_data_transfer".to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+
+ // Parse various count fields that appear in networking specs
+ let count_fields = [
+ ("lock_count", r"KAPI_LOCK_COUNT\s*\(\s*(\d+)\s*\)"),
+ ("signal_count", r"KAPI_SIGNAL_COUNT\s*\(\s*(\d+)\s*\)"),
+ ("side_effect_count", r"KAPI_SIDE_EFFECT_COUNT\s*\(\s*(\d+)\s*\)"),
+ ("state_trans_count", r"KAPI_STATE_TRANS_COUNT\s*\(\s*(\d+)\s*\)"),
+ ("constraint_count", r"KAPI_CONSTRAINT_COUNT\s*\(\s*(\d+)\s*\)"),
+ ];
+
+ for (field_name, pattern) in count_fields.iter() {
+ if let Some(captures) = Regex::new(pattern)?.captures(content) {
+ fields.insert((*field_name).to_string(), captures.get(1).unwrap().as_str().to_string());
+ }
+ }
+
+ // Parse state transitions
+ let state_trans_pattern = Regex::new(r#"KAPI_STATE_TRANS\s*\(\s*(\d+)\s*,\s*"([^"]+)"\s*,\s*\n?\s*"([^"]+)"\s*,\s*"([^"]+)"\s*,\s*\n?\s*"([^"]+)"\s*\)(?s).*?KAPI_STATE_TRANS_END"#)?;
+ let state_trans_cond_pattern = Regex::new(r#"KAPI_STATE_TRANS_COND\s*\(\s*"([^"]*)"\s*\)"#)?;
+ let mut state_transitions = Vec::new();
+ for captures in state_trans_pattern.captures_iter(content) {
+ let idx = captures.get(1).unwrap().as_str().parse::<usize>().unwrap_or(0);
+ let object = captures.get(2).unwrap().as_str();
+ let from_state = captures.get(3).unwrap().as_str();
+ let to_state = captures.get(4).unwrap().as_str();
+ let description = captures.get(5).unwrap().as_str();
+ let block = captures.get(0).unwrap().as_str();
+
+ // Parse condition within the state transition block
+ let condition = state_trans_cond_pattern.captures(block)
+ .and_then(|c| c.get(1))
+ .map(|m| m.as_str())
+ .map(ToString::to_string);
+
+ fields.insert(format!("state_trans_{}_object", idx), object.to_string());
+ fields.insert(format!("state_trans_{}_from", idx), from_state.to_string());
+ fields.insert(format!("state_trans_{}_to", idx), to_state.to_string());
+ if let Some(cond) = condition {
+ fields.insert(format!("state_trans_{}_condition", idx), cond);
+ }
+ fields.insert(format!("state_trans_{}_desc", idx), description.to_string());
+ state_transitions.push(idx);
+ }
+
+ if !state_transitions.is_empty() {
+ fields.insert("state_trans_indices".to_string(),
+ state_transitions.iter().map(ToString::to_string).collect::<Vec<_>>().join(","));
+ }
+
+ // Parse side effects
+ let side_effect_pattern = Regex::new(r#"KAPI_SIDE_EFFECT\s*\(\s*(\d+)\s*,\s*([^,]+)\s*,\s*\n?\s*"([^"]+)"\s*,\s*\n?\s*"([^"]+)"\s*\)(?s).*?KAPI_SIDE_EFFECT_END"#)?;
+ let effect_cond_pattern = Regex::new(r#"KAPI_EFFECT_CONDITION\s*\(\s*"([^"]*)"\s*\)"#)?;
+ let effect_reversible_pattern = Regex::new(r"KAPI_EFFECT_REVERSIBLE")?;
+ let mut side_effects = Vec::new();
+ for captures in side_effect_pattern.captures_iter(content) {
+ let idx = captures.get(1).unwrap().as_str().parse::<usize>().unwrap_or(0);
+ let effect_type = captures.get(2).unwrap().as_str().trim();
+ let target = captures.get(3).unwrap().as_str();
+ let description = captures.get(4).unwrap().as_str();
+ let block = captures.get(0).unwrap().as_str();
+
+ // Parse additional fields within the side effect block
+
+ let condition = effect_cond_pattern.captures(block)
+ .and_then(|c| c.get(1))
+ .map(|m| m.as_str())
+ .map(ToString::to_string);
+
+ let reversible = effect_reversible_pattern.is_match(block);
+
+ fields.insert(format!("side_effect_{}_type", idx), effect_type.to_string());
+ fields.insert(format!("side_effect_{}_target", idx), target.to_string());
+ if let Some(cond) = condition {
+ fields.insert(format!("side_effect_{}_condition", idx), cond);
+ }
+ fields.insert(format!("side_effect_{}_desc", idx), description.to_string());
+ fields.insert(format!("side_effect_{}_reversible", idx), reversible.to_string());
+ side_effects.push(idx);
+ }
+
+ if !side_effects.is_empty() {
+ fields.insert("side_effect_indices".to_string(),
+ side_effects.iter().map(ToString::to_string).collect::<Vec<_>>().join(","));
+ }
+
+ // Parse parameters
+ let param_pattern = Regex::new(r#"KAPI_PARAM\s*\(\s*(\d+)\s*,\s*"([^"]+)"\s*,\s*"([^"]+)"\s*,\s*"([^"]+)"\s*\)(?s).*?KAPI_PARAM_END"#)?;
+ let param_flags_pattern = Regex::new(r"KAPI_PARAM_FLAGS\s*\(\s*([^)]+)\s*\)")?;
+ let param_type_pattern = Regex::new(r"KAPI_PARAM_TYPE\s*\(\s*([^)]+)\s*\)")?;
+ let param_constraint_type_pattern = Regex::new(r"KAPI_PARAM_CONSTRAINT_TYPE\s*\(\s*([^)]+)\s*\)")?;
+ let param_constraint_pattern = Regex::new(r#"KAPI_PARAM_CONSTRAINT\s*\(\s*"([^"]*)"\s*\)"#)?;
+ let param_range_pattern = Regex::new(r"KAPI_PARAM_RANGE\s*\(\s*([^,]+)\s*,\s*([^)]+)\s*\)")?;
+ let mut parameters = Vec::new();
+ for captures in param_pattern.captures_iter(content) {
+ let idx = captures.get(1).unwrap().as_str().parse::<usize>().unwrap_or(0);
+ let name = captures.get(2).unwrap().as_str();
+ let type_name = captures.get(3).unwrap().as_str();
+ let description = captures.get(4).unwrap().as_str();
+ let block = captures.get(0).unwrap().as_str();
+
+ // Parse additional fields within the param block
+
+ let flags = param_flags_pattern.captures(block)
+ .and_then(|c| c.get(1))
+ .map_or_else(String::new, |m| m.as_str().to_string());
+
+ let param_type = param_type_pattern.captures(block)
+ .and_then(|c| c.get(1))
+ .map_or_else(String::new, |m| m.as_str().to_string());
+
+ let constraint_type = param_constraint_type_pattern.captures(block)
+ .and_then(|c| c.get(1))
+ .map_or_else(String::new, |m| m.as_str().to_string());
+
+ let constraint = param_constraint_pattern.captures(block)
+ .and_then(|c| c.get(1))
+ .map(|m| m.as_str())
+ .map(ToString::to_string);
+
+ fields.insert(format!("param_{}_name", idx), name.to_string());
+ fields.insert(format!("param_{}_type", idx), type_name.to_string());
+ fields.insert(format!("param_{}_desc", idx), description.to_string());
+ fields.insert(format!("param_{}_flags", idx), flags);
+ fields.insert(format!("param_{}_param_type", idx), param_type);
+ fields.insert(format!("param_{}_constraint_type", idx), constraint_type);
+ if let Some(con) = constraint {
+ fields.insert(format!("param_{}_constraint", idx), con);
+ }
+
+ if let Some(range_caps) = param_range_pattern.captures(block) {
+ fields.insert(format!("param_{}_min", idx), range_caps.get(1).unwrap().as_str().trim().to_string());
+ fields.insert(format!("param_{}_max", idx), range_caps.get(2).unwrap().as_str().trim().to_string());
+ }
+
+ parameters.push(idx);
+ }
+
+ if !parameters.is_empty() {
+ fields.insert("param_indices".to_string(),
+ parameters.iter().map(ToString::to_string).collect::<Vec<_>>().join(","));
+ }
+
+ // Parse return specification
+ let return_pattern = Regex::new(r#"KAPI_RETURN\s*\(\s*"([^"]+)"\s*,\s*"([^"]+)"\s*\)(?s).*?KAPI_RETURN_END"#)?;
+ if let Some(captures) = return_pattern.captures(content) {
+ let type_name = captures.get(1).unwrap().as_str();
+ let description = captures.get(2).unwrap().as_str();
+ let block = captures.get(0).unwrap().as_str();
+
+ fields.insert("return_type".to_string(), type_name.to_string());
+ fields.insert("return_desc".to_string(), description.to_string());
+
+ // Parse additional return fields
+ let ret_type_pattern = Regex::new(r"KAPI_RETURN_TYPE\s*\(\s*([^)]+)\s*\)")?;
+ let check_type_pattern = Regex::new(r"KAPI_RETURN_CHECK_TYPE\s*\(\s*([^)]+)\s*\)")?;
+ let success_pattern = Regex::new(r"KAPI_RETURN_SUCCESS\s*\(\s*([^)]+)\s*\)")?;
+
+ if let Some(caps) = ret_type_pattern.captures(block) {
+ fields.insert("return_return_type".to_string(), caps.get(1).unwrap().as_str().to_string());
+ }
+ if let Some(caps) = check_type_pattern.captures(block) {
+ fields.insert("return_check_type".to_string(), caps.get(1).unwrap().as_str().to_string());
+ }
+ if let Some(caps) = success_pattern.captures(block) {
+ fields.insert("return_success".to_string(), caps.get(1).unwrap().as_str().trim().to_string());
+ }
+ }
+
+ // Parse errors
+ let error_pattern = Regex::new(r#"KAPI_ERROR\s*\(\s*(\d+)\s*,\s*([^,]+)\s*,\s*"([^"]+)"\s*,\s*"([^"]+)"\s*,\s*\n?\s*"([^"]+)"\s*\)"#)?;
+ let mut errors = Vec::new();
+ for captures in error_pattern.captures_iter(content) {
+ let idx = captures.get(1).unwrap().as_str().parse::<usize>().unwrap_or(0);
+ let error_code = captures.get(2).unwrap().as_str().trim();
+ let name = captures.get(3).unwrap().as_str();
+ let condition = captures.get(4).unwrap().as_str();
+ let description = captures.get(5).unwrap().as_str();
+
+ fields.insert(format!("error_{}_code", idx), error_code.to_string());
+ fields.insert(format!("error_{}_name", idx), name.to_string());
+ fields.insert(format!("error_{}_condition", idx), condition.to_string());
+ fields.insert(format!("error_{}_desc", idx), description.to_string());
+ errors.push(idx);
+ }
+
+ if !errors.is_empty() {
+ fields.insert("error_indices".to_string(),
+ errors.iter().map(ToString::to_string).collect::<Vec<_>>().join(","));
+ }
+
+ // Parse locks
+ let lock_pattern = Regex::new(r#"KAPI_LOCK\s*\(\s*(\d+)\s*,\s*"([^"]+)"\s*,\s*([^)]+)\s*\)(?s).*?KAPI_LOCK_END"#)?;
+ let lock_desc_pattern = Regex::new(r#"KAPI_LOCK_DESC\s*\(\s*"([^"]*)"\s*\)"#)?;
+ let mut locks = Vec::new();
+ for captures in lock_pattern.captures_iter(content) {
+ let idx = captures.get(1).unwrap().as_str().parse::<usize>().unwrap_or(0);
+ let lock_name = captures.get(2).unwrap().as_str();
+ let lock_type = captures.get(3).unwrap().as_str();
+ let block = captures.get(0).unwrap().as_str();
+
+ fields.insert(format!("lock_{}_name", idx), lock_name.to_string());
+ fields.insert(format!("lock_{}_type", idx), lock_type.to_string());
+
+ // Parse lock description
+ if let Some(desc_caps) = lock_desc_pattern.captures(block) {
+ fields.insert(format!("lock_{}_desc", idx), desc_caps.get(1).unwrap().as_str().to_string());
+ }
+
+ // Parse lock flags
+ if block.contains("KAPI_LOCK_HELD_ENTRY") {
+ fields.insert(format!("lock_{}_held_entry", idx), "true".to_string());
+ }
+ if block.contains("KAPI_LOCK_HELD_EXIT") {
+ fields.insert(format!("lock_{}_held_exit", idx), "true".to_string());
+ }
+ if block.contains("KAPI_LOCK_ACQUIRED") {
+ fields.insert(format!("lock_{}_acquired", idx), "true".to_string());
+ }
+ if block.contains("KAPI_LOCK_RELEASED") {
+ fields.insert(format!("lock_{}_released", idx), "true".to_string());
+ }
+
+ locks.push(idx);
+ }
+
+ if !locks.is_empty() {
+ fields.insert("lock_indices".to_string(),
+ locks.iter().map(ToString::to_string).collect::<Vec<_>>().join(","));
+ }
+
+ // Parse constraints
+ let constraint_pattern = Regex::new(r#"KAPI_CONSTRAINT\s*\(\s*(\d+)\s*,\s*"([^"]+)"\s*,\s*\n?\s*"([^"]*(?:\s*"[^"]*)*?)"\s*\)(?s).*?KAPI_CONSTRAINT_END"#)?;
+ let constraint_expr_pattern = Regex::new(r#"KAPI_CONSTRAINT_EXPR\s*\(\s*"([^"]*)"\s*\)"#)?;
+ let mut constraints = Vec::new();
+ for captures in constraint_pattern.captures_iter(content) {
+ let idx = captures.get(1).unwrap().as_str().parse::<usize>().unwrap_or(0);
+ let name = captures.get(2).unwrap().as_str();
+ let description = captures.get(3).unwrap().as_str()
+ .replace("\"\n\t\t\t\"", " ")
+ .replace("\"\n\t\t\"", " ")
+ .replace("\"\n\t\"", " ")
+ .trim()
+ .to_string();
+ let block = captures.get(0).unwrap().as_str();
+
+ fields.insert(format!("constraint_{}_name", idx), name.to_string());
+ fields.insert(format!("constraint_{}_desc", idx), description);
+
+ // Parse constraint expression if present
+ if let Some(expr_caps) = constraint_expr_pattern.captures(block) {
+ fields.insert(format!("constraint_{}_expr", idx), expr_caps.get(1).unwrap().as_str().to_string());
+ }
+
+ constraints.push(idx);
+ }
+
+ if !constraints.is_empty() {
+ fields.insert("constraint_indices".to_string(),
+ constraints.iter().map(ToString::to_string).collect::<Vec<_>>().join(","));
+ }
+
+ Ok(())
+ }
+
+ /// Scan a directory tree for files containing KAPI specifications
+ pub fn scan_directory(&self, dir: &Path, extensions: &[&str]) -> Result<Vec<SourceApiSpec>> {
+ let mut all_specs = Vec::new();
+
+ for entry in WalkDir::new(dir)
+ .follow_links(true)
+ .into_iter()
+ .filter_map(Result::ok)
+ {
+ let path = entry.path();
+
+ // Skip non-files
+ if !path.is_file() {
+ continue;
+ }
+
+ // Check file extension
+ if let Some(ext) = path.extension() {
+ if extensions.iter().any(|&e| ext == e) {
+ // Try to parse the file
+ match self.parse_file(path) {
+ Ok(specs) => {
+ if !specs.is_empty() {
+ all_specs.extend(specs);
+ }
+ }
+ Err(_e) => {} // Skip files that cannot be read or parsed
+ }
+ }
+ }
+ }
+
+ Ok(all_specs)
+ }
+
+}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+ use std::io::Write;
+ use tempfile::NamedTempFile;
+
+ #[test]
+ fn test_parse_syscall_spec() {
+ let parser = SourceParser::new().unwrap();
+
+ let content = r#"
+DEFINE_KERNEL_API_SPEC(sys_mlock)
+ KAPI_DESCRIPTION("Lock pages in memory")
+ KAPI_LONG_DESC("Locks pages in the specified address range into RAM")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "start", "unsigned long", "Starting address")
+ KAPI_PARAM_END
+
+ KAPI_PARAM(1, "len", "size_t", "Length of range")
+ KAPI_PARAM_END
+
+ .param_count = 2,
+ .error_count = 3,
+
+KAPI_END_SPEC
+"#;
+
+ let mut temp_file = NamedTempFile::new().unwrap();
+ write!(temp_file, "{}", content).unwrap();
+
+ let specs = parser.parse_content(content, temp_file.path()).unwrap();
+
+ assert_eq!(specs.len(), 1);
+ assert_eq!(specs[0].name, "sys_mlock");
+ assert_eq!(specs[0].api_type, ApiType::Syscall);
+ assert_eq!(specs[0].parsed_fields.get("description").unwrap(), "Lock pages in memory");
+ assert_eq!(specs[0].parsed_fields.get("param_count").unwrap(), "2");
+ }
+
+ #[test]
+ fn test_parse_ioctl_spec() {
+ let parser = SourceParser::new().unwrap();
+
+ let content = r#"
+DEFINE_IOCTL_API_SPEC(binder_write_read, BINDER_WRITE_READ, "BINDER_WRITE_READ")
+ KAPI_DESCRIPTION("Perform read/write operations on binder")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+ KAPI_PARAM(0, "write_size", "binder_size_t", "Bytes to write")
+ KAPI_PARAM_END
+
+KAPI_END_IOCTL_SPEC
+"#;
+
+ let mut temp_file = NamedTempFile::new().unwrap();
+ write!(temp_file, "{}", content).unwrap();
+
+ let specs = parser.parse_content(content, temp_file.path()).unwrap();
+
+ assert_eq!(specs.len(), 1);
+ assert_eq!(specs[0].name, "binder_write_read");
+ assert_eq!(specs[0].api_type, ApiType::Ioctl);
+ assert_eq!(specs[0].parsed_fields.get("cmd_name").unwrap(), "BINDER_WRITE_READ");
+ }
+
+ #[test]
+ fn test_parse_sysfs_spec() {
+ let parser = SourceParser::new().unwrap();
+
+ let content = r#"
+DEFINE_SYSFS_API_SPEC(nr_requests)
+ KAPI_DESCRIPTION("Number of allocatable requests")
+ KAPI_LONG_DESC("This controls how many requests may be allocated")
+ KAPI_SUBSYSTEM("block")
+ KAPI_PATH("/sys/block/<disk>/queue/nr_requests")
+ KAPI_PERMISSIONS(0644)
+ .param_count = 1,
+KAPI_END_SPEC
+"#;
+
+ let mut temp_file = NamedTempFile::new().unwrap();
+ write!(temp_file, "{}", content).unwrap();
+
+ let specs = parser.parse_content(content, temp_file.path()).unwrap();
+
+ assert_eq!(specs.len(), 1);
+ assert_eq!(specs[0].name, "nr_requests");
+ assert_eq!(specs[0].api_type, ApiType::Sysfs);
+ assert_eq!(specs[0].parsed_fields.get("description").unwrap(), "Number of allocatable requests");
+ assert_eq!(specs[0].parsed_fields.get("subsystem").unwrap(), "block");
+ assert_eq!(specs[0].parsed_fields.get("sysfs_path").unwrap(), "/sys/block/<disk>/queue/nr_requests");
+ assert_eq!(specs[0].parsed_fields.get("permissions").unwrap(), "0644");
+ }
+}
+
+// SourceExtractor implementation
+pub struct SourceExtractor {
+ specs: Vec<SourceApiSpec>,
+}
+
+impl SourceExtractor {
+ pub fn new(path: &str) -> Result<Self> {
+ let parser = SourceParser::new()?;
+ let path_obj = Path::new(path);
+
+ let specs = if path_obj.is_file() {
+ parser.parse_file(path_obj)?
+ } else if path_obj.is_dir() {
+ parser.scan_directory(path_obj, &["c", "h"])?
+ } else {
+ anyhow::bail!("Path does not exist or is not a file or directory: {}", path_obj.display())
+ };
+
+ Ok(SourceExtractor { specs })
+ }
+
+ fn convert_capability_action(action: &str) -> String {
+ match action {
+ "KAPI_CAP_BYPASS_CHECK" => "Bypasses check".to_string(),
+ "KAPI_CAP_INCREASE_LIMIT" => "Increases limit".to_string(),
+ "KAPI_CAP_OVERRIDE_RESTRICTION" => "Overrides restriction".to_string(),
+ "KAPI_CAP_GRANT_PERMISSION" => "Grants permission".to_string(),
+ "KAPI_CAP_MODIFY_BEHAVIOR" => "Modifies behavior".to_string(),
+ "KAPI_CAP_ACCESS_RESOURCE" => "Allows resource access".to_string(),
+ "KAPI_CAP_PERFORM_OPERATION" => "Allows operation".to_string(),
+ _ => action.to_string(),
+ }
+ }
+
+ fn parse_state_transitions(source_spec: &SourceApiSpec) -> Vec<StateTransitionSpec> {
+ let mut transitions = Vec::new();
+
+ if let Some(indices_str) = source_spec.parsed_fields.get("state_trans_indices") {
+ for idx_str in indices_str.split(',') {
+ if let Ok(idx) = idx_str.parse::<usize>() {
+ let object = source_spec.parsed_fields.get(&format!("state_trans_{}_object", idx))
+ .cloned()
+ .unwrap_or_else(String::new);
+ let from_state = source_spec.parsed_fields.get(&format!("state_trans_{}_from", idx))
+ .cloned()
+ .unwrap_or_else(|| "any".to_string());
+ let to_state = source_spec.parsed_fields.get(&format!("state_trans_{}_to", idx))
+ .cloned()
+ .unwrap_or_else(|| "changed".to_string());
+ let condition = source_spec.parsed_fields.get(&format!("state_trans_{}_condition", idx))
+ .cloned();
+ let description = source_spec.parsed_fields.get(&format!("state_trans_{}_desc", idx))
+ .cloned()
+ .unwrap_or_else(String::new);
+
+ transitions.push(StateTransitionSpec {
+ object,
+ from_state,
+ to_state,
+ condition,
+ description,
+ });
+ }
+ }
+ }
+
+ transitions
+ }
+
+ fn parse_side_effects(source_spec: &SourceApiSpec) -> Vec<SideEffectSpec> {
+ let mut effects = Vec::new();
+
+ if let Some(indices_str) = source_spec.parsed_fields.get("side_effect_indices") {
+ for idx_str in indices_str.split(',') {
+ if let Ok(idx) = idx_str.parse::<usize>() {
+ let effect_type_str = source_spec.parsed_fields.get(&format!("side_effect_{}_type", idx))
+ .cloned()
+ .unwrap_or_else(String::new);
+ let target = source_spec.parsed_fields.get(&format!("side_effect_{}_target", idx))
+ .cloned()
+ .unwrap_or_else(String::new);
+ let condition = source_spec.parsed_fields.get(&format!("side_effect_{}_condition", idx))
+ .cloned();
+ let description = source_spec.parsed_fields.get(&format!("side_effect_{}_desc", idx))
+ .cloned()
+ .unwrap_or_else(String::new);
+ let reversible = source_spec.parsed_fields.get(&format!("side_effect_{}_reversible", idx))
+ .is_some_and(|s| s == "true");
+
+ // Convert effect type string to u32
+ let effect_type = Self::parse_effect_type(&effect_type_str);
+
+ effects.push(SideEffectSpec {
+ effect_type,
+ target,
+ condition,
+ description,
+ reversible,
+ });
+ }
+ }
+ }
+
+ effects
+ }
+
+ fn parse_effect_type(effect_type_str: &str) -> u32 {
+ // Parse effect type flags
+ let mut effect_type = 0u32;
+ let parts: Vec<&str> = effect_type_str.split('|').map(str::trim).collect();
+
+ for part in parts {
+ match part {
+ "KAPI_EFFECT_MODIFY_STATE" => effect_type |= 1 << 0,
+ "KAPI_EFFECT_ALLOCATE_MEMORY" => effect_type |= 1 << 1,
+ "KAPI_EFFECT_FREE_MEMORY" => effect_type |= 1 << 2,
+ "KAPI_EFFECT_IO_OPERATION" => effect_type |= 1 << 3,
+ "KAPI_EFFECT_SIGNAL_SEND" => effect_type |= 1 << 4,
+ "KAPI_EFFECT_PROCESS_CREATE" => effect_type |= 1 << 5,
+ "KAPI_EFFECT_PROCESS_TERMINATE" => effect_type |= 1 << 6,
+ "KAPI_EFFECT_FILE_CREATE" => effect_type |= 1 << 7,
+ "KAPI_EFFECT_FILE_DELETE" => effect_type |= 1 << 8,
+ "KAPI_EFFECT_RESOURCE_CREATE" => effect_type |= 1 << 9,
+ "KAPI_EFFECT_RESOURCE_DESTROY" => effect_type |= 1 << 10,
+ "KAPI_EFFECT_LOCK_ACQUIRE" => effect_type |= 1 << 11,
+ "KAPI_EFFECT_LOCK_RELEASE" => effect_type |= 1 << 12,
+ "KAPI_EFFECT_NETWORK_IO" => effect_type |= 1 << 13,
+ "KAPI_EFFECT_SYSTEM_STATE" => effect_type |= 1 << 14,
+ _ => {} // Unknown effect type
+ }
+ }
+
+ effect_type
+ }
+
+ fn parse_parameters(source_spec: &SourceApiSpec) -> Vec<ParamSpec> {
+ let mut params = Vec::new();
+
+ if let Some(indices_str) = source_spec.parsed_fields.get("param_indices") {
+ for idx_str in indices_str.split(',') {
+ if let Ok(idx) = idx_str.parse::<u32>() {
+ let name = source_spec.parsed_fields.get(&format!("param_{}_name", idx))
+ .cloned()
+ .unwrap_or_else(String::new);
+ let type_name = source_spec.parsed_fields.get(&format!("param_{}_type", idx))
+ .cloned()
+ .unwrap_or_else(String::new);
+ let description = source_spec.parsed_fields.get(&format!("param_{}_desc", idx))
+ .cloned()
+ .unwrap_or_else(String::new);
+ let flags_str = source_spec.parsed_fields.get(&format!("param_{}_flags", idx))
+ .cloned()
+ .unwrap_or_else(String::new);
+ let param_type_str = source_spec.parsed_fields.get(&format!("param_{}_param_type", idx))
+ .cloned()
+ .unwrap_or_else(String::new);
+ let constraint_type_str = source_spec.parsed_fields.get(&format!("param_{}_constraint_type", idx))
+ .cloned()
+ .unwrap_or_else(String::new);
+ let constraint = source_spec.parsed_fields.get(&format!("param_{}_constraint", idx))
+ .cloned();
+ let min_value = source_spec.parsed_fields.get(&format!("param_{}_min", idx))
+ .and_then(|s| s.parse::<i64>().ok());
+ let max_value = source_spec.parsed_fields.get(&format!("param_{}_max", idx))
+ .and_then(|s| s.parse::<i64>().ok());
+
+ params.push(ParamSpec {
+ index: idx,
+ name,
+ type_name,
+ description,
+ flags: Self::parse_param_flags(&flags_str),
+ param_type: Self::parse_param_type(&param_type_str),
+ constraint_type: Self::parse_constraint_type(&constraint_type_str),
+ constraint,
+ min_value,
+ max_value,
+ valid_mask: None,
+ enum_values: Vec::new(),
+ size: None,
+ alignment: None,
+ });
+ }
+ }
+ }
+
+ params
+ }
+
+ fn parse_return_spec(source_spec: &SourceApiSpec) -> Option<ReturnSpec> {
+ if let (Some(type_name), Some(description)) = (
+ source_spec.parsed_fields.get("return_type"),
+ source_spec.parsed_fields.get("return_desc")
+ ) {
+ let return_type_str = source_spec.parsed_fields.get("return_return_type")
+ .cloned()
+ .unwrap_or_else(String::new);
+ let check_type_str = source_spec.parsed_fields.get("return_check_type")
+ .cloned()
+ .unwrap_or_else(String::new);
+ let success_value = source_spec.parsed_fields.get("return_success")
+ .and_then(|s| s.parse::<i64>().ok());
+
+ Some(ReturnSpec {
+ type_name: type_name.clone(),
+ description: description.clone(),
+ return_type: Self::parse_return_type(&return_type_str),
+ check_type: Self::parse_check_type(&check_type_str),
+ success_value,
+ success_min: None,
+ success_max: None,
+ error_values: Vec::new(),
+ })
+ } else {
+ None
+ }
+ }
+
+ fn parse_errors(source_spec: &SourceApiSpec) -> Vec<ErrorSpec> {
+ let mut errors = Vec::new();
+
+ if let Some(indices_str) = source_spec.parsed_fields.get("error_indices") {
+ for idx_str in indices_str.split(',') {
+ if let Ok(idx) = idx_str.parse::<usize>() {
+ let error_code_str = source_spec.parsed_fields.get(&format!("error_{}_code", idx))
+ .cloned()
+ .unwrap_or_else(String::new);
+ let name = source_spec.parsed_fields.get(&format!("error_{}_name", idx))
+ .cloned()
+ .unwrap_or_else(String::new);
+ let condition = source_spec.parsed_fields.get(&format!("error_{}_condition", idx))
+ .cloned()
+ .unwrap_or_else(String::new);
+ let description = source_spec.parsed_fields.get(&format!("error_{}_desc", idx))
+ .cloned()
+ .unwrap_or_else(String::new);
+
+ // Parse error code (handle -EINVAL format)
+ let error_code = if error_code_str.starts_with("-E") {
+ // Map common error codes
+ match error_code_str.as_str() {
+ "-EINVAL" => -22,
+ "-ENOMEM" => -12,
+ "-EBUSY" => -16,
+ "-ENODEV" => -19,
+ "-ENOENT" => -2,
+ "-EPERM" => -1,
+ "-EACCES" => -13,
+ "-EFAULT" => -14,
+ "-EAGAIN" => -11,
+ "-EEXIST" => -17,
+ _ => 0,
+ }
+ } else {
+ error_code_str.parse::<i32>().unwrap_or(0)
+ };
+
+ errors.push(ErrorSpec {
+ error_code,
+ name,
+ condition,
+ description,
+ });
+ }
+ }
+ }
+
+ errors
+ }
+
+ fn parse_locks(source_spec: &SourceApiSpec) -> Vec<LockSpec> {
+ let mut locks = Vec::new();
+
+ if let Some(indices_str) = source_spec.parsed_fields.get("lock_indices") {
+ for idx_str in indices_str.split(',') {
+ if let Ok(idx) = idx_str.parse::<usize>() {
+ let lock_name = source_spec.parsed_fields.get(&format!("lock_{}_name", idx))
+ .cloned()
+ .unwrap_or_else(String::new);
+ let lock_type_str = source_spec.parsed_fields.get(&format!("lock_{}_type", idx))
+ .cloned()
+ .unwrap_or_else(String::new);
+ let description = source_spec.parsed_fields.get(&format!("lock_{}_desc", idx))
+ .cloned()
+ .unwrap_or_else(String::new);
+ let held_on_entry = source_spec.parsed_fields.get(&format!("lock_{}_held_entry", idx))
+ .is_some_and(|s| s == "true");
+ let held_on_exit = source_spec.parsed_fields.get(&format!("lock_{}_held_exit", idx))
+ .is_some_and(|s| s == "true");
+ let acquired = source_spec.parsed_fields.get(&format!("lock_{}_acquired", idx))
+ .is_some_and(|s| s == "true");
+ let released = source_spec.parsed_fields.get(&format!("lock_{}_released", idx))
+ .is_some_and(|s| s == "true");
+
+ locks.push(LockSpec {
+ lock_name,
+ lock_type: Self::parse_lock_type(&lock_type_str),
+ acquired,
+ released,
+ held_on_entry,
+ held_on_exit,
+ description,
+ });
+ }
+ }
+ }
+
+ locks
+ }
+
+ fn parse_constraints(source_spec: &SourceApiSpec) -> Vec<ConstraintSpec> {
+ let mut constraints = Vec::new();
+
+ if let Some(indices_str) = source_spec.parsed_fields.get("constraint_indices") {
+ for idx_str in indices_str.split(',') {
+ if let Ok(idx) = idx_str.parse::<usize>() {
+ let name = source_spec.parsed_fields.get(&format!("constraint_{}_name", idx))
+ .cloned()
+ .unwrap_or_else(String::new);
+ let description = source_spec.parsed_fields.get(&format!("constraint_{}_desc", idx))
+ .cloned()
+ .unwrap_or_else(String::new);
+ let expression = source_spec.parsed_fields.get(&format!("constraint_{}_expr", idx))
+ .cloned();
+
+ constraints.push(ConstraintSpec {
+ name,
+ description,
+ expression,
+ });
+ }
+ }
+ }
+
+ constraints
+ }
+
+ fn parse_param_flags(flags_str: &str) -> u32 {
+ let mut flags = 0u32;
+ let parts: Vec<&str> = flags_str.split('|').map(str::trim).collect();
+
+ for part in parts {
+ match part {
+ "KAPI_PARAM_IN" => flags |= 1 << 0,
+ "KAPI_PARAM_OUT" => flags |= 1 << 1,
+ "KAPI_PARAM_INOUT" => flags |= (1 << 0) | (1 << 1),
+ "KAPI_PARAM_USER" => flags |= 1 << 2,
+ "KAPI_PARAM_OPTIONAL" => flags |= 1 << 3,
+ _ => {}
+ }
+ }
+
+ flags
+ }
+
+ fn parse_param_type(type_str: &str) -> u32 {
+ match type_str.trim() {
+ "KAPI_TYPE_INT" => 1,
+ "KAPI_TYPE_UINT" => 2,
+ "KAPI_TYPE_PTR" => 3,
+ "KAPI_TYPE_STRUCT" => 4,
+ "KAPI_TYPE_ENUM" => 5,
+ "KAPI_TYPE_FLAGS" => 6,
+ "KAPI_TYPE_FD" => 7,
+ "KAPI_TYPE_STRING" => 8,
+ _ => 0,
+ }
+ }
+
+ fn parse_constraint_type(type_str: &str) -> u32 {
+ match type_str.trim() {
+ "KAPI_CONSTRAINT_RANGE" => 1,
+ "KAPI_CONSTRAINT_MASK" => 2,
+ "KAPI_CONSTRAINT_ENUM" => 3,
+ "KAPI_CONSTRAINT_SIZE" => 4,
+ "KAPI_CONSTRAINT_ALIGNMENT" => 5,
+ _ => 0, // Default to NONE (includes "KAPI_CONSTRAINT_NONE")
+ }
+ }
+
+ fn parse_return_type(type_str: &str) -> u32 {
+ match type_str.trim() {
+ "KAPI_TYPE_INT" => 1,
+ "KAPI_TYPE_UINT" => 2,
+ "KAPI_TYPE_PTR" => 3,
+ "KAPI_TYPE_FD" => 7,
+ _ => 0,
+ }
+ }
+
+ fn parse_check_type(type_str: &str) -> u32 {
+ match type_str.trim() {
+ "KAPI_RETURN_SUCCESS_CHECK" => 1,
+ "KAPI_RETURN_ERROR_CHECK" => 2,
+ "KAPI_RETURN_RANGE_CHECK" => 3,
+ "KAPI_RETURN_PTR_CHECK" => 4,
+ _ => 0,
+ }
+ }
+
+ fn parse_lock_type(type_str: &str) -> u32 {
+ match type_str.trim() {
+ "KAPI_LOCK_MUTEX" => 1,
+ "KAPI_LOCK_SPINLOCK" => 2,
+ "KAPI_LOCK_RWLOCK" => 3,
+ "KAPI_LOCK_SEMAPHORE" => 4,
+ "KAPI_LOCK_RCU" => 5,
+ _ => 0,
+ }
+ }
+
+ fn parse_context_flags(flags_str: &str) -> Vec<String> {
+ let mut result = Vec::new();
+ let parts: Vec<&str> = flags_str.split('|').map(str::trim).collect();
+
+ for part in parts {
+ match part {
+ "KAPI_CTX_PROCESS" => result.push("Process context".to_string()),
+ "KAPI_CTX_SOFTIRQ" => result.push("Softirq context".to_string()),
+ "KAPI_CTX_HARDIRQ" => result.push("Hardirq context".to_string()),
+ "KAPI_CTX_NMI" => result.push("NMI context".to_string()),
+ "KAPI_CTX_USER" => result.push("User mode".to_string()),
+ "KAPI_CTX_KERNEL" => result.push("Kernel mode".to_string()),
+ "KAPI_CTX_SLEEPABLE" => result.push("May sleep".to_string()),
+ "KAPI_CTX_ATOMIC" => result.push("Atomic context".to_string()),
+ "KAPI_CTX_PREEMPTIBLE" => result.push("Preemptible".to_string()),
+ "KAPI_CTX_MIGRATION_DISABLED" => result.push("Migration disabled".to_string()),
+ _ => {} // Ignore unknown flags
+ }
+ }
+
+ result
+ }
+
+ fn convert_to_api_spec(&self, source_spec: &SourceApiSpec) -> ApiSpec {
+ let mut capabilities = Vec::new();
+
+ // Extract capabilities
+ if let Some(cap_count_str) = source_spec.parsed_fields.get("capability_count") {
+ if let Ok(cap_count) = cap_count_str.parse::<usize>() {
+ for i in 0..cap_count {
+ let cap_key = format!("capability_{}", i);
+
+ if let (Some(id_str), Some(name), Some(action)) = (
+ source_spec.parsed_fields.get(&format!("{}_id", cap_key)),
+ source_spec.parsed_fields.get(&format!("{}_name", cap_key)),
+ source_spec.parsed_fields.get(&format!("{}_action", cap_key))
+ ) {
+ let cap_id = id_str.parse::<i32>().unwrap_or(0);
+ capabilities.push(CapabilitySpec {
+ capability: cap_id,
+ name: name.clone(),
+ action: Self::convert_capability_action(action),
+ allows: source_spec.parsed_fields.get(&format!("{}_allows", cap_key))
+ .cloned()
+ .unwrap_or_default(),
+ without_cap: source_spec.parsed_fields.get(&format!("{}_without", cap_key))
+ .cloned()
+ .unwrap_or_default(),
+ check_condition: source_spec.parsed_fields.get(&format!("{}_condition", cap_key))
+ .cloned(),
+ priority: source_spec.parsed_fields.get(&format!("{}_priority", cap_key))
+ .and_then(|s| s.parse::<u8>().ok()),
+ alternatives: Vec::new(), // Not parsed from source yet
+ });
+ }
+ }
+ }
+ }
+
+ // Parse socket state
+ let socket_state = if source_spec.parsed_fields.contains_key("socket_state_req") ||
+ source_spec.parsed_fields.contains_key("socket_state_result") {
+ Some(SocketStateSpec {
+ required_states: source_spec.parsed_fields.get("socket_state_req")
+ .map(|s| vec![s.clone()])
+ .unwrap_or_default(),
+ forbidden_states: Vec::new(), // Not parsed yet
+ resulting_state: source_spec.parsed_fields.get("socket_state_result").cloned(),
+ condition: source_spec.parsed_fields.get("socket_state_cond").cloned(),
+ applicable_protocols: source_spec.parsed_fields.get("socket_state_protos").cloned(),
+ })
+ } else {
+ None
+ };
+
+ // Parse protocol behaviors
+ let mut protocol_behaviors = Vec::new();
+ if let Some(indices_str) = source_spec.parsed_fields.get("protocol_behavior_indices") {
+ for idx_str in indices_str.split(',') {
+ if let Ok(idx) = idx_str.trim().parse::<usize>() {
+ if let (Some(protos), Some(desc)) = (
+ source_spec.parsed_fields.get(&format!("protocol_behavior_{}_protos", idx)),
+ source_spec.parsed_fields.get(&format!("protocol_behavior_{}_desc", idx))
+ ) {
+ protocol_behaviors.push(ProtocolBehaviorSpec {
+ applicable_protocols: protos.clone(),
+ behavior: desc.clone(),
+ protocol_flags: source_spec.parsed_fields.get(&format!("protocol_behavior_{}_flags", idx)).cloned(),
+ flag_description: None, // Could be enhanced to parse flag descriptions
+ });
+ }
+ }
+ }
+ }
+
+ // Parse address families
+ let mut addr_families = Vec::new();
+ if let Some(indices_str) = source_spec.parsed_fields.get("addr_family_indices") {
+ for idx_str in indices_str.split(',') {
+ if let Ok(idx) = idx_str.trim().parse::<usize>() {
+ if let (Some(family_str), Some(name), Some(struct_size_str), Some(min_len_str), Some(max_len_str)) = (
+ source_spec.parsed_fields.get(&format!("addr_family_{}_id", idx)),
+ source_spec.parsed_fields.get(&format!("addr_family_{}_name", idx)),
+ source_spec.parsed_fields.get(&format!("addr_family_{}_struct_size", idx)),
+ source_spec.parsed_fields.get(&format!("addr_family_{}_min_len", idx)),
+ source_spec.parsed_fields.get(&format!("addr_family_{}_max_len", idx))
+ ) {
+ // The family is either a symbolic AF_* name or a plain integer
+ let family = if family_str.starts_with("AF_") {
+ // Symbolic names cannot be evaluated here, so map the common ones to
+ // their Linux values; unknown names fall back to 0 (AF_UNSPEC)
+ match family_str.as_str() {
+ "AF_UNIX" => 1,
+ "AF_INET" => 2,
+ "AF_INET6" => 10,
+ "AF_NETLINK" => 16,
+ "AF_PACKET" => 17,
+ "AF_BLUETOOTH" => 31,
+ _ => 0,
+ }
+ } else {
+ family_str.parse::<i32>().unwrap_or(0)
+ };
+
+ // sizeof() expressions cannot be evaluated by the static parser
+ let struct_size = if struct_size_str.starts_with("sizeof(") {
+ // Map the common sockaddr structs to hard-coded sizes; anything else becomes 0
+ match struct_size_str.as_str() {
+ "sizeof(struct sockaddr_un)" => 110,
+ "sizeof(struct sockaddr_in)" => 16,
+ "sizeof(struct sockaddr_in6)" => 28,
+ "sizeof(struct sockaddr_nl)" => 12,
+ "sizeof(struct sockaddr_ll)" => 20,
+ "sizeof(struct sockaddr)" => 16, // generic sockaddr
+ _ => 0,
+ }
+ } else {
+ struct_size_str.parse::<usize>().unwrap_or(0)
+ };
+
+ addr_families.push(AddrFamilySpec {
+ family,
+ family_name: name.clone(),
+ addr_struct_size: struct_size,
+ min_addr_len: min_len_str.parse::<usize>().unwrap_or(0),
+ max_addr_len: max_len_str.parse::<usize>().unwrap_or(0),
+ addr_format: source_spec.parsed_fields.get(&format!("addr_family_{}_format", idx)).cloned(),
+ supports_wildcard: source_spec.parsed_fields.get(&format!("addr_family_{}_wildcard", idx))
+ .is_some_and(|s| s == "true"),
+ supports_multicast: source_spec.parsed_fields.get(&format!("addr_family_{}_multicast", idx))
+ .is_some_and(|s| s == "true"),
+ supports_broadcast: source_spec.parsed_fields.get(&format!("addr_family_{}_broadcast", idx))
+ .is_some_and(|s| s == "true"),
+ special_addresses: source_spec.parsed_fields.get(&format!("addr_family_{}_special", idx)).cloned(),
+ port_range_min: source_spec.parsed_fields.get(&format!("addr_family_{}_port_min", idx))
+ .and_then(|s| s.parse::<u32>().ok()).unwrap_or(0),
+ port_range_max: source_spec.parsed_fields.get(&format!("addr_family_{}_port_max", idx))
+ .and_then(|s| s.parse::<u32>().ok()).unwrap_or(0),
+ });
+ }
+ }
+ }
+ }
+
+ // Parse buffer spec
+ let buffer_spec = if source_spec.parsed_fields.contains_key("buffer_spec_behaviors") {
+ Some(BufferSpec {
+ buffer_behaviors: source_spec.parsed_fields.get("buffer_spec_behaviors").cloned(),
+ min_buffer_size: None,
+ max_buffer_size: None,
+ optimal_buffer_size: None,
+ })
+ } else {
+ None
+ };
+
+ // Parse async spec
+ let async_spec = if source_spec.parsed_fields.contains_key("async_spec_modes") {
+ Some(AsyncSpec {
+ supported_modes: source_spec.parsed_fields.get("async_spec_modes").cloned(),
+ nonblock_errno: source_spec.parsed_fields.get("async_spec_errno")
+ .and_then(|s| s.parse::<i32>().ok()),
+ })
+ } else {
+ None
+ };
+
+ ApiSpec {
+ name: source_spec.name.clone(),
+ api_type: match source_spec.api_type {
+ ApiType::Syscall => "syscall".to_string(),
+ ApiType::Ioctl => "ioctl".to_string(),
+ ApiType::Function => "function".to_string(),
+ ApiType::Sysfs => "sysfs".to_string(),
+ ApiType::Unknown => "unknown".to_string(),
+ },
+ description: source_spec.parsed_fields.get("description").cloned(),
+ long_description: source_spec.parsed_fields.get("long_description").cloned(),
+ version: source_spec.parsed_fields.get("version").cloned(),
+ context_flags: source_spec.parsed_fields.get("context")
+ .map(|c| Self::parse_context_flags(c))
+ .unwrap_or_default(),
+ param_count: source_spec.parsed_fields.get("param_count")
+ .and_then(|s| s.parse::<u32>().ok()),
+ error_count: source_spec.parsed_fields.get("error_count")
+ .and_then(|s| s.parse::<u32>().ok()),
+ examples: source_spec.parsed_fields.get("examples").cloned(),
+ notes: source_spec.parsed_fields.get("notes").cloned(),
+ since_version: source_spec.parsed_fields.get("since_version").cloned(),
+ // Sysfs-specific fields
+ subsystem: source_spec.parsed_fields.get("subsystem").cloned(),
+ sysfs_path: source_spec.parsed_fields.get("sysfs_path").cloned(),
+ permissions: source_spec.parsed_fields.get("permissions").cloned(),
+ // Networking-specific fields
+ socket_state,
+ protocol_behaviors,
+ addr_families,
+ buffer_spec,
+ async_spec,
+ net_data_transfer: source_spec.parsed_fields.get("net_data_transfer").cloned(),
+ capabilities,
+ parameters: Self::parse_parameters(source_spec),
+ return_spec: Self::parse_return_spec(source_spec),
+ errors: Self::parse_errors(source_spec),
+ signals: vec![],
+ signal_masks: vec![],
+ side_effects: Self::parse_side_effects(source_spec),
+ state_transitions: Self::parse_state_transitions(source_spec),
+ constraints: Self::parse_constraints(source_spec),
+ locks: Self::parse_locks(source_spec),
+ }
+ }
+}
+
+impl ApiExtractor for SourceExtractor {
+ fn extract_all(&self) -> Result<Vec<ApiSpec>> {
+ Ok(self.specs.iter()
+ .map(|s| self.convert_to_api_spec(s))
+ .collect())
+ }
+
+ fn extract_by_name(&self, name: &str) -> Result<Option<ApiSpec>> {
+ Ok(self.specs.iter()
+ .find(|s| s.name == name)
+ .map(|s| self.convert_to_api_spec(s)))
+ }
+
+ fn display_api_details(
+ &self,
+ api_name: &str,
+ formatter: &mut dyn OutputFormatter,
+ writer: &mut dyn Write,
+ ) -> Result<()> {
+ if let Some(spec) = self.specs.iter().find(|s| s.name == api_name) {
+ let api_spec = self.convert_to_api_spec(spec);
+ display_api_spec(&api_spec, formatter, writer)?;
+ }
+ Ok(())
+ }
+}
\ No newline at end of file
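Aside, not part of the patch: convert_to_api_spec() above walks
comma-separated index lists such as "protocol_behavior_indices" and
"addr_family_indices". Below is a minimal sketch of that one step in
isolation, assuming the list entries may carry surrounding whitespace.
parse_index_list is a hypothetical helper name, not something the patch
defines.

```rust
/// Hypothetical helper (illustration only): turn a comma-separated list of
/// indices into a Vec<usize>. Each entry is trimmed before parsing, and
/// entries that are empty or not a valid usize are skipped.
fn parse_index_list(list: &str) -> Vec<usize> {
    list.split(',')
        .filter_map(|part| part.trim().parse::<usize>().ok())
        .collect()
}
```

Skipping bad entries mirrors the "if let Ok(idx)" pattern used in the
loops above. A stricter tool might prefer to report them instead.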
diff --git a/tools/kapi/src/extractor/vmlinux/binary_utils.rs b/tools/kapi/src/extractor/vmlinux/binary_utils.rs
new file mode 100644
index 0000000000000..e3f5d1e939d86
--- /dev/null
+++ b/tools/kapi/src/extractor/vmlinux/binary_utils.rs
@@ -0,0 +1,283 @@
+
+// Constants for all structure field sizes
+pub mod sizes {
+ pub const NAME: usize = 128;
+ pub const DESC: usize = 512;
+ pub const MAX_PARAMS: usize = 16;
+ pub const MAX_ERRORS: usize = 32;
+ pub const MAX_CONSTRAINTS: usize = 16;
+ pub const MAX_CAPABILITIES: usize = 8;
+ pub const MAX_SIGNALS: usize = 16;
+ pub const MAX_STRUCT_SPECS: usize = 8;
+ pub const MAX_SIDE_EFFECTS: usize = 16;
+ pub const MAX_STATE_TRANS: usize = 16;
+}
+
+// Helper for reading data at specific offsets
+pub struct DataReader<'a> {
+ data: &'a [u8],
+ pos: usize,
+}
+
+impl<'a> DataReader<'a> {
+ pub fn new(data: &'a [u8], offset: usize) -> Self {
+ Self { data, pos: offset }
+ }
+
+ pub fn read_bytes(&mut self, len: usize) -> Option<&'a [u8]> {
+ if self.pos.checked_add(len).is_some_and(|end| end <= self.data.len()) {
+ let bytes = &self.data[self.pos..self.pos + len];
+ self.pos += len;
+ Some(bytes)
+ } else {
+ None
+ }
+ }
+
+ pub fn read_cstring(&mut self, max_len: usize) -> Option<String> {
+ let bytes = self.read_bytes(max_len)?;
+ if let Some(null_pos) = bytes.iter().position(|&b| b == 0) {
+ if null_pos > 0 {
+ if let Ok(s) = std::str::from_utf8(&bytes[..null_pos]) {
+ return Some(s.to_string());
+ }
+ }
+ }
+ None
+ }
+
+ pub fn read_u32(&mut self) -> Option<u32> {
+ let bytes = self.read_bytes(4)?;
+ Some(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
+ }
+
+ pub fn read_u8(&mut self) -> Option<u8> {
+ let bytes = self.read_bytes(1)?;
+ Some(bytes[0])
+ }
+
+ pub fn read_i32(&mut self) -> Option<i32> {
+ let bytes = self.read_bytes(4)?;
+ Some(i32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
+ }
+
+ pub fn read_u64(&mut self) -> Option<u64> {
+ let bytes = self.read_bytes(8)?;
+ Some(u64::from_le_bytes([
+ bytes[0], bytes[1], bytes[2], bytes[3],
+ bytes[4], bytes[5], bytes[6], bytes[7]
+ ]))
+ }
+
+ pub fn read_i64(&mut self) -> Option<i64> {
+ let bytes = self.read_bytes(8)?;
+ Some(i64::from_le_bytes([
+ bytes[0], bytes[1], bytes[2], bytes[3],
+ bytes[4], bytes[5], bytes[6], bytes[7]
+ ]))
+ }
+
+ pub fn skip(&mut self, len: usize) {
+ self.pos = (self.pos + len).min(self.data.len());
+ }
+}
+
+// Structure layout definitions for calculating sizes
+pub fn param_spec_layout_size() -> usize {
+ // Packed structure from struct kapi_param_spec
+ sizes::NAME + // name
+ sizes::NAME + // type_name
+ 4 + // type (enum)
+ 4 + // flags
+ 8 + // size (size_t)
+ 8 + // alignment (size_t)
+ 8 + // min_value
+ 8 + // max_value
+ 8 + // valid_mask
+ 8 + // enum_values pointer
+ 4 + // enum_count
+ 4 + // constraint_type (enum)
+ 8 + // validate function pointer
+ sizes::DESC + // description
+ sizes::DESC + // constraints
+ 4 + // size_param_idx
+ 8 + // size_multiplier (size_t)
+ // sysfs-specific fields
+ sizes::NAME + // sysfs_path
+ 2 + // sysfs_permissions (umode_t)
+ sizes::NAME + // default_value
+ 32 + // units
+ 8 + // step
+ 8 + // allowed_strings pointer
+ 4 // allowed_string_count
+}
+
+pub fn return_spec_layout_size() -> usize {
+ // Packed structure from struct kapi_return_spec
+ sizes::NAME + // type_name
+ 4 + // type (enum)
+ 4 + // check_type (enum)
+ 8 + // success_value
+ 8 + // success_min
+ 8 + // success_max
+ 8 + // error_values pointer
+ 4 + // error_count
+ 8 + // is_success function pointer
+ sizes::DESC // description
+}
+
+pub fn error_spec_layout_size() -> usize {
+ // Packed structure
+ 4 + // code
+ sizes::NAME + // name
+ sizes::DESC * 2 // condition, description
+}
+
+pub fn lock_spec_layout_size() -> usize {
+ // Packed structure
+ sizes::NAME + // name
+ 4 + // lock_type
+ 1 + 1 + 1 + 1 + // bools
+ sizes::DESC // description
+}
+
+pub fn constraint_spec_layout_size() -> usize {
+ // Packed structure
+ sizes::NAME + // name
+ sizes::DESC * 2 // description, expression
+}
+
+pub fn capability_spec_layout_size() -> usize {
+ // Packed structure from struct kapi_capability_spec
+ 4 + // capability (int)
+ sizes::NAME + // cap_name
+ 4 + // action (enum)
+ sizes::DESC + // allows
+ sizes::DESC + // without_cap
+ sizes::DESC + // check_condition
+ 1 + // priority (u8)
+ 4 * sizes::MAX_CAPABILITIES + // alternative array
+ 4 // alternative_count
+}
+
+pub fn signal_spec_layout_size() -> usize {
+ // Packed structure from struct kapi_signal_spec
+ 4 + // signal_num
+ 32 + // signal_name[32]
+ 4 + // direction (u32)
+ 4 + // action (enum)
+ sizes::DESC + // target
+ sizes::DESC + // condition
+ sizes::DESC + // description
+ 1 + // restartable (bool)
+ 4 + // sa_flags_required
+ 4 + // sa_flags_forbidden
+ 4 + // error_on_signal
+ 4 + // transform_to
+ 32 + // timing[32]
+ 1 + // priority (u8)
+ 1 + // interruptible (bool)
+ 128 + // queue_behavior[128]
+ 4 + // state_required
+ 4 // state_forbidden
+}
+
+pub fn signal_mask_spec_layout_size() -> usize {
+ // Packed structure from struct kapi_signal_mask_spec
+ sizes::NAME + // mask_name
+ 4 * sizes::MAX_SIGNALS + // signals array
+ 4 + // signal_count
+ sizes::DESC // description
+}
+
+pub fn struct_field_layout_size() -> usize {
+ // Packed structure from struct kapi_struct_field
+ sizes::NAME + // name
+ 4 + // type (enum)
+ sizes::NAME + // type_name
+ 8 + // offset (size_t)
+ 8 + // size (size_t)
+ 4 + // flags
+ 4 + // constraint_type (enum)
+ 8 + // min_value (s64)
+ 8 + // max_value (s64)
+ 8 + // valid_mask (u64)
+ sizes::DESC // description
+}
+
+pub fn struct_spec_layout_size() -> usize {
+ // Packed structure from struct kapi_struct_spec
+ sizes::NAME + // name
+ 8 + // size (size_t)
+ 8 + // alignment (size_t)
+ 4 + // field_count
+ struct_field_layout_size() * sizes::MAX_PARAMS + // fields array
+ sizes::DESC // description
+}
+
+pub fn side_effect_layout_size() -> usize {
+ // Packed structure from struct kapi_side_effect
+ 4 + // type (u32)
+ sizes::NAME + // target
+ sizes::DESC + // condition
+ sizes::DESC + // description
+ 1 // reversible (bool)
+}
+
+pub fn state_transition_layout_size() -> usize {
+ // Packed structure from struct kapi_state_transition
+ sizes::NAME + // from_state
+ sizes::NAME + // to_state
+ sizes::DESC + // condition
+ sizes::NAME + // object
+ sizes::DESC // description
+}
+
+pub fn socket_state_spec_layout_size() -> usize {
+ // struct kapi_socket_state_spec
+ sizes::NAME * sizes::MAX_CONSTRAINTS + // required_states array
+ sizes::NAME * sizes::MAX_CONSTRAINTS + // forbidden_states array
+ sizes::NAME + // resulting_state
+ sizes::DESC + // condition
+ sizes::NAME + // applicable_protocols
+ 4 + // required_count
+ 4 // forbidden_count
+}
+
+pub fn protocol_behavior_spec_layout_size() -> usize {
+ // struct kapi_protocol_behavior
+ sizes::NAME + // applicable_protocols
+ sizes::DESC + // behavior
+ sizes::NAME + // protocol_flags
+ sizes::DESC // flag_description
+}
+
+pub fn buffer_spec_layout_size() -> usize {
+ // struct kapi_buffer_spec
+ sizes::DESC + // buffer_behaviors
+ 8 + // min_buffer_size (size_t)
+ 8 + // max_buffer_size (size_t)
+ 8 // optimal_buffer_size (size_t)
+}
+
+pub fn async_spec_layout_size() -> usize {
+ // struct kapi_async_spec
+ sizes::NAME + // supported_modes
+ 4 // nonblock_errno (int)
+}
+
+pub fn addr_family_spec_layout_size() -> usize {
+ // struct kapi_addr_family_spec
+ 4 + // family (int)
+ sizes::NAME + // family_name
+ 8 + // addr_struct_size (size_t)
+ 8 + // min_addr_len (size_t)
+ 8 + // max_addr_len (size_t)
+ sizes::DESC + // addr_format
+ 1 + // supports_wildcard (bool)
+ 1 + // supports_multicast (bool)
+ 1 + // supports_broadcast (bool)
+ sizes::DESC + // special_addresses
+ 4 + // port_range_min (u32)
+ 4 // port_range_max (u32)
+}
\ No newline at end of file
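Aside, not part of the patch: DataReader above is a cursor over the raw
.kapi_specs bytes that reads fixed-width fields in declaration order.
Below is a minimal sketch of the same idea with overflow-checked bounds.
FieldReader, take, fixed_cstring and u32_le are hypothetical names chosen
for illustration. Unlike read_cstring() in the patch, this variant returns
an empty string as Some("") rather than None.

```rust
/// Hypothetical cursor over a byte buffer (illustration only).
pub struct FieldReader<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> FieldReader<'a> {
    pub fn new(data: &'a [u8], offset: usize) -> Self {
        Self { data, pos: offset }
    }

    /// Return the next `len` bytes and advance the cursor, or None if the
    /// range runs past the buffer. checked_add also rejects the case where
    /// `pos + len` would overflow usize.
    pub fn take(&mut self, len: usize) -> Option<&'a [u8]> {
        let end = self.pos.checked_add(len)?;
        let bytes = self.data.get(self.pos..end)?;
        self.pos = end;
        Some(bytes)
    }

    /// Consume a fixed-width field and return the text before its first NUL.
    /// Returns None if the field has no NUL or is not valid UTF-8.
    pub fn fixed_cstring(&mut self, width: usize) -> Option<String> {
        let field = self.take(width)?;
        let nul = field.iter().position(|&b| b == 0)?;
        std::str::from_utf8(&field[..nul]).ok().map(str::to_owned)
    }

    /// Read a little-endian u32 and advance by four bytes.
    pub fn u32_le(&mut self) -> Option<u32> {
        let b = self.take(4)?;
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }
}
```

Because every read goes through take(), a truncated or corrupt section
yields None instead of a panic.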
diff --git a/tools/kapi/src/extractor/vmlinux/mod.rs b/tools/kapi/src/extractor/vmlinux/mod.rs
new file mode 100644
index 0000000000000..b04fc6fe5f630
--- /dev/null
+++ b/tools/kapi/src/extractor/vmlinux/mod.rs
@@ -0,0 +1,989 @@
+use anyhow::{Context, Result};
+use goblin::elf::Elf;
+use std::fs;
+use std::io::Write;
+use std::convert::TryInto;
+use crate::formatter::OutputFormatter;
+use super::{ApiExtractor, ApiSpec, CapabilitySpec, ParamSpec, ReturnSpec, ErrorSpec,
+ SignalSpec, SignalMaskSpec, SideEffectSpec, StateTransitionSpec, ConstraintSpec, LockSpec};
+
+mod binary_utils;
+use binary_utils::{sizes, DataReader,
+ param_spec_layout_size, return_spec_layout_size, error_spec_layout_size,
+ lock_spec_layout_size, constraint_spec_layout_size, capability_spec_layout_size,
+ signal_spec_layout_size, signal_mask_spec_layout_size, struct_spec_layout_size,
+ side_effect_layout_size, state_transition_layout_size, socket_state_spec_layout_size,
+ protocol_behavior_spec_layout_size, buffer_spec_layout_size, async_spec_layout_size,
+ addr_family_spec_layout_size};
+
+pub struct VmlinuxExtractor {
+ kapi_data: Vec<u8>,
+ specs: Vec<KapiSpec>,
+}
+
+#[derive(Debug)]
+struct KapiSpec {
+ name: String,
+ api_type: String,
+ offset: usize,
+}
+
+impl VmlinuxExtractor {
+ pub fn new(vmlinux_path: &str) -> Result<Self> {
+ let vmlinux_data = fs::read(vmlinux_path)
+ .with_context(|| format!("Failed to read vmlinux file: {vmlinux_path}"))?;
+
+ let elf = Elf::parse(&vmlinux_data)
+ .context("Failed to parse ELF file")?;
+
+ // Find the .kapi_specs section
+ let kapi_section = elf.section_headers
+ .iter()
+ .find(|sh| {
+ if let Some(name) = elf.shdr_strtab.get_at(sh.sh_name) {
+ name == ".kapi_specs"
+ } else {
+ false
+ }
+ })
+ .context("Could not find .kapi_specs section in vmlinux")?;
+
+ // Find __start_kapi_specs and __stop_kapi_specs symbols
+ let mut start_addr = None;
+ let mut stop_addr = None;
+
+ for sym in &elf.syms {
+ if let Some(name) = elf.strtab.get_at(sym.st_name) {
+ match name {
+ "__start_kapi_specs" => start_addr = Some(sym.st_value),
+ "__stop_kapi_specs" => stop_addr = Some(sym.st_value),
+ _ => {}
+ }
+ }
+ }
+
+ let start = start_addr.context("Could not find __start_kapi_specs symbol")?;
+ let stop = stop_addr.context("Could not find __stop_kapi_specs symbol")?;
+
+ if stop <= start {
+ anyhow::bail!("No kernel API specifications found in vmlinux");
+ }
+
+ // Calculate the offset within the file
+ let section_vaddr = kapi_section.sh_addr;
+ let file_offset = kapi_section.sh_offset + (start - section_vaddr);
+ let data_size: usize = (stop - start)
+ .try_into()
+ .context("Data size too large for platform")?;
+
+ let file_offset_usize: usize = file_offset
+ .try_into()
+ .context("File offset too large for platform")?;
+
+ if file_offset_usize + data_size > vmlinux_data.len() {
+ anyhow::bail!("Invalid offset/size for .kapi_specs data");
+ }
+
+ // Extract the raw data
+ let kapi_data = vmlinux_data[file_offset_usize..(file_offset_usize + data_size)].to_vec();
+
+ // Parse the specifications
+ let specs = parse_kapi_specs(&kapi_data)?;
+
+ Ok(VmlinuxExtractor {
+ kapi_data,
+ specs,
+ })
+ }
+
+}
+
+impl ApiExtractor for VmlinuxExtractor {
+ fn extract_all(&self) -> Result<Vec<ApiSpec>> {
+ // For vmlinux extractor, we return basic info only
+ // Detailed parsing happens in display_api_details
+ Ok(self.specs.iter().map(|spec| {
+ ApiSpec {
+ name: spec.name.clone(),
+ api_type: spec.api_type.clone(),
+ description: None,
+ long_description: None,
+ version: None,
+ context_flags: vec![],
+ param_count: None,
+ error_count: None,
+ examples: None,
+ notes: None,
+ since_version: None,
+ subsystem: None,
+ sysfs_path: None,
+ permissions: None,
+ socket_state: None,
+ protocol_behaviors: vec![],
+ addr_families: vec![],
+ buffer_spec: None,
+ async_spec: None,
+ net_data_transfer: None,
+ capabilities: vec![],
+ parameters: vec![],
+ return_spec: None,
+ errors: vec![],
+ signals: vec![],
+ signal_masks: vec![],
+ side_effects: vec![],
+ state_transitions: vec![],
+ constraints: vec![],
+ locks: vec![],
+ }
+ }).collect())
+ }
+
+ fn extract_by_name(&self, name: &str) -> Result<Option<ApiSpec>> {
+ Ok(self.specs.iter()
+ .find(|s| s.name == name)
+ .map(|spec| ApiSpec {
+ name: spec.name.clone(),
+ api_type: spec.api_type.clone(),
+ description: None,
+ long_description: None,
+ version: None,
+ context_flags: vec![],
+ param_count: None,
+ error_count: None,
+ examples: None,
+ notes: None,
+ since_version: None,
+ subsystem: None,
+ sysfs_path: None,
+ permissions: None,
+ socket_state: None,
+ protocol_behaviors: vec![],
+ addr_families: vec![],
+ buffer_spec: None,
+ async_spec: None,
+ net_data_transfer: None,
+ capabilities: vec![],
+ parameters: vec![],
+ return_spec: None,
+ errors: vec![],
+ signals: vec![],
+ signal_masks: vec![],
+ side_effects: vec![],
+ state_transitions: vec![],
+ constraints: vec![],
+ locks: vec![],
+ }))
+ }
+
+ fn display_api_details(
+ &self,
+ api_name: &str,
+ formatter: &mut dyn OutputFormatter,
+ writer: &mut dyn Write,
+ ) -> Result<()> {
+ if let Some(spec) = self.specs.iter().find(|s| s.name == api_name) {
+ // Parse the binary data into an ApiSpec
+ let api_spec = parse_binary_to_api_spec(&self.kapi_data, spec.offset)?;
+ // Use the common display function
+ super::display_api_spec(&api_spec, formatter, writer)?;
+ }
+ Ok(())
+ }
+}
+
+fn calculate_kernel_api_spec_size() -> usize {
+ // Field-by-field layout of struct kernel_api_spec, kept as documentation; the sums are unused and a hard-coded size is returned
+ // Note: The struct is __attribute__((packed)) in the kernel
+ let _base_size = sizes::NAME + // name (128 bytes)
+ 4 + // api_type (enum, 4 bytes)
+ 4 + // version (u32, 4 bytes)
+ sizes::DESC + // description
+ sizes::DESC * 4 + // long_description
+ 4 + // context_flags
+ 4 + // param_count
+ param_spec_layout_size() * sizes::MAX_PARAMS + // params array
+ return_spec_layout_size() + // return_spec
+ 4 + // error_count
+ error_spec_layout_size() * sizes::MAX_ERRORS + // errors array
+ 4 + // lock_count
+ lock_spec_layout_size() * sizes::MAX_CONSTRAINTS + // locks array
+ 4 + // constraint_count
+ constraint_spec_layout_size() * sizes::MAX_CONSTRAINTS + // constraints array
+ sizes::DESC * 2 + // examples
+ sizes::DESC * 2 + // notes
+ 32 + // since_version[32]
+ 1 + // deprecated (bool)
+ sizes::NAME + // replacement
+ 4 + // signal_count
+ signal_spec_layout_size() * sizes::MAX_SIGNALS + // signals array
+ 4 + // signal_mask_count
+ signal_mask_spec_layout_size() * sizes::MAX_SIGNALS + // signal_masks array
+ 4 + // struct_spec_count
+ struct_spec_layout_size() * sizes::MAX_STRUCT_SPECS + // struct_specs array
+ 4 + // side_effect_count
+ side_effect_layout_size() * sizes::MAX_SIDE_EFFECTS + // side_effects array
+ 4 + // state_trans_count
+ state_transition_layout_size() * sizes::MAX_STATE_TRANS + // state_transitions array
+ 4 + // capability_count
+ capability_spec_layout_size() * sizes::MAX_CAPABILITIES + // capabilities array
+ sizes::NAME + // subsystem
+ sizes::NAME; // device_type
+
+ // Add networking-specific fields (CONFIG_NET)
+ // These are part of the kernel struct when CONFIG_NET is enabled
+ let _net_fields_size =
+ // struct kapi_socket_state_spec socket_state
+ socket_state_spec_layout_size() +
+ // struct kapi_protocol_behavior protocol_behaviors[KAPI_MAX_PROTOCOL_BEHAVIORS]
+ protocol_behavior_spec_layout_size() * 8 + // KAPI_MAX_PROTOCOL_BEHAVIORS = 8
+ 4 + // u32 protocol_behavior_count
+ // struct kapi_buffer_spec buffer_spec
+ buffer_spec_layout_size() +
+ // struct kapi_async_spec async_spec
+ async_spec_layout_size() +
+ // struct kapi_addr_family_spec addr_families[KAPI_MAX_ADDR_FAMILIES]
+ addr_family_spec_layout_size() * 8 + // KAPI_MAX_ADDR_FAMILIES = 8
+ 4 + // u32 addr_family_count
+ // Network operation characteristics (6 bools)
+ 6 + // 6 bool fields
+ // Network semantic descriptions (3 strings)
+ sizes::DESC * 3; // connection_establishment, connection_termination, data_transfer_semantics
+
+ // Add IOCTL-specific fields
+ let _ioctl_fields_size =
+ 4 + // unsigned int cmd
+ sizes::NAME + // char cmd_name[KAPI_MAX_NAME_LEN]
+ 8 + // size_t input_size (assuming 64-bit)
+ 8 + // size_t output_size (assuming 64-bit)
+ sizes::NAME; // char file_ops_name[KAPI_MAX_NAME_LEN]
+
+ // Return the observed kernel struct size (355033 bytes + 7 bytes padding)
+ 355_040
+}
+
+fn parse_kapi_specs(data: &[u8]) -> Result<Vec<KapiSpec>> {
+ let mut specs = Vec::new();
+
+ // Size of one struct kernel_api_spec entry (currently a hard-coded constant)
+ let struct_size = calculate_kernel_api_spec_size();
+
+ let mut offset = 0;
+ while offset + struct_size <= data.len() {
+ // Try to read the name at this offset
+ if let Some(name) = read_cstring(data, offset, sizes::NAME) {
+ if is_valid_api_name(&name) {
+ // Read the api_type enum field (4 bytes, immediately after the name)
+ let api_type_offset = offset + sizes::NAME;
+ let api_type = if api_type_offset + 4 <= data.len() {
+ let api_type_value = u32::from_le_bytes([
+ data[api_type_offset],
+ data[api_type_offset + 1],
+ data[api_type_offset + 2],
+ data[api_type_offset + 3],
+ ]);
+
+ match api_type_value {
+ 0 => "function", // KAPI_API_FUNCTION
+ 1 => "ioctl", // KAPI_API_IOCTL
+ 2 => "sysfs", // KAPI_API_SYSFS
+ _ => "unknown",
+ }
+ } else {
+ "unknown"
+ };
+
+ specs.push(KapiSpec {
+ name: name.to_string(),
+ api_type: api_type.to_string(),
+ offset,
+ });
+ }
+ }
+
+ offset += struct_size;
+ }
+
+ // Handle any remaining data that might be a partial spec
+ if offset < data.len() && data.len() - offset >= sizes::NAME + 4 {
+ if let Some(name) = read_cstring(data, offset, sizes::NAME) {
+ if is_valid_api_name(&name) {
+ // Read the api_type enum field
+ let api_type_offset = offset + sizes::NAME;
+ let api_type = if api_type_offset + 4 <= data.len() {
+ let api_type_value = u32::from_le_bytes([
+ data[api_type_offset],
+ data[api_type_offset + 1],
+ data[api_type_offset + 2],
+ data[api_type_offset + 3],
+ ]);
+
+ match api_type_value {
+ 0 => "function", // KAPI_API_FUNCTION
+ 1 => "ioctl", // KAPI_API_IOCTL
+ 2 => "sysfs", // KAPI_API_SYSFS
+ _ => "unknown",
+ }
+ } else {
+ "unknown"
+ };
+
+ specs.push(KapiSpec {
+ name: name.to_string(),
+ api_type: api_type.to_string(),
+ offset,
+ });
+ }
+ }
+ }
+
+ Ok(specs)
+}
+
+fn read_cstring(data: &[u8], offset: usize, max_len: usize) -> Option<String> {
+ if offset + max_len > data.len() {
+ return None;
+ }
+
+ let bytes = &data[offset..offset + max_len];
+ if let Some(null_pos) = bytes.iter().position(|&b| b == 0) {
+ if null_pos > 0 {
+ if let Ok(s) = std::str::from_utf8(&bytes[..null_pos]) {
+ return Some(s.to_string());
+ }
+ }
+ }
+ None
+}
+
+fn is_valid_api_name(name: &str) -> bool {
+ if name.is_empty() || name.len() > 100 {
+ return false;
+ }
+
+ // Just validate it's a proper identifier since we now use api_type field
+ name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
+}
+
+fn parse_binary_to_api_spec(data: &[u8], offset: usize) -> Result<ApiSpec> {
+ let mut reader = DataReader::new(data, offset);
+
+ // Read name
+ let name = reader.read_cstring(sizes::NAME)
+ .ok_or_else(|| anyhow::anyhow!("Failed to read API name"))?;
+
+ // Read api_type enum
+ let api_type = reader.read_u32()
+ .map(|v| match v {
+ 0 => "function", // KAPI_API_FUNCTION
+ 1 => "ioctl", // KAPI_API_IOCTL
+ 2 => "sysfs", // KAPI_API_SYSFS
+ _ => "unknown",
+ })
+ .unwrap_or("unknown")
+ .to_string();
+
+ // Read version
+ let version = reader.read_u32()
+ .map(|v| v.to_string());
+
+ // Read description
+ let description = reader.read_cstring(sizes::DESC)
+ .filter(|s| !s.is_empty());
+
+ // Read long description
+ let long_description = reader.read_cstring(sizes::DESC * 4)
+ .filter(|s| !s.is_empty());
+
+ // Read context flags
+ const KAPI_CTX_PROCESS: u32 = 1 << 0;
+ const KAPI_CTX_SOFTIRQ: u32 = 1 << 1;
+ const KAPI_CTX_HARDIRQ: u32 = 1 << 2;
+ const KAPI_CTX_NMI: u32 = 1 << 3;
+ const KAPI_CTX_USER: u32 = 1 << 4;
+ const KAPI_CTX_KERNEL: u32 = 1 << 5;
+ const KAPI_CTX_SLEEPABLE: u32 = 1 << 6;
+ const KAPI_CTX_ATOMIC: u32 = 1 << 7;
+ const KAPI_CTX_PREEMPTIBLE: u32 = 1 << 8;
+ const KAPI_CTX_MIGRATION_DISABLED: u32 = 1 << 9;
+
+ let context_flags = if let Some(flags) = reader.read_u32() {
+ let mut flag_strings = Vec::new();
+
+ // Build the flag string similar to source format
+ let mut parts = Vec::new();
+ if flags & KAPI_CTX_PROCESS != 0 { parts.push("KAPI_CTX_PROCESS"); }
+ if flags & KAPI_CTX_SOFTIRQ != 0 { parts.push("KAPI_CTX_SOFTIRQ"); }
+ if flags & KAPI_CTX_HARDIRQ != 0 { parts.push("KAPI_CTX_HARDIRQ"); }
+ if flags & KAPI_CTX_NMI != 0 { parts.push("KAPI_CTX_NMI"); }
+ if flags & KAPI_CTX_USER != 0 { parts.push("KAPI_CTX_USER"); }
+ if flags & KAPI_CTX_KERNEL != 0 { parts.push("KAPI_CTX_KERNEL"); }
+ if flags & KAPI_CTX_SLEEPABLE != 0 { parts.push("KAPI_CTX_SLEEPABLE"); }
+ if flags & KAPI_CTX_ATOMIC != 0 { parts.push("KAPI_CTX_ATOMIC"); }
+ if flags & KAPI_CTX_PREEMPTIBLE != 0 { parts.push("KAPI_CTX_PREEMPTIBLE"); }
+ if flags & KAPI_CTX_MIGRATION_DISABLED != 0 { parts.push("KAPI_CTX_MIGRATION_DISABLED"); }
+
+ if !parts.is_empty() {
+ flag_strings.push(parts.join(" | "));
+ }
+ flag_strings
+ } else {
+ vec![]
+ };
+
+ // Read parameter count
+ let param_count = reader.read_u32();
+
+ // Parse parameters
+ let mut parameters = Vec::new();
+ if let Some(count) = param_count {
+ if count > 0 && count as usize <= sizes::MAX_PARAMS {
+ for i in 0..count {
+ if let Some(mut param) = parse_parameter(&mut reader) {
+ param.index = i;
+ parameters.push(param);
+ }
+ }
+ // Skip remaining slots
+ reader.skip(param_spec_layout_size() * (sizes::MAX_PARAMS - count as usize));
+ } else {
+ reader.skip(param_spec_layout_size() * sizes::MAX_PARAMS);
+ }
+ }
+
+ // Parse return spec
+ let return_spec = parse_return_spec(&mut reader);
+
+ // Read error count
+ let error_count = reader.read_u32();
+
+ // Parse errors
+ let mut errors = Vec::new();
+ if let Some(count) = error_count {
+ if count > 0 && count as usize <= sizes::MAX_ERRORS {
+ for _ in 0..count {
+ if let Some(error) = parse_error(&mut reader) {
+ errors.push(error);
+ }
+ }
+ // Skip remaining slots
+ reader.skip(error_spec_layout_size() * (sizes::MAX_ERRORS - count as usize));
+ } else {
+ reader.skip(error_spec_layout_size() * sizes::MAX_ERRORS);
+ }
+ }
+
+ // Parse locks
+ let mut locks = Vec::new();
+ if let Some(count) = reader.read_u32() {
+ if count > 0 && count as usize <= sizes::MAX_CONSTRAINTS {
+ for _ in 0..count {
+ if let Some(lock) = parse_lock(&mut reader) {
+ locks.push(lock);
+ }
+ }
+ // Skip remaining slots
+ reader.skip(lock_spec_layout_size() * (sizes::MAX_CONSTRAINTS - count as usize));
+ } else {
+ reader.skip(lock_spec_layout_size() * sizes::MAX_CONSTRAINTS);
+ }
+ }
+
+ // Parse constraints
+ let mut constraints = Vec::new();
+ if let Some(count) = reader.read_u32() {
+ if count > 0 && count as usize <= sizes::MAX_CONSTRAINTS {
+ for _ in 0..count {
+ if let Some(constraint) = parse_constraint(&mut reader) {
+ constraints.push(constraint);
+ }
+ }
+ // Skip remaining slots
+ reader.skip(constraint_spec_layout_size() * (sizes::MAX_CONSTRAINTS - count as usize));
+ } else {
+ reader.skip(constraint_spec_layout_size() * sizes::MAX_CONSTRAINTS);
+ }
+ }
+
+ // Read examples
+ let examples = reader.read_cstring(sizes::DESC * 2)
+ .filter(|s| !s.is_empty());
+
+ // Read notes
+ let notes = reader.read_cstring(sizes::DESC * 2)
+ .filter(|s| !s.is_empty());
+
+ // Read since_version
+ let since_version = reader.read_cstring(32)
+ .filter(|s| !s.is_empty());
+
+ // Skip deprecated and replacement
+ reader.skip(1); // deprecated (bool)
+ reader.skip(sizes::NAME); // replacement
+
+ // Parse signals
+ let mut signals = Vec::new();
+ if let Some(count) = reader.read_u32() {
+ if count > 0 && count as usize <= sizes::MAX_SIGNALS {
+ for _ in 0..count {
+ if let Some(signal) = parse_signal(&mut reader) {
+ signals.push(signal);
+ }
+ }
+ // Skip remaining slots
+ reader.skip(signal_spec_layout_size() * (sizes::MAX_SIGNALS - count as usize));
+ } else {
+ reader.skip(signal_spec_layout_size() * sizes::MAX_SIGNALS);
+ }
+ }
+
+ // Parse signal masks
+ let signal_mask_count = reader.read_u32();
+ let mut signal_masks = Vec::new();
+ if let Some(count) = signal_mask_count {
+ if count > 0 && count as usize <= sizes::MAX_SIGNALS {
+ for _ in 0..count {
+ if let Some(mask) = parse_signal_mask(&mut reader) {
+ signal_masks.push(mask);
+ }
+ }
+ // Skip remaining slots
+ reader.skip(signal_mask_spec_layout_size() * (sizes::MAX_SIGNALS - count as usize));
+ } else {
+ reader.skip(signal_mask_spec_layout_size() * sizes::MAX_SIGNALS);
+ }
+ }
+
+ // Skip struct specs; every branch below advances by MAX_STRUCT_SPECS slots in total
+ if let Some(struct_spec_count) = reader.read_u32() {
+ if struct_spec_count > 0 && struct_spec_count as usize <= sizes::MAX_STRUCT_SPECS {
+ reader.skip(struct_spec_layout_size() * struct_spec_count as usize);
+ reader.skip(struct_spec_layout_size() * (sizes::MAX_STRUCT_SPECS - struct_spec_count as usize));
+ } else {
+ reader.skip(struct_spec_layout_size() * sizes::MAX_STRUCT_SPECS);
+ }
+ }
+
+ // Parse side effects
+ let mut side_effects = Vec::new();
+ if let Some(count) = reader.read_u32() {
+ if count > 0 && count as usize <= sizes::MAX_SIDE_EFFECTS {
+ for _ in 0..count {
+ if let Some(effect) = parse_side_effect(&mut reader) {
+ side_effects.push(effect);
+ }
+ }
+ // Skip remaining slots
+ reader.skip(side_effect_layout_size() * (sizes::MAX_SIDE_EFFECTS - count as usize));
+ } else {
+ reader.skip(side_effect_layout_size() * sizes::MAX_SIDE_EFFECTS);
+ }
+ }
+
+ // Parse state transitions
+ let mut state_transitions = Vec::new();
+ if let Some(count) = reader.read_u32() {
+ if count > 0 && count as usize <= sizes::MAX_STATE_TRANS {
+ for _ in 0..count {
+ if let Some(trans) = parse_state_transition(&mut reader) {
+ state_transitions.push(trans);
+ }
+ }
+ // Skip remaining slots
+ reader.skip(state_transition_layout_size() * (sizes::MAX_STATE_TRANS - count as usize));
+ } else {
+ reader.skip(state_transition_layout_size() * sizes::MAX_STATE_TRANS);
+ }
+ }
+
+ // Read capabilities
+ let mut capabilities = Vec::new();
+ if let Some(capability_count) = reader.read_u32() {
+ if capability_count > 0 && capability_count as usize <= sizes::MAX_CAPABILITIES {
+ for _ in 0..capability_count {
+ if let Some(cap) = parse_capability(&mut reader) {
+ capabilities.push(cap);
+ }
+ }
+ // Skip remaining slots
+ reader.skip(capability_spec_layout_size() * (sizes::MAX_CAPABILITIES - capability_count as usize));
+ } else {
+ reader.skip(capability_spec_layout_size() * sizes::MAX_CAPABILITIES);
+ }
+ }
+
+
+ // Sysfs fields not yet available in binary format
+ let subsystem = None;
+ let sysfs_path = None;
+ let permissions = None;
+
+ Ok(ApiSpec {
+ name,
+ api_type,
+ description,
+ long_description,
+ version,
+ context_flags,
+ param_count,
+ error_count,
+ examples,
+ notes,
+ since_version,
+ subsystem,
+ sysfs_path,
+ permissions,
+ socket_state: None,
+ protocol_behaviors: vec![],
+ addr_families: vec![],
+ buffer_spec: None,
+ async_spec: None,
+ net_data_transfer: None,
+ capabilities,
+ parameters,
+ return_spec,
+ errors,
+ signals,
+ signal_masks,
+ side_effects,
+ state_transitions,
+ constraints,
+ locks,
+ })
+}
+
+// Parse a single capability from the binary data
+fn parse_capability(reader: &mut DataReader) -> Option<CapabilitySpec> {
+ let capability = reader.read_i32()?;
+ let cap_name = reader.read_cstring(sizes::NAME)?;
+ let action = reader.read_u32()?;
+ let allows = reader.read_cstring(sizes::DESC).unwrap_or_default();
+ let without_cap = reader.read_cstring(sizes::DESC).unwrap_or_default();
+ let check_condition = reader.read_cstring(sizes::DESC).filter(|s| !s.is_empty());
+ let priority = reader.read_u8();
+
+ // Read alternatives array; 0 and -1 mark unused slots (note that 0 is also CAP_CHOWN)
+ let mut alternatives = Vec::new();
+ for _ in 0..sizes::MAX_CAPABILITIES {
+ if let Some(alt) = reader.read_i32() {
+ if alt != 0 && alt != -1 {
+ alternatives.push(alt);
+ }
+ }
+ }
+
+ let _alternative_count = reader.read_u32();
+
+ // Convert action enum value to string
+ let action_str = match action {
+ 0 => "Bypasses check",
+ 1 => "Increases limit",
+ 2 => "Overrides restriction",
+ 3 => "Grants permission",
+ 4 => "Modifies behavior",
+ 5 => "Allows resource access",
+ 6 => "Allows operation",
+ _ => "Unknown action",
+ }.to_string();
+
+ Some(CapabilitySpec {
+ capability,
+ name: cap_name,
+ action: action_str,
+ allows,
+ without_cap,
+ check_condition,
+ priority,
+ alternatives,
+ })
+}
+
+// Parse a single parameter from the binary data
+fn parse_parameter(reader: &mut DataReader) -> Option<ParamSpec> {
+ let name = reader.read_cstring(sizes::NAME)?;
+ let type_name = reader.read_cstring(sizes::NAME)?;
+ let param_type = reader.read_u32()?;
+ let flags = reader.read_u32()?;
+ let size = reader.read_u64()?;
+ let alignment = reader.read_u64()?;
+ let min_value = reader.read_i64();
+ let max_value = reader.read_i64();
+ let valid_mask = reader.read_u64();
+ reader.skip(8); // enum_values pointer
+ let _enum_count = reader.read_u32()?;
+ let constraint_type = reader.read_u32()?;
+ reader.skip(8); // validate function pointer
+ let description = reader.read_cstring(sizes::DESC).unwrap_or_default();
+ let constraint = reader.read_cstring(sizes::DESC).filter(|s| !s.is_empty());
+ let _size_param_idx = reader.read_i32();
+ let _size_multiplier = reader.read_u64();
+ // Skip sysfs-specific fields
+ reader.skip(sizes::NAME); // sysfs_path
+ reader.skip(2); // sysfs_permissions (umode_t)
+ reader.skip(sizes::NAME); // default_value
+ reader.skip(32); // units
+ reader.skip(8); // step
+ reader.skip(8); // allowed_strings pointer
+ reader.skip(4); // allowed_string_count
+
+ // Placeholder index; the caller overwrites it with the slot position
+ let index = 0;
+
+ Some(ParamSpec {
+ index,
+ name,
+ type_name,
+ description,
+ flags,
+ param_type,
+ constraint_type,
+ constraint,
+ min_value,
+ max_value,
+ valid_mask,
+ enum_values: vec![], // Can't read from binary pointers
+ size: Some(size.try_into().unwrap_or(u32::MAX)),
+ alignment: Some(alignment.try_into().unwrap_or(u32::MAX)),
+ })
+}
+
+// Parse return specification from the binary data
+fn parse_return_spec(reader: &mut DataReader) -> Option<ReturnSpec> {
+ let type_name = reader.read_cstring(sizes::NAME)?;
+ let return_type = reader.read_u32()?;
+ let check_type = reader.read_u32()?;
+ let success_value = reader.read_i64();
+ let success_min = reader.read_i64();
+ let success_max = reader.read_i64();
+ reader.skip(8); // error_values pointer
+ let _error_count = reader.read_u32()?;
+ reader.skip(8); // is_success function pointer
+ let description = reader.read_cstring(sizes::DESC).unwrap_or_default();
+
+ Some(ReturnSpec {
+ type_name,
+ description,
+ return_type,
+ check_type,
+ success_value,
+ success_min,
+ success_max,
+ error_values: vec![], // Can't read from binary pointers
+ })
+}
+
+// Parse a single error specification from the binary data
+fn parse_error(reader: &mut DataReader) -> Option<ErrorSpec> {
+ let error_code = reader.read_i32()?;
+ let name = reader.read_cstring(sizes::NAME)?;
+ let condition = reader.read_cstring(sizes::DESC).unwrap_or_default();
+ let description = reader.read_cstring(sizes::DESC).unwrap_or_default();
+
+ Some(ErrorSpec {
+ error_code,
+ name,
+ condition,
+ description,
+ })
+}
+
+// Parse a single signal specification from the binary data
+fn parse_signal(reader: &mut DataReader) -> Option<SignalSpec> {
+ let signal_num = reader.read_i32()?;
+ let signal_name = reader.read_cstring(32)?; // Fixed size in struct
+ let direction = reader.read_u32()?;
+ let action = reader.read_u32()?;
+ let target = reader.read_cstring(sizes::DESC).filter(|s| !s.is_empty());
+ let condition = reader.read_cstring(sizes::DESC).filter(|s| !s.is_empty());
+ let description = reader.read_cstring(sizes::DESC).filter(|s| !s.is_empty());
+ let restartable = reader.read_u8()? != 0;
+ let sa_flags_required = reader.read_u32()?;
+ let sa_flags_forbidden = reader.read_u32()?;
+ let error_on_signal = reader.read_i32();
+ let _transform_to = reader.read_i32();
+ let timing_str = reader.read_cstring(32)?;
+ let priority = reader.read_u8()? as u32;
+ let interruptible = reader.read_u8()? != 0;
+ let queue = reader.read_cstring(128).filter(|s| !s.is_empty());
+ let state_required = reader.read_u32()?;
+ let state_forbidden = reader.read_u32()?;
+
+ // Convert timing string to enum value
+ let timing = match timing_str.as_str() {
+ "BEFORE" => 0,
+ "AFTER" => 2,
+ "EXIT" => 3,
+ _ => 1, // "DURING" and any unrecognized string map to DURING
+ };
+
+ Some(SignalSpec {
+ signal_num,
+ signal_name,
+ direction,
+ action,
+ target,
+ condition,
+ description,
+ timing,
+ priority,
+ restartable,
+ interruptible,
+ queue,
+ sa_flags: 0, // Not in struct
+ sa_flags_required,
+ sa_flags_forbidden,
+ state_required,
+ state_forbidden,
+ error_on_signal,
+ })
+}
+
+// Parse a single signal mask specification from the binary data
+fn parse_signal_mask(reader: &mut DataReader) -> Option<SignalMaskSpec> {
+ let name = reader.read_cstring(sizes::NAME)?;
+ // Skip signals array
+ reader.skip(4 * sizes::MAX_SIGNALS); // int array
+ let _signal_count = reader.read_u32()?;
+ let description = reader.read_cstring(sizes::DESC).unwrap_or_default();
+
+ Some(SignalMaskSpec {
+ name,
+ description,
+ })
+}
+
+// Parse a single side effect specification from the binary data
+fn parse_side_effect(reader: &mut DataReader) -> Option<SideEffectSpec> {
+ let effect_type = reader.read_u32()?;
+ let target = reader.read_cstring(sizes::NAME)?;
+ let condition = reader.read_cstring(sizes::DESC).filter(|s| !s.is_empty());
+ let description = reader.read_cstring(sizes::DESC).unwrap_or_default();
+ let reversible = reader.read_u8()? != 0;
+
+ Some(SideEffectSpec {
+ effect_type,
+ target,
+ condition,
+ description,
+ reversible,
+ })
+}
+
+// Parse a single state transition specification from the binary data
+fn parse_state_transition(reader: &mut DataReader) -> Option<StateTransitionSpec> {
+ let from_state = reader.read_cstring(sizes::NAME)?;
+ let to_state = reader.read_cstring(sizes::NAME)?;
+ let condition = reader.read_cstring(sizes::DESC).filter(|s| !s.is_empty());
+ let object = reader.read_cstring(sizes::NAME)?;
+ let description = reader.read_cstring(sizes::DESC).unwrap_or_default();
+
+ Some(StateTransitionSpec {
+ object,
+ from_state,
+ to_state,
+ condition,
+ description,
+ })
+}
+
+// Parse a single constraint specification from the binary data
+fn parse_constraint(reader: &mut DataReader) -> Option<ConstraintSpec> {
+ let name = reader.read_cstring(sizes::NAME)?;
+ let description = reader.read_cstring(sizes::DESC).unwrap_or_default();
+ let expression = reader.read_cstring(sizes::DESC).filter(|s| !s.is_empty());
+
+ Some(ConstraintSpec {
+ name,
+ description,
+ expression,
+ })
+}
+
+// Parse a single lock specification from the binary data
+fn parse_lock(reader: &mut DataReader) -> Option<LockSpec> {
+ let lock_name = reader.read_cstring(sizes::NAME)?;
+ let lock_type = reader.read_u32()?;
+ let acquired = reader.read_u8()? != 0;
+ let released = reader.read_u8()? != 0;
+ let held_on_entry = reader.read_u8()? != 0;
+ let held_on_exit = reader.read_u8()? != 0;
+ let description = reader.read_cstring(sizes::DESC).unwrap_or_default();
+
+ Some(LockSpec {
+ lock_name,
+ lock_type,
+ acquired,
+ released,
+ held_on_entry,
+ held_on_exit,
+ description,
+ })
+}
+
+// Old display_api_details_from_binary function removed - now using parse_binary_to_api_spec + display_api_spec
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ #[test]
+ fn test_parse_capability() {
+ // Create mock binary data for a capability
+ let mut data = Vec::new();
+
+ // capability (i32) = 14 (CAP_IPC_LOCK)
+ data.extend_from_slice(&14i32.to_le_bytes());
+
+ // cap_name (128 bytes) = "CAP_IPC_LOCK"
+ let mut name_bytes = b"CAP_IPC_LOCK".to_vec();
+ name_bytes.resize(128, 0);
+ data.extend_from_slice(&name_bytes);
+
+ // action (u32) = 0 (KAPI_CAP_BYPASS_CHECK)
+ data.extend_from_slice(&0u32.to_le_bytes());
+
+ // allows (512 bytes)
+ let mut allows_bytes = b"Bypass RLIMIT_MEMLOCK check entirely".to_vec();
+ allows_bytes.resize(512, 0);
+ data.extend_from_slice(&allows_bytes);
+
+ // without_cap (512 bytes)
+ let mut without_bytes = b"Must stay within RLIMIT_MEMLOCK".to_vec();
+ without_bytes.resize(512, 0);
+ data.extend_from_slice(&without_bytes);
+
+ // check_condition (512 bytes)
+ let mut condition_bytes = b"When memory would exceed limit".to_vec();
+ condition_bytes.resize(512, 0);
+ data.extend_from_slice(&condition_bytes);
+
+ // priority (u8) = 0
+ data.push(0);
+
+ // alternatives (4 * 8 = 32 bytes) - all zeros
+ data.extend_from_slice(&[0u8; 32]);
+
+ // alternative_count (u32) = 0
+ data.extend_from_slice(&0u32.to_le_bytes());
+
+ // Parse the capability
+ let mut reader = DataReader::new(&data, 0);
+ let cap = parse_capability(&mut reader).unwrap();
+
+ assert_eq!(cap.capability, 14);
+ assert_eq!(cap.name, "CAP_IPC_LOCK");
+ assert_eq!(cap.action, "Bypasses check");
+ assert_eq!(cap.allows, "Bypass RLIMIT_MEMLOCK check entirely");
+ assert_eq!(cap.without_cap, "Must stay within RLIMIT_MEMLOCK");
+ assert_eq!(cap.check_condition, Some("When memory would exceed limit".to_string()));
+ assert_eq!(cap.priority, Some(0));
+ assert!(cap.alternatives.is_empty());
+ }
+
+ #[test]
+ fn test_calculate_struct_size() {
+ let size = calculate_kernel_api_spec_size();
+ // The actual kernel struct size is 308064; our calculation gives 308305.
+ // The difference is attributed to alignment/padding and is considered acceptable.
+ assert!(size > 308000 && size < 309000, "Struct size {} is out of expected range", size);
+ }
+}
\ No newline at end of file
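Note on the array parsing above: each array is stored as a u32 count followed by a fixed number of fixed-size slots, and the parser reads `count` entries and then skips the unused slots. The sketch below shows one way to factor that pattern into a helper. It is only an illustration: `Cursor` and `read_fixed_array` are hypothetical names, not part of this patch, and `Cursor` is a stand-in for `DataReader`, whose full interface is not visible here. Unlike the code above, the sketch seeks to the start of each slot before parsing, on the assumption that a parser returning early should not shift the entries after it.

```rust
/// Hypothetical cursor over a byte slice (stand-in for the patch's DataReader).
struct Cursor<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    fn new(data: &'a [u8]) -> Self {
        Cursor { data, pos: 0 }
    }

    /// Read a little-endian u32, advancing only on success.
    fn read_u32(&mut self) -> Option<u32> {
        let end = self.pos.checked_add(4)?;
        let b = self.data.get(self.pos..end)?;
        self.pos = end;
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }

    /// Advance by `n` bytes, clamped to the end of the buffer.
    fn skip(&mut self, n: usize) {
        self.pos = self.pos.saturating_add(n).min(self.data.len());
    }
}

/// Read a u32 count followed by `max_slots` slots of `slot_size` bytes each,
/// parsing only the first `count` slots. Every entry is parsed from the start
/// of its own slot, so a short or failed parse cannot misalign later entries.
fn read_fixed_array<T>(
    cur: &mut Cursor,
    slot_size: usize,
    max_slots: usize,
    parse: impl Fn(&mut Cursor) -> Option<T>,
) -> Vec<T> {
    let mut out = Vec::new();
    let count = match cur.read_u32() {
        Some(c) => c as usize,
        None => return out,
    };
    let base = cur.pos;
    if count <= max_slots {
        for i in 0..count {
            cur.pos = base + i * slot_size;
            if let Some(item) = parse(cur) {
                out.push(item);
            }
        }
    }
    // The array always occupies max_slots slots, populated or not.
    cur.pos = base;
    cur.skip(slot_size * max_slots);
    out
}
```

With a helper of this shape, each "read count, parse entries, skip remaining slots" block would reduce to a single call that passes the slot size, the slot limit, and the per-entry parser.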
diff --git a/tools/kapi/src/formatter/json.rs b/tools/kapi/src/formatter/json.rs
new file mode 100644
index 0000000000000..836741fdcb91b
--- /dev/null
+++ b/tools/kapi/src/formatter/json.rs
@@ -0,0 +1,420 @@
+use super::OutputFormatter;
+use std::io::Write;
+use serde::Serialize;
+use crate::extractor::{SocketStateSpec, ProtocolBehaviorSpec, AddrFamilySpec, BufferSpec, AsyncSpec, CapabilitySpec,
+ ParamSpec, ReturnSpec, ErrorSpec, SignalSpec, SignalMaskSpec, SideEffectSpec, StateTransitionSpec, ConstraintSpec, LockSpec};
+
+pub struct JsonFormatter {
+ data: JsonData,
+}
+
+#[derive(Serialize)]
+struct JsonData {
+ #[serde(skip_serializing_if = "Option::is_none")]
+ apis: Option<Vec<JsonApi>>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ api_details: Option<JsonApiDetails>,
+}
+
+#[derive(Serialize)]
+struct JsonApi {
+ name: String,
+ api_type: String,
+}
+
+#[derive(Serialize)]
+struct JsonApiDetails {
+ name: String,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ description: Option<String>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ long_description: Option<String>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ context_flags: Vec<String>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ examples: Option<String>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ notes: Option<String>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ since_version: Option<String>,
+ // Sysfs-specific fields
+ #[serde(skip_serializing_if = "Option::is_none")]
+ subsystem: Option<String>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ sysfs_path: Option<String>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ permissions: Option<String>,
+ // Networking-specific fields
+ #[serde(skip_serializing_if = "Option::is_none")]
+ socket_state: Option<SocketStateSpec>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ protocol_behaviors: Vec<ProtocolBehaviorSpec>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ addr_families: Vec<AddrFamilySpec>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ buffer_spec: Option<BufferSpec>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ async_spec: Option<AsyncSpec>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ net_data_transfer: Option<String>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ capabilities: Vec<CapabilitySpec>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ state_transitions: Vec<StateTransitionSpec>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ side_effects: Vec<SideEffectSpec>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ parameters: Vec<ParamSpec>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ return_spec: Option<ReturnSpec>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ errors: Vec<ErrorSpec>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ locks: Vec<LockSpec>,
+}
+
+
+impl JsonFormatter {
+ pub fn new() -> Self {
+ JsonFormatter {
+ data: JsonData {
+ apis: None,
+ api_details: None,
+ }
+ }
+ }
+}
+
+impl OutputFormatter for JsonFormatter {
+ fn begin_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_document(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ let json = serde_json::to_string_pretty(&self.data)?;
+ writeln!(w, "{json}")?;
+ Ok(())
+ }
+
+ fn begin_api_list(&mut self, _w: &mut dyn Write, _title: &str) -> std::io::Result<()> {
+ self.data.apis = Some(Vec::new());
+ Ok(())
+ }
+
+ fn api_item(&mut self, _w: &mut dyn Write, name: &str, api_type: &str) -> std::io::Result<()> {
+ if let Some(apis) = &mut self.data.apis {
+ apis.push(JsonApi {
+ name: name.to_string(),
+ api_type: api_type.to_string(),
+ });
+ }
+ Ok(())
+ }
+
+ fn end_api_list(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn total_specs(&mut self, _w: &mut dyn Write, _count: usize) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_api_details(&mut self, _w: &mut dyn Write, name: &str) -> std::io::Result<()> {
+ self.data.api_details = Some(JsonApiDetails {
+ name: name.to_string(),
+ description: None,
+ long_description: None,
+ context_flags: Vec::new(),
+ examples: None,
+ notes: None,
+ since_version: None,
+ subsystem: None,
+ sysfs_path: None,
+ permissions: None,
+ socket_state: None,
+ protocol_behaviors: Vec::new(),
+ addr_families: Vec::new(),
+ buffer_spec: None,
+ async_spec: None,
+ net_data_transfer: None,
+ capabilities: Vec::new(),
+ state_transitions: Vec::new(),
+ side_effects: Vec::new(),
+ parameters: Vec::new(),
+ return_spec: None,
+ errors: Vec::new(),
+ locks: Vec::new(),
+ });
+ Ok(())
+ }
+
+ fn end_api_details(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+
+ fn description(&mut self, _w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.description = Some(desc.to_string());
+ }
+ Ok(())
+ }
+
+ fn long_description(&mut self, _w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.long_description = Some(desc.to_string());
+ }
+ Ok(())
+ }
+
+ fn begin_context_flags(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn context_flag(&mut self, _w: &mut dyn Write, flag: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.context_flags.push(flag.to_string());
+ }
+ Ok(())
+ }
+
+ fn end_context_flags(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_parameters(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+
+ fn end_parameters(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_errors(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_errors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn examples(&mut self, _w: &mut dyn Write, examples: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.examples = Some(examples.to_string());
+ }
+ Ok(())
+ }
+
+ fn notes(&mut self, _w: &mut dyn Write, notes: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.notes = Some(notes.to_string());
+ }
+ Ok(())
+ }
+
+ fn since_version(&mut self, _w: &mut dyn Write, version: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.since_version = Some(version.to_string());
+ }
+ Ok(())
+ }
+
+ fn sysfs_subsystem(&mut self, _w: &mut dyn Write, subsystem: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.subsystem = Some(subsystem.to_string());
+ }
+ Ok(())
+ }
+
+ fn sysfs_path(&mut self, _w: &mut dyn Write, path: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.sysfs_path = Some(path.to_string());
+ }
+ Ok(())
+ }
+
+ fn sysfs_permissions(&mut self, _w: &mut dyn Write, perms: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.permissions = Some(perms.to_string());
+ }
+ Ok(())
+ }
+
+ // Networking-specific methods
+ fn socket_state(&mut self, _w: &mut dyn Write, state: &SocketStateSpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.socket_state = Some(state.clone());
+ }
+ Ok(())
+ }
+
+ fn begin_protocol_behaviors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn protocol_behavior(&mut self, _w: &mut dyn Write, behavior: &ProtocolBehaviorSpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.protocol_behaviors.push(behavior.clone());
+ }
+ Ok(())
+ }
+
+ fn end_protocol_behaviors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_addr_families(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn addr_family(&mut self, _w: &mut dyn Write, family: &AddrFamilySpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.addr_families.push(family.clone());
+ }
+ Ok(())
+ }
+
+ fn end_addr_families(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn buffer_spec(&mut self, _w: &mut dyn Write, spec: &BufferSpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.buffer_spec = Some(spec.clone());
+ }
+ Ok(())
+ }
+
+ fn async_spec(&mut self, _w: &mut dyn Write, spec: &AsyncSpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.async_spec = Some(spec.clone());
+ }
+ Ok(())
+ }
+
+ fn net_data_transfer(&mut self, _w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.net_data_transfer = Some(desc.to_string());
+ }
+ Ok(())
+ }
+
+ fn begin_capabilities(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn capability(&mut self, _w: &mut dyn Write, cap: &CapabilitySpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.capabilities.push(cap.clone());
+ }
+ Ok(())
+ }
+
+ fn end_capabilities(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ // Structured spec methods (signals, signal masks, and constraints are not yet serialized)
+ fn parameter(&mut self, _w: &mut dyn Write, param: &ParamSpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.parameters.push(param.clone());
+ }
+ Ok(())
+ }
+
+ fn return_spec(&mut self, _w: &mut dyn Write, ret: &ReturnSpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.return_spec = Some(ret.clone());
+ }
+ Ok(())
+ }
+
+ fn error(&mut self, _w: &mut dyn Write, error: &ErrorSpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.errors.push(error.clone());
+ }
+ Ok(())
+ }
+
+ fn begin_signals(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn signal(&mut self, _w: &mut dyn Write, _signal: &SignalSpec) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_signals(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_signal_masks(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn signal_mask(&mut self, _w: &mut dyn Write, _mask: &SignalMaskSpec) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_signal_masks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_side_effects(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn side_effect(&mut self, _w: &mut dyn Write, effect: &SideEffectSpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.side_effects.push(effect.clone());
+ }
+ Ok(())
+ }
+
+ fn end_side_effects(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_state_transitions(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn state_transition(&mut self, _w: &mut dyn Write, trans: &StateTransitionSpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.state_transitions.push(trans.clone());
+ }
+ Ok(())
+ }
+
+ fn end_state_transitions(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_constraints(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn constraint(&mut self, _w: &mut dyn Write, _constraint: &ConstraintSpec) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_constraints(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_locks(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn lock(&mut self, _w: &mut dyn Write, lock: &LockSpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.locks.push(lock.clone());
+ }
+ Ok(())
+ }
+
+ fn end_locks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+}
\ No newline at end of file
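Note on the JSON formatter above: its callbacks only accumulate state, and the whole document is written once in `end_document`, whereas a text formatter can write as each callback arrives. The sketch below contrasts the two styles using only the standard library. All names in it (`Formatter`, `Streaming`, `Buffering`, `render`) are hypothetical and cut down for illustration; the real trait is `OutputFormatter`, and the real JSON output goes through serde_json rather than the hand-rolled quoting used here.

```rust
use std::io::{self, Write};

/// Cut-down, hypothetical stand-in for the OutputFormatter trait.
trait Formatter {
    fn api_item(&mut self, w: &mut dyn Write, name: &str) -> io::Result<()>;
    fn end_document(&mut self, w: &mut dyn Write) -> io::Result<()>;
}

/// Writes each item as soon as it arrives.
struct Streaming;

impl Formatter for Streaming {
    fn api_item(&mut self, w: &mut dyn Write, name: &str) -> io::Result<()> {
        writeln!(w, "{}", name)
    }

    fn end_document(&mut self, _w: &mut dyn Write) -> io::Result<()> {
        Ok(())
    }
}

/// Collects items and emits a single document at the end.
struct Buffering {
    names: Vec<String>,
}

impl Formatter for Buffering {
    fn api_item(&mut self, _w: &mut dyn Write, name: &str) -> io::Result<()> {
        self.names.push(name.to_string());
        Ok(())
    }

    fn end_document(&mut self, w: &mut dyn Write) -> io::Result<()> {
        // {:?} yields a quoted, escaped string; adequate for this illustration,
        // but not a substitute for a real JSON serializer.
        let quoted: Vec<String> = self.names.iter().map(|n| format!("{:?}", n)).collect();
        writeln!(w, "[{}]", quoted.join(", "))
    }
}

/// Drive a formatter over a list of names and return what it wrote.
fn render(f: &mut dyn Formatter, names: &[&str]) -> io::Result<String> {
    let mut out: Vec<u8> = Vec::new();
    for name in names {
        f.api_item(&mut out, name)?;
    }
    f.end_document(&mut out)?;
    Ok(String::from_utf8_lossy(&out).into_owned())
}
```

The buffering style makes it possible to emit one well-formed document, at the cost of holding all state until the end; it also means that any callback left as a no-op silently drops that data from the output.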
diff --git a/tools/kapi/src/formatter/mod.rs b/tools/kapi/src/formatter/mod.rs
new file mode 100644
index 0000000000000..ec61827ba47b5
--- /dev/null
+++ b/tools/kapi/src/formatter/mod.rs
@@ -0,0 +1,130 @@
+use std::io::Write;
+use crate::extractor::{SocketStateSpec, ProtocolBehaviorSpec, AddrFamilySpec, BufferSpec, AsyncSpec, CapabilitySpec,
+ ParamSpec, ReturnSpec, ErrorSpec, SignalSpec, SignalMaskSpec, SideEffectSpec, StateTransitionSpec, ConstraintSpec, LockSpec};
+
+mod plain;
+mod json;
+mod rst;
+mod shall;
+
+pub use plain::PlainFormatter;
+pub use json::JsonFormatter;
+pub use rst::RstFormatter;
+pub use shall::ShallFormatter;
+
+
+#[derive(Debug, Clone, Copy, PartialEq)]
+pub enum OutputFormat {
+ Plain,
+ Json,
+ Rst,
+ Shall,
+}
+
+impl std::str::FromStr for OutputFormat {
+ type Err = String;
+
+ fn from_str(s: &str) -> Result<Self, Self::Err> {
+ match s.to_lowercase().as_str() {
+ "plain" => Ok(OutputFormat::Plain),
+ "json" => Ok(OutputFormat::Json),
+ "rst" => Ok(OutputFormat::Rst),
+ "shall" => Ok(OutputFormat::Shall),
+ _ => Err(format!("Unknown output format: {s}")),
+ }
+ }
+}
+
+pub trait OutputFormatter {
+ fn begin_document(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+ fn end_document(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn begin_api_list(&mut self, w: &mut dyn Write, title: &str) -> std::io::Result<()>;
+ fn api_item(&mut self, w: &mut dyn Write, name: &str, api_type: &str) -> std::io::Result<()>;
+ fn end_api_list(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn total_specs(&mut self, w: &mut dyn Write, count: usize) -> std::io::Result<()>;
+
+ fn begin_api_details(&mut self, w: &mut dyn Write, name: &str) -> std::io::Result<()>;
+ fn end_api_details(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+ fn description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()>;
+ fn long_description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()>;
+
+ fn begin_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+ fn context_flag(&mut self, w: &mut dyn Write, flag: &str) -> std::io::Result<()>;
+ fn end_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn begin_parameters(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+ fn parameter(&mut self, w: &mut dyn Write, param: &ParamSpec) -> std::io::Result<()>;
+ fn end_parameters(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn return_spec(&mut self, w: &mut dyn Write, ret: &ReturnSpec) -> std::io::Result<()>;
+
+ fn begin_errors(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+ fn error(&mut self, w: &mut dyn Write, error: &ErrorSpec) -> std::io::Result<()>;
+ fn end_errors(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn examples(&mut self, w: &mut dyn Write, examples: &str) -> std::io::Result<()>;
+ fn notes(&mut self, w: &mut dyn Write, notes: &str) -> std::io::Result<()>;
+ fn since_version(&mut self, w: &mut dyn Write, version: &str) -> std::io::Result<()>;
+
+ // Sysfs-specific methods
+ fn sysfs_subsystem(&mut self, w: &mut dyn Write, subsystem: &str) -> std::io::Result<()>;
+ fn sysfs_path(&mut self, w: &mut dyn Write, path: &str) -> std::io::Result<()>;
+ fn sysfs_permissions(&mut self, w: &mut dyn Write, perms: &str) -> std::io::Result<()>;
+
+ // Networking-specific methods
+ fn socket_state(&mut self, w: &mut dyn Write, state: &SocketStateSpec) -> std::io::Result<()>;
+
+ fn begin_protocol_behaviors(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+ fn protocol_behavior(&mut self, w: &mut dyn Write, behavior: &ProtocolBehaviorSpec) -> std::io::Result<()>;
+ fn end_protocol_behaviors(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn begin_addr_families(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+ fn addr_family(&mut self, w: &mut dyn Write, family: &AddrFamilySpec) -> std::io::Result<()>;
+ fn end_addr_families(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn buffer_spec(&mut self, w: &mut dyn Write, spec: &BufferSpec) -> std::io::Result<()>;
+ fn async_spec(&mut self, w: &mut dyn Write, spec: &AsyncSpec) -> std::io::Result<()>;
+ fn net_data_transfer(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()>;
+
+ fn begin_capabilities(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+ fn capability(&mut self, w: &mut dyn Write, cap: &CapabilitySpec) -> std::io::Result<()>;
+ fn end_capabilities(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ // Signal-related methods
+ fn begin_signals(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+ fn signal(&mut self, w: &mut dyn Write, signal: &SignalSpec) -> std::io::Result<()>;
+ fn end_signals(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn begin_signal_masks(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+ fn signal_mask(&mut self, w: &mut dyn Write, mask: &SignalMaskSpec) -> std::io::Result<()>;
+ fn end_signal_masks(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ // Side effects and state transitions
+ fn begin_side_effects(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+ fn side_effect(&mut self, w: &mut dyn Write, effect: &SideEffectSpec) -> std::io::Result<()>;
+ fn end_side_effects(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn begin_state_transitions(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+ fn state_transition(&mut self, w: &mut dyn Write, trans: &StateTransitionSpec) -> std::io::Result<()>;
+ fn end_state_transitions(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ // Constraints and locks
+ fn begin_constraints(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+ fn constraint(&mut self, w: &mut dyn Write, constraint: &ConstraintSpec) -> std::io::Result<()>;
+ fn end_constraints(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn begin_locks(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+ fn lock(&mut self, w: &mut dyn Write, lock: &LockSpec) -> std::io::Result<()>;
+ fn end_locks(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+}
+
+pub fn create_formatter(format: OutputFormat) -> Box<dyn OutputFormatter> {
+ match format {
+ OutputFormat::Plain => Box::new(PlainFormatter::new()),
+ OutputFormat::Json => Box::new(JsonFormatter::new()),
+ OutputFormat::Rst => Box::new(RstFormatter::new()),
+ OutputFormat::Shall => Box::new(ShallFormatter::new()),
+ }
+}
\ No newline at end of file
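
The trait above is event-style: a caller is expected to invoke begin_*, then the per-item method, then end_*, always passing the writer explicitly. A minimal sketch of that calling pattern follows. It assumes a trimmed three-method stand-in (`MiniFormatter`), a stand-in implementation (`MiniPlain`), and two driver helpers (`render_list`, `render_to_string`). All of these names are hypothetical, as is the "syscall" type string. None of them is part of the patch.

```rust
use std::io::{self, Write};

/// Trimmed-down, hypothetical stand-in for the patch's `OutputFormatter`.
trait MiniFormatter {
    fn begin_api_list(&mut self, w: &mut dyn Write, title: &str) -> io::Result<()>;
    fn api_item(&mut self, w: &mut dyn Write, name: &str, api_type: &str) -> io::Result<()>;
    fn end_api_list(&mut self, w: &mut dyn Write) -> io::Result<()>;
}

/// Stand-in that follows the same layout idea as the plain formatter.
struct MiniPlain;

impl MiniFormatter for MiniPlain {
    fn begin_api_list(&mut self, w: &mut dyn Write, title: &str) -> io::Result<()> {
        writeln!(w, "{title}:")?;
        // One extra dash so the rule also covers the trailing colon.
        writeln!(w, "{}", "-".repeat(title.len() + 1))
    }

    fn api_item(&mut self, w: &mut dyn Write, name: &str, _api_type: &str) -> io::Result<()> {
        writeln!(w, "  {name}")
    }

    fn end_api_list(&mut self, _w: &mut dyn Write) -> io::Result<()> {
        Ok(())
    }
}

/// Hypothetical driver: emits one list by calling begin, item (per name), end.
fn render_list(
    f: &mut dyn MiniFormatter,
    w: &mut dyn Write,
    title: &str,
    names: &[&str],
) -> io::Result<()> {
    f.begin_api_list(w, title)?;
    for name in names {
        // "syscall" is an illustrative type string, not taken from the patch.
        f.api_item(w, name, "syscall")?;
    }
    f.end_api_list(w)
}

/// Renders into an in-memory buffer instead of stdout.
fn render_to_string(title: &str, names: &[&str]) -> String {
    let mut buf: Vec<u8> = Vec::new();
    let mut f: Box<dyn MiniFormatter> = Box::new(MiniPlain);
    render_list(f.as_mut(), &mut buf, title, names).expect("writing to a Vec cannot fail");
    String::from_utf8(buf).expect("formatter emits UTF-8")
}
```

Because every method takes `&mut dyn Write`, the same formatter can target stdout, a file, or a `Vec<u8>`, which is what makes the buffer-based helper above possible.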
diff --git a/tools/kapi/src/formatter/plain.rs b/tools/kapi/src/formatter/plain.rs
new file mode 100644
index 0000000000000..cc78026f20dd1
--- /dev/null
+++ b/tools/kapi/src/formatter/plain.rs
@@ -0,0 +1,465 @@
+use super::OutputFormatter;
+use std::io::Write;
+use crate::extractor::{SocketStateSpec, ProtocolBehaviorSpec, AddrFamilySpec, BufferSpec, AsyncSpec, CapabilitySpec,
+ ParamSpec, ReturnSpec, ErrorSpec, SignalSpec, SignalMaskSpec, SideEffectSpec, StateTransitionSpec, ConstraintSpec, LockSpec};
+
+pub struct PlainFormatter;
+
+impl PlainFormatter {
+ pub fn new() -> Self {
+ PlainFormatter
+ }
+}
+
+impl OutputFormatter for PlainFormatter {
+ fn begin_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_api_list(&mut self, w: &mut dyn Write, title: &str) -> std::io::Result<()> {
+ writeln!(w, "\n{title}:")?;
+ writeln!(w, "{}", "-".repeat(title.len() + 1))
+ }
+
+ fn api_item(&mut self, w: &mut dyn Write, name: &str, _api_type: &str) -> std::io::Result<()> {
+ writeln!(w, " {name}")
+ }
+
+ fn end_api_list(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn total_specs(&mut self, w: &mut dyn Write, count: usize) -> std::io::Result<()> {
+ writeln!(w, "\nTotal specifications found: {count}")
+ }
+
+ fn begin_api_details(&mut self, w: &mut dyn Write, name: &str) -> std::io::Result<()> {
+ writeln!(w, "\nDetailed information for {name}:")?;
+ writeln!(w, "{}=", "=".repeat(25 + name.len()))
+ }
+
+ fn end_api_details(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+
+ fn description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ writeln!(w, "Description: {desc}")
+ }
+
+ fn long_description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ writeln!(w, "\nDetailed Description:")?;
+ writeln!(w, "{desc}")
+ }
+
+ fn begin_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ writeln!(w, "\nExecution Context:")
+ }
+
+ fn context_flag(&mut self, w: &mut dyn Write, flag: &str) -> std::io::Result<()> {
+ writeln!(w, " - {flag}")
+ }
+
+ fn end_context_flags(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_parameters(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nParameters ({count}):")
+ }
+
+ fn parameter(&mut self, w: &mut dyn Write, param: &ParamSpec) -> std::io::Result<()> {
+ writeln!(w, " [{}] {} ({})", param.index, param.name, param.type_name)?;
+ if !param.description.is_empty() {
+ writeln!(w, " {}", param.description)?;
+ }
+
+ // Display flags
+ let mut flags = Vec::new();
+ if param.flags & 0x01 != 0 { flags.push("IN"); }
+ if param.flags & 0x02 != 0 { flags.push("OUT"); }
+ if param.flags & 0x04 != 0 { flags.push("INOUT"); }
+ if param.flags & 0x08 != 0 { flags.push("USER"); }
+ if param.flags & 0x10 != 0 { flags.push("OPTIONAL"); }
+ if !flags.is_empty() {
+ writeln!(w, " Flags: {}", flags.join(" | "))?;
+ }
+
+ // Display constraints
+ if let Some(constraint) = &param.constraint {
+ writeln!(w, " Constraint: {constraint}")?;
+ }
+ if let (Some(min), Some(max)) = (param.min_value, param.max_value) {
+ writeln!(w, " Range: {min} to {max}")?;
+ }
+ if let Some(mask) = param.valid_mask {
+ writeln!(w, " Valid mask: 0x{mask:x}")?;
+ }
+ Ok(())
+ }
+
+ fn end_parameters(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn return_spec(&mut self, w: &mut dyn Write, ret: &ReturnSpec) -> std::io::Result<()> {
+ writeln!(w, "\nReturn Value:")?;
+ writeln!(w, " Type: {}", ret.type_name)?;
+ writeln!(w, " {}", ret.description)?;
+ if let Some(val) = ret.success_value {
+ writeln!(w, " Success value: {val}")?;
+ }
+ if let (Some(min), Some(max)) = (ret.success_min, ret.success_max) {
+ writeln!(w, " Success range: {min} to {max}")?;
+ }
+ Ok(())
+ }
+
+ fn begin_errors(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nPossible Errors ({count}):")
+ }
+
+ fn error(&mut self, w: &mut dyn Write, error: &ErrorSpec) -> std::io::Result<()> {
+ writeln!(w, " {} ({})", error.name, error.error_code)?;
+ if !error.condition.is_empty() {
+ writeln!(w, " Condition: {}", error.condition)?;
+ }
+ if !error.description.is_empty() {
+ writeln!(w, " {}", error.description)?;
+ }
+ Ok(())
+ }
+
+ fn end_errors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn examples(&mut self, w: &mut dyn Write, examples: &str) -> std::io::Result<()> {
+ writeln!(w, "\nExamples:")?;
+ writeln!(w, "{examples}")
+ }
+
+ fn notes(&mut self, w: &mut dyn Write, notes: &str) -> std::io::Result<()> {
+ writeln!(w, "\nNotes:")?;
+ writeln!(w, "{notes}")
+ }
+
+ fn since_version(&mut self, w: &mut dyn Write, version: &str) -> std::io::Result<()> {
+ writeln!(w, "\nAvailable since: {version}")
+ }
+
+ fn sysfs_subsystem(&mut self, w: &mut dyn Write, subsystem: &str) -> std::io::Result<()> {
+ writeln!(w, "Subsystem: {subsystem}")
+ }
+
+ fn sysfs_path(&mut self, w: &mut dyn Write, path: &str) -> std::io::Result<()> {
+ writeln!(w, "Sysfs Path: {path}")
+ }
+
+ fn sysfs_permissions(&mut self, w: &mut dyn Write, perms: &str) -> std::io::Result<()> {
+ writeln!(w, "Permissions: {perms}")
+ }
+
+ // Networking-specific methods
+ fn socket_state(&mut self, w: &mut dyn Write, state: &SocketStateSpec) -> std::io::Result<()> {
+ writeln!(w, "\nSocket State Requirements:")?;
+ if !state.required_states.is_empty() {
+ writeln!(w, " Required states: {:?}", state.required_states)?;
+ }
+ if !state.forbidden_states.is_empty() {
+ writeln!(w, " Forbidden states: {:?}", state.forbidden_states)?;
+ }
+ if let Some(result) = &state.resulting_state {
+ writeln!(w, " Resulting state: {result}")?;
+ }
+ if let Some(cond) = &state.condition {
+ writeln!(w, " Condition: {cond}")?;
+ }
+ if let Some(protos) = &state.applicable_protocols {
+ writeln!(w, " Applicable protocols: {protos}")?;
+ }
+ Ok(())
+ }
+
+ fn begin_protocol_behaviors(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ writeln!(w, "\nProtocol-Specific Behaviors:")
+ }
+
+ fn protocol_behavior(&mut self, w: &mut dyn Write, behavior: &ProtocolBehaviorSpec) -> std::io::Result<()> {
+ writeln!(w, " {} - {}", behavior.applicable_protocols, behavior.behavior)?;
+ if let Some(flags) = &behavior.protocol_flags {
+ writeln!(w, " Flags: {flags}")?;
+ }
+ Ok(())
+ }
+
+ fn end_protocol_behaviors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_addr_families(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ writeln!(w, "\nSupported Address Families:")
+ }
+
+ fn addr_family(&mut self, w: &mut dyn Write, family: &AddrFamilySpec) -> std::io::Result<()> {
+ writeln!(w, " {} ({}):", family.family_name, family.family)?;
+ writeln!(w, " Struct size: {} bytes", family.addr_struct_size)?;
+ writeln!(w, " Address length: {}-{} bytes", family.min_addr_len, family.max_addr_len)?;
+ if let Some(format) = &family.addr_format {
+ writeln!(w, " Format: {format}")?;
+ }
+ writeln!(w, " Features: wildcard={}, multicast={}, broadcast={}",
+ family.supports_wildcard, family.supports_multicast, family.supports_broadcast)?;
+ if let Some(special) = &family.special_addresses {
+ writeln!(w, " Special addresses: {special}")?;
+ }
+ if family.port_range_max > 0 {
+ writeln!(w, " Port range: {}-{}", family.port_range_min, family.port_range_max)?;
+ }
+ Ok(())
+ }
+
+ fn end_addr_families(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn buffer_spec(&mut self, w: &mut dyn Write, spec: &BufferSpec) -> std::io::Result<()> {
+ writeln!(w, "\nBuffer Specification:")?;
+ if let Some(behaviors) = &spec.buffer_behaviors {
+ writeln!(w, " Behaviors: {behaviors}")?;
+ }
+ if let Some(min) = spec.min_buffer_size {
+ writeln!(w, " Min size: {min} bytes")?;
+ }
+ if let Some(max) = spec.max_buffer_size {
+ writeln!(w, " Max size: {max} bytes")?;
+ }
+ if let Some(optimal) = spec.optimal_buffer_size {
+ writeln!(w, " Optimal size: {optimal} bytes")?;
+ }
+ Ok(())
+ }
+
+ fn async_spec(&mut self, w: &mut dyn Write, spec: &AsyncSpec) -> std::io::Result<()> {
+ writeln!(w, "\nAsynchronous Operation:")?;
+ if let Some(modes) = &spec.supported_modes {
+ writeln!(w, " Supported modes: {modes}")?;
+ }
+ if let Some(errno) = spec.nonblock_errno {
+ writeln!(w, " Non-blocking errno: {errno}")?;
+ }
+ Ok(())
+ }
+
+ fn net_data_transfer(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ writeln!(w, "\nNetwork Data Transfer: {desc}")
+ }
+
+ fn begin_capabilities(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ writeln!(w, "\nRequired Capabilities:")
+ }
+
+ fn capability(&mut self, w: &mut dyn Write, cap: &CapabilitySpec) -> std::io::Result<()> {
+ writeln!(w, " {} ({}) - {}", cap.name, cap.capability, cap.action)?;
+ if !cap.allows.is_empty() {
+ writeln!(w, " Allows: {}", cap.allows)?;
+ }
+ if !cap.without_cap.is_empty() {
+ writeln!(w, " Without capability: {}", cap.without_cap)?;
+ }
+ if let Some(cond) = &cap.check_condition {
+ writeln!(w, " Condition: {cond}")?;
+ }
+ Ok(())
+ }
+
+ fn end_capabilities(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ // Signal-related methods
+ fn begin_signals(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nSignal Specifications ({count}):")
+ }
+
+ fn signal(&mut self, w: &mut dyn Write, signal: &SignalSpec) -> std::io::Result<()> {
+ write!(w, " {} ({})", signal.signal_name, signal.signal_num)?;
+
+ // Display direction
+ let direction = match signal.direction {
+ 0 => "SEND",
+ 1 => "RECEIVE",
+ 2 => "HANDLE",
+ 3 => "IGNORE",
+ _ => "UNKNOWN",
+ };
+ write!(w, " - {direction}")?;
+
+ // Display action
+ let action = match signal.action {
+ 0 => "DEFAULT",
+ 1 => "TERMINATE",
+ 2 => "COREDUMP",
+ 3 => "STOP",
+ 4 => "CONTINUE",
+ 5 => "IGNORE",
+ 6 => "CUSTOM",
+ 7 => "DISCARD",
+ _ => "UNKNOWN",
+ };
+ writeln!(w, " - {action}")?;
+
+ if let Some(target) = &signal.target {
+ writeln!(w, " Target: {target}")?;
+ }
+ if let Some(condition) = &signal.condition {
+ writeln!(w, " Condition: {condition}")?;
+ }
+ if let Some(desc) = &signal.description {
+ writeln!(w, " {desc}")?;
+ }
+
+ // Display timing
+ let timing = match signal.timing {
+ 0 => "BEFORE",
+ 1 => "DURING",
+ 2 => "AFTER",
+ 3 => "EXIT",
+ _ => "UNKNOWN",
+ };
+ writeln!(w, " Timing: {timing}")?;
+ writeln!(w, " Priority: {}", signal.priority)?;
+
+ if signal.restartable {
+ writeln!(w, " Restartable: yes")?;
+ }
+ if signal.interruptible {
+ writeln!(w, " Interruptible: yes")?;
+ }
+ if let Some(error) = signal.error_on_signal {
+ writeln!(w, " Error on signal: {error}")?;
+ }
+ Ok(())
+ }
+
+ fn end_signals(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_signal_masks(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nSignal Masks ({count}):")
+ }
+
+ fn signal_mask(&mut self, w: &mut dyn Write, mask: &SignalMaskSpec) -> std::io::Result<()> {
+ writeln!(w, " {}", mask.name)?;
+ if !mask.description.is_empty() {
+ writeln!(w, " {}", mask.description)?;
+ }
+ Ok(())
+ }
+
+ fn end_signal_masks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ // Side effects and state transitions
+ fn begin_side_effects(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nSide Effects ({count}):")
+ }
+
+ fn side_effect(&mut self, w: &mut dyn Write, effect: &SideEffectSpec) -> std::io::Result<()> {
+ writeln!(w, " {} - {}", effect.target, effect.description)?;
+ if let Some(condition) = &effect.condition {
+ writeln!(w, " Condition: {condition}")?;
+ }
+ if effect.reversible {
+ writeln!(w, " Reversible: yes")?;
+ }
+ Ok(())
+ }
+
+ fn end_side_effects(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_state_transitions(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nState Transitions ({count}):")
+ }
+
+ fn state_transition(&mut self, w: &mut dyn Write, trans: &StateTransitionSpec) -> std::io::Result<()> {
+ writeln!(w, " {} : {} -> {}", trans.object, trans.from_state, trans.to_state)?;
+ if let Some(condition) = &trans.condition {
+ writeln!(w, " Condition: {condition}")?;
+ }
+ if !trans.description.is_empty() {
+ writeln!(w, " {}", trans.description)?;
+ }
+ Ok(())
+ }
+
+ fn end_state_transitions(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ // Constraints and locks
+ fn begin_constraints(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nAdditional Constraints ({count}):")
+ }
+
+ fn constraint(&mut self, w: &mut dyn Write, constraint: &ConstraintSpec) -> std::io::Result<()> {
+ writeln!(w, " {}", constraint.name)?;
+ if !constraint.description.is_empty() {
+ writeln!(w, " {}", constraint.description)?;
+ }
+ if let Some(expr) = &constraint.expression {
+ writeln!(w, " Expression: {expr}")?;
+ }
+ Ok(())
+ }
+
+ fn end_constraints(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_locks(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nLocking Requirements ({count}):")
+ }
+
+ fn lock(&mut self, w: &mut dyn Write, lock: &LockSpec) -> std::io::Result<()> {
+ write!(w, " {}", lock.lock_name)?;
+
+ // Display lock type
+ let lock_type = match lock.lock_type {
+ 0 => "SPINLOCK",
+ 1 => "MUTEX",
+ 2 => "RWLOCK",
+ 3 => "SEMAPHORE",
+ 4 => "RCU",
+ _ => "UNKNOWN",
+ };
+ writeln!(w, " ({lock_type})")?;
+
+ let mut actions = Vec::new();
+ if lock.acquired { actions.push("acquired"); }
+ if lock.released { actions.push("released"); }
+ if lock.held_on_entry { actions.push("held on entry"); }
+ if lock.held_on_exit { actions.push("held on exit"); }
+
+ if !actions.is_empty() {
+ writeln!(w, " Actions: {}", actions.join(", "))?;
+ }
+
+ if !lock.description.is_empty() {
+ writeln!(w, " {}", lock.description)?;
+ }
+ Ok(())
+ }
+
+ fn end_locks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+}
\ No newline at end of file
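
The plain formatter above and the RST formatter below decode `param.flags` with different bit assignments (0x04 is INOUT in one and USER in the other). One way to keep such decoders from drifting apart is a single shared table. The sketch below assumes the bit values used by the plain formatter and a `u32` flags field. Whether either matches the kernel-side definitions is not confirmed by this patch, and `PARAM_FLAG_NAMES` and `param_flag_names` are hypothetical names.

```rust
/// Bit-to-name table, copied from the values used by the plain formatter.
/// Assumption: these match the kernel-side flag definitions.
const PARAM_FLAG_NAMES: &[(u32, &str)] = &[
    (0x01, "IN"),
    (0x02, "OUT"),
    (0x04, "INOUT"),
    (0x08, "USER"),
    (0x10, "OPTIONAL"),
];

/// Hypothetical helper: returns the names of all bits set in `flags`,
/// in table order. Unknown bits are silently ignored.
fn param_flag_names(flags: u32) -> Vec<&'static str> {
    PARAM_FLAG_NAMES
        .iter()
        .filter(|(bit, _)| flags & bit != 0)
        .map(|(_, name)| *name)
        .collect()
}
```

Each formatter could then call `param_flag_names(flags).join(" | ")` or `.join(", ")` and differ only in the separator.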
diff --git a/tools/kapi/src/formatter/rst.rs b/tools/kapi/src/formatter/rst.rs
new file mode 100644
index 0000000000000..ee660af176781
--- /dev/null
+++ b/tools/kapi/src/formatter/rst.rs
@@ -0,0 +1,468 @@
+use super::OutputFormatter;
+use std::io::Write;
+use crate::extractor::{SocketStateSpec, ProtocolBehaviorSpec, AddrFamilySpec, BufferSpec, AsyncSpec, CapabilitySpec,
+ ParamSpec, ReturnSpec, ErrorSpec, SignalSpec, SignalMaskSpec, SideEffectSpec, StateTransitionSpec, ConstraintSpec, LockSpec};
+
+pub struct RstFormatter {
+ current_section_level: usize,
+}
+
+impl RstFormatter {
+ pub fn new() -> Self {
+ RstFormatter {
+ current_section_level: 0,
+ }
+ }
+
+ fn section_char(level: usize) -> char {
+ match level {
+ 0 => '=',
+ 1 => '-',
+ 2 => '~',
+ 3 => '^',
+ _ => '"',
+ }
+ }
+}
+
+impl OutputFormatter for RstFormatter {
+ fn begin_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_api_list(&mut self, w: &mut dyn Write, title: &str) -> std::io::Result<()> {
+ writeln!(w, "\n{title}")?;
+ writeln!(w, "{}", Self::section_char(0).to_string().repeat(title.len()))?;
+ writeln!(w)
+ }
+
+ fn api_item(&mut self, w: &mut dyn Write, name: &str, api_type: &str) -> std::io::Result<()> {
+ writeln!(w, "* **{name}** (*{api_type}*)")
+ }
+
+ fn end_api_list(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn total_specs(&mut self, w: &mut dyn Write, count: usize) -> std::io::Result<()> {
+ writeln!(w, "\n**Total specifications found:** {count}")
+ }
+
+ fn begin_api_details(&mut self, w: &mut dyn Write, name: &str) -> std::io::Result<()> {
+ self.current_section_level = 0;
+ writeln!(w, "\n{name}")?;
+ writeln!(w, "{}", Self::section_char(0).to_string().repeat(name.len()))?;
+ writeln!(w)
+ }
+
+ fn end_api_details(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+
+ fn description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ writeln!(w, "**{desc}**")?;
+ writeln!(w)
+ }
+
+ fn long_description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ writeln!(w, "{desc}")?;
+ writeln!(w)
+ }
+
+ fn begin_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = "Execution Context";
+ writeln!(w, "{title}")?;
+ writeln!(w, "{}", Self::section_char(1).to_string().repeat(title.len()))?;
+ writeln!(w)
+ }
+
+ fn context_flag(&mut self, w: &mut dyn Write, flag: &str) -> std::io::Result<()> {
+ writeln!(w, "* {flag}")
+ }
+
+ fn end_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ writeln!(w)
+ }
+
+ fn begin_parameters(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = format!("Parameters ({count})");
+ writeln!(w, "{title}")?;
+ writeln!(w, "{}", Self::section_char(1).to_string().repeat(title.len()))?;
+ writeln!(w)
+ }
+
+
+ fn end_parameters(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_errors(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = format!("Possible Errors ({count})");
+ writeln!(w, "{title}")?;
+ writeln!(w, "{}", Self::section_char(1).to_string().repeat(title.len()))?;
+ writeln!(w)
+ }
+
+ fn end_errors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn examples(&mut self, w: &mut dyn Write, examples: &str) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = "Examples";
+ writeln!(w, "{title}")?;
+ writeln!(w, "{}", Self::section_char(1).to_string().repeat(title.len()))?;
+ writeln!(w)?;
+ writeln!(w, ".. code-block:: c")?;
+ writeln!(w)?;
+ for line in examples.lines() {
+ writeln!(w, " {line}")?;
+ }
+ writeln!(w)
+ }
+
+ fn notes(&mut self, w: &mut dyn Write, notes: &str) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = "Notes";
+ writeln!(w, "{title}")?;
+ writeln!(w, "{}", Self::section_char(1).to_string().repeat(title.len()))?;
+ writeln!(w)?;
+ writeln!(w, "{notes}")?;
+ writeln!(w)
+ }
+
+ fn since_version(&mut self, w: &mut dyn Write, version: &str) -> std::io::Result<()> {
+ writeln!(w, ":Available since: {version}")?;
+ writeln!(w)
+ }
+
+ fn sysfs_subsystem(&mut self, w: &mut dyn Write, subsystem: &str) -> std::io::Result<()> {
+ writeln!(w, ":Subsystem: {subsystem}")?;
+ writeln!(w)
+ }
+
+ fn sysfs_path(&mut self, w: &mut dyn Write, path: &str) -> std::io::Result<()> {
+ writeln!(w, ":Sysfs Path: {path}")?;
+ writeln!(w)
+ }
+
+ fn sysfs_permissions(&mut self, w: &mut dyn Write, perms: &str) -> std::io::Result<()> {
+ writeln!(w, ":Permissions: {perms}")?;
+ writeln!(w)
+ }
+
+ // Networking-specific methods
+ fn socket_state(&mut self, w: &mut dyn Write, state: &SocketStateSpec) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = "Socket State Requirements";
+ writeln!(w, "{title}")?;
+ writeln!(w, "{}", Self::section_char(1).to_string().repeat(title.len()))?;
+ writeln!(w)?;
+
+ if !state.required_states.is_empty() {
+ writeln!(w, "**Required states:** {}", state.required_states.join(", "))?;
+ }
+ if !state.forbidden_states.is_empty() {
+ writeln!(w, "**Forbidden states:** {}", state.forbidden_states.join(", "))?;
+ }
+ if let Some(result) = &state.resulting_state {
+ writeln!(w, "**Resulting state:** {result}")?;
+ }
+ if let Some(cond) = &state.condition {
+ writeln!(w, "**Condition:** {cond}")?;
+ }
+ if let Some(protos) = &state.applicable_protocols {
+ writeln!(w, "**Applicable protocols:** {protos}")?;
+ }
+ writeln!(w)
+ }
+
+ fn begin_protocol_behaviors(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = "Protocol-Specific Behaviors";
+ writeln!(w, "{title}")?;
+ writeln!(w, "{}", Self::section_char(1).to_string().repeat(title.len()))?;
+ writeln!(w)
+ }
+
+ fn protocol_behavior(&mut self, w: &mut dyn Write, behavior: &ProtocolBehaviorSpec) -> std::io::Result<()> {
+ writeln!(w, "**{}**", behavior.applicable_protocols)?;
+ writeln!(w)?;
+ writeln!(w, "{}", behavior.behavior)?;
+ if let Some(flags) = &behavior.protocol_flags {
+ writeln!(w)?;
+ writeln!(w, "*Flags:* {flags}")?;
+ }
+ writeln!(w)
+ }
+
+ fn end_protocol_behaviors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_addr_families(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = "Supported Address Families";
+ writeln!(w, "{title}")?;
+ writeln!(w, "{}", Self::section_char(1).to_string().repeat(title.len()))?;
+ writeln!(w)
+ }
+
+ fn addr_family(&mut self, w: &mut dyn Write, family: &AddrFamilySpec) -> std::io::Result<()> {
+ writeln!(w, "**{} ({})**", family.family_name, family.family)?;
+ writeln!(w)?;
+ writeln!(w, "* **Struct size:** {} bytes", family.addr_struct_size)?;
+ writeln!(w, "* **Address length:** {}-{} bytes", family.min_addr_len, family.max_addr_len)?;
+ if let Some(format) = &family.addr_format {
+ writeln!(w, "* **Format:** ``{format}``")?;
+ }
+ writeln!(w, "* **Features:** wildcard={}, multicast={}, broadcast={}",
+ family.supports_wildcard, family.supports_multicast, family.supports_broadcast)?;
+ if let Some(special) = &family.special_addresses {
+ writeln!(w, "* **Special addresses:** {special}")?;
+ }
+ if family.port_range_max > 0 {
+ writeln!(w, "* **Port range:** {}-{}", family.port_range_min, family.port_range_max)?;
+ }
+ writeln!(w)
+ }
+
+ fn end_addr_families(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn buffer_spec(&mut self, w: &mut dyn Write, spec: &BufferSpec) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = "Buffer Specification";
+ writeln!(w, "{title}")?;
+ writeln!(w, "{}", Self::section_char(1).to_string().repeat(title.len()))?;
+ writeln!(w)?;
+
+ if let Some(behaviors) = &spec.buffer_behaviors {
+ writeln!(w, "**Behaviors:** {behaviors}")?;
+ }
+ if let Some(min) = spec.min_buffer_size {
+ writeln!(w, "**Min size:** {min} bytes")?;
+ }
+ if let Some(max) = spec.max_buffer_size {
+ writeln!(w, "**Max size:** {max} bytes")?;
+ }
+ if let Some(optimal) = spec.optimal_buffer_size {
+ writeln!(w, "**Optimal size:** {optimal} bytes")?;
+ }
+ writeln!(w)
+ }
+
+ fn async_spec(&mut self, w: &mut dyn Write, spec: &AsyncSpec) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = "Asynchronous Operation";
+ writeln!(w, "{title}")?;
+ writeln!(w, "{}", Self::section_char(1).to_string().repeat(title.len()))?;
+ writeln!(w)?;
+
+ if let Some(modes) = &spec.supported_modes {
+ writeln!(w, "**Supported modes:** {modes}")?;
+ }
+ if let Some(errno) = spec.nonblock_errno {
+ writeln!(w, "**Non-blocking errno:** {errno}")?;
+ }
+ writeln!(w)
+ }
+
+ fn net_data_transfer(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ writeln!(w, "**Network Data Transfer:** {desc}")?;
+ writeln!(w)
+ }
+
+ fn begin_capabilities(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = "Required Capabilities";
+ writeln!(w, "{title}")?;
+ writeln!(w, "{}", Self::section_char(1).to_string().repeat(title.len()))?;
+ writeln!(w)
+ }
+
+ fn capability(&mut self, w: &mut dyn Write, cap: &CapabilitySpec) -> std::io::Result<()> {
+ writeln!(w, "**{} ({})** - {}", cap.name, cap.capability, cap.action)?;
+ writeln!(w)?;
+ if !cap.allows.is_empty() {
+ writeln!(w, "* **Allows:** {}", cap.allows)?;
+ }
+ if !cap.without_cap.is_empty() {
+ writeln!(w, "* **Without capability:** {}", cap.without_cap)?;
+ }
+ if let Some(cond) = &cap.check_condition {
+ writeln!(w, "* **Condition:** {}", cond)?;
+ }
+ writeln!(w)
+ }
+
+ fn end_capabilities(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ // Parameter, return value, and error output; signals, signal masks, and constraints below are still stubs
+ fn parameter(&mut self, w: &mut dyn Write, param: &ParamSpec) -> std::io::Result<()> {
+ writeln!(w, "**[{}] {}** (*{}*)", param.index, param.name, param.type_name)?;
+ writeln!(w)?;
+ writeln!(w, " {}", param.description)?;
+
+ // Display flags
+ let mut flags = Vec::new();
+ if param.flags & 0x01 != 0 { flags.push("IN"); }
+ if param.flags & 0x02 != 0 { flags.push("OUT"); }
+ if param.flags & 0x04 != 0 { flags.push("USER"); }
+ if param.flags & 0x08 != 0 { flags.push("OPTIONAL"); }
+ if !flags.is_empty() {
+ writeln!(w, " :Flags: {}", flags.join(", "))?;
+ }
+
+ if let Some(constraint) = &param.constraint {
+ writeln!(w, " :Constraint: {}", constraint)?;
+ }
+
+ if let (Some(min), Some(max)) = (param.min_value, param.max_value) {
+ writeln!(w, " :Range: {} to {}", min, max)?;
+ }
+
+ writeln!(w)
+ }
+
+ fn return_spec(&mut self, w: &mut dyn Write, ret: &ReturnSpec) -> std::io::Result<()> {
+ writeln!(w, "\nReturn Value")?;
+ writeln!(w, "{}\n", Self::section_char(1).to_string().repeat(12))?;
+ writeln!(w)?;
+ writeln!(w, ":Type: {}", ret.type_name)?;
+ writeln!(w, ":Description: {}", ret.description)?;
+ if let Some(success) = ret.success_value {
+ writeln!(w, ":Success value: {}", success)?;
+ }
+ writeln!(w)
+ }
+
+ fn error(&mut self, w: &mut dyn Write, error: &ErrorSpec) -> std::io::Result<()> {
+ writeln!(w, "**{}** ({})", error.name, error.error_code)?;
+ writeln!(w)?;
+ writeln!(w, " :Condition: {}", error.condition)?;
+ if !error.description.is_empty() {
+ writeln!(w, " :Description: {}", error.description)?;
+ }
+ writeln!(w)
+ }
+
+ fn begin_signals(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn signal(&mut self, _w: &mut dyn Write, _signal: &SignalSpec) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_signals(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_signal_masks(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn signal_mask(&mut self, _w: &mut dyn Write, _mask: &SignalMaskSpec) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_signal_masks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_side_effects(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = format!("Side Effects ({count})");
+ writeln!(w, "{}", title)?;
+ writeln!(w, "{}\n", Self::section_char(1).to_string().repeat(title.len()))
+ }
+
+ fn side_effect(&mut self, w: &mut dyn Write, effect: &SideEffectSpec) -> std::io::Result<()> {
+ write!(w, "* **{}**", effect.target)?;
+ if effect.reversible {
+ write!(w, " *(reversible)*")?;
+ }
+ writeln!(w)?;
+ writeln!(w, " {}", effect.description)?;
+ if let Some(cond) = &effect.condition {
+ writeln!(w, " :Condition: {}", cond)?;
+ }
+ writeln!(w)
+ }
+
+ fn end_side_effects(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_state_transitions(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = format!("State Transitions ({count})");
+ writeln!(w, "{}", title)?;
+ writeln!(w, "{}\n", Self::section_char(1).to_string().repeat(title.len()))
+ }
+
+ fn state_transition(&mut self, w: &mut dyn Write, trans: &StateTransitionSpec) -> std::io::Result<()> {
+ writeln!(w, "* **{}**: {} → {}", trans.object, trans.from_state, trans.to_state)?;
+ writeln!(w, " {}", trans.description)?;
+ if let Some(cond) = &trans.condition {
+ writeln!(w, " :Condition: {}", cond)?;
+ }
+ writeln!(w)
+ }
+
+ fn end_state_transitions(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_constraints(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn constraint(&mut self, _w: &mut dyn Write, _constraint: &ConstraintSpec) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_constraints(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_locks(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = format!("Locks ({count})");
+ writeln!(w, "{}", title)?;
+ writeln!(w, "{}\n", Self::section_char(1).to_string().repeat(title.len()))
+ }
+
+ fn lock(&mut self, w: &mut dyn Write, lock: &LockSpec) -> std::io::Result<()> {
+ write!(w, "* **{}**", lock.lock_name)?;
+ let lock_type_str = match lock.lock_type {
+ 1 => " *(mutex)*",
+ 2 => " *(spinlock)*",
+ 3 => " *(rwlock)*",
+ 4 => " *(semaphore)*",
+ 5 => " *(RCU)*",
+ _ => "",
+ };
+ writeln!(w, "{}", lock_type_str)?;
+ if !lock.description.is_empty() {
+ writeln!(w, " {}", lock.description)?;
+ }
+ writeln!(w)
+ }
+
+ fn end_locks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+}
\ No newline at end of file
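
Most section-emitting methods in the RST formatter repeat the same three writes: title, underline, blank line. In reStructuredText the underline has to sit on the line directly below the title, so a shared helper may be less error-prone than writing it out each time. A minimal sketch, assuming ASCII titles; `write_rst_section` and `section_to_string` are hypothetical names, not part of the patch.

```rust
use std::io::{self, Write};

/// Hypothetical helper: writes an RST section title, its underline on the
/// very next line, and one trailing blank line.
fn write_rst_section(w: &mut dyn Write, title: &str, underline: char) -> io::Result<()> {
    writeln!(w, "{title}")?;
    // Byte length, as the patch uses; this equals the visible width for ASCII titles.
    writeln!(w, "{}", underline.to_string().repeat(title.len()))?;
    writeln!(w)
}

/// Renders one section heading into a String.
fn section_to_string(title: &str, underline: char) -> String {
    let mut buf: Vec<u8> = Vec::new();
    write_rst_section(&mut buf, title, underline).expect("writing to a Vec cannot fail");
    String::from_utf8(buf).expect("output is UTF-8")
}
```

With a helper of this shape, a method such as `begin_locks` could reduce to building the title string and making one call with `Self::section_char(1)`.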
diff --git a/tools/kapi/src/formatter/shall.rs b/tools/kapi/src/formatter/shall.rs
new file mode 100644
index 0000000000000..ef432a060da52
--- /dev/null
+++ b/tools/kapi/src/formatter/shall.rs
@@ -0,0 +1,605 @@
+use super::OutputFormatter;
+use std::io::Write;
+use crate::extractor::{SocketStateSpec, ProtocolBehaviorSpec, AddrFamilySpec, BufferSpec, AsyncSpec, CapabilitySpec,
+ ParamSpec, ReturnSpec, ErrorSpec, SignalSpec, SignalMaskSpec, SideEffectSpec, StateTransitionSpec, ConstraintSpec, LockSpec};
+
+pub struct ShallFormatter {
+ api_name: Option<String>,
+ in_list: bool,
+}
+
+impl ShallFormatter {
+ pub fn new() -> Self {
+ ShallFormatter {
+ api_name: None,
+ in_list: false,
+ }
+ }
+
+}
+
+impl OutputFormatter for ShallFormatter {
+ fn begin_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_api_list(&mut self, w: &mut dyn Write, title: &str) -> std::io::Result<()> {
+ self.in_list = true;
+ writeln!(w, "\n{} API Behavioral Requirements:", title)?;
+ writeln!(w)
+ }
+
+ fn api_item(&mut self, w: &mut dyn Write, name: &str, _api_type: &str) -> std::io::Result<()> {
+ writeln!(w, "- {} shall be available for {}", name, name.replace('_', " "))
+ }
+
+ fn end_api_list(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ self.in_list = false;
+ Ok(())
+ }
+
+ fn total_specs(&mut self, w: &mut dyn Write, count: usize) -> std::io::Result<()> {
+ writeln!(w, "\nTotal: {} kernel API specifications shall be enforced.", count)
+ }
+
+ fn begin_api_details(&mut self, w: &mut dyn Write, name: &str) -> std::io::Result<()> {
+ self.api_name = Some(name.to_string());
+ writeln!(w, "\nBehavioral Requirements for {}:", name)?;
+ writeln!(w)
+ }
+
+ fn end_api_details(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ self.api_name = None;
+ Ok(())
+ }
+
+ fn description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ if let Some(api_name) = &self.api_name {
+ writeln!(w, "- {} shall {}.", api_name, desc.trim_end_matches('.'))
+ } else {
+ writeln!(w, "- The API shall {}.", desc.trim_end_matches('.'))
+ }
+ }
+
+ fn long_description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ writeln!(w)?;
+ for line in desc.lines() {
+ if !line.trim().is_empty() {
+ writeln!(w, "{}", line)?;
+ }
+ }
+ writeln!(w)
+ }
+
+ fn begin_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ writeln!(w, "\nExecution Context Requirements:")?;
+ writeln!(w)
+ }
+
+ fn context_flag(&mut self, w: &mut dyn Write, flag: &str) -> std::io::Result<()> {
+ // Parse context flags and make them readable with specific requirements
+ match flag {
+ "Process context" => {
+ writeln!(w, "- The function shall be callable from process context.")?;
+ writeln!(w, " Process context allows the function to sleep, allocate memory with GFP_KERNEL, and access user space.")
+ }
+ "Softirq context" => {
+ writeln!(w, "- The function shall be callable from softirq context.")?;
+ writeln!(w, " In softirq context, the function shall not sleep and shall use GFP_ATOMIC for memory allocations.")
+ }
+ "Hardirq context" => {
+ writeln!(w, "- The function shall be callable from hardirq (interrupt) context.")?;
+ writeln!(w, " In hardirq context, the function shall not sleep, shall minimize execution time, and shall use GFP_ATOMIC for allocations.")
+ }
+ "NMI context" => {
+ writeln!(w, "- The function shall be callable from NMI (Non-Maskable Interrupt) context.")?;
+ writeln!(w, " In NMI context, the function shall not take any locks that might be held by interrupted code.")
+ }
+ "User mode" => {
+ writeln!(w, "- The function shall be callable when the CPU is in user mode.")?;
+ writeln!(w, " This typically applies to system call entry points.")
+ }
+ "Kernel mode" => {
+ writeln!(w, "- The function shall be callable when the CPU is in kernel mode.")
+ }
+ "May sleep" => {
+ writeln!(w, "- The function may sleep (block) during execution.")?;
+ writeln!(w, " Callers shall ensure they are in a context where sleeping is allowed (not in interrupt or atomic context).")
+ }
+ "Atomic context" => {
+ writeln!(w, "- The function shall be callable from atomic context.")?;
+ writeln!(w, " In atomic context, the function shall not sleep and shall complete quickly.")
+ }
+ "Preemptible" => {
+ writeln!(w, "- The function shall be callable when preemption is enabled.")?;
+ writeln!(w, " The function may be preempted by higher priority tasks.")
+ }
+ "Migration disabled" => {
+ writeln!(w, "- The function shall be callable when CPU migration is disabled.")?;
+ writeln!(w, " The function shall not rely on being able to migrate between CPUs.")
+ }
+ _ => {
+ // Fallback for unrecognized flags
+ writeln!(w, "- The function shall be callable from {} context.", flag)
+ }
+ }
+ }
+
+ fn end_context_flags(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_parameters(&mut self, w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nParameter Requirements:")
+ }
+
+ fn parameter(&mut self, w: &mut dyn Write, param: &ParamSpec) -> std::io::Result<()> {
+ writeln!(w)?;
+ writeln!(w, "- If {} is provided, it shall be {}.",
+ param.name, param.description.trim_end_matches('.'))?;
+
+ // Only show meaningful numeric constraints
+ if let Some(min) = param.min_value {
+ if let Some(max) = param.max_value {
+ if min != 0 || max != 0 {
+ writeln!(w, "\n- If {} is less than {} or greater than {}, the operation shall fail.",
+ param.name, min, max)?;
+ }
+ } else if min != 0 {
+ writeln!(w, "\n- If {} is less than {}, the operation shall fail.",
+ param.name, min)?;
+ }
+ } else if let Some(max) = param.max_value {
+ if max != 0 {
+ writeln!(w, "\n- If {} is greater than {}, the operation shall fail.",
+ param.name, max)?;
+ }
+ }
+
+ if let Some(constraint) = &param.constraint {
+ if !constraint.is_empty() {
+ let constraint_text = constraint.trim_end_matches('.');
+ // Handle constraints that start with "Must be" or similar
+ if constraint_text.to_lowercase().starts_with("must be ") {
+ let requirement = &constraint_text[8..]; // Skip "Must be "
+ writeln!(w, "\n- If {} is not {}, the operation shall fail.",
+ param.name, requirement)?;
+ } else if constraint_text.to_lowercase().starts_with("must ") {
+ let requirement = &constraint_text[5..]; // Skip "Must "
+ writeln!(w, "\n- If {} does not {}, the operation shall fail.",
+ param.name, requirement)?;
+ } else if constraint_text.contains(" must ") || constraint_text.contains(" should ") {
+ // Reformat constraints with must/should in the middle
+ writeln!(w, "\n- {} shall satisfy: {}.",
+ param.name, constraint_text)?;
+ } else {
+ // Default format for other constraints
+ writeln!(w, "\n- If {} is not {}, the operation shall fail.",
+ param.name, constraint_text)?;
+ }
+ }
+ }
+
+ // Only show valid_mask if it's not 0
+ if let Some(mask) = param.valid_mask {
+ if mask != 0 {
+ writeln!(w, "\n- If {} contains bits not set in 0x{:x}, the operation shall fail.",
+ param.name, mask)?;
+ }
+ }
+
+ Ok(())
+ }
+
+ fn end_parameters(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn return_spec(&mut self, w: &mut dyn Write, ret: &ReturnSpec) -> std::io::Result<()> {
+ writeln!(w, "\nReturn Value Behavior:")?;
+ writeln!(w)?;
+
+ if let Some(success) = ret.success_value {
+ writeln!(w, "- If the operation succeeds, the function shall return {}.", success)?;
+ } else if let Some(min) = ret.success_min {
+ if let Some(max) = ret.success_max {
+ writeln!(w, "- If the operation succeeds, the function shall return a value between {} and {} inclusive.", min, max)?;
+ } else {
+ writeln!(w, "- If the operation succeeds, the function shall return a value greater than or equal to {}.", min)?;
+ }
+ }
+
+ if !ret.error_values.is_empty() {
+ writeln!(w, "\n- If the operation fails, the function shall return one of the specified negative error values.")?;
+ }
+
+ Ok(())
+ }
+
+ fn begin_errors(&mut self, w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nError Handling:")?;
+ Ok(())
+ }
+
+ fn error(&mut self, w: &mut dyn Write, error: &ErrorSpec) -> std::io::Result<()> {
+ writeln!(w)?;
+ let condition = if error.condition.is_empty() {
+ error.description.to_lowercase().trim_end_matches('.').to_string()
+ } else {
+ error.condition.to_lowercase()
+ };
+ writeln!(w, "- If {condition}, the function shall return -{}.", error.name)?;
+
+ // Add description if available and different from condition
+ if !error.description.is_empty() && error.description != error.condition {
+ writeln!(w, " {}", error.description)?;
+ }
+
+ Ok(())
+ }
+
+ fn end_errors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn examples(&mut self, w: &mut dyn Write, examples: &str) -> std::io::Result<()> {
+ writeln!(w, "\nExample Usage:")?;
+ writeln!(w)?;
+ writeln!(w, "```")?;
+ write!(w, "{}", examples)?;
+ writeln!(w, "```")
+ }
+
+ fn notes(&mut self, w: &mut dyn Write, notes: &str) -> std::io::Result<()> {
+ writeln!(w, "\nImplementation Notes:")?;
+ writeln!(w)?;
+
+ // Split notes into sentences and format each as a behavioral requirement
+ let sentences: Vec<&str> = notes.split(". ")
+ .filter(|s| !s.trim().is_empty())
+ .collect();
+
+ for sentence in sentences {
+ let trimmed = sentence.trim().trim_end_matches('.');
+ if trimmed.is_empty() {
+ continue;
+ }
+
+ // Check if it already contains "shall" or similar
+ if trimmed.contains("shall") || trimmed.contains("must") {
+ writeln!(w, "- {}.", trimmed)?;
+ } else if trimmed.starts_with("On ") || trimmed.starts_with("If ") || trimmed.starts_with("When ") {
+ // These are already conditional, just add shall
+ writeln!(w, "- {}, the behavior shall be as described.", trimmed)?;
+ } else {
+ // Convert to a shall statement
+ writeln!(w, "- The implementation shall ensure that {}{}.",
+ trimmed.chars().next().unwrap().to_lowercase(), trimmed.chars().skip(1).collect::<String>())?;
+ }
+ }
+ Ok(())
+ }
+
+ fn since_version(&mut self, w: &mut dyn Write, version: &str) -> std::io::Result<()> {
+ writeln!(w, "\n- If kernel version is {} or later, this API shall be available.", version)
+ }
+
+ fn sysfs_subsystem(&mut self, w: &mut dyn Write, subsystem: &str) -> std::io::Result<()> {
+ writeln!(w, "- If accessed through sysfs, the attribute shall be located in the {} subsystem.", subsystem)
+ }
+
+ fn sysfs_path(&mut self, w: &mut dyn Write, path: &str) -> std::io::Result<()> {
+ writeln!(w, "\n- If the sysfs interface is mounted, the attribute shall be accessible at {}.", path)
+ }
+
+ fn sysfs_permissions(&mut self, w: &mut dyn Write, perms: &str) -> std::io::Result<()> {
+ writeln!(w, "\n- If the attribute exists, its permissions shall be set to {}.", perms)
+ }
+
+ fn socket_state(&mut self, w: &mut dyn Write, state: &SocketStateSpec) -> std::io::Result<()> {
+ writeln!(w, "\nSocket State Behavior:")?;
+ writeln!(w)?;
+
+ if !state.required_states.is_empty() {
+ let states_str = state.required_states.join(" or ");
+ writeln!(w, "- If the socket is not in {} state, the operation shall fail.", states_str)?;
+ }
+
+ if !state.forbidden_states.is_empty() {
+ for s in &state.forbidden_states {
+ writeln!(w, "\n- If the socket is in {} state, the operation shall fail.", s)?;
+ }
+ }
+
+ if let Some(result) = &state.resulting_state {
+ writeln!(w, "\n- If the operation succeeds, the socket state shall transition to {}.", result)?;
+ }
+
+ Ok(())
+ }
+
+ fn begin_protocol_behaviors(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ writeln!(w, "\nProtocol-Specific Behavior:")
+ }
+
+ fn protocol_behavior(&mut self, w: &mut dyn Write, behavior: &ProtocolBehaviorSpec) -> std::io::Result<()> {
+ writeln!(w)?;
+ writeln!(w, "- If protocol is {}, {}.",
+ behavior.applicable_protocols, behavior.behavior)?;
+
+ if let Some(flags) = &behavior.protocol_flags {
+ writeln!(w, "\n- If protocol is {} and flags {} are set, the behavior shall be modified accordingly.",
+ behavior.applicable_protocols, flags)?;
+ }
+
+ Ok(())
+ }
+
+ fn end_protocol_behaviors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_addr_families(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ writeln!(w, "\nAddress Family Behavior:")
+ }
+
+ fn addr_family(&mut self, w: &mut dyn Write, family: &AddrFamilySpec) -> std::io::Result<()> {
+ writeln!(w)?;
+ writeln!(w, "- If address family is {} ({}), the address structure size shall be {} bytes.",
+ family.family, family.family_name, family.addr_struct_size)?;
+
+ writeln!(w, "\n- If address family is {} and address length is less than {} or greater than {}, the operation shall fail.",
+ family.family, family.min_addr_len, family.max_addr_len)?;
+
+ Ok(())
+ }
+
+ fn end_addr_families(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn buffer_spec(&mut self, w: &mut dyn Write, spec: &BufferSpec) -> std::io::Result<()> {
+ writeln!(w, "\nBuffer Behavior:")?;
+ writeln!(w)?;
+
+ if let Some(min) = spec.min_buffer_size {
+ writeln!(w, "- If the buffer size is less than {} bytes, the operation shall fail.", min)?;
+ }
+
+ if let Some(max) = spec.max_buffer_size {
+ writeln!(w, "\n- If the buffer size exceeds {} bytes, the excess data shall be truncated.", max)?;
+ }
+
+ if let Some(behaviors) = &spec.buffer_behaviors {
+ writeln!(w, "\n- When handling buffers, the following behavior shall apply: {}.", behaviors)?;
+ }
+
+ Ok(())
+ }
+
+ fn async_spec(&mut self, w: &mut dyn Write, spec: &AsyncSpec) -> std::io::Result<()> {
+ writeln!(w, "\nAsynchronous Behavior:")?;
+ writeln!(w)?;
+
+ if let Some(_modes) = &spec.supported_modes {
+ writeln!(w, "- If O_NONBLOCK is set and the operation would block, the function shall return -EAGAIN or -EWOULDBLOCK.")?;
+ }
+
+ if let Some(errno) = spec.nonblock_errno {
+ writeln!(w, "\n- If the file descriptor is in non-blocking mode and no data is available, the function shall return -{}.", errno)?;
+ }
+
+ Ok(())
+ }
+
+ fn net_data_transfer(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ writeln!(w, "\nData Transfer Behavior:")?;
+ writeln!(w)?;
+ writeln!(w, "- When transferring data, the operation shall {}.", desc.trim_end_matches('.'))
+ }
+
+ fn begin_capabilities(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ writeln!(w, "\nCapability Requirements:")
+ }
+
+ fn capability(&mut self, w: &mut dyn Write, cap: &CapabilitySpec) -> std::io::Result<()> {
+ writeln!(w)?;
+ writeln!(w, "- If the process attempts to {}, {} capability shall be checked.",
+ cap.action, cap.name)?;
+ writeln!(w)?;
+ writeln!(w, "- If {} is present, {}.", cap.name, cap.allows)?;
+ writeln!(w)?;
+ writeln!(w, "- If {} is not present, {}.", cap.name, cap.without_cap)?;
+
+ Ok(())
+ }
+
+ fn end_capabilities(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_signals(&mut self, w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nSignal Behavior:")?;
+ Ok(())
+ }
+
+ fn signal(&mut self, w: &mut dyn Write, signal: &SignalSpec) -> std::io::Result<()> {
+ writeln!(w)?;
+
+ // Skip signals with no meaningful description
+ if let Some(desc) = &signal.description {
+ if !desc.is_empty() {
+ writeln!(w, "- {}: {}.", signal.signal_name, desc)?;
+ return Ok(());
+ }
+ }
+
+ // Default behavior based on direction
+ if signal.direction == 1 { // Sends
+ writeln!(w, "- If the conditions for {} are met, the signal shall be sent to the target process.",
+ signal.signal_name)?;
+ } else if signal.direction == 2 { // Receives
+ writeln!(w, "- If {} is received and not blocked, the operation shall be interrupted.",
+ signal.signal_name)?;
+
+ if signal.restartable {
+ writeln!(w, "\n- If {} is received and SA_RESTART is set, the operation shall be automatically restarted.",
+ signal.signal_name)?;
+ }
+ } else {
+ // Direction 0 or other - just note the signal handling
+ writeln!(w, "- {} shall be handled according to its default behavior.", signal.signal_name)?;
+ }
+
+ if let Some(errno) = signal.error_on_signal {
+ if errno != 0 {
+ writeln!(w, "\n- If interrupted by {}, the function shall return -{}.",
+ signal.signal_name, errno)?;
+ }
+ }
+
+ Ok(())
+ }
+
+ fn end_signals(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_signal_masks(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ writeln!(w, "\n### Signal Mask Requirements")?;
+ if count > 0 {
+ writeln!(w, "The API SHALL support the following signal mask operations:")?;
+ }
+ Ok(())
+ }
+
+ fn signal_mask(&mut self, w: &mut dyn Write, mask: &SignalMaskSpec) -> std::io::Result<()> {
+ writeln!(w, "\n- **{}**: {}", mask.name, mask.description)?;
+ Ok(())
+ }
+
+ fn end_signal_masks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_side_effects(&mut self, w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nSide Effects:")?;
+ Ok(())
+ }
+
+ fn side_effect(&mut self, w: &mut dyn Write, effect: &SideEffectSpec) -> std::io::Result<()> {
+ writeln!(w)?;
+ if let Some(condition) = &effect.condition {
+ writeln!(w, "- If {}, {} shall be {}.",
+ condition, effect.target, effect.description.trim_end_matches('.'))?;
+ } else {
+ writeln!(w, "- When the operation executes, {} shall be {}.",
+ effect.target, effect.description.trim_end_matches('.'))?;
+ }
+
+ if effect.reversible {
+ writeln!(w, "\n- If the operation is rolled back, the effect on {} shall be reversed.",
+ effect.target)?;
+ }
+
+ Ok(())
+ }
+
+ fn end_side_effects(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_state_transitions(&mut self, w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nState Transitions:")?;
+ Ok(())
+ }
+
+ fn state_transition(&mut self, w: &mut dyn Write, trans: &StateTransitionSpec) -> std::io::Result<()> {
+ writeln!(w)?;
+ if let Some(condition) = &trans.condition {
+ writeln!(w, "- If {} is in {} state and {}, it shall transition to {} state.",
+ trans.object, trans.from_state, condition, trans.to_state)?;
+ } else {
+ writeln!(w, "- If {} is in {} state, it shall transition to {} state.",
+ trans.object, trans.from_state, trans.to_state)?;
+ }
+
+ Ok(())
+ }
+
+ fn end_state_transitions(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_constraints(&mut self, w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nConstraints:")?;
+ Ok(())
+ }
+
+ fn constraint(&mut self, w: &mut dyn Write, constraint: &ConstraintSpec) -> std::io::Result<()> {
+ writeln!(w)?;
+ if let Some(expr) = &constraint.expression {
+ if expr.is_empty() {
+ writeln!(w, "- {}: {}.", constraint.name, constraint.description)?;
+ } else {
+ writeln!(w, "- If {} is violated, the operation shall fail.", constraint.name)?;
+ writeln!(w, " Constraint: {}", expr)?;
+ }
+ } else {
+ writeln!(w, "- {}: {}.", constraint.name, constraint.description)?;
+ }
+
+ Ok(())
+ }
+
+ fn end_constraints(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_locks(&mut self, w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nLocking Behavior:")?;
+ Ok(())
+ }
+
+ fn lock(&mut self, w: &mut dyn Write, lock: &LockSpec) -> std::io::Result<()> {
+ writeln!(w)?;
+
+ // Always show lock information if we have a description
+ if !lock.description.is_empty() {
+ let lock_type_str = match lock.lock_type {
+ 1 => "mutex",
+ 2 => "spinlock",
+ 3 => "rwlock",
+ 4 => "semaphore",
+ 5 => "RCU",
+ _ => "lock",
+ };
+ writeln!(w, "- The {} {} shall be used for: {}",
+ lock.lock_name, lock_type_str, lock.description)?;
+ }
+
+ if lock.held_on_entry {
+ writeln!(w, "- If {} is not held on entry, the operation shall fail.", lock.lock_name)?;
+ }
+
+ if lock.acquired && !lock.held_on_entry {
+ writeln!(w, "- Before accessing the protected resource, {} shall be acquired.", lock.lock_name)?;
+ }
+
+ if lock.released && lock.held_on_exit {
+ writeln!(w, "- If the operation succeeds and no error path is taken, {} shall remain held on exit.", lock.lock_name)?;
+ } else if lock.released {
+ writeln!(w, "- Before returning, {} shall be released.", lock.lock_name)?;
+ }
+
+ Ok(())
+ }
+
+ fn end_locks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+}
\ No newline at end of file
diff --git a/tools/kapi/src/main.rs b/tools/kapi/src/main.rs
new file mode 100644
index 0000000000000..76416a9364010
--- /dev/null
+++ b/tools/kapi/src/main.rs
@@ -0,0 +1,130 @@
+//! kapi - Kernel API Specification Tool
+//!
+//! This tool extracts and displays kernel API specifications from multiple sources:
+//! - Kernel source code (KAPI macros)
+//! - Compiled vmlinux binaries (`.kapi_specs` ELF section)
+//! - Running kernel via debugfs
+
+use anyhow::Result;
+use clap::Parser;
+use std::io::{self, Write};
+
+mod formatter;
+mod extractor;
+
+use formatter::{OutputFormat, create_formatter};
+use extractor::{ApiExtractor, VmlinuxExtractor, SourceExtractor, DebugfsExtractor};
+
+#[derive(Parser, Debug)]
+#[command(author, version, about, long_about = None)]
+struct Args {
+ /// Path to the vmlinux file
+ #[arg(long, value_name = "PATH", group = "input")]
+ vmlinux: Option<String>,
+
+ /// Path to kernel source directory or file
+ #[arg(long, value_name = "PATH", group = "input")]
+ source: Option<String>,
+
+ /// Path to debugfs (defaults to /sys/kernel/debug if not specified)
+ #[arg(long, value_name = "PATH", group = "input")]
+ debugfs: Option<String>,
+
+ /// Optional: Name of specific API to show details for
+ api_name: Option<String>,
+
+ /// Output format
+ #[arg(long, short = 'f', default_value = "plain")]
+ format: String,
+}
+
+fn main() -> Result<()> {
+ let args = Args::parse();
+
+ let output_format: OutputFormat = args.format.parse()
+ .map_err(|e: String| anyhow::anyhow!(e))?;
+
+ let extractor: Box<dyn ApiExtractor> = match (args.vmlinux, args.source, args.debugfs.clone()) {
+ (Some(vmlinux_path), None, None) => {
+ Box::new(VmlinuxExtractor::new(&vmlinux_path)?)
+ }
+ (None, Some(source_path), None) => {
+ Box::new(SourceExtractor::new(&source_path)?)
+ }
+ (None, None, Some(_) | None) => {
+ // If debugfs is specified or no input is provided, use debugfs
+ Box::new(DebugfsExtractor::new(args.debugfs)?)
+ }
+ _ => {
+ anyhow::bail!("Please specify only one of --vmlinux, --source, or --debugfs")
+ }
+ };
+
+ display_apis(extractor.as_ref(), args.api_name, output_format)
+}
+
+fn display_apis(extractor: &dyn ApiExtractor, api_name: Option<String>, output_format: OutputFormat) -> Result<()> {
+ let mut formatter = create_formatter(output_format);
+ let mut stdout = io::stdout();
+
+ formatter.begin_document(&mut stdout)?;
+
+ if let Some(api_name_req) = api_name {
+ // Use the extractor to display API details
+ if let Some(_spec) = extractor.extract_by_name(&api_name_req)? {
+ extractor.display_api_details(&api_name_req, &mut *formatter, &mut stdout)?;
+ } else if output_format == OutputFormat::Plain {
+ writeln!(stdout, "\nAPI '{}' not found.", api_name_req)?;
+ writeln!(stdout, "\nAvailable APIs:")?;
+ for spec in extractor.extract_all()? {
+ writeln!(stdout, " {} ({})", spec.name, spec.api_type)?;
+ }
+ }
+ } else {
+ // Display list of APIs using the extractor
+ let all_specs = extractor.extract_all()?;
+ let syscalls: Vec<_> = all_specs.iter().filter(|s| s.api_type == "syscall").collect();
+ let ioctls: Vec<_> = all_specs.iter().filter(|s| s.api_type == "ioctl").collect();
+ let functions: Vec<_> = all_specs.iter().filter(|s| s.api_type == "function").collect();
+ let sysfs: Vec<_> = all_specs.iter().filter(|s| s.api_type == "sysfs").collect();
+
+ if !syscalls.is_empty() {
+ formatter.begin_api_list(&mut stdout, "System Calls")?;
+ for spec in syscalls {
+ formatter.api_item(&mut stdout, &spec.name, &spec.api_type)?;
+ }
+ formatter.end_api_list(&mut stdout)?;
+ }
+
+ if !ioctls.is_empty() {
+ formatter.begin_api_list(&mut stdout, "IOCTLs")?;
+ for spec in ioctls {
+ formatter.api_item(&mut stdout, &spec.name, &spec.api_type)?;
+ }
+ formatter.end_api_list(&mut stdout)?;
+ }
+
+ if !functions.is_empty() {
+ formatter.begin_api_list(&mut stdout, "Functions")?;
+ for spec in functions {
+ formatter.api_item(&mut stdout, &spec.name, &spec.api_type)?;
+ }
+ formatter.end_api_list(&mut stdout)?;
+ }
+
+ if !sysfs.is_empty() {
+ formatter.begin_api_list(&mut stdout, "Sysfs Attributes")?;
+ for spec in sysfs {
+ formatter.api_item(&mut stdout, &spec.name, &spec.api_type)?;
+ }
+ formatter.end_api_list(&mut stdout)?;
+ }
+
+ formatter.total_specs(&mut stdout, all_specs.len())?;
+ }
+
+ formatter.end_document(&mut stdout)?;
+
+ Ok(())
+}
+
--
2.39.5
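For anyone wanting to try the constraint rewriting from ShallFormatter::parameter() in isolation, here is a minimal standalone sketch. The free function constraint_to_shall() is a hypothetical name that does not exist in the patch, and it deliberately uses to_ascii_lowercase() instead of to_lowercase() so that the byte offsets used for slicing stay aligned with the original string. Treat it as an illustration of the four branches, not as the tool's actual code.

```rust
/// Rewrite a free-form parameter constraint as a "shall" requirement.
/// Hypothetical helper mirroring the branches of ShallFormatter::parameter().
fn constraint_to_shall(param: &str, constraint: &str) -> String {
    let text = constraint.trim_end_matches('.');
    // ASCII-only lowercasing keeps byte offsets identical to `text`,
    // so slicing by the prefix length cannot split a UTF-8 character.
    let lower = text.to_ascii_lowercase();

    if lower.starts_with("must be ") {
        // "Must be zero" -> "If <param> is not zero, ..."
        format!(
            "If {} is not {}, the operation shall fail.",
            param,
            &text["must be ".len()..]
        )
    } else if lower.starts_with("must ") {
        // "Must point to X" -> "If <param> does not point to X, ..."
        format!(
            "If {} does not {}, the operation shall fail.",
            param,
            &text["must ".len()..]
        )
    } else if text.contains(" must ") || text.contains(" should ") {
        // A requirement word in the middle: keep the sentence as written.
        format!("{} shall satisfy: {}.", param, text)
    } else {
        // Anything else is treated as a property the parameter must have.
        format!("If {} is not {}, the operation shall fail.", param, text)
    }
}

fn main() {
    println!("- {}", constraint_to_shall("flags", "Must be zero."));
    println!("- {}", constraint_to_shall("buf", "must point to writable memory"));
    println!("- {}", constraint_to_shall("len", "The value must fit in 32 bits"));
    println!("- {}", constraint_to_shall("fd", "a valid descriptor"));
}
```

Each call in main() exercises one of the four branches, which makes it easy to see how a given constraint string from a spec would be worded in the output.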
* Re: [RFC v2 01/22] kernel/api: introduce kernel API specification framework
2025-06-24 18:07 ` [RFC v2 01/22] kernel/api: introduce kernel " Sasha Levin
@ 2025-06-30 19:53 ` Jonathan Corbet
2025-06-30 22:20 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 33+ messages in thread
From: Jonathan Corbet @ 2025-06-30 19:53 UTC (permalink / raw)
To: Sasha Levin, linux-kernel
Cc: linux-doc, linux-api, workflows, tools, Sasha Levin
Sasha Levin <sashal@kernel.org> writes:
> Add a comprehensive framework for formally documenting kernel APIs with
> inline specifications. This framework provides:
>
> - Structured API documentation with parameter specifications, return
> values, error conditions, and execution context requirements
> - Runtime validation capabilities for debugging (CONFIG_KAPI_RUNTIME_CHECKS)
> - Export of specifications via debugfs for tooling integration
> - Support for both internal kernel APIs and system calls
>
> The framework stores specifications in a dedicated ELF section and
> provides infrastructure for:
> - Compile-time validation of specifications
> - Runtime querying of API documentation
> - Machine-readable export formats
> - Integration with existing SYSCALL_DEFINE macros
>
> This commit introduces the core infrastructure without modifying any
> existing APIs. Subsequent patches will add specifications to individual
> subsystems.
>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
> Documentation/admin-guide/kernel-api-spec.rst | 507 ++++++
You need to add that file to index.rst in that directory or it won't be
pulled into the docs build.
Wouldn't it be nice to integrate all this stuff with our existing
kerneldoc mechanism...? :)
Thanks,
jon
* Re: [RFC v2 01/22] kernel/api: introduce kernel API specification framework
2025-06-30 19:53 ` Jonathan Corbet
@ 2025-06-30 22:20 ` Mauro Carvalho Chehab
2025-07-01 14:23 ` Sasha Levin
0 siblings, 1 reply; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2025-06-30 22:20 UTC (permalink / raw)
To: Jonathan Corbet
Cc: Sasha Levin, linux-kernel, linux-doc, linux-api, workflows, tools
On Mon, 30 Jun 2025 13:53:55 -0600
Jonathan Corbet <corbet@lwn.net> wrote:
> Sasha Levin <sashal@kernel.org> writes:
>
> > Add a comprehensive framework for formally documenting kernel APIs with
> > inline specifications. This framework provides:
> >
> > - Structured API documentation with parameter specifications, return
> > values, error conditions, and execution context requirements
> > - Runtime validation capabilities for debugging (CONFIG_KAPI_RUNTIME_CHECKS)
> > - Export of specifications via debugfs for tooling integration
> > - Support for both internal kernel APIs and system calls
> >
> > The framework stores specifications in a dedicated ELF section and
> > provides infrastructure for:
> > - Compile-time validation of specifications
> > - Runtime querying of API documentation
> > - Machine-readable export formats
> > - Integration with existing SYSCALL_DEFINE macros
> >
> > This commit introduces the core infrastructure without modifying any
> > existing APIs. Subsequent patches will add specifications to individual
> > subsystems.
> >
> > Signed-off-by: Sasha Levin <sashal@kernel.org>
> > ---
> > Documentation/admin-guide/kernel-api-spec.rst | 507 ++++++
>
> You need to add that file to index.rst in that directory or it won't be
> pulled into the docs build.
>
> > Wouldn't it be nice to integrate all this stuff with our existing
> kerneldoc mechanism...? :)
+1
Having two different mechanisms (kapi and kerneldoc) makes it a lot harder
to maintain kAPI.
Also, IGT (a testing tool for the DRM subsystem) used to have a macro-based
documentation system. It got outdated over time, as people ended up
forgetting to update the macros when changing the code.
Also, sometimes we want to add some rich text there, with graphs,
tables, ...
More importantly: people end up not remembering to add such macros.
As kerneldoc markups are similar to Doxygen and normal C comments,
it is more likely that people will remember.
So, IMO the best would be to use kerneldoc syntax there, letting
the Kerneldoc Sphinx extension handle it for docs, while having
tools to implement the other features you mentioned.
Thanks,
Mauro
* Re: [RFC v2 00/22] Kernel API specification framework
2025-06-24 18:07 [RFC v2 00/22] Kernel API specification framework Sasha Levin
` (21 preceding siblings ...)
2025-06-24 18:07 ` [RFC v2 22/22] tools/kapi: Add kernel API specification extraction tool Sasha Levin
@ 2025-07-01 2:43 ` Jake Edge
2025-07-01 14:54 ` Sasha Levin
22 siblings, 1 reply; 33+ messages in thread
From: Jake Edge @ 2025-07-01 2:43 UTC (permalink / raw)
To: Sasha Levin; +Cc: linux-kernel, linux-doc, linux-api, workflows, tools
Hi Sasha,
On Tue, Jun 24 2025 14:07 -0400, Sasha Levin <sashal@kernel.org> wrote:
> Hey folks,
>
> This is a second attempt at a "Kernel API Specification" framework,
> addressing the feedback from the initial RFC and expanding the scope
> to include sysfs attribute specifications.
In light of your talk at OSS last week [1] (for non-subscribers [2]), I
am wondering if any of this code has been written by coding LLMs. It
seems like the kind of unpleasant boilerplate that they are said to be
good at generating, but also seems like an enormous blob of "code" to
review. What is the status of this specification in that regard?
thanks!
jake
[1] https://lwn.net/Articles/1026558/
[2] https://lwn.net/SubscriberLink/1026558/914fa4ec5964b0c5/
--
Jake Edge - LWN - jake@lwn.net - https://lwn.net
* Re: [RFC v2 01/22] kernel/api: introduce kernel API specification framework
2025-06-30 22:20 ` Mauro Carvalho Chehab
@ 2025-07-01 14:23 ` Sasha Levin
2025-07-01 15:25 ` Mauro Carvalho Chehab
2025-07-01 19:01 ` Jonathan Corbet
0 siblings, 2 replies; 33+ messages in thread
From: Sasha Levin @ 2025-07-01 14:23 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Corbet, linux-kernel, linux-doc, linux-api, workflows,
tools
On Tue, Jul 01, 2025 at 12:20:58AM +0200, Mauro Carvalho Chehab wrote:
>On Mon, 30 Jun 2025 13:53:55 -0600
>Jonathan Corbet <corbet@lwn.net> wrote:
>
>> Sasha Levin <sashal@kernel.org> writes:
>>
>> > Add a comprehensive framework for formally documenting kernel APIs with
>> > inline specifications. This framework provides:
>> >
>> > - Structured API documentation with parameter specifications, return
>> > values, error conditions, and execution context requirements
>> > - Runtime validation capabilities for debugging (CONFIG_KAPI_RUNTIME_CHECKS)
>> > - Export of specifications via debugfs for tooling integration
>> > - Support for both internal kernel APIs and system calls
>> >
>> > The framework stores specifications in a dedicated ELF section and
>> > provides infrastructure for:
>> > - Compile-time validation of specifications
>> > - Runtime querying of API documentation
>> > - Machine-readable export formats
>> > - Integration with existing SYSCALL_DEFINE macros
>> >
>> > This commit introduces the core infrastructure without modifying any
>> > existing APIs. Subsequent patches will add specifications to individual
>> > subsystems.
>> >
>> > Signed-off-by: Sasha Levin <sashal@kernel.org>
>> > ---
>> > Documentation/admin-guide/kernel-api-spec.rst | 507 ++++++
>>
>> You need to add that file to index.rst in that directory or it won't be
>> pulled into the docs build.
>>
>> Wouldn't it be nice to integrate all this stuff with our existing
>> kerneldoc mechanism...? :)
>
>+1
>
>Having two different mechanisms (kapi and kerneldoc) makes it a lot harder
>to maintain kAPI.
I hated the idea of not reusing kerneldoc.
My concern with kerneldoc was that I can't manipulate the
information it stores in the context of a kernel build. So for example,
I wasn't sure how I can expose information stored within kerneldoc via
debugfs on a running system (or how I can store it within the vmlinux
for later extraction from the binary built kernel).
I did some research based on your proposal, and I think I was incorrect
with the assumption above. I suppose we could do something like the
following:
1. Add new section patterns to the doc_sect regex to include API
specification sections: api-type, api-version, param-type, param-flags,
param-constraint, error-code, capability, signal, lock-req, since...

2. Create a new output module (scripts/lib/kdoc/kdoc_apispec.py?) to
generate C macro invocations from the parsed data.

It would generate output like:

    DEFINE_KERNEL_API_SPEC(function_name)
    KAPI_DESCRIPTION("...")
    KAPI_PARAM(0, "name", "type", "desc")
    KAPI_PARAM_TYPE(KAPI_TYPE_INT)
    KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
    KAPI_PARAM_END
    KAPI_END_SPEC

3. Then, via the Makefile, we can:
- Generate API specs from kerneldoc comments
- Include generated specs conditionally based on CONFIG_KERNEL_API_SPEC

That would allow us to just have this in the relevant source files:

    #ifdef CONFIG_KERNEL_API_SPEC
    #include "socket.apispec.h"
    #endif

In theory, all of that will let us have something like the following in
kerneldoc:
- @api-type: syscall
- @api-version: 1
- @context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
- @param-type: family, KAPI_TYPE_INT
- @param-flags: family, KAPI_PARAM_IN
- @param-range: family, 0, 45
- @param-mask: type, SOCK_TYPE_MASK | SOCK_CLOEXEC | SOCK_NONBLOCK
- @error-code: -EAFNOSUPPORT, "Address family not supported"
- @error-condition: -EAFNOSUPPORT, "family < 0 || family >= NPROTO"
- @capability: CAP_NET_RAW, KAPI_CAP_GRANT_PERMISSION
- @capability-allows: CAP_NET_RAW, "Create SOCK_RAW sockets"
- @since: 2.0
- @return-type: KAPI_TYPE_FD
- @return-check: KAPI_RETURN_ERROR_CHECK
How does it sound? I'm pretty excited about the possibility to align this
with kerneldoc. Please poke holes in the plan :)
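To make steps 1 and 2 a bit more concrete, here is a rough Python sketch of
the idea. It is not the actual kdoc code: the regex, parse_tags() and
emit_spec() are hypothetical names made up for illustration, and only the
param-type/param-flags tags are handled.

```python
import re

# Hypothetical pattern for lines such as " * - @param-type: family, KAPI_TYPE_INT".
TAG_RE = re.compile(r"^\s*\*?\s*-?\s*@(?P<tag>[a-z-]+):\s*(?P<value>.+?)\s*$")


def parse_tags(comment):
    """Return (tag, value) pairs found in a kerneldoc-style comment."""
    tags = []
    for line in comment.splitlines():
        match = TAG_RE.match(line)
        if match:
            tags.append((match.group("tag"), match.group("value")))
    return tags


def emit_spec(func, description, params, tags):
    """Render macro invocations in the shape shown above.

    params is an ordered list of (name, c_type, description) tuples.
    """
    per_param = {name: {} for name, _, _ in params}
    for tag, value in tags:
        if tag in ("param-type", "param-flags"):
            name, _, rest = value.partition(",")
            name = name.strip()
            if name in per_param:
                per_param[name][tag] = rest.strip()

    out = ["DEFINE_KERNEL_API_SPEC(%s)" % func,
           '\tKAPI_DESCRIPTION("%s")' % description]
    for idx, (name, c_type, desc) in enumerate(params):
        out.append('\tKAPI_PARAM(%d, "%s", "%s", "%s")' % (idx, name, c_type, desc))
        info = per_param[name]
        if "param-type" in info:
            out.append("\t\tKAPI_PARAM_TYPE(%s)" % info["param-type"])
        if "param-flags" in info:
            out.append("\t\tKAPI_PARAM_FLAGS(%s)" % info["param-flags"])
        out.append("\tKAPI_PARAM_END")
    out.append("KAPI_END_SPEC")
    return "\n".join(out)
```

A real output module would need to cover the remaining tags and escape
strings properly; the point is only that the mapping from tags to macro
invocations is mechanical.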
--
Thanks,
Sasha
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC v2 00/22] Kernel API specification framework
2025-07-01 2:43 ` [RFC v2 00/22] Kernel API specification framework Jake Edge
@ 2025-07-01 14:54 ` Sasha Levin
0 siblings, 0 replies; 33+ messages in thread
From: Sasha Levin @ 2025-07-01 14:54 UTC (permalink / raw)
To: Jake Edge; +Cc: linux-kernel, linux-doc, linux-api, workflows, tools
On Mon, Jun 30, 2025 at 07:43:42PM -0700, Jake Edge wrote:
>
>Hi Sasha,
>
>On Tue, Jun 24 2025 14:07 -0400, Sasha Levin <sashal@kernel.org> wrote:
>
>> Hey folks,
>>
>> This is a second attempt at a "Kernel API Specification" framework,
>> addressing the feedback from the initial RFC and expanding the scope
>> to include sysfs attribute specifications.
>
>In light of your talk at OSS last week [1] (for non-subscribers [2]), I
>am wondering if any of this code has been written by coding LLMs. It
>seems like the kind of unpleasant boilerplate that they are said to be
>good at generating, but also seems like an enormous blob of "code" to
>review. What is the status of this specification in that regard?
Hey Jake!
The macro definitions were done mostly manually: it ended up being
more of a copy/paste/replace exercise to get all the different macros in
place (which, yes, ended up being a huge blob).
For the syscall/ioctl/sysfs APIs I used to demonstrate the
infrastructure, I started with defining the basic spec skeleton manually
based on our existing docs and code review, but then had LLMs extend it
based on their review of the code.
If we do proceed with something along the lines of this spec, I can see
LLMs being useful at reviewing incoming code changes and alerting us of
required updates/changes to the spec (or, alerting us that we're
breaking the spec). Think of something like AUTOSEL but for
classification of commits that affect the userspace API.
The tools/kapi/ code is mostly LLM generated.
--
Thanks,
Sasha
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC v2 01/22] kernel/api: introduce kernel API specification framework
2025-07-01 14:23 ` Sasha Levin
@ 2025-07-01 15:25 ` Mauro Carvalho Chehab
2025-07-01 19:01 ` Jonathan Corbet
1 sibling, 0 replies; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-01 15:25 UTC (permalink / raw)
To: Sasha Levin
Cc: Jonathan Corbet, linux-kernel, linux-doc, linux-api, workflows,
tools
Em Tue, 1 Jul 2025 10:23:03 -0400
Sasha Levin <sashal@kernel.org> escreveu:
> On Tue, Jul 01, 2025 at 12:20:58AM +0200, Mauro Carvalho Chehab wrote:
> >Em Mon, 30 Jun 2025 13:53:55 -0600
> >Jonathan Corbet <corbet@lwn.net> escreveu:
> >
> >> Sasha Levin <sashal@kernel.org> writes:
> >>
> >> > Add a comprehensive framework for formally documenting kernel APIs with
> >> > inline specifications. This framework provides:
> >> >
> >> > - Structured API documentation with parameter specifications, return
> >> > values, error conditions, and execution context requirements
> >> > - Runtime validation capabilities for debugging (CONFIG_KAPI_RUNTIME_CHECKS)
> >> > - Export of specifications via debugfs for tooling integration
> >> > - Support for both internal kernel APIs and system calls
> >> >
> >> > The framework stores specifications in a dedicated ELF section and
> >> > provides infrastructure for:
> >> > - Compile-time validation of specifications
> >> > - Runtime querying of API documentation
> >> > - Machine-readable export formats
> >> > - Integration with existing SYSCALL_DEFINE macros
> >> >
> >> > This commit introduces the core infrastructure without modifying any
> >> > existing APIs. Subsequent patches will add specifications to individual
> >> > subsystems.
> >> >
> >> > Signed-off-by: Sasha Levin <sashal@kernel.org>
> >> > ---
> >> > Documentation/admin-guide/kernel-api-spec.rst | 507 ++++++
> >>
> >> You need to add that file to index.rst in that directory or it won't be
> >> pulled into the docs build.
> >>
> >> Wouldn't it be nice to integrate all this stuff with our existing
> >> kerneldoc mechanism...? :)
> >
> >+1
> >
> >Having two different mechanisms (kapi and kerneldoc) makes it a lot harder
> >to maintain kAPI.
>
> I hated the idea of not reusing kerneldoc.
>
> My concern with kerneldoc was that I can't manipulate the
> information it stores in the context of a kernel build. So for example,
> I wasn't sure how I can expose information stored within kerneldoc via
> debugfs on a running system (or how I can store it within the vmlinux
> for later extraction from the binary built kernel).
>
> I did some research based on your proposal, and I think I was incorrect
> with the assumption above. I suppose we could do something like the
> following:
>
> 1. Add new section patterns to the doc_sect regex to include API
> specification sections: api-type, api-version, param-type, param-flags,
> param-constraint, error-code, capability, signal, lock-req, since...
>
> 2. Create new output module (scripts/lib/kdoc/kdoc_apispec.py?) to
> generate C macro invocations from parsed data.
>
> Which will generate output like:
>
> DEFINE_KERNEL_API_SPEC(function_name)
> KAPI_DESCRIPTION("...")
> KAPI_PARAM(0, "name", "type", "desc")
> KAPI_PARAM_TYPE(KAPI_TYPE_INT)
> KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
> KAPI_PARAM_END
> KAPI_END_SPEC
> 3. And then via makefile we can:
> - Generate API specs from kerneldoc comments
> - Include generated specs conditionally based on CONFIG_KERNEL_API_SPEC
>
> Allowing us to just have these in the relevant source files:
> #ifdef CONFIG_KERNEL_API_SPEC
> #include "socket.apispec.h"
> #endif
>
>
> In theory, all of that will let us have something like the following in
> kerneldoc:
>
> - @api-type: syscall
> - @api-version: 1
> - @context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
> - @param-type: family, KAPI_TYPE_INT
> - @param-flags: family, KAPI_PARAM_IN
> - @param-range: family, 0, 45
> - @param-mask: type, SOCK_TYPE_MASK | SOCK_CLOEXEC | SOCK_NONBLOCK
> - @error-code: -EAFNOSUPPORT, "Address family not supported"
> - @error-condition: -EAFNOSUPPORT, "family < 0 || family >= NPROTO"
> - @capability: CAP_NET_RAW, KAPI_CAP_GRANT_PERMISSION
> - @capability-allows: CAP_NET_RAW, "Create SOCK_RAW sockets"
> - @since: 2.0
> - @return-type: KAPI_TYPE_FD
> - @return-check: KAPI_RETURN_ERROR_CHECK
>
> How does it sound? I'm pretty excited about the possibility to align this
> with kerneldoc. Please poke holes in the plan :)
Sounds like a plan!
We did something somewhat similar on IGT.
The python classes there were written with the goal to document
tests, so its examples are related to test docs, but I wrote it
to be generic.
There, all fields come from a JSON file like this:
https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/intel/xe_test_config.json?ref_type=heads
which describes what fields will be used. It also lists file
patterns that will use it. The fields allow hierarchical
grouping, which could be interesting for some types of fields.
From the json example (I dropped the optional field description
from the example, to make it cleaner):
"Category": {
"Mega feature": {
"Sub-category": {},
}
...
"Test category": {},
"Issue": {},
...
The hierarchical part is useful to properly order kapi content
without the need to add multiple Sphinx markups to manually reorder
the output inside the .rst files.
(*) I would avoid hardcoding the fields/structures, as eventually
we may need more flexibility to add fields and/or having some
fields that are specific, for instance, to debugfs or sysfs.
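As a rough illustration of how a config-driven field list could work (this
is not the IGT code; the "fields" key and flatten_fields() are assumptions
made for this sketch, and the per-field description entries are left out as
above), a small Python example that loads such a hierarchy and derives the
output order from it:

```python
import json

# Hypothetical config layout: nested objects describe the field hierarchy.
CONFIG = json.loads("""
{
    "fields": {
        "Category": {
            "Mega feature": {
                "Sub-category": {}
            }
        },
        "Test category": {},
        "Issue": {}
    }
}
""")


def flatten_fields(tree, depth=0):
    """Yield (field_name, depth) pairs in declaration order, parents first."""
    for name, children in tree.items():
        yield name, depth
        yield from flatten_fields(children, depth + 1)


# The resulting list can drive both parsing and the ordering of the output.
ORDER = list(flatten_fields(CONFIG["fields"]))
```

With something like this, adding a sysfs-only or debugfs-only field means
editing the JSON file rather than the parser.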
The python class it uses is at:
https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/scripts/test_list.py?ref_type=heads
and its caller is at:
https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/scripts/igt_doc.py?ref_type=heads
Eventually you may find something useful there. If so, feel free to
pick from it.
Regards,
Mauro
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC v2 01/22] kernel/api: introduce kernel API specification framework
2025-07-01 14:23 ` Sasha Levin
2025-07-01 15:25 ` Mauro Carvalho Chehab
@ 2025-07-01 19:01 ` Jonathan Corbet
2025-07-01 20:50 ` Sasha Levin
1 sibling, 1 reply; 33+ messages in thread
From: Jonathan Corbet @ 2025-07-01 19:01 UTC (permalink / raw)
To: Sasha Levin, Mauro Carvalho Chehab
Cc: linux-kernel, linux-doc, linux-api, workflows, tools,
Kate Stewart, Gabriele Paoloni, Chuck Wolber
[Adding some of the ELISA folks, who are working in a related area and
might have thoughts on this. You can find the patch series under
discussion at:
https://lore.kernel.org/all/20250624180742.5795-1-sashal@kernel.org
]
Sasha Levin <sashal@kernel.org> writes:
> 1. Add new section patterns to the doc_sect regex to include API
> specification sections: api-type, api-version, param-type, param-flags,
> param-constraint, error-code, capability, signal, lock-req, since...
Easily enough done - you can never have too many regexes :)
> 2. Create new output module (scripts/lib/kdoc/kdoc_apispec.py?) to
> generate C macro invocations from parsed data.
>
> Which will generate output like:
>
> DEFINE_KERNEL_API_SPEC(function_name)
> KAPI_DESCRIPTION("...")
> KAPI_PARAM(0, "name", "type", "desc")
> KAPI_PARAM_TYPE(KAPI_TYPE_INT)
> KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
> KAPI_PARAM_END
> KAPI_END_SPEC
Also shouldn't be all that hard.
> 3. And then via makefile we can:
> - Generate API specs from kerneldoc comments
> - Include generated specs conditionally based on CONFIG_KERNEL_API_SPEC
>
> Allowing us to just have these in the relevant source files:
> #ifdef CONFIG_KERNEL_API_SPEC
> #include "socket.apispec.h"
> #endif
...seems like it should work.
> In theory, all of that will let us have something like the following in
> kerneldoc:
>
> - @api-type: syscall
> - @api-version: 1
> - @context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
> - @param-type: family, KAPI_TYPE_INT
> - @param-flags: family, KAPI_PARAM_IN
> - @param-range: family, 0, 45
> - @param-mask: type, SOCK_TYPE_MASK | SOCK_CLOEXEC | SOCK_NONBLOCK
> - @error-code: -EAFNOSUPPORT, "Address family not supported"
> - @error-condition: -EAFNOSUPPORT, "family < 0 || family >= NPROTO"
> - @capability: CAP_NET_RAW, KAPI_CAP_GRANT_PERMISSION
> - @capability-allows: CAP_NET_RAW, "Create SOCK_RAW sockets"
> - @since: 2.0
> - @return-type: KAPI_TYPE_FD
> - @return-check: KAPI_RETURN_ERROR_CHECK
>
> How does it sound? I'm pretty excited about the possibility to align this
> with kerneldoc. Please poke holes in the plan :)
I think we could do it without all the @signs. We'd also want to see
how well we could integrate that information with the minimal structure
we already have: getting the return-value information into the Returns:
section, for example, and tying the parameter constraints to the
parameter descriptions we already have.
The other thing I would really like to see, to the extent we can, is
that a bunch of patches adding all this data to the source will actually
be accepted by the relevant maintainers. It would be a shame to get all
this infrastructure into place, then have things stall out due to
maintainer pushback. Maybe you should start by annotating the
scheduler-related system calls; if that works the rest should be a piece
of cake :)
Thanks,
jon
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC v2 01/22] kernel/api: introduce kernel API specification framework
2025-07-01 19:01 ` Jonathan Corbet
@ 2025-07-01 20:50 ` Sasha Levin
2025-07-01 21:43 ` Jonathan Corbet
0 siblings, 1 reply; 33+ messages in thread
From: Sasha Levin @ 2025-07-01 20:50 UTC (permalink / raw)
To: Jonathan Corbet
Cc: Mauro Carvalho Chehab, linux-kernel, linux-doc, linux-api,
workflows, tools, Kate Stewart, Gabriele Paoloni, Chuck Wolber
On Tue, Jul 01, 2025 at 01:01:27PM -0600, Jonathan Corbet wrote:
>[Adding some of the ELISA folks, who are working in a related area and
>might have thoughts on this. You can find the patch series under
>discussion at:
>
> https://lore.kernel.org/all/20250624180742.5795-1-sashal@kernel.org
Yup, we all met at OSS and reached the conclusion that we should lean
towards a machine readable spec, which we thought was closer to my
proposal than the kerneldoc work.
However, with your suggestion, I think it makes more sense to go back to
kerneldoc as that can be made machine readable.
>> In theory, all of that will let us have something like the following in
>> kerneldoc:
>>
>> - @api-type: syscall
>> - @api-version: 1
>> - @context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
>> - @param-type: family, KAPI_TYPE_INT
>> - @param-flags: family, KAPI_PARAM_IN
>> - @param-range: family, 0, 45
>> - @param-mask: type, SOCK_TYPE_MASK | SOCK_CLOEXEC | SOCK_NONBLOCK
>> - @error-code: -EAFNOSUPPORT, "Address family not supported"
>> - @error-condition: -EAFNOSUPPORT, "family < 0 || family >= NPROTO"
>> - @capability: CAP_NET_RAW, KAPI_CAP_GRANT_PERMISSION
>> - @capability-allows: CAP_NET_RAW, "Create SOCK_RAW sockets"
>> - @since: 2.0
>> - @return-type: KAPI_TYPE_FD
>> - @return-check: KAPI_RETURN_ERROR_CHECK
>>
>> How does it sound? I'm pretty excited about the possibility to align this
>> with kerneldoc. Please poke holes in the plan :)
>
>I think we could do it without all the @signs. We'd also want to see
>how well we could integrate that information with the minimal structure
>we already have: getting the return-value information into the Returns:
>section, for example, and tying the parameter constraints to the
>parameter descriptions we already have.
Right!
So I have a proof of concept which, during the build process, creates
.apispec.h files that are generated from kerneldoc and contain macros
identical to the ones in my RFC.
Here's an example of the sys_mlock() spec:
/**
* sys_mlock - Lock pages in memory
* @start: Starting address of memory range to lock
* @len: Length of memory range to lock in bytes
*
* Locks pages in the specified address range into RAM, preventing them from
* being paged to swap. Requires CAP_IPC_LOCK capability or RLIMIT_MEMLOCK
* resource limit.
*
* long-desc: Locks pages in the specified address range into RAM, preventing
* them from being paged to swap. Requires CAP_IPC_LOCK capability
* or RLIMIT_MEMLOCK resource limit.
* context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
* param-type: start, KAPI_TYPE_UINT
* param-flags: start, KAPI_PARAM_IN
* param-constraint-type: start, KAPI_CONSTRAINT_NONE
* param-constraint: start, Rounded down to page boundary
* param-type: len, KAPI_TYPE_UINT
* param-flags: len, KAPI_PARAM_IN
* param-constraint-type: len, KAPI_CONSTRAINT_RANGE
* param-range: len, 0, LONG_MAX
* param-constraint: len, Rounded up to page boundary
* return-type: KAPI_TYPE_INT
* return-check-type: KAPI_RETURN_ERROR_CHECK
* return-success: 0
* error-code: -ENOMEM, ENOMEM, Address range issue,
* Some of the specified range is not mapped, has unmapped gaps,
* or the lock would cause the number of mapped regions to exceed the limit.
* error-code: -EPERM, EPERM, Insufficient privileges,
* The caller is not privileged (no CAP_IPC_LOCK) and RLIMIT_MEMLOCK is 0.
* error-code: -EINVAL, EINVAL, Address overflow,
* The result of the addition start+len was less than start (arithmetic overflow).
* error-code: -EAGAIN, EAGAIN, Some or all memory could not be locked,
* Some or all of the specified address range could not be locked.
* error-code: -EINTR, EINTR, Interrupted by signal,
* The operation was interrupted by a fatal signal before completion.
* error-code: -EFAULT, EFAULT, Bad address,
* The specified address range contains invalid addresses that cannot be accessed.
* since-version: 2.0
* lock: mmap_lock, KAPI_LOCK_RWLOCK
* lock-acquired: true
* lock-released: true
* lock-desc: Process memory map write lock
* signal: FATAL
* signal-direction: KAPI_SIGNAL_RECEIVE
* signal-action: KAPI_SIGNAL_ACTION_RETURN
* signal-condition: Fatal signal pending
* signal-desc: Fatal signals (SIGKILL) can interrupt the operation at two points:
* when acquiring mmap_write_lock_killable() and during page population
* in __mm_populate(). Returns -EINTR. Non-fatal signals do NOT interrupt
* mlock - the operation continues even if SIGINT/SIGTERM are received.
* signal-error: -EINTR
* signal-timing: KAPI_SIGNAL_TIME_DURING
* signal-priority: 0
* signal-interruptible: yes
* signal-state-req: KAPI_SIGNAL_STATE_RUNNING
* examples: mlock(addr, 4096); // Lock one page
* mlock(addr, len); // Lock range of pages
* notes: Memory locks do not stack - multiple calls on the same range can be
* undone by a single munlock. Locks are not inherited by child processes.
* Pages are locked on whole page boundaries. Commonly used by real-time
* applications to prevent page faults during time-critical operations.
* Also used for security to prevent sensitive data (e.g., cryptographic keys)
* from being written to swap. Note: locked pages may still be saved to
* swap during system suspend/hibernate.
*
* Tagged addresses are automatically handled via untagged_addr(). The operation
* occurs in two phases: first VMAs are marked with VM_LOCKED, then pages are
* populated into memory. When checking RLIMIT_MEMLOCK, the kernel optimizes
* by recounting locked memory to avoid double-counting overlapping regions.
* side-effect: KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY, process memory, Locks pages into physical memory, preventing swapping, reversible=yes
* side-effect: KAPI_EFFECT_MODIFY_STATE, mm->locked_vm, Increases process locked memory counter, reversible=yes
* side-effect: KAPI_EFFECT_ALLOC_MEMORY, physical pages, May allocate and populate page table entries, condition=Pages not already present, reversible=yes
* side-effect: KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY, page faults, Triggers page faults to bring pages into memory, condition=Pages not already resident
* side-effect: KAPI_EFFECT_MODIFY_STATE, VMA splitting, May split existing VMAs at lock boundaries, condition=Lock range partially overlaps existing VMA
* state-trans: memory pages, swappable, locked in RAM, Pages become non-swappable and pinned in physical memory
* state-trans: VMA flags, unlocked, VM_LOCKED set, Virtual memory area marked as locked
* capability: CAP_IPC_LOCK, KAPI_CAP_BYPASS_CHECK, CAP_IPC_LOCK capability
* capability-allows: Lock unlimited amount of memory (no RLIMIT_MEMLOCK enforcement)
* capability-without: Must respect RLIMIT_MEMLOCK resource limit
* capability-condition: Checked when RLIMIT_MEMLOCK is 0 or locking would exceed limit
* capability-priority: 0
* constraint: RLIMIT_MEMLOCK Resource Limit, The RLIMIT_MEMLOCK soft resource limit specifies the maximum bytes of memory that may be locked into RAM. Unprivileged processes are restricted to this limit. CAP_IPC_LOCK capability allows bypassing this limit entirely. The limit is enforced per-process, not per-user.
* constraint-expr: RLIMIT_MEMLOCK Resource Limit, locked_memory + request_size <= RLIMIT_MEMLOCK || CAP_IPC_LOCK
* constraint: Memory Pressure and OOM, Locking large amounts of memory can cause system-wide memory pressure and potentially trigger the OOM killer. The kernel does not prevent locking memory that would destabilize the system.
* constraint: Special Memory Areas, Some memory types cannot be locked or are silently skipped: VM_IO/VM_PFNMAP areas (device mappings) are skipped; Hugetlb pages are inherently pinned and skipped; DAX mappings are always present in memory and skipped; Secret memory (memfd_secret) mappings are skipped; VM_DROPPABLE memory cannot be locked and is skipped; Gate VMA (kernel entry point) is skipped; VM_LOCKED areas are already locked. These special areas are silently excluded without error.
*
* Context: Process context. May sleep. Takes mmap_lock for write.
*
* Return: 0 on success, negative error code on failure
*/
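For reference, the "key: value" lines in a comment like the one above can be
collected with very little code. The following Python sketch is only an
approximation of what a parser could do, not the proof of concept itself:
parse_spec_lines() and the regex are hypothetical, and it assumes that keys
are lowercase, that continuation lines never begin with a "word:" prefix,
and that a blank " *" line ends an entry.

```python
import re

# Lowercase, dash-separated key followed by a colon, e.g. "error-code: ...".
KEY_RE = re.compile(r"^([a-z][a-z-]*):\s*(.*)$")


def parse_spec_lines(comment):
    """Return a list of (key, value) pairs; keys may repeat (e.g. error-code)."""
    entries = []
    current = None
    for raw in comment.splitlines():
        if raw.strip() in ("/**", "*/"):
            continue
        # Drop the leading " * " comment decoration.
        text = re.sub(r"^\s*\*\s?", "", raw).strip()
        if not text:
            current = None          # a blank line ends the current entry
            continue
        match = KEY_RE.match(text)
        if match:
            current = [match.group(1), match.group(2)]
            entries.append(current)
        elif current is not None:
            current[1] += " " + text    # continuation of the previous value
    return [tuple(entry) for entry in entries]
```

Lines such as "@start: ..." or "Context: ..." do not match the key pattern,
so they are left to the existing kerneldoc handling.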
>The other thing I would really like to see, to the extent we can, is
>that a bunch of patches adding all this data to the source will actually
>be accepted by the relevant maintainers. It would be a shame to get all
>this infrastructure into place, then have things stall out due to
>maintainer pushback. Maybe you should start by annotating the
>scheduler-related system calls; if that works the rest should be a piece
>of cake :)
In the RFC I've sent out I've specced out APIs from different subsystems
to solicit some feedback on those, but so far it's been quiet.
I'll resend a "lean" RFC v3 with just the base macro spec infra +
kerneldoc support + "trickier" sched API + "trickier" mm API.
I'm thinking that if it's still quiet in a month or two I'll propose a
talk at LPC around it, or maybe try to get feedback/consensus during the
maintainers summit.
But yes, it doesn't make sense to take it in until we have an ack from a
few larger subsystems.
--
Thanks,
Sasha
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC v2 01/22] kernel/api: introduce kernel API specification framework
2025-07-01 20:50 ` Sasha Levin
@ 2025-07-01 21:43 ` Jonathan Corbet
2025-07-01 22:16 ` Sasha Levin
0 siblings, 1 reply; 33+ messages in thread
From: Jonathan Corbet @ 2025-07-01 21:43 UTC (permalink / raw)
To: Sasha Levin
Cc: Mauro Carvalho Chehab, linux-kernel, linux-doc, linux-api,
workflows, tools, Kate Stewart, Gabriele Paoloni, Chuck Wolber
Sasha Levin <sashal@kernel.org> writes:
> So I have a proof of concept which during the build process creates
> .apispec.h which are generated from kerneldoc and contain macros
> identical to the ones in my RFC.
>
> Here's an example of sys_mlock() spec:
So I'm getting ahead of the game, but I have to ask some questions...
> /**
> * sys_mlock - Lock pages in memory
> * @start: Starting address of memory range to lock
> * @len: Length of memory range to lock in bytes
> *
> * Locks pages in the specified address range into RAM, preventing them from
> * being paged to swap. Requires CAP_IPC_LOCK capability or RLIMIT_MEMLOCK
> * resource limit.
> *
> * long-desc: Locks pages in the specified address range into RAM, preventing
> * them from being paged to swap. Requires CAP_IPC_LOCK capability
> * or RLIMIT_MEMLOCK resource limit.
Why duplicate the long description?
> * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
> * param-type: start, KAPI_TYPE_UINT
This is something I wondered before; rather than a bunch of lengthy
KAPI_* symbols, why not just say __u64 (or some other familiar type)
here?
> * param-flags: start, KAPI_PARAM_IN
> * param-constraint-type: start, KAPI_CONSTRAINT_NONE
> * param-constraint: start, Rounded down to page boundary
> * param-type: len, KAPI_TYPE_UINT
> * param-flags: len, KAPI_PARAM_IN
> * param-constraint-type: len, KAPI_CONSTRAINT_RANGE
> * param-range: len, 0, LONG_MAX
> * param-constraint: len, Rounded up to page boundary
> * return-type: KAPI_TYPE_INT
> * return-check-type: KAPI_RETURN_ERROR_CHECK
> * return-success: 0
> * error-code: -ENOMEM, ENOMEM, Address range issue,
> * Some of the specified range is not mapped, has unmapped gaps,
> * or the lock would cause the number of mapped regions to exceed the limit.
> * error-code: -EPERM, EPERM, Insufficient privileges,
> * The caller is not privileged (no CAP_IPC_LOCK) and RLIMIT_MEMLOCK is 0.
> * error-code: -EINVAL, EINVAL, Address overflow,
> * The result of the addition start+len was less than start (arithmetic overflow).
> * error-code: -EAGAIN, EAGAIN, Some or all memory could not be locked,
> * Some or all of the specified address range could not be locked.
> * error-code: -EINTR, EINTR, Interrupted by signal,
> * The operation was interrupted by a fatal signal before completion.
> * error-code: -EFAULT, EFAULT, Bad address,
> * The specified address range contains invalid addresses that cannot be accessed.
> * since-version: 2.0
> * lock: mmap_lock, KAPI_LOCK_RWLOCK
> * lock-acquired: true
> * lock-released: true
> * lock-desc: Process memory map write lock
> * signal: FATAL
> * signal-direction: KAPI_SIGNAL_RECEIVE
> * signal-action: KAPI_SIGNAL_ACTION_RETURN
> * signal-condition: Fatal signal pending
> * signal-desc: Fatal signals (SIGKILL) can interrupt the operation at two points:
> * when acquiring mmap_write_lock_killable() and during page population
> * in __mm_populate(). Returns -EINTR. Non-fatal signals do NOT interrupt
> * mlock - the operation continues even if SIGINT/SIGTERM are received.
> * signal-error: -EINTR
> * signal-timing: KAPI_SIGNAL_TIME_DURING
> * signal-priority: 0
> * signal-interruptible: yes
> * signal-state-req: KAPI_SIGNAL_STATE_RUNNING
> * examples: mlock(addr, 4096); // Lock one page
> * mlock(addr, len); // Lock range of pages
> * notes: Memory locks do not stack - multiple calls on the same range can be
> * undone by a single munlock. Locks are not inherited by child processes.
> * Pages are locked on whole page boundaries. Commonly used by real-time
> * applications to prevent page faults during time-critical operations.
> * Also used for security to prevent sensitive data (e.g., cryptographic keys)
> * from being written to swap. Note: locked pages may still be saved to
> * swap during system suspend/hibernate.
> *
> * Tagged addresses are automatically handled via untagged_addr(). The operation
> * occurs in two phases: first VMAs are marked with VM_LOCKED, then pages are
> * populated into memory. When checking RLIMIT_MEMLOCK, the kernel optimizes
> * by recounting locked memory to avoid double-counting overlapping regions.
> * side-effect: KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY, process memory, Locks pages into physical memory, preventing swapping, reversible=yes
I hope the really long lines starting here aren't the intended way to go...:)
> * side-effect: KAPI_EFFECT_MODIFY_STATE, mm->locked_vm, Increases process locked memory counter, reversible=yes
> * side-effect: KAPI_EFFECT_ALLOC_MEMORY, physical pages, May allocate and populate page table entries, condition=Pages not already present, reversible=yes
> * side-effect: KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY, page faults, Triggers page faults to bring pages into memory, condition=Pages not already resident
> * side-effect: KAPI_EFFECT_MODIFY_STATE, VMA splitting, May split existing VMAs at lock boundaries, condition=Lock range partially overlaps existing VMA
> * state-trans: memory pages, swappable, locked in RAM, Pages become non-swappable and pinned in physical memory
> * state-trans: VMA flags, unlocked, VM_LOCKED set, Virtual memory area marked as locked
> * capability: CAP_IPC_LOCK, KAPI_CAP_BYPASS_CHECK, CAP_IPC_LOCK capability
> * capability-allows: Lock unlimited amount of memory (no RLIMIT_MEMLOCK enforcement)
> * capability-without: Must respect RLIMIT_MEMLOCK resource limit
> * capability-condition: Checked when RLIMIT_MEMLOCK is 0 or locking would exceed limit
> * capability-priority: 0
> * constraint: RLIMIT_MEMLOCK Resource Limit, The RLIMIT_MEMLOCK soft resource limit specifies the maximum bytes of memory that may be locked into RAM. Unprivileged processes are restricted to this limit. CAP_IPC_LOCK capability allows bypassing this limit entirely. The limit is enforced per-process, not per-user.
> * constraint-expr: RLIMIT_MEMLOCK Resource Limit, locked_memory + request_size <= RLIMIT_MEMLOCK || CAP_IPC_LOCK
> * constraint: Memory Pressure and OOM, Locking large amounts of memory can cause system-wide memory pressure and potentially trigger the OOM killer. The kernel does not prevent locking memory that would destabilize the system.
> * constraint: Special Memory Areas, Some memory types cannot be locked or are silently skipped: VM_IO/VM_PFNMAP areas (device mappings) are skipped; Hugetlb pages are inherently pinned and skipped; DAX mappings are always present in memory and skipped; Secret memory (memfd_secret) mappings are skipped; VM_DROPPABLE memory cannot be locked and is skipped; Gate VMA (kernel entry point) is skipped; VM_LOCKED areas are already locked. These special areas are silently excluded without error.
> *
> * Context: Process context. May sleep. Takes mmap_lock for write.
> *
> * Return: 0 on success, negative error code on failure
Both of these, of course, are much less informative versions of the data
you have put up above; it would be nice to unify them somehow.
Thanks,
jon
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC v2 01/22] kernel/api: introduce kernel API specification framework
2025-07-01 21:43 ` Jonathan Corbet
@ 2025-07-01 22:16 ` Sasha Levin
0 siblings, 0 replies; 33+ messages in thread
From: Sasha Levin @ 2025-07-01 22:16 UTC (permalink / raw)
To: Jonathan Corbet
Cc: Mauro Carvalho Chehab, linux-kernel, linux-doc, linux-api,
workflows, tools, Kate Stewart, Gabriele Paoloni, Chuck Wolber
On Tue, Jul 01, 2025 at 03:43:32PM -0600, Jonathan Corbet wrote:
>Sasha Levin <sashal@kernel.org> writes:
>
>> So I have a proof of concept which during the build process creates
>> .apispec.h which are generated from kerneldoc and contain macros
>> identical to the ones in my RFC.
>>
>> Here's an example of sys_mlock() spec:
>
>So I'm getting ahead of the game, but I have to ask some questions...
>
>> /**
>> * sys_mlock - Lock pages in memory
>> * @start: Starting address of memory range to lock
>> * @len: Length of memory range to lock in bytes
>> *
>> * Locks pages in the specified address range into RAM, preventing them from
>> * being paged to swap. Requires CAP_IPC_LOCK capability or RLIMIT_MEMLOCK
>> * resource limit.
>> *
>> * long-desc: Locks pages in the specified address range into RAM, preventing
>> * them from being paged to swap. Requires CAP_IPC_LOCK capability
>> * or RLIMIT_MEMLOCK resource limit.
>
>Why duplicate the long description?
Will fix.
>> * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
>> * param-type: start, KAPI_TYPE_UINT
>
>This is something I wondered before; rather than a bunch of lengthy
>KAPI_* symbols, why not just say __u64 (or some other familiar type)
>here?
I think it gets tricky when we get to more complex types. For example,
how do we represent a FD or a (struct sockaddr *)?
With macros, KAPI_TYPE_FD or KAPI_TYPE_SOCKADDR make sense, but
__sockaddr will be a bit confusing (I think).
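A quick way to see the problem, as a Python sketch: the table below is
purely illustrative (it is not taken from the patches, and
infer_kapi_type() is a made-up helper), but it shows that a plain C type
does not always pin down the semantic type.

```python
# Illustrative only: C type -> semantic types that could describe it.
C_TYPE_CANDIDATES = {
    "int": ["KAPI_TYPE_INT", "KAPI_TYPE_FD"],
    "unsigned long": ["KAPI_TYPE_UINT"],
    "struct sockaddr *": ["KAPI_TYPE_SOCKADDR"],
}


def infer_kapi_type(c_type):
    """Return the semantic type if the C type determines it uniquely, else None."""
    candidates = C_TYPE_CANDIDATES.get(c_type, [])
    return candidates[0] if len(candidates) == 1 else None
```

An "int" parameter could be a plain integer or a file descriptor, so the
spec has to say which one it is; where the mapping is unambiguous, the type
could in principle be inferred from the prototype.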
>> * param-flags: start, KAPI_PARAM_IN
>> * param-constraint-type: start, KAPI_CONSTRAINT_NONE
>> * param-constraint: start, Rounded down to page boundary
>> * param-type: len, KAPI_TYPE_UINT
>> * param-flags: len, KAPI_PARAM_IN
>> * param-constraint-type: len, KAPI_CONSTRAINT_RANGE
>> * param-range: len, 0, LONG_MAX
>> * param-constraint: len, Rounded up to page boundary
>> * return-type: KAPI_TYPE_INT
>> * return-check-type: KAPI_RETURN_ERROR_CHECK
>> * return-success: 0
>> * error-code: -ENOMEM, ENOMEM, Address range issue,
>> * Some of the specified range is not mapped, has unmapped gaps,
>> * or the lock would cause the number of mapped regions to exceed the limit.
>> * error-code: -EPERM, EPERM, Insufficient privileges,
>> * The caller is not privileged (no CAP_IPC_LOCK) and RLIMIT_MEMLOCK is 0.
>> * error-code: -EINVAL, EINVAL, Address overflow,
>> * The result of the addition start+len was less than start (arithmetic overflow).
>> * error-code: -EAGAIN, EAGAIN, Some or all memory could not be locked,
>> * Some or all of the specified address range could not be locked.
>> * error-code: -EINTR, EINTR, Interrupted by signal,
>> * The operation was interrupted by a fatal signal before completion.
>> * error-code: -EFAULT, EFAULT, Bad address,
>> * The specified address range contains invalid addresses that cannot be accessed.
>> * since-version: 2.0
>> * lock: mmap_lock, KAPI_LOCK_RWLOCK
>> * lock-acquired: true
>> * lock-released: true
>> * lock-desc: Process memory map write lock
>> * signal: FATAL
>> * signal-direction: KAPI_SIGNAL_RECEIVE
>> * signal-action: KAPI_SIGNAL_ACTION_RETURN
>> * signal-condition: Fatal signal pending
>> * signal-desc: Fatal signals (SIGKILL) can interrupt the operation at two points:
>> * when acquiring mmap_write_lock_killable() and during page population
>> * in __mm_populate(). Returns -EINTR. Non-fatal signals do NOT interrupt
>> * mlock - the operation continues even if SIGINT/SIGTERM are received.
>> * signal-error: -EINTR
>> * signal-timing: KAPI_SIGNAL_TIME_DURING
>> * signal-priority: 0
>> * signal-interruptible: yes
>> * signal-state-req: KAPI_SIGNAL_STATE_RUNNING
>> * examples: mlock(addr, 4096); // Lock one page
>> * mlock(addr, len); // Lock range of pages
>> * notes: Memory locks do not stack - multiple calls on the same range can be
>> * undone by a single munlock. Locks are not inherited by child processes.
>> * Pages are locked on whole page boundaries. Commonly used by real-time
>> * applications to prevent page faults during time-critical operations.
>> * Also used for security to prevent sensitive data (e.g., cryptographic keys)
>> * from being written to swap. Note: locked pages may still be saved to
>> * swap during system suspend/hibernate.
>> *
>> * Tagged addresses are automatically handled via untagged_addr(). The operation
>> * occurs in two phases: first VMAs are marked with VM_LOCKED, then pages are
>> * populated into memory. When checking RLIMIT_MEMLOCK, the kernel optimizes
>> * by recounting locked memory to avoid double-counting overlapping regions.
>> * side-effect: KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY, process memory, Locks pages into physical memory, preventing swapping, reversible=yes
>
>I hope the really long lines starting here aren't the intended way to go...:)
I guess that we have two options for more complex blocks like these.
The first is the longer lines you've pointed out. They are indeed long
and difficult to read, but they present relatively static, "not too
interesting" information that readers are likely to gloss over.
The other one would look something like:
	side-effect: KAPI_EFFECT_MODIFY_STATE
	side-effect-type: KAPI_EFFECT_MODIFY_STATE
	side-effect-target: mm->locked_vm
	side-effect-description: Increases process locked memory counter
	side-effect-reversible: yes
Which isn't as long, but it occupies a bunch of vertical real estate
while not being too interesting for most of the readers.
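Either way the two forms should carry the same record. As a rough
sketch of how the single-line form could be split into the same
fields (the struct, the function names and the parsing rules below are
assumptions for illustration, not the extraction tool from this
series):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record; fields mirror the side-effect-* keys above. */
struct side_effect {
	char type[64];
	char target[64];
	char description[128];
	bool reversible;
};

/* Copy src into dst, skipping leading spaces; dst is always terminated. */
static void copy_field(char *dst, size_t dst_len, const char *src, size_t src_len)
{
	while (src_len && *src == ' ') {
		src++;
		src_len--;
	}
	if (src_len >= dst_len)
		src_len = dst_len - 1;
	memcpy(dst, src, src_len);
	dst[src_len] = '\0';
}

/*
 * Parse "type, target, description[, condition=...][, reversible=...]".
 * The description may itself contain commas, so it ends at the first
 * ", condition=" or ", reversible=" marker rather than at a comma.
 * Returns 0 on success, -1 if fewer than three fields are present.
 */
int parse_side_effect(const char *line, struct side_effect *out)
{
	static const char rev_key[] = ", reversible=";
	static const char cond_key[] = ", condition=";
	const char *c1 = strchr(line, ',');
	const char *c2 = c1 ? strchr(c1 + 1, ',') : NULL;
	const char *rest, *rev, *cond, *end;

	if (!c2)
		return -1;

	copy_field(out->type, sizeof(out->type), line, (size_t)(c1 - line));
	copy_field(out->target, sizeof(out->target), c1 + 1, (size_t)(c2 - c1 - 1));

	rest = c2 + 1;
	rev = strstr(rest, rev_key);
	cond = strstr(rest, cond_key);
	end = rest + strlen(rest);
	if (rev && rev < end)
		end = rev;
	if (cond && cond < end)
		end = cond;

	copy_field(out->description, sizeof(out->description), rest, (size_t)(end - rest));
	out->reversible = rev && strncmp(rev + strlen(rev_key), "yes", 3) == 0;
	return 0;
}
```

Note that the first side-effect line in the patch has a comma inside
its description, which is the kind of ambiguity the multi-line form
avoids and the single-line form has to work around.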
>> * side-effect: KAPI_EFFECT_MODIFY_STATE, mm->locked_vm, Increases process locked memory counter, reversible=yes
>> * side-effect: KAPI_EFFECT_ALLOC_MEMORY, physical pages, May allocate and populate page table entries, condition=Pages not already present, reversible=yes
>> * side-effect: KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY, page faults, Triggers page faults to bring pages into memory, condition=Pages not already resident
>> * side-effect: KAPI_EFFECT_MODIFY_STATE, VMA splitting, May split existing VMAs at lock boundaries, condition=Lock range partially overlaps existing VMA
>> * state-trans: memory pages, swappable, locked in RAM, Pages become non-swappable and pinned in physical memory
>> * state-trans: VMA flags, unlocked, VM_LOCKED set, Virtual memory area marked as locked
>> * capability: CAP_IPC_LOCK, KAPI_CAP_BYPASS_CHECK, CAP_IPC_LOCK capability
>> * capability-allows: Lock unlimited amount of memory (no RLIMIT_MEMLOCK enforcement)
>> * capability-without: Must respect RLIMIT_MEMLOCK resource limit
>> * capability-condition: Checked when RLIMIT_MEMLOCK is 0 or locking would exceed limit
>> * capability-priority: 0
>> * constraint: RLIMIT_MEMLOCK Resource Limit, The RLIMIT_MEMLOCK soft resource limit specifies the maximum bytes of memory that may be locked into RAM. Unprivileged processes are restricted to this limit. CAP_IPC_LOCK capability allows bypassing this limit entirely. The limit is enforced per-process, not per-user.
>> * constraint-expr: RLIMIT_MEMLOCK Resource Limit, locked_memory + request_size <= RLIMIT_MEMLOCK || CAP_IPC_LOCK
>> * constraint: Memory Pressure and OOM, Locking large amounts of memory can cause system-wide memory pressure and potentially trigger the OOM killer. The kernel does not prevent locking memory that would destabilize the system.
>> * constraint: Special Memory Areas, Some memory types cannot be locked or are silently skipped: VM_IO/VM_PFNMAP areas (device mappings) are skipped; Hugetlb pages are inherently pinned and skipped; DAX mappings are always present in memory and skipped; Secret memory (memfd_secret) mappings are skipped; VM_DROPPABLE memory cannot be locked and is skipped; Gate VMA (kernel entry point) is skipped; VM_LOCKED areas are already locked. These special areas are silently excluded without error.
>> *
>> * Context: Process context. May sleep. Takes mmap_lock for write.
>> *
>> * Return: 0 on success, negative error code on failure
>
>Both of these, of course, are much less informative versions of the data
>you have put up above; it would be nice to unify them somehow.
Ack
--
Thanks,
Sasha
Thread overview: 33+ messages
2025-06-24 18:07 [RFC v2 00/22] Kernel API specification framework Sasha Levin
2025-06-24 18:07 ` [RFC v2 01/22] kernel/api: introduce kernel " Sasha Levin
2025-06-30 19:53 ` Jonathan Corbet
2025-06-30 22:20 ` Mauro Carvalho Chehab
2025-07-01 14:23 ` Sasha Levin
2025-07-01 15:25 ` Mauro Carvalho Chehab
2025-07-01 19:01 ` Jonathan Corbet
2025-07-01 20:50 ` Sasha Levin
2025-07-01 21:43 ` Jonathan Corbet
2025-07-01 22:16 ` Sasha Levin
2025-06-24 18:07 ` [RFC v2 02/22] eventpoll: add API specification for epoll_create1 Sasha Levin
2025-06-24 18:07 ` [RFC v2 03/22] eventpoll: add API specification for epoll_create Sasha Levin
2025-06-24 18:07 ` [RFC v2 04/22] eventpoll: add API specification for epoll_ctl Sasha Levin
2025-06-24 18:07 ` [RFC v2 05/22] eventpoll: add API specification for epoll_wait Sasha Levin
2025-06-24 18:07 ` [RFC v2 06/22] eventpoll: add API specification for epoll_pwait Sasha Levin
2025-06-24 18:07 ` [RFC v2 07/22] eventpoll: add API specification for epoll_pwait2 Sasha Levin
2025-06-24 18:07 ` [RFC v2 08/22] exec: add API specification for execve Sasha Levin
2025-06-24 18:07 ` [RFC v2 09/22] exec: add API specification for execveat Sasha Levin
2025-06-24 18:07 ` [RFC v2 10/22] mm/mlock: add API specification for mlock Sasha Levin
2025-06-24 18:07 ` [RFC v2 11/22] mm/mlock: add API specification for mlock2 Sasha Levin
2025-06-24 18:07 ` [RFC v2 12/22] mm/mlock: add API specification for mlockall Sasha Levin
2025-06-24 18:07 ` [RFC v2 13/22] mm/mlock: add API specification for munlock Sasha Levin
2025-06-24 18:07 ` [RFC v2 14/22] mm/mlock: add API specification for munlockall Sasha Levin
2025-06-24 18:07 ` [RFC v2 15/22] kernel/api: add debugfs interface for kernel API specifications Sasha Levin
2025-06-24 18:07 ` [RFC v2 16/22] kernel/api: add IOCTL specification infrastructure Sasha Levin
2025-06-24 18:07 ` [RFC v2 17/22] fwctl: add detailed IOCTL API specifications Sasha Levin
2025-06-24 18:07 ` [RFC v2 18/22] binder: " Sasha Levin
2025-06-24 18:07 ` [RFC v2 19/22] kernel/api: Add sysfs validation support to kernel API specification framework Sasha Levin
2025-06-24 18:07 ` [RFC v2 20/22] block: sysfs API specifications Sasha Levin
2025-06-24 18:07 ` [RFC v2 21/22] net/socket: add API specification for socket() Sasha Levin
2025-06-24 18:07 ` [RFC v2 22/22] tools/kapi: Add kernel API specification extraction tool Sasha Levin
2025-07-01 2:43 ` [RFC v2 00/22] Kernel API specification framework Jake Edge
2025-07-01 14:54 ` Sasha Levin