* [RFC PATCH v5 01/15] kernel/api: introduce kernel API specification framework
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
Add a framework for formally documenting kernel APIs with inline
specifications. This framework provides:

- Structured API documentation with parameter specifications, return
  values, error conditions, and execution context requirements
- Runtime validation capabilities for debugging (CONFIG_KAPI_RUNTIME_CHECKS)
- Export of specifications via debugfs for tooling integration
- Support for both internal kernel APIs and system calls

The framework stores specifications in a dedicated ELF section and
provides infrastructure for:

- Compile-time validation of specifications
- Runtime querying of API documentation
- Machine-readable export formats
- Integration with existing SYSCALL_DEFINE macros

This commit introduces the core infrastructure without modifying any
existing APIs. Subsequent patches will add specifications to individual
subsystems.

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
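Reviewer note (below the "---", so not part of the commit): the commit
message says specifications live in a dedicated ELF section that is walked
at runtime. The following is a minimal userspace sketch of that general
pattern, assuming GCC or Clang with GNU ld or lld on an ELF target. The
names `struct demo_spec`, `DEMO_SPEC()`, `demo_spec_count()`,
`demo_spec_find()`, and the section name `demo_specs` are hypothetical
stand-ins and are not taken from the patch. In userspace the linker
synthesizes `__start_<name>`/`__stop_<name>` for sections whose names are
valid C identifiers, whereas the patch defines `__start_kapi_specs` and
`__stop_kapi_specs` explicitly in vmlinux.lds.h.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for a spec record. */
struct demo_spec {
	const char *name;
	const char *description;
	unsigned int param_count;
};

/*
 * Place one record in the "demo_specs" section. "used" keeps the
 * compiler from discarding the otherwise unreferenced object. The
 * explicit alignment stops the compiler from raising the object's
 * alignment, so records sit at sizeof() stride and the section can
 * be walked like an array.
 */
#define DEMO_SPEC(ident, desc, nparams)				\
	static const struct demo_spec demo_spec_##ident		\
	__attribute__((used, section("demo_specs"),		\
		aligned(__alignof__(struct demo_spec)))) = {	\
		.name = #ident,					\
		.description = desc,				\
		.param_count = nparams,				\
	}

DEMO_SPEC(kmalloc, "Allocate kernel memory", 2);
DEMO_SPEC(open, "Open a file", 3);

/* Synthesized by the linker for sections named like C identifiers. */
extern const struct demo_spec __start_demo_specs[];
extern const struct demo_spec __stop_demo_specs[];

/* Number of records the linker collected into the section. */
static size_t demo_spec_count(void)
{
	return (size_t)(__stop_demo_specs - __start_demo_specs);
}

/* Linear lookup by name; returns NULL when no record matches. */
static const struct demo_spec *demo_spec_find(const char *name)
{
	const struct demo_spec *spec;

	for (spec = __start_demo_specs; spec < __stop_demo_specs; spec++) {
		if (strcmp(spec->name, name) == 0)
			return spec;
	}
	return NULL;
}
```

A caller would use `demo_spec_find("open")` and check the result for NULL
before dereferencing it. The explicit alignment in the macro is the
userspace counterpart of the alignment concern described in the
KAPI_SPECS() comment in vmlinux.lds.h.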
.gitignore | 1 +
Documentation/dev-tools/kernel-api-spec.rst | 507 ++++++
MAINTAINERS | 9 +
include/asm-generic/vmlinux.lds.h | 28 +
include/linux/kernel_api_spec.h | 1597 +++++++++++++++++++
include/linux/syscall_api_spec.h | 198 +++
include/linux/syscalls.h | 38 +
init/Kconfig | 2 +
kernel/Makefile | 3 +
kernel/api/Kconfig | 35 +
kernel/api/Makefile | 29 +
kernel/api/kernel_api_spec.c | 1185 ++++++++++++++
scripts/generate_api_specs.sh | 18 +
13 files changed, 3650 insertions(+)
create mode 100644 Documentation/dev-tools/kernel-api-spec.rst
create mode 100644 include/linux/kernel_api_spec.h
create mode 100644 include/linux/syscall_api_spec.h
create mode 100644 kernel/api/Kconfig
create mode 100644 kernel/api/Makefile
create mode 100644 kernel/api/kernel_api_spec.c
create mode 100755 scripts/generate_api_specs.sh
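Reviewer note (outside the diff, so ignored when the patch is applied): a
rough userspace sketch for estimating the per-record footprint implied by
the fixed-size string fields of `struct kapi_param_spec`, as defined in
include/linux/kernel_api_spec.h in this patch. The field order is copied
from the patch. Everything else is an assumption made for illustration: the
enum bodies are collapsed to one placeholder member each, kernel integer
types are mapped to <stdint.h> equivalents, and `param_spec_size()` and
`param_array_size()` are hypothetical helpers. Whether the top-level
specification embeds a full array of KAPI_MAX_PARAMS entries is not visible
in this excerpt, so the array figure is only an upper-bound estimate.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KAPI_MAX_PARAMS		16
#define KAPI_MAX_NAME_LEN	128
#define KAPI_MAX_DESC_LEN	512

/* Placeholder enums: only their storage size matters here. */
enum kapi_param_type { KAPI_TYPE_VOID = 0 };
enum kapi_constraint_type { KAPI_CONSTRAINT_NONE = 0 };

/* Userspace mirror of the patch's layout (s64/u64/u32 remapped). */
struct kapi_param_spec {
	char name[KAPI_MAX_NAME_LEN];
	char type_name[KAPI_MAX_NAME_LEN];
	enum kapi_param_type type;
	uint32_t flags;
	size_t size;
	size_t alignment;
	int64_t min_value;
	int64_t max_value;
	uint64_t valid_mask;
	const int64_t *enum_values;
	uint32_t enum_count;
	enum kapi_constraint_type constraint_type;
	bool (*validate)(int64_t value);
	char description[KAPI_MAX_DESC_LEN];
	char constraints[KAPI_MAX_DESC_LEN];
	int size_param_idx;
	size_t size_multiplier;
} __attribute__((packed));

/* Bytes occupied by one parameter record on the build target. */
static size_t param_spec_size(void)
{
	return sizeof(struct kapi_param_spec);
}

/* Bytes for a hypothetical fully populated parameter array. */
static size_t param_array_size(void)
{
	return KAPI_MAX_PARAMS * sizeof(struct kapi_param_spec);
}
```

On a typical LP64 target with 4-byte enums, `param_spec_size()` should come
to 1364 bytes, most of it from the four fixed-size character arrays. That
may be worth comparing against the per-specification memory estimate given
in the documentation's "Memory Overhead" section.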
diff --git a/.gitignore b/.gitignore
index 3a7241c941f5e..7130001e444f1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -12,6 +12,7 @@
#
.*
*.a
+*.apispec.h
*.asn1.[ch]
*.bin
*.bz2
diff --git a/Documentation/dev-tools/kernel-api-spec.rst b/Documentation/dev-tools/kernel-api-spec.rst
new file mode 100644
index 0000000000000..3a63f6711e27b
--- /dev/null
+++ b/Documentation/dev-tools/kernel-api-spec.rst
@@ -0,0 +1,507 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================
+Kernel API Specification Framework
+======================================
+
+:Author: Sasha Levin <sashal@kernel.org>
+:Date: June 2025
+
+.. contents:: Table of Contents
+ :depth: 3
+ :local:
+
+Introduction
+============
+
+The Kernel API Specification Framework (KAPI) provides a comprehensive system for
+formally documenting, validating, and introspecting kernel APIs. This framework
+addresses the long-standing challenge of maintaining accurate, machine-readable
+documentation for the thousands of internal kernel APIs and system calls.
+
+Purpose and Goals
+-----------------
+
+The framework aims to:
+
+1. **Improve API Documentation**: Provide structured, inline documentation that
+ lives alongside the code and is maintained as part of the development process.
+
+2. **Enable Runtime Validation**: Optionally validate API usage at runtime to catch
+ common programming errors during development and testing.
+
+3. **Support Tooling**: Export API specifications in machine-readable formats for
+ use by static analyzers, documentation generators, and development tools.
+
+4. **Enhance Debugging**: Provide detailed API information at runtime through debugfs
+ for debugging and introspection.
+
+5. **Formalize Contracts**: Explicitly document API contracts including parameter
+ constraints, execution contexts, locking requirements, and side effects.
+
+Architecture Overview
+=====================
+
+Components
+----------
+
+The framework consists of several key components:
+
+1. **Core Framework** (``kernel/api/kernel_api_spec.c``)
+
+ - API specification registration and storage
+ - Runtime validation engine
+ - Specification lookup and querying
+
+2. **DebugFS Interface** (``kernel/api/kapi_debugfs.c``)
+
+ - Runtime introspection via ``/sys/kernel/debug/kapi/``
+ - JSON and XML export formats
+ - Per-API detailed information
+
+3. **IOCTL Support** (``kernel/api/ioctl_validation.c``)
+
+ - Extended framework for IOCTL specifications
+ - Automatic validation wrappers
+ - Structure field validation
+
+4. **Specification Macros** (``include/linux/kernel_api_spec.h``)
+
+ - Declarative macros for API documentation
+ - Type-safe parameter specifications
+ - Context and constraint definitions
+
+Data Model
+----------
+
+The framework uses a hierarchical data model::
+
+ kernel_api_spec
+ ├── Basic Information
+ │ ├── name (API function name)
+ │ ├── version (specification version)
+ │ ├── description (human-readable description)
+ │ └── kernel_version (when API was introduced)
+ │
+ ├── Parameters (up to 16)
+ │ └── kapi_param_spec
+ │ ├── name
+ │ ├── type (int, pointer, string, etc.)
+ │ ├── direction (in, out, inout)
+ │ ├── constraints (range, mask, enum values)
+ │ └── validation rules
+ │
+ ├── Return Value
+ │ └── kapi_return_spec
+ │ ├── type
+ │ ├── success conditions
+ │ └── validation rules
+ │
+ ├── Error Conditions (up to 32)
+ │ └── kapi_error_spec
+ │ ├── error code
+ │ ├── condition description
+ │ └── recovery advice
+ │
+ ├── Execution Context
+ │ ├── allowed contexts (process, interrupt, etc.)
+ │ ├── locking requirements
+ │ └── preemption/interrupt state
+ │
+ └── Side Effects
+ ├── memory allocation
+ ├── state changes
+ └── signal handling
+
+Usage Guide
+===========
+
+Basic API Specification
+-----------------------
+
+To document a kernel API, use the specification macros in the implementation file:
+
+.. code-block:: c
+
+ #include <linux/kernel_api_spec.h>
+
+ KAPI_DEFINE_SPEC(kmalloc_spec, kmalloc, "3.0")
+ KAPI_DESCRIPTION("Allocate kernel memory")
+ KAPI_PARAM(0, size, KAPI_TYPE_SIZE_T, KAPI_DIR_IN,
+ "Number of bytes to allocate")
+ KAPI_PARAM_RANGE(0, 0, KMALLOC_MAX_SIZE)
+ KAPI_PARAM(1, flags, KAPI_TYPE_FLAGS, KAPI_DIR_IN,
+ "Allocation flags (GFP_*)")
+ KAPI_PARAM_MASK(1, __GFP_BITS_MASK)
+ KAPI_RETURN(KAPI_TYPE_POINTER, "Pointer to allocated memory or NULL")
+ KAPI_ERROR(ENOMEM, "Out of memory")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SOFTIRQ | KAPI_CTX_HARDIRQ)
+ KAPI_SIDE_EFFECT("Allocates memory from kernel heap")
+ KAPI_LOCK_NOT_REQUIRED("Any lock")
+ KAPI_END_SPEC
+
+ void *kmalloc(size_t size, gfp_t flags)
+ {
+ /* Implementation */
+ }
+
+System Call Specification
+-------------------------
+
+System calls use specialized macros:
+
+.. code-block:: c
+
+ KAPI_DEFINE_SYSCALL_SPEC(open_spec, open, "1.0")
+ KAPI_DESCRIPTION("Open a file")
+ KAPI_PARAM(0, pathname, KAPI_TYPE_USER_STRING, KAPI_DIR_IN,
+ "Path to file")
+ KAPI_PARAM_PATH(0, PATH_MAX)
+ KAPI_PARAM(1, flags, KAPI_TYPE_FLAGS, KAPI_DIR_IN,
+ "Open flags (O_*)")
+ KAPI_PARAM(2, mode, KAPI_TYPE_MODE_T, KAPI_DIR_IN,
+ "File permissions (if creating)")
+ KAPI_RETURN(KAPI_TYPE_INT, "File descriptor on success, negative errno on failure")
+ KAPI_ERROR(EACCES, "Permission denied")
+ KAPI_ERROR(ENOENT, "File does not exist")
+ KAPI_ERROR(EMFILE, "Too many open files")
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+ KAPI_SIGNAL(EINTR, "Open can be interrupted by signal")
+ KAPI_END_SYSCALL_SPEC
+
+IOCTL Specification
+-------------------
+
+IOCTLs have extended support for structure validation:
+
+.. code-block:: c
+
+ KAPI_DEFINE_IOCTL_SPEC(vidioc_querycap_spec, VIDIOC_QUERYCAP,
+ "VIDIOC_QUERYCAP",
+ sizeof(struct v4l2_capability),
+ sizeof(struct v4l2_capability),
+ "video_fops")
+ KAPI_DESCRIPTION("Query device capabilities")
+ KAPI_IOCTL_FIELD(driver, KAPI_TYPE_CHAR_ARRAY, KAPI_DIR_OUT,
+ "Driver name", 16)
+ KAPI_IOCTL_FIELD(card, KAPI_TYPE_CHAR_ARRAY, KAPI_DIR_OUT,
+ "Device name", 32)
+ KAPI_IOCTL_FIELD(version, KAPI_TYPE_U32, KAPI_DIR_OUT,
+ "Driver version")
+ KAPI_IOCTL_FIELD(capabilities, KAPI_TYPE_FLAGS, KAPI_DIR_OUT,
+ "Device capabilities")
+ KAPI_END_IOCTL_SPEC
+
+Runtime Validation
+==================
+
+Enabling Validation
+-------------------
+
+Runtime validation is controlled by kernel configuration:
+
+1. Enable ``CONFIG_KAPI_SPEC`` to build the framework
+2. Enable ``CONFIG_KAPI_RUNTIME_CHECKS`` for runtime validation
+3. Optionally enable ``CONFIG_KAPI_SPEC_DEBUGFS`` for debugfs interface
+
+Validation Modes
+----------------
+
+The framework supports several validation modes:
+
+.. code-block:: c
+
+ /* Enable validation for specific API */
+ kapi_enable_validation("kmalloc");
+
+ /* Enable validation for all APIs */
+ kapi_enable_all_validation();
+
+ /* Set validation level */
+ kapi_set_validation_level(KAPI_VALIDATE_FULL);
+
+Validation Levels:
+
+- ``KAPI_VALIDATE_NONE``: No validation
+- ``KAPI_VALIDATE_BASIC``: Type and NULL checks only
+- ``KAPI_VALIDATE_NORMAL``: Basic + range and constraint checks
+- ``KAPI_VALIDATE_FULL``: All checks including custom validators
+
+Custom Validators
+-----------------
+
+APIs can register custom validation functions:
+
+.. code-block:: c
+
+ static bool validate_buffer_size(const struct kapi_param_spec *spec,
+ const void *value, void *context)
+ {
+ size_t size = *(size_t *)value;
+ struct my_context *ctx = context;
+
+ return size > 0 && size <= ctx->max_buffer_size;
+ }
+
+ KAPI_PARAM_CUSTOM_VALIDATOR(0, validate_buffer_size)
+
+DebugFS Interface
+=================
+
+The debugfs interface provides runtime access to API specifications:
+
+Directory Structure
+-------------------
+
+::
+
+ /sys/kernel/debug/kapi/
+ ├── apis/ # All registered APIs
+ │ ├── kmalloc/
+ │ │ ├── specification # Human-readable spec
+ │ │ ├── json # JSON format
+ │ │ └── xml # XML format
+ │ └── open/
+ │ └── ...
+ ├── summary # Overview of all APIs
+ ├── validation/ # Validation controls
+ │ ├── enabled # Global enable/disable
+ │ ├── level # Validation level
+ │ └── stats # Validation statistics
+ └── export/ # Bulk export options
+ ├── all.json # All specs in JSON
+ └── all.xml # All specs in XML
+
+Usage Examples
+--------------
+
+Query specific API::
+
+ $ cat /sys/kernel/debug/kapi/apis/kmalloc/specification
+ API: kmalloc
+ Version: 3.0
+ Description: Allocate kernel memory
+
+ Parameters:
+ [0] size (size_t, in): Number of bytes to allocate
+ Range: 0 - 4194304
+ [1] flags (flags, in): Allocation flags (GFP_*)
+ Mask: 0x1ffffff
+
+ Returns: pointer - Pointer to allocated memory or NULL
+
+ Errors:
+ ENOMEM: Out of memory
+
+ Context: process, softirq, hardirq
+
+ Side Effects:
+ - Allocates memory from kernel heap
+
+Export all specifications::
+
+ $ cat /sys/kernel/debug/kapi/export/all.json > kernel-apis.json
+
+Enable validation for specific API::
+
+ $ echo 1 > /sys/kernel/debug/kapi/apis/kmalloc/validate
+
+Performance Considerations
+==========================
+
+Memory Overhead
+---------------
+
+Each API specification consumes approximately 2-4KB of memory. With thousands
+of kernel APIs, this can add up to several megabytes. Consider:
+
+1. Building with ``CONFIG_KAPI_SPEC=n`` for production kernels
+2. Using ``__init`` annotations for APIs only used during boot
+3. Implementing lazy loading for rarely used specifications
+
+Runtime Overhead
+----------------
+
+When ``CONFIG_KAPI_RUNTIME_CHECKS`` is enabled:
+
+- Each validated API call adds 50-200ns overhead
+- Complex validations (custom validators) may add more
+- Use validation only in development/testing kernels
+
+Optimization Strategies
+-----------------------
+
+1. **Compile-time optimization**: When validation is disabled, all
+ validation code is optimized away by the compiler.
+
+2. **Selective validation**: Enable validation only for specific APIs
+ or subsystems under test.
+
+3. **Caching**: The framework caches validation results for repeated
+ calls with identical parameters.
+
+Documentation Generation
+------------------------
+
+The framework exports specifications via debugfs that can be used
+to generate documentation. Tools for automatic documentation generation
+from specifications are planned for future development.
+
+IDE Integration
+---------------
+
+Modern IDEs can use the JSON export for:
+
+- Parameter hints
+- Type checking
+- Context validation
+- Error code documentation
+
+Testing Framework
+-----------------
+
+The framework includes test helpers::
+
+ #ifdef CONFIG_KAPI_TESTING
+ /* Verify API behaves according to specification */
+ kapi_test_api("kmalloc", test_cases);
+ #endif
+
+Best Practices
+==============
+
+Writing Specifications
+----------------------
+
+1. **Be Comprehensive**: Document all parameters, errors, and side effects
+2. **Keep Updated**: Update specs when API behavior changes
+3. **Use Examples**: Include usage examples in descriptions
+4. **Validate Constraints**: Define realistic constraints for parameters
+5. **Document Context**: Clearly specify allowed execution contexts
+
+Maintenance
+-----------
+
+1. **Version Specifications**: Increment version when API changes
+2. **Deprecation**: Mark deprecated APIs and suggest replacements
+3. **Cross-reference**: Link related APIs in descriptions
+4. **Test Specifications**: Verify specs match implementation
+
+Common Patterns
+---------------
+
+**Optional Parameters**::
+
+ KAPI_PARAM(2, optional_arg, KAPI_TYPE_POINTER, KAPI_DIR_IN,
+ "Optional argument (may be NULL)")
+ KAPI_PARAM_OPTIONAL(2)
+
+**Variable Arguments**::
+
+ KAPI_PARAM(1, fmt, KAPI_TYPE_FORMAT_STRING, KAPI_DIR_IN,
+ "Printf-style format string")
+ KAPI_PARAM_VARIADIC(2, "Format arguments")
+
+**Callback Functions**::
+
+ KAPI_PARAM(1, callback, KAPI_TYPE_FUNCTION_PTR, KAPI_DIR_IN,
+ "Callback function")
+ KAPI_PARAM_CALLBACK(1, "int (*)(void *data)", "data")
+
+Troubleshooting
+===============
+
+Common Issues
+-------------
+
+**Specification Not Found**::
+
+ kernel: KAPI: Specification for 'my_api' not found
+
+ Solution: Ensure KAPI_DEFINE_SPEC is in the same translation unit
+ as the function implementation.
+
+**Validation Failures**::
+
+ kernel: KAPI: Validation failed for kmalloc parameter 'size':
+ value 5242880 exceeds maximum 4194304
+
+ Solution: Check parameter constraints or adjust specification if
+ the constraint is incorrect.
+
+**Build Errors**::
+
+ error: 'KAPI_TYPE_UNKNOWN' undeclared
+
+ Solution: Include <linux/kernel_api_spec.h> and ensure
+ CONFIG_KAPI_SPEC is enabled.
+
+Debug Options
+-------------
+
+Enable verbose debugging::
+
+ echo 8 > /proc/sys/kernel/printk
+ echo 1 > /sys/kernel/debug/kapi/debug/verbose
+
+Future Directions
+=================
+
+Planned Features
+----------------
+
+1. **Automatic Extraction**: Tool to extract specifications from existing
+ kernel-doc comments
+
+2. **Contract Verification**: Static analysis to verify implementation
+ matches specification
+
+3. **Performance Profiling**: Measure actual API performance against
+ documented expectations
+
+4. **Fuzzing Integration**: Use specifications to guide intelligent
+ fuzzing of kernel APIs
+
+5. **Version Compatibility**: Track API changes across kernel versions
+
+Research Areas
+--------------
+
+1. **Formal Verification**: Use specifications for mathematical proofs
+ of correctness
+
+2. **Runtime Monitoring**: Detect specification violations in production
+ with minimal overhead
+
+3. **API Evolution**: Analyze how kernel APIs change over time
+
+4. **Security Applications**: Use specifications for security policy
+ enforcement
+
+Contributing
+============
+
+Submitting Specifications
+-------------------------
+
+1. Add specifications to the same file as the API implementation
+2. Follow existing patterns and naming conventions
+3. Test with CONFIG_KAPI_RUNTIME_CHECKS enabled
+4. Verify debugfs output is correct
+5. Run scripts/checkpatch.pl on your changes
+
+Review Criteria
+---------------
+
+Specifications will be reviewed for:
+
+1. **Completeness**: All parameters and errors documented
+2. **Accuracy**: Specification matches implementation
+3. **Clarity**: Descriptions are clear and helpful
+4. **Consistency**: Follows framework conventions
+5. **Performance**: No unnecessary runtime overhead
+
+Contact
+-------
+
+- Maintainer: Sasha Levin <sashal@kernel.org>
diff --git a/MAINTAINERS b/MAINTAINERS
index 5b11839cba9de..14cd8b3c95e40 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13647,6 +13647,15 @@ W: https://linuxtv.org
T: git git://linuxtv.org/media.git
F: drivers/media/radio/radio-keene*
+KERNEL API SPECIFICATION FRAMEWORK (KAPI)
+M: Sasha Levin <sashal@kernel.org>
+L: linux-api@vger.kernel.org
+S: Maintained
+F: Documentation/dev-tools/kernel-api-spec.rst
+F: include/linux/kernel_api_spec.h
+F: kernel/api/
+F: scripts/extract-kapi-spec.sh
+
KERNEL AUTOMOUNTER
M: Ian Kent <raven@themaw.net>
L: autofs@vger.kernel.org
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 8ca130af301fc..658a14f8bf309 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -296,6 +296,33 @@
#define TRACE_SYSCALLS()
#endif
+#ifdef CONFIG_KAPI_SPEC
+/*
+ * KAPI_SPECS - Include kernel API specifications in current section
+ *
+ * The .kapi_specs input section has a 32-byte alignment requirement from
+ * the compiler, so we must align to 32 bytes before setting the start
+ * symbol to avoid padding between the symbol and actual data.
+ */
+#define KAPI_SPECS() \
+ . = ALIGN(32); \
+ __start_kapi_specs = .; \
+ KEEP(*(.kapi_specs)) \
+ __stop_kapi_specs = .;
+
+/* For placing KAPI specs in a dedicated section */
+#define KAPI_SPECS_SECTION() \
+ .kapi_specs : AT(ADDR(.kapi_specs) - LOAD_OFFSET) { \
+ . = ALIGN(32); \
+ __start_kapi_specs = .; \
+ KEEP(*(.kapi_specs)) \
+ __stop_kapi_specs = .; \
+ }
+#else
+#define KAPI_SPECS()
+#define KAPI_SPECS_SECTION()
+#endif
+
#ifdef CONFIG_BPF_EVENTS
#define BPF_RAW_TP() STRUCT_ALIGN(); \
BOUNDED_SECTION_BY(__bpf_raw_tp_map, __bpf_raw_tp)
@@ -485,6 +512,7 @@
. = ALIGN(8); \
BOUNDED_SECTION_BY(__tracepoints_ptrs, ___tracepoints_ptrs) \
*(__tracepoints_strings)/* Tracepoints: strings */ \
+ KAPI_SPECS() \
} \
\
.rodata1 : AT(ADDR(.rodata1) - LOAD_OFFSET) { \
diff --git a/include/linux/kernel_api_spec.h b/include/linux/kernel_api_spec.h
new file mode 100644
index 0000000000000..b3460d156602e
--- /dev/null
+++ b/include/linux/kernel_api_spec.h
@@ -0,0 +1,1597 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * kernel_api_spec.h - Kernel API Formal Specification Framework
+ *
+ * This framework provides structures and macros to formally specify kernel APIs
+ * in both human- and machine-readable formats. It supports comprehensive documentation
+ * of function signatures, parameters, return values, error conditions, and constraints.
+ */
+
+#ifndef _LINUX_KERNEL_API_SPEC_H
+#define _LINUX_KERNEL_API_SPEC_H
+
+#include <linux/types.h>
+#include <linux/stringify.h>
+#include <linux/compiler.h>
+#include <linux/errno.h>
+
+struct sigaction;
+
+#define KAPI_MAX_PARAMS 16
+#define KAPI_MAX_ERRORS 32
+#define KAPI_MAX_CONSTRAINTS 32
+#define KAPI_MAX_SIGNALS 32
+#define KAPI_MAX_NAME_LEN 128
+#define KAPI_MAX_DESC_LEN 512
+#define KAPI_MAX_CAPABILITIES 8
+#define KAPI_MAX_SOCKET_STATES 16
+#define KAPI_MAX_PROTOCOL_BEHAVIORS 8
+#define KAPI_MAX_NET_ERRORS 16
+#define KAPI_MAX_SOCKOPTS 16
+#define KAPI_MAX_ADDR_FAMILIES 8
+
+/* Magic numbers for section validation (ASCII mnemonics) */
+#define KAPI_MAGIC_PARAMS 0x4B415031 /* 'KAP1' */
+#define KAPI_MAGIC_RETURN 0x4B415232 /* 'KAR2' */
+#define KAPI_MAGIC_ERRORS 0x4B414533 /* 'KAE3' */
+#define KAPI_MAGIC_LOCKS 0x4B414C34 /* 'KAL4' */
+#define KAPI_MAGIC_CONSTRAINTS 0x4B414335 /* 'KAC5' */
+#define KAPI_MAGIC_INFO 0x4B414936 /* 'KAI6' */
+#define KAPI_MAGIC_SIGNALS 0x4B415337 /* 'KAS7' */
+#define KAPI_MAGIC_SIGMASK 0x4B414D38 /* 'KAM8' */
+#define KAPI_MAGIC_STRUCTS 0x4B415439 /* 'KAT9' */
+#define KAPI_MAGIC_EFFECTS 0x4B414641 /* 'KAFA' */
+#define KAPI_MAGIC_TRANS 0x4B415442 /* 'KATB' */
+#define KAPI_MAGIC_CAPS 0x4B414343 /* 'KACC' */
+
+/**
+ * enum kapi_param_type - Parameter type classification
+ * @KAPI_TYPE_VOID: void type
+ * @KAPI_TYPE_INT: Integer types (int, long, etc.)
+ * @KAPI_TYPE_UINT: Unsigned integer types
+ * @KAPI_TYPE_PTR: Pointer types
+ * @KAPI_TYPE_STRUCT: Structure types
+ * @KAPI_TYPE_UNION: Union types
+ * @KAPI_TYPE_ENUM: Enumeration types
+ * @KAPI_TYPE_FUNC_PTR: Function pointer types
+ * @KAPI_TYPE_ARRAY: Array types
+ * @KAPI_TYPE_FD: File descriptor - validated in process context
+ * @KAPI_TYPE_USER_PTR: User space pointer - validated for access and size
+ * @KAPI_TYPE_PATH: Pathname - validated for access and path limits
+ * @KAPI_TYPE_CUSTOM: Custom/complex types
+ */
+enum kapi_param_type {
+ KAPI_TYPE_VOID = 0,
+ KAPI_TYPE_INT,
+ KAPI_TYPE_UINT,
+ KAPI_TYPE_PTR,
+ KAPI_TYPE_STRUCT,
+ KAPI_TYPE_UNION,
+ KAPI_TYPE_ENUM,
+ KAPI_TYPE_FUNC_PTR,
+ KAPI_TYPE_ARRAY,
+ KAPI_TYPE_FD, /* File descriptor - validated in process context */
+ KAPI_TYPE_USER_PTR, /* User space pointer - validated for access and size */
+ KAPI_TYPE_PATH, /* Pathname - validated for access and path limits */
+ KAPI_TYPE_CUSTOM,
+};
+
+/**
+ * enum kapi_param_flags - Parameter attribute flags
+ * @KAPI_PARAM_IN: Input parameter
+ * @KAPI_PARAM_OUT: Output parameter
+ * @KAPI_PARAM_INOUT: Input/output parameter
+ * @KAPI_PARAM_OPTIONAL: Optional parameter (can be NULL)
+ * @KAPI_PARAM_CONST: Const qualified parameter
+ * @KAPI_PARAM_VOLATILE: Volatile qualified parameter
+ * @KAPI_PARAM_USER: User space pointer
+ * @KAPI_PARAM_DMA: DMA-capable memory required
+ * @KAPI_PARAM_ALIGNED: Alignment requirements
+ */
+enum kapi_param_flags {
+ KAPI_PARAM_IN = (1 << 0),
+ KAPI_PARAM_OUT = (1 << 1),
+ KAPI_PARAM_INOUT = (1 << 2),
+ KAPI_PARAM_OPTIONAL = (1 << 3),
+ KAPI_PARAM_CONST = (1 << 4),
+ KAPI_PARAM_VOLATILE = (1 << 5),
+ KAPI_PARAM_USER = (1 << 6),
+ KAPI_PARAM_DMA = (1 << 7),
+ KAPI_PARAM_ALIGNED = (1 << 8),
+};
+
+/**
+ * enum kapi_context_flags - Function execution context flags
+ * @KAPI_CTX_PROCESS: Can be called from process context
+ * @KAPI_CTX_SOFTIRQ: Can be called from softirq context
+ * @KAPI_CTX_HARDIRQ: Can be called from hardirq context
+ * @KAPI_CTX_NMI: Can be called from NMI context
+ * @KAPI_CTX_ATOMIC: Must be called in atomic context
+ * @KAPI_CTX_SLEEPABLE: May sleep
+ * @KAPI_CTX_PREEMPT_DISABLED: Requires preemption disabled
+ * @KAPI_CTX_IRQ_DISABLED: Requires interrupts disabled
+ */
+enum kapi_context_flags {
+ KAPI_CTX_PROCESS = (1 << 0),
+ KAPI_CTX_SOFTIRQ = (1 << 1),
+ KAPI_CTX_HARDIRQ = (1 << 2),
+ KAPI_CTX_NMI = (1 << 3),
+ KAPI_CTX_ATOMIC = (1 << 4),
+ KAPI_CTX_SLEEPABLE = (1 << 5),
+ KAPI_CTX_PREEMPT_DISABLED = (1 << 6),
+ KAPI_CTX_IRQ_DISABLED = (1 << 7),
+};
+
+/**
+ * enum kapi_lock_type - Lock types used/required by the function
+ * @KAPI_LOCK_NONE: No locking requirements
+ * @KAPI_LOCK_MUTEX: Mutex lock
+ * @KAPI_LOCK_SPINLOCK: Spinlock
+ * @KAPI_LOCK_RWLOCK: Read-write lock
+ * @KAPI_LOCK_SEQLOCK: Sequence lock
+ * @KAPI_LOCK_RCU: RCU lock
+ * @KAPI_LOCK_SEMAPHORE: Semaphore
+ * @KAPI_LOCK_CUSTOM: Custom locking mechanism
+ */
+enum kapi_lock_type {
+ KAPI_LOCK_NONE = 0,
+ KAPI_LOCK_MUTEX,
+ KAPI_LOCK_SPINLOCK,
+ KAPI_LOCK_RWLOCK,
+ KAPI_LOCK_SEQLOCK,
+ KAPI_LOCK_RCU,
+ KAPI_LOCK_SEMAPHORE,
+ KAPI_LOCK_CUSTOM,
+};
+
+/**
+ * enum kapi_constraint_type - Types of parameter constraints
+ * @KAPI_CONSTRAINT_NONE: No constraint
+ * @KAPI_CONSTRAINT_RANGE: Numeric range constraint
+ * @KAPI_CONSTRAINT_MASK: Bitmask constraint
+ * @KAPI_CONSTRAINT_ENUM: Enumerated values constraint
+ * @KAPI_CONSTRAINT_ALIGNMENT: Alignment constraint (must be aligned to specified boundary)
+ * @KAPI_CONSTRAINT_POWER_OF_TWO: Value must be a power of two
+ * @KAPI_CONSTRAINT_PAGE_ALIGNED: Value must be page-aligned
+ * @KAPI_CONSTRAINT_NONZERO: Value must be non-zero
+ * @KAPI_CONSTRAINT_USER_STRING: Userspace null-terminated string with length range
+ * @KAPI_CONSTRAINT_USER_PATH: Userspace pathname string (validated for accessibility and PATH_MAX)
+ * @KAPI_CONSTRAINT_USER_PTR: Userspace pointer (validated for accessibility and size)
+ * @KAPI_CONSTRAINT_CUSTOM: Custom validation function
+ */
+enum kapi_constraint_type {
+ KAPI_CONSTRAINT_NONE = 0,
+ KAPI_CONSTRAINT_RANGE,
+ KAPI_CONSTRAINT_MASK,
+ KAPI_CONSTRAINT_ENUM,
+ KAPI_CONSTRAINT_ALIGNMENT,
+ KAPI_CONSTRAINT_POWER_OF_TWO,
+ KAPI_CONSTRAINT_PAGE_ALIGNED,
+ KAPI_CONSTRAINT_NONZERO,
+ KAPI_CONSTRAINT_USER_STRING,
+ KAPI_CONSTRAINT_USER_PATH,
+ KAPI_CONSTRAINT_USER_PTR,
+ KAPI_CONSTRAINT_CUSTOM,
+};
+
+/**
+ * struct kapi_param_spec - Parameter specification
+ * @name: Parameter name
+ * @type_name: Type name as string
+ * @type: Parameter type classification
+ * @flags: Parameter attribute flags
+ * @size: Size in bytes (for arrays/buffers)
+ * @alignment: Required alignment
+ * @min_value: Minimum valid value (for numeric types)
+ * @max_value: Maximum valid value (for numeric types)
+ * @valid_mask: Valid bits mask (for flag parameters)
+ * @enum_values: Array of valid enumerated values
+ * @enum_count: Number of valid enumerated values
+ * @constraint_type: Type of constraint applied
+ * @validate: Custom validation function
+ * @description: Human-readable description
+ * @constraints: Additional constraints description
+ * @size_param_idx: Index of parameter that determines size (-1 if fixed size)
+ * @size_multiplier: Multiplier for size calculation (e.g., sizeof(struct))
+ */
+struct kapi_param_spec {
+ char name[KAPI_MAX_NAME_LEN];
+ char type_name[KAPI_MAX_NAME_LEN];
+ enum kapi_param_type type;
+ u32 flags;
+ size_t size;
+ size_t alignment;
+ s64 min_value;
+ s64 max_value;
+ u64 valid_mask;
+ const s64 *enum_values;
+ u32 enum_count;
+ enum kapi_constraint_type constraint_type;
+ bool (*validate)(s64 value);
+ char description[KAPI_MAX_DESC_LEN];
+ char constraints[KAPI_MAX_DESC_LEN];
+ int size_param_idx; /* Index of param that determines size, -1 if N/A */
+ size_t size_multiplier; /* Size per unit (e.g., sizeof(struct epoll_event)) */
+} __attribute__((packed));
+
+/**
+ * struct kapi_error_spec - Error condition specification
+ * @error_code: Error code value
+ * @name: Error code name (e.g., "EINVAL")
+ * @condition: Condition that triggers this error
+ * @description: Detailed error description
+ */
+struct kapi_error_spec {
+ int error_code;
+ char name[KAPI_MAX_NAME_LEN];
+ char condition[KAPI_MAX_DESC_LEN];
+ char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * enum kapi_return_check_type - Return value check types
+ * @KAPI_RETURN_EXACT: Success is an exact value
+ * @KAPI_RETURN_RANGE: Success is within a range
+ * @KAPI_RETURN_ERROR_CHECK: Success is when NOT in error list
+ * @KAPI_RETURN_FD: Return value is a file descriptor (>= 0 is success)
+ * @KAPI_RETURN_CUSTOM: Custom validation function
+ * @KAPI_RETURN_NO_RETURN: Function does not return (e.g., exec on success)
+ */
+enum kapi_return_check_type {
+ KAPI_RETURN_EXACT,
+ KAPI_RETURN_RANGE,
+ KAPI_RETURN_ERROR_CHECK,
+ KAPI_RETURN_FD,
+ KAPI_RETURN_CUSTOM,
+ KAPI_RETURN_NO_RETURN,
+};
+
+/**
+ * struct kapi_return_spec - Return value specification
+ * @type_name: Return type name
+ * @type: Return type classification
+ * @check_type: Type of success check to perform
+ * @success_value: Exact value indicating success (for EXACT)
+ * @success_min: Minimum success value (for RANGE)
+ * @success_max: Maximum success value (for RANGE)
+ * @error_values: Array of error values (for ERROR_CHECK)
+ * @error_count: Number of error values
+ * @is_success: Custom function to check success
+ * @description: Return value description
+ */
+struct kapi_return_spec {
+ char type_name[KAPI_MAX_NAME_LEN];
+ enum kapi_param_type type;
+ enum kapi_return_check_type check_type;
+ s64 success_value;
+ s64 success_min;
+ s64 success_max;
+ const s64 *error_values;
+ u32 error_count;
+ bool (*is_success)(s64 retval);
+ char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * enum kapi_lock_scope - Lock acquisition/release scope
+ * @KAPI_LOCK_INTERNAL: Lock is acquired and released within the function (common case)
+ * @KAPI_LOCK_ACQUIRES: Function acquires lock but does not release it
+ * @KAPI_LOCK_RELEASES: Function releases lock (must be held on entry)
+ * @KAPI_LOCK_CALLER_HELD: Lock must be held by caller throughout the call
+ */
+enum kapi_lock_scope {
+ KAPI_LOCK_INTERNAL = 0,
+ KAPI_LOCK_ACQUIRES,
+ KAPI_LOCK_RELEASES,
+ KAPI_LOCK_CALLER_HELD,
+};
+
+/**
+ * struct kapi_lock_spec - Lock requirement specification
+ * @lock_name: Name of the lock
+ * @lock_type: Type of lock
+ * @scope: Lock scope (internal, acquires, releases, or caller-held)
+ * @description: Additional lock requirements
+ */
+struct kapi_lock_spec {
+ char lock_name[KAPI_MAX_NAME_LEN];
+ enum kapi_lock_type lock_type;
+ enum kapi_lock_scope scope;
+ char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_constraint_spec - Additional constraint specification
+ * @name: Constraint name
+ * @description: Constraint description
+ * @expression: Formal expression (if applicable)
+ */
+struct kapi_constraint_spec {
+ char name[KAPI_MAX_NAME_LEN];
+ char description[KAPI_MAX_DESC_LEN];
+ char expression[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * enum kapi_signal_direction - Signal flow direction
+ * @KAPI_SIGNAL_RECEIVE: Function may receive this signal
+ * @KAPI_SIGNAL_SEND: Function may send this signal
+ * @KAPI_SIGNAL_HANDLE: Function handles this signal specially
+ * @KAPI_SIGNAL_BLOCK: Function blocks this signal
+ * @KAPI_SIGNAL_IGNORE: Function ignores this signal
+ */
+enum kapi_signal_direction {
+ KAPI_SIGNAL_RECEIVE = (1 << 0),
+ KAPI_SIGNAL_SEND = (1 << 1),
+ KAPI_SIGNAL_HANDLE = (1 << 2),
+ KAPI_SIGNAL_BLOCK = (1 << 3),
+ KAPI_SIGNAL_IGNORE = (1 << 4),
+};
+
+/**
+ * enum kapi_signal_action - What the function does with the signal
+ * @KAPI_SIGNAL_ACTION_DEFAULT: Default signal action applies
+ * @KAPI_SIGNAL_ACTION_TERMINATE: Causes termination
+ * @KAPI_SIGNAL_ACTION_COREDUMP: Causes termination with core dump
+ * @KAPI_SIGNAL_ACTION_STOP: Stops the process
+ * @KAPI_SIGNAL_ACTION_CONTINUE: Continues a stopped process
+ * @KAPI_SIGNAL_ACTION_CUSTOM: Custom handling described in notes
+ * @KAPI_SIGNAL_ACTION_RETURN: Returns from syscall with EINTR
+ * @KAPI_SIGNAL_ACTION_RESTART: Restarts the syscall
+ * @KAPI_SIGNAL_ACTION_QUEUE: Queues the signal for later delivery
+ * @KAPI_SIGNAL_ACTION_DISCARD: Discards the signal
+ * @KAPI_SIGNAL_ACTION_TRANSFORM: Transforms to another signal
+ */
+enum kapi_signal_action {
+ KAPI_SIGNAL_ACTION_DEFAULT = 0,
+ KAPI_SIGNAL_ACTION_TERMINATE,
+ KAPI_SIGNAL_ACTION_COREDUMP,
+ KAPI_SIGNAL_ACTION_STOP,
+ KAPI_SIGNAL_ACTION_CONTINUE,
+ KAPI_SIGNAL_ACTION_CUSTOM,
+ KAPI_SIGNAL_ACTION_RETURN,
+ KAPI_SIGNAL_ACTION_RESTART,
+ KAPI_SIGNAL_ACTION_QUEUE,
+ KAPI_SIGNAL_ACTION_DISCARD,
+ KAPI_SIGNAL_ACTION_TRANSFORM,
+};
+
+/**
+ * struct kapi_signal_spec - Signal specification
+ * @signal_num: Signal number (e.g., SIGKILL, SIGTERM)
+ * @signal_name: Signal name as string
+ * @direction: Direction flags (OR of kapi_signal_direction)
+ * @action: What happens when signal is received
+ * @target: Description of target process/thread for sent signals
+ * @condition: Condition under which signal is sent/received/handled
+ * @description: Detailed description of signal handling
+ * @restartable: Whether syscall is restartable after this signal
+ * @sa_flags_required: Required signal action flags (SA_*)
+ * @sa_flags_forbidden: Forbidden signal action flags
+ * @error_on_signal: Error code returned when signal occurs (-EINTR, etc)
+ * @transform_to: Signal number to transform to (if action is TRANSFORM)
+ * @timing: When signal can occur ("entry", "during", "exit", "anytime")
+ * @priority: Signal handling priority (lower processed first)
+ * @interruptible: Whether this operation is interruptible by this signal
+ * @queue_behavior: How signal is queued ("realtime", "standard", "coalesce")
+ * @state_required: Required process state for signal to be delivered
+ * @state_forbidden: Forbidden process state for signal delivery
+ */
+struct kapi_signal_spec {
+ int signal_num;
+ char signal_name[32];
+ u32 direction;
+ enum kapi_signal_action action;
+ char target[KAPI_MAX_DESC_LEN];
+ char condition[KAPI_MAX_DESC_LEN];
+ char description[KAPI_MAX_DESC_LEN];
+ bool restartable;
+ u32 sa_flags_required;
+ u32 sa_flags_forbidden;
+ int error_on_signal;
+ int transform_to;
+ char timing[32];
+ u8 priority;
+ bool interruptible;
+ char queue_behavior[128];
+ u32 state_required;
+ u32 state_forbidden;
+} __attribute__((packed));
+
+/**
+ * struct kapi_signal_mask_spec - Signal mask specification
+ * @mask_name: Name of the signal mask
+ * @signals: Array of signal numbers in the mask
+ * @signal_count: Number of signals in the mask
+ * @description: Description of what this mask represents
+ */
+struct kapi_signal_mask_spec {
+ char mask_name[KAPI_MAX_NAME_LEN];
+ int signals[KAPI_MAX_SIGNALS];
+ u32 signal_count;
+ char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_struct_field - Structure field specification
+ * @name: Field name
+ * @type: Field type classification
+ * @type_name: Type name as string
+ * @offset: Offset within structure
+ * @size: Size of field in bytes
+ * @flags: Field attribute flags
+ * @constraint_type: Type of constraint applied
+ * @min_value: Minimum valid value (for numeric types)
+ * @max_value: Maximum valid value (for numeric types)
+ * @valid_mask: Valid bits mask (for flag fields)
+ * @enum_values: Comma-separated list of valid enum values (for enum types)
+ * @description: Field description
+ */
+struct kapi_struct_field {
+ char name[KAPI_MAX_NAME_LEN];
+ enum kapi_param_type type;
+ char type_name[KAPI_MAX_NAME_LEN];
+ size_t offset;
+ size_t size;
+ u32 flags;
+ enum kapi_constraint_type constraint_type;
+ s64 min_value;
+ s64 max_value;
+ u64 valid_mask;
+ char enum_values[KAPI_MAX_DESC_LEN];
+ char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_struct_spec - Structure type specification
+ * @name: Structure name
+ * @size: Total size of structure
+ * @alignment: Required alignment
+ * @field_count: Number of fields
+ * @fields: Field specifications
+ * @description: Structure description
+ */
+struct kapi_struct_spec {
+ char name[KAPI_MAX_NAME_LEN];
+ size_t size;
+ size_t alignment;
+ u32 field_count;
+ struct kapi_struct_field fields[KAPI_MAX_PARAMS];
+ char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * enum kapi_capability_action - What the capability allows
+ * @KAPI_CAP_BYPASS_CHECK: Bypasses a check entirely
+ * @KAPI_CAP_INCREASE_LIMIT: Increases or removes a limit
+ * @KAPI_CAP_OVERRIDE_RESTRICTION: Overrides a restriction
+ * @KAPI_CAP_GRANT_PERMISSION: Grants permission that would otherwise be denied
+ * @KAPI_CAP_MODIFY_BEHAVIOR: Changes the behavior of the operation
+ * @KAPI_CAP_ACCESS_RESOURCE: Allows access to restricted resources
+ * @KAPI_CAP_PERFORM_OPERATION: Allows performing a privileged operation
+ */
+enum kapi_capability_action {
+ KAPI_CAP_BYPASS_CHECK = 0,
+ KAPI_CAP_INCREASE_LIMIT,
+ KAPI_CAP_OVERRIDE_RESTRICTION,
+ KAPI_CAP_GRANT_PERMISSION,
+ KAPI_CAP_MODIFY_BEHAVIOR,
+ KAPI_CAP_ACCESS_RESOURCE,
+ KAPI_CAP_PERFORM_OPERATION,
+};
+
+/**
+ * struct kapi_capability_spec - Capability requirement specification
+ * @capability: The capability constant (e.g., CAP_IPC_LOCK)
+ * @cap_name: Capability name as string
+ * @action: What the capability allows (kapi_capability_action)
+ * @allows: Description of what the capability allows
+ * @without_cap: What happens without the capability
+ * @check_condition: Condition when capability is checked
+ * @priority: Check priority (lower checked first)
+ * @alternative: Alternative capabilities that can be used
+ * @alternative_count: Number of alternative capabilities
+ */
+struct kapi_capability_spec {
+ int capability;
+ char cap_name[KAPI_MAX_NAME_LEN];
+ enum kapi_capability_action action;
+ char allows[KAPI_MAX_DESC_LEN];
+ char without_cap[KAPI_MAX_DESC_LEN];
+ char check_condition[KAPI_MAX_DESC_LEN];
+ u8 priority;
+ int alternative[KAPI_MAX_CAPABILITIES];
+ u32 alternative_count;
+} __attribute__((packed));
+
+/**
+ * enum kapi_side_effect_type - Types of side effects
+ * @KAPI_EFFECT_NONE: No side effects
+ * @KAPI_EFFECT_ALLOC_MEMORY: Allocates memory
+ * @KAPI_EFFECT_FREE_MEMORY: Frees memory
+ * @KAPI_EFFECT_MODIFY_STATE: Modifies global/shared state
+ * @KAPI_EFFECT_SIGNAL_SEND: Sends signals
+ * @KAPI_EFFECT_FILE_POSITION: Modifies file position
+ * @KAPI_EFFECT_LOCK_ACQUIRE: Acquires locks
+ * @KAPI_EFFECT_LOCK_RELEASE: Releases locks
+ * @KAPI_EFFECT_RESOURCE_CREATE: Creates system resources (FDs, PIDs, etc)
+ * @KAPI_EFFECT_RESOURCE_DESTROY: Destroys system resources
+ * @KAPI_EFFECT_SCHEDULE: May cause scheduling/context switch
+ * @KAPI_EFFECT_HARDWARE: Interacts with hardware
+ * @KAPI_EFFECT_NETWORK: Network I/O operation
+ * @KAPI_EFFECT_FILESYSTEM: Filesystem modification
+ * @KAPI_EFFECT_PROCESS_STATE: Modifies process state
+ * @KAPI_EFFECT_IRREVERSIBLE: Effect cannot be undone
+ */
+enum kapi_side_effect_type {
+ KAPI_EFFECT_NONE = 0,
+ KAPI_EFFECT_ALLOC_MEMORY = (1 << 0),
+ KAPI_EFFECT_FREE_MEMORY = (1 << 1),
+ KAPI_EFFECT_MODIFY_STATE = (1 << 2),
+ KAPI_EFFECT_SIGNAL_SEND = (1 << 3),
+ KAPI_EFFECT_FILE_POSITION = (1 << 4),
+ KAPI_EFFECT_LOCK_ACQUIRE = (1 << 5),
+ KAPI_EFFECT_LOCK_RELEASE = (1 << 6),
+ KAPI_EFFECT_RESOURCE_CREATE = (1 << 7),
+ KAPI_EFFECT_RESOURCE_DESTROY = (1 << 8),
+ KAPI_EFFECT_SCHEDULE = (1 << 9),
+ KAPI_EFFECT_HARDWARE = (1 << 10),
+ KAPI_EFFECT_NETWORK = (1 << 11),
+ KAPI_EFFECT_FILESYSTEM = (1 << 12),
+ KAPI_EFFECT_PROCESS_STATE = (1 << 13),
+ KAPI_EFFECT_IRREVERSIBLE = (1 << 14),
+};
+
+/**
+ * struct kapi_side_effect - Side effect specification
+ * @type: Bitmask of effect types
+ * @target: What is affected (e.g., "process memory", "file descriptor table")
+ * @condition: Condition under which effect occurs
+ * @description: Detailed description of the effect
+ * @reversible: Whether the effect can be undone
+ */
+struct kapi_side_effect {
+ u32 type;
+ char target[KAPI_MAX_NAME_LEN];
+ char condition[KAPI_MAX_DESC_LEN];
+ char description[KAPI_MAX_DESC_LEN];
+ bool reversible;
+} __attribute__((packed));
+
+/**
+ * struct kapi_state_transition - State transition specification
+ * @from_state: Starting state description
+ * @to_state: Ending state description
+ * @condition: Condition for transition
+ * @object: Object whose state changes
+ * @description: Detailed description
+ */
+struct kapi_state_transition {
+ char from_state[KAPI_MAX_NAME_LEN];
+ char to_state[KAPI_MAX_NAME_LEN];
+ char condition[KAPI_MAX_DESC_LEN];
+ char object[KAPI_MAX_NAME_LEN];
+ char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+#define KAPI_MAX_STRUCT_SPECS 8
+#define KAPI_MAX_SIDE_EFFECTS 32
+#define KAPI_MAX_STATE_TRANS 8
+
+/**
+ * enum kapi_socket_state - Socket states for state machine
+ */
+enum kapi_socket_state {
+ KAPI_SOCK_STATE_UNSPEC = 0,
+ KAPI_SOCK_STATE_CLOSED,
+ KAPI_SOCK_STATE_OPEN,
+ KAPI_SOCK_STATE_BOUND,
+ KAPI_SOCK_STATE_LISTEN,
+ KAPI_SOCK_STATE_SYN_SENT,
+ KAPI_SOCK_STATE_SYN_RECV,
+ KAPI_SOCK_STATE_ESTABLISHED,
+ KAPI_SOCK_STATE_FIN_WAIT1,
+ KAPI_SOCK_STATE_FIN_WAIT2,
+ KAPI_SOCK_STATE_CLOSE_WAIT,
+ KAPI_SOCK_STATE_CLOSING,
+ KAPI_SOCK_STATE_LAST_ACK,
+ KAPI_SOCK_STATE_TIME_WAIT,
+ KAPI_SOCK_STATE_CONNECTED,
+ KAPI_SOCK_STATE_DISCONNECTED,
+};
+
+/**
+ * enum kapi_socket_protocol - Socket protocol types
+ */
+enum kapi_socket_protocol {
+ KAPI_PROTO_TCP = (1 << 0),
+ KAPI_PROTO_UDP = (1 << 1),
+ KAPI_PROTO_UNIX = (1 << 2),
+ KAPI_PROTO_RAW = (1 << 3),
+ KAPI_PROTO_PACKET = (1 << 4),
+ KAPI_PROTO_NETLINK = (1 << 5),
+ KAPI_PROTO_SCTP = (1 << 6),
+ KAPI_PROTO_DCCP = (1 << 7),
+ KAPI_PROTO_ALL = 0xFFFFFFFF,
+};
+
+/**
+ * enum kapi_buffer_behavior - Network buffer handling behaviors
+ */
+enum kapi_buffer_behavior {
+ KAPI_BUF_PEEK = (1 << 0),
+ KAPI_BUF_TRUNCATE = (1 << 1),
+ KAPI_BUF_SCATTER = (1 << 2),
+ KAPI_BUF_ZERO_COPY = (1 << 3),
+ KAPI_BUF_KERNEL_ALLOC = (1 << 4),
+ KAPI_BUF_DMA_CAPABLE = (1 << 5),
+ KAPI_BUF_FRAGMENT = (1 << 6),
+};
+
+/**
+ * enum kapi_async_behavior - Asynchronous operation behaviors
+ */
+enum kapi_async_behavior {
+ KAPI_ASYNC_BLOCK = 0,
+ KAPI_ASYNC_NONBLOCK = (1 << 0),
+ KAPI_ASYNC_POLL_READY = (1 << 1),
+ KAPI_ASYNC_SIGNAL_DRIVEN = (1 << 2),
+ KAPI_ASYNC_AIO = (1 << 3),
+ KAPI_ASYNC_IO_URING = (1 << 4),
+ KAPI_ASYNC_EPOLL = (1 << 5),
+};
+
+/**
+ * struct kapi_socket_state_spec - Socket state requirement/transition
+ */
+struct kapi_socket_state_spec {
+ enum kapi_socket_state required_states[KAPI_MAX_SOCKET_STATES];
+ u32 required_state_count;
+ enum kapi_socket_state forbidden_states[KAPI_MAX_SOCKET_STATES];
+ u32 forbidden_state_count;
+ enum kapi_socket_state resulting_state;
+ char state_condition[KAPI_MAX_DESC_LEN];
+ u32 applicable_protocols;
+} __attribute__((packed));
+
+/**
+ * struct kapi_protocol_behavior - Protocol-specific behavior
+ */
+struct kapi_protocol_behavior {
+ u32 applicable_protocols;
+ char behavior[KAPI_MAX_DESC_LEN];
+ s64 protocol_flags;
+ char flag_description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_buffer_spec - Network buffer specification
+ */
+struct kapi_buffer_spec {
+ u32 buffer_behaviors;
+ size_t min_buffer_size;
+ size_t max_buffer_size;
+ size_t optimal_buffer_size;
+ char fragmentation_rules[KAPI_MAX_DESC_LEN];
+ bool can_partial_transfer;
+ char partial_transfer_rules[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_async_spec - Asynchronous behavior specification
+ */
+struct kapi_async_spec {
+ enum kapi_async_behavior supported_modes;
+ int nonblock_errno;
+ u32 poll_events_in;
+ u32 poll_events_out;
+ char completion_condition[KAPI_MAX_DESC_LEN];
+ bool supports_timeout;
+ char timeout_behavior[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_addr_family_spec - Address family specification
+ */
+struct kapi_addr_family_spec {
+ int family;
+ char family_name[32];
+ size_t addr_struct_size;
+ size_t min_addr_len;
+ size_t max_addr_len;
+ char addr_format[KAPI_MAX_DESC_LEN];
+ bool supports_wildcard;
+ bool supports_multicast;
+ bool supports_broadcast;
+ char special_addresses[KAPI_MAX_DESC_LEN];
+ u32 port_range_min;
+ u32 port_range_max;
+} __attribute__((packed));
+
+/**
+ * struct kernel_api_spec - Complete kernel API specification
+ * @name: Function name
+ * @version: API version
+ * @description: Brief description
+ * @long_description: Detailed description
+ * @context_flags: Execution context flags
+ * @param_count: Number of parameters
+ * @params: Parameter specifications
+ * @return_spec: Return value specification
+ * @error_count: Number of possible errors
+ * @errors: Error specifications
+ * @lock_count: Number of lock specifications
+ * @locks: Lock requirement specifications
+ * @constraint_count: Number of additional constraints
+ * @constraints: Additional constraint specifications
+ * @examples: Usage examples
+ * @notes: Additional notes
+ * @since_version: Kernel version when introduced
+ * @signal_count: Number of signal specifications
+ * @signals: Signal handling specifications
+ * @signal_mask_count: Number of signal mask specifications
+ * @signal_masks: Signal mask specifications
+ * @struct_spec_count: Number of structure specifications
+ * @struct_specs: Structure type specifications
+ * @side_effect_count: Number of side effect specifications
+ * @side_effects: Side effect specifications
+ * @state_trans_count: Number of state transition specifications
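+ * @state_transitions: State transition specifications
+ * @capability_count: Number of capability specifications
+ * @capabilities: Capability requirement specifications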
+ */
+struct kernel_api_spec {
+ char name[KAPI_MAX_NAME_LEN];
+ u32 version;
+ char description[KAPI_MAX_DESC_LEN];
+ char long_description[KAPI_MAX_DESC_LEN * 4];
+ u32 context_flags;
+
+ /* Parameters */
+ u32 param_magic; /* 0x4B415031 = 'KAP1' */
+ u32 param_count;
+ struct kapi_param_spec params[KAPI_MAX_PARAMS];
+
+ /* Return value */
+ u32 return_magic; /* 0x4B415232 = 'KAR2' */
+ struct kapi_return_spec return_spec;
+
+ /* Errors */
+ u32 error_magic; /* 0x4B414533 = 'KAE3' */
+ u32 error_count;
+ struct kapi_error_spec errors[KAPI_MAX_ERRORS];
+
+ /* Locking */
+ u32 lock_magic; /* 0x4B414C34 = 'KAL4' */
+ u32 lock_count;
+ struct kapi_lock_spec locks[KAPI_MAX_CONSTRAINTS];
+
+ /* Constraints */
+ u32 constraint_magic; /* 0x4B414335 = 'KAC5' */
+ u32 constraint_count;
+ struct kapi_constraint_spec constraints[KAPI_MAX_CONSTRAINTS];
+
+ /* Additional information */
+ u32 info_magic; /* 0x4B414936 = 'KAI6' */
+ char examples[KAPI_MAX_DESC_LEN * 2];
+ char notes[KAPI_MAX_DESC_LEN * 2];
+ char since_version[32];
+
+ /* Signal specifications */
+ u32 signal_magic; /* 0x4B415337 = 'KAS7' */
+ u32 signal_count;
+ struct kapi_signal_spec signals[KAPI_MAX_SIGNALS];
+
+ /* Signal mask specifications */
+ u32 sigmask_magic; /* 0x4B414D38 = 'KAM8' */
+ u32 signal_mask_count;
+ struct kapi_signal_mask_spec signal_masks[KAPI_MAX_SIGNALS];
+
+ /* Structure specifications */
+ u32 struct_magic; /* 0x4B415439 = 'KAT9' */
+ u32 struct_spec_count;
+ struct kapi_struct_spec struct_specs[KAPI_MAX_STRUCT_SPECS];
+
+ /* Side effects */
+ u32 effect_magic; /* 0x4B414641 = 'KAFA' */
+ u32 side_effect_count;
+ struct kapi_side_effect side_effects[KAPI_MAX_SIDE_EFFECTS];
+
+ /* State transitions */
+ u32 trans_magic; /* 0x4B415442 = 'KATB' */
+ u32 state_trans_count;
+ struct kapi_state_transition state_transitions[KAPI_MAX_STATE_TRANS];
+
+ /* Capability specifications */
+ u32 cap_magic; /* 0x4B414343 = 'KACC' */
+ u32 capability_count;
+ struct kapi_capability_spec capabilities[KAPI_MAX_CAPABILITIES];
+
+ /* Extended fields for socket and network operations */
+ struct kapi_socket_state_spec socket_state;
+ struct kapi_protocol_behavior protocol_behaviors[KAPI_MAX_PROTOCOL_BEHAVIORS];
+ u32 protocol_behavior_count;
+ struct kapi_buffer_spec buffer_spec;
+ struct kapi_async_spec async_spec;
+ struct kapi_addr_family_spec addr_families[KAPI_MAX_ADDR_FAMILIES];
+ u32 addr_family_count;
+
+ /* Operation characteristics */
+ bool is_connection_oriented;
+ bool is_message_oriented;
+ bool supports_oob_data;
+ bool supports_peek;
+ bool supports_select_poll;
+ bool is_reentrant;
+
+ /* Semantic descriptions */
+ char connection_establishment[KAPI_MAX_DESC_LEN];
+ char connection_termination[KAPI_MAX_DESC_LEN];
+ char data_transfer_semantics[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/* Macros for defining API specifications */
+
+/**
+ * DEFINE_KERNEL_API_SPEC - Define a kernel API specification
+ * @func_name: Function name to specify
+ */
+#define DEFINE_KERNEL_API_SPEC(func_name) \
+ static struct kernel_api_spec __kapi_spec_##func_name \
+ __used __section(".kapi_specs") = { \
+ .name = __stringify(func_name), \
+ .version = 1,
+
+#define KAPI_END_SPEC };
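+
+/*
+ * Usage sketch (illustrative only: "my_func", its parameter and its error
+ * condition are hypothetical, and the negative error-code convention is an
+ * assumption). A specification opens with DEFINE_KERNEL_API_SPEC(), is
+ * filled in with the KAPI_* macros defined below, and is closed with
+ * KAPI_END_SPEC:
+ *
+ *   DEFINE_KERNEL_API_SPEC(my_func)
+ *     KAPI_DESCRIPTION("Example operation")
+ *     KAPI_PARAM(0, "len", "size_t", "Length in bytes")
+ *       KAPI_PARAM_RANGE(0, 4096)
+ *     KAPI_PARAM_END
+ *     KAPI_PARAM_COUNT(1)
+ *     KAPI_ERROR(0, -EINVAL, "EINVAL", "len > 4096", "Length out of range")
+ *     KAPI_ERROR_COUNT(1)
+ *   KAPI_END_SPEC
+ */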
+
+/**
+ * KAPI_DESCRIPTION - Set API description
+ * @desc: Description string
+ */
+#define KAPI_DESCRIPTION(desc) \
+ .description = desc,
+
+/**
+ * KAPI_LONG_DESC - Set detailed API description
+ * @desc: Detailed description string
+ */
+#define KAPI_LONG_DESC(desc) \
+ .long_description = desc,
+
+/**
+ * KAPI_CONTEXT - Set execution context flags
+ * @flags: Context flags (OR'ed KAPI_CTX_* values)
+ */
+#define KAPI_CONTEXT(flags) \
+ .context_flags = flags,
+
+/**
+ * KAPI_PARAM - Define a parameter specification
+ * @idx: Parameter index (0-based)
+ * @pname: Parameter name
+ * @ptype: Type name string
+ * @pdesc: Parameter description
+ */
+#define KAPI_PARAM(idx, pname, ptype, pdesc) \
+ .params[idx] = { \
+ .name = pname, \
+ .type_name = ptype, \
+ .description = pdesc, \
+ .size_param_idx = -1, /* Default: no dynamic sizing */
+
+#define KAPI_PARAM_TYPE(ptype) \
+ .type = ptype,
+
+#define KAPI_PARAM_FLAGS(pflags) \
+ .flags = pflags,
+
+#define KAPI_PARAM_SIZE(psize) \
+ .size = psize,
+
+#define KAPI_PARAM_RANGE(pmin, pmax) \
+ .min_value = pmin, \
+ .max_value = pmax,
+
+#define KAPI_PARAM_CONSTRAINT_TYPE(ctype) \
+ .constraint_type = ctype,
+
+#define KAPI_PARAM_CONSTRAINT(desc) \
+ .constraints = desc,
+
+#define KAPI_PARAM_VALID_MASK(mask) \
+ .valid_mask = mask,
+
+#define KAPI_PARAM_ENUM_VALUES(values) \
+ .enum_values = values, \
+ .enum_count = ARRAY_SIZE(values),
+
+#define KAPI_PARAM_ALIGNMENT(align) \
+ .alignment = align,
+
+#define KAPI_PARAM_SIZE_PARAM(idx) \
+ .size_param_idx = idx,
+
+#define KAPI_PARAM_END },
+
+/**
+ * KAPI_PARAM_COUNT - Set the number of parameters
+ * @n: Number of parameters
+ */
+#define KAPI_PARAM_COUNT(n) \
+ .param_magic = KAPI_MAGIC_PARAMS, \
+ .param_count = n,
+
+/**
+ * KAPI_RETURN - Define return value specification
+ * @rtype: Return type name
+ * @rdesc: Return value description
+ */
+#define KAPI_RETURN(rtype, rdesc) \
+ .return_spec = { \
+ .type_name = rtype, \
+ .description = rdesc,
+
+#define KAPI_RETURN_SUCCESS(val) \
+ .success_value = val,
+
+#define KAPI_RETURN_TYPE(rtype) \
+ .type = rtype,
+
+#define KAPI_RETURN_CHECK_TYPE(ctype) \
+ .check_type = ctype,
+
+#define KAPI_RETURN_ERROR_VALUES(values) \
+ .error_values = values,
+
+#define KAPI_RETURN_ERROR_COUNT(count) \
+ .error_count = count,
+
+#define KAPI_RETURN_SUCCESS_RANGE(min, max) \
+ .success_min = min, \
+ .success_max = max,
+
+#define KAPI_RETURN_END },
+
+/**
+ * KAPI_ERROR - Define an error condition
+ * @idx: Error index
+ * @ecode: Error code value
+ * @ename: Error name
+ * @econd: Error condition
+ * @edesc: Error description
+ */
+#define KAPI_ERROR(idx, ecode, ename, econd, edesc) \
+ .errors[idx] = { \
+ .error_code = ecode, \
+ .name = ename, \
+ .condition = econd, \
+ .description = edesc, \
+ },
+
+/**
+ * KAPI_ERROR_COUNT - Set the number of errors
+ * @n: Number of errors
+ */
+#define KAPI_ERROR_COUNT(n) \
+ .error_magic = KAPI_MAGIC_ERRORS, \
+ .error_count = n,
+
+/**
+ * KAPI_LOCK - Define a lock requirement
+ * @idx: Lock index
+ * @lname: Lock name
+ * @ltype: Lock type
+ */
+#define KAPI_LOCK(idx, lname, ltype) \
+ .locks[idx] = { \
+ .lock_name = lname, \
+ .lock_type = ltype,
+
+#define KAPI_LOCK_ACQUIRED \
+ .acquired = true,
+
+#define KAPI_LOCK_RELEASED \
+ .released = true,
+
+#define KAPI_LOCK_HELD_ENTRY \
+ .held_on_entry = true,
+
+#define KAPI_LOCK_HELD_EXIT \
+ .held_on_exit = true,
+
+#define KAPI_LOCK_DESC(ldesc) \
+ .description = ldesc,
+
+#define KAPI_LOCK_END },
+
+/**
+ * KAPI_CONSTRAINT - Define an additional constraint
+ * @idx: Constraint index
+ * @cname: Constraint name
+ * @cdesc: Constraint description
+ */
+#define KAPI_CONSTRAINT(idx, cname, cdesc) \
+ .constraints[idx] = { \
+ .name = cname, \
+ .description = cdesc,
+
+#define KAPI_CONSTRAINT_EXPR(expr) \
+ .expression = expr,
+
+#define KAPI_CONSTRAINT_END },
+
+/**
+ * KAPI_EXAMPLES - Set API usage examples
+ * @ex: Examples string
+ */
+#define KAPI_EXAMPLES(ex) \
+ .info_magic = KAPI_MAGIC_INFO, \
+ .examples = ex,
+
+/**
+ * KAPI_NOTES - Set API notes
+ * @n: Notes string
+ */
+#define KAPI_NOTES(n) \
+ .notes = n,
+
+/**
+ * KAPI_SIGNAL - Define a signal specification
+ * @idx: Signal index
+ * @signum: Signal number (e.g., SIGKILL)
+ * @signame: Signal name string
+ * @dir: Direction flags
+ * @act: Action taken
+ */
+#define KAPI_SIGNAL(idx, signum, signame, dir, act) \
+ .signals[idx] = { \
+ .signal_num = signum, \
+ .signal_name = signame, \
+ .direction = dir, \
+ .action = act,
+
+#define KAPI_SIGNAL_TARGET(tgt) \
+ .target = tgt,
+
+#define KAPI_SIGNAL_CONDITION(cond) \
+ .condition = cond,
+
+#define KAPI_SIGNAL_DESC(desc) \
+ .description = desc,
+
+#define KAPI_SIGNAL_RESTARTABLE \
+ .restartable = true,
+
+#define KAPI_SIGNAL_SA_FLAGS_REQ(flags) \
+ .sa_flags_required = flags,
+
+#define KAPI_SIGNAL_SA_FLAGS_FORBID(flags) \
+ .sa_flags_forbidden = flags,
+
+#define KAPI_SIGNAL_ERROR(err) \
+ .error_on_signal = err,
+
+#define KAPI_SIGNAL_TRANSFORM(sig) \
+ .transform_to = sig,
+
+#define KAPI_SIGNAL_TIMING(when) \
+ .timing = when,
+
+#define KAPI_SIGNAL_PRIORITY(prio) \
+ .priority = prio,
+
+#define KAPI_SIGNAL_INTERRUPTIBLE \
+ .interruptible = true,
+
+#define KAPI_SIGNAL_QUEUE(behavior) \
+ .queue_behavior = behavior,
+
+#define KAPI_SIGNAL_STATE_REQ(state) \
+ .state_required = state,
+
+#define KAPI_SIGNAL_STATE_FORBID(state) \
+ .state_forbidden = state,
+
+#define KAPI_SIGNAL_END },
+
+#define KAPI_SIGNAL_COUNT(n) \
+ .signal_magic = KAPI_MAGIC_SIGNALS, \
+ .signal_count = n,
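+
+/*
+ * Usage sketch (illustrative only; the description string is made up).
+ * Each KAPI_SIGNAL() entry is closed with KAPI_SIGNAL_END, and
+ * KAPI_SIGNAL_COUNT() records how many entries were defined:
+ *
+ *   KAPI_SIGNAL(0, SIGINT, "SIGINT", KAPI_SIGNAL_RECEIVE,
+ *               KAPI_SIGNAL_ACTION_RETURN)
+ *     KAPI_SIGNAL_ERROR(-EINTR)
+ *     KAPI_SIGNAL_RESTARTABLE
+ *     KAPI_SIGNAL_DESC("Interrupts a blocking wait")
+ *   KAPI_SIGNAL_END
+ *   KAPI_SIGNAL_COUNT(1)
+ */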
+
+/**
+ * KAPI_SIGNAL_MASK - Define a signal mask specification
+ * @idx: Mask index
+ * @name: Mask name
+ * @desc: Mask description
+ */
+#define KAPI_SIGNAL_MASK(idx, name, desc) \
+ .signal_masks[idx] = { \
+ .mask_name = name, \
+ .description = desc,
+
+/*
+ * KAPI_SIGNAL_MASK_SIGNALS - Specify signals in a signal mask
+ * @...: Variadic list of signal numbers
+ *
+ * Usage:
+ * KAPI_SIGNAL_MASK(0, "blocked", "Signals blocked during operation")
+ * KAPI_SIGNAL_MASK_SIGNALS(SIGINT, SIGTERM, SIGQUIT)
+ * KAPI_SIGNAL_MASK_END
+ */
+#define KAPI_SIGNAL_MASK_SIGNALS(...) \
+ .signals = { __VA_ARGS__ }, \
+ .signal_count = sizeof((int[]){ __VA_ARGS__ }) / sizeof(int),
+
+#define KAPI_SIGNAL_MASK_END },
+
+/**
+ * KAPI_STRUCT_SPEC - Define a structure specification
+ * @idx: Structure spec index
+ * @sname: Structure name
+ * @sdesc: Structure description
+ */
+#define KAPI_STRUCT_SPEC(idx, sname, sdesc) \
+ .struct_specs[idx] = { \
+ .name = #sname, \
+ .description = sdesc,
+
+#define KAPI_STRUCT_SIZE(ssize, salign) \
+ .size = ssize, \
+ .alignment = salign,
+
+#define KAPI_STRUCT_FIELD_COUNT(n) \
+ .field_count = n,
+
+/**
+ * KAPI_STRUCT_FIELD - Define a structure field
+ * @fidx: Field index
+ * @fname: Field name
+ * @ftype: Field type (KAPI_TYPE_*)
+ * @ftype_name: Type name as string
+ * @fdesc: Field description
+ */
+#define KAPI_STRUCT_FIELD(fidx, fname, ftype, ftype_name, fdesc) \
+ .fields[fidx] = { \
+ .name = fname, \
+ .type = ftype, \
+ .type_name = ftype_name, \
+ .description = fdesc,
+
+#define KAPI_FIELD_OFFSET(foffset) \
+ .offset = foffset,
+
+#define KAPI_FIELD_SIZE(fsize) \
+ .size = fsize,
+
+#define KAPI_FIELD_FLAGS(fflags) \
+ .flags = fflags,
+
+#define KAPI_FIELD_CONSTRAINT_RANGE(min, max) \
+ .constraint_type = KAPI_CONSTRAINT_RANGE, \
+ .min_value = min, \
+ .max_value = max,
+
+#define KAPI_FIELD_CONSTRAINT_MASK(mask) \
+ .constraint_type = KAPI_CONSTRAINT_MASK, \
+ .valid_mask = mask,
+
+#define KAPI_FIELD_CONSTRAINT_ENUM(values) \
+ .constraint_type = KAPI_CONSTRAINT_ENUM, \
+ .enum_values = values,
+
+#define KAPI_STRUCT_FIELD_END },
+
+#define KAPI_STRUCT_SPEC_END },
+
+/* Counter for structure specifications */
+#define KAPI_STRUCT_SPEC_COUNT(n) \
+ .struct_magic = KAPI_MAGIC_STRUCTS, \
+ .struct_spec_count = n,
+
+/* Additional lock-related macros */
+#define KAPI_LOCK_COUNT(n) \
+ .lock_magic = KAPI_MAGIC_LOCKS, \
+ .lock_count = n,
+
+/**
+ * KAPI_SIDE_EFFECT - Define a side effect
+ * @idx: Side effect index
+ * @etype: Effect type bitmask (OR'ed KAPI_EFFECT_* values)
+ * @etarget: What is affected
+ * @edesc: Effect description
+ */
+#define KAPI_SIDE_EFFECT(idx, etype, etarget, edesc) \
+ .side_effects[idx] = { \
+ .type = etype, \
+ .target = etarget, \
+ .description = edesc, \
+ .reversible = false, /* Default to non-reversible */
+
+#define KAPI_EFFECT_CONDITION(cond) \
+ .condition = cond,
+
+#define KAPI_EFFECT_REVERSIBLE \
+ .reversible = true,
+
+#define KAPI_SIDE_EFFECT_END },
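+
+/*
+ * Usage sketch (illustrative only; the target, condition and description
+ * strings are made up). The effect type is a bitmask, so several
+ * KAPI_EFFECT_* values may be OR'ed together:
+ *
+ *   KAPI_SIDE_EFFECT(0, KAPI_EFFECT_ALLOC_MEMORY | KAPI_EFFECT_LOCK_ACQUIRE,
+ *                    "process memory", "Allocates and locks pages")
+ *     KAPI_EFFECT_CONDITION("Only when the request succeeds")
+ *   KAPI_SIDE_EFFECT_END
+ *   KAPI_SIDE_EFFECT_COUNT(1)
+ */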
+
+/**
+ * KAPI_STATE_TRANS - Define a state transition
+ * @idx: State transition index
+ * @obj: Object whose state changes
+ * @from: From state
+ * @to: To state
+ * @desc: Transition description
+ */
+#define KAPI_STATE_TRANS(idx, obj, from, to, desc) \
+ .state_transitions[idx] = { \
+ .object = obj, \
+ .from_state = from, \
+ .to_state = to, \
+ .description = desc,
+
+#define KAPI_STATE_TRANS_COND(cond) \
+ .condition = cond,
+
+#define KAPI_STATE_TRANS_END },
+
+/* Counters for side effects and state transitions */
+#define KAPI_SIDE_EFFECT_COUNT(n) \
+ .effect_magic = KAPI_MAGIC_EFFECTS, \
+ .side_effect_count = n,
+
+#define KAPI_STATE_TRANS_COUNT(n) \
+ .trans_magic = KAPI_MAGIC_TRANS, \
+ .state_trans_count = n,
+
+/* Helper macros for common side effect patterns */
+#define KAPI_EFFECTS_MEMORY (KAPI_EFFECT_ALLOC_MEMORY | KAPI_EFFECT_FREE_MEMORY)
+#define KAPI_EFFECTS_LOCKING (KAPI_EFFECT_LOCK_ACQUIRE | KAPI_EFFECT_LOCK_RELEASE)
+#define KAPI_EFFECTS_RESOURCES (KAPI_EFFECT_RESOURCE_CREATE | KAPI_EFFECT_RESOURCE_DESTROY)
+#define KAPI_EFFECTS_IO (KAPI_EFFECT_NETWORK | KAPI_EFFECT_FILESYSTEM)
+
+/*
+ * Helper macros for combining common parameter flag patterns.
+ * Note: KAPI_PARAM_IN, KAPI_PARAM_OUT, KAPI_PARAM_INOUT, and KAPI_PARAM_OPTIONAL
+ * are already defined in enum kapi_param_flags - use those directly.
+ */
+#define KAPI_PARAM_FLAGS_INOUT (KAPI_PARAM_IN | KAPI_PARAM_OUT)
+#define KAPI_PARAM_FLAGS_USER (KAPI_PARAM_USER | KAPI_PARAM_IN)
+
+/* Common signal timing constants */
+#define KAPI_SIGNAL_TIME_ENTRY "entry"
+#define KAPI_SIGNAL_TIME_DURING "during"
+#define KAPI_SIGNAL_TIME_EXIT "exit"
+#define KAPI_SIGNAL_TIME_ANYTIME "anytime"
+#define KAPI_SIGNAL_TIME_BLOCKING "while_blocked"
+#define KAPI_SIGNAL_TIME_SLEEPING "while_sleeping"
+#define KAPI_SIGNAL_TIME_BEFORE "before"
+#define KAPI_SIGNAL_TIME_AFTER "after"
+
+/* Common signal queue behaviors */
+#define KAPI_SIGNAL_QUEUE_STANDARD "standard"
+#define KAPI_SIGNAL_QUEUE_REALTIME "realtime"
+#define KAPI_SIGNAL_QUEUE_COALESCE "coalesce"
+#define KAPI_SIGNAL_QUEUE_REPLACE "replace"
+#define KAPI_SIGNAL_QUEUE_DISCARD "discard"
+
+/* Process state flags for signal delivery */
+#define KAPI_SIGNAL_STATE_RUNNING (1 << 0)
+#define KAPI_SIGNAL_STATE_SLEEPING (1 << 1)
+#define KAPI_SIGNAL_STATE_STOPPED (1 << 2)
+#define KAPI_SIGNAL_STATE_TRACED (1 << 3)
+#define KAPI_SIGNAL_STATE_ZOMBIE (1 << 4)
+#define KAPI_SIGNAL_STATE_DEAD (1 << 5)
+
+/* Capability specification macros */
+
+/**
+ * KAPI_CAPABILITY - Define a capability requirement
+ * @idx: Capability index
+ * @cap: Capability constant (e.g., CAP_IPC_LOCK)
+ * @name: Capability name string
+ * @act: Action type (kapi_capability_action)
+ */
+#define KAPI_CAPABILITY(idx, cap, name, act) \
+ .capabilities[idx] = { \
+ .capability = cap, \
+ .cap_name = name, \
+ .action = act,
+
+#define KAPI_CAP_ALLOWS(desc) \
+ .allows = desc,
+
+#define KAPI_CAP_WITHOUT(desc) \
+ .without_cap = desc,
+
+#define KAPI_CAP_CONDITION(cond) \
+ .check_condition = cond,
+
+#define KAPI_CAP_PRIORITY(prio) \
+ .priority = prio,
+
+#define KAPI_CAP_ALTERNATIVE(caps, count) \
+ .alternative = caps, \
+ .alternative_count = count,
+
+#define KAPI_CAPABILITY_END },
+
+/* Counter for capability specifications */
+#define KAPI_CAPABILITY_COUNT(n) \
+ .cap_magic = KAPI_MAGIC_CAPS, \
+ .capability_count = n,
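+
+/*
+ * Usage sketch (illustrative only; the description strings are made up):
+ *
+ *   KAPI_CAPABILITY(0, CAP_IPC_LOCK, "CAP_IPC_LOCK", KAPI_CAP_BYPASS_CHECK)
+ *     KAPI_CAP_ALLOWS("Locking memory beyond RLIMIT_MEMLOCK")
+ *     KAPI_CAP_WITHOUT("The RLIMIT_MEMLOCK limit is enforced")
+ *   KAPI_CAPABILITY_END
+ *   KAPI_CAPABILITY_COUNT(1)
+ */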
+
+/* Common signal patterns for syscalls */
+#define KAPI_SIGNAL_INTERRUPTIBLE_SLEEP \
+ KAPI_SIGNAL(0, SIGINT, "SIGINT", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN) \
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_SLEEPING) \
+ KAPI_SIGNAL_ERROR(-EINTR) \
+ KAPI_SIGNAL_RESTARTABLE \
+ KAPI_SIGNAL_DESC("Interrupts sleep, returns -EINTR") \
+ KAPI_SIGNAL_END \
+ KAPI_SIGNAL(1, SIGTERM, "SIGTERM", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN) \
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_SLEEPING) \
+ KAPI_SIGNAL_ERROR(-EINTR) \
+ KAPI_SIGNAL_RESTARTABLE \
+ KAPI_SIGNAL_DESC("Interrupts sleep, returns -EINTR") \
+ KAPI_SIGNAL_END
+
+#define KAPI_SIGNAL_FATAL_DEFAULT \
+ KAPI_SIGNAL(2, SIGKILL, "SIGKILL", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_TERMINATE) \
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ANYTIME) \
+ KAPI_SIGNAL_PRIORITY(0) \
+ KAPI_SIGNAL_DESC("Process terminated immediately") \
+ KAPI_SIGNAL_END
+
+#define KAPI_SIGNAL_STOP_CONT \
+ KAPI_SIGNAL(3, SIGSTOP, "SIGSTOP", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_STOP) \
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ANYTIME) \
+ KAPI_SIGNAL_DESC("Process stopped") \
+ KAPI_SIGNAL_END \
+ KAPI_SIGNAL(4, SIGCONT, "SIGCONT", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_CONTINUE) \
+ KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ANYTIME) \
+ KAPI_SIGNAL_DESC("Process continued") \
+ KAPI_SIGNAL_END
+
+/* Validation and runtime checking */
+
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+bool kapi_validate_params(const struct kernel_api_spec *spec, ...);
+bool kapi_validate_param(const struct kapi_param_spec *param_spec, s64 value);
+bool kapi_validate_param_with_context(const struct kapi_param_spec *param_spec,
+ s64 value, const s64 *all_params, int param_count);
+int kapi_validate_syscall_param(const struct kernel_api_spec *spec,
+ int param_idx, s64 value);
+int kapi_validate_syscall_params(const struct kernel_api_spec *spec,
+ const s64 *params, int param_count);
+bool kapi_check_return_success(const struct kapi_return_spec *return_spec, s64 retval);
+bool kapi_validate_return_value(const struct kernel_api_spec *spec, s64 retval);
+int kapi_validate_syscall_return(const struct kernel_api_spec *spec, s64 retval);
+void kapi_check_context(const struct kernel_api_spec *spec);
+void kapi_check_locks(const struct kernel_api_spec *spec);
+bool kapi_check_signal_allowed(const struct kernel_api_spec *spec, int signum);
+bool kapi_validate_signal_action(const struct kernel_api_spec *spec, int signum,
+ struct sigaction *act);
+int kapi_get_signal_error(const struct kernel_api_spec *spec, int signum);
+bool kapi_is_signal_restartable(const struct kernel_api_spec *spec, int signum);
+#else
+static inline bool kapi_validate_params(const struct kernel_api_spec *spec, ...)
+{
+ return true;
+}
+static inline bool kapi_validate_param(const struct kapi_param_spec *param_spec, s64 value)
+{
+ return true;
+}
+static inline bool kapi_validate_param_with_context(const struct kapi_param_spec *param_spec,
+ s64 value, const s64 *all_params, int param_count)
+{
+ return true;
+}
+static inline int kapi_validate_syscall_param(const struct kernel_api_spec *spec,
+ int param_idx, s64 value)
+{
+ return 0;
+}
+static inline int kapi_validate_syscall_params(const struct kernel_api_spec *spec,
+ const s64 *params, int param_count)
+{
+ return 0;
+}
+static inline bool kapi_check_return_success(const struct kapi_return_spec *return_spec, s64 retval)
+{
+ return true;
+}
+static inline bool kapi_validate_return_value(const struct kernel_api_spec *spec, s64 retval)
+{
+ return true;
+}
+static inline int kapi_validate_syscall_return(const struct kernel_api_spec *spec, s64 retval)
+{
+ return 0;
+}
+static inline void kapi_check_context(const struct kernel_api_spec *spec) {}
+static inline void kapi_check_locks(const struct kernel_api_spec *spec) {}
+static inline bool kapi_check_signal_allowed(const struct kernel_api_spec *spec, int signum)
+{
+ return true;
+}
+static inline bool kapi_validate_signal_action(const struct kernel_api_spec *spec, int signum,
+ struct sigaction *act)
+{
+ return true;
+}
+static inline int kapi_get_signal_error(const struct kernel_api_spec *spec, int signum)
+{
+ return -EINTR;
+}
+static inline bool kapi_is_signal_restartable(const struct kernel_api_spec *spec, int signum)
+{
+ return false;
+}
+#endif
+
+/* Export/query functions */
+const struct kernel_api_spec *kapi_get_spec(const char *name);
+int kapi_export_json(const struct kernel_api_spec *spec, char *buf, size_t size);
+void kapi_print_spec(const struct kernel_api_spec *spec);
+
+/* Registration for dynamic APIs */
+int kapi_register_spec(struct kernel_api_spec *spec);
+void kapi_unregister_spec(const char *name);
+
+/* Helper to get parameter constraint info */
+static inline bool kapi_get_param_constraint(const char *api_name, int param_idx,
+ enum kapi_constraint_type *type,
+ u64 *valid_mask, s64 *min_val, s64 *max_val)
+{
+ const struct kernel_api_spec *spec = kapi_get_spec(api_name);
+
+ if (!spec || param_idx < 0 || param_idx >= spec->param_count)
+ return false;
+
+ if (type)
+ *type = spec->params[param_idx].constraint_type;
+ if (valid_mask)
+ *valid_mask = spec->params[param_idx].valid_mask;
+ if (min_val)
+ *min_val = spec->params[param_idx].min_value;
+ if (max_val)
+ *max_val = spec->params[param_idx].max_value;
+
+ return true;
+}
+
+/* Socket state requirement macros */
+#define KAPI_SOCKET_STATE_REQ(...) \
+ .socket_state = { \
+ .required_states = { __VA_ARGS__ }, \
+ .required_state_count = sizeof((enum kapi_socket_state[]){__VA_ARGS__})/sizeof(enum kapi_socket_state),
+
+#define KAPI_SOCKET_STATE_FORBID(...) \
+ .forbidden_states = { __VA_ARGS__ }, \
+ .forbidden_state_count = sizeof((enum kapi_socket_state[]){__VA_ARGS__})/sizeof(enum kapi_socket_state),
+
+#define KAPI_SOCKET_STATE_RESULT(state) \
+ .resulting_state = state,
+
+#define KAPI_SOCKET_STATE_COND(cond) \
+ .state_condition = cond,
+
+#define KAPI_SOCKET_STATE_PROTOS(protos) \
+ .applicable_protocols = protos,
+
+#define KAPI_SOCKET_STATE_END },
+
+/* Protocol behavior macros */
+#define KAPI_PROTOCOL_BEHAVIOR(idx, protos, desc) \
+ .protocol_behaviors[idx] = { \
+ .applicable_protocols = protos, \
+ .behavior = desc,
+
+#define KAPI_PROTOCOL_FLAGS(flags, desc) \
+ .protocol_flags = flags, \
+ .flag_description = desc,
+
+#define KAPI_PROTOCOL_BEHAVIOR_END },
+
+/* Async behavior macros */
+#define KAPI_ASYNC_SPEC(modes, errno) \
+ .async_spec = { \
+ .supported_modes = modes, \
+ .nonblock_errno = errno,
+
+#define KAPI_ASYNC_POLL(in, out) \
+ .poll_events_in = in, \
+ .poll_events_out = out,
+
+#define KAPI_ASYNC_COMPLETION(cond) \
+ .completion_condition = cond,
+
+#define KAPI_ASYNC_TIMEOUT(supported, desc) \
+ .supports_timeout = supported, \
+ .timeout_behavior = desc,
+
+#define KAPI_ASYNC_END },
+
+/* Buffer behavior macros */
+#define KAPI_BUFFER_SPEC(behaviors) \
+ .buffer_spec = { \
+ .buffer_behaviors = behaviors,
+
+#define KAPI_BUFFER_SIZE(min, max, optimal) \
+ .min_buffer_size = min, \
+ .max_buffer_size = max, \
+ .optimal_buffer_size = optimal,
+
+#define KAPI_BUFFER_PARTIAL(allowed, rules) \
+ .can_partial_transfer = allowed, \
+ .partial_transfer_rules = rules,
+
+#define KAPI_BUFFER_FRAGMENT(rules) \
+ .fragmentation_rules = rules,
+
+#define KAPI_BUFFER_END },
+
+/* Address family macros */
+#define KAPI_ADDR_FAMILY(idx, fam, name, struct_sz, min_len, max_len) \
+ .addr_families[idx] = { \
+ .family = fam, \
+ .family_name = name, \
+ .addr_struct_size = struct_sz, \
+ .min_addr_len = min_len, \
+ .max_addr_len = max_len,
+
+#define KAPI_ADDR_FORMAT(fmt) \
+ .addr_format = fmt,
+
+#define KAPI_ADDR_FEATURES(wildcard, multicast, broadcast) \
+ .supports_wildcard = wildcard, \
+ .supports_multicast = multicast, \
+ .supports_broadcast = broadcast,
+
+#define KAPI_ADDR_SPECIAL(addrs) \
+ .special_addresses = addrs,
+
+#define KAPI_ADDR_PORTS(min, max) \
+ .port_range_min = min, \
+ .port_range_max = max,
+
+#define KAPI_ADDR_FAMILY_END },
+
+#define KAPI_ADDR_FAMILY_COUNT(n) \
+ .addr_family_count = n,
+
+#define KAPI_PROTOCOL_BEHAVIOR_COUNT(n) \
+ .protocol_behavior_count = n,
+
+#define KAPI_CONSTRAINT_COUNT(n) \
+ .constraint_magic = KAPI_MAGIC_CONSTRAINTS, \
+ .constraint_count = n,
+
+/* Network operation characteristics macros */
+#define KAPI_NET_CONNECTION_ORIENTED \
+ .is_connection_oriented = true,
+
+#define KAPI_NET_MESSAGE_ORIENTED \
+ .is_message_oriented = true,
+
+#define KAPI_NET_SUPPORTS_OOB \
+ .supports_oob_data = true,
+
+#define KAPI_NET_SUPPORTS_PEEK \
+ .supports_peek = true,
+
+#define KAPI_NET_REENTRANT \
+ .is_reentrant = true,
+
+/* Semantic description macros */
+#define KAPI_NET_CONN_ESTABLISH(desc) \
+ .connection_establishment = desc,
+
+#define KAPI_NET_CONN_TERMINATE(desc) \
+ .connection_termination = desc,
+
+#define KAPI_NET_DATA_TRANSFER(desc) \
+ .data_transfer_semantics = desc,
+
+#endif /* _LINUX_KERNEL_API_SPEC_H */
diff --git a/include/linux/syscall_api_spec.h b/include/linux/syscall_api_spec.h
new file mode 100644
index 0000000000000..b7f9ba0f978ab
--- /dev/null
+++ b/include/linux/syscall_api_spec.h
@@ -0,0 +1,198 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * syscall_api_spec.h - System Call API Specification Integration
+ *
+ * This header extends the SYSCALL_DEFINEx macros to support inline API specifications,
+ * allowing syscall documentation to be written alongside the implementation in a
+ * human-readable and machine-parseable format.
+ */
+
+#ifndef _LINUX_SYSCALL_API_SPEC_H
+#define _LINUX_SYSCALL_API_SPEC_H
+
+#include <linux/kernel_api_spec.h>
+
+
+
+/* Automatic syscall validation infrastructure */
+/*
+ * Validation is integrated directly into the SYSCALL_DEFINEx macros
+ * in syscalls.h when CONFIG_KAPI_RUNTIME_CHECKS is enabled.
+ *
+ * It happens in the __do_kapi_sys##name wrapper function, which:
+ * 1. Validates all parameters before calling the actual syscall
+ * 2. Calls the real syscall implementation
+ * 3. Validates the return value
+ * 4. Returns the result
+ */
+
+
+/*
+ * Helper macros for common syscall patterns
+ */
+
+/* For syscalls that can sleep */
+#define KAPI_SYSCALL_SLEEPABLE \
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+/* For syscalls that must be atomic */
+#define KAPI_SYSCALL_ATOMIC \
+ KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_ATOMIC)
+
+/* Common parameter specifications */
+#define KAPI_PARAM_FD(idx, desc) \
+ KAPI_PARAM(idx, "fd", "int", desc) \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN) \
+ .type = KAPI_TYPE_FD, \
+ .constraint_type = KAPI_CONSTRAINT_NONE, \
+ KAPI_PARAM_END
+
+#define KAPI_PARAM_USER_BUF(idx, name, desc) \
+ KAPI_PARAM(idx, name, "void __user *", desc) \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_USER | KAPI_PARAM_IN) \
+ KAPI_PARAM_END
+
+/**
+ * KAPI_PARAM_USER_STRUCT - Define a userspace struct pointer parameter
+ * @idx: Parameter index (0-based)
+ * @name: Parameter name
+ * @struct_type: The struct type (e.g., struct iocb)
+ * @desc: Parameter description
+ *
+ * This macro defines a parameter that is a userspace pointer to a struct.
+ * The pointer will be validated to ensure:
+ * - The pointer is accessible in userspace
+ * - The memory region of sizeof(struct_type) bytes is accessible
+ */
+#define KAPI_PARAM_USER_STRUCT(idx, name, struct_type, desc) \
+ KAPI_PARAM(idx, name, #struct_type " __user *", desc) \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_USER | KAPI_PARAM_IN) \
+ .type = KAPI_TYPE_USER_PTR, \
+ .size = sizeof(struct_type), \
+ .constraint_type = KAPI_CONSTRAINT_USER_PTR, \
+ KAPI_PARAM_END
+
+/**
+ * KAPI_PARAM_USER_PTR_SIZED - Define a userspace pointer with explicit size
+ * @idx: Parameter index (0-based)
+ * @name: Parameter name
+ * @ptr_size: Size in bytes of the memory region
+ * @desc: Parameter description
+ *
+ * This macro defines a parameter that is a userspace pointer to a memory
+ * region of a specific size. The pointer will be validated to ensure:
+ * - The pointer is accessible in userspace
+ * - The memory region of ptr_size bytes is accessible
+ */
+#define KAPI_PARAM_USER_PTR_SIZED(idx, name, ptr_size, desc) \
+ KAPI_PARAM(idx, name, "void __user *", desc) \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_USER | KAPI_PARAM_IN) \
+ .type = KAPI_TYPE_USER_PTR, \
+ .size = ptr_size, \
+ .constraint_type = KAPI_CONSTRAINT_USER_PTR, \
+ KAPI_PARAM_END
+
+/**
+ * KAPI_PARAM_USER_STRING - Define a userspace null-terminated string parameter
+ * @idx: Parameter index (0-based)
+ * @name: Parameter name
+ * @min_len: Minimum string length (excluding null terminator)
+ * @max_len: Maximum string length (excluding null terminator)
+ * @desc: Parameter description
+ *
+ * This macro defines a parameter that is a userspace pointer to a
+ * null-terminated string. The string will be validated to ensure:
+ * - The pointer is accessible in userspace
+ * - The string length (excluding null terminator) is within [min_len, max_len]
+ */
+#define KAPI_PARAM_USER_STRING(idx, name, min_len, max_len, desc) \
+ KAPI_PARAM(idx, name, "const char __user *", desc) \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_USER | KAPI_PARAM_IN) \
+ .type = KAPI_TYPE_USER_PTR, \
+ .constraint_type = KAPI_CONSTRAINT_USER_STRING, \
+ .min_value = min_len, \
+ .max_value = max_len, \
+ KAPI_PARAM_END
+
+/**
+ * KAPI_PARAM_USER_PATH - Define a userspace pathname parameter
+ * @idx: Parameter index (0-based)
+ * @name: Parameter name
+ * @desc: Parameter description
+ *
+ * This macro defines a parameter that is a userspace pointer to a
+ * null-terminated pathname string. The path will be validated to ensure:
+ * - The pointer is accessible in userspace
+ * - The path is a valid null-terminated string
+ * - The path length does not exceed PATH_MAX (4096 bytes)
+ */
+#define KAPI_PARAM_USER_PATH(idx, name, desc) \
+ KAPI_PARAM(idx, name, "const char __user *", desc) \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_USER | KAPI_PARAM_IN) \
+ .type = KAPI_TYPE_PATH, \
+ .constraint_type = KAPI_CONSTRAINT_USER_PATH, \
+ KAPI_PARAM_END
+
+#define KAPI_PARAM_SIZE_T(idx, name, desc) \
+ KAPI_PARAM(idx, name, "size_t", desc) \
+ KAPI_PARAM_FLAGS(KAPI_PARAM_IN) \
+ KAPI_PARAM_RANGE(0, SIZE_MAX) \
+ KAPI_PARAM_END
+
+/* Common error specifications */
+#define KAPI_ERROR_EBADF(idx) \
+ KAPI_ERROR(idx, -EBADF, "EBADF", "Invalid file descriptor", \
+ "The file descriptor is not valid or has been closed")
+
+#define KAPI_ERROR_EINVAL(idx, condition) \
+ KAPI_ERROR(idx, -EINVAL, "EINVAL", condition, \
+ "Invalid argument provided")
+
+#define KAPI_ERROR_ENOMEM(idx) \
+ KAPI_ERROR(idx, -ENOMEM, "ENOMEM", "Insufficient memory", \
+ "Cannot allocate memory for the operation")
+
+#define KAPI_ERROR_EPERM(idx) \
+ KAPI_ERROR(idx, -EPERM, "EPERM", "Operation not permitted", \
+ "The calling process does not have the required permissions")
+
+#define KAPI_ERROR_EFAULT(idx) \
+ KAPI_ERROR(idx, -EFAULT, "EFAULT", "Bad address", \
+ "Invalid user space address provided")
+
+/* Standard return value specifications */
+#define KAPI_RETURN_SUCCESS_ZERO \
+ KAPI_RETURN("long", "0 on success, negative error code on failure") \
+ KAPI_RETURN_SUCCESS(0, "== 0") \
+ KAPI_RETURN_END
+
+#define KAPI_RETURN_FD_SPEC \
+ KAPI_RETURN("long", "File descriptor on success, negative error code on failure") \
+ .check_type = KAPI_RETURN_FD, \
+ KAPI_RETURN_END
+
+#define KAPI_RETURN_COUNT \
+ KAPI_RETURN("long", "Number of bytes processed on success, negative error code on failure") \
+ KAPI_RETURN_SUCCESS(0, ">= 0") \
+ KAPI_RETURN_END
+
+/* KAPI_ERROR_COUNT and KAPI_PARAM_COUNT are now defined in kernel_api_spec.h */
+
+/**
+ * KAPI_SINCE_VERSION - Set the version in which the API was introduced
+ * @version: Version string when the API was introduced
+ */
+#define KAPI_SINCE_VERSION(version) \
+ .since_version = version,
+
+
+/**
+ * KAPI_SIGNAL_MASK_COUNT - Set the signal mask count
+ * @count: Number of signal masks defined
+ */
+#define KAPI_SIGNAL_MASK_COUNT(count) \
+ .signal_mask_count = count,
+
+
+
+#endif /* _LINUX_SYSCALL_API_SPEC_H */
\ No newline at end of file
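Illustration only, not part of the patch: the header comment above describes a wrapper that validates parameters, calls the real implementation, checks the return value, and returns the result. A minimal userspace sketch of that flow might look like the following. All demo_* names and the simplified range-only spec struct are hypothetical stand-ins, not the framework's actual types, and the real checks are expected to be considerably richer.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a per-parameter specification. */
struct demo_param_spec {
	const char *name;
	int64_t min_value;
	int64_t max_value;
};

/* Return 0 if every value lies within its [min, max] range, else -EINVAL. */
static int demo_validate_params(const struct demo_param_spec *specs,
				const int64_t *params, int count)
{
	for (int i = 0; i < count; i++) {
		if (params[i] < specs[i].min_value ||
		    params[i] > specs[i].max_value)
			return -EINVAL;
	}
	return 0;
}

/* Hypothetical "real" implementation that the wrapper guards. */
static long demo_do_sys_add(int a, int b)
{
	return (long)a + b;
}

static const struct demo_param_spec demo_add_spec[] = {
	{ "a", 0, 100 },
	{ "b", 0, 100 },
};

/* Wrapper following the four steps listed in the header comment. */
static long demo_sys_add(int a, int b)
{
	int64_t params[2] = { (int64_t)a, (int64_t)b };
	long ret;
	int err;

	/* 1. Validate all parameters before calling the implementation. */
	err = demo_validate_params(demo_add_spec, params, 2);
	if (err)
		return err;

	/* 2. Call the real implementation. */
	ret = demo_do_sys_add(a, b);

	/* 3. Check the return value; a violation is only reported here. */
	if (ret < 0)
		fprintf(stderr, "demo_sys_add: unexpected return %ld\n", ret);

	/* 4. Return the result unchanged. */
	return ret;
}
```

With these assumed ranges, demo_sys_add(2, 3) passes validation and returns the sum, while demo_sys_add(2, 500) is rejected with -EINVAL before the implementation runs. As in the macro in syscalls.h, a parameter failure short-circuits the call, whereas a return-value violation does not alter the result.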
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index cf84d98964b2f..eafda2f509999 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -89,6 +89,7 @@ struct file_attr;
#include <linux/bug.h>
#include <linux/sem.h>
#include <asm/siginfo.h>
+#include <linux/syscall_api_spec.h>
#include <linux/unistd.h>
#include <linux/quota.h>
#include <linux/key.h>
@@ -134,6 +135,7 @@ struct file_attr;
#define __SC_TYPE(t, a) t
#define __SC_ARGS(t, a) a
#define __SC_TEST(t, a) (void)BUILD_BUG_ON_ZERO(!__TYPE_IS_LL(t) && sizeof(t) > sizeof(long))
+#define __SC_CAST_TO_S64(t, a) (s64)(a)
#ifdef CONFIG_FTRACE_SYSCALLS
#define __SC_STR_ADECL(t, a) #a
@@ -244,6 +246,41 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
* done within __do_sys_*().
*/
#ifndef __SYSCALL_DEFINEx
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+#define __SYSCALL_DEFINEx(x, name, ...) \
+ __diag_push(); \
+ __diag_ignore(GCC, 8, "-Wattribute-alias", \
+ "Type aliasing is used to sanitize syscall arguments");\
+ asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
+ __attribute__((alias(__stringify(__se_sys##name)))); \
+ ALLOW_ERROR_INJECTION(sys##name, ERRNO); \
+ static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));\
+ static inline long __do_kapi_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \
+ asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \
+ asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
+ { \
+ long ret = __do_kapi_sys##name(__MAP(x,__SC_CAST,__VA_ARGS__));\
+ __MAP(x,__SC_TEST,__VA_ARGS__); \
+ __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__)); \
+ return ret; \
+ } \
+ __diag_pop(); \
+ static inline long __do_kapi_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))\
+ { \
+ const struct kernel_api_spec *__spec = kapi_get_spec("sys_" #name); \
+ if (__spec) { \
+ s64 __params[x] = { __MAP(x,__SC_CAST_TO_S64,__VA_ARGS__) }; \
+ int __ret = kapi_validate_syscall_params(__spec, __params, x); \
+ if (__ret) return __ret; \
+ } \
+ long ret = __do_sys##name(__MAP(x,__SC_ARGS,__VA_ARGS__)); \
+ if (__spec) { \
+ kapi_validate_syscall_return(__spec, (s64)ret); \
+ } \
+ return ret; \
+ } \
+ static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))
+#else /* !CONFIG_KAPI_RUNTIME_CHECKS */
#define __SYSCALL_DEFINEx(x, name, ...) \
__diag_push(); \
__diag_ignore(GCC, 8, "-Wattribute-alias", \
@@ -262,6 +299,7 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
} \
__diag_pop(); \
static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))
+#endif /* CONFIG_KAPI_RUNTIME_CHECKS */
#endif /* __SYSCALL_DEFINEx */
/* For split 64-bit arguments on 32-bit architectures */
diff --git a/init/Kconfig b/init/Kconfig
index fa79feb8fe57b..6a2a88de3aa84 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2191,6 +2191,8 @@ source "kernel/Kconfig.kexec"
source "kernel/liveupdate/Kconfig"
+source "kernel/api/Kconfig"
+
endmenu # General setup
source "arch/Kconfig"
diff --git a/kernel/Makefile b/kernel/Makefile
index e83669841b8cc..0531ed6973619 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -57,6 +57,9 @@ obj-y += dma/
obj-y += entry/
obj-y += unwind/
obj-$(CONFIG_MODULES) += module/
+obj-$(CONFIG_KAPI_SPEC) += api/
+# Ensure api/ is always cleaned even when CONFIG_KAPI_SPEC is not set
+obj- += api/
obj-$(CONFIG_KCMP) += kcmp.o
obj-$(CONFIG_FREEZER) += freezer.o
diff --git a/kernel/api/Kconfig b/kernel/api/Kconfig
new file mode 100644
index 0000000000000..fde25ec70e134
--- /dev/null
+++ b/kernel/api/Kconfig
@@ -0,0 +1,35 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Kernel API Specification Framework Configuration
+#
+
+config KAPI_SPEC
+ bool "Kernel API Specification Framework"
+ help
+ This option enables the kernel API specification framework,
+ which provides formal documentation of kernel APIs in both
+ human and machine-readable formats.
+
+ The framework allows developers to document APIs inline with
+ their implementation, including parameter specifications,
+ return values, error conditions, locking requirements, and
+ execution context constraints.
+
+ When enabled, API specifications can be queried at runtime
+ and exported in various formats (JSON, XML) through debugfs.
+
+ If unsure, say N.
+
+config KAPI_RUNTIME_CHECKS
+ bool "Runtime API specification checks"
+ depends on KAPI_SPEC
+ depends on DEBUG_KERNEL
+ help
+ Enable runtime validation of API usage against specifications.
+ This includes checking execution context requirements, parameter
+ validation, and lock state verification.
+
+ This adds overhead and should only be used for debugging and
+ development. The checks use WARN_ONCE to report violations.
+
+ If unsure, say N.
diff --git a/kernel/api/Makefile b/kernel/api/Makefile
new file mode 100644
index 0000000000000..acab17c78afa3
--- /dev/null
+++ b/kernel/api/Makefile
@@ -0,0 +1,29 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for the Kernel API Specification Framework
+#
+
+# Core API specification framework
+obj-$(CONFIG_KAPI_SPEC) += kernel_api_spec.o
+
+# Auto-generated API specifications collector
+ifeq ($(CONFIG_KAPI_SPEC),y)
+obj-$(CONFIG_KAPI_SPEC) += generated_api_specs.o
+
+# Find all potential apispec files (this is evaluated at make time)
+apispec-files := $(shell find $(objtree) -name "*.apispec.h" -type f 2>/dev/null)
+
+# Generate the collector file
+# Note: FORCE ensures this is always regenerated to pick up new apispec files
+$(obj)/generated_api_specs.c: $(srctree)/scripts/generate_api_specs.sh FORCE
+ $(Q)$(CONFIG_SHELL) $< $(srctree) $(objtree) > $@
+
+targets += generated_api_specs.c
+
+# The collector object is built from the generated source file
+$(obj)/generated_api_specs.o: $(obj)/generated_api_specs.c
+endif
+
+# Always clean generated files, regardless of config
+clean-files += generated_api_specs.c
+
diff --git a/kernel/api/kernel_api_spec.c b/kernel/api/kernel_api_spec.c
new file mode 100644
index 0000000000000..252000db38d2c
--- /dev/null
+++ b/kernel/api/kernel_api_spec.c
@@ -0,0 +1,1185 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * kernel_api_spec.c - Kernel API Specification Framework Implementation
+ *
+ * Provides runtime support for kernel API specifications including validation,
+ * export to various formats, and querying capabilities.
+ */
+
+#include <linux/kernel.h>
+#include <linux/kernel_api_spec.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/export.h>
+#include <linux/preempt.h>
+#include <linux/hardirq.h>
+#include <linux/file.h>
+#include <linux/fdtable.h>
+#include <linux/uaccess.h>
+#include <linux/limits.h>
+#include <linux/fcntl.h>
+#include <linux/mm.h>
+
+/* Section where API specifications are stored */
+extern struct kernel_api_spec __start_kapi_specs[];
+extern struct kernel_api_spec __stop_kapi_specs[];
+
+/* Dynamic API registration */
+static LIST_HEAD(dynamic_api_specs);
+static DEFINE_MUTEX(api_spec_mutex);
+
+struct dynamic_api_spec {
+ struct list_head list;
+ struct kernel_api_spec *spec;
+};
+
+/*
+ * __kapi_find_spec_locked - Internal lookup, caller must hold api_spec_mutex
+ */
+static const struct kernel_api_spec *__kapi_find_spec_locked(const char *name)
+{
+ struct kernel_api_spec *spec;
+ struct dynamic_api_spec *dyn_spec;
+
+ /* Search static specifications */
+ for (spec = __start_kapi_specs; spec < __stop_kapi_specs; spec++) {
+ if (strcmp(spec->name, name) == 0)
+ return spec;
+ }
+
+ /* Search dynamic specifications (mutex already held) */
+ list_for_each_entry(dyn_spec, &dynamic_api_specs, list) {
+ if (strcmp(dyn_spec->spec->name, name) == 0)
+ return dyn_spec->spec;
+ }
+
+ return NULL;
+}
+
+/**
+ * kapi_get_spec - Get API specification by name
+ * @name: Function name to look up
+ *
+ * Return: Pointer to API specification or NULL if not found
+ */
+const struct kernel_api_spec *kapi_get_spec(const char *name)
+{
+ const struct kernel_api_spec *spec;
+
+ mutex_lock(&api_spec_mutex);
+ spec = __kapi_find_spec_locked(name);
+ mutex_unlock(&api_spec_mutex);
+
+ return spec;
+}
+EXPORT_SYMBOL_GPL(kapi_get_spec);
+
+/**
+ * kapi_register_spec - Register a dynamic API specification
+ * @spec: API specification to register
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int kapi_register_spec(struct kernel_api_spec *spec)
+{
+ struct dynamic_api_spec *dyn_spec;
+ int ret = 0;
+
+ if (!spec || !spec->name[0])
+ return -EINVAL;
+
+ dyn_spec = kzalloc(sizeof(*dyn_spec), GFP_KERNEL);
+ if (!dyn_spec)
+ return -ENOMEM;
+
+ dyn_spec->spec = spec;
+
+ mutex_lock(&api_spec_mutex);
+
+ /* Check if already exists while holding lock to prevent races */
+ if (__kapi_find_spec_locked(spec->name)) {
+ ret = -EEXIST;
+ kfree(dyn_spec);
+ } else {
+ list_add_tail(&dyn_spec->list, &dynamic_api_specs);
+ }
+
+ mutex_unlock(&api_spec_mutex);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kapi_register_spec);
+
+/**
+ * kapi_unregister_spec - Unregister a dynamic API specification
+ * @name: Name of API to unregister
+ */
+void kapi_unregister_spec(const char *name)
+{
+ struct dynamic_api_spec *dyn_spec, *tmp;
+
+ mutex_lock(&api_spec_mutex);
+ list_for_each_entry_safe(dyn_spec, tmp, &dynamic_api_specs, list) {
+ if (strcmp(dyn_spec->spec->name, name) == 0) {
+ list_del(&dyn_spec->list);
+ kfree(dyn_spec);
+ break;
+ }
+ }
+ mutex_unlock(&api_spec_mutex);
+}
+EXPORT_SYMBOL_GPL(kapi_unregister_spec);
+
+/**
+ * param_type_to_string - Convert parameter type to string
+ * @type: Parameter type
+ *
+ * Return: String representation of type
+ */
+static const char *param_type_to_string(enum kapi_param_type type)
+{
+ static const char * const type_names[] = {
+ [KAPI_TYPE_VOID] = "void",
+ [KAPI_TYPE_INT] = "int",
+ [KAPI_TYPE_UINT] = "uint",
+ [KAPI_TYPE_PTR] = "pointer",
+ [KAPI_TYPE_STRUCT] = "struct",
+ [KAPI_TYPE_UNION] = "union",
+ [KAPI_TYPE_ENUM] = "enum",
+ [KAPI_TYPE_FUNC_PTR] = "function_pointer",
+ [KAPI_TYPE_ARRAY] = "array",
+ [KAPI_TYPE_FD] = "file_descriptor",
+ [KAPI_TYPE_USER_PTR] = "user_pointer",
+ [KAPI_TYPE_PATH] = "pathname",
+ [KAPI_TYPE_CUSTOM] = "custom",
+ };
+
+ if (type >= ARRAY_SIZE(type_names))
+ return "unknown";
+
+ return type_names[type];
+}
+
+/**
+ * lock_type_to_string - Convert lock type to string
+ * @type: Lock type
+ *
+ * Return: String representation of lock type
+ */
+static const char *lock_type_to_string(enum kapi_lock_type type)
+{
+ static const char * const lock_names[] = {
+ [KAPI_LOCK_NONE] = "none",
+ [KAPI_LOCK_MUTEX] = "mutex",
+ [KAPI_LOCK_SPINLOCK] = "spinlock",
+ [KAPI_LOCK_RWLOCK] = "rwlock",
+ [KAPI_LOCK_SEQLOCK] = "seqlock",
+ [KAPI_LOCK_RCU] = "rcu",
+ [KAPI_LOCK_SEMAPHORE] = "semaphore",
+ [KAPI_LOCK_CUSTOM] = "custom",
+ };
+
+ if (type >= ARRAY_SIZE(lock_names))
+ return "unknown";
+
+ return lock_names[type];
+}
+
+/**
+ * lock_scope_to_string - Convert lock scope to string
+ * @scope: Lock scope
+ *
+ * Return: String representation of lock scope
+ */
+static const char *lock_scope_to_string(enum kapi_lock_scope scope)
+{
+ static const char * const scope_names[] = {
+ [KAPI_LOCK_INTERNAL] = "internal",
+ [KAPI_LOCK_ACQUIRES] = "acquires",
+ [KAPI_LOCK_RELEASES] = "releases",
+ [KAPI_LOCK_CALLER_HELD] = "caller_held",
+ };
+
+ if (scope >= ARRAY_SIZE(scope_names))
+ return "unknown";
+
+ return scope_names[scope];
+}
+
+/**
+ * return_check_type_to_string - Convert return check type to string
+ * @type: Return check type
+ *
+ * Return: String representation of return check type
+ */
+static const char *return_check_type_to_string(enum kapi_return_check_type type)
+{
+ static const char * const check_names[] = {
+ [KAPI_RETURN_EXACT] = "exact",
+ [KAPI_RETURN_RANGE] = "range",
+ [KAPI_RETURN_ERROR_CHECK] = "error_check",
+ [KAPI_RETURN_FD] = "file_descriptor",
+ [KAPI_RETURN_CUSTOM] = "custom",
+ [KAPI_RETURN_NO_RETURN] = "no_return",
+ };
+
+ if (type >= ARRAY_SIZE(check_names))
+ return "unknown";
+
+ return check_names[type];
+}
+
+/**
+ * capability_action_to_string - Convert capability action to string
+ * @action: Capability action
+ *
+ * Return: String representation of capability action
+ */
+static const char *capability_action_to_string(enum kapi_capability_action action)
+{
+ static const char * const action_names[] = {
+ [KAPI_CAP_BYPASS_CHECK] = "bypass_check",
+ [KAPI_CAP_INCREASE_LIMIT] = "increase_limit",
+ [KAPI_CAP_OVERRIDE_RESTRICTION] = "override_restriction",
+ [KAPI_CAP_GRANT_PERMISSION] = "grant_permission",
+ [KAPI_CAP_MODIFY_BEHAVIOR] = "modify_behavior",
+ [KAPI_CAP_ACCESS_RESOURCE] = "access_resource",
+ [KAPI_CAP_PERFORM_OPERATION] = "perform_operation",
+ };
+
+ if (action >= ARRAY_SIZE(action_names))
+ return "unknown";
+
+ return action_names[action];
+}
+
+/**
+ * kapi_export_json - Export API specification to JSON format
+ * @spec: API specification to export
+ * @buf: Buffer to write JSON to
+ * @size: Size of buffer
+ *
+ * Return: Number of bytes written or negative error
+ */
+int kapi_export_json(const struct kernel_api_spec *spec, char *buf, size_t size)
+{
+ int ret = 0;
+ int i;
+
+ if (!spec || !buf || size == 0)
+ return -EINVAL;
+
+ ret = scnprintf(buf, size,
+ "{\n"
+ " \"name\": \"%s\",\n"
+ " \"version\": %u,\n"
+ " \"description\": \"%s\",\n"
+ " \"long_description\": \"%s\",\n"
+ " \"context_flags\": \"0x%x\",\n",
+ spec->name,
+ spec->version,
+ spec->description,
+ spec->long_description,
+ spec->context_flags);
+
+ /* Parameters */
+ ret += scnprintf(buf + ret, size - ret,
+ " \"parameters\": [\n");
+
+ for (i = 0; i < spec->param_count && i < KAPI_MAX_PARAMS; i++) {
+ const struct kapi_param_spec *param = &spec->params[i];
+
+ ret += scnprintf(buf + ret, size - ret,
+ " {\n"
+ " \"name\": \"%s\",\n"
+ " \"type\": \"%s\",\n"
+ " \"type_class\": \"%s\",\n"
+ " \"flags\": \"0x%x\",\n"
+ " \"description\": \"%s\"\n"
+ " }%s\n",
+ param->name,
+ param->type_name,
+ param_type_to_string(param->type),
+ param->flags,
+ param->description,
+ (i < spec->param_count - 1) ? "," : "");
+ }
+
+ ret += scnprintf(buf + ret, size - ret, " ],\n");
+
+ /* Return value */
+ ret += scnprintf(buf + ret, size - ret,
+ " \"return\": {\n"
+ " \"type\": \"%s\",\n"
+ " \"type_class\": \"%s\",\n"
+ " \"check_type\": \"%s\",\n",
+ spec->return_spec.type_name,
+ param_type_to_string(spec->return_spec.type),
+ return_check_type_to_string(spec->return_spec.check_type));
+
+ switch (spec->return_spec.check_type) {
+ case KAPI_RETURN_EXACT:
+ ret += scnprintf(buf + ret, size - ret,
+ " \"success_value\": %lld,\n",
+ spec->return_spec.success_value);
+ break;
+ case KAPI_RETURN_RANGE:
+ ret += scnprintf(buf + ret, size - ret,
+ " \"success_min\": %lld,\n"
+ " \"success_max\": %lld,\n",
+ spec->return_spec.success_min,
+ spec->return_spec.success_max);
+ break;
+ case KAPI_RETURN_ERROR_CHECK:
+ ret += scnprintf(buf + ret, size - ret,
+ " \"error_count\": %u,\n",
+ spec->return_spec.error_count);
+ break;
+ default:
+ break;
+ }
+
+ ret += scnprintf(buf + ret, size - ret,
+ " \"description\": \"%s\"\n"
+ " },\n",
+ spec->return_spec.description);
+
+ /* Errors */
+ ret += scnprintf(buf + ret, size - ret,
+ " \"errors\": [\n");
+
+ for (i = 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+ const struct kapi_error_spec *error = &spec->errors[i];
+
+ ret += scnprintf(buf + ret, size - ret,
+ " {\n"
+ " \"code\": %d,\n"
+ " \"name\": \"%s\",\n"
+ " \"condition\": \"%s\",\n"
+ " \"description\": \"%s\"\n"
+ " }%s\n",
+ error->error_code,
+ error->name,
+ error->condition,
+ error->description,
+ (i < spec->error_count - 1) ? "," : "");
+ }
+
+ ret += scnprintf(buf + ret, size - ret, " ],\n");
+
+ /* Locks */
+ ret += scnprintf(buf + ret, size - ret,
+ " \"locks\": [\n");
+
+ for (i = 0; i < spec->lock_count && i < KAPI_MAX_CONSTRAINTS; i++) {
+ const struct kapi_lock_spec *lock = &spec->locks[i];
+
+ ret += scnprintf(buf + ret, size - ret,
+ " {\n"
+ " \"name\": \"%s\",\n"
+ " \"type\": \"%s\",\n"
+ " \"scope\": \"%s\",\n"
+ " \"description\": \"%s\"\n"
+ " }%s\n",
+ lock->lock_name,
+ lock_type_to_string(lock->lock_type),
+ lock_scope_to_string(lock->scope),
+ lock->description,
+ (i < spec->lock_count - 1) ? "," : "");
+ }
+
+ ret += scnprintf(buf + ret, size - ret, " ],\n");
+
+ /* Capabilities */
+ ret += scnprintf(buf + ret, size - ret,
+ " \"capabilities\": [\n");
+
+ for (i = 0; i < spec->capability_count && i < KAPI_MAX_CAPABILITIES; i++) {
+ const struct kapi_capability_spec *cap = &spec->capabilities[i];
+
+ ret += scnprintf(buf + ret, size - ret,
+ " {\n"
+ " \"capability\": %d,\n"
+ " \"name\": \"%s\",\n"
+ " \"action\": \"%s\",\n"
+ " \"allows\": \"%s\",\n"
+ " \"without_cap\": \"%s\",\n"
+ " \"check_condition\": \"%s\",\n"
+ " \"priority\": %u",
+ cap->capability,
+ cap->cap_name,
+ capability_action_to_string(cap->action),
+ cap->allows,
+ cap->without_cap,
+ cap->check_condition,
+ cap->priority);
+
+ if (cap->alternative_count > 0) {
+ int j;
+ ret += scnprintf(buf + ret, size - ret,
+ ",\n \"alternatives\": [");
+ for (j = 0; j < cap->alternative_count; j++) {
+ ret += scnprintf(buf + ret, size - ret,
+ "%d%s", cap->alternative[j],
+ (j < cap->alternative_count - 1) ? ", " : "");
+ }
+ ret += scnprintf(buf + ret, size - ret, "]");
+ }
+
+ ret += scnprintf(buf + ret, size - ret,
+ "\n }%s\n",
+ (i < spec->capability_count - 1) ? "," : "");
+ }
+
+ ret += scnprintf(buf + ret, size - ret, " ],\n");
+
+ /* Additional info */
+ ret += scnprintf(buf + ret, size - ret,
+ " \"since_version\": \"%s\",\n"
+ " \"examples\": \"%s\",\n"
+ " \"notes\": \"%s\"\n"
+ "}\n",
+ spec->since_version,
+ spec->examples,
+ spec->notes);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kapi_export_json);
+
+
+/**
+ * kapi_print_spec - Print API specification to kernel log
+ * @spec: API specification to print
+ */
+void kapi_print_spec(const struct kernel_api_spec *spec)
+{
+ int i;
+
+ if (!spec)
+ return;
+
+ pr_info("=== Kernel API Specification ===\n");
+ pr_info("Name: %s\n", spec->name);
+ pr_info("Version: %u\n", spec->version);
+ pr_info("Description: %s\n", spec->description);
+
+ if (spec->long_description[0])
+ pr_info("Long Description: %s\n", spec->long_description);
+
+ pr_info("Context Flags: 0x%x\n", spec->context_flags);
+
+ /* Parameters */
+ if (spec->param_count > 0) {
+ pr_info("Parameters:\n");
+ for (i = 0; i < spec->param_count && i < KAPI_MAX_PARAMS; i++) {
+ const struct kapi_param_spec *param = &spec->params[i];
+ pr_info(" [%d] %s: %s (flags: 0x%x)\n",
+ i, param->name, param->type_name, param->flags);
+ if (param->description[0])
+ pr_info(" Description: %s\n", param->description);
+ }
+ }
+
+ /* Return value */
+ pr_info("Return: %s\n", spec->return_spec.type_name);
+ if (spec->return_spec.description[0])
+ pr_info(" Description: %s\n", spec->return_spec.description);
+
+ /* Errors */
+ if (spec->error_count > 0) {
+ pr_info("Possible Errors:\n");
+ for (i = 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+ const struct kapi_error_spec *error = &spec->errors[i];
+ pr_info(" %s (%d): %s\n",
+ error->name, error->error_code, error->condition);
+ }
+ }
+
+ /* Capabilities */
+ if (spec->capability_count > 0) {
+ pr_info("Capabilities:\n");
+ for (i = 0; i < spec->capability_count && i < KAPI_MAX_CAPABILITIES; i++) {
+ const struct kapi_capability_spec *cap = &spec->capabilities[i];
+ pr_info(" %s (%d):\n", cap->cap_name, cap->capability);
+ pr_info(" Action: %s\n", capability_action_to_string(cap->action));
+ pr_info(" Allows: %s\n", cap->allows);
+ pr_info(" Without: %s\n", cap->without_cap);
+ if (cap->check_condition[0])
+ pr_info(" Condition: %s\n", cap->check_condition);
+ }
+ }
+
+ pr_info("================================\n");
+}
+EXPORT_SYMBOL_GPL(kapi_print_spec);
+
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+
+/**
+ * kapi_validate_fd - Validate that a file descriptor is valid in current context
+ * @fd: File descriptor to validate
+ *
+ * Return: true if fd is valid in current process context, false otherwise
+ */
+static bool kapi_validate_fd(int fd)
+{
+ struct fd f;
+
+ /* Special case: AT_FDCWD is always valid */
+ if (fd == AT_FDCWD)
+ return true;
+
+ /* Check basic range */
+ if (fd < 0)
+ return false;
+
+ /* Check if fd is valid in current process context */
+ f = fdget(fd);
+ if (fd_empty(f)) {
+ return false;
+ }
+
+ /* fd is valid, release reference */
+ fdput(f);
+ return true;
+}
+
+/**
+ * kapi_validate_user_ptr - Validate that a user pointer is accessible
+ * @ptr: User pointer to validate
+ * @size: Size in bytes to validate
+ *
+ * Return: true if user memory is accessible, false otherwise
+ */
+static bool kapi_validate_user_ptr(const void __user *ptr, size_t size)
+{
+ /* NULL pointers are not valid; caller handles optional case */
+ if (!ptr)
+ return false;
+
+ return access_ok(ptr, size);
+}
+
+/**
+ * kapi_validate_user_ptr_with_params - Validate user pointer with dynamic size
+ * @param_spec: Parameter specification
+ * @ptr: User pointer to validate
+ * @all_params: Array of all parameter values
+ * @param_count: Number of parameters
+ *
+ * Return: true if user memory is accessible, false otherwise
+ */
+static bool kapi_validate_user_ptr_with_params(const struct kapi_param_spec *param_spec,
+ const void __user *ptr,
+ const s64 *all_params,
+ int param_count)
+{
+ size_t actual_size;
+
+ /* NULL is allowed for optional parameters */
+ if (!ptr && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+ return true;
+
+ /* Calculate actual size based on related parameter */
+ if (param_spec->size_param_idx >= 0 &&
+ param_spec->size_param_idx < param_count) {
+ s64 count = all_params[param_spec->size_param_idx];
+
+ /* Validate count is positive */
+ if (count <= 0) {
+ pr_warn("Parameter %s: size determinant is non-positive (%lld)\n",
+ param_spec->name, count);
+ return false;
+ }
+
+ /* Check for multiplication overflow */
+ if (param_spec->size_multiplier > 0 &&
+ count > SIZE_MAX / param_spec->size_multiplier) {
+ pr_warn("Parameter %s: size calculation overflow\n",
+ param_spec->name);
+ return false;
+ }
+
+ actual_size = count * param_spec->size_multiplier;
+ } else {
+ /* Use fixed size */
+ actual_size = param_spec->size;
+ }
+
+ return kapi_validate_user_ptr(ptr, actual_size);
+}
+
+/**
+ * kapi_validate_path - Validate that a pathname is accessible and within limits
+ * @path: User pointer to pathname
+ * @param_spec: Parameter specification
+ *
+ * Return: true if path is valid, false otherwise
+ */
+static bool kapi_validate_path(const char __user *path,
+ const struct kapi_param_spec *param_spec)
+{
+ size_t len;
+
+ /* NULL is allowed for optional parameters */
+ if (!path && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+ return true;
+
+ if (!path) {
+ pr_warn("Parameter %s: NULL path not allowed\n", param_spec->name);
+ return false;
+ }
+
+ /* Check if the path is accessible */
+ if (!access_ok(path, 1)) {
+ pr_warn("Parameter %s: path pointer %p not accessible\n",
+ param_spec->name, path);
+ return false;
+ }
+
+ /* Use strnlen_user to get the length and validate accessibility */
+ len = strnlen_user(path, PATH_MAX + 1);
+ if (len == 0) {
+ pr_warn("Parameter %s: invalid path pointer %p\n",
+ param_spec->name, path);
+ return false;
+ }
+
+ /* Check path length limit */
+ if (len > PATH_MAX) {
+ pr_warn("Parameter %s: path too long (exceeds PATH_MAX)\n",
+ param_spec->name);
+ return false;
+ }
+
+ return true;
+}
+
+/**
+ * kapi_validate_user_string - Validate a userspace null-terminated string
+ * @str: User pointer to string
+ * @param_spec: Parameter specification containing length constraints
+ *
+ * Validates that the userspace string pointer is accessible and that the
+ * string length (excluding null terminator) is within the range specified
+ * by min_value and max_value in the parameter specification.
+ *
+ * Return: true if string is valid, false otherwise
+ */
+static bool kapi_validate_user_string(const char __user *str,
+ const struct kapi_param_spec *param_spec)
+{
+ size_t len;
+ size_t max_check_len;
+
+ /* NULL is allowed for optional parameters */
+ if (!str && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+ return true;
+
+ if (!str) {
+ pr_warn("Parameter %s: NULL string not allowed\n", param_spec->name);
+ return false;
+ }
+
+ /* Check if the string pointer is accessible */
+ if (!access_ok(str, 1)) {
+ pr_warn("Parameter %s: string pointer %p not accessible\n",
+ param_spec->name, str);
+ return false;
+ }
+
+ /*
+ * Use strnlen_user to get the string length and validate accessibility.
+ * Check up to max_value + 1 to detect strings that are too long.
+ * If max_value is 0 or unset, use PATH_MAX as a reasonable default.
+ */
+ max_check_len = param_spec->max_value > 0 ?
+ (size_t)param_spec->max_value + 1 : PATH_MAX + 1;
+ len = strnlen_user(str, max_check_len);
+
+ if (len == 0) {
+ pr_warn("Parameter %s: invalid string pointer %p\n",
+ param_spec->name, str);
+ return false;
+ }
+
+ /*
+ * strnlen_user returns the length including the null terminator.
+ * Convert to string length (excluding terminator) for range check.
+ */
+ len--;
+
+ /* Check minimum length constraint */
+ if (param_spec->min_value > 0 && len < (size_t)param_spec->min_value) {
+ pr_warn("Parameter %s: string too short (%zu < %lld)\n",
+ param_spec->name, len, param_spec->min_value);
+ return false;
+ }
+
+ /* Check maximum length constraint */
+ if (param_spec->max_value > 0 && len > (size_t)param_spec->max_value) {
+ pr_warn("Parameter %s: string too long (%zu > %lld)\n",
+ param_spec->name, len, param_spec->max_value);
+ return false;
+ }
+
+ return true;
+}
+
+/**
+ * kapi_validate_user_ptr_constraint - Validate a userspace pointer with size
+ * @ptr: User pointer to validate
+ * @param_spec: Parameter specification containing size
+ *
+ * Validates that the userspace pointer is accessible and that the memory
+ * region of the specified size can be accessed. The size is taken from
+ * the param_spec->size field.
+ *
+ * Return: true if pointer is valid, false otherwise
+ */
+static bool kapi_validate_user_ptr_constraint(const void __user *ptr,
+ const struct kapi_param_spec *param_spec)
+{
+ /* NULL is allowed for optional parameters */
+ if (!ptr && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+ return true;
+
+ if (!ptr) {
+ pr_warn("Parameter %s: NULL pointer not allowed\n", param_spec->name);
+ return false;
+ }
+
+ /* Validate size is specified */
+ if (param_spec->size == 0) {
+ pr_warn("Parameter %s: size not specified for user pointer validation\n",
+ param_spec->name);
+ return false;
+ }
+
+ /* Check if the memory region is accessible */
+ if (!access_ok(ptr, param_spec->size)) {
+ pr_warn("Parameter %s: user pointer %p not accessible for %zu bytes\n",
+ param_spec->name, ptr, param_spec->size);
+ return false;
+ }
+
+ return true;
+}
+
+/**
+ * kapi_validate_param - Validate a parameter against its specification
+ * @param_spec: Parameter specification
+ * @value: Parameter value to validate
+ *
+ * Return: true if valid, false otherwise
+ */
+bool kapi_validate_param(const struct kapi_param_spec *param_spec, s64 value)
+{
+ int i;
+
+ /* Special handling for file descriptor type */
+ if (param_spec->type == KAPI_TYPE_FD) {
+ if (!kapi_validate_fd((int)value)) {
+ pr_warn("Parameter %s: invalid file descriptor %lld\n",
+ param_spec->name, value);
+ return false;
+ }
+ /* Continue with additional constraint checks if needed */
+ }
+
+ /* Special handling for user pointer type */
+ if (param_spec->type == KAPI_TYPE_USER_PTR) {
+ const void __user *ptr = (const void __user *)value;
+
+ /* NULL is allowed for optional parameters */
+ if (!ptr && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+ return true;
+
+ if (!kapi_validate_user_ptr(ptr, param_spec->size)) {
+ pr_warn("Parameter %s: invalid user pointer %p (size: %zu)\n",
+ param_spec->name, ptr, param_spec->size);
+ return false;
+ }
+ /* Continue with additional constraint checks if needed */
+ }
+
+ /* Special handling for path type */
+ if (param_spec->type == KAPI_TYPE_PATH) {
+ const char __user *path = (const char __user *)value;
+
+ if (!kapi_validate_path(path, param_spec)) {
+ return false;
+ }
+ /* Continue with additional constraint checks if needed */
+ }
+
+ switch (param_spec->constraint_type) {
+ case KAPI_CONSTRAINT_NONE:
+ return true;
+
+ case KAPI_CONSTRAINT_RANGE:
+ if (value < param_spec->min_value || value > param_spec->max_value) {
+ pr_warn("Parameter %s value %lld out of range [%lld, %lld]\n",
+ param_spec->name, value,
+ param_spec->min_value, param_spec->max_value);
+ return false;
+ }
+ return true;
+
+ case KAPI_CONSTRAINT_MASK:
+ if (value & ~param_spec->valid_mask) {
+ pr_warn("Parameter %s value 0x%llx contains invalid bits (valid mask: 0x%llx)\n",
+ param_spec->name, value, param_spec->valid_mask);
+ return false;
+ }
+ return true;
+
+ case KAPI_CONSTRAINT_ENUM:
+ if (!param_spec->enum_values || param_spec->enum_count == 0)
+ return true;
+
+ for (i = 0; i < param_spec->enum_count; i++) {
+ if (value == param_spec->enum_values[i])
+ return true;
+ }
+ pr_warn("Parameter %s value %lld not in valid enumeration\n",
+ param_spec->name, value);
+ return false;
+
+ case KAPI_CONSTRAINT_ALIGNMENT:
+ if (param_spec->alignment == 0) {
+ pr_warn("Parameter %s: alignment constraint specified but alignment is 0\n",
+ param_spec->name);
+ return false;
+ }
+ if (value & (param_spec->alignment - 1)) {
+ pr_warn("Parameter %s value 0x%llx not aligned to %zu boundary\n",
+ param_spec->name, value, param_spec->alignment);
+ return false;
+ }
+ return true;
+
+ case KAPI_CONSTRAINT_POWER_OF_TWO:
+ if (value == 0 || (value & (value - 1))) {
+ pr_warn("Parameter %s value %lld is not a power of two\n",
+ param_spec->name, value);
+ return false;
+ }
+ return true;
+
+ case KAPI_CONSTRAINT_PAGE_ALIGNED:
+ if (value & (PAGE_SIZE - 1)) {
+ pr_warn("Parameter %s value 0x%llx not page-aligned (PAGE_SIZE=%lu)\n",
+ param_spec->name, value, PAGE_SIZE);
+ return false;
+ }
+ return true;
+
+ case KAPI_CONSTRAINT_NONZERO:
+ if (value == 0) {
+ pr_warn("Parameter %s must be non-zero\n", param_spec->name);
+ return false;
+ }
+ return true;
+
+ case KAPI_CONSTRAINT_USER_STRING:
+ return kapi_validate_user_string((const char __user *)value, param_spec);
+
+ case KAPI_CONSTRAINT_USER_PATH:
+ return kapi_validate_path((const char __user *)value, param_spec);
+
+ case KAPI_CONSTRAINT_USER_PTR:
+ return kapi_validate_user_ptr_constraint((const void __user *)value, param_spec);
+
+ case KAPI_CONSTRAINT_CUSTOM:
+ if (param_spec->validate)
+ return param_spec->validate(value);
+ return true;
+
+ default:
+ return true;
+ }
+}
+EXPORT_SYMBOL_GPL(kapi_validate_param);
+
+/**
+ * kapi_validate_param_with_context - Validate parameter with access to all params
+ * @param_spec: Parameter specification
+ * @value: Parameter value to validate
+ * @all_params: Array of all parameter values
+ * @param_count: Number of parameters
+ *
+ * Return: true if valid, false otherwise
+ */
+bool kapi_validate_param_with_context(const struct kapi_param_spec *param_spec,
+ s64 value, const s64 *all_params, int param_count)
+{
+ /* Special handling for user pointer type with dynamic sizing */
+ if (param_spec->type == KAPI_TYPE_USER_PTR) {
+ const void __user *ptr = (const void __user *)value;
+
+ /* NULL is allowed for optional parameters */
+ if (!ptr && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+ return true;
+
+ if (!kapi_validate_user_ptr_with_params(param_spec, ptr, all_params, param_count)) {
+ pr_warn("Parameter %s: invalid user pointer %p\n",
+ param_spec->name, ptr);
+ return false;
+ }
+ /* Continue with additional constraint checks if needed */
+ }
+
+ /* Then run the regular type and constraint checks (applies to all types) */
+ return kapi_validate_param(param_spec, value);
+}
+EXPORT_SYMBOL_GPL(kapi_validate_param_with_context);
+
+/**
+ * kapi_validate_syscall_param - Validate syscall parameter with enforcement
+ * @spec: API specification
+ * @param_idx: Parameter index
+ * @value: Parameter value
+ *
+ * Return: -EINVAL if the value is invalid and @spec names a syscall ("sys_" prefix), 0 otherwise
+ */
+int kapi_validate_syscall_param(const struct kernel_api_spec *spec,
+ int param_idx, s64 value)
+{
+ const struct kapi_param_spec *param_spec;
+
+ if (!spec || param_idx >= spec->param_count)
+ return 0;
+
+ param_spec = &spec->params[param_idx];
+
+ if (!kapi_validate_param(param_spec, value)) {
+ if (strncmp(spec->name, "sys_", 4) == 0) {
+ /* For syscalls, we can return EINVAL to userspace */
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_syscall_param);
+
+/**
+ * kapi_validate_syscall_params - Validate all syscall parameters together
+ * @spec: API specification
+ * @params: Array of parameter values
+ * @param_count: Number of parameters
+ *
+ * Return: -EINVAL on a parameter count mismatch or an invalid syscall parameter, 0 otherwise
+ */
+int kapi_validate_syscall_params(const struct kernel_api_spec *spec,
+ const s64 *params, int param_count)
+{
+ int i;
+
+ if (!spec || !params)
+ return 0;
+
+ /* Validate that we have the expected number of parameters */
+ if (param_count != spec->param_count) {
+ pr_warn("API %s: parameter count mismatch (expected %u, got %d)\n",
+ spec->name, spec->param_count, param_count);
+ return -EINVAL;
+ }
+
+ /* Validate each parameter with context */
+ for (i = 0; i < spec->param_count && i < KAPI_MAX_PARAMS; i++) {
+ const struct kapi_param_spec *param_spec = &spec->params[i];
+
+ if (!kapi_validate_param_with_context(param_spec, params[i], params, param_count)) {
+ if (strncmp(spec->name, "sys_", 4) == 0) {
+ /* For syscalls, we can return EINVAL to userspace */
+ return -EINVAL;
+ }
+ }
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_syscall_params);
+
+/**
+ * kapi_check_return_success - Check if return value indicates success
+ * @return_spec: Return specification
+ * @retval: Return value to check
+ *
+ * Return: true if the return value indicates success according to the spec.
+ */
+bool kapi_check_return_success(const struct kapi_return_spec *return_spec, s64 retval)
+{
+ u32 i;
+
+ if (!return_spec)
+ return true; /* No spec means we can't validate */
+
+ switch (return_spec->check_type) {
+ case KAPI_RETURN_EXACT:
+ return retval == return_spec->success_value;
+
+ case KAPI_RETURN_RANGE:
+ return retval >= return_spec->success_min &&
+ retval <= return_spec->success_max;
+
+ case KAPI_RETURN_ERROR_CHECK:
+ /* Success if NOT in error list */
+ if (return_spec->error_values) {
+ for (i = 0; i < return_spec->error_count; i++) {
+ if (retval == return_spec->error_values[i])
+ return false; /* Found in error list */
+ }
+ }
+ return true; /* Not in error list = success */
+
+ case KAPI_RETURN_FD:
+ /* File descriptors: >= 0 is success, < 0 is error */
+ return retval >= 0;
+
+ case KAPI_RETURN_CUSTOM:
+ if (return_spec->is_success)
+ return return_spec->is_success(retval);
+ fallthrough;
+
+ default:
+ return true; /* Unknown check type, assume success */
+ }
+}
+EXPORT_SYMBOL_GPL(kapi_check_return_success);
+
+/**
+ * kapi_validate_return_value - Validate that return value matches spec
+ * @spec: API specification
+ * @retval: Return value to validate
+ *
+ * Return: true if return value is valid according to spec, false otherwise.
+ *
+ * This function checks:
+ * 1. If the value indicates success, it must match the success criteria
+ * 2. If the value indicates error, it must be one of the specified error codes
+ */
+bool kapi_validate_return_value(const struct kernel_api_spec *spec, s64 retval)
+{
+ int i;
+ bool is_success;
+
+ if (!spec)
+ return true; /* No spec means we can't validate */
+
+ /* First check if this is a success return */
+ is_success = kapi_check_return_success(&spec->return_spec, retval);
+
+ if (is_success) {
+ /* Special validation for file descriptor returns */
+ if (spec->return_spec.check_type == KAPI_RETURN_FD) {
+ /* For successful FD returns, validate it's a valid FD */
+ if (!kapi_validate_fd((int)retval)) {
+ pr_warn("API %s returned invalid file descriptor %lld\n",
+ spec->name, retval);
+ return false;
+ }
+ }
+ return true;
+ }
+
+ /* Error case - check if it's one of the specified errors */
+ if (spec->error_count == 0) {
+ /* No errors specified, so any error is potentially valid */
+ pr_debug("API %s returned unspecified error %lld\n",
+ spec->name, retval);
+ return true;
+ }
+
+ /* Check if the error is in our list of specified errors */
+ for (i = 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+ if (retval == spec->errors[i].error_code)
+ return true;
+ }
+
+ /* Error not in spec */
+ pr_warn("API %s returned unspecified error code %lld. Valid errors are:\n",
+ spec->name, retval);
+ for (i = 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+ pr_warn(" %s (%d): %s\n",
+ spec->errors[i].name,
+ spec->errors[i].error_code,
+ spec->errors[i].condition);
+ }
+
+ return false;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_return_value);
+
+/**
+ * kapi_validate_syscall_return - Validate syscall return value with enforcement
+ * @spec: API specification
+ * @retval: Return value
+ *
+ * Return: always 0; a mismatch is reported via WARN_ONCE() but not enforced
+ *
+ * For syscalls, this can help detect kernel bugs where unspecified error
+ * codes are returned to userspace.
+ */
+int kapi_validate_syscall_return(const struct kernel_api_spec *spec, s64 retval)
+{
+ if (!spec)
+ return 0;
+
+ if (!kapi_validate_return_value(spec, retval)) {
+ /* Log the violation but don't change the return value */
+ WARN_ONCE(1, "Syscall %s returned unspecified value %lld\n",
+ spec->name, retval);
+ /* Could return -EINVAL here to enforce, but that might break userspace */
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_syscall_return);
+
+/**
+ * kapi_check_context - Check if current context matches API requirements
+ * @spec: API specification to check against
+ */
+void kapi_check_context(const struct kernel_api_spec *spec)
+{
+ u32 ctx = spec ? spec->context_flags : 0;
+ bool valid = false;
+
+ if (!ctx)
+ return;
+
+ /* Check if we're in an allowed context */
+ if ((ctx & KAPI_CTX_PROCESS) && !in_interrupt())
+ valid = true;
+
+ if ((ctx & KAPI_CTX_SOFTIRQ) && in_softirq())
+ valid = true;
+
+ if ((ctx & KAPI_CTX_HARDIRQ) && in_hardirq())
+ valid = true;
+
+ if ((ctx & KAPI_CTX_NMI) && in_nmi())
+ valid = true;
+
+ if (!valid) {
+ WARN_ONCE(1, "API %s called from invalid context\n", spec->name);
+ }
+
+ /* Check specific requirements */
+ if ((ctx & KAPI_CTX_ATOMIC) && preemptible()) {
+ WARN_ONCE(1, "API %s requires atomic context\n", spec->name);
+ }
+
+ if ((ctx & KAPI_CTX_SLEEPABLE) && !preemptible()) {
+ WARN_ONCE(1, "API %s requires sleepable context\n", spec->name);
+ }
+}
+EXPORT_SYMBOL_GPL(kapi_check_context);
+
+#endif /* CONFIG_KAPI_RUNTIME_CHECKS */
diff --git a/scripts/generate_api_specs.sh b/scripts/generate_api_specs.sh
new file mode 100755
index 0000000000000..2c3078a508fef
--- /dev/null
+++ b/scripts/generate_api_specs.sh
@@ -0,0 +1,18 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Stub script for generating API specifications collector
+# This is a placeholder until the full implementation is available
+#
+
+cat << 'EOF'
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Auto-generated API specifications collector (stub)
+ * Generated by scripts/generate_api_specs.sh
+ */
+
+#include <linux/kernel_api_spec.h>
+
+/* No API specifications collected yet */
+EOF
--
2.51.0
* [RFC PATCH v5 02/15] kernel/api: enable kerneldoc-based API specifications
2025-12-18 20:42 [RFC PATCH v5 00/15] Kernel API Specification Framework Sasha Levin
2025-12-18 20:42 ` [RFC PATCH v5 01/15] kernel/api: introduce kernel API specification framework Sasha Levin
@ 2025-12-18 20:42 ` Sasha Levin
2025-12-18 20:42 ` [RFC PATCH v5 03/15] kernel/api: add debugfs interface for kernel " Sasha Levin
` (12 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
Add support for extracting API specifications from kernel-doc comments
and generating the corresponding C macro invocations for the kernel API
specification framework.
Changes include:
- New kdoc_apispec.py module for generating API spec macros
- Updates to kernel-doc.py to support -apispec output format
- Build system integration in Makefile.build
- Generator script for collecting all API specifications
- Support for API-specific sections in kernel-doc comments
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
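Note: a minimal standalone sketch of the item/subfield parsing that
kdoc_apispec.py applies to multi-line sections, mirroring
_parse_indented_items() and _parse_subfields() from this patch. The
free-function names and the sample section text are illustrative
assumptions; the sample is not taken from a real kernel-doc comment.

```python
def parse_subfields(lines, start_idx):
    """Collect indented 'key: value' lines that follow lines[start_idx]."""
    subfields = {}
    i = start_idx + 1
    while i < len(lines) and lines[i].startswith((' ', '\t')):
        line = lines[i].strip()
        if ':' in line:
            key, value = line.split(':', 1)
            subfields[key.strip()] = value.strip()
        i += 1
    return subfields, i


def parse_items(section_content):
    """Return one (name, subfields) pair per unindented line."""
    items = []
    lines = section_content.strip().split('\n')
    i = 0
    while i < len(lines):
        # Blank lines and stray indented lines do not start a new item.
        if not lines[i].strip() or lines[i].startswith((' ', '\t')):
            i += 1
            continue
        subfields, next_i = parse_subfields(lines, i)
        items.append((lines[i].strip(), subfields))
        i = next_i
    return items


# Hypothetical section body (assumed layout: item name, then indented fields).
SECTION = """SIGINT
  direction: KAPI_SIGNAL_RECEIVE
  condition: sent while blocked
SIGTERM
  action: KAPI_SIGNAL_ACTION_RETURN
  interruptible: yes
"""

items = parse_items(SECTION)
for name, fields in items:
    print(name, fields)
```

Only the first ':' on a line separates key from value, so a value may
itself contain colons. Indented lines without any ':' are skipped.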
scripts/Makefile.build | 28 +
scripts/Makefile.clean | 3 +
scripts/generate_api_specs.sh | 83 ++-
scripts/kernel-doc.py | 5 +
tools/lib/python/kdoc/kdoc_apispec.py | 755 ++++++++++++++++++++++++++
tools/lib/python/kdoc/kdoc_output.py | 9 +-
tools/lib/python/kdoc/kdoc_parser.py | 86 ++-
7 files changed, 957 insertions(+), 12 deletions(-)
create mode 100644 tools/lib/python/kdoc/kdoc_apispec.py
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 52c08c4eb0b9a..7a192d29a01f6 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -172,6 +172,34 @@ ifneq ($(KBUILD_EXTRA_WARN),)
$<
endif
+# Generate API spec headers from kernel-doc comments
+ifeq ($(CONFIG_KAPI_SPEC),y)
+# Function to check if a file has API specifications
+has-apispec = $(shell grep -qE '^\s*\*\s*(long-desc|context-flags|state-trans):' $(src)/$(1) 2>/dev/null && echo $(1))
+
+# Get base names without directory prefix
+c-objs-base := $(notdir $(real-obj-y) $(real-obj-m))
+# Filter to only .o files with corresponding .c source files
+c-files := $(foreach o,$(c-objs-base),$(if $(wildcard $(src)/$(o:.o=.c)),$(o:.o=.c)))
+# Also check for .c files that contain API specs but are only #included by other sources
+extra-c-files := $(shell find $(src) -maxdepth 1 -name "*.c" -exec grep -l '^\s*\*\s*\(long-desc\|context-flags\|state-trans\):' {} \; 2>/dev/null | xargs -r basename -a)
+# Combine both lists and remove duplicates
+all-c-files := $(sort $(c-files) $(extra-c-files))
+# Only include files that actually have API specifications
+apispec-files := $(foreach f,$(all-c-files),$(call has-apispec,$(f)))
+# Generate apispec targets with proper directory prefix
+apispec-y := $(addprefix $(obj)/,$(apispec-files:.c=.apispec.h))
+always-y += $(apispec-y)
+targets += $(apispec-y)
+
+quiet_cmd_apispec = APISPEC $@
+ cmd_apispec = PYTHONDONTWRITEBYTECODE=1 $(KERNELDOC) -apispec \
+ $(KDOCFLAGS) $< > $@ || rm -f $@
+
+$(obj)/%.apispec.h: $(src)/%.c FORCE
+ $(call if_changed,apispec)
+endif
+
# Compile C sources (.c)
# ---------------------------------------------------------------------------
diff --git a/scripts/Makefile.clean b/scripts/Makefile.clean
index 6ead00ec7313b..f78dbbe637f27 100644
--- a/scripts/Makefile.clean
+++ b/scripts/Makefile.clean
@@ -35,6 +35,9 @@ __clean-files := $(filter-out $(no-clean-files), $(__clean-files))
__clean-files := $(wildcard $(addprefix $(obj)/, $(__clean-files)))
+# Also clean generated apispec headers (computed dynamically in Makefile.build)
+__clean-files += $(wildcard $(obj)/*.apispec.h)
+
# ==========================================================================
# To make this rule robust against "Argument list too long" error,
diff --git a/scripts/generate_api_specs.sh b/scripts/generate_api_specs.sh
index 2c3078a508fef..3ac6be9b4fe98 100755
--- a/scripts/generate_api_specs.sh
+++ b/scripts/generate_api_specs.sh
@@ -1,18 +1,87 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
#
-# Stub script for generating API specifications collector
-# This is a placeholder until the full implementation is available
+# generate_api_specs.sh - Generate C file that includes all API specification headers
#
+# Usage: generate_api_specs.sh <srctree> <objtree>
-cat << 'EOF'
-// SPDX-License-Identifier: GPL-2.0
+SRCTREE="$1"
+OBJTREE="$2"
+
+if [ -z "$SRCTREE" ] || [ -z "$OBJTREE" ]; then
+ echo "Usage: $0 <srctree> <objtree>" >&2
+ exit 1
+fi
+
+# Generate header
+cat <<EOF
+/* SPDX-License-Identifier: GPL-2.0 */
/*
- * Auto-generated API specifications collector (stub)
- * Generated by scripts/generate_api_specs.sh
+ * Auto-generated file - DO NOT EDIT
+ * Generated by: scripts/generate_api_specs.sh
+ *
+ * This file includes all kernel API specification headers
*/
+#include <linux/kernel.h>
#include <linux/kernel_api_spec.h>
+#include <linux/errno.h>
+#include <linux/capability.h>
+#include <linux/fcntl.h>
+#include <uapi/linux/aio_abi.h>
+#include <uapi/linux/sched/types.h>
+#include <uapi/linux/xattr.h>
+#include <linux/eventfd.h>
+#include <linux/eventpoll.h>
+#include <linux/wait.h>
+#include <linux/inotify.h>
+#include <linux/splice.h>
+#include <linux/timerfd.h>
+#include <linux/kexec.h>
+#include <uapi/linux/mount.h>
+#include <uapi/linux/signalfd.h>
+#include <linux/fs.h>
+#include <linux/signal.h>
+#include <linux/ptrace.h>
+#include <linux/quota.h>
+#include <linux/uio.h>
+#include <linux/time.h>
+
+#ifdef CONFIG_KAPI_SPEC
+
+EOF
-/* No API specifications collected yet */
+# Find all .apispec.h files and generate includes
+# Look in both source tree and object tree
+(find "$SRCTREE" -name "*.apispec.h" -type f 2>/dev/null; \
+ find "$OBJTREE" -name "*.apispec.h" -type f 2>/dev/null) | \
+ grep -v "/generated_api_specs.c" | \
+ sort -u | \
+ while read -r apispec_file; do
+ # Get relative path from srctree or objtree
+ case "$apispec_file" in
+ "$SRCTREE"*)
+ rel_path="${apispec_file#"$SRCTREE"/}"
+ ;;
+ *)
+ rel_path="${apispec_file#"$OBJTREE"/}"
+ ;;
+ esac
+
+ # Skip if file is empty
+ if [ ! -s "$apispec_file" ]; then
+ continue
+ fi
+
+ # Generate include statement with relative path from kernel/api/
+ # The generated file is always at kernel/api/generated_api_specs.c,
+ # so we need to go up two directories to reach the root
+ echo "#include \"../../${rel_path}\""
+ done
+
+# Close the ifdef
+cat <<EOF
+
+#endif /* CONFIG_KAPI_SPEC */
EOF
+
diff --git a/scripts/kernel-doc.py b/scripts/kernel-doc.py
index 7a1eaf986bcd4..8404a9e9e3fed 100755
--- a/scripts/kernel-doc.py
+++ b/scripts/kernel-doc.py
@@ -231,6 +231,8 @@ def main():
help="Output reStructuredText format (default).")
out_fmt.add_argument("-N", "-none", "--none", action="store_true",
help="Do not output documentation, only warnings.")
+ out_fmt.add_argument("-apispec", "--apispec", action="store_true",
+ help="Output C macro invocations for kernel API specifications.")
# Output selection mutually-exclusive group
@@ -294,11 +296,14 @@ def main():
# Import kernel-doc libraries only after checking Python version
from kdoc.kdoc_files import KernelFiles # pylint: disable=C0415
from kdoc.kdoc_output import RestFormat, ManFormat # pylint: disable=C0415
+ from kdoc.kdoc_apispec import ApiSpecFormat # pylint: disable=C0415
if args.man:
out_style = ManFormat(modulename=args.modulename)
elif args.none:
out_style = None
+ elif args.apispec:
+ out_style = ApiSpecFormat()
else:
out_style = RestFormat()
diff --git a/tools/lib/python/kdoc/kdoc_apispec.py b/tools/lib/python/kdoc/kdoc_apispec.py
new file mode 100644
index 0000000000000..ae03af6d10685
--- /dev/null
+++ b/tools/lib/python/kdoc/kdoc_apispec.py
@@ -0,0 +1,755 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""
+Generate C macro invocations for kernel API specifications from kernel-doc comments.
+
+This module creates C header files with API specification macros that match
+the kernel API specification framework introduced in commit 9688de5c25bed.
+"""
+
+from kdoc.kdoc_output import OutputFormat
+import re
+
+
+# Maximum string lengths (from kernel_api_spec.h)
+KAPI_MAX_DESC_LEN = 512
+KAPI_MAX_NAME_LEN = 128
+KAPI_MAX_SIGNAL_NAME_LEN = 32
+
+# Valid KAPI effect types
+VALID_EFFECT_TYPES = {
+ 'KAPI_EFFECT_NONE', 'KAPI_EFFECT_MODIFY_STATE', 'KAPI_EFFECT_PROCESS_STATE',
+ 'KAPI_EFFECT_IRREVERSIBLE', 'KAPI_EFFECT_SCHEDULE', 'KAPI_EFFECT_FILESYSTEM',
+ 'KAPI_EFFECT_HARDWARE', 'KAPI_EFFECT_ALLOC_MEMORY', 'KAPI_EFFECT_FREE_MEMORY',
+ 'KAPI_EFFECT_SIGNAL_SEND', 'KAPI_EFFECT_FILE_POSITION', 'KAPI_EFFECT_LOCK_ACQUIRE',
+ 'KAPI_EFFECT_LOCK_RELEASE', 'KAPI_EFFECT_RESOURCE_CREATE', 'KAPI_EFFECT_RESOURCE_DESTROY',
+ 'KAPI_EFFECT_NETWORK'
+}
+
+
+class ApiSpecFormat(OutputFormat):
+ """Generate C macro invocations for kernel API specifications"""
+
+ def __init__(self):
+ super().__init__()
+ self.header_written = False
+
+ def msg(self, fname, name, args):
+ """Handles a single entry from kernel-doc parser"""
+ if not self.header_written:
+ self.data = self._generate_header()
+ self.header_written = True
+ else:
+ self.data = ""
+
+ result = super().msg(fname, name, args)
+ return result if result else self.data
+
+ def _generate_header(self):
+ """Generate the file header"""
+ return (
+ "/* SPDX-License-Identifier: GPL-2.0 */\n"
+ "/* Auto-generated from kerneldoc annotations - DO NOT EDIT */\n\n"
+ "#include <linux/kernel_api_spec.h>\n"
+ "#include <linux/errno.h>\n\n"
+ )
+
+ def _format_macro_param(self, value, max_len=KAPI_MAX_DESC_LEN):
+ """Format a value for use in C macro parameter, truncating if needed"""
+ if value is None:
+ return '""'
+ value = str(value).replace('\n', ' ')
+ # Truncate before escaping so a cut never lands inside an escape sequence
+ if len(value) > max_len - 10:
+ value = value[:max_len - 13] + '...'
+ value = value.replace('\\', '\\\\').replace('"', '\\"')
+ return f'"{value}"'
+
+ def _get_section(self, sections, key):
+ """Get first line from sections, checking with and without @ prefix"""
+ for prefix in ['', '@']:
+ full_key = prefix + key
+ if full_key in sections:
+ content = sections[full_key].strip()
+ # Return only first line to avoid mixing sections
+ return content.split('\n')[0].strip() if content else ''
+ return None
+
+ def _get_raw_section(self, sections, key):
+ """Get full section content, checking with and without @ prefix"""
+ for prefix in ['', '@']:
+ full_key = prefix + key
+ if full_key in sections:
+ return sections[full_key]
+ return ''
+
+ def _parse_indented_items(self, section_content, item_parser):
+ """Generic parser for indented items.
+
+ Args:
+ section_content: Raw section content
+ item_parser: Function that takes (lines, start_index) and returns (item, next_index)
+
+ Returns:
+ List of parsed items
+ """
+ if not section_content:
+ return []
+
+ items = []
+ lines = section_content.strip().split('\n')
+ i = 0
+
+ while i < len(lines):
+ if not lines[i].strip():
+ i += 1
+ continue
+
+ # Check if this is a main item (not indented)
+ if not lines[i].startswith((' ', '\t')):
+ item, i = item_parser(lines, i)
+ if item:
+ items.append(item)
+ else:
+ i += 1
+
+ return items
+
+ def _parse_subfields(self, lines, start_idx):
+ """Parse indented subfields starting from start_idx+1.
+
+ Returns: (dict of subfields, next index)
+ """
+ subfields = {}
+ i = start_idx + 1
+
+ while i < len(lines) and (lines[i].startswith((' ', '\t'))):
+ line = lines[i].strip()
+ if ':' in line:
+ key, value = line.split(':', 1)
+ subfields[key.strip()] = value.strip()
+ i += 1
+
+ return subfields, i
+
+ def _parse_signal_item(self, lines, i):
+ """Parse a single signal specification"""
+ signal = {'name': lines[i].strip()}
+ subfields, next_i = self._parse_subfields(lines, i)
+
+ # Map subfields to signal attributes
+ signal.update({
+ 'direction': subfields.get('direction', 'KAPI_SIGNAL_RECEIVE'),
+ 'action': subfields.get('action', 'KAPI_SIGNAL_ACTION_RETURN'),
+ 'condition': subfields.get('condition'),
+ 'desc': subfields.get('desc'),
+ 'error': subfields.get('error'),
+ 'timing': subfields.get('timing'),
+ 'priority': subfields.get('priority'),
+ 'interruptible': subfields.get('interruptible', '').lower() == 'yes',
+ 'number': subfields.get('number', '0'),
+ })
+
+ return signal, next_i
+
+ def _parse_error_item(self, lines, i):
+ """Parse a single error specification"""
+ line = lines[i].strip()
+
+ # Skip desc: lines
+ if line.startswith('desc:'):
+ return None, i + 1
+
+ # Check for error pattern
+ if not re.match(r'^[A-Z][A-Z0-9_]+,', line):
+ return None, i + 1
+
+ error = {'line': line, 'desc': ''}
+
+ # Look for desc: continuation
+ i += 1
+ desc_lines = []
+ while i < len(lines):
+ next_line = lines[i].strip()
+ if next_line.startswith('desc:'):
+ desc_lines.append(next_line[5:].strip())
+ i += 1
+ elif not next_line or re.match(r'^[A-Z][A-Z0-9_]+,', next_line):
+ break
+ else:
+ desc_lines.append(next_line)
+ i += 1
+
+ if desc_lines:
+ error['desc'] = ' '.join(desc_lines)
+
+ return error, i
+
+ def _parse_lock_item(self, lines, i):
+ """Parse a single lock specification"""
+ line = lines[i].strip()
+ if ':' not in line:
+ return None, i + 1
+
+ parts = line.split(':', 1)[1].strip().split(',', 1)
+ if len(parts) < 2:
+ return None, i + 1
+
+ lock = {
+ 'name': parts[0].strip(),
+ 'type': parts[1].strip()
+ }
+
+ subfields, next_i = self._parse_subfields(lines, i)
+
+ # Map boolean fields
+ for field in ['acquired', 'released', 'held-on-entry', 'held-on-exit']:
+ if subfields.get(field, '').lower() == 'true':
+ lock[field] = True
+
+ lock['desc'] = subfields.get('desc', '')
+
+ return lock, next_i
+
+ def _parse_constraint_item(self, lines, i):
+ """Parse a single constraint specification"""
+ line = lines[i].strip()
+
+ # Check for old format with comma
+ if ',' in line:
+ parts = line.split(',', 1)
+ constraint = {
+ 'name': parts[0].strip(),
+ 'desc': parts[1].strip() if len(parts) > 1 else '',
+ 'expr': None
+ }
+ else:
+ constraint = {'name': line, 'desc': '', 'expr': None}
+
+ subfields, next_i = self._parse_subfields(lines, i)
+
+ if 'desc' in subfields:
+ constraint['desc'] = (constraint['desc'] + ' ' + subfields['desc']).strip()
+ constraint['expr'] = subfields.get('expr')
+
+ return constraint, next_i
+
+ def _parse_side_effect_item(self, lines, i):
+ """Parse a single side effect specification"""
+ line = lines[i].strip()
+
+ # Default to new format
+ effect = {
+ 'type': line,
+ 'target': '',
+ 'desc': '',
+ 'condition': None,
+ 'reversible': False
+ }
+
+ # Check for old format with commas
+ if ',' in line:
+ # Handle condition and reversible flags
+ cond_match = re.search(r',\s*condition=([^,]+?)(?:\s*,\s*reversible=(yes|no)\s*)?$', line)
+ if cond_match:
+ effect['condition'] = cond_match.group(1).strip()
+ effect['reversible'] = cond_match.group(2) == 'yes'
+ line = line[:cond_match.start()]
+ elif ', reversible=yes' in line:
+ effect['reversible'] = True
+ line = line.replace(', reversible=yes', '')
+ elif ', reversible=no' in line:
+ line = line.replace(', reversible=no', '')
+
+ parts = line.split(',', 2)
+ if len(parts) >= 1:
+ effect['type'] = parts[0].strip()
+ if len(parts) >= 2:
+ effect['target'] = parts[1].strip()
+ if len(parts) >= 3:
+ effect['desc'] = parts[2].strip()
+ else:
+ # Multi-line format with subfields
+ subfields, next_i = self._parse_subfields(lines, i)
+ effect.update({
+ 'target': subfields.get('target', ''),
+ 'desc': subfields.get('desc', ''),
+ 'condition': subfields.get('condition'),
+ 'reversible': subfields.get('reversible', '').lower() == 'yes'
+ })
+ return effect, next_i
+
+ return effect, i + 1
+
+ def _parse_state_trans_item(self, lines, i):
+ """Parse a single state transition specification"""
+ line = lines[i].strip()
+
+ trans = {
+ 'target': line,
+ 'from': '',
+ 'to': '',
+ 'condition': '',
+ 'desc': ''
+ }
+
+ # Check for old format with commas
+ if ',' in line:
+ parts = line.split(',', 3)
+ if len(parts) >= 1:
+ trans['target'] = parts[0].strip()
+ if len(parts) >= 2:
+ trans['from'] = parts[1].strip()
+ if len(parts) >= 3:
+ trans['to'] = parts[2].strip()
+ if len(parts) >= 4:
+ desc_part = parts[3].strip()
+ desc_parts = desc_part.split(',', 1)
+ if len(desc_parts) > 1:
+ trans['condition'] = desc_parts[0].strip()
+ trans['desc'] = desc_parts[1].strip()
+ else:
+ trans['desc'] = desc_part
+ return trans, i + 1
+ else:
+ # Multi-line format with subfields
+ subfields, next_i = self._parse_subfields(lines, i)
+ trans.update({
+ 'from': subfields.get('from', ''),
+ 'to': subfields.get('to', ''),
+ 'condition': subfields.get('condition', ''),
+ 'desc': subfields.get('desc', '')
+ })
+ return trans, next_i
+
+ def _process_parameters(self, sections, parameterlist, parameterdescs, parametertypes):
+ """Process and output parameter specifications"""
+ param_count = len(parameterlist)
+ if param_count > 0:
+ self.data += f"\n\tKAPI_PARAM_COUNT({param_count})\n"
+
+ for param_idx, param in enumerate(parameterlist):
+ param_name = param.strip()
+ param_desc = parameterdescs.get(param_name, '')
+ param_ctype = parametertypes.get(param_name, '')
+
+ # Parse parameter specifications
+ param_section = self._get_raw_section(sections, 'param')
+ param_specs = {}
+ if param_section:
+ param_specs = self._parse_param_spec(param_section, param_name)
+
+ self.data += f"\n\tKAPI_PARAM({param_idx}, {self._format_macro_param(param_name)}, "
+ self.data += f"{self._format_macro_param(param_ctype)}, {self._format_macro_param(param_desc)})\n"
+
+ # Add parameter attributes
+ for key, macro in [
+ ('param-type', 'KAPI_PARAM_TYPE'),
+ ('param-flags', 'KAPI_PARAM_FLAGS'),
+ ('param-size', 'KAPI_PARAM_SIZE'),
+ ('param-alignment', 'KAPI_PARAM_ALIGNMENT'),
+ ]:
+ if key in param_specs:
+ self.data += f"\t\t{macro}({param_specs[key]})\n"
+
+ # Handle constraint type
+ if 'param-constraint-type' in param_specs:
+ ctype = param_specs['param-constraint-type']
+ if ctype == 'KAPI_CONSTRAINT_BITMASK':
+ ctype = 'KAPI_CONSTRAINT_MASK'
+ self.data += f"\t\tKAPI_PARAM_CONSTRAINT_TYPE({ctype})\n"
+
+ # Handle range
+ if 'param-range' in param_specs and ',' in param_specs['param-range']:
+ min_val, max_val = param_specs['param-range'].split(',', 1)
+ self.data += f"\t\tKAPI_PARAM_RANGE({min_val.strip()}, {max_val.strip()})\n"
+
+ # Handle mask
+ if 'param-mask' in param_specs:
+ self.data += f"\t\tKAPI_PARAM_VALID_MASK({param_specs['param-mask']})\n"
+
+ # Handle enum values
+ if 'param-enum-values' in param_specs:
+ self.data += f"\t\tKAPI_PARAM_ENUM_VALUES({param_specs['param-enum-values']})\n"
+
+ # Handle constraint description
+ if 'param-constraint' in param_specs:
+ self.data += f"\t\tKAPI_PARAM_CONSTRAINT({self._format_macro_param(param_specs['param-constraint'])})\n"
+
+ self.data += "\tKAPI_PARAM_END\n"
+
+ def _parse_param_spec(self, section_content, param_name):
+ """Parse parameter specifications from indented format"""
+ specs = {}
+ lines = section_content.strip().split('\n')
+ current_item = None
+
+ # Map to expected keys
+ field_map = {
+ 'flags': 'param-flags',
+ 'size': 'param-size',
+ 'constraint-type': 'param-constraint-type',
+ 'constraint': 'param-constraint',
+ 'range': 'param-range',
+ 'mask': 'param-mask',
+ 'valid-mask': 'param-mask',
+ 'valid-values': 'param-enum-values',
+ 'alignment': 'param-alignment',
+ 'struct-type': 'param-struct-type',
+ }
+
+ i = 0
+ while i < len(lines):
+ line = lines[i]
+ if not line.strip():
+ i += 1
+ continue
+
+ # Check if this is our parameter (non-indented line)
+ if not line.startswith((' ', '\t')):
+ parts = line.strip().split(',', 1)
+ current_item = param_name if parts[0].strip() == param_name else None
+ if current_item and len(parts) > 1:
+ specs['param-type'] = parts[1].strip()
+ i += 1
+ elif current_item == param_name:
+ # Parse subfield
+ stripped = line.strip()
+ if ':' in stripped:
+ key, value = stripped.split(':', 1)
+ key = key.strip()
+ value = value.strip()
+
+ # Collect continuation lines (indented lines without a colon that
+ # defines a new key, i.e., lines that are pure continuations)
+ i += 1
+ while i < len(lines):
+ next_line = lines[i]
+ # Stop if we hit a non-indented line (new param)
+ if next_line.strip() and not next_line.startswith((' ', '\t')):
+ break
+ next_stripped = next_line.strip()
+ # Stop if we hit a new key (contains colon with known key prefix)
+ if next_stripped and ':' in next_stripped:
+ potential_key = next_stripped.split(':', 1)[0].strip()
+ if potential_key in field_map or potential_key in ['type', 'desc']:
+ break
+ # This is a continuation line
+ if next_stripped:
+ value = value + ' ' + next_stripped
+ i += 1
+
+ if key in field_map:
+ # Clean up the value - remove excessive whitespace
+ value = ' '.join(value.split())
+ specs[field_map[key]] = value
+ else:
+ i += 1
+
+ return specs
+
+ def _validate_effect_type(self, effect_type):
+ """Validate and normalize effect type"""
+ if 'KAPI_EFFECT_SCHEDULER' in effect_type:
+ return effect_type.replace('KAPI_EFFECT_SCHEDULER', 'KAPI_EFFECT_SCHEDULE')
+
+ if 'KAPI_EFFECT_' in effect_type and effect_type not in VALID_EFFECT_TYPES:
+ if '|' in effect_type:
+ parts = [p.strip() for p in effect_type.split('|')]
+ valid_parts = [p if p in VALID_EFFECT_TYPES else 'KAPI_EFFECT_MODIFY_STATE' for p in parts]
+ return ' | '.join(valid_parts)
+ return 'KAPI_EFFECT_MODIFY_STATE'
+
+ return effect_type
+
+ def _has_api_spec(self, sections):
+ """Check if this function has an API specification.
+
+ Returns True if at least 2 KAPI-specific section indicators are present.
+ We require 2+ indicators (not just 1) to avoid false positives from
+ regular kernel-doc comments that happen to use a common section name
+ like 'return' or 'error'. Having multiple KAPI sections strongly
+ suggests intentional API specification rather than coincidence.
+ """
+ indicators = [
+ 'api-type', 'context-flags', 'param-type', 'error-code',
+ 'capability', 'signal', 'lock', 'state-trans', 'constraint',
+ 'return', 'error', 'side-effects', 'struct'
+ ]
+
+ count = sum(1 for ind in indicators
+ if any(key.lower().startswith(ind.lower()) or
+ key.lower().startswith('@' + ind.lower())
+ for key in sections.keys()))
+
+ # Require 2+ indicators to distinguish from regular kernel-doc
+ return count >= 2
+
+ def out_function(self, fname, name, args):
+ """Generate API spec for a function"""
+ function_name = args.get('function', name)
+ sections = args.sections if hasattr(args, 'sections') else args.get('sections', {})
+
+ if not self._has_api_spec(sections):
+ return
+
+ parameterlist = args.parameterlist if hasattr(args, 'parameterlist') else args.get('parameterlist', [])
+ parameterdescs = args.parameterdescs if hasattr(args, 'parameterdescs') else args.get('parameterdescs', {})
+ parametertypes = args.parametertypes if hasattr(args, 'parametertypes') else args.get('parametertypes', {})
+ purpose = args.get('purpose', '')
+
+ # Start macro invocation
+ self.data += f"DEFINE_KERNEL_API_SPEC({function_name})\n"
+
+ # Basic info
+ if purpose:
+ self.data += f"\tKAPI_DESCRIPTION({self._format_macro_param(purpose)})\n"
+
+ long_desc = self._get_section(sections, 'long-desc')
+ if long_desc:
+ self.data += f"\tKAPI_LONG_DESC({self._format_macro_param(long_desc)})\n"
+
+ # Context flags
+ context = self._get_section(sections, 'context-flags') or self._get_section(sections, 'context')
+ if context:
+ self.data += f"\tKAPI_CONTEXT({context})\n"
+
+ # Process parameters
+ self._process_parameters(sections, parameterlist, parameterdescs, parametertypes)
+
+ # Process errors
+ errors = self._parse_indented_items(
+ self._get_raw_section(sections, 'error'),
+ self._parse_error_item
+ )
+
+ if errors:
+ self.data += f"\n\tKAPI_RETURN_ERROR_COUNT({len(errors)})\n"
+ self.data += f"\n\tKAPI_ERROR_COUNT({len(errors)})\n"
+
+ for idx, error in enumerate(errors):
+ self._output_error(idx, error)
+
+ # Process signals
+ signals = self._parse_indented_items(
+ self._get_raw_section(sections, 'signal'),
+ self._parse_signal_item
+ )
+
+ if signals:
+ self.data += f"\n\tKAPI_SIGNAL_COUNT({len(signals)})\n"
+
+ for idx, signal in enumerate(signals):
+ self._output_signal(idx, signal)
+
+ # Process other specifications
+ self._process_locks(sections)
+ self._process_constraints(sections)
+ self._process_side_effects(sections)
+ self._process_state_transitions(sections)
+ self._process_capabilities(sections)
+
+ # Add examples and notes
+ for key, macro in [('examples', 'KAPI_EXAMPLES'), ('notes', 'KAPI_NOTES')]:
+ value = self._get_section(sections, key)
+ if value:
+ self.data += f"\n\t{macro}({self._format_macro_param(value)})\n"
+
+ self.data += "\nKAPI_END_SPEC;\n\n"
+
+ def _output_error(self, idx, error):
+ """Output a single error specification"""
+ line = error['line']
+ if line.startswith('-'):
+ line = line[1:].strip()
+
+ parts = line.split(',', 2)
+ if len(parts) == 2:
+ # Format: NAME, description
+ name = parts[0].strip()
+ short_desc = parts[1].strip()
+ code = f"-{name}"
+ elif len(parts) >= 3:
+ # Format: code, name, description
+ code = parts[0].strip()
+ name = parts[1].strip()
+ short_desc = parts[2].strip()
+ if not code.startswith('-'):
+ code = f"-{code}"
+ else:
+ return
+
+ long_desc = error.get('desc', '') or short_desc
+
+ self.data += f"\n\tKAPI_ERROR({idx}, {code}, {self._format_macro_param(name)}, "
+ self.data += f"{self._format_macro_param(short_desc)},\n\t\t {self._format_macro_param(long_desc)})\n"
+
+ def _output_signal(self, idx, signal):
+ """Output a single signal specification"""
+ self.data += f"\n\tKAPI_SIGNAL({idx}, {signal['number']}, "
+ self.data += f"{self._format_macro_param(signal['name'], KAPI_MAX_SIGNAL_NAME_LEN)}, "
+ self.data += f"{signal['direction']}, {signal['action']})\n"
+
+ for key, macro in [
+ ('condition', 'KAPI_SIGNAL_CONDITION'),
+ ('desc', 'KAPI_SIGNAL_DESC'),
+ ('error', 'KAPI_SIGNAL_ERROR'),
+ ('timing', 'KAPI_SIGNAL_TIMING'),
+ ('priority', 'KAPI_SIGNAL_PRIORITY'),
+ ]:
+ if signal.get(key):
+ # Priority field is numeric
+ if key == 'priority':
+ self.data += f"\t\t{macro}({signal[key]})\n"
+ else:
+ self.data += f"\t\t{macro}({self._format_macro_param(signal[key])})\n"
+
+ if signal.get('interruptible'):
+ self.data += "\t\tKAPI_SIGNAL_INTERRUPTIBLE\n"
+
+ self.data += "\tKAPI_SIGNAL_END\n"
+
+ def _process_locks(self, sections):
+ """Process lock specifications"""
+ locks = self._parse_indented_items(
+ self._get_raw_section(sections, 'lock'),
+ self._parse_lock_item
+ )
+
+ if locks:
+ self.data += f"\n\tKAPI_LOCK_COUNT({len(locks)})\n"
+
+ for idx, lock in enumerate(locks):
+ self.data += f"\n\tKAPI_LOCK({idx}, {self._format_macro_param(lock['name'])}, {lock['type']})\n"
+
+ for flag in ['acquired', 'released']:
+ if lock.get(flag):
+ self.data += f"\t\tKAPI_LOCK_{flag.upper()}\n"
+
+ if lock.get('desc'):
+ self.data += f"\t\tKAPI_LOCK_DESC({self._format_macro_param(lock['desc'])})\n"
+
+ self.data += "\tKAPI_LOCK_END\n"
+
+ def _process_constraints(self, sections):
+ """Process constraint specifications"""
+ constraints = self._parse_indented_items(
+ self._get_raw_section(sections, 'constraint'),
+ self._parse_constraint_item
+ )
+
+ if constraints:
+ self.data += f"\n\tKAPI_CONSTRAINT_COUNT({len(constraints)})\n"
+
+ for idx, constraint in enumerate(constraints):
+ self.data += f"\n\tKAPI_CONSTRAINT({idx}, {self._format_macro_param(constraint['name'])},\n"
+ self.data += f"\t\t\t{self._format_macro_param(constraint['desc'])})\n"
+
+ if constraint.get('expr'):
+ self.data += f"\t\tKAPI_CONSTRAINT_EXPR({self._format_macro_param(constraint['expr'])})\n"
+
+ self.data += "\tKAPI_CONSTRAINT_END\n"
+
+ def _process_side_effects(self, sections):
+ """Process side effect specifications"""
+ effects = self._parse_indented_items(
+ self._get_raw_section(sections, 'side-effect'),
+ self._parse_side_effect_item
+ )
+
+ if effects:
+ self.data += f"\n\tKAPI_SIDE_EFFECT_COUNT({len(effects)})\n"
+
+ for idx, effect in enumerate(effects):
+ effect_type = self._validate_effect_type(effect['type'])
+
+ self.data += f"\n\tKAPI_SIDE_EFFECT({idx}, {effect_type},\n"
+ self.data += f"\t\t\t {self._format_macro_param(effect['target'])},\n"
+ self.data += f"\t\t\t {self._format_macro_param(effect['desc'])})\n"
+
+ if effect.get('condition'):
+ self.data += f"\t\tKAPI_EFFECT_CONDITION({self._format_macro_param(effect['condition'])})\n"
+
+ if effect.get('reversible'):
+ self.data += "\t\tKAPI_EFFECT_REVERSIBLE\n"
+
+ self.data += "\tKAPI_SIDE_EFFECT_END\n"
+
+ def _process_state_transitions(self, sections):
+ """Process state transition specifications"""
+ transitions = self._parse_indented_items(
+ self._get_raw_section(sections, 'state-trans'),
+ self._parse_state_trans_item
+ )
+
+ if transitions:
+ self.data += f"\n\tKAPI_STATE_TRANS_COUNT({len(transitions)})\n"
+
+ for idx, trans in enumerate(transitions):
+ desc = trans['desc']
+ if trans.get('condition'):
+ desc = trans['condition'] + (', ' + desc if desc else '')
+
+ self.data += f"\n\tKAPI_STATE_TRANS({idx}, {self._format_macro_param(trans['target'])}, "
+ self.data += f"{self._format_macro_param(trans['from'])}, {self._format_macro_param(trans['to'])},\n"
+ self.data += f"\t\t\t {self._format_macro_param(desc)})\n"
+ self.data += "\tKAPI_STATE_TRANS_END\n"
+
+ def _process_capabilities(self, sections):
+ """Process capability specifications"""
+ cap_section = self._get_raw_section(sections, 'capability')
+ if not cap_section:
+ return
+
+ lines = cap_section.strip().split('\n')
+ capabilities = []
+ i = 0
+
+ while i < len(lines):
+ line = lines[i].strip()
+ if not line or line.startswith(('allows:', 'without:', 'condition:', 'priority:')):
+ i += 1
+ continue
+
+ cap_info = {'line': line}
+
+ # Parse subfields
+ subfields, next_i = self._parse_subfields(lines, i)
+ cap_info.update(subfields)
+ capabilities.append(cap_info)
+ i = next_i
+
+ if capabilities:
+ self.data += f"\n\tKAPI_CAPABILITY_COUNT({len(capabilities)})\n"
+
+ for idx, cap in enumerate(capabilities):
+ parts = cap['line'].split(',', 2)
+ if len(parts) >= 2:
+ cap_name = parts[0].strip()
+ cap_type = parts[1].strip()
+ cap_desc = parts[2].strip() if len(parts) > 2 else cap_name
+
+ # Fix common type issues
+ if 'BYPASS' in cap_type and cap_type != 'KAPI_CAP_BYPASS_CHECK':
+ cap_type = 'KAPI_CAP_BYPASS_CHECK'
+
+ self.data += f"\n\tKAPI_CAPABILITY({idx}, {cap_name}, {self._format_macro_param(cap_desc)}, {cap_type})\n"
+
+ for key, macro in [
+ ('allows', 'KAPI_CAP_ALLOWS'),
+ ('without', 'KAPI_CAP_WITHOUT'),
+ ('condition', 'KAPI_CAP_CONDITION'),
+ ('priority', 'KAPI_CAP_PRIORITY'),
+ ]:
+ if cap.get(key):
+ value = self._format_macro_param(cap[key]) if key != 'priority' else cap[key]
+ self.data += f"\t\t{macro}({value})\n"
+
+ self.data += "\tKAPI_CAPABILITY_END\n"
+
+ # Skip output methods for non-function types
+ def out_enum(self, fname, name, args): pass
+ def out_typedef(self, fname, name, args): pass
+ def out_struct(self, fname, name, args): pass
+ def out_doc(self, fname, name, args): pass
diff --git a/tools/lib/python/kdoc/kdoc_output.py b/tools/lib/python/kdoc/kdoc_output.py
index b1aaa7fc36041..cc5752cd76a8d 100644
--- a/tools/lib/python/kdoc/kdoc_output.py
+++ b/tools/lib/python/kdoc/kdoc_output.py
@@ -124,8 +124,13 @@ class OutputFormat:
Output warnings for identifiers that will be displayed.
"""
- for log_msg in args.warnings:
- self.config.warning(log_msg)
+ warnings = getattr(args, 'warnings', [])
+
+ for log_msg in warnings:
+ # Skip numeric warnings (line numbers) which are false positives
+ # from parameter-specific sections like "param-constraint: name, value"
+ if not isinstance(log_msg, int):
+ self.config.warning(log_msg)
def check_doc(self, name, args):
"""Check if DOC should be output"""
diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index 500aafc500322..ecd218e762a34 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -31,6 +31,23 @@ from kdoc.kdoc_item import KdocItem
# Allow whitespace at end of comment start.
doc_start = KernRe(r'^/\*\*\s*$', cache=False)
+# Sections that are allowed to be duplicated for API specifications
+# These represent lists of items (multiple errors, signals, etc.)
+ALLOWED_DUPLICATE_SECTIONS = {
+ 'param', '@param',
+ 'error', '@error',
+ 'signal', '@signal',
+ 'lock', '@lock',
+ 'side-effect', '@side-effect',
+ 'state-trans', '@state-trans',
+ 'capability', '@capability',
+ 'constraint', '@constraint',
+ 'validation-group', '@validation-group',
+ 'validation-rule', '@validation-rule',
+ 'validation-flag', '@validation-flag',
+ 'struct-field', '@struct-field',
+}
+
doc_end = KernRe(r'\*/', cache=False)
doc_com = KernRe(r'\s*\*\s*', cache=False)
doc_com_body = KernRe(r'\s*\* ?', cache=False)
@@ -43,10 +60,71 @@ doc_decl = doc_com + KernRe(r'(\w+)', cache=False)
# @{section-name}:
# while trying to not match literal block starts like "example::"
#
+# Base kernel-doc section names
known_section_names = 'description|context|returns?|notes?|examples?'
-known_sections = KernRe(known_section_names, flags = re.I)
+
+# API specification section names (for KAPI spec framework)
+# Format: (name, has_count, variants)
+# Sections with has_count=True also accept a 'name-count' form, so doc_sect
+# needs a negative lookahead to avoid matching 'error' when 'error-count'
+# is intended
+_kapi_base_sections = [
+ ('api-type', False, []),
+ ('api-version', False, []),
+ ('param', True, []), # has param-count
+ ('struct', True, ['struct-type', 'struct-field', 'struct-field-[a-z\\-]+']),
+ ('validation-group', False, []),
+ ('validation-policy', False, []),
+ ('validation-flag', False, []),
+ ('validation-rule', False, []),
+ ('error', True, ['error-code', 'error-condition']),
+ ('capability', True, []),
+ ('signal', True, []),
+ ('lock', True, []),
+ ('since', False, ['since-version']),
+ ('context-flags', False, []),
+ ('return', True, ['return-type', 'return-check', 'return-check-type',
+ 'return-success', 'return-desc']),
+ ('long-desc', False, []),
+ ('constraint', True, []),
+ ('side-effect', True, []),
+ ('state-trans', True, []),
+]
+
+def _build_kapi_patterns():
+ """Build KAPI section patterns from the base definitions."""
+ validation_parts = [] # For known_sections (simple validation)
+ parsing_parts = [] # For doc_sect (with negative lookaheads)
+
+ for name, has_count, variants in _kapi_base_sections:
+ # Add base name (with optional @ prefix)
+ validation_parts.append(f'@?{name}')
+ if has_count:
+ # Need negative lookahead to not match 'name-count' or 'name-*'
+ parsing_parts.append(f'@?{name}(?!-)')
+ validation_parts.append(f'@?{name}-count')
+ parsing_parts.append(f'@?{name}-count')
+ else:
+ parsing_parts.append(f'@?{name}')
+
+ # Add variants
+ for variant in variants:
+ validation_parts.append(f'@?{variant}')
+ parsing_parts.append(f'@?{variant}')
+
+ # Add catch-all for kapi-* extensions
+ validation_parts.append(r'@?kapi-.*')
+ parsing_parts.append(r'@?kapi-.*')
+
+ return '|'.join(validation_parts), '|'.join(parsing_parts)
+
+_kapi_validation_pattern, _kapi_parsing_pattern = _build_kapi_patterns()
+
+known_sections = KernRe(known_section_names + '|' + _kapi_validation_pattern,
+ flags=re.I)
doc_sect = doc_com + \
- KernRe(r'\s*(@[.\w]+|@\.\.\.|' + known_section_names + r')\s*:([^:].*)?$',
+ KernRe(r'\s*(@[.\w\-]+|@\.\.\.|' + known_section_names + '|' +
+ _kapi_parsing_pattern + r')\s*:([^:].*)?$',
flags=re.I, cache=False)
doc_content = doc_com_body + KernRe(r'(.*)', cache=False)
@@ -342,7 +420,9 @@ class KernelEntry:
else:
if name in self.sections and self.sections[name] != "":
# Only warn on user-specified duplicate section names
- if name != SECTION_DEFAULT:
+ # Skip warning for sections that are expected to have duplicates
+ # (like error, param, signal, etc. for API specifications)
+ if name != SECTION_DEFAULT and name not in ALLOWED_DUPLICATE_SECTIONS:
self.emit_msg(self.new_start_line,
f"duplicate section name '{name}'")
# Treat as a new paragraph - add a blank line
--
2.51.0
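
As an illustration of the negative lookahead that _build_kapi_patterns()
adds for sections with a '-count' form, here is a minimal standalone
sketch. It is not the parser code: BASE_SECTIONS is a trimmed copy of the
table, and build_pattern(), leading_section(), GUARDED and UNGUARDED are
hypothetical names used only to compare the alternation with and without
the '(?!-)' guard.

```python
import re

# Trimmed copy of the table from the patch: (name, has_count, variants).
BASE_SECTIONS = [
    ("api-type", False, []),
    ("error", True, ["error-code", "error-condition"]),
    ("lock", True, []),
]


def build_pattern(sections, use_lookahead=True):
    """Join section names into one alternation, as the patch does for doc_sect."""
    parts = []
    for name, has_count, variants in sections:
        if has_count:
            # The guard stops the bare name from matching when a '-' follows.
            guard = "(?!-)" if use_lookahead else ""
            parts.append(f"@?{name}{guard}")
            parts.append(f"@?{name}-count")
        else:
            parts.append(f"@?{name}")
        parts.extend(f"@?{variant}" for variant in variants)
    return "|".join(parts)


def leading_section(pattern, text):
    """Return the section name matched at the start of text, or None."""
    match = re.match(pattern, text, flags=re.I)
    return match.group(0) if match else None


GUARDED = build_pattern(BASE_SECTIONS)
UNGUARDED = build_pattern(BASE_SECTIONS, use_lookahead=False)

print(leading_section(GUARDED, "error-count: 3"))    # error-count
print(leading_section(UNGUARDED, "error-count: 3"))  # error
print(leading_section(GUARDED, "lock-order: a"))     # None
```

In doc_sect itself the alternation is followed by '\s*:', which also
rejects a bare 'error' sitting in front of '-count', so the sketch
isolates the effect of the lookahead rather than reproducing the full
section matcher.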
* [RFC PATCH v5 03/15] kernel/api: add debugfs interface for kernel API specifications
2025-12-18 20:42 [RFC PATCH v5 00/15] Kernel API Specification Framework Sasha Levin
2025-12-18 20:42 ` [RFC PATCH v5 01/15] kernel/api: introduce kernel API specification framework Sasha Levin
2025-12-18 20:42 ` [RFC PATCH v5 02/15] kernel/api: enable kerneldoc-based API specifications Sasha Levin
@ 2025-12-18 20:42 ` Sasha Levin
2025-12-18 20:42 ` [RFC PATCH v5 04/15] tools/kapi: Add kernel API specification extraction tool Sasha Levin
` (11 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
Add a debugfs interface to expose kernel API specifications at runtime.
This allows tools and users to query the complete API specifications
through the debugfs filesystem.
The interface provides:
- /sys/kernel/debug/kapi/list - lists all available API specifications
- /sys/kernel/debug/kapi/specs/<name> - detailed info for each API
Each specification file includes:
- Function name, version, and descriptions
- Execution context requirements and flags
- Parameter details with types, flags, and constraints
- Return value specifications and success conditions
- Error codes with descriptions and conditions
- Locking requirements and constraints
- Signal handling specifications
- Examples, notes, and deprecation status
This enables runtime introspection of kernel APIs for documentation
tools, static analyzers, and debugging purposes.
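
A tool consuming this interface might look roughly like the sketch below.
It assumes debugfs is mounted at /sys/kernel/debug and relies only on the
top-level "Key: value" lines that kapi_spec_show() prints in this patch
(Name, Version, Description, Context flags). The helper names
parse_spec_header() and read_spec() are hypothetical, as is the spec name
used in the sample; the layout of the 'list' file is not assumed here.

```python
from pathlib import Path

# Assumed location; debugfs is conventionally mounted at /sys/kernel/debug.
KAPI_ROOT = Path("/sys/kernel/debug/kapi")


def parse_spec_header(text):
    """Collect the top-level 'Key: value' lines of one spec file."""
    fields = {}
    for line in text.splitlines():
        # Skip blank lines, indented (nested) details, and title lines.
        if not line or line[0].isspace() or ":" not in line:
            continue
        key, value = line.split(":", 1)
        value = value.strip()
        # Headings such as 'Parameters (2):' carry no value on the same line.
        if value:
            fields[key.strip()] = value
    return fields


def read_spec(name, root=KAPI_ROOT):
    """Read specs/<name>; needs CONFIG_KAPI_SPEC_DEBUGFS=y and debugfs access."""
    return parse_spec_header((root / "specs" / name).read_text())


# Hypothetical sample shaped like the output of kapi_spec_show().
SAMPLE = (
    "Kernel API Specification\n"
    "========================\n"
    "\n"
    "Name: sys_open\n"
    "Version: 1\n"
    "Description: open: a file\n"
    "Context flags: PROCESS SLEEPABLE \n"
    "\n"
    "Parameters (1):\n"
    "  [0] pathname:\n"
    "    type: user_ptr (const char *)\n"
)

print(parse_spec_header(SAMPLE))
```

Only the first colon on a line separates key from value, so descriptions
that contain colons are kept intact.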
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/api/Kconfig | 20 +++
kernel/api/Makefile | 3 +
kernel/api/kapi_debugfs.c | 358 ++++++++++++++++++++++++++++++++++++++
3 files changed, 381 insertions(+)
create mode 100644 kernel/api/kapi_debugfs.c
diff --git a/kernel/api/Kconfig b/kernel/api/Kconfig
index fde25ec70e134..d2754b21acc43 100644
--- a/kernel/api/Kconfig
+++ b/kernel/api/Kconfig
@@ -33,3 +33,23 @@ config KAPI_RUNTIME_CHECKS
development. The checks use WARN_ONCE to report violations.
If unsure, say N.
+
+config KAPI_SPEC_DEBUGFS
+ bool "Export kernel API specifications via debugfs"
+ depends on KAPI_SPEC
+ depends on DEBUG_FS
+ help
+ This option enables exporting kernel API specifications through
+ the debugfs filesystem. When enabled, specifications can be
+ accessed at /sys/kernel/debug/kapi/.
+
+ The debugfs interface provides:
+ - A list of all available API specifications
+ - Detailed information for each API including parameters,
+ return values, errors, locking requirements, and constraints
+ - Complete machine-readable representation of the specs
+
+ This is useful for documentation tools, static analyzers, and
+ runtime introspection of kernel APIs.
+
+ If unsure, say N.
diff --git a/kernel/api/Makefile b/kernel/api/Makefile
index acab17c78afa3..716f128eea71d 100644
--- a/kernel/api/Makefile
+++ b/kernel/api/Makefile
@@ -10,6 +10,9 @@ obj-$(CONFIG_KAPI_SPEC) += kernel_api_spec.o
ifeq ($(CONFIG_KAPI_SPEC),y)
obj-$(CONFIG_KAPI_SPEC) += generated_api_specs.o
+# Debugfs interface for kernel API specs
+obj-$(CONFIG_KAPI_SPEC_DEBUGFS) += kapi_debugfs.o
+
# Find all potential apispec files (this is evaluated at make time)
apispec-files := $(shell find $(objtree) -name "*.apispec.h" -type f 2>/dev/null)
diff --git a/kernel/api/kapi_debugfs.c b/kernel/api/kapi_debugfs.c
new file mode 100644
index 0000000000000..84d5446d93916
--- /dev/null
+++ b/kernel/api/kapi_debugfs.c
@@ -0,0 +1,358 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Kernel API specification debugfs interface
+ *
+ * This provides a debugfs interface to expose kernel API specifications
+ * at runtime, allowing tools and users to query the complete API specs.
+ */
+
+#include <linux/debugfs.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/seq_file.h>
+#include <linux/kernel_api_spec.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+/* External symbols for kernel API spec section */
+extern struct kernel_api_spec __start_kapi_specs[];
+extern struct kernel_api_spec __stop_kapi_specs[];
+
+static struct dentry *kapi_debugfs_root;
+
+/* Helper function to print parameter type as string */
+static const char *param_type_str(enum kapi_param_type type)
+{
+ switch (type) {
+ case KAPI_TYPE_INT: return "int";
+ case KAPI_TYPE_UINT: return "uint";
+ case KAPI_TYPE_PTR: return "ptr";
+ case KAPI_TYPE_STRUCT: return "struct";
+ case KAPI_TYPE_UNION: return "union";
+ case KAPI_TYPE_ARRAY: return "array";
+ case KAPI_TYPE_FD: return "fd";
+ case KAPI_TYPE_ENUM: return "enum";
+ case KAPI_TYPE_USER_PTR: return "user_ptr";
+ case KAPI_TYPE_PATH: return "path";
+ case KAPI_TYPE_FUNC_PTR: return "func_ptr";
+ case KAPI_TYPE_CUSTOM: return "custom";
+ default: return "unknown";
+ }
+}
+
+/* Helper to print parameter flags */
+static void print_param_flags(struct seq_file *m, u32 flags)
+{
+ seq_puts(m, " flags: ");
+ if (flags & KAPI_PARAM_IN) seq_puts(m, "IN ");
+ if (flags & KAPI_PARAM_OUT) seq_puts(m, "OUT ");
+ if (flags & KAPI_PARAM_INOUT) seq_puts(m, "INOUT ");
+ if (flags & KAPI_PARAM_OPTIONAL) seq_puts(m, "OPTIONAL ");
+ if (flags & KAPI_PARAM_CONST) seq_puts(m, "CONST ");
+ if (flags & KAPI_PARAM_USER) seq_puts(m, "USER ");
+ if (flags & KAPI_PARAM_VOLATILE) seq_puts(m, "VOLATILE ");
+ if (flags & KAPI_PARAM_DMA) seq_puts(m, "DMA ");
+ if (flags & KAPI_PARAM_ALIGNED) seq_puts(m, "ALIGNED ");
+ seq_putc(m, '\n');
+}
+
+/* Helper to print context flags */
+static void print_context_flags(struct seq_file *m, u32 flags)
+{
+ seq_puts(m, "Context flags: ");
+ if (flags & KAPI_CTX_PROCESS) seq_puts(m, "PROCESS ");
+ if (flags & KAPI_CTX_HARDIRQ) seq_puts(m, "HARDIRQ ");
+ if (flags & KAPI_CTX_SOFTIRQ) seq_puts(m, "SOFTIRQ ");
+ if (flags & KAPI_CTX_NMI) seq_puts(m, "NMI ");
+ if (flags & KAPI_CTX_SLEEPABLE) seq_puts(m, "SLEEPABLE ");
+ if (flags & KAPI_CTX_ATOMIC) seq_puts(m, "ATOMIC ");
+ if (flags & KAPI_CTX_PREEMPT_DISABLED) seq_puts(m, "PREEMPT_DISABLED ");
+ if (flags & KAPI_CTX_IRQ_DISABLED) seq_puts(m, "IRQ_DISABLED ");
+ seq_putc(m, '\n');
+}
+
+/* Show function for individual API spec */
+static int kapi_spec_show(struct seq_file *m, void *v)
+{
+ struct kernel_api_spec *spec = m->private;
+ int i;
+
+ seq_printf(m, "Kernel API Specification\n");
+ seq_printf(m, "========================\n\n");
+
+ /* Basic info */
+ seq_printf(m, "Name: %s\n", spec->name);
+ seq_printf(m, "Version: %u\n", spec->version);
+ seq_printf(m, "Description: %s\n", spec->description);
+ if (strlen(spec->long_description) > 0)
+ seq_printf(m, "Long description: %s\n", spec->long_description);
+
+ /* Context */
+ print_context_flags(m, spec->context_flags);
+ seq_printf(m, "\n");
+
+ /* Parameters */
+ if (spec->param_count > 0) {
+ seq_printf(m, "Parameters (%u):\n", spec->param_count);
+ for (i = 0; i < spec->param_count; i++) {
+ struct kapi_param_spec *param = &spec->params[i];
+ seq_printf(m, " [%d] %s:\n", i, param->name);
+ seq_printf(m, " type: %s (%s)\n",
+ param_type_str(param->type), param->type_name);
+ print_param_flags(m, param->flags);
+ if (strlen(param->description) > 0)
+ seq_printf(m, " description: %s\n", param->description);
+ if (param->size > 0)
+ seq_printf(m, " size: %zu\n", param->size);
+ if (param->alignment > 0)
+ seq_printf(m, " alignment: %zu\n", param->alignment);
+
+ /* Print constraints if any */
+ if (param->constraint_type != KAPI_CONSTRAINT_NONE) {
+ seq_printf(m, " constraints:\n");
+ switch (param->constraint_type) {
+ case KAPI_CONSTRAINT_RANGE:
+ seq_printf(m, " type: range\n");
+ seq_printf(m, " min: %lld\n", param->min_value);
+ seq_printf(m, " max: %lld\n", param->max_value);
+ break;
+ case KAPI_CONSTRAINT_MASK:
+ seq_printf(m, " type: mask\n");
+ seq_printf(m, " valid_bits: 0x%llx\n", param->valid_mask);
+ break;
+ case KAPI_CONSTRAINT_ENUM:
+ seq_printf(m, " type: enum\n");
+ seq_printf(m, " count: %u\n", param->enum_count);
+ break;
+ case KAPI_CONSTRAINT_USER_STRING:
+ seq_printf(m, " type: user_string\n");
+ seq_printf(m, " min_len: %lld\n", param->min_value);
+ seq_printf(m, " max_len: %lld\n", param->max_value);
+ break;
+ case KAPI_CONSTRAINT_USER_PATH:
+ seq_printf(m, " type: user_path\n");
+ seq_printf(m, " max_len: PATH_MAX (4096)\n");
+ break;
+ case KAPI_CONSTRAINT_USER_PTR:
+ seq_printf(m, " type: user_ptr\n");
+ seq_printf(m, " size: %zu bytes\n", param->size);
+ break;
+ case KAPI_CONSTRAINT_CUSTOM:
+ seq_printf(m, " type: custom\n");
+ if (strlen(param->constraints) > 0)
+ seq_printf(m, " description: %s\n",
+ param->constraints);
+ break;
+ default:
+ break;
+ }
+ }
+ seq_printf(m, "\n");
+ }
+ }
+
+ /* Return value */
+ seq_printf(m, "Return value:\n");
+ seq_printf(m, " type: %s\n", spec->return_spec.type_name);
+ if (strlen(spec->return_spec.description) > 0)
+ seq_printf(m, " description: %s\n", spec->return_spec.description);
+
+ switch (spec->return_spec.check_type) {
+ case KAPI_RETURN_EXACT:
+ seq_printf(m, " success: == %lld\n", spec->return_spec.success_value);
+ break;
+ case KAPI_RETURN_RANGE:
+ seq_printf(m, " success: [%lld, %lld]\n",
+ spec->return_spec.success_min,
+ spec->return_spec.success_max);
+ break;
+ case KAPI_RETURN_FD:
+ seq_printf(m, " success: valid file descriptor (>= 0)\n");
+ break;
+ case KAPI_RETURN_ERROR_CHECK:
+ seq_printf(m, " success: error check\n");
+ break;
+ case KAPI_RETURN_CUSTOM:
+ seq_printf(m, " success: custom check\n");
+ break;
+ default:
+ break;
+ }
+ seq_printf(m, "\n");
+
+ /* Errors */
+ if (spec->error_count > 0) {
+ seq_printf(m, "Errors (%u):\n", spec->error_count);
+ for (i = 0; i < spec->error_count; i++) {
+ struct kapi_error_spec *err = &spec->errors[i];
+ seq_printf(m, " %s (%d): %s\n",
+ err->name, err->error_code, err->description);
+ if (strlen(err->condition) > 0)
+ seq_printf(m, " condition: %s\n", err->condition);
+ }
+ seq_printf(m, "\n");
+ }
+
+ /* Locks */
+ if (spec->lock_count > 0) {
+ seq_printf(m, "Locks (%u):\n", spec->lock_count);
+ for (i = 0; i < spec->lock_count; i++) {
+ struct kapi_lock_spec *lock = &spec->locks[i];
+ const char *type_str, *scope_str;
+ switch (lock->lock_type) {
+ case KAPI_LOCK_MUTEX: type_str = "mutex"; break;
+ case KAPI_LOCK_SPINLOCK: type_str = "spinlock"; break;
+ case KAPI_LOCK_RWLOCK: type_str = "rwlock"; break;
+ case KAPI_LOCK_SEMAPHORE: type_str = "semaphore"; break;
+ case KAPI_LOCK_RCU: type_str = "rcu"; break;
+ case KAPI_LOCK_SEQLOCK: type_str = "seqlock"; break;
+ default: type_str = "unknown"; break;
+ }
+ switch (lock->scope) {
+ case KAPI_LOCK_INTERNAL: scope_str = "acquired and released"; break;
+ case KAPI_LOCK_ACQUIRES: scope_str = "acquired (not released)"; break;
+ case KAPI_LOCK_RELEASES: scope_str = "released (held on entry)"; break;
+ case KAPI_LOCK_CALLER_HELD: scope_str = "held by caller"; break;
+ default: scope_str = "unknown"; break;
+ }
+ seq_printf(m, " %s (%s): %s\n",
+ lock->lock_name, type_str, lock->description);
+ seq_printf(m, " scope: %s\n", scope_str);
+ }
+ seq_printf(m, "\n");
+ }
+
+ /* Constraints */
+ if (spec->constraint_count > 0) {
+ seq_printf(m, "Additional constraints (%u):\n", spec->constraint_count);
+ for (i = 0; i < spec->constraint_count; i++) {
+ struct kapi_constraint_spec *cons = &spec->constraints[i];
+
+ seq_printf(m, " - %s", cons->name);
+ if (cons->description[0])
+ seq_printf(m, ": %s", cons->description);
+ seq_printf(m, "\n");
+ if (cons->expression[0])
+ seq_printf(m, " expression: %s\n", cons->expression);
+ }
+ seq_printf(m, "\n");
+ }
+
+ /* Signals */
+ if (spec->signal_count > 0) {
+ seq_printf(m, "Signal handling (%u):\n", spec->signal_count);
+ for (i = 0; i < spec->signal_count; i++) {
+ struct kapi_signal_spec *sig = &spec->signals[i];
+ seq_printf(m, " %s (%d):\n", sig->signal_name, sig->signal_num);
+ seq_printf(m, " direction: ");
+ if (sig->direction & KAPI_SIGNAL_SEND) seq_printf(m, "send ");
+ if (sig->direction & KAPI_SIGNAL_RECEIVE) seq_printf(m, "receive ");
+ if (sig->direction & KAPI_SIGNAL_HANDLE) seq_printf(m, "handle ");
+ if (sig->direction & KAPI_SIGNAL_BLOCK) seq_printf(m, "block ");
+ if (sig->direction & KAPI_SIGNAL_IGNORE) seq_printf(m, "ignore ");
+ seq_printf(m, "\n");
+ seq_printf(m, " action: ");
+ switch (sig->action) {
+ case KAPI_SIGNAL_ACTION_DEFAULT: seq_printf(m, "default"); break;
+ case KAPI_SIGNAL_ACTION_TERMINATE: seq_printf(m, "terminate"); break;
+ case KAPI_SIGNAL_ACTION_COREDUMP: seq_printf(m, "coredump"); break;
+ case KAPI_SIGNAL_ACTION_STOP: seq_printf(m, "stop"); break;
+ case KAPI_SIGNAL_ACTION_CONTINUE: seq_printf(m, "continue"); break;
+ case KAPI_SIGNAL_ACTION_CUSTOM: seq_printf(m, "custom"); break;
+ case KAPI_SIGNAL_ACTION_RETURN: seq_printf(m, "return"); break;
+ case KAPI_SIGNAL_ACTION_RESTART: seq_printf(m, "restart"); break;
+ default: seq_printf(m, "unknown"); break;
+ }
+ seq_printf(m, "\n");
+ if (strlen(sig->description) > 0)
+ seq_printf(m, " description: %s\n", sig->description);
+ }
+ seq_printf(m, "\n");
+ }
+
+ /* Additional info */
+ if (strlen(spec->examples) > 0) {
+ seq_printf(m, "Examples:\n%s\n\n", spec->examples);
+ }
+ if (strlen(spec->notes) > 0) {
+ seq_printf(m, "Notes:\n%s\n\n", spec->notes);
+ }
+ if (strlen(spec->since_version) > 0) {
+ seq_printf(m, "Since: %s\n", spec->since_version);
+ }
+
+ return 0;
+}
+
+static int kapi_spec_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, kapi_spec_show, inode->i_private);
+}
+
+static const struct file_operations kapi_spec_fops = {
+ .open = kapi_spec_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+/* Show all available API specs */
+static int kapi_list_show(struct seq_file *m, void *v)
+{
+ struct kernel_api_spec *spec;
+ int count = 0;
+
+ seq_printf(m, "Available Kernel API Specifications\n");
+ seq_printf(m, "===================================\n\n");
+
+ for (spec = __start_kapi_specs; spec < __stop_kapi_specs; spec++) {
+ seq_printf(m, "%s - %s\n", spec->name, spec->description);
+ count++;
+ }
+
+ seq_printf(m, "\nTotal: %d specifications\n", count);
+ return 0;
+}
+
+static int kapi_list_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, kapi_list_show, NULL);
+}
+
+static const struct file_operations kapi_list_fops = {
+ .open = kapi_list_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static int __init kapi_debugfs_init(void)
+{
+ struct kernel_api_spec *spec;
+ struct dentry *spec_dir;
+
+ /* Create main directory */
+ kapi_debugfs_root = debugfs_create_dir("kapi", NULL);
+
+ /* Create list file */
+ debugfs_create_file("list", 0444, kapi_debugfs_root, NULL, &kapi_list_fops);
+
+ /* Create specs subdirectory */
+ spec_dir = debugfs_create_dir("specs", kapi_debugfs_root);
+
+ /* Create a file for each API spec */
+ for (spec = __start_kapi_specs; spec < __stop_kapi_specs; spec++) {
+ debugfs_create_file(spec->name, 0444, spec_dir, spec, &kapi_spec_fops);
+ }
+
+ pr_info("Kernel API debugfs interface initialized\n");
+ return 0;
+}
+
+static void __exit kapi_debugfs_exit(void)
+{
+ debugfs_remove_recursive(kapi_debugfs_root);
+}
+
+/* Initialize as part of kernel, not as a module */
+fs_initcall(kapi_debugfs_init);
\ No newline at end of file
--
2.51.0
* [RFC PATCH v5 04/15] tools/kapi: Add kernel API specification extraction tool
2025-12-18 20:42 [RFC PATCH v5 00/15] Kernel API Specification Framework Sasha Levin
` (2 preceding siblings ...)
2025-12-18 20:42 ` [RFC PATCH v5 03/15] kernel/api: add debugfs interface for kernel " Sasha Levin
@ 2025-12-18 20:42 ` Sasha Levin
2025-12-18 20:42 ` [RFC PATCH v5 05/15] kernel/api: add API specification for io_setup Sasha Levin
` (10 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
The kapi tool extracts kernel API specifications from kernel source, a
vmlinux binary, or debugfs, and displays them as plain text, JSON, or RST.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
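For reference, the debugfs extractor relies on the text layout of
kapi/list as printed by kapi_list_show(): a title, an "===" underline,
one "name - description" line per specification, and a "Total:" footer.
The block below is a minimal standalone sketch of that parsing step under
those assumptions. It is not the code in this patch, and the helper name
parse_api_names is hypothetical.

```rust
/// Hypothetical helper: extract API names from the text of `kapi/list`.
/// Assumes the layout printed by kapi_list_show(): title, "===" underline,
/// "name - description" entries, then a "Total:" footer.
fn parse_api_names(content: &str) -> Vec<String> {
    content
        .lines()
        // Entries start after the first "===" underline.
        .skip_while(|line| !line.contains("==="))
        .skip(1)
        // The "Total: N specifications" footer ends the list.
        .take_while(|line| !line.starts_with("Total:"))
        // Ignore blank separator lines.
        .filter(|line| !line.trim().is_empty())
        // Keep only the part before the first " - " separator.
        .filter_map(|line| line.split(" - ").next())
        .map(|name| name.trim().to_string())
        .collect()
}
```

Called on the contents of /sys/kernel/debug/kapi/list, a helper like this
would yield the names that are then looked up under kapi/specs/.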
Documentation/dev-tools/kernel-api-spec.rst | 198 +++-
tools/kapi/.gitignore | 4 +
tools/kapi/Cargo.toml | 19 +
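 tools/kapi/src/extractor/debugfs.rs | 442 +++++++++
 tools/kapi/src/extractor/kerneldoc_parser.rs | 692 ++++++++++++++
 tools/kapi/src/extractor/mod.rs | 464 ++++++++++
 tools/kapi/src/extractor/source_parser.rs | 213 +++++
 tools/kapi/src/extractor/vmlinux/binary_utils.rs | 180 ++++
 tools/kapi/src/extractor/vmlinux/magic_finder.rs | 102 +++
 tools/kapi/src/extractor/vmlinux/mod.rs | 864 ++++++++++++++++++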
tools/kapi/src/formatter/json.rs | 468 ++++++++++
tools/kapi/src/formatter/mod.rs | 140 +++
tools/kapi/src/formatter/plain.rs | 549 +++++++++++
tools/kapi/src/formatter/rst.rs | 621 +++++++++++++
tools/kapi/src/main.rs | 116 +++
15 files changed, 5069 insertions(+), 3 deletions(-)
create mode 100644 tools/kapi/.gitignore
create mode 100644 tools/kapi/Cargo.toml
create mode 100644 tools/kapi/src/extractor/debugfs.rs
create mode 100644 tools/kapi/src/extractor/kerneldoc_parser.rs
create mode 100644 tools/kapi/src/extractor/mod.rs
create mode 100644 tools/kapi/src/extractor/source_parser.rs
create mode 100644 tools/kapi/src/extractor/vmlinux/binary_utils.rs
create mode 100644 tools/kapi/src/extractor/vmlinux/magic_finder.rs
create mode 100644 tools/kapi/src/extractor/vmlinux/mod.rs
create mode 100644 tools/kapi/src/formatter/json.rs
create mode 100644 tools/kapi/src/formatter/mod.rs
create mode 100644 tools/kapi/src/formatter/plain.rs
create mode 100644 tools/kapi/src/formatter/rst.rs
create mode 100644 tools/kapi/src/main.rs
diff --git a/Documentation/dev-tools/kernel-api-spec.rst b/Documentation/dev-tools/kernel-api-spec.rst
index 3a63f6711e27b..9b452753111ad 100644
--- a/Documentation/dev-tools/kernel-api-spec.rst
+++ b/Documentation/dev-tools/kernel-api-spec.rst
@@ -31,7 +31,9 @@ The framework aims to:
common programming errors during development and testing.
3. **Support Tooling**: Export API specifications in machine-readable formats for
- use by static analyzers, documentation generators, and development tools.
+ use by static analyzers, documentation generators, and development tools. The
+ ``kapi`` tool (see `The kapi Tool`_) provides comprehensive extraction and
+ formatting capabilities.
4. **Enhance Debugging**: Provide detailed API information at runtime through debugfs
for debugging and introspection.
@@ -71,6 +73,13 @@ The framework consists of several key components:
- Type-safe parameter specifications
- Context and constraint definitions
+5. **kapi Tool** (``tools/kapi/``)
+
+ - Userspace utility for extracting specifications
+ - Multiple input sources (source, binary, debugfs)
+ - Multiple output formats (plain, JSON, RST)
+ - Testing and validation utilities
+
Data Model
----------
@@ -344,8 +353,177 @@ Documentation Generation
------------------------
The framework exports specifications via debugfs that can be used
-to generate documentation. Tools for automatic documentation generation
-from specifications are planned for future development.
+to generate documentation. The ``kapi`` tool (see `The kapi Tool`_) extracts these
+specifications and formats them as plain text, JSON, or reStructuredText.
+
+The kapi Tool
+=============
+
+Overview
+--------
+
+The ``kapi`` tool is a userspace utility that extracts and displays kernel API
+specifications from multiple sources. It provides a unified interface to access
+API documentation whether from compiled kernels, source code, or runtime systems.
+
+Installation
+------------
+
+Build the tool from the kernel source tree::
+
+ $ cd tools/kapi
+ $ cargo build --release
+
+ # Optional: Install system-wide
+ $ cargo install --path .
+
+The tool requires Cargo and Rust 1.85 or later (``Cargo.toml`` selects the 2024
+edition). The binary is written to ``tools/kapi/target/release/kapi``.
+
+Command-Line Usage
+------------------
+
+Basic syntax::
+
+ kapi [OPTIONS] [API_NAME]
+
+Options:
+
+- ``--vmlinux <PATH>``: Extract from compiled kernel binary
+- ``--source <PATH>``: Extract from kernel source code
+- ``--debugfs <PATH>``: Extract from debugfs (default: /sys/kernel/debug)
+- ``-f, --format <FORMAT>``: Output format (plain, json, rst)
+- ``-h, --help``: Display help information
+- ``-V, --version``: Display version information
+
+Input Modes
+-----------
+
+**1. Source Code Mode**
+
+Extract specifications directly from kernel source::
+
+ # Scan entire kernel source tree
+ $ kapi --source /path/to/linux
+
+ # Extract from specific file
+ $ kapi --source kernel/sched/core.c
+
+ # Get details for specific API
+ $ kapi --source /path/to/linux sys_sched_yield
+
+**2. Vmlinux Mode**
+
+Extract from compiled kernel with debug symbols::
+
+ # List all APIs in vmlinux
+ $ kapi --vmlinux /boot/vmlinux-5.15.0
+
+ # Get specific syscall details
+ $ kapi --vmlinux ./vmlinux sys_read
+
+**3. Debugfs Mode**
+
+Extract from running kernel via debugfs::
+
+ # Use default debugfs path
+ $ kapi
+
+ # Use custom debugfs mount
+ $ kapi --debugfs /mnt/debugfs
+
+ # Get specific API from running kernel
+ $ kapi sys_write
+
+Output Formats
+--------------
+
+**Plain Text Format** (default)::
+
+ $ kapi sys_read
+
+ Detailed information for sys_read:
+ ==================================
+ Description: Read from a file descriptor
+
+ Detailed Description:
+ Reads up to count bytes from file descriptor fd into the buffer starting at buf.
+
+ Execution Context:
+ - KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+
+ Parameters (3):
+
+ Available since: 1.0
+
+**JSON Format**::
+
+ $ kapi --format json sys_read
+ {
+ "api_details": {
+ "name": "sys_read",
+ "description": "Read from a file descriptor",
+ "long_description": "Reads up to count bytes...",
+ "context_flags": ["KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE"],
+ "since_version": "1.0"
+ }
+ }
+
+**reStructuredText Format**::
+
+ $ kapi --format rst sys_read
+
+ sys_read
+ ========
+
+ **Read from a file descriptor**
+
+ Reads up to count bytes from file descriptor fd into the buffer...
+
+Usage Examples
+--------------
+
+**Generate complete API documentation**::
+
+ # Export all kernel APIs to JSON
+ $ kapi --source /path/to/linux --format json > kernel-apis.json
+
+ # Generate RST documentation for all syscalls
+ $ kapi --vmlinux ./vmlinux --format rst > syscalls.rst
+
+ # List APIs from specific subsystem
+ $ kapi --source drivers/gpu/drm/
+
+**Integration with other tools**::
+
+ # Find all APIs that can sleep
+ $ kapi --format json | jq '.apis[] | select(.context_flags[] | contains("SLEEPABLE"))'
+
+ # Generate markdown documentation
+ $ kapi --format rst sys_mmap | pandoc -f rst -t markdown
+
+**Debugging and analysis**::
+
+ # Compare API between kernel versions
+ $ diff <(kapi --vmlinux vmlinux-5.10) <(kapi --vmlinux vmlinux-5.15)
+
+ # Check if specific API exists
+ $ kapi --source . my_custom_api || echo "API not found"
+
+Implementation Details
+----------------------
+
+The tool extracts API specifications from three sources:
+
+1. **Source Code**: Parses KAPI specification macros using regular expressions
+2. **Vmlinux**: Reads the ``.kapi_specs`` ELF section from compiled kernels
+3. **Debugfs**: Reads from ``/sys/kernel/debug/kapi/`` filesystem interface
+
+The tool supports all KAPI specification types:
+
+- System calls (``DEFINE_KERNEL_API_SPEC``)
+- IOCTLs (``DEFINE_IOCTL_API_SPEC``)
+- Kernel functions (``KAPI_DEFINE_SPEC``)
IDE Integration
---------------
@@ -357,6 +535,11 @@ Modern IDEs can use the JSON export for:
- Context validation
- Error code documentation
+Example IDE integration::
+
+ # Generate IDE completion data
+ $ kapi --format json > .vscode/kernel-apis.json
+
Testing Framework
-----------------
@@ -367,6 +550,15 @@ The framework includes test helpers::
kapi_test_api("kmalloc", test_cases);
#endif
+The kapi tool can verify specifications against implementations::
+
+ # Run consistency tests
+ $ cd tools/kapi
+ $ ./test_consistency.sh
+
+ # Compare source vs binary specifications
+ $ ./compare_all_syscalls.sh
+
Best Practices
==============
diff --git a/tools/kapi/.gitignore b/tools/kapi/.gitignore
new file mode 100644
index 0000000000000..1390bfc12686c
--- /dev/null
+++ b/tools/kapi/.gitignore
@@ -0,0 +1,4 @@
+# Rust build artifacts
+/target/
+**/*.rs.bk
+
diff --git a/tools/kapi/Cargo.toml b/tools/kapi/Cargo.toml
new file mode 100644
index 0000000000000..4e6bcb10d132f
--- /dev/null
+++ b/tools/kapi/Cargo.toml
@@ -0,0 +1,19 @@
+[package]
+name = "kapi"
+version = "0.1.0"
+edition = "2024"
+authors = ["Sasha Levin <sashal@kernel.org>"]
+description = "Tool for extracting and displaying kernel API specifications"
+license = "GPL-2.0"
+
+[dependencies]
+goblin = "0.10"
+clap = { version = "4.4", features = ["derive"] }
+anyhow = "1.0"
+serde = { version = "1.0", features = ["derive"] }
+serde_json = "1.0"
+regex = "1.10"
+walkdir = "2.4"
+
+[dev-dependencies]
+tempfile = "3.8"
diff --git a/tools/kapi/src/extractor/debugfs.rs b/tools/kapi/src/extractor/debugfs.rs
new file mode 100644
index 0000000000000..698c51e50438f
--- /dev/null
+++ b/tools/kapi/src/extractor/debugfs.rs
@@ -0,0 +1,442 @@
+use crate::formatter::OutputFormatter;
+use anyhow::{Context, Result, bail};
+use serde::Deserialize;
+use std::fs;
+use std::io::Write;
+use std::path::PathBuf;
+
+use super::{ApiExtractor, ApiSpec, CapabilitySpec, display_api_spec};
+
+#[derive(Deserialize)]
+struct KernelApiJson {
+ name: String,
+ api_type: Option<String>,
+ version: Option<u32>,
+ description: Option<String>,
+ long_description: Option<String>,
+ context_flags: Option<u32>,
+ since_version: Option<String>,
+ examples: Option<String>,
+ notes: Option<String>,
+ capabilities: Option<Vec<KernelCapabilityJson>>,
+}
+
+#[derive(Deserialize)]
+struct KernelCapabilityJson {
+ capability: i32,
+ name: String,
+ action: String,
+ allows: String,
+ without_cap: String,
+ check_condition: Option<String>,
+ priority: Option<u8>,
+ alternatives: Option<Vec<i32>>,
+}
+
+/// Extractor for kernel API specifications from debugfs
+pub struct DebugfsExtractor {
+ debugfs_path: PathBuf,
+}
+
+impl DebugfsExtractor {
+ /// Create a new debugfs extractor with the specified debugfs path
+ pub fn new(debugfs_path: Option<String>) -> Result<Self> {
+ let path = match debugfs_path {
+ Some(p) => PathBuf::from(p),
+ None => PathBuf::from("/sys/kernel/debug"),
+ };
+
+ // Check if the debugfs path exists
+ if !path.exists() {
+ bail!("Debugfs path does not exist: {}", path.display());
+ }
+
+ // Check if kapi directory exists
+ let kapi_path = path.join("kapi");
+ if !kapi_path.exists() {
+ bail!(
+ "Kernel API debugfs interface not found at: {}",
+ kapi_path.display()
+ );
+ }
+
+ Ok(Self { debugfs_path: path })
+ }
+
+ /// Parse the list file to get all available API names
+ fn parse_list_file(&self) -> Result<Vec<String>> {
+ let list_path = self.debugfs_path.join("kapi/list");
+ let content = fs::read_to_string(&list_path)
+ .with_context(|| format!("Failed to read {}", list_path.display()))?;
+
+ let mut apis = Vec::new();
+ let mut in_list = false;
+
+ for line in content.lines() {
+ if line.contains("===") {
+ in_list = true;
+ continue;
+ }
+
+ if in_list && line.starts_with("Total:") {
+ break;
+ }
+
+ if in_list && !line.trim().is_empty() {
+ // Extract API name from lines like "sys_read - Read from a file descriptor"
+ if let Some(name) = line.split(" - ").next() {
+ apis.push(name.trim().to_string());
+ }
+ }
+ }
+
+ Ok(apis)
+ }
+
+ /// Convert context flags from their u32 bitmask to string representations
+ fn parse_context_flags(flags: u32) -> Vec<String> {
+ let mut result = Vec::new();
+
+ // These values should match KAPI_CTX_* flags from kernel
+ if flags & (1 << 0) != 0 {
+ result.push("PROCESS".to_string());
+ }
+ if flags & (1 << 1) != 0 {
+ result.push("SOFTIRQ".to_string());
+ }
+ if flags & (1 << 2) != 0 {
+ result.push("HARDIRQ".to_string());
+ }
+ if flags & (1 << 3) != 0 {
+ result.push("NMI".to_string());
+ }
+ if flags & (1 << 4) != 0 {
+ result.push("ATOMIC".to_string());
+ }
+ if flags & (1 << 5) != 0 {
+ result.push("SLEEPABLE".to_string());
+ }
+ if flags & (1 << 6) != 0 {
+ result.push("PREEMPT_DISABLED".to_string());
+ }
+ if flags & (1 << 7) != 0 {
+ result.push("IRQ_DISABLED".to_string());
+ }
+
+ result
+ }
+
+ /// Convert capability action from kernel representation
+ fn parse_capability_action(action: &str) -> String {
+ match action {
+ "bypass_check" => "Bypasses check".to_string(),
+ "increase_limit" => "Increases limit".to_string(),
+ "override_restriction" => "Overrides restriction".to_string(),
+ "grant_permission" => "Grants permission".to_string(),
+ "modify_behavior" => "Modifies behavior".to_string(),
+ "access_resource" => "Allows resource access".to_string(),
+ "perform_operation" => "Allows operation".to_string(),
+ _ => action.to_string(),
+ }
+ }
+
+ /// Try to parse as JSON first
+ fn try_parse_json(&self, content: &str) -> Option<ApiSpec> {
+ let json_data: KernelApiJson = serde_json::from_str(content).ok()?;
+
+ let mut spec = ApiSpec {
+ name: json_data.name,
+ api_type: json_data.api_type.unwrap_or_else(|| "unknown".to_string()),
+ description: json_data.description,
+ long_description: json_data.long_description,
+ version: json_data.version.map(|v| v.to_string()),
+ context_flags: json_data
+ .context_flags
+ .map_or_else(Vec::new, Self::parse_context_flags),
+ param_count: None,
+ error_count: None,
+ examples: json_data.examples,
+ notes: json_data.notes,
+ since_version: json_data.since_version,
+ subsystem: None, // Not in current JSON format
+ sysfs_path: None, // Not in current JSON format
+ permissions: None, // Not in current JSON format
+ socket_state: None,
+ protocol_behaviors: vec![],
+ addr_families: vec![],
+ buffer_spec: None,
+ async_spec: None,
+ net_data_transfer: None,
+ capabilities: vec![],
+ parameters: vec![],
+ return_spec: None,
+ errors: vec![],
+ signals: vec![],
+ signal_masks: vec![],
+ side_effects: vec![],
+ state_transitions: vec![],
+ constraints: vec![],
+ locks: vec![],
+ struct_specs: vec![],
+ };
+
+ // Convert capabilities
+ if let Some(caps) = json_data.capabilities {
+ for cap in caps {
+ spec.capabilities.push(CapabilitySpec {
+ capability: cap.capability,
+ name: cap.name,
+ action: Self::parse_capability_action(&cap.action),
+ allows: cap.allows,
+ without_cap: cap.without_cap,
+ check_condition: cap.check_condition,
+ priority: cap.priority,
+ alternatives: cap.alternatives.unwrap_or_default(),
+ });
+ }
+ }
+
+ Some(spec)
+ }
+
+ /// Parse a single API specification file
+ fn parse_spec_file(&self, api_name: &str) -> Result<ApiSpec> {
+ let spec_path = self.debugfs_path.join(format!("kapi/specs/{}", api_name));
+ let content = fs::read_to_string(&spec_path)
+ .with_context(|| format!("Failed to read {}", spec_path.display()))?;
+
+ // Try JSON parsing first
+ if let Some(spec) = self.try_parse_json(&content) {
+ return Ok(spec);
+ }
+
+ // Fall back to plain text parsing
+ let mut spec = ApiSpec {
+ name: api_name.to_string(),
+ api_type: "unknown".to_string(),
+ description: None,
+ long_description: None,
+ version: None,
+ context_flags: Vec::new(),
+ param_count: None,
+ error_count: None,
+ examples: None,
+ notes: None,
+ since_version: None,
+ subsystem: None,
+ sysfs_path: None,
+ permissions: None,
+ socket_state: None,
+ protocol_behaviors: vec![],
+ addr_families: vec![],
+ buffer_spec: None,
+ async_spec: None,
+ net_data_transfer: None,
+ capabilities: vec![],
+ parameters: vec![],
+ return_spec: None,
+ errors: vec![],
+ signals: vec![],
+ signal_masks: vec![],
+ side_effects: vec![],
+ state_transitions: vec![],
+ constraints: vec![],
+ locks: vec![],
+ struct_specs: vec![],
+ };
+
+ // Parse the content
+ let mut collecting_multiline = false;
+ let mut multiline_buffer = String::new();
+ let mut multiline_field = "";
+ let mut parsing_capability = false;
+ let mut current_capability: Option<CapabilitySpec> = None;
+
+ for line in content.lines() {
+ // Handle capability sections
+ if line.starts_with("Capabilities (") {
+ continue; // Skip the header
+ }
+ if line.starts_with(" ") && line.contains(" (") && line.ends_with("):") {
+ // Start of a capability entry like " CAP_IPC_LOCK (14):"
+ if let Some(cap) = current_capability.take() {
+ spec.capabilities.push(cap);
+ }
+
+ let parts: Vec<&str> = line.trim().split(" (").collect();
+ if parts.len() == 2 {
+ let cap_name = parts[0].to_string();
+ let cap_id = parts[1].trim_end_matches("):").parse().unwrap_or(0);
+ current_capability = Some(CapabilitySpec {
+ capability: cap_id,
+ name: cap_name,
+ action: String::new(),
+ allows: String::new(),
+ without_cap: String::new(),
+ check_condition: None,
+ priority: None,
+ alternatives: Vec::new(),
+ });
+ parsing_capability = true;
+ }
+ continue;
+ }
+ if parsing_capability && line.starts_with(" ") {
+ // Parse capability fields
+ if let Some(ref mut cap) = current_capability {
+ if let Some(action) = line.strip_prefix(" Action: ") {
+ cap.action = action.to_string();
+ } else if let Some(allows) = line.strip_prefix(" Allows: ") {
+ cap.allows = allows.to_string();
+ } else if let Some(without) = line.strip_prefix(" Without: ") {
+ cap.without_cap = without.to_string();
+ } else if let Some(cond) = line.strip_prefix(" Condition: ") {
+ cap.check_condition = Some(cond.to_string());
+ } else if let Some(prio) = line.strip_prefix(" Priority: ") {
+ cap.priority = prio.parse().ok();
+ } else if let Some(alts) = line.strip_prefix(" Alternatives: ") {
+ cap.alternatives =
+ alts.split(", ").filter_map(|s| s.parse().ok()).collect();
+ }
+ }
+ continue;
+ }
+ if parsing_capability && !line.starts_with(" ") {
+ // End of capabilities section
+ if let Some(cap) = current_capability.take() {
+ spec.capabilities.push(cap);
+ }
+ parsing_capability = false;
+ }
+
+ // Handle section headers
+ if line.starts_with("Parameters (") {
+ if let Some(count_str) = line
+ .strip_prefix("Parameters (")
+ .and_then(|s| s.strip_suffix("):"))
+ {
+ spec.param_count = count_str.parse().ok();
+ }
+ continue;
+ } else if line.starts_with("Errors (") {
+ if let Some(count_str) = line
+ .strip_prefix("Errors (")
+ .and_then(|s| s.strip_suffix("):"))
+ {
+ spec.error_count = count_str.parse().ok();
+ }
+ continue;
+ } else if line.starts_with("Examples:") {
+ collecting_multiline = true;
+ multiline_field = "examples";
+ multiline_buffer.clear();
+ continue;
+ } else if line.starts_with("Notes:") {
+ collecting_multiline = true;
+ multiline_field = "notes";
+ multiline_buffer.clear();
+ continue;
+ }
+
+ // Handle multiline sections
+ if collecting_multiline {
+ if line.trim().is_empty() && multiline_buffer.ends_with("\n\n") {
+ collecting_multiline = false;
+ match multiline_field {
+ "examples" => spec.examples = Some(multiline_buffer.trim().to_string()),
+ "notes" => spec.notes = Some(multiline_buffer.trim().to_string()),
+ _ => {}
+ }
+ multiline_buffer.clear();
+ } else {
+ if !multiline_buffer.is_empty() {
+ multiline_buffer.push('\n');
+ }
+ multiline_buffer.push_str(line);
+ }
+ continue;
+ }
+
+ // Parse regular fields
+ if let Some(desc) = line.strip_prefix("Description: ") {
+ spec.description = Some(desc.to_string());
+ } else if let Some(long_desc) = line.strip_prefix("Long description: ") {
+ spec.long_description = Some(long_desc.to_string());
+ } else if let Some(version) = line.strip_prefix("Version: ") {
+ spec.version = Some(version.to_string());
+ } else if let Some(since) = line.strip_prefix("Since: ") {
+ spec.since_version = Some(since.to_string());
+ } else if let Some(flags) = line.strip_prefix("Context flags: ") {
+ spec.context_flags = flags.split_whitespace().map(str::to_string).collect();
+ } else if let Some(subsys) = line.strip_prefix("Subsystem: ") {
+ spec.subsystem = Some(subsys.to_string());
+ } else if let Some(path) = line.strip_prefix("Sysfs Path: ") {
+ spec.sysfs_path = Some(path.to_string());
+ } else if let Some(perms) = line.strip_prefix("Permissions: ") {
+ spec.permissions = Some(perms.to_string());
+ }
+ }
+
+ // Handle any remaining capability
+ if let Some(cap) = current_capability.take() {
+ spec.capabilities.push(cap);
+ }
+
+ // Determine API type based on name
+ if api_name.starts_with("sys_") {
+ spec.api_type = "syscall".to_string();
+ } else if api_name.contains("_ioctl") || api_name.starts_with("ioctl_") {
+ spec.api_type = "ioctl".to_string();
+ } else if api_name.contains("sysfs")
+ || api_name.ends_with("_show")
+ || api_name.ends_with("_store")
+ {
+ spec.api_type = "sysfs".to_string();
+ } else {
+ spec.api_type = "function".to_string();
+ }
+
+ Ok(spec)
+ }
+}
+
+impl ApiExtractor for DebugfsExtractor {
+ fn extract_all(&self) -> Result<Vec<ApiSpec>> {
+ let api_names = self.parse_list_file()?;
+ let mut specs = Vec::new();
+
+ for name in api_names {
+ match self.parse_spec_file(&name) {
+ Ok(spec) => specs.push(spec),
+ Err(_e) => {} // Silently skip files that fail to parse
+ }
+ }
+
+ Ok(specs)
+ }
+
+ fn extract_by_name(&self, name: &str) -> Result<Option<ApiSpec>> {
+ let api_names = self.parse_list_file()?;
+
+ if api_names.contains(&name.to_string()) {
+ Ok(Some(self.parse_spec_file(name)?))
+ } else {
+ Ok(None)
+ }
+ }
+
+ fn display_api_details(
+ &self,
+ api_name: &str,
+ formatter: &mut dyn OutputFormatter,
+ writer: &mut dyn Write,
+ ) -> Result<()> {
+ if let Some(spec) = self.extract_by_name(api_name)? {
+ display_api_spec(&spec, formatter, writer)?;
+ } else {
+ writeln!(writer, "API '{api_name}' not found in debugfs")?;
+ }
+
+ Ok(())
+ }
+}
diff --git a/tools/kapi/src/extractor/kerneldoc_parser.rs b/tools/kapi/src/extractor/kerneldoc_parser.rs
new file mode 100644
index 0000000000000..1c6924a0b5291
--- /dev/null
+++ b/tools/kapi/src/extractor/kerneldoc_parser.rs
@@ -0,0 +1,692 @@
+use super::{
+ ApiSpec, CapabilitySpec, ConstraintSpec, ErrorSpec, LockSpec, ParamSpec,
+ ReturnSpec, SideEffectSpec, SignalSpec, StateTransitionSpec, StructSpec,
+ StructFieldSpec,
+};
+use anyhow::Result;
+use std::collections::HashMap;
+
+/// Real kerneldoc parser that extracts KAPI annotations
+pub struct KerneldocParserImpl;
+
+impl KerneldocParserImpl {
+ pub fn new() -> Self {
+ KerneldocParserImpl
+ }
+
+ pub fn parse_kerneldoc(
+ &self,
+ doc: &str,
+ name: &str,
+ api_type: &str,
+ _signature: Option<&str>,
+ ) -> Result<ApiSpec> {
+ let mut spec = ApiSpec {
+ name: name.to_string(),
+ api_type: api_type.to_string(),
+ description: None,
+ long_description: None,
+ version: None,
+ context_flags: vec![],
+ param_count: None,
+ error_count: None,
+ examples: None,
+ notes: None,
+ since_version: None,
+ subsystem: None,
+ sysfs_path: None,
+ permissions: None,
+ socket_state: None,
+ protocol_behaviors: vec![],
+ addr_families: vec![],
+ buffer_spec: None,
+ async_spec: None,
+ net_data_transfer: None,
+ capabilities: vec![],
+ parameters: vec![],
+ return_spec: None,
+ errors: vec![],
+ signals: vec![],
+ signal_masks: vec![],
+ side_effects: vec![],
+ state_transitions: vec![],
+ constraints: vec![],
+ locks: vec![],
+ struct_specs: vec![],
+ };
+
+ // Parse line by line
+ let lines: Vec<&str> = doc.lines().collect();
+ let mut i = 0;
+
+ // Extract main description from function name line
+ if let Some(first_line) = lines.first() {
+ if let Some((_, desc)) = first_line.split_once(" - ") {
+ spec.description = Some(desc.trim().to_string());
+ }
+ }
+
+ // Keep track of parameters we've seen
+ let mut param_map: HashMap<String, ParamSpec> = HashMap::new();
+ let mut struct_fields: Vec<StructFieldSpec> = Vec::new();
+ let mut current_lock: Option<LockSpec> = None;
+ let mut current_signal: Option<SignalSpec> = None;
+ let mut current_capability: Option<CapabilitySpec> = None;
+
+ while i < lines.len() {
+ let line = lines[i].trim();
+
+ // Skip empty lines
+ if line.is_empty() {
+ i += 1;
+ continue;
+ }
+
+ // Parse @param lines
+ if let Some(rest) = line.strip_prefix("@") {
+ if let Some((param_name, desc)) = rest.split_once(':') {
+ let param_name = param_name.trim();
+ let desc = desc.trim();
+ if !param_name.contains('-') {
+ // This is a basic parameter description - add to map
+ param_map.insert(param_name.to_string(), ParamSpec {
+ index: param_map.len() as u32,
+ name: param_name.to_string(),
+ type_name: String::new(),
+ description: desc.to_string(),
+ flags: 0,
+ param_type: 0,
+ constraint_type: 0,
+ constraint: None,
+ min_value: None,
+ max_value: None,
+ valid_mask: None,
+ enum_values: vec![],
+ size: None,
+ alignment: None,
+ });
+ }
+ }
+ }
+ // Parse long-desc
+ else if let Some(rest) = line.strip_prefix("long-desc:") {
+ spec.long_description = Some(self.collect_multiline_value(&lines, i, rest));
+ }
+ // Parse context-flags
+ else if let Some(rest) = line.strip_prefix("context-flags:") {
+ spec.context_flags = self.parse_context_flags(rest.trim());
+ }
+ // Parse param-count
+ else if let Some(rest) = line.strip_prefix("param-count:") {
+ spec.param_count = rest.trim().parse().ok();
+ }
+ // Parse param-type
+ else if let Some(rest) = line.strip_prefix("param-type:") {
+ let parts: Vec<&str> = rest.split(',').map(|s| s.trim()).collect();
+ if parts.len() >= 2 {
+ if let Some(param) = param_map.get_mut(parts[0]) {
+ param.param_type = self.parse_param_type(parts[1]);
+ }
+ }
+ }
+ // Parse param-flags
+ else if let Some(rest) = line.strip_prefix("param-flags:") {
+ let parts: Vec<&str> = rest.split(',').map(|s| s.trim()).collect();
+ if parts.len() >= 2 {
+ if let Some(param) = param_map.get_mut(parts[0]) {
+ param.flags = self.parse_param_flags(parts[1]);
+ }
+ }
+ }
+ // Parse param-range
+ else if let Some(rest) = line.strip_prefix("param-range:") {
+ let parts: Vec<&str> = rest.split(',').map(|s| s.trim()).collect();
+ if parts.len() >= 3 {
+ if let Some(param) = param_map.get_mut(parts[0]) {
+ param.min_value = parts[1].parse().ok();
+ param.max_value = parts[2].parse().ok();
+ param.constraint_type = 1; // KAPI_CONSTRAINT_RANGE
+ }
+ }
+ }
+ // Parse param-constraint
+ else if let Some(rest) = line.strip_prefix("param-constraint:") {
+ let parts: Vec<&str> = rest.splitn(2, ',').map(|s| s.trim()).collect();
+ if parts.len() >= 2 {
+ if let Some(param) = param_map.get_mut(parts[0]) {
+ param.constraint = Some(parts[1].to_string());
+ }
+ }
+ }
+ // Parse error
+ else if let Some(rest) = line.strip_prefix("error:") {
+ // Parse error in format: "ERROR_CODE, description"
+ let parts: Vec<&str> = rest.splitn(2, ',').map(|s| s.trim()).collect();
+ if parts.len() >= 2 {
+ let error_name = parts[0].to_string();
+ let description = parts[1].to_string();
+
+ // Look for desc: line on the next line
+ let mut full_description = description;
+ if i + 1 < lines.len() {
+ if let Some(desc_line) = lines[i + 1].strip_prefix("* desc:") {
+ full_description = desc_line.trim().to_string();
+ } else if let Some(desc_line) = lines[i + 1].strip_prefix("* desc:") {
+ full_description = desc_line.trim().to_string();
+ }
+ }
+
+ // Map common error names to codes
+ let error_code = match error_name.as_str() {
+ "E2BIG" => -7,
+ "EACCES" => -13,
+ "EAGAIN" => -11,
+ "EBADF" => -9,
+ "EBUSY" => -16,
+ "EFAULT" => -14,
+ "EINTR" => -4,
+ "EINVAL" => -22,
+ "EIO" => -5,
+ "EISDIR" => -21,
+ "ELIBBAD" => -80,
+ "ELOOP" => -40,
+ "EMFILE" => -24,
+ "ENAMETOOLONG" => -36,
+ "ENFILE" => -23,
+ "ENOENT" => -2,
+ "ENOEXEC" => -8,
+ "ENOMEM" => -12,
+ "ENOTDIR" => -20,
+ "EOPNOTSUPP" => -95,
+ "EPERM" => -1,
+ "ESRCH" => -3,
+ "ETXTBSY" => -26,
+ _ => 0,
+ };
+
+ spec.errors.push(ErrorSpec {
+ error_code,
+ name: error_name,
+ condition: String::new(),
+ description: full_description,
+ });
+ }
+ }
+ // Parse lock
+ else if let Some(rest) = line.strip_prefix("lock:") {
+ // Save previous lock if any
+ if let Some(lock) = current_lock.take() {
+ spec.locks.push(lock);
+ }
+
+ let parts: Vec<&str> = rest.split(',').map(|s| s.trim()).collect();
+ if parts.len() >= 2 {
+ current_lock = Some(LockSpec {
+ lock_name: parts[0].to_string(),
+ lock_type: self.parse_lock_type(parts[1]),
+ scope: super::KAPI_LOCK_INTERNAL, // default: acquired and released
+ description: String::new(),
+ });
+ }
+ }
+ // Parse lock scope
+ else if let Some(rest) = line.strip_prefix("lock-scope:") {
+ if let Some(lock) = current_lock.as_mut() {
+ lock.scope = match rest.trim() {
+ "internal" => super::KAPI_LOCK_INTERNAL,
+ "acquires" => super::KAPI_LOCK_ACQUIRES,
+ "releases" => super::KAPI_LOCK_RELEASES,
+ "caller_held" => super::KAPI_LOCK_CALLER_HELD,
+ _ => super::KAPI_LOCK_INTERNAL,
+ };
+ }
+ }
+ else if let Some(rest) = line.strip_prefix("lock-desc:") {
+ if let Some(lock) = current_lock.as_mut() {
+ lock.description = self.collect_multiline_value(&lines, i, rest);
+ }
+ }
+ // Parse signal
+ else if let Some(rest) = line.strip_prefix("signal:") {
+ // Save previous signal if any
+ if let Some(signal) = current_signal.take() {
+ spec.signals.push(signal);
+ }
+
+ let signal_name = rest.trim().to_string();
+ current_signal = Some(SignalSpec {
+ signal_num: 0,
+ signal_name,
+ direction: 1,
+ action: 0,
+ target: None,
+ condition: None,
+ description: None,
+ restartable: false,
+ timing: 0,
+ priority: 0,
+ interruptible: false,
+ queue: None,
+ sa_flags: 0,
+ sa_flags_required: 0,
+ sa_flags_forbidden: 0,
+ state_required: 0,
+ state_forbidden: 0,
+ error_on_signal: None,
+ });
+ }
+ // Parse signal attributes
+ else if let Some(rest) = line.strip_prefix("signal-direction:") {
+ if let Some(signal) = current_signal.as_mut() {
+ signal.direction = self.parse_signal_direction(rest.trim());
+ }
+ }
+ else if let Some(rest) = line.strip_prefix("signal-action:") {
+ if let Some(signal) = current_signal.as_mut() {
+ signal.action = self.parse_signal_action(rest.trim());
+ }
+ }
+ else if let Some(rest) = line.strip_prefix("signal-condition:") {
+ if let Some(signal) = current_signal.as_mut() {
+ signal.condition = Some(self.collect_multiline_value(&lines, i, rest));
+ }
+ }
+ else if let Some(rest) = line.strip_prefix("signal-desc:") {
+ if let Some(signal) = current_signal.as_mut() {
+ signal.description = Some(self.collect_multiline_value(&lines, i, rest));
+ }
+ }
+ else if let Some(rest) = line.strip_prefix("signal-timing:") {
+ if let Some(signal) = current_signal.as_mut() {
+ signal.timing = self.parse_signal_timing(rest.trim());
+ }
+ }
+ else if let Some(rest) = line.strip_prefix("signal-priority:") {
+ if let Some(signal) = current_signal.as_mut() {
+ signal.priority = rest.trim().parse().unwrap_or(0);
+ }
+ }
+ else if line.strip_prefix("signal-interruptible:").is_some() {
+ if let Some(signal) = current_signal.as_mut() {
+ signal.interruptible = true;
+ }
+ }
+ else if let Some(rest) = line.strip_prefix("signal-state-req:") {
+ if let Some(signal) = current_signal.as_mut() {
+ signal.state_required = self.parse_signal_state(rest.trim());
+ }
+ }
+ // Parse side-effect
+ else if let Some(rest) = line.strip_prefix("side-effect:") {
+ let full_effect = self.collect_multiline_value(&lines, i, rest);
+ let parts: Vec<&str> = full_effect.splitn(3, ',').map(|s| s.trim()).collect();
+ if parts.len() >= 3 {
+ let mut effect = SideEffectSpec {
+ effect_type: self.parse_effect_type(parts[0]),
+ target: parts[1].to_string(),
+ condition: None,
+ description: parts[2].to_string(),
+ reversible: false,
+ };
+
+ // Check for additional attributes
+ if let Some(pos) = parts[2].find("condition=") {
+ let cond_str = &parts[2][pos + 10..];
+ if let Some(end) = cond_str.find(',') {
+ effect.condition = Some(cond_str[..end].to_string());
+ } else {
+ effect.condition = Some(cond_str.to_string());
+ }
+ }
+
+ if parts[2].contains("reversible=yes") {
+ effect.reversible = true;
+ }
+
+ spec.side_effects.push(effect);
+ }
+ }
+ // Parse state-trans
+ else if let Some(rest) = line.strip_prefix("state-trans:") {
+ let parts: Vec<&str> = rest.split(',').map(|s| s.trim()).collect();
+ if parts.len() >= 4 {
+ spec.state_transitions.push(StateTransitionSpec {
+ object: parts[0].to_string(),
+ from_state: parts[1].to_string(),
+ to_state: parts[2].to_string(),
+ condition: None,
+ description: parts[3].to_string(),
+ });
+ }
+ }
+ // Parse capability
+ else if let Some(rest) = line.strip_prefix("capability:") {
+ // Save previous capability if any
+ if let Some(cap) = current_capability.take() {
+ spec.capabilities.push(cap);
+ }
+
+ let parts: Vec<&str> = rest.split(',').map(|s| s.trim()).collect();
+ if parts.len() >= 3 {
+ current_capability = Some(CapabilitySpec {
+ capability: self.parse_capability_value(parts[0]),
+ action: parts[1].to_string(),
+ name: parts[2].to_string(),
+ allows: String::new(),
+ without_cap: String::new(),
+ check_condition: None,
+ priority: Some(0),
+ alternatives: vec![],
+ });
+ }
+ }
+ // Parse capability attributes
+ else if let Some(rest) = line.strip_prefix("capability-allows:") {
+ if let Some(cap) = current_capability.as_mut() {
+ cap.allows = self.collect_multiline_value(&lines, i, rest);
+ }
+ }
+ else if let Some(rest) = line.strip_prefix("capability-without:") {
+ if let Some(cap) = current_capability.as_mut() {
+ cap.without_cap = self.collect_multiline_value(&lines, i, rest);
+ }
+ }
+ else if let Some(rest) = line.strip_prefix("capability-condition:") {
+ if let Some(cap) = current_capability.as_mut() {
+ cap.check_condition = Some(self.collect_multiline_value(&lines, i, rest));
+ }
+ }
+ else if let Some(rest) = line.strip_prefix("capability-priority:") {
+ if let Some(cap) = current_capability.as_mut() {
+ cap.priority = rest.trim().parse().ok();
+ }
+ }
+ // Parse constraint
+ else if let Some(rest) = line.strip_prefix("constraint:") {
+ let parts: Vec<&str> = rest.splitn(2, ',').map(|s| s.trim()).collect();
+ if parts.len() >= 2 {
+ spec.constraints.push(ConstraintSpec {
+ name: parts[0].to_string(),
+ description: parts[1].to_string(),
+ expression: None,
+ });
+ }
+ }
+ // Parse constraint-expr
+ else if let Some(rest) = line.strip_prefix("constraint-expr:") {
+ let parts: Vec<&str> = rest.splitn(2, ',').map(|s| s.trim()).collect();
+ if parts.len() >= 2 {
+ // Find matching constraint and update it
+ if let Some(constraint) = spec.constraints.iter_mut().find(|c| c.name == parts[0]) {
+ constraint.expression = Some(parts[1].to_string());
+ }
+ }
+ }
+ // Parse struct-field
+ else if let Some(rest) = line.strip_prefix("struct-field:") {
+ let parts: Vec<&str> = rest.split(',').map(|s| s.trim()).collect();
+ if parts.len() >= 3 {
+ struct_fields.push(StructFieldSpec {
+ name: parts[0].to_string(),
+ field_type: self.parse_field_type(parts[1]),
+ type_name: parts[1].to_string(),
+ offset: 0,
+ size: 0,
+ flags: 0,
+ constraint_type: 0,
+ min_value: 0,
+ max_value: 0,
+ valid_mask: 0,
+ description: parts[2].to_string(),
+ });
+ }
+ }
+ // Parse struct-field-range
+ else if let Some(rest) = line.strip_prefix("struct-field-range:") {
+ let parts: Vec<&str> = rest.split(',').map(|s| s.trim()).collect();
+ if parts.len() >= 3 {
+ // Update the field with range
+ if let Some(field) = struct_fields.iter_mut().find(|f| f.name == parts[0]) {
+ field.min_value = parts[1].parse().unwrap_or(0);
+ field.max_value = parts[2].parse().unwrap_or(0);
+ field.constraint_type = 1; // KAPI_CONSTRAINT_RANGE
+ }
+ }
+ }
+ // Parse examples
+ else if let Some(rest) = line.strip_prefix("examples:") {
+ spec.examples = Some(self.collect_multiline_value(&lines, i, rest));
+ }
+ // Parse notes
+ else if let Some(rest) = line.strip_prefix("notes:") {
+ spec.notes = Some(self.collect_multiline_value(&lines, i, rest));
+ }
+ // Parse since-version
+ else if let Some(rest) = line.strip_prefix("since-version:") {
+ spec.since_version = Some(rest.trim().to_string());
+ }
+ // Parse return-type
+ else if let Some(rest) = line.strip_prefix("return-type:") {
+ if spec.return_spec.is_none() {
+ spec.return_spec = Some(ReturnSpec {
+ type_name: rest.trim().to_string(),
+ description: String::new(),
+ return_type: self.parse_param_type(rest.trim()),
+ check_type: 0,
+ success_value: None,
+ success_min: None,
+ success_max: None,
+ error_values: vec![],
+ });
+ }
+ }
+ // Parse return-check-type
+ else if let Some(rest) = line.strip_prefix("return-check-type:") {
+ if let Some(ret) = spec.return_spec.as_mut() {
+ ret.check_type = self.parse_return_check_type(rest.trim());
+ }
+ }
+ // Parse return-success
+ else if let Some(rest) = line.strip_prefix("return-success:") {
+ if let Some(ret) = spec.return_spec.as_mut() {
+ ret.success_value = rest.trim().parse().ok();
+ }
+ }
+
+ i += 1;
+ }
+
+ // Save any remaining items
+ if let Some(lock) = current_lock {
+ spec.locks.push(lock);
+ }
+ if let Some(signal) = current_signal {
+ spec.signals.push(signal);
+ }
+ if let Some(cap) = current_capability {
+ spec.capabilities.push(cap);
+ }
+
+ // Convert param_map to vec preserving order
+ let mut params: Vec<ParamSpec> = param_map.into_values().collect();
+ params.sort_by_key(|p| p.index);
+ spec.parameters = params;
+
+ // Create struct spec if we have fields
+ if !struct_fields.is_empty() {
+ spec.struct_specs.push(StructSpec {
+ name: "struct sched_attr".to_string(),
+ size: 120, // Default for sched_attr
+ alignment: 8,
+ field_count: struct_fields.len() as u32,
+ fields: struct_fields,
+ description: "Structure specification".to_string(),
+ });
+ }
+
+ Ok(spec)
+ }
+
+ fn collect_multiline_value(&self, lines: &[&str], start_idx: usize, first_part: &str) -> String {
+ let mut result = String::from(first_part.trim());
+ let mut i = start_idx + 1;
+
+ // Continue collecting lines until we hit another annotation or end
+ while i < lines.len() {
+ let line = lines[i];
+
+ // Stop if we hit another annotation (line starts with a known annotation keyword)
+ if self.is_annotation_line(line) {
+ break;
+ }
+
+ // Add continuation lines
+ if !line.trim().is_empty() && line.starts_with(" ") {
+ if !result.is_empty() {
+ result.push(' ');
+ }
+ result.push_str(line.trim());
+ } else if line.trim().is_empty() {
+ // Empty line might be part of multiline
+ i += 1;
+ continue;
+ } else {
+ // Non-continuation line, stop
+ break;
+ }
+
+ i += 1;
+ }
+
+ result
+ }
+
+ fn is_annotation_line(&self, line: &str) -> bool {
+ let annotations = [
+ "param-", "error-", "lock", "signal", "side-effect:",
+ "state-trans:", "capability", "constraint", "struct-",
+ "return-", "examples:", "notes:", "since-", "context-",
+ "long-desc:"
+ ];
+
+ for ann in &annotations {
+ if line.trim_start().starts_with(ann) {
+ return true;
+ }
+ }
+ false
+ }
+
+ fn parse_context_flags(&self, flags: &str) -> Vec<String> {
+ flags.split('|')
+ .map(|f| f.trim().to_string())
+ .collect()
+ }
+
+ fn parse_param_type(&self, type_str: &str) -> u32 {
+ match type_str {
+ "KAPI_TYPE_INT" => 1,
+ "KAPI_TYPE_UINT" => 2,
+ "KAPI_TYPE_LONG" => 3,
+ "KAPI_TYPE_ULONG" => 4,
+ "KAPI_TYPE_STRING" => 5,
+ "KAPI_TYPE_USER_PTR" => 6,
+ _ => 0,
+ }
+ }
+
+ fn parse_field_type(&self, type_str: &str) -> u32 {
+ match type_str {
+ "__s32" | "int" => 1,
+ "__u32" | "unsigned int" => 2,
+ "__s64" | "long" => 3,
+ "__u64" | "unsigned long" => 4,
+ _ => 0,
+ }
+ }
+
+ fn parse_param_flags(&self, flags: &str) -> u32 {
+ let mut result = 0;
+ for flag in flags.split('|') {
+ match flag.trim() {
+ "KAPI_PARAM_IN" => result |= 1,
+ "KAPI_PARAM_OUT" => result |= 2,
+ "KAPI_PARAM_INOUT" => result |= 3,
+ "KAPI_PARAM_USER" => result |= 4,
+ _ => {}
+ }
+ }
+ result
+ }
+
+ fn parse_lock_type(&self, type_str: &str) -> u32 {
+ match type_str {
+ "KAPI_LOCK_SPINLOCK" => 0,
+ "KAPI_LOCK_MUTEX" => 1,
+ "KAPI_LOCK_RWLOCK" => 2,
+ _ => 3,
+ }
+ }
+
+ fn parse_signal_direction(&self, dir: &str) -> u32 {
+ match dir {
+ "KAPI_SIGNAL_SEND" => 1,
+ "KAPI_SIGNAL_RECEIVE" => 2,
+ _ => 0,
+ }
+ }
+
+ fn parse_signal_action(&self, action: &str) -> u32 {
+ match action {
+ "KAPI_SIGNAL_ACTION_DEFAULT" => 0,
+ "KAPI_SIGNAL_ACTION_IGNORE" => 1,
+ "KAPI_SIGNAL_ACTION_CUSTOM" => 2,
+ _ => 0,
+ }
+ }
+
+ fn parse_signal_timing(&self, timing: &str) -> u32 {
+ match timing {
+ "KAPI_SIGNAL_TIME_BEFORE" => 0,
+ "KAPI_SIGNAL_TIME_DURING" => 1,
+ "KAPI_SIGNAL_TIME_AFTER" => 2,
+ _ => 0,
+ }
+ }
+
+ fn parse_signal_state(&self, state: &str) -> u32 {
+ match state {
+ "KAPI_SIGNAL_STATE_RUNNING" => 1,
+ "KAPI_SIGNAL_STATE_SLEEPING" => 2,
+ _ => 0,
+ }
+ }
+
+ fn parse_effect_type(&self, type_str: &str) -> u32 {
+ let mut result = 0;
+ for flag in type_str.split('|') {
+ match flag.trim() {
+ "KAPI_EFFECT_MODIFY_STATE" => result |= 1,
+ "KAPI_EFFECT_PROCESS_STATE" => result |= 2,
+ "KAPI_EFFECT_SCHEDULE" => result |= 4,
+ _ => {}
+ }
+ }
+ result
+ }
+
+ fn parse_capability_value(&self, cap: &str) -> i32 {
+ match cap {
+ "CAP_SYS_NICE" => 23,
+ _ => 0,
+ }
+ }
+
+ fn parse_return_check_type(&self, check: &str) -> u32 {
+ match check {
+ "KAPI_RETURN_ERROR_CHECK" => 1,
+ "KAPI_RETURN_SUCCESS_CHECK" => 2,
+ _ => 0,
+ }
+ }
+}
\ No newline at end of file
diff --git a/tools/kapi/src/extractor/mod.rs b/tools/kapi/src/extractor/mod.rs
new file mode 100644
index 0000000000000..4eeb03b9a4ca3
--- /dev/null
+++ b/tools/kapi/src/extractor/mod.rs
@@ -0,0 +1,464 @@
+use crate::formatter::OutputFormatter;
+use anyhow::Result;
+use std::convert::TryInto;
+use std::io::Write;
+
+pub mod debugfs;
+pub mod kerneldoc_parser;
+pub mod source_parser;
+pub mod vmlinux;
+
+pub use debugfs::DebugfsExtractor;
+pub use source_parser::SourceExtractor;
+pub use vmlinux::VmlinuxExtractor;
+
+/// Socket state specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct SocketStateSpec {
+ pub required_states: Vec<String>,
+ pub forbidden_states: Vec<String>,
+ pub resulting_state: Option<String>,
+ pub condition: Option<String>,
+ pub applicable_protocols: Option<String>,
+}
+
+/// Protocol behavior specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct ProtocolBehaviorSpec {
+ pub applicable_protocols: String,
+ pub behavior: String,
+ pub protocol_flags: Option<String>,
+ pub flag_description: Option<String>,
+}
+
+/// Address family specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct AddrFamilySpec {
+ pub family: i32,
+ pub family_name: String,
+ pub addr_struct_size: usize,
+ pub min_addr_len: usize,
+ pub max_addr_len: usize,
+ pub addr_format: Option<String>,
+ pub supports_wildcard: bool,
+ pub supports_multicast: bool,
+ pub supports_broadcast: bool,
+ pub special_addresses: Option<String>,
+ pub port_range_min: u32,
+ pub port_range_max: u32,
+}
+
+/// Buffer specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct BufferSpec {
+ pub buffer_behaviors: Option<String>,
+ pub min_buffer_size: Option<usize>,
+ pub max_buffer_size: Option<usize>,
+ pub optimal_buffer_size: Option<usize>,
+}
+
+/// Async specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct AsyncSpec {
+ pub supported_modes: Option<String>,
+ pub nonblock_errno: Option<i32>,
+}
+
+/// Capability specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct CapabilitySpec {
+ pub capability: i32,
+ pub name: String,
+ pub action: String,
+ pub allows: String,
+ pub without_cap: String,
+ pub check_condition: Option<String>,
+ pub priority: Option<u8>,
+ pub alternatives: Vec<i32>,
+}
+
+/// Parameter specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct ParamSpec {
+ pub index: u32,
+ pub name: String,
+ pub type_name: String,
+ pub description: String,
+ pub flags: u32,
+ pub param_type: u32,
+ pub constraint_type: u32,
+ pub constraint: Option<String>,
+ pub min_value: Option<i64>,
+ pub max_value: Option<i64>,
+ pub valid_mask: Option<u64>,
+ pub enum_values: Vec<String>,
+ pub size: Option<u32>,
+ pub alignment: Option<u32>,
+}
+
+/// Return value specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct ReturnSpec {
+ pub type_name: String,
+ pub description: String,
+ pub return_type: u32,
+ pub check_type: u32,
+ pub success_value: Option<i64>,
+ pub success_min: Option<i64>,
+ pub success_max: Option<i64>,
+ pub error_values: Vec<i32>,
+}
+
+/// Error specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct ErrorSpec {
+ pub error_code: i32,
+ pub name: String,
+ pub condition: String,
+ pub description: String,
+}
+
+/// Signal specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct SignalSpec {
+ pub signal_num: i32,
+ pub signal_name: String,
+ pub direction: u32,
+ pub action: u32,
+ pub target: Option<String>,
+ pub condition: Option<String>,
+ pub description: Option<String>,
+ pub timing: u32,
+ pub priority: u32,
+ pub restartable: bool,
+ pub interruptible: bool,
+ pub queue: Option<String>,
+ pub sa_flags: u32,
+ pub sa_flags_required: u32,
+ pub sa_flags_forbidden: u32,
+ pub state_required: u32,
+ pub state_forbidden: u32,
+ pub error_on_signal: Option<i32>,
+}
+
+/// Signal mask specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct SignalMaskSpec {
+ pub name: String,
+ pub description: String,
+}
+
+/// Side effect specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct SideEffectSpec {
+ pub effect_type: u32,
+ pub target: String,
+ pub condition: Option<String>,
+ pub description: String,
+ pub reversible: bool,
+}
+
+/// State transition specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct StateTransitionSpec {
+ pub object: String,
+ pub from_state: String,
+ pub to_state: String,
+ pub condition: Option<String>,
+ pub description: String,
+}
+
+/// Constraint specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct ConstraintSpec {
+ pub name: String,
+ pub description: String,
+ pub expression: Option<String>,
+}
+
+/// Lock scope enum values matching kernel enum kapi_lock_scope
+pub const KAPI_LOCK_INTERNAL: u32 = 0;
+pub const KAPI_LOCK_ACQUIRES: u32 = 1;
+pub const KAPI_LOCK_RELEASES: u32 = 2;
+pub const KAPI_LOCK_CALLER_HELD: u32 = 3;
+
+/// Lock specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct LockSpec {
+ pub lock_name: String,
+ pub lock_type: u32,
+ pub scope: u32,
+ pub description: String,
+}
+
+/// Struct field specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct StructFieldSpec {
+ pub name: String,
+ pub field_type: u32,
+ pub type_name: String,
+ pub offset: usize,
+ pub size: usize,
+ pub flags: u32,
+ pub constraint_type: u32,
+ pub min_value: i64,
+ pub max_value: i64,
+ pub valid_mask: u64,
+ pub description: String,
+}
+
+/// Struct specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct StructSpec {
+ pub name: String,
+ pub size: usize,
+ pub alignment: usize,
+ pub field_count: u32,
+ pub fields: Vec<StructFieldSpec>,
+ pub description: String,
+}
+
+/// Common API specification information that all extractors should provide
+#[derive(Debug, Clone)]
+pub struct ApiSpec {
+ pub name: String,
+ pub api_type: String,
+ pub description: Option<String>,
+ pub long_description: Option<String>,
+ pub version: Option<String>,
+ pub context_flags: Vec<String>,
+ pub param_count: Option<u32>,
+ pub error_count: Option<u32>,
+ pub examples: Option<String>,
+ pub notes: Option<String>,
+ pub since_version: Option<String>,
+ // Sysfs-specific fields
+ pub subsystem: Option<String>,
+ pub sysfs_path: Option<String>,
+ pub permissions: Option<String>,
+ // Networking-specific fields
+ pub socket_state: Option<SocketStateSpec>,
+ pub protocol_behaviors: Vec<ProtocolBehaviorSpec>,
+ pub addr_families: Vec<AddrFamilySpec>,
+ pub buffer_spec: Option<BufferSpec>,
+ pub async_spec: Option<AsyncSpec>,
+ pub net_data_transfer: Option<String>,
+ pub capabilities: Vec<CapabilitySpec>,
+ pub parameters: Vec<ParamSpec>,
+ pub return_spec: Option<ReturnSpec>,
+ pub errors: Vec<ErrorSpec>,
+ pub signals: Vec<SignalSpec>,
+ pub signal_masks: Vec<SignalMaskSpec>,
+ pub side_effects: Vec<SideEffectSpec>,
+ pub state_transitions: Vec<StateTransitionSpec>,
+ pub constraints: Vec<ConstraintSpec>,
+ pub locks: Vec<LockSpec>,
+ pub struct_specs: Vec<StructSpec>,
+}
+
+/// Trait for extracting API specifications from different sources
+pub trait ApiExtractor {
+ /// Extract all API specifications from the source
+ fn extract_all(&self) -> Result<Vec<ApiSpec>>;
+
+ /// Extract a specific API specification by name
+ fn extract_by_name(&self, name: &str) -> Result<Option<ApiSpec>>;
+
+ /// Display detailed information about a specific API
+ fn display_api_details(
+ &self,
+ api_name: &str,
+ formatter: &mut dyn OutputFormatter,
+ writer: &mut dyn Write,
+ ) -> Result<()>;
+}
+
+/// Helper function to display an ApiSpec using a formatter
+pub fn display_api_spec(
+ spec: &ApiSpec,
+ formatter: &mut dyn OutputFormatter,
+ writer: &mut dyn Write,
+) -> Result<()> {
+ formatter.begin_api_details(writer, &spec.name)?;
+
+ if let Some(desc) = &spec.description {
+ formatter.description(writer, desc)?;
+ }
+
+ if let Some(long_desc) = &spec.long_description {
+ formatter.long_description(writer, long_desc)?;
+ }
+
+ if let Some(version) = &spec.since_version {
+ formatter.since_version(writer, version)?;
+ }
+
+ if !spec.context_flags.is_empty() {
+ formatter.begin_context_flags(writer)?;
+ for flag in &spec.context_flags {
+ formatter.context_flag(writer, flag)?;
+ }
+ formatter.end_context_flags(writer)?;
+ }
+
+ if !spec.parameters.is_empty() {
+ formatter.begin_parameters(writer, spec.parameters.len().try_into().unwrap_or(u32::MAX))?;
+ for param in &spec.parameters {
+ formatter.parameter(writer, param)?;
+ }
+ formatter.end_parameters(writer)?;
+ }
+
+ if let Some(ret) = &spec.return_spec {
+ formatter.return_spec(writer, ret)?;
+ }
+
+ if !spec.errors.is_empty() {
+ formatter.begin_errors(writer, spec.errors.len().try_into().unwrap_or(u32::MAX))?;
+ for error in &spec.errors {
+ formatter.error(writer, error)?;
+ }
+ formatter.end_errors(writer)?;
+ }
+
+ if let Some(notes) = &spec.notes {
+ formatter.notes(writer, notes)?;
+ }
+
+ if let Some(examples) = &spec.examples {
+ formatter.examples(writer, examples)?;
+ }
+
+ // Display sysfs-specific fields
+ if spec.api_type == "sysfs" {
+ if let Some(subsystem) = &spec.subsystem {
+ formatter.sysfs_subsystem(writer, subsystem)?;
+ }
+ if let Some(path) = &spec.sysfs_path {
+ formatter.sysfs_path(writer, path)?;
+ }
+ if let Some(perms) = &spec.permissions {
+ formatter.sysfs_permissions(writer, perms)?;
+ }
+ }
+
+ // Display networking-specific fields
+ if let Some(socket_state) = &spec.socket_state {
+ formatter.socket_state(writer, socket_state)?;
+ }
+
+ if !spec.protocol_behaviors.is_empty() {
+ formatter.begin_protocol_behaviors(writer)?;
+ for behavior in &spec.protocol_behaviors {
+ formatter.protocol_behavior(writer, behavior)?;
+ }
+ formatter.end_protocol_behaviors(writer)?;
+ }
+
+ if !spec.addr_families.is_empty() {
+ formatter.begin_addr_families(writer)?;
+ for family in &spec.addr_families {
+ formatter.addr_family(writer, family)?;
+ }
+ formatter.end_addr_families(writer)?;
+ }
+
+ if let Some(buffer_spec) = &spec.buffer_spec {
+ formatter.buffer_spec(writer, buffer_spec)?;
+ }
+
+ if let Some(async_spec) = &spec.async_spec {
+ formatter.async_spec(writer, async_spec)?;
+ }
+
+ if let Some(net_data_transfer) = &spec.net_data_transfer {
+ formatter.net_data_transfer(writer, net_data_transfer)?;
+ }
+
+ if !spec.capabilities.is_empty() {
+ formatter.begin_capabilities(writer)?;
+ for cap in &spec.capabilities {
+ formatter.capability(writer, cap)?;
+ }
+ formatter.end_capabilities(writer)?;
+ }
+
+ // Display signals
+ if !spec.signals.is_empty() {
+ formatter.begin_signals(writer, spec.signals.len().try_into().unwrap_or(u32::MAX))?;
+ for signal in &spec.signals {
+ formatter.signal(writer, signal)?;
+ }
+ formatter.end_signals(writer)?;
+ }
+
+ // Display signal masks
+ if !spec.signal_masks.is_empty() {
+ formatter.begin_signal_masks(
+ writer,
+ spec.signal_masks.len().try_into().unwrap_or(u32::MAX),
+ )?;
+ for mask in &spec.signal_masks {
+ formatter.signal_mask(writer, mask)?;
+ }
+ formatter.end_signal_masks(writer)?;
+ }
+
+ // Display side effects
+ if !spec.side_effects.is_empty() {
+ formatter.begin_side_effects(
+ writer,
+ spec.side_effects.len().try_into().unwrap_or(u32::MAX),
+ )?;
+ for effect in &spec.side_effects {
+ formatter.side_effect(writer, effect)?;
+ }
+ formatter.end_side_effects(writer)?;
+ }
+
+ // Display state transitions
+ if !spec.state_transitions.is_empty() {
+ formatter.begin_state_transitions(
+ writer,
+ spec.state_transitions.len().try_into().unwrap_or(u32::MAX),
+ )?;
+ for trans in &spec.state_transitions {
+ formatter.state_transition(writer, trans)?;
+ }
+ formatter.end_state_transitions(writer)?;
+ }
+
+ // Display constraints
+ if !spec.constraints.is_empty() {
+ formatter.begin_constraints(
+ writer,
+ spec.constraints.len().try_into().unwrap_or(u32::MAX),
+ )?;
+ for constraint in &spec.constraints {
+ formatter.constraint(writer, constraint)?;
+ }
+ formatter.end_constraints(writer)?;
+ }
+
+ // Display locks
+ if !spec.locks.is_empty() {
+ formatter.begin_locks(writer, spec.locks.len().try_into().unwrap_or(u32::MAX))?;
+ for lock in &spec.locks {
+ formatter.lock(writer, lock)?;
+ }
+ formatter.end_locks(writer)?;
+ }
+
+ // Display struct specs
+ if !spec.struct_specs.is_empty() {
+ formatter.begin_struct_specs(writer, spec.struct_specs.len().try_into().unwrap_or(u32::MAX))?;
+ for struct_spec in &spec.struct_specs {
+ formatter.struct_spec(writer, struct_spec)?;
+ }
+ formatter.end_struct_specs(writer)?;
+ }
+
+ formatter.end_api_details(writer)?;
+
+ Ok(())
+}
diff --git a/tools/kapi/src/extractor/source_parser.rs b/tools/kapi/src/extractor/source_parser.rs
new file mode 100644
index 0000000000000..7a72b85a83bea
--- /dev/null
+++ b/tools/kapi/src/extractor/source_parser.rs
@@ -0,0 +1,213 @@
+use super::{
+ ApiExtractor, ApiSpec, display_api_spec,
+};
+use super::kerneldoc_parser::KerneldocParserImpl;
+use crate::formatter::OutputFormatter;
+use anyhow::{Context, Result};
+use regex::Regex;
+use std::fs;
+use std::io::Write;
+use std::path::Path;
+use walkdir::WalkDir;
+
+/// Extractor for kernel source files with KAPI-annotated kerneldoc
+pub struct SourceExtractor {
+ path: String,
+ parser: KerneldocParserImpl,
+ syscall_regex: Regex,
+ ioctl_regex: Regex,
+ function_regex: Regex,
+}
+
+impl SourceExtractor {
+ pub fn new(path: &str) -> Result<Self> {
+ Ok(SourceExtractor {
+ path: path.to_string(),
+ parser: KerneldocParserImpl::new(),
+ syscall_regex: Regex::new(r"SYSCALL_DEFINE\d+\((\w+)")?,
+ ioctl_regex: Regex::new(r"(?:static\s+)?long\s+(\w+_ioctl)\s*\(")?,
+ function_regex: Regex::new(
+ r"(?m)^(?:static\s+)?(?:inline\s+)?(?:(?:unsigned\s+)?(?:long|int|void|char|short|struct\s+\w+\s*\*?|[\w_]+_t)\s*\*?\s+)?(\w+)\s*\([^)]*\)",
+ )?,
+ })
+ }
+
+ fn extract_from_file(&self, path: &Path) -> Result<Vec<ApiSpec>> {
+ let content = fs::read_to_string(path)
+ .with_context(|| format!("Failed to read file: {}", path.display()))?;
+
+ self.extract_from_content(&content)
+ }
+
+ fn extract_from_content(&self, content: &str) -> Result<Vec<ApiSpec>> {
+ let mut specs = Vec::new();
+ let mut in_kerneldoc = false;
+ let mut current_doc = String::new();
+ let lines: Vec<&str> = content.lines().collect();
+ let mut i = 0;
+
+ while i < lines.len() {
+ let line = lines[i];
+
+ // Start of kerneldoc comment
+ if line.trim_start().starts_with("/**") {
+ in_kerneldoc = true;
+ current_doc.clear();
+ i += 1;
+ continue;
+ }
+
+ // Inside kerneldoc comment
+ if in_kerneldoc {
+ if line.contains("*/") {
+ in_kerneldoc = false;
+
+ // Check if this kerneldoc has KAPI annotations
+ if current_doc.contains("context-flags:") ||
+ current_doc.contains("param-count:") ||
+ current_doc.contains("side-effect:") ||
+ current_doc.contains("state-trans:") ||
+ current_doc.contains("error-code:") {
+
+ // Look ahead for the function declaration
+ if let Some((name, api_type, signature)) = self.find_function_after(&lines, i + 1) {
+ if let Ok(spec) = self.parser.parse_kerneldoc(&current_doc, &name, &api_type, Some(&signature)) {
+ specs.push(spec);
+ }
+ }
+ }
+ } else {
+ // Remove leading asterisk and preserve content
+ let cleaned = if let Some(stripped) = line.trim_start().strip_prefix("*") {
+ if let Some(no_space) = stripped.strip_prefix(' ') {
+ no_space
+ } else {
+ stripped
+ }
+ } else {
+ line.trim_start()
+ };
+ current_doc.push_str(cleaned);
+ current_doc.push('\n');
+ }
+ }
+
+ i += 1;
+ }
+
+ Ok(specs)
+ }
+
+ fn find_function_after(&self, lines: &[&str], start: usize) -> Option<(String, String, String)> {
+ for i in start..lines.len().min(start + 10) {
+ let line = lines[i];
+
+ // Skip empty lines
+ if line.trim().is_empty() {
+ continue;
+ }
+
+ // Check for SYSCALL_DEFINE
+ if let Some(caps) = self.syscall_regex.captures(line) {
+ let name = format!("sys_{}", caps.get(1).unwrap().as_str());
+ let signature = self.extract_syscall_signature(lines, i);
+ return Some((name, "syscall".to_string(), signature));
+ }
+
+ // Check for ioctl function
+ if let Some(caps) = self.ioctl_regex.captures(line) {
+ let name = caps.get(1).unwrap().as_str().to_string();
+ return Some((name, "ioctl".to_string(), line.to_string()));
+ }
+
+ // Check for regular function
+ if let Some(caps) = self.function_regex.captures(line) {
+ let name = caps.get(1).unwrap().as_str().to_string();
+ return Some((name, "function".to_string(), line.to_string()));
+ }
+
+ // Stop if we hit something that's clearly not part of the function declaration
+ if !line.starts_with(' ') && !line.starts_with('\t') && !line.trim().is_empty() {
+ break;
+ }
+ }
+
+ None
+ }
+
+ fn extract_syscall_signature(&self, lines: &[&str], start: usize) -> String {
+ // Extract the full SYSCALL_DEFINE signature
+ let mut sig = String::new();
+ let mut in_paren = false;
+ let mut paren_count = 0;
+
+ for line in lines.iter().skip(start).take(20) {
+ let line = *line;
+
+ // Start of SYSCALL_DEFINE
+ if line.contains("SYSCALL_DEFINE") {
+ if let Some(pos) = line.find('(') {
+ sig.push_str(&line[pos..]);
+ in_paren = true;
+ paren_count = line[pos..].chars().filter(|&c| c == '(').count() as isize -
+ line[pos..].chars().filter(|&c| c == ')').count() as isize;
+ }
+ } else if in_paren {
+ sig.push(' ');
+ sig.push_str(line.trim());
+ paren_count += line.chars().filter(|&c| c == '(').count() as isize;
+ paren_count -= line.chars().filter(|&c| c == ')').count() as isize;
+ }
+ // Stop once the parentheses balance; this also covers single-line definitions
+ if in_paren && paren_count <= 0 {
+ break;
+ }
+ }
+
+ sig
+ }
+}
+
+impl ApiExtractor for SourceExtractor {
+ fn extract_all(&self) -> Result<Vec<ApiSpec>> {
+ let path = Path::new(&self.path);
+ let mut all_specs = Vec::new();
+
+ if path.is_file() {
+ // Single file
+ all_specs.extend(self.extract_from_file(path)?);
+ } else if path.is_dir() {
+ // Directory - walk all .c files
+ for entry in WalkDir::new(path)
+ .into_iter()
+ .filter_map(|e| e.ok())
+ .filter(|e| e.path().extension().is_some_and(|ext| ext == "c"))
+ {
+ if let Ok(specs) = self.extract_from_file(entry.path()) {
+ all_specs.extend(specs);
+ }
+ }
+ }
+
+ Ok(all_specs)
+ }
+
+ fn extract_by_name(&self, name: &str) -> Result<Option<ApiSpec>> {
+ let all_specs = self.extract_all()?;
+ Ok(all_specs.into_iter().find(|s| s.name == name))
+ }
+
+ fn display_api_details(
+ &self,
+ api_name: &str,
+ formatter: &mut dyn OutputFormatter,
+ output: &mut dyn Write,
+ ) -> Result<()> {
+ if let Some(spec) = self.extract_by_name(api_name)? {
+ display_api_spec(&spec, formatter, output)?;
+ } else {
+ writeln!(output, "API '{}' not found", api_name)?;
+ }
+ Ok(())
+ }
+}
\ No newline at end of file
--git a/tools/kapi/src/extractor/vmlinux/binary_utils.rs b/tools/kapi/src/extractor/vmlinux/binary_utils.rs
new file mode 100644
index 0000000000000..0a51943e1c027
--- /dev/null
+++ b/tools/kapi/src/extractor/vmlinux/binary_utils.rs
@@ -0,0 +1,180 @@
+// Constants for all structure field sizes
+pub mod sizes {
+ pub const NAME: usize = 128;
+ pub const DESC: usize = 512;
+ pub const MAX_PARAMS: usize = 16;
+ pub const MAX_ERRORS: usize = 32;
+ pub const MAX_CONSTRAINTS: usize = 16;
+ pub const MAX_CAPABILITIES: usize = 8;
+ pub const MAX_SIGNALS: usize = 16;
+ pub const MAX_STRUCT_SPECS: usize = 8;
+ pub const MAX_SIDE_EFFECTS: usize = 32;
+ pub const MAX_STATE_TRANS: usize = 16;
+ pub const MAX_PROTOCOL_BEHAVIORS: usize = 8;
+ pub const MAX_ADDR_FAMILIES: usize = 8;
+}
+
+// Helper for reading data at specific offsets
+pub struct DataReader<'a> {
+ pub data: &'a [u8],
+ pub pos: usize,
+}
+
+impl<'a> DataReader<'a> {
+ pub fn new(data: &'a [u8], offset: usize) -> Self {
+ Self { data, pos: offset }
+ }
+
+ pub fn read_bytes(&mut self, len: usize) -> Option<&'a [u8]> {
+ if self.pos + len <= self.data.len() {
+ let bytes = &self.data[self.pos..self.pos + len];
+ self.pos += len;
+ Some(bytes)
+ } else {
+ None
+ }
+ }
+
+ pub fn read_cstring(&mut self, max_len: usize) -> Option<String> {
+ let bytes = self.read_bytes(max_len)?;
+ if let Some(null_pos) = bytes.iter().position(|&b| b == 0) {
+ if null_pos > 0 {
+ if let Ok(s) = std::str::from_utf8(&bytes[..null_pos]) {
+ return Some(s.to_string());
+ }
+ }
+ }
+ None
+ }
+
+ pub fn read_u32(&mut self) -> Option<u32> {
+ self.read_bytes(4).map(|b| u32::from_le_bytes(b.try_into().unwrap()))
+ }
+
+ pub fn read_u8(&mut self) -> Option<u8> {
+ self.read_bytes(1).map(|b| b[0])
+ }
+
+ pub fn read_i32(&mut self) -> Option<i32> {
+ self.read_bytes(4).map(|b| i32::from_le_bytes(b.try_into().unwrap()))
+ }
+
+ pub fn read_u64(&mut self) -> Option<u64> {
+ self.read_bytes(8).map(|b| u64::from_le_bytes(b.try_into().unwrap()))
+ }
+
+ pub fn read_i64(&mut self) -> Option<i64> {
+ self.read_bytes(8).map(|b| i64::from_le_bytes(b.try_into().unwrap()))
+ }
+
+ pub fn read_usize(&mut self) -> Option<usize> {
+ self.read_u64().map(|v| v as usize)
+ }
+
+ pub fn skip(&mut self, len: usize) {
+ self.pos = (self.pos + len).min(self.data.len());
+ }
+
+ // Helper methods for common patterns
+ pub fn read_bool(&mut self) -> Option<bool> {
+ self.read_u8().map(|v| v != 0)
+ }
+
+ pub fn read_optional_string(&mut self, max_len: usize) -> Option<String> {
+ self.read_cstring(max_len).filter(|s| !s.is_empty())
+ }
+
+ pub fn read_string_or_default(&mut self, max_len: usize) -> String {
+ self.read_cstring(max_len).unwrap_or_default()
+ }
+
+ // Skip and discard - advances position by reading and discarding
+ pub fn discard_cstring(&mut self, max_len: usize) {
+ let _ = self.read_cstring(max_len);
+ }
+
+ // Read multiple booleans at once
+ pub fn read_bools<const N: usize>(&mut self) -> Option<[bool; N]> {
+ let mut result = [false; N];
+ for item in &mut result {
+ *item = self.read_bool()?;
+ }
+ Some(result)
+ }
+
+
+}
+
+// Structure layout definitions for calculating sizes
+pub fn signal_mask_spec_layout_size() -> usize {
+ // Packed structure from struct kapi_signal_mask_spec
+ sizes::NAME + // mask_name
+ 4 * sizes::MAX_SIGNALS + // signals array
+ 4 + // signal_count
+ sizes::DESC // description
+}
+
+pub fn struct_field_layout_size() -> usize {
+ // Packed structure from struct kapi_struct_field
+ sizes::NAME + // name
+ 4 + // type (enum)
+ sizes::NAME + // type_name
+ 8 + // offset (size_t)
+ 8 + // size (size_t)
+ 4 + // flags
+ 4 + // constraint_type (enum)
+ 8 + // min_value (s64)
+ 8 + // max_value (s64)
+ 8 + // valid_mask (u64)
+ sizes::DESC + // enum_values
+ sizes::DESC // description
+}
+
+pub fn socket_state_spec_layout_size() -> usize {
+ // struct kapi_socket_state_spec
+ sizes::NAME * sizes::MAX_CONSTRAINTS + // required_states array
+ sizes::NAME * sizes::MAX_CONSTRAINTS + // forbidden_states array
+ sizes::NAME + // resulting_state
+ sizes::DESC + // condition
+ sizes::NAME + // applicable_protocols
+ 4 + // required_count
+ 4 // forbidden_count
+}
+
+pub fn protocol_behavior_spec_layout_size() -> usize {
+ // struct kapi_protocol_behavior
+ sizes::NAME + // applicable_protocols
+ sizes::DESC + // behavior
+ sizes::NAME + // protocol_flags
+ sizes::DESC // flag_description
+}
+
+pub fn buffer_spec_layout_size() -> usize {
+ // struct kapi_buffer_spec
+ sizes::DESC + // buffer_behaviors
+ 8 + // min_buffer_size (size_t)
+ 8 + // max_buffer_size (size_t)
+ 8 // optimal_buffer_size (size_t)
+}
+
+pub fn async_spec_layout_size() -> usize {
+ // struct kapi_async_spec
+ sizes::NAME + // supported_modes
+ 4 // nonblock_errno (int)
+}
+
+pub fn addr_family_spec_layout_size() -> usize {
+ // struct kapi_addr_family_spec
+ 4 + // family (int)
+ sizes::NAME + // family_name
+ 8 + // addr_struct_size (size_t)
+ 8 + // min_addr_len (size_t)
+ 8 + // max_addr_len (size_t)
+ sizes::DESC + // addr_format
+ 1 + // supports_wildcard (bool)
+ 1 + // supports_multicast (bool)
+ 1 + // supports_broadcast (bool)
+ sizes::DESC + // special_addresses
+ 4 + // port_range_min (u32)
+ 4 // port_range_max (u32)
+}
--git a/tools/kapi/src/extractor/vmlinux/magic_finder.rs b/tools/kapi/src/extractor/vmlinux/magic_finder.rs
new file mode 100644
index 0000000000000..cb7dc535801a0
--- /dev/null
+++ b/tools/kapi/src/extractor/vmlinux/magic_finder.rs
@@ -0,0 +1,102 @@
+// Magic markers for each section
+pub const MAGIC_PARAM: u32 = 0x4B415031; // 'KAP1'
+pub const MAGIC_RETURN: u32 = 0x4B415232; // 'KAR2'
+pub const MAGIC_ERROR: u32 = 0x4B414533; // 'KAE3'
+pub const MAGIC_LOCK: u32 = 0x4B414C34; // 'KAL4'
+pub const MAGIC_CONSTRAINT: u32 = 0x4B414335; // 'KAC5'
+pub const MAGIC_INFO: u32 = 0x4B414936; // 'KAI6'
+pub const MAGIC_SIGNAL: u32 = 0x4B415337; // 'KAS7'
+pub const MAGIC_SIGMASK: u32 = 0x4B414D38; // 'KAM8'
+pub const MAGIC_STRUCT: u32 = 0x4B415439; // 'KAT9'
+pub const MAGIC_EFFECT: u32 = 0x4B414641; // 'KAFA'
+pub const MAGIC_TRANS: u32 = 0x4B415442; // 'KATB'
+pub const MAGIC_CAP: u32 = 0x4B414343; // 'KACC'
+
+pub struct MagicOffsets {
+ pub param_offset: Option<usize>,
+ pub return_offset: Option<usize>,
+ pub error_offset: Option<usize>,
+ pub lock_offset: Option<usize>,
+ pub constraint_offset: Option<usize>,
+ pub info_offset: Option<usize>,
+ pub signal_offset: Option<usize>,
+ pub sigmask_offset: Option<usize>,
+ pub struct_offset: Option<usize>,
+ pub effect_offset: Option<usize>,
+ pub trans_offset: Option<usize>,
+ pub cap_offset: Option<usize>,
+}
+
+impl MagicOffsets {
+ /// Find magic markers in the provided data slice
+ /// data: slice of data to search (typically one spec's worth)
+ /// base_offset: absolute offset where this slice starts in the full buffer
+ pub fn find_in_data(data: &[u8], base_offset: usize) -> Self {
+ let mut offsets = MagicOffsets {
+ param_offset: None,
+ return_offset: None,
+ error_offset: None,
+ lock_offset: None,
+ constraint_offset: None,
+ info_offset: None,
+ signal_offset: None,
+ sigmask_offset: None,
+ struct_offset: None,
+ effect_offset: None,
+ trans_offset: None,
+ cap_offset: None,
+ };
+
+ // Scan through data looking for magic markers
+ // Only find the first occurrence of each magic to avoid cross-spec contamination
+ let mut i = 0;
+ while i + 4 <= data.len() {
+ let bytes = &data[i..i + 4];
+ let value = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
+
+ match value {
+ MAGIC_PARAM if offsets.param_offset.is_none() => {
+ offsets.param_offset = Some(base_offset + i);
+ },
+ MAGIC_RETURN if offsets.return_offset.is_none() => {
+ offsets.return_offset = Some(base_offset + i);
+ },
+ MAGIC_ERROR if offsets.error_offset.is_none() => {
+ offsets.error_offset = Some(base_offset + i);
+ },
+ MAGIC_LOCK if offsets.lock_offset.is_none() => {
+ offsets.lock_offset = Some(base_offset + i);
+ },
+ MAGIC_CONSTRAINT if offsets.constraint_offset.is_none() => {
+ offsets.constraint_offset = Some(base_offset + i);
+ },
+ MAGIC_INFO if offsets.info_offset.is_none() => {
+ offsets.info_offset = Some(base_offset + i);
+ },
+ MAGIC_SIGNAL if offsets.signal_offset.is_none() => {
+ offsets.signal_offset = Some(base_offset + i);
+ },
+ MAGIC_SIGMASK if offsets.sigmask_offset.is_none() => {
+ offsets.sigmask_offset = Some(base_offset + i);
+ },
+ MAGIC_STRUCT if offsets.struct_offset.is_none() => {
+ offsets.struct_offset = Some(base_offset + i);
+ },
+ MAGIC_EFFECT if offsets.effect_offset.is_none() => {
+ offsets.effect_offset = Some(base_offset + i);
+ },
+ MAGIC_TRANS if offsets.trans_offset.is_none() => {
+ offsets.trans_offset = Some(base_offset + i);
+ },
+ MAGIC_CAP if offsets.cap_offset.is_none() => {
+ offsets.cap_offset = Some(base_offset + i);
+ },
+ _ => {}
+ }
+
+ i += 1;
+ }
+
+ offsets
+ }
+}
\ No newline at end of file
--git a/tools/kapi/src/extractor/vmlinux/mod.rs b/tools/kapi/src/extractor/vmlinux/mod.rs
new file mode 100644
index 0000000000000..bf3da4df6e66a
--- /dev/null
+++ b/tools/kapi/src/extractor/vmlinux/mod.rs
@@ -0,0 +1,864 @@
+use super::{
+ ApiExtractor, ApiSpec, CapabilitySpec, ConstraintSpec, ErrorSpec, LockSpec, ParamSpec,
+ ReturnSpec, SideEffectSpec, SignalMaskSpec, SignalSpec, StateTransitionSpec, StructSpec,
+ StructFieldSpec,
+};
+use crate::formatter::OutputFormatter;
+use anyhow::{Context, Result};
+use goblin::elf::Elf;
+use std::convert::TryInto;
+use std::fs;
+use std::io::Write;
+
+mod binary_utils;
+mod magic_finder;
+use binary_utils::{
+ DataReader, addr_family_spec_layout_size, async_spec_layout_size, buffer_spec_layout_size,
+ protocol_behavior_spec_layout_size, signal_mask_spec_layout_size,
+ sizes, socket_state_spec_layout_size, struct_field_layout_size,
+};
+
+// Helper to convert empty strings to None
+fn opt_string(s: String) -> Option<String> {
+ if s.is_empty() { None } else { Some(s) }
+}
+
+pub struct VmlinuxExtractor {
+ kapi_data: Vec<u8>,
+ specs: Vec<KapiSpec>,
+}
+
+#[derive(Debug)]
+struct KapiSpec {
+ name: String,
+ api_type: String,
+ offset: usize,
+}
+
+impl VmlinuxExtractor {
+ pub fn new(vmlinux_path: &str) -> Result<Self> {
+ let vmlinux_data = fs::read(vmlinux_path)
+ .with_context(|| format!("Failed to read vmlinux file: {vmlinux_path}"))?;
+
+ let elf = Elf::parse(&vmlinux_data).context("Failed to parse ELF file")?;
+
+ // Find __start_kapi_specs and __stop_kapi_specs symbols first
+ let mut start_addr = None;
+ let mut stop_addr = None;
+
+ for sym in &elf.syms {
+ if let Some(name) = elf.strtab.get_at(sym.st_name) {
+ match name {
+ "__start_kapi_specs" => start_addr = Some(sym.st_value),
+ "__stop_kapi_specs" => stop_addr = Some(sym.st_value),
+ _ => {}
+ }
+ }
+ }
+
+ let start = start_addr.context("Could not find __start_kapi_specs symbol")?;
+ let stop = stop_addr.context("Could not find __stop_kapi_specs symbol")?;
+
+ if stop <= start {
+ anyhow::bail!("No kernel API specifications found in vmlinux");
+ }
+
+ // Find the section containing the kapi specs data
+ // The specs may be in .kapi_specs (standalone) or .rodata (embedded in RO_DATA)
+ let containing_section = elf
+ .section_headers
+ .iter()
+ .find(|sh| {
+ // Check if this section contains the start address
+ start >= sh.sh_addr && start < sh.sh_addr + sh.sh_size
+ })
+ .context("Could not find section containing kapi_specs data")?;
+
+ // Calculate the offset within the file
+ let section_vaddr = containing_section.sh_addr;
+ let file_offset = containing_section.sh_offset + (start - section_vaddr);
+ let data_size: usize = (stop - start)
+ .try_into()
+ .context("Data size too large for platform")?;
+
+ let file_offset_usize: usize = file_offset
+ .try_into()
+ .context("File offset too large for platform")?;
+
+ if file_offset_usize + data_size > vmlinux_data.len() {
+ anyhow::bail!("Invalid offset/size for kapi_specs data");
+ }
+
+ // Extract the raw data
+ let kapi_data = vmlinux_data[file_offset_usize..(file_offset_usize + data_size)].to_vec();
+
+ // Parse the specifications
+ let specs = parse_kapi_specs(&kapi_data)?;
+
+ Ok(VmlinuxExtractor { kapi_data, specs })
+ }
+}
+
+fn parse_kapi_specs(data: &[u8]) -> Result<Vec<KapiSpec>> {
+ let mut specs = Vec::new();
+ let mut offset = 0;
+ let mut last_found_offset = None;
+
+ // Expected offset from struct start to param_magic based on struct layout
+ let param_magic_offset = sizes::NAME + 4 + sizes::DESC + (sizes::DESC * 4) + 4;
+
+ // Find specs by validating API name and magic marker pairs
+ while offset + param_magic_offset + 4 <= data.len() {
+ // Read potential API name
+ let name_bytes = &data[offset..offset + sizes::NAME.min(data.len() - offset)];
+
+ // Find null terminator
+ let name_len = name_bytes.iter().position(|&b| b == 0).unwrap_or(0);
+
+ if name_len > 0 && name_len < 100 {
+ let name = String::from_utf8_lossy(&name_bytes[..name_len]).to_string();
+
+ // Validate API name format
+ if is_valid_api_name(&name) {
+ // Verify magic marker at expected position
+ let magic_offset = offset + param_magic_offset;
+ if magic_offset + 4 <= data.len() {
+ let magic_bytes = &data[magic_offset..magic_offset + 4];
+ let magic_value = u32::from_le_bytes([magic_bytes[0], magic_bytes[1], magic_bytes[2], magic_bytes[3]]);
+
+ if magic_value == magic_finder::MAGIC_PARAM {
+ // Avoid duplicate detection of the same spec
+ if last_found_offset.is_none() || offset >= last_found_offset.unwrap() + param_magic_offset {
+ let api_type = if name.starts_with("sys_") {
+ "syscall"
+ } else if name.ends_with("_ioctl") {
+ "ioctl"
+ } else if name.contains("sysfs") {
+ "sysfs"
+ } else {
+ "function"
+ }
+ .to_string();
+
+ specs.push(KapiSpec {
+ name: name.clone(),
+ api_type,
+ offset,
+ });
+
+ last_found_offset = Some(offset);
+ }
+ }
+ }
+ }
+ }
+
+ // Scan byte by byte to find all specs
+ offset += 1;
+ }
+
+ Ok(specs)
+}
+
+
+
+
+fn is_valid_api_name(name: &str) -> bool {
+ // Validate API name format and length
+ if name.is_empty() || name.len() < 3 || name.len() > 100 {
+ return false;
+ }
+
+ // Alphanumeric and underscore characters only
+ if !name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_') {
+ return false;
+ }
+
+ // Must start with letter or underscore
+ let first_char = name.chars().next().unwrap();
+ if !first_char.is_ascii_alphabetic() && first_char != '_' {
+ return false;
+ }
+
+ // Match common kernel API patterns
+ name.starts_with("sys_") ||
+ name.starts_with("__") ||
+ name.ends_with("_ioctl") ||
+ name.contains("_") ||
+ name.len() > 6
+}
+
+impl ApiExtractor for VmlinuxExtractor {
+ fn extract_all(&self) -> Result<Vec<ApiSpec>> {
+ Ok(self
+ .specs
+ .iter()
+ .map(|spec| {
+ // Parse the full spec for listing
+ parse_binary_to_api_spec(&self.kapi_data, spec.offset)
+ .unwrap_or_else(|_| ApiSpec {
+ name: spec.name.clone(),
+ api_type: spec.api_type.clone(),
+ description: None,
+ long_description: None,
+ version: None,
+ context_flags: vec![],
+ param_count: None,
+ error_count: None,
+ examples: None,
+ notes: None,
+ since_version: None,
+ subsystem: None,
+ sysfs_path: None,
+ permissions: None,
+ socket_state: None,
+ protocol_behaviors: vec![],
+ addr_families: vec![],
+ buffer_spec: None,
+ async_spec: None,
+ net_data_transfer: None,
+ capabilities: vec![],
+ parameters: vec![],
+ return_spec: None,
+ errors: vec![],
+ signals: vec![],
+ signal_masks: vec![],
+ side_effects: vec![],
+ state_transitions: vec![],
+ constraints: vec![],
+ locks: vec![],
+ struct_specs: vec![],
+ })
+ })
+ .collect())
+ }
+
+ fn extract_by_name(&self, api_name: &str) -> Result<Option<ApiSpec>> {
+ if let Some(spec) = self.specs.iter().find(|s| s.name == api_name) {
+ Ok(Some(parse_binary_to_api_spec(&self.kapi_data, spec.offset)?))
+ } else {
+ Ok(None)
+ }
+ }
+
+ fn display_api_details(
+ &self,
+ api_name: &str,
+ formatter: &mut dyn OutputFormatter,
+ writer: &mut dyn Write,
+ ) -> Result<()> {
+ if let Some(spec) = self.specs.iter().find(|s| s.name == api_name) {
+ let api_spec = parse_binary_to_api_spec(&self.kapi_data, spec.offset)?;
+ super::display_api_spec(&api_spec, formatter, writer)?;
+ }
+ Ok(())
+ }
+}
+
+/// Helper to read count and parse array items with optional magic offset
+fn parse_array_with_magic<T, F>(
+ reader: &mut DataReader,
+ magic_offset: Option<usize>,
+ max_items: u32,
+ parse_fn: F,
+) -> Vec<T>
+where
+ F: Fn(&mut DataReader) -> Option<T>,
+{
+ // Read count - position at magic+4 if magic offset exists
+ let count = if let Some(offset) = magic_offset {
+ reader.pos = offset + 4;
+ reader.read_u32()
+ } else {
+ reader.read_u32()
+ };
+
+ let mut items = Vec::new();
+ if let Some(count) = count {
+ // Position at start of array data if magic offset exists
+ if let Some(offset) = magic_offset {
+ reader.pos = offset + 8; // +4 for magic, +4 for count
+ }
+ // Parse items up to max_items
+ for _ in 0..count.min(max_items) as usize {
+ if let Some(item) = parse_fn(reader) {
+ items.push(item);
+ }
+ }
+ }
+ items
+}
+
+fn parse_binary_to_api_spec(data: &[u8], offset: usize) -> Result<ApiSpec> {
+ let mut reader = DataReader::new(data, offset);
+
+ // Search for magic markers in the entire spec data
+ let search_end = (offset + 0x70000).min(data.len()); // Search full spec size
+ let spec_data = &data[offset..search_end];
+
+ // Find magic markers relative to the spec start
+ let magic_offsets = magic_finder::MagicOffsets::find_in_data(spec_data, offset);
+
+ // Read fields in exact order of struct kernel_api_spec
+
+ // Read name (128 bytes)
+ let name = reader
+ .read_cstring(sizes::NAME)
+ .ok_or_else(|| anyhow::anyhow!("Failed to read API name"))?;
+
+ // Determine API type
+ let api_type = if name.starts_with("sys_") {
+ "syscall"
+ } else if name.ends_with("_ioctl") {
+ "ioctl"
+ } else if name.contains("sysfs") {
+ "sysfs"
+ } else {
+ "function"
+ }
+ .to_string();
+
+ // Read version (u32)
+ let version = reader.read_u32().map(|v| v.to_string());
+
+ // Read description (512 bytes)
+ let description = reader.read_cstring(sizes::DESC).filter(|s| !s.is_empty());
+
+ // Read long_description (2048 bytes)
+ let long_description = reader
+ .read_cstring(sizes::DESC * 4)
+ .filter(|s| !s.is_empty());
+
+ // Read context_flags (u32)
+ let context_flags = parse_context_flags(&mut reader);
+
+ // Parse params array
+ let parameters = parse_array_with_magic(
+ &mut reader,
+ magic_offsets.param_offset,
+ sizes::MAX_PARAMS as u32,
+ |r| parse_param(r, 0), // FIXME: parse_param stores this in ParamSpec.index, so every parameter reports index 0
+ );
+
+ // Read return_spec
+ let return_spec = parse_return_spec(&mut reader);
+
+ // Parse errors array
+ let errors = parse_array_with_magic(
+ &mut reader,
+ magic_offsets.error_offset,
+ sizes::MAX_ERRORS as u32,
+ parse_error,
+ );
+
+ // Parse locks array
+ let locks = parse_array_with_magic(
+ &mut reader,
+ magic_offsets.lock_offset,
+ sizes::MAX_CONSTRAINTS as u32,
+ parse_lock,
+ );
+
+ // Parse constraints array
+ let constraints = parse_array_with_magic(
+ &mut reader,
+ magic_offsets.constraint_offset,
+ sizes::MAX_CONSTRAINTS as u32,
+ parse_constraint,
+ );
+
+ // Read examples and notes - position reader at info section if magic found
+ let (examples, notes) = if let Some(info_offset) = magic_offsets.info_offset {
+ reader.pos = info_offset + 4; // +4 to skip magic
+ let examples = reader.read_cstring(sizes::DESC * 2).filter(|s| !s.is_empty());
+ let notes = reader.read_cstring(sizes::DESC * 2).filter(|s| !s.is_empty());
+ (examples, notes)
+ } else {
+ let examples = reader.read_cstring(sizes::DESC * 2).filter(|s| !s.is_empty());
+ let notes = reader.read_cstring(sizes::DESC * 2).filter(|s| !s.is_empty());
+ (examples, notes)
+ };
+
+ // Read since_version (32 bytes)
+ let since_version = reader.read_cstring(32).filter(|s| !s.is_empty());
+
+ // Skip deprecated (bool = 1 byte + 3 bytes padding) and replacement (128 bytes)
+ // These fields were removed from kernel but we need to skip them for binary compatibility
+ reader.skip(4); // deprecated + padding
+ reader.discard_cstring(sizes::NAME); // replacement
+
+ // Parse signals array
+ let signals = parse_array_with_magic(
+ &mut reader,
+ magic_offsets.signal_offset,
+ sizes::MAX_SIGNALS as u32,
+ parse_signal,
+ );
+
+ // Read signal_mask_count (u32)
+ let signal_mask_count = reader.read_u32();
+
+ // Parse signal_masks array
+ let mut signal_masks = Vec::new();
+ if let Some(count) = signal_mask_count {
+ for i in 0..sizes::MAX_SIGNALS {
+ if i < count as usize {
+ if let Some(mask) = parse_signal_mask(&mut reader) {
+ signal_masks.push(mask);
+ }
+ } else {
+ reader.skip(signal_mask_spec_layout_size());
+ }
+ }
+ } else {
+ reader.skip(signal_mask_spec_layout_size() * sizes::MAX_SIGNALS);
+ }
+
+ // Parse struct_specs array
+ let struct_specs = parse_array_with_magic(
+ &mut reader,
+ magic_offsets.struct_offset,
+ sizes::MAX_STRUCT_SPECS as u32,
+ parse_struct_spec,
+ );
+
+ // According to the C struct, the order is:
+ // side_effect_count, side_effects array, state_trans_count, state_transitions array,
+ // capability_count, capabilities array
+
+ // Parse side_effects array
+ let side_effects = parse_array_with_magic(
+ &mut reader,
+ magic_offsets.effect_offset,
+ sizes::MAX_SIDE_EFFECTS as u32,
+ parse_side_effect,
+ );
+
+ // Parse state_transitions array
+ let state_transitions = parse_array_with_magic(
+ &mut reader,
+ magic_offsets.trans_offset,
+ sizes::MAX_STATE_TRANS as u32,
+ parse_state_transition,
+ );
+
+ // Parse capabilities array
+ let capabilities = parse_array_with_magic(
+ &mut reader,
+ magic_offsets.cap_offset,
+ sizes::MAX_CAPABILITIES as u32,
+ parse_capability,
+ );
+
+ // Skip remaining network/socket fields
+ reader.skip(
+ socket_state_spec_layout_size() +
+ protocol_behavior_spec_layout_size() * sizes::MAX_PROTOCOL_BEHAVIORS +
+ 4 + // protocol_behavior_count
+ buffer_spec_layout_size() +
+ async_spec_layout_size() +
+ addr_family_spec_layout_size() * sizes::MAX_ADDR_FAMILIES +
+ 4 + // addr_family_count
+ 6 + 2 + // 6 bool flags + padding
+ sizes::DESC * 3 // 3 semantic descriptions
+ );
+
+ Ok(ApiSpec {
+ name,
+ api_type,
+ description,
+ long_description,
+ version,
+ context_flags,
+ param_count: if parameters.is_empty() { None } else { Some(parameters.len() as u32) },
+ error_count: if errors.is_empty() { None } else { Some(errors.len() as u32) },
+ examples,
+ notes,
+ since_version,
+ subsystem: None,
+ sysfs_path: None,
+ permissions: None,
+ socket_state: None,
+ protocol_behaviors: vec![],
+ addr_families: vec![],
+ buffer_spec: None,
+ async_spec: None,
+ net_data_transfer: None,
+ capabilities,
+ parameters,
+ return_spec,
+ errors,
+ signals,
+ signal_masks,
+ side_effects,
+ state_transitions,
+ constraints,
+ locks,
+ struct_specs,
+ })
+}
+
+// Helper parsing functions
+
+fn parse_context_flags(reader: &mut DataReader) -> Vec<String> {
+ const KAPI_CTX_PROCESS: u32 = 1 << 0;
+ const KAPI_CTX_SOFTIRQ: u32 = 1 << 1;
+ const KAPI_CTX_HARDIRQ: u32 = 1 << 2;
+ const KAPI_CTX_NMI: u32 = 1 << 3;
+ const KAPI_CTX_ATOMIC: u32 = 1 << 4;
+ const KAPI_CTX_SLEEPABLE: u32 = 1 << 5;
+ const KAPI_CTX_PREEMPT_DISABLED: u32 = 1 << 6;
+ const KAPI_CTX_IRQ_DISABLED: u32 = 1 << 7;
+
+ if let Some(flags) = reader.read_u32() {
+ let mut parts = Vec::new();
+
+ if flags & KAPI_CTX_PROCESS != 0 {
+ parts.push("KAPI_CTX_PROCESS");
+ }
+ if flags & KAPI_CTX_SOFTIRQ != 0 {
+ parts.push("KAPI_CTX_SOFTIRQ");
+ }
+ if flags & KAPI_CTX_HARDIRQ != 0 {
+ parts.push("KAPI_CTX_HARDIRQ");
+ }
+ if flags & KAPI_CTX_NMI != 0 {
+ parts.push("KAPI_CTX_NMI");
+ }
+ if flags & KAPI_CTX_ATOMIC != 0 {
+ parts.push("KAPI_CTX_ATOMIC");
+ }
+ if flags & KAPI_CTX_SLEEPABLE != 0 {
+ parts.push("KAPI_CTX_SLEEPABLE");
+ }
+ if flags & KAPI_CTX_PREEMPT_DISABLED != 0 {
+ parts.push("KAPI_CTX_PREEMPT_DISABLED");
+ }
+ if flags & KAPI_CTX_IRQ_DISABLED != 0 {
+ parts.push("KAPI_CTX_IRQ_DISABLED");
+ }
+
+ if !parts.is_empty() {
+ vec![parts.join(" | ")]
+ } else {
+ vec![]
+ }
+ } else {
+ vec![]
+ }
+}
+
+fn parse_param(reader: &mut DataReader, index: usize) -> Option<ParamSpec> {
+ let name = reader.read_cstring(sizes::NAME)?;
+ let type_name = reader.read_cstring(sizes::NAME)?;
+ let param_type = reader.read_u32()?;
+ let flags = reader.read_u32()?;
+ let size = reader.read_usize()?;
+ let alignment = reader.read_usize()?;
+ let min_value = reader.read_i64()?;
+ let max_value = reader.read_i64()?;
+ let valid_mask = reader.read_u64()?;
+
+ // Skip enum_values pointer (8 bytes)
+ reader.skip(8);
+ let _enum_count = reader.read_u32()?; // Must use ? to propagate errors
+ let constraint_type = reader.read_u32()?;
+ // Skip validate function pointer (8 bytes)
+ reader.skip(8);
+
+ let description = reader.read_string_or_default(sizes::DESC);
+ let constraint = reader.read_optional_string(sizes::DESC);
+ let _size_param_idx = reader.read_i32()?; // Must use ? to propagate errors
+ let _size_multiplier = reader.read_usize()?; // Must use ? to propagate errors
+
+ Some(ParamSpec {
+ index: index as u32,
+ name,
+ type_name,
+ description,
+ flags,
+ param_type,
+ constraint_type,
+ constraint,
+ min_value: Some(min_value),
+ max_value: Some(max_value),
+ valid_mask: Some(valid_mask),
+ enum_values: vec![],
+ size: Some(size as u32),
+ alignment: Some(alignment as u32),
+ })
+}
+
+fn parse_return_spec(reader: &mut DataReader) -> Option<ReturnSpec> {
+ // Read type_name, but treat empty as valid (will be empty string)
+ let type_name = reader.read_string_or_default(sizes::NAME);
+
+ // Read return_type and check_type
+ let return_type = reader.read_u32().unwrap_or(0);
+ let check_type = reader.read_u32().unwrap_or(0);
+ let success_value = reader.read_i64().unwrap_or(0);
+ let success_min = reader.read_i64().unwrap_or(0);
+ let success_max = reader.read_i64().unwrap_or(0);
+
+ // Skip error_values pointer (8 bytes)
+ reader.skip(8);
+ let _error_count = reader.read_u32().unwrap_or(0); // Don't fail on return spec
+ // Skip is_success function pointer (8 bytes)
+ reader.skip(8);
+
+ let description = reader.read_string_or_default(sizes::DESC);
+
+ // Return a spec even if type_name is empty, as long as we have some data
+ // The type_name might be a string like "KAPI_TYPE_INT" that gets stored literally
+ if type_name.is_empty() && return_type == 0 && check_type == 0 && success_value == 0 {
+ // No return spec at all
+ return None;
+ }
+
+ Some(ReturnSpec {
+ type_name,
+ description,
+ return_type,
+ check_type,
+ success_value: Some(success_value),
+ success_min: Some(success_min),
+ success_max: Some(success_max),
+ error_values: vec![],
+ })
+}
+
+fn parse_error(reader: &mut DataReader) -> Option<ErrorSpec> {
+ let error_code = reader.read_i32()?;
+ let name = reader.read_cstring(sizes::NAME)?;
+ let condition = reader.read_string_or_default(sizes::DESC);
+ let description = reader.read_string_or_default(sizes::DESC);
+
+ Some(ErrorSpec {
+ error_code,
+ name,
+ condition,
+ description,
+ })
+}
+
+fn parse_lock(reader: &mut DataReader) -> Option<LockSpec> {
+ let lock_name = reader.read_cstring(sizes::NAME)?;
+ let lock_type = reader.read_u32()?;
+ let scope = reader.read_u32()?;
+ let description = reader.read_string_or_default(sizes::DESC);
+
+ Some(LockSpec {
+ lock_name,
+ lock_type,
+ scope,
+ description,
+ })
+}
+
+fn parse_constraint(reader: &mut DataReader) -> Option<ConstraintSpec> {
+ let name = reader.read_cstring(sizes::NAME)?;
+ let description = reader.read_string_or_default(sizes::DESC);
+ let expression = reader.read_string_or_default(sizes::DESC);
+
+ // No function pointer in packed struct
+
+ Some(ConstraintSpec {
+ name,
+ description,
+ expression: opt_string(expression),
+ })
+}
+
+fn parse_signal(reader: &mut DataReader) -> Option<SignalSpec> {
+ let signal_num = reader.read_i32()?;
+ let signal_name = reader.read_cstring(32)?; // signal_name[32]
+ let direction = reader.read_u32()?;
+ let action = reader.read_u32()?;
+ let target = reader.read_optional_string(sizes::DESC); // target[512]
+ let condition = reader.read_optional_string(sizes::DESC); // condition[512]
+ let description = reader.read_optional_string(sizes::DESC); // description[512]
+ let restartable = reader.read_bool()?;
+ let sa_flags_required = reader.read_u32()?;
+ let sa_flags_forbidden = reader.read_u32()?;
+ let error_on_signal = reader.read_i32()?;
+ let _transform_to = reader.read_i32()?; // transform_to
+ let timing_bytes = reader.read_bytes(32)?; // timing[32]
+ let timing = if let Some(end) = timing_bytes.iter().position(|&b| b == 0) {
+ String::from_utf8_lossy(&timing_bytes[..end]).parse().unwrap_or(0)
+ } else {
+ 0
+ };
+ let priority = reader.read_u8()?;
+ let interruptible = reader.read_bool()?;
+ let _queue_behavior = reader.read_bytes(128)?; // queue_behavior[128]
+ let state_required = reader.read_u32()?;
+ let state_forbidden = reader.read_u32()?;
+
+ Some(SignalSpec {
+ signal_num,
+ signal_name,
+ direction,
+ action,
+ target,
+ condition,
+ description,
+ timing,
+ priority: priority as u32,
+ restartable,
+ interruptible,
+ queue: None, // queue_behavior not exposed in SignalSpec
+ sa_flags: 0, // Not directly available
+ sa_flags_required,
+ sa_flags_forbidden,
+ state_required,
+ state_forbidden,
+ error_on_signal: Some(error_on_signal),
+ })
+}
+
+fn parse_signal_mask(reader: &mut DataReader) -> Option<SignalMaskSpec> {
+ let name = reader.read_cstring(sizes::NAME)?;
+ let description = reader.read_string_or_default(sizes::DESC);
+
+ // Skip signals array
+ for _ in 0..sizes::MAX_SIGNALS {
+ reader.read_i32();
+ }
+
+ let _signal_count = reader.read_u32()?;
+
+ Some(SignalMaskSpec {
+ name,
+ description,
+ })
+}
+
+fn parse_struct_field(reader: &mut DataReader) -> Option<StructFieldSpec> {
+ let name = reader.read_cstring(sizes::NAME)?;
+ let field_type = reader.read_u32()?;
+ let type_name = reader.read_cstring(sizes::NAME)?;
+ let offset = reader.read_usize()?;
+ let size = reader.read_usize()?;
+ let flags = reader.read_u32()?;
+ let constraint_type = reader.read_u32()?;
+ let min_value = reader.read_i64()?;
+ let max_value = reader.read_i64()?;
+ let valid_mask = reader.read_u64()?;
+ // Skip enum_values field (512 bytes)
+ let _enum_values = reader.read_cstring(sizes::DESC); // Don't fail on optional field
+ let description = reader.read_string_or_default(sizes::DESC);
+
+ Some(StructFieldSpec {
+ name,
+ field_type,
+ type_name,
+ offset,
+ size,
+ flags,
+ constraint_type,
+ min_value,
+ max_value,
+ valid_mask,
+ description,
+ })
+}
+
+fn parse_struct_spec(reader: &mut DataReader) -> Option<StructSpec> {
+ let name = reader.read_cstring(sizes::NAME)?;
+ let size = reader.read_usize()?;
+ let alignment = reader.read_usize()?;
+ let field_count = reader.read_u32()?;
+
+ // Parse fields array
+ let mut fields = Vec::new();
+ for _ in 0..field_count.min(sizes::MAX_PARAMS as u32) {
+ if let Some(field) = parse_struct_field(reader) {
+ fields.push(field);
+ } else {
+ // Skip this field if we can't parse it
+ reader.skip(struct_field_layout_size());
+ }
+ }
+
+ // Skip remaining fields if any
+ let remaining = sizes::MAX_PARAMS as u32 - field_count.min(sizes::MAX_PARAMS as u32);
+ for _ in 0..remaining {
+ reader.skip(struct_field_layout_size());
+ }
+
+ let description = reader.read_string_or_default(sizes::DESC);
+
+ Some(StructSpec {
+ name,
+ size,
+ alignment,
+ field_count,
+ fields,
+ description,
+ })
+}
+
+fn parse_side_effect(reader: &mut DataReader) -> Option<SideEffectSpec> {
+ let effect_type = reader.read_u32()?;
+ let target = reader.read_cstring(sizes::NAME)?;
+ let condition = reader.read_string_or_default(sizes::DESC);
+ let description = reader.read_string_or_default(sizes::DESC);
+ let reversible = reader.read_bool()?;
+ // No padding needed for packed struct
+
+ Some(SideEffectSpec {
+ effect_type,
+ target,
+ condition: opt_string(condition),
+ description,
+ reversible,
+ })
+}
+
+fn parse_state_transition(reader: &mut DataReader) -> Option<StateTransitionSpec> {
+ let from_state = reader.read_cstring(sizes::NAME)?;
+ let to_state = reader.read_cstring(sizes::NAME)?;
+ let condition = reader.read_string_or_default(sizes::DESC);
+ let object = reader.read_cstring(sizes::NAME)?;
+ let description = reader.read_string_or_default(sizes::DESC);
+
+ Some(StateTransitionSpec {
+ object,
+ from_state,
+ to_state,
+ condition: opt_string(condition),
+ description,
+ })
+}
+
+fn parse_capability(reader: &mut DataReader) -> Option<CapabilitySpec> {
+ let capability = reader.read_i32()?;
+ let cap_name = reader.read_cstring(sizes::NAME)?;
+ let action = reader.read_u32()?;
+ let allows = reader.read_string_or_default(sizes::DESC);
+ let without_cap = reader.read_string_or_default(sizes::DESC);
+ let check_condition = reader.read_optional_string(sizes::DESC);
+ let priority = reader.read_u32()?;
+
+ let mut alternatives = Vec::new();
+ for _ in 0..sizes::MAX_CAPABILITIES {
+ if let Some(alt) = reader.read_i32() {
+ if alt != 0 {
+ alternatives.push(alt);
+ }
+ }
+ }
+
+ let _alternative_count = reader.read_u32()?; // alternative_count
+
+ Some(CapabilitySpec {
+ capability,
+ name: cap_name,
+ action: action.to_string(),
+ allows,
+ without_cap,
+ check_condition,
+ priority: Some(priority as u8),
+ alternatives,
+ })
+}
\ No newline at end of file
diff --git a/tools/kapi/src/formatter/json.rs b/tools/kapi/src/formatter/json.rs
new file mode 100644
index 0000000000000..8025467409d64
--- /dev/null
+++ b/tools/kapi/src/formatter/json.rs
@@ -0,0 +1,468 @@
+use super::OutputFormatter;
+use crate::extractor::{
+ AddrFamilySpec, AsyncSpec, BufferSpec, CapabilitySpec, ConstraintSpec, ErrorSpec, LockSpec,
+ ParamSpec, ProtocolBehaviorSpec, ReturnSpec, SideEffectSpec, SignalMaskSpec, SignalSpec,
+ SocketStateSpec, StateTransitionSpec, StructSpec,
+};
+use serde::Serialize;
+use std::io::Write;
+
+pub struct JsonFormatter {
+ data: JsonData,
+}
+
+#[derive(Serialize)]
+struct JsonData {
+ #[serde(skip_serializing_if = "Option::is_none")]
+ apis: Option<Vec<JsonApi>>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ api_details: Option<JsonApiDetails>,
+}
+
+#[derive(Serialize)]
+struct JsonApi {
+ name: String,
+ api_type: String,
+}
+
+#[derive(Serialize)]
+struct JsonApiDetails {
+ name: String,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ description: Option<String>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ long_description: Option<String>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ context_flags: Vec<String>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ examples: Option<String>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ notes: Option<String>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ since_version: Option<String>,
+ // Sysfs-specific fields
+ #[serde(skip_serializing_if = "Option::is_none")]
+ subsystem: Option<String>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ sysfs_path: Option<String>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ permissions: Option<String>,
+ // Networking-specific fields
+ #[serde(skip_serializing_if = "Option::is_none")]
+ socket_state: Option<SocketStateSpec>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ protocol_behaviors: Vec<ProtocolBehaviorSpec>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ addr_families: Vec<AddrFamilySpec>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ buffer_spec: Option<BufferSpec>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ async_spec: Option<AsyncSpec>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ net_data_transfer: Option<String>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ capabilities: Vec<CapabilitySpec>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ state_transitions: Vec<StateTransitionSpec>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ side_effects: Vec<SideEffectSpec>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ parameters: Vec<ParamSpec>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ return_spec: Option<ReturnSpec>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ errors: Vec<ErrorSpec>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ locks: Vec<LockSpec>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ struct_specs: Vec<StructSpec>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ signals: Vec<SignalSpec>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ signal_masks: Vec<SignalMaskSpec>,
+ #[serde(skip_serializing_if = "Vec::is_empty")]
+ constraints: Vec<ConstraintSpec>,
+}
+
+impl JsonFormatter {
+ pub fn new() -> Self {
+ JsonFormatter {
+ data: JsonData {
+ apis: None,
+ api_details: None,
+ },
+ }
+ }
+}
+
+impl OutputFormatter for JsonFormatter {
+ fn begin_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_document(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ let json = serde_json::to_string_pretty(&self.data)?;
+ writeln!(w, "{json}")?;
+ Ok(())
+ }
+
+ fn begin_api_list(&mut self, _w: &mut dyn Write, _title: &str) -> std::io::Result<()> {
+ self.data.apis = Some(Vec::new());
+ Ok(())
+ }
+
+ fn api_item(&mut self, _w: &mut dyn Write, name: &str, api_type: &str) -> std::io::Result<()> {
+ if let Some(apis) = &mut self.data.apis {
+ apis.push(JsonApi {
+ name: name.to_string(),
+ api_type: api_type.to_string(),
+ });
+ }
+ Ok(())
+ }
+
+ fn end_api_list(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn total_specs(&mut self, _w: &mut dyn Write, _count: usize) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_api_details(&mut self, _w: &mut dyn Write, name: &str) -> std::io::Result<()> {
+ self.data.api_details = Some(JsonApiDetails {
+ name: name.to_string(),
+ description: None,
+ long_description: None,
+ context_flags: Vec::new(),
+ examples: None,
+ notes: None,
+ since_version: None,
+ subsystem: None,
+ sysfs_path: None,
+ permissions: None,
+ socket_state: None,
+ protocol_behaviors: Vec::new(),
+ addr_families: Vec::new(),
+ buffer_spec: None,
+ async_spec: None,
+ net_data_transfer: None,
+ capabilities: Vec::new(),
+ state_transitions: Vec::new(),
+ side_effects: Vec::new(),
+ parameters: Vec::new(),
+ return_spec: None,
+ errors: Vec::new(),
+ locks: Vec::new(),
+ struct_specs: Vec::new(),
+ signals: Vec::new(),
+ signal_masks: Vec::new(),
+ constraints: Vec::new(),
+ });
+ Ok(())
+ }
+
+ fn end_api_details(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn description(&mut self, _w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.description = Some(desc.to_string());
+ }
+ Ok(())
+ }
+
+ fn long_description(&mut self, _w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.long_description = Some(desc.to_string());
+ }
+ Ok(())
+ }
+
+ fn begin_context_flags(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn context_flag(&mut self, _w: &mut dyn Write, flag: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.context_flags.push(flag.to_string());
+ }
+ Ok(())
+ }
+
+ fn end_context_flags(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_parameters(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_parameters(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_errors(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_errors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn examples(&mut self, _w: &mut dyn Write, examples: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.examples = Some(examples.to_string());
+ }
+ Ok(())
+ }
+
+ fn notes(&mut self, _w: &mut dyn Write, notes: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.notes = Some(notes.to_string());
+ }
+ Ok(())
+ }
+
+ fn since_version(&mut self, _w: &mut dyn Write, version: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.since_version = Some(version.to_string());
+ }
+ Ok(())
+ }
+
+ fn sysfs_subsystem(&mut self, _w: &mut dyn Write, subsystem: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.subsystem = Some(subsystem.to_string());
+ }
+ Ok(())
+ }
+
+ fn sysfs_path(&mut self, _w: &mut dyn Write, path: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.sysfs_path = Some(path.to_string());
+ }
+ Ok(())
+ }
+
+ fn sysfs_permissions(&mut self, _w: &mut dyn Write, perms: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.permissions = Some(perms.to_string());
+ }
+ Ok(())
+ }
+
+ // Networking-specific methods
+ fn socket_state(&mut self, _w: &mut dyn Write, state: &SocketStateSpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.socket_state = Some(state.clone());
+ }
+ Ok(())
+ }
+
+ fn begin_protocol_behaviors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn protocol_behavior(
+ &mut self,
+ _w: &mut dyn Write,
+ behavior: &ProtocolBehaviorSpec,
+ ) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.protocol_behaviors.push(behavior.clone());
+ }
+ Ok(())
+ }
+
+ fn end_protocol_behaviors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_addr_families(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn addr_family(&mut self, _w: &mut dyn Write, family: &AddrFamilySpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.addr_families.push(family.clone());
+ }
+ Ok(())
+ }
+
+ fn end_addr_families(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn buffer_spec(&mut self, _w: &mut dyn Write, spec: &BufferSpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.buffer_spec = Some(spec.clone());
+ }
+ Ok(())
+ }
+
+ fn async_spec(&mut self, _w: &mut dyn Write, spec: &AsyncSpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.async_spec = Some(spec.clone());
+ }
+ Ok(())
+ }
+
+ fn net_data_transfer(&mut self, _w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.net_data_transfer = Some(desc.to_string());
+ }
+ Ok(())
+ }
+
+ fn begin_capabilities(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn capability(&mut self, _w: &mut dyn Write, cap: &CapabilitySpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.capabilities.push(cap.clone());
+ }
+ Ok(())
+ }
+
+ fn end_capabilities(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ // Parameter, return value, and error methods
+ fn parameter(&mut self, _w: &mut dyn Write, param: &ParamSpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.parameters.push(param.clone());
+ }
+ Ok(())
+ }
+
+ fn return_spec(&mut self, _w: &mut dyn Write, ret: &ReturnSpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.return_spec = Some(ret.clone());
+ }
+ Ok(())
+ }
+
+ fn error(&mut self, _w: &mut dyn Write, error: &ErrorSpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.errors.push(error.clone());
+ }
+ Ok(())
+ }
+
+ fn begin_signals(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn signal(&mut self, _w: &mut dyn Write, signal: &SignalSpec) -> std::io::Result<()> {
+ if let Some(api_details) = &mut self.data.api_details {
+ api_details.signals.push(signal.clone());
+ }
+ Ok(())
+ }
+
+ fn end_signals(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_signal_masks(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn signal_mask(&mut self, _w: &mut dyn Write, mask: &SignalMaskSpec) -> std::io::Result<()> {
+ if let Some(api_details) = &mut self.data.api_details {
+ api_details.signal_masks.push(mask.clone());
+ }
+ Ok(())
+ }
+
+ fn end_signal_masks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_side_effects(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn side_effect(&mut self, _w: &mut dyn Write, effect: &SideEffectSpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.side_effects.push(effect.clone());
+ }
+ Ok(())
+ }
+
+ fn end_side_effects(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_state_transitions(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn state_transition(
+ &mut self,
+ _w: &mut dyn Write,
+ trans: &StateTransitionSpec,
+ ) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.state_transitions.push(trans.clone());
+ }
+ Ok(())
+ }
+
+ fn end_state_transitions(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_constraints(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn constraint(
+ &mut self,
+ _w: &mut dyn Write,
+ constraint: &ConstraintSpec,
+ ) -> std::io::Result<()> {
+ if let Some(api_details) = &mut self.data.api_details {
+ api_details.constraints.push(constraint.clone());
+ }
+ Ok(())
+ }
+
+ fn end_constraints(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_locks(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn lock(&mut self, _w: &mut dyn Write, lock: &LockSpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.locks.push(lock.clone());
+ }
+ Ok(())
+ }
+
+ fn end_locks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_struct_specs(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn struct_spec(&mut self, _w: &mut dyn Write, spec: &StructSpec) -> std::io::Result<()> {
+ if let Some(details) = &mut self.data.api_details {
+ details.struct_specs.push(spec.clone());
+ }
+ Ok(())
+ }
+
+ fn end_struct_specs(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+}
diff --git a/tools/kapi/src/formatter/mod.rs b/tools/kapi/src/formatter/mod.rs
new file mode 100644
index 0000000000000..3de8bf23bc29a
--- /dev/null
+++ b/tools/kapi/src/formatter/mod.rs
@@ -0,0 +1,140 @@
+use crate::extractor::{
+ AddrFamilySpec, AsyncSpec, BufferSpec, CapabilitySpec, ConstraintSpec, ErrorSpec, LockSpec,
+ ParamSpec, ProtocolBehaviorSpec, ReturnSpec, SideEffectSpec, SignalMaskSpec, SignalSpec,
+ SocketStateSpec, StateTransitionSpec, StructSpec,
+};
+use std::io::Write;
+
+mod json;
+mod plain;
+mod rst;
+
+pub use json::JsonFormatter;
+pub use plain::PlainFormatter;
+pub use rst::RstFormatter;
+
+#[derive(Debug, Clone, Copy, PartialEq)]
+pub enum OutputFormat {
+ Plain,
+ Json,
+ Rst,
+}
+
+impl std::str::FromStr for OutputFormat {
+ type Err = String;
+
+ fn from_str(s: &str) -> Result<Self, Self::Err> {
+ match s.to_lowercase().as_str() {
+ "plain" => Ok(OutputFormat::Plain),
+ "json" => Ok(OutputFormat::Json),
+ "rst" => Ok(OutputFormat::Rst),
+ _ => Err(format!("Unknown output format: {s}")),
+ }
+ }
+}
+
+pub trait OutputFormatter {
+ fn begin_document(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+ fn end_document(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn begin_api_list(&mut self, w: &mut dyn Write, title: &str) -> std::io::Result<()>;
+ fn api_item(&mut self, w: &mut dyn Write, name: &str, api_type: &str) -> std::io::Result<()>;
+ fn end_api_list(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn total_specs(&mut self, w: &mut dyn Write, count: usize) -> std::io::Result<()>;
+
+ fn begin_api_details(&mut self, w: &mut dyn Write, name: &str) -> std::io::Result<()>;
+ fn end_api_details(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+ fn description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()>;
+ fn long_description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()>;
+
+ fn begin_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+ fn context_flag(&mut self, w: &mut dyn Write, flag: &str) -> std::io::Result<()>;
+ fn end_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn begin_parameters(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+ fn parameter(&mut self, w: &mut dyn Write, param: &ParamSpec) -> std::io::Result<()>;
+ fn end_parameters(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn return_spec(&mut self, w: &mut dyn Write, ret: &ReturnSpec) -> std::io::Result<()>;
+
+ fn begin_errors(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+ fn error(&mut self, w: &mut dyn Write, error: &ErrorSpec) -> std::io::Result<()>;
+ fn end_errors(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn examples(&mut self, w: &mut dyn Write, examples: &str) -> std::io::Result<()>;
+ fn notes(&mut self, w: &mut dyn Write, notes: &str) -> std::io::Result<()>;
+ fn since_version(&mut self, w: &mut dyn Write, version: &str) -> std::io::Result<()>;
+
+ // Sysfs-specific methods
+ fn sysfs_subsystem(&mut self, w: &mut dyn Write, subsystem: &str) -> std::io::Result<()>;
+ fn sysfs_path(&mut self, w: &mut dyn Write, path: &str) -> std::io::Result<()>;
+ fn sysfs_permissions(&mut self, w: &mut dyn Write, perms: &str) -> std::io::Result<()>;
+
+ // Networking-specific methods
+ fn socket_state(&mut self, w: &mut dyn Write, state: &SocketStateSpec) -> std::io::Result<()>;
+
+ fn begin_protocol_behaviors(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+ fn protocol_behavior(
+ &mut self,
+ w: &mut dyn Write,
+ behavior: &ProtocolBehaviorSpec,
+ ) -> std::io::Result<()>;
+ fn end_protocol_behaviors(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn begin_addr_families(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+ fn addr_family(&mut self, w: &mut dyn Write, family: &AddrFamilySpec) -> std::io::Result<()>;
+ fn end_addr_families(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn buffer_spec(&mut self, w: &mut dyn Write, spec: &BufferSpec) -> std::io::Result<()>;
+ fn async_spec(&mut self, w: &mut dyn Write, spec: &AsyncSpec) -> std::io::Result<()>;
+ fn net_data_transfer(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()>;
+
+ fn begin_capabilities(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+ fn capability(&mut self, w: &mut dyn Write, cap: &CapabilitySpec) -> std::io::Result<()>;
+ fn end_capabilities(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ // Signal-related methods
+ fn begin_signals(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+ fn signal(&mut self, w: &mut dyn Write, signal: &SignalSpec) -> std::io::Result<()>;
+ fn end_signals(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn begin_signal_masks(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+ fn signal_mask(&mut self, w: &mut dyn Write, mask: &SignalMaskSpec) -> std::io::Result<()>;
+ fn end_signal_masks(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ // Side effects and state transitions
+ fn begin_side_effects(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+ fn side_effect(&mut self, w: &mut dyn Write, effect: &SideEffectSpec) -> std::io::Result<()>;
+ fn end_side_effects(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn begin_state_transitions(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+ fn state_transition(
+ &mut self,
+ w: &mut dyn Write,
+ trans: &StateTransitionSpec,
+ ) -> std::io::Result<()>;
+ fn end_state_transitions(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ // Constraints and locks
+ fn begin_constraints(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+ fn constraint(&mut self, w: &mut dyn Write, constraint: &ConstraintSpec)
+ -> std::io::Result<()>;
+ fn end_constraints(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn begin_locks(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+ fn lock(&mut self, w: &mut dyn Write, lock: &LockSpec) -> std::io::Result<()>;
+ fn end_locks(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+ fn begin_struct_specs(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()>;
+ fn struct_spec(&mut self, w: &mut dyn Write, spec: &StructSpec) -> std::io::Result<()>;
+ fn end_struct_specs(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+}
+
+pub fn create_formatter(format: OutputFormat) -> Box<dyn OutputFormatter> {
+ match format {
+ OutputFormat::Plain => Box::new(PlainFormatter::new()),
+ OutputFormat::Json => Box::new(JsonFormatter::new()),
+ OutputFormat::Rst => Box::new(RstFormatter::new()),
+ }
+}
diff --git a/tools/kapi/src/formatter/plain.rs b/tools/kapi/src/formatter/plain.rs
new file mode 100644
index 0000000000000..569af9fd7b09b
--- /dev/null
+++ b/tools/kapi/src/formatter/plain.rs
@@ -0,0 +1,549 @@
+use super::OutputFormatter;
+use crate::extractor::{
+ AddrFamilySpec, AsyncSpec, BufferSpec, CapabilitySpec, ConstraintSpec, ErrorSpec, LockSpec,
+ ParamSpec, ProtocolBehaviorSpec, ReturnSpec, SideEffectSpec, SignalMaskSpec, SignalSpec,
+ SocketStateSpec, StateTransitionSpec,
+};
+use std::io::Write;
+
+pub struct PlainFormatter;
+
+impl PlainFormatter {
+ pub fn new() -> Self {
+ PlainFormatter
+ }
+}
+
+impl OutputFormatter for PlainFormatter {
+ fn begin_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_api_list(&mut self, w: &mut dyn Write, title: &str) -> std::io::Result<()> {
+ writeln!(w, "\n{title}:")?;
+ writeln!(w, "{}", "-".repeat(title.len() + 1))
+ }
+
+ fn api_item(&mut self, w: &mut dyn Write, name: &str, _api_type: &str) -> std::io::Result<()> {
+ writeln!(w, " {name}")
+ }
+
+ fn end_api_list(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn total_specs(&mut self, w: &mut dyn Write, count: usize) -> std::io::Result<()> {
+ writeln!(w, "\nTotal specifications found: {count}")
+ }
+
+ fn begin_api_details(&mut self, w: &mut dyn Write, name: &str) -> std::io::Result<()> {
+ writeln!(w, "\nDetailed information for {name}:")?;
+ writeln!(w, "{}=", "=".repeat(25 + name.len()))
+ }
+
+ fn end_api_details(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ writeln!(w, "Description: {desc}")
+ }
+
+ fn long_description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ writeln!(w, "\nDetailed Description:")?;
+ writeln!(w, "{desc}")
+ }
+
+ fn begin_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ writeln!(w, "\nExecution Context:")
+ }
+
+ fn context_flag(&mut self, w: &mut dyn Write, flag: &str) -> std::io::Result<()> {
+ writeln!(w, " - {flag}")
+ }
+
+ fn end_context_flags(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_parameters(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nParameters ({count}):")
+ }
+
+ fn parameter(&mut self, w: &mut dyn Write, param: &ParamSpec) -> std::io::Result<()> {
+ writeln!(
+ w,
+ " [{}] {} ({})",
+ param.index, param.name, param.type_name
+ )?;
+ if !param.description.is_empty() {
+ writeln!(w, " {}", param.description)?;
+ }
+
+ // Display flags
+ let mut flags = Vec::new();
+ if param.flags & 0x01 != 0 {
+ flags.push("IN");
+ }
+ if param.flags & 0x02 != 0 {
+ flags.push("OUT");
+ }
+ if param.flags & 0x04 != 0 {
+ flags.push("INOUT");
+ }
+ if param.flags & 0x08 != 0 {
+ flags.push("USER");
+ }
+ if param.flags & 0x10 != 0 {
+ flags.push("OPTIONAL");
+ }
+ if !flags.is_empty() {
+ writeln!(w, " Flags: {}", flags.join(" | "))?;
+ }
+
+ // Display constraints
+ if let Some(constraint) = &param.constraint {
+ writeln!(w, " Constraint: {constraint}")?;
+ }
+ if let (Some(min), Some(max)) = (param.min_value, param.max_value) {
+ writeln!(w, " Range: {min} to {max}")?;
+ }
+ if let Some(mask) = param.valid_mask {
+ writeln!(w, " Valid mask: 0x{mask:x}")?;
+ }
+ Ok(())
+ }
+
+ fn end_parameters(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn return_spec(&mut self, w: &mut dyn Write, ret: &ReturnSpec) -> std::io::Result<()> {
+ writeln!(w, "\nReturn Value:")?;
+ writeln!(w, " Type: {}", ret.type_name)?;
+ writeln!(w, " {}", ret.description)?;
+ if let Some(val) = ret.success_value {
+ writeln!(w, " Success value: {val}")?;
+ }
+ if let (Some(min), Some(max)) = (ret.success_min, ret.success_max) {
+ writeln!(w, " Success range: {min} to {max}")?;
+ }
+ Ok(())
+ }
+
+ fn begin_errors(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nPossible Errors ({count}):")
+ }
+
+ fn error(&mut self, w: &mut dyn Write, error: &ErrorSpec) -> std::io::Result<()> {
+ writeln!(w, " {} ({})", error.name, error.error_code)?;
+ if !error.condition.is_empty() {
+ writeln!(w, " Condition: {}", error.condition)?;
+ }
+ if !error.description.is_empty() {
+ writeln!(w, " {}", error.description)?;
+ }
+ Ok(())
+ }
+
+ fn end_errors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn examples(&mut self, w: &mut dyn Write, examples: &str) -> std::io::Result<()> {
+ writeln!(w, "\nExamples:")?;
+ writeln!(w, "{examples}")
+ }
+
+ fn notes(&mut self, w: &mut dyn Write, notes: &str) -> std::io::Result<()> {
+ writeln!(w, "\nNotes:")?;
+ writeln!(w, "{notes}")
+ }
+
+ fn since_version(&mut self, w: &mut dyn Write, version: &str) -> std::io::Result<()> {
+ writeln!(w, "\nAvailable since: {version}")
+ }
+
+ fn sysfs_subsystem(&mut self, w: &mut dyn Write, subsystem: &str) -> std::io::Result<()> {
+ writeln!(w, "Subsystem: {subsystem}")
+ }
+
+ fn sysfs_path(&mut self, w: &mut dyn Write, path: &str) -> std::io::Result<()> {
+ writeln!(w, "Sysfs Path: {path}")
+ }
+
+ fn sysfs_permissions(&mut self, w: &mut dyn Write, perms: &str) -> std::io::Result<()> {
+ writeln!(w, "Permissions: {perms}")
+ }
+
+ // Networking-specific methods
+ fn socket_state(&mut self, w: &mut dyn Write, state: &SocketStateSpec) -> std::io::Result<()> {
+ writeln!(w, "\nSocket State Requirements:")?;
+ if !state.required_states.is_empty() {
+ writeln!(w, " Required states: {:?}", state.required_states)?;
+ }
+ if !state.forbidden_states.is_empty() {
+ writeln!(w, " Forbidden states: {:?}", state.forbidden_states)?;
+ }
+ if let Some(result) = &state.resulting_state {
+ writeln!(w, " Resulting state: {result}")?;
+ }
+ if let Some(cond) = &state.condition {
+ writeln!(w, " Condition: {cond}")?;
+ }
+ if let Some(protos) = &state.applicable_protocols {
+ writeln!(w, " Applicable protocols: {protos}")?;
+ }
+ Ok(())
+ }
+
+ fn begin_protocol_behaviors(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ writeln!(w, "\nProtocol-Specific Behaviors:")
+ }
+
+ fn protocol_behavior(
+ &mut self,
+ w: &mut dyn Write,
+ behavior: &ProtocolBehaviorSpec,
+ ) -> std::io::Result<()> {
+ writeln!(
+ w,
+ " {} - {}",
+ behavior.applicable_protocols, behavior.behavior
+ )?;
+ if let Some(flags) = &behavior.protocol_flags {
+ writeln!(w, " Flags: {flags}")?;
+ }
+ Ok(())
+ }
+
+ fn end_protocol_behaviors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_addr_families(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ writeln!(w, "\nSupported Address Families:")
+ }
+
+ fn addr_family(&mut self, w: &mut dyn Write, family: &AddrFamilySpec) -> std::io::Result<()> {
+ writeln!(w, " {} ({}):", family.family_name, family.family)?;
+ writeln!(w, " Struct size: {} bytes", family.addr_struct_size)?;
+ writeln!(
+ w,
+ " Address length: {}-{} bytes",
+ family.min_addr_len, family.max_addr_len
+ )?;
+ if let Some(format) = &family.addr_format {
+ writeln!(w, " Format: {format}")?;
+ }
+ writeln!(
+ w,
+ " Features: wildcard={}, multicast={}, broadcast={}",
+ family.supports_wildcard, family.supports_multicast, family.supports_broadcast
+ )?;
+ if let Some(special) = &family.special_addresses {
+ writeln!(w, " Special addresses: {special}")?;
+ }
+ if family.port_range_max > 0 {
+ writeln!(
+ w,
+ " Port range: {}-{}",
+ family.port_range_min, family.port_range_max
+ )?;
+ }
+ Ok(())
+ }
+
+ fn end_addr_families(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn buffer_spec(&mut self, w: &mut dyn Write, spec: &BufferSpec) -> std::io::Result<()> {
+ writeln!(w, "\nBuffer Specification:")?;
+ if let Some(behaviors) = &spec.buffer_behaviors {
+ writeln!(w, " Behaviors: {behaviors}")?;
+ }
+ if let Some(min) = spec.min_buffer_size {
+ writeln!(w, " Min size: {min} bytes")?;
+ }
+ if let Some(max) = spec.max_buffer_size {
+ writeln!(w, " Max size: {max} bytes")?;
+ }
+ if let Some(optimal) = spec.optimal_buffer_size {
+ writeln!(w, " Optimal size: {optimal} bytes")?;
+ }
+ Ok(())
+ }
+
+ fn async_spec(&mut self, w: &mut dyn Write, spec: &AsyncSpec) -> std::io::Result<()> {
+ writeln!(w, "\nAsynchronous Operation:")?;
+ if let Some(modes) = &spec.supported_modes {
+ writeln!(w, " Supported modes: {modes}")?;
+ }
+ if let Some(errno) = spec.nonblock_errno {
+ writeln!(w, " Non-blocking errno: {errno}")?;
+ }
+ Ok(())
+ }
+
+ fn net_data_transfer(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ writeln!(w, "\nNetwork Data Transfer: {desc}")
+ }
+
+ fn begin_capabilities(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ writeln!(w, "\nRequired Capabilities:")
+ }
+
+ fn capability(&mut self, w: &mut dyn Write, cap: &CapabilitySpec) -> std::io::Result<()> {
+ writeln!(w, " {} ({}) - {}", cap.name, cap.capability, cap.action)?;
+ if !cap.allows.is_empty() {
+ writeln!(w, " Allows: {}", cap.allows)?;
+ }
+ if !cap.without_cap.is_empty() {
+ writeln!(w, " Without capability: {}", cap.without_cap)?;
+ }
+ if let Some(cond) = &cap.check_condition {
+ writeln!(w, " Condition: {cond}")?;
+ }
+ Ok(())
+ }
+
+ fn end_capabilities(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ // Signal-related methods
+ fn begin_signals(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nSignal Specifications ({count}):")
+ }
+
+ fn signal(&mut self, w: &mut dyn Write, signal: &SignalSpec) -> std::io::Result<()> {
+ write!(w, " {} ({})", signal.signal_name, signal.signal_num)?;
+
+ // Display direction
+ let direction = match signal.direction {
+ 0 => "SEND",
+ 1 => "RECEIVE",
+ 2 => "HANDLE",
+ 3 => "IGNORE",
+ _ => "UNKNOWN",
+ };
+ write!(w, " - {direction}")?;
+
+ // Display action
+ let action = match signal.action {
+ 0 => "DEFAULT",
+ 1 => "TERMINATE",
+ 2 => "COREDUMP",
+ 3 => "STOP",
+ 4 => "CONTINUE",
+ 5 => "IGNORE",
+ 6 => "CUSTOM",
+ 7 => "DISCARD",
+ _ => "UNKNOWN",
+ };
+ writeln!(w, " - {action}")?;
+
+ if let Some(target) = &signal.target {
+ writeln!(w, " Target: {target}")?;
+ }
+ if let Some(condition) = &signal.condition {
+ writeln!(w, " Condition: {condition}")?;
+ }
+ if let Some(desc) = &signal.description {
+ writeln!(w, " {desc}")?;
+ }
+
+ // Display timing
+ let timing = match signal.timing {
+ 0 => "BEFORE",
+ 1 => "DURING",
+ 2 => "AFTER",
+ 3 => "EXIT",
+ _ => "UNKNOWN",
+ };
+ writeln!(w, " Timing: {timing}")?;
+ writeln!(w, " Priority: {}", signal.priority)?;
+
+ if signal.restartable {
+ writeln!(w, " Restartable: yes")?;
+ }
+ if signal.interruptible {
+ writeln!(w, " Interruptible: yes")?;
+ }
+ if let Some(error) = signal.error_on_signal {
+ writeln!(w, " Error on signal: {error}")?;
+ }
+ Ok(())
+ }
+
+ fn end_signals(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_signal_masks(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nSignal Masks ({count}):")
+ }
+
+ fn signal_mask(&mut self, w: &mut dyn Write, mask: &SignalMaskSpec) -> std::io::Result<()> {
+ writeln!(w, " {}", mask.name)?;
+ if !mask.description.is_empty() {
+ writeln!(w, " {}", mask.description)?;
+ }
+ Ok(())
+ }
+
+ fn end_signal_masks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ // Side effects and state transitions
+ fn begin_side_effects(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nSide Effects ({count}):")
+ }
+
+ fn side_effect(&mut self, w: &mut dyn Write, effect: &SideEffectSpec) -> std::io::Result<()> {
+ writeln!(w, " {} - {}", effect.target, effect.description)?;
+ if let Some(condition) = &effect.condition {
+ writeln!(w, " Condition: {condition}")?;
+ }
+ if effect.reversible {
+ writeln!(w, " Reversible: yes")?;
+ }
+ Ok(())
+ }
+
+ fn end_side_effects(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_state_transitions(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nState Transitions ({count}):")
+ }
+
+ fn state_transition(
+ &mut self,
+ w: &mut dyn Write,
+ trans: &StateTransitionSpec,
+ ) -> std::io::Result<()> {
+ writeln!(
+ w,
+ " {} : {} -> {}",
+ trans.object, trans.from_state, trans.to_state
+ )?;
+ if let Some(condition) = &trans.condition {
+ writeln!(w, " Condition: {condition}")?;
+ }
+ if !trans.description.is_empty() {
+ writeln!(w, " {}", trans.description)?;
+ }
+ Ok(())
+ }
+
+ fn end_state_transitions(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ // Constraints and locks
+ fn begin_constraints(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nAdditional Constraints ({count}):")
+ }
+
+ fn constraint(
+ &mut self,
+ w: &mut dyn Write,
+ constraint: &ConstraintSpec,
+ ) -> std::io::Result<()> {
+ writeln!(w, " {}", constraint.name)?;
+ if !constraint.description.is_empty() {
+ writeln!(w, " {}", constraint.description)?;
+ }
+ if let Some(expr) = &constraint.expression {
+ writeln!(w, " Expression: {expr}")?;
+ }
+ Ok(())
+ }
+
+ fn end_constraints(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_locks(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nLocking Requirements ({count}):")
+ }
+
+ fn lock(&mut self, w: &mut dyn Write, lock: &LockSpec) -> std::io::Result<()> {
+ write!(w, " {}", lock.lock_name)?;
+
+ // Display lock type
+ let lock_type = match lock.lock_type {
+ 0 => "NONE",
+ 1 => "MUTEX",
+ 2 => "SPINLOCK",
+ 3 => "RWLOCK",
+ 4 => "SEQLOCK",
+ 5 => "RCU",
+ 6 => "SEMAPHORE",
+ 7 => "CUSTOM",
+ _ => "UNKNOWN",
+ };
+ writeln!(w, " ({lock_type})")?;
+
+ let scope_str = match lock.scope {
+ 0 => "acquired and released",
+ 1 => "acquired (not released)",
+ 2 => "released (held on entry)",
+ 3 => "held by caller",
+ _ => "unknown",
+ };
+ writeln!(w, " Scope: {scope_str}")?;
+
+ if !lock.description.is_empty() {
+ writeln!(w, " {}", lock.description)?;
+ }
+ Ok(())
+ }
+
+ fn end_locks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_struct_specs(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ writeln!(w, "\nStructure Specifications ({count}):")
+ }
+
+ fn struct_spec(&mut self, w: &mut dyn Write, spec: &crate::extractor::StructSpec) -> std::io::Result<()> {
+ writeln!(w, " {} (size={}, align={}):", spec.name, spec.size, spec.alignment)?;
+ if !spec.description.is_empty() {
+ writeln!(w, " {}", spec.description)?;
+ }
+
+ if !spec.fields.is_empty() {
+ writeln!(w, " Fields ({}):", spec.field_count)?;
+ for field in &spec.fields {
+ write!(w, " - {} ({}):", field.name, field.type_name)?;
+ if !field.description.is_empty() {
+ write!(w, " {}", field.description)?;
+ }
+ writeln!(w)?;
+
+ // Show constraints if present
+ if field.min_value != 0 || field.max_value != 0 {
+ writeln!(w, " Range: [{}, {}]", field.min_value, field.max_value)?;
+ }
+ if field.valid_mask != 0 {
+ writeln!(w, " Mask: {:#x}", field.valid_mask)?;
+ }
+ }
+ }
+ Ok(())
+ }
+
+ fn end_struct_specs(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+}
diff --git a/tools/kapi/src/formatter/rst.rs b/tools/kapi/src/formatter/rst.rs
new file mode 100644
index 0000000000000..51d0be911480b
--- /dev/null
+++ b/tools/kapi/src/formatter/rst.rs
@@ -0,0 +1,621 @@
+use super::OutputFormatter;
+use crate::extractor::{
+ AddrFamilySpec, AsyncSpec, BufferSpec, CapabilitySpec, ConstraintSpec, ErrorSpec, LockSpec,
+ ParamSpec, ProtocolBehaviorSpec, ReturnSpec, SideEffectSpec, SignalMaskSpec, SignalSpec,
+ SocketStateSpec, StateTransitionSpec,
+};
+use std::io::Write;
+
+pub struct RstFormatter {
+ current_section_level: usize,
+}
+
+impl RstFormatter {
+ pub fn new() -> Self {
+ RstFormatter {
+ current_section_level: 0,
+ }
+ }
+
+ fn section_char(level: usize) -> char {
+ match level {
+ 0 => '=',
+ 1 => '-',
+ 2 => '~',
+ 3 => '^',
+ _ => '"',
+ }
+ }
+}
+
+impl OutputFormatter for RstFormatter {
+ fn begin_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_api_list(&mut self, w: &mut dyn Write, title: &str) -> std::io::Result<()> {
+ writeln!(w, "\n{title}")?;
+ writeln!(
+ w,
+ "{}",
+ Self::section_char(0).to_string().repeat(title.len())
+ )?;
+ writeln!(w)
+ }
+
+ fn api_item(&mut self, w: &mut dyn Write, name: &str, api_type: &str) -> std::io::Result<()> {
+ writeln!(w, "* **{name}** (*{api_type}*)")
+ }
+
+ fn end_api_list(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn total_specs(&mut self, w: &mut dyn Write, count: usize) -> std::io::Result<()> {
+ writeln!(w, "\n**Total specifications found:** {count}")
+ }
+
+ fn begin_api_details(&mut self, w: &mut dyn Write, name: &str) -> std::io::Result<()> {
+ self.current_section_level = 0;
+ writeln!(w, "\n{name}")?;
+ writeln!(
+ w,
+ "{}",
+ Self::section_char(0).to_string().repeat(name.len())
+ )?;
+ writeln!(w)
+ }
+
+ fn end_api_details(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ writeln!(w, "**{desc}**")?;
+ writeln!(w)
+ }
+
+ fn long_description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ writeln!(w, "{desc}")?;
+ writeln!(w)
+ }
+
+ fn begin_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = "Execution Context";
+ writeln!(w, "{title}")?;
+ writeln!(
+ w,
+ "{}",
+ Self::section_char(1).to_string().repeat(title.len())
+ )?;
+ writeln!(w)
+ }
+
+ fn context_flag(&mut self, w: &mut dyn Write, flag: &str) -> std::io::Result<()> {
+ writeln!(w, "* {flag}")
+ }
+
+ fn end_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ writeln!(w)
+ }
+
+ fn begin_parameters(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = format!("Parameters ({count})");
+ writeln!(w, "{title}")?;
+ writeln!(
+ w,
+ "{}",
+ Self::section_char(1).to_string().repeat(title.len())
+ )?;
+ writeln!(w)
+ }
+
+ fn end_parameters(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_errors(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = format!("Possible Errors ({count})");
+ writeln!(w, "{title}")?;
+ writeln!(
+ w,
+ "{}",
+ Self::section_char(1).to_string().repeat(title.len())
+ )?;
+ writeln!(w)
+ }
+
+ fn end_errors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn examples(&mut self, w: &mut dyn Write, examples: &str) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = "Examples";
+ writeln!(w, "{title}")?;
+ writeln!(
+ w,
+ "{}",
+ Self::section_char(1).to_string().repeat(title.len())
+ )?;
+ writeln!(w)?;
+ writeln!(w, ".. code-block:: c")?;
+ writeln!(w)?;
+ for line in examples.lines() {
+ writeln!(w, " {line}")?;
+ }
+ writeln!(w)
+ }
+
+ fn notes(&mut self, w: &mut dyn Write, notes: &str) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = "Notes";
+ writeln!(w, "{title}")?;
+ writeln!(
+ w,
+ "{}",
+ Self::section_char(1).to_string().repeat(title.len())
+ )?;
+ writeln!(w)?;
+ writeln!(w, "{notes}")?;
+ writeln!(w)
+ }
+
+ fn since_version(&mut self, w: &mut dyn Write, version: &str) -> std::io::Result<()> {
+ writeln!(w, ":Available since: {version}")?;
+ writeln!(w)
+ }
+
+ fn sysfs_subsystem(&mut self, w: &mut dyn Write, subsystem: &str) -> std::io::Result<()> {
+ writeln!(w, ":Subsystem: {subsystem}")?;
+ writeln!(w)
+ }
+
+ fn sysfs_path(&mut self, w: &mut dyn Write, path: &str) -> std::io::Result<()> {
+ writeln!(w, ":Sysfs Path: {path}")?;
+ writeln!(w)
+ }
+
+ fn sysfs_permissions(&mut self, w: &mut dyn Write, perms: &str) -> std::io::Result<()> {
+ writeln!(w, ":Permissions: {perms}")?;
+ writeln!(w)
+ }
+
+ // Networking-specific methods
+ fn socket_state(&mut self, w: &mut dyn Write, state: &SocketStateSpec) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = "Socket State Requirements";
+ writeln!(w, "{title}")?;
+ writeln!(
+ w,
+ "{}",
+ Self::section_char(1).to_string().repeat(title.len())
+ )?;
+ writeln!(w)?;
+
+ if !state.required_states.is_empty() {
+ writeln!(
+ w,
+ "**Required states:** {}",
+ state.required_states.join(", ")
+ )?;
+ }
+ if !state.forbidden_states.is_empty() {
+ writeln!(
+ w,
+ "**Forbidden states:** {}",
+ state.forbidden_states.join(", ")
+ )?;
+ }
+ if let Some(result) = &state.resulting_state {
+ writeln!(w, "**Resulting state:** {result}")?;
+ }
+ if let Some(cond) = &state.condition {
+ writeln!(w, "**Condition:** {cond}")?;
+ }
+ if let Some(protos) = &state.applicable_protocols {
+ writeln!(w, "**Applicable protocols:** {protos}")?;
+ }
+ writeln!(w)
+ }
+
+ fn begin_protocol_behaviors(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = "Protocol-Specific Behaviors";
+ writeln!(w, "{title}")?;
+ writeln!(
+ w,
+ "{}",
+ Self::section_char(1).to_string().repeat(title.len())
+ )?;
+ writeln!(w)
+ }
+
+ fn protocol_behavior(
+ &mut self,
+ w: &mut dyn Write,
+ behavior: &ProtocolBehaviorSpec,
+ ) -> std::io::Result<()> {
+ writeln!(w, "**{}**", behavior.applicable_protocols)?;
+ writeln!(w)?;
+ writeln!(w, "{}", behavior.behavior)?;
+ if let Some(flags) = &behavior.protocol_flags {
+ writeln!(w)?;
+ writeln!(w, "*Flags:* {flags}")?;
+ }
+ writeln!(w)
+ }
+
+ fn end_protocol_behaviors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_addr_families(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = "Supported Address Families";
+ writeln!(w, "{title}")?;
+ writeln!(
+ w,
+ "{}",
+ Self::section_char(1).to_string().repeat(title.len())
+ )?;
+ writeln!(w)
+ }
+
+ fn addr_family(&mut self, w: &mut dyn Write, family: &AddrFamilySpec) -> std::io::Result<()> {
+ writeln!(w, "**{} ({})**", family.family_name, family.family)?;
+ writeln!(w)?;
+ writeln!(w, "* **Struct size:** {} bytes", family.addr_struct_size)?;
+ writeln!(
+ w,
+ "* **Address length:** {}-{} bytes",
+ family.min_addr_len, family.max_addr_len
+ )?;
+ if let Some(format) = &family.addr_format {
+ writeln!(w, "* **Format:** ``{format}``")?;
+ }
+ writeln!(
+ w,
+ "* **Features:** wildcard={}, multicast={}, broadcast={}",
+ family.supports_wildcard, family.supports_multicast, family.supports_broadcast
+ )?;
+ if let Some(special) = &family.special_addresses {
+ writeln!(w, "* **Special addresses:** {special}")?;
+ }
+ if family.port_range_max > 0 {
+ writeln!(
+ w,
+ "* **Port range:** {}-{}",
+ family.port_range_min, family.port_range_max
+ )?;
+ }
+ writeln!(w)
+ }
+
+ fn end_addr_families(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn buffer_spec(&mut self, w: &mut dyn Write, spec: &BufferSpec) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = "Buffer Specification";
+ writeln!(w, "{title}")?;
+ writeln!(
+ w,
+ "{}",
+ Self::section_char(1).to_string().repeat(title.len())
+ )?;
+ writeln!(w)?;
+
+ if let Some(behaviors) = &spec.buffer_behaviors {
+ writeln!(w, "**Behaviors:** {behaviors}")?;
+ }
+ if let Some(min) = spec.min_buffer_size {
+ writeln!(w, "**Min size:** {min} bytes")?;
+ }
+ if let Some(max) = spec.max_buffer_size {
+ writeln!(w, "**Max size:** {max} bytes")?;
+ }
+ if let Some(optimal) = spec.optimal_buffer_size {
+ writeln!(w, "**Optimal size:** {optimal} bytes")?;
+ }
+ writeln!(w)
+ }
+
+ fn async_spec(&mut self, w: &mut dyn Write, spec: &AsyncSpec) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = "Asynchronous Operation";
+ writeln!(w, "{title}")?;
+ writeln!(
+ w,
+ "{}",
+ Self::section_char(1).to_string().repeat(title.len())
+ )?;
+ writeln!(w)?;
+
+ if let Some(modes) = &spec.supported_modes {
+ writeln!(w, "**Supported modes:** {modes}")?;
+ }
+ if let Some(errno) = spec.nonblock_errno {
+ writeln!(w, "**Non-blocking errno:** {errno}")?;
+ }
+ writeln!(w)
+ }
+
+ fn net_data_transfer(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+ writeln!(w, "**Network Data Transfer:** {desc}")?;
+ writeln!(w)
+ }
+
+ fn begin_capabilities(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = "Required Capabilities";
+ writeln!(w, "{title}")?;
+ writeln!(
+ w,
+ "{}",
+ Self::section_char(1).to_string().repeat(title.len())
+ )?;
+ writeln!(w)
+ }
+
+ fn capability(&mut self, w: &mut dyn Write, cap: &CapabilitySpec) -> std::io::Result<()> {
+ writeln!(w, "**{} ({})** - {}", cap.name, cap.capability, cap.action)?;
+ writeln!(w)?;
+ if !cap.allows.is_empty() {
+ writeln!(w, "* **Allows:** {}", cap.allows)?;
+ }
+ if !cap.without_cap.is_empty() {
+ writeln!(w, "* **Without capability:** {}", cap.without_cap)?;
+ }
+ if let Some(cond) = &cap.check_condition {
+ writeln!(w, "* **Condition:** {}", cond)?;
+ }
+ writeln!(w)
+ }
+
+ fn end_capabilities(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ // Parameters, return values, and errors
+ fn parameter(&mut self, w: &mut dyn Write, param: &ParamSpec) -> std::io::Result<()> {
+ writeln!(
+ w,
+ "**[{}] {}** (*{}*)",
+ param.index, param.name, param.type_name
+ )?;
+ writeln!(w)?;
+ writeln!(w, " {}", param.description)?;
+
+ // Display flags
+ let mut flags = Vec::new();
+ if param.flags & 0x01 != 0 {
+ flags.push("IN");
+ }
+ if param.flags & 0x02 != 0 {
+ flags.push("OUT");
+ }
+ if param.flags & 0x04 != 0 {
+ flags.push("USER");
+ }
+ if param.flags & 0x08 != 0 {
+ flags.push("OPTIONAL");
+ }
+ if !flags.is_empty() {
+ writeln!(w, " :Flags: {}", flags.join(", "))?;
+ }
+
+ if let Some(constraint) = &param.constraint {
+ writeln!(w, " :Constraint: {}", constraint)?;
+ }
+
+ if let (Some(min), Some(max)) = (param.min_value, param.max_value) {
+ writeln!(w, " :Range: {} to {}", min, max)?;
+ }
+
+ writeln!(w)
+ }
+
+ fn return_spec(&mut self, w: &mut dyn Write, ret: &ReturnSpec) -> std::io::Result<()> {
+ writeln!(w, "\nReturn Value")?;
+ writeln!(w, "{}\n", Self::section_char(1).to_string().repeat(12))?;
+ writeln!(w)?;
+ writeln!(w, ":Type: {}", ret.type_name)?;
+ writeln!(w, ":Description: {}", ret.description)?;
+ if let Some(success) = ret.success_value {
+ writeln!(w, ":Success value: {}", success)?;
+ }
+ writeln!(w)
+ }
+
+ fn error(&mut self, w: &mut dyn Write, error: &ErrorSpec) -> std::io::Result<()> {
+ writeln!(w, "**{}** ({})", error.name, error.error_code)?;
+ writeln!(w)?;
+ writeln!(w, " :Condition: {}", error.condition)?;
+ if !error.description.is_empty() {
+ writeln!(w, " :Description: {}", error.description)?;
+ }
+ writeln!(w)
+ }
+
+ fn begin_signals(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn signal(&mut self, _w: &mut dyn Write, _signal: &SignalSpec) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_signals(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_signal_masks(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn signal_mask(&mut self, _w: &mut dyn Write, _mask: &SignalMaskSpec) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_signal_masks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_side_effects(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = format!("Side Effects ({count})");
+ writeln!(w, "{title}")?;
+ writeln!(
+ w,
+ "{}\n",
+ Self::section_char(1).to_string().repeat(title.len())
+ )
+ }
+
+ fn side_effect(&mut self, w: &mut dyn Write, effect: &SideEffectSpec) -> std::io::Result<()> {
+ write!(w, "* **{}**", effect.target)?;
+ if effect.reversible {
+ write!(w, " *(reversible)*")?;
+ }
+ writeln!(w)?;
+ writeln!(w, " {}", effect.description)?;
+ if let Some(cond) = &effect.condition {
+ writeln!(w, " :Condition: {}", cond)?;
+ }
+ writeln!(w)
+ }
+
+ fn end_side_effects(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_state_transitions(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = format!("State Transitions ({count})");
+ writeln!(w, "{title}")?;
+ writeln!(
+ w,
+ "{}\n",
+ Self::section_char(1).to_string().repeat(title.len())
+ )
+ }
+
+ fn state_transition(
+ &mut self,
+ w: &mut dyn Write,
+ trans: &StateTransitionSpec,
+ ) -> std::io::Result<()> {
+ writeln!(
+ w,
+ "* **{}**: {} → {}",
+ trans.object, trans.from_state, trans.to_state
+ )?;
+ writeln!(w, " {}", trans.description)?;
+ if let Some(cond) = &trans.condition {
+ writeln!(w, " :Condition: {}", cond)?;
+ }
+ writeln!(w)
+ }
+
+ fn end_state_transitions(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_constraints(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn constraint(
+ &mut self,
+ _w: &mut dyn Write,
+ _constraint: &ConstraintSpec,
+ ) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn end_constraints(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_locks(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+ self.current_section_level = 1;
+ let title = format!("Locks ({count})");
+ writeln!(w, "{title}")?;
+ writeln!(
+ w,
+ "{}\n",
+ Self::section_char(1).to_string().repeat(title.len())
+ )
+ }
+
+ fn lock(&mut self, w: &mut dyn Write, lock: &LockSpec) -> std::io::Result<()> {
+ write!(w, "* **{}**", lock.lock_name)?;
+ let lock_type_str = match lock.lock_type {
+ 1 => " *(mutex)*",
+ 2 => " *(spinlock)*",
+ 3 => " *(rwlock)*",
+ 4 => " *(seqlock)*",
+ 5 => " *(RCU)*",
+ 6 => " *(semaphore)*",
+ _ => "",
+ };
+ writeln!(w, "{}", lock_type_str)?;
+ if !lock.description.is_empty() {
+ writeln!(w, " {}", lock.description)?;
+ }
+ writeln!(w)
+ }
+
+ fn end_locks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+
+ fn begin_struct_specs(&mut self, w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+ let title = "Structure Specifications";
+ writeln!(w, "\n{title}")?;
+ writeln!(w, "{}", Self::section_char(2).to_string().repeat(title.len()))?;
+ writeln!(w)
+ }
+
+ fn struct_spec(&mut self, w: &mut dyn Write, spec: &crate::extractor::StructSpec) -> std::io::Result<()> {
+ writeln!(w, "**{}**", spec.name)?;
+ writeln!(w)?;
+
+ if !spec.description.is_empty() {
+ writeln!(w, " {}", spec.description)?;
+ writeln!(w)?;
+ }
+
+ writeln!(w, " :Size: {} bytes", spec.size)?;
+ writeln!(w, " :Alignment: {} bytes", spec.alignment)?;
+ writeln!(w, " :Fields: {}", spec.field_count)?;
+ writeln!(w)?;
+
+ if !spec.fields.is_empty() {
+ for field in &spec.fields {
+ writeln!(w, " * **{}** ({})", field.name, field.type_name)?;
+ if !field.description.is_empty() {
+ writeln!(w, " {}", field.description)?;
+ }
+ if field.min_value != 0 || field.max_value != 0 {
+ writeln!(w, " Range: [{}, {}]", field.min_value, field.max_value)?;
+ }
+ }
+ writeln!(w)?;
+ }
+
+ Ok(())
+ }
+
+ fn end_struct_specs(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+ Ok(())
+ }
+}
diff --git a/tools/kapi/src/main.rs b/tools/kapi/src/main.rs
new file mode 100644
index 0000000000000..2d219046f3287
--- /dev/null
+++ b/tools/kapi/src/main.rs
@@ -0,0 +1,116 @@
+//! kapi - Kernel API Specification Tool
+//!
+//! This tool extracts and displays kernel API specifications from multiple sources:
+//! - Kernel source code (KAPI macros)
+//! - Compiled vmlinux binaries (`.kapi_specs` ELF section)
+//! - Running kernel via debugfs
+
+use anyhow::Result;
+use clap::Parser;
+use std::io::{self, Write};
+
+mod extractor;
+mod formatter;
+
+use extractor::{ApiExtractor, DebugfsExtractor, SourceExtractor, VmlinuxExtractor};
+use formatter::{OutputFormat, create_formatter};
+
+#[derive(Parser, Debug)]
+#[command(author, version, about, long_about = None)]
+struct Args {
+ /// Path to the vmlinux file
+ #[arg(long, value_name = "PATH", group = "input")]
+ vmlinux: Option<String>,
+
+ /// Path to kernel source directory or file
+ #[arg(long, value_name = "PATH", group = "input")]
+ source: Option<String>,
+
+ /// Path to debugfs (defaults to /sys/kernel/debug if not specified)
+ #[arg(long, value_name = "PATH", group = "input")]
+ debugfs: Option<String>,
+
+ /// Optional: Name of specific API to show details for
+ api_name: Option<String>,
+
+ /// Output format
+ #[arg(long, short = 'f', default_value = "plain")]
+ format: String,
+}
+
+fn main() -> Result<()> {
+ let args = Args::parse();
+
+ let output_format: OutputFormat = args
+ .format
+ .parse()
+ .map_err(|e: String| anyhow::anyhow!(e))?;
+
+ let extractor: Box<dyn ApiExtractor> = match (args.vmlinux, args.source, args.debugfs.clone()) {
+ (Some(vmlinux_path), None, None) => Box::new(VmlinuxExtractor::new(&vmlinux_path)?),
+ (None, Some(source_path), None) => Box::new(SourceExtractor::new(&source_path)?),
+ (None, None, Some(_) | None) => {
+ // If debugfs is specified or no input is provided, use debugfs
+ Box::new(DebugfsExtractor::new(args.debugfs)?)
+ }
+ _ => {
+ anyhow::bail!("Please specify only one of --vmlinux, --source, or --debugfs")
+ }
+ };
+
+ display_apis(extractor.as_ref(), args.api_name, output_format)
+}
+
+fn display_apis(
+ extractor: &dyn ApiExtractor,
+ api_name: Option<String>,
+ output_format: OutputFormat,
+) -> Result<()> {
+ let mut formatter = create_formatter(output_format);
+ let mut stdout = io::stdout();
+
+ formatter.begin_document(&mut stdout)?;
+
+ if let Some(api_name_req) = api_name {
+ // Use the extractor to display API details
+ if let Some(_spec) = extractor.extract_by_name(&api_name_req)? {
+ extractor.display_api_details(&api_name_req, &mut *formatter, &mut stdout)?;
+ } else if output_format == OutputFormat::Plain {
+ writeln!(stdout, "\nAPI '{}' not found.", api_name_req)?;
+ writeln!(stdout, "\nAvailable APIs:")?;
+ for spec in extractor.extract_all()? {
+ writeln!(stdout, " {} ({})", spec.name, spec.api_type)?;
+ }
+ }
+ } else {
+ // Display list of APIs using the extractor
+ let all_specs = extractor.extract_all()?;
+
+ // Helper to display API list for a specific type
+ let mut display_api_type = |api_type: &str, title: &str| -> Result<()> {
+ let filtered: Vec<_> = all_specs.iter()
+ .filter(|s| s.api_type == api_type)
+ .collect();
+
+ if !filtered.is_empty() {
+ formatter.begin_api_list(&mut stdout, title)?;
+ for spec in filtered {
+ formatter.api_item(&mut stdout, &spec.name, &spec.api_type)?;
+ }
+ formatter.end_api_list(&mut stdout)?;
+ }
+ Ok(())
+ };
+
+ display_api_type("syscall", "System Calls")?;
+ display_api_type("ioctl", "IOCTLs")?;
+ display_api_type("function", "Functions")?;
+ display_api_type("sysfs", "Sysfs Attributes")?;
+
+ formatter.total_specs(&mut stdout, all_specs.len())?;
+ }
+
+ formatter.end_document(&mut stdout)?;
+
+ Ok(())
+}
--
2.51.0
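
The RST formatter in this patch repeats the same title-plus-underline sequence in most of its begin_* methods. As a minimal sketch only, a helper along the following lines could factor that out. The names write_heading and render_heading are hypothetical and do not appear in the patch. The sketch assumes the reStructuredText rule that the underline sits directly below the title, with no blank line in between, and is at least as long as the title.

```rust
use std::io::{self, Write};

/// Hypothetical helper (not part of the patch): writes an RST section
/// title, its underline on the very next line, and a trailing blank line.
fn write_heading(w: &mut dyn Write, title: &str, underline: char) -> io::Result<()> {
    writeln!(w, "{title}")?;
    // Count characters rather than bytes so that a non-ASCII title does
    // not get an underline far longer than its text.
    writeln!(w, "{}", underline.to_string().repeat(title.chars().count()))?;
    writeln!(w)
}

/// Hypothetical convenience wrapper that renders a heading into a String,
/// which makes the exact output easy to inspect.
fn render_heading(title: &str, underline: char) -> String {
    let mut buf: Vec<u8> = Vec::new();
    write_heading(&mut buf, title, underline).expect("writing to a Vec cannot fail");
    String::from_utf8(buf).expect("heading output is valid UTF-8")
}
```

With such a helper, a method like begin_locks could reduce to formatting the title and calling write_heading(w, &title, Self::section_char(1)), which keeps the title and underline adjacent by construction.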
* [RFC PATCH v5 05/15] kernel/api: add API specification for io_setup
2025-12-18 20:42 [RFC PATCH v5 00/15] Kernel API Specification Framework Sasha Levin
` (3 preceding siblings ...)
2025-12-18 20:42 ` [RFC PATCH v5 04/15] tools/kapi: Add kernel API specification extraction tool Sasha Levin
@ 2025-12-18 20:42 ` Sasha Levin
2025-12-18 20:42 ` [RFC PATCH v5 06/15] kernel/api: add API specification for io_destroy Sasha Levin
` (9 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
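
The nr_events range documented in this patch (1 to 8388608) follows from the ring size arithmetic given in the spec: 0x10000000 bytes divided by 32 bytes per io_event. The sketch below only restates that arithmetic and the "approximately nr_events * 2, but at least num_cpus * 8" capacity estimate from the long description. It is not kernel code, and every function and constant name in it is hypothetical.

```rust
/// Byte limit used in the spec's overflow argument (0x10000000).
const RING_BYTES_LIMIT: u64 = 0x1000_0000;
/// Size of one io_event as stated in the spec (32 bytes).
const IO_EVENT_SIZE: u64 = 32;

/// Largest nr_events value accepted according to the documented range.
fn max_nr_events() -> u64 {
    RING_BYTES_LIMIT / IO_EVENT_SIZE
}

/// Mirrors the documented EINVAL conditions for nr_events: zero, or
/// larger than the internal limit.
fn check_nr_events(nr_events: u64) -> Result<(), &'static str> {
    if nr_events == 0 || nr_events > max_nr_events() {
        Err("EINVAL")
    } else {
        Ok(())
    }
}

/// Rough capacity estimate from the long description: about twice the
/// request, but never below eight slots per CPU.
fn approx_internal_capacity(nr_events: u64, num_cpus: u64) -> u64 {
    (nr_events * 2).max(num_cpus * 8)
}
```

For example, under these assumptions a request for 128 events on a 4-CPU machine would be sized at about 256 slots, while a request for a single event would still get about 32.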
fs/aio.c | 228 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 216 insertions(+), 12 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index 0a23a8c0717ff..36556e7a8e2c0 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1366,18 +1366,222 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr,
return ret;
}
-/* sys_io_setup:
- * Create an aio_context capable of receiving at least nr_events.
- * ctxp must not point to an aio_context that already exists, and
- * must be initialized to 0 prior to the call. On successful
- * creation of the aio_context, *ctxp is filled in with the resulting
- * handle. May fail with -EINVAL if *ctxp is not initialized,
- * if the specified nr_events exceeds internal limits. May fail
- * with -EAGAIN if the specified nr_events exceeds the user's limit
- * of available events. May fail with -ENOMEM if insufficient kernel
- * resources are available. May fail with -EFAULT if an invalid
- * pointer is passed for ctxp. Will fail with -ENOSYS if not
- * implemented.
+/**
+ * sys_io_setup - Create an asynchronous I/O context
+ * @nr_events: Minimum number of concurrent AIO operations the context should support
+ * @ctxp: Pointer to aio_context_t variable to receive the context handle
+ *
+ * long-desc: Creates an asynchronous I/O context capable of receiving at least
+ * nr_events concurrent operations. The context handle is returned via ctxp,
+ * which must be initialized to 0 prior to the call. The returned context
+ * handle is used with subsequent AIO operations (io_submit, io_getevents,
+ * io_cancel, io_destroy).
+ *
+ * The AIO context consists of a memory-mapped ring buffer shared between
+ * kernel and userspace for efficient completion notification. The kernel
+ * internally allocates more capacity than requested to account for percpu
+ * batching (approximately nr_events * 2, but at least num_cpus * 8).
+ *
+ * The context is bound to the calling process and cannot be shared across
+ * processes. Each process can have multiple AIO contexts, limited only by
+ * the system-wide aio-max-nr sysctl.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: nr_events
+ * type: KAPI_TYPE_UINT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_RANGE
+ * range: 1, 8388608
+ * constraint: Must be greater than 0. Internal limit of approximately 8M events
+ * prevents overflow when calculating ring buffer size (0x10000000 / 32 bytes
+ * per io_event). The kernel may allocate more capacity than requested to
+ * optimize for percpu batching.
+ *
+ * param: ctxp
+ * type: KAPI_TYPE_USER_PTR
+ * flags: KAPI_PARAM_INOUT | KAPI_PARAM_USER
+ * size: sizeof(aio_context_t)
+ * constraint-type: KAPI_CONSTRAINT_USER_PTR
+ * constraint: Must be a valid userspace pointer to an aio_context_t variable.
+ * The memory pointed to MUST be initialized to 0 before the call. On success,
+ * receives the context handle (actually the mmap address of the ring buffer).
+ * The context handle is opaque and should not be interpreted by userspace
+ * except to pass to other io_* syscalls.
+ *
+ * return:
+ * type: KAPI_TYPE_INT
+ * check-type: KAPI_RETURN_ERROR_CHECK
+ * success: 0
+ * desc: Returns 0 on success. On success, *ctxp contains the new context handle.
+ *
+ * error: EFAULT, Invalid pointer
+ * desc: The ctxp pointer is invalid, not accessible, or points to memory that
+ * cannot be read or written. Returned from get_user() when reading the
+ * initial value or from put_user() when storing the context handle.
+ *
+ * error: EINVAL, Invalid parameter
+ * desc: Either *ctxp is not initialized to 0 (indicating an existing context or
+ * uninitialized memory), or nr_events is 0, or nr_events is too large causing
+ * internal overflow when calculating ring buffer size. The internal limit is
+ * approximately 0x10000000 / sizeof(struct io_event) events.
+ *
+ * error: EAGAIN, Resource limit exceeded
+ * desc: The system-wide limit on AIO contexts would be exceeded. The limit is
+ * controlled by /proc/sys/fs/aio-max-nr (default 65536). Each context counts
+ * as nr_events toward this limit. Also returned if nr_events exceeds the
+ * current aio-max-nr value. Unlike ENOMEM, this error indicates a policy
+ * limit rather than physical resource exhaustion.
+ *
+ * error: ENOMEM, Insufficient memory
+ * desc: Kernel could not allocate required memory for the AIO context. This
+ * includes the kioctx structure, percpu data, ring buffer pages, or the
+ * anonymous file backing the ring buffer. Also returned if the kernel could
+ * not establish the memory mapping for the ring buffer, or if ioctx_table
+ * expansion failed.
+ *
+ * error: EINTR, Interrupted by signal
+ * desc: A fatal signal was received while attempting to acquire the mmap_lock
+ * for the ring buffer memory mapping. The operation was aborted before any
+ * state was modified. Only fatal signals can cause this error: SIGKILL, or
+ * another signal whose default, unhandled action terminates the process.
+ * Signals that are caught or ignored do not interrupt the operation.
+ *
+ * lock: aio_nr_lock
+ * type: KAPI_LOCK_SPINLOCK
+ * desc: Global spinlock protecting the system-wide aio_nr counter. Held briefly
+ * to check and update the system-wide AIO context count.
+ *
+ * lock: mm->ioctx_lock
+ * type: KAPI_LOCK_SPINLOCK
+ * desc: Per-mm spinlock protecting the ioctx_table. Held while adding the new
+ * context to the process's AIO context table.
+ *
+ * lock: ctx->ring_lock
+ * type: KAPI_LOCK_MUTEX
+ * desc: Per-context mutex protecting ring buffer setup. Held throughout context
+ * initialization to prevent page migration during setup, then released once
+ * the context is fully initialized.
+ *
+ * lock: mmap_lock
+ * type: KAPI_LOCK_RWLOCK
+ * desc: Process memory map write lock. Acquired via mmap_write_lock_killable()
+ * during ring buffer mmap operation. This is where EINTR can occur.
+ *
+ * signal: SIGKILL
+ * direction: KAPI_SIGNAL_RECEIVE
+ * action: KAPI_SIGNAL_ACTION_RETURN
+ * condition: Fatal signal pending during mmap_write_lock_killable
+ * desc: Fatal signals can interrupt the context creation during the mmap phase.
+ * The mmap_write_lock_killable() function checks for fatal signals and returns
+ * -EINTR if one is pending. Non-fatal signals do not interrupt this syscall.
+ * error: -EINTR
+ * timing: KAPI_SIGNAL_TIME_DURING
+ * priority: 0
+ * restartable: no
+ *
+ * side-effect: KAPI_EFFECT_ALLOC_MEMORY
+ * target: kioctx structure
+ * desc: Allocates the main AIO context structure from kioctx_cachep slab cache.
+ * Contains ring buffer metadata, locks, and request tracking.
+ * reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_ALLOC_MEMORY
+ * target: percpu kioctx_cpu structures
+ * desc: Allocates per-CPU structures for request batching via alloc_percpu().
+ * Used to reduce contention on the global request counter.
+ * reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_ALLOC_MEMORY
+ * target: ring buffer pages
+ * desc: Allocates pages for the completion event ring buffer. The ring is backed
+ * by an anonymous file on the internal "aio" filesystem and memory-mapped into
+ * the process address space.
+ * reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_RESOURCE_CREATE
+ * target: anonymous inode and file
+ * desc: Creates an anonymous inode and file on the internal aio filesystem to
+ * back the ring buffer mapping. This enables proper page migration support.
+ * reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: process virtual memory
+ * desc: Creates a new memory mapping (VMA) for the ring buffer in the process
+ * address space. The mapping is marked VM_DONTEXPAND and uses aio_ring_vm_ops.
+ * reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: mm->ioctx_table
+ * desc: Adds the new context to the process's AIO context table. The table is
+ * dynamically expanded if needed (grows by 4x each time).
+ * reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: aio_nr (global counter)
+ * desc: Increments the system-wide AIO event counter by nr_events. This counter
+ * is visible via /proc/sys/fs/aio-nr and counts toward the aio-max-nr limit.
+ * reversible: yes
+ *
+ * state-trans: process AIO state
+ * from: no AIO context (or fewer contexts)
+ * to: has AIO context
+ * condition: successful io_setup
+ * desc: Process gains an AIO context that can be used for asynchronous I/O
+ * operations. The context remains until explicitly destroyed via io_destroy
+ * or process exit.
+ *
+ * state-trans: system AIO resources
+ * from: aio_nr = N
+ * to: aio_nr = N + nr_events
+ * condition: successful io_setup
+ * desc: System-wide AIO resource counter increases. The counter tracks total
+ * requested AIO capacity across all processes.
+ *
+ * constraint: System-wide AIO limit (aio-max-nr)
+ * desc: The /proc/sys/fs/aio-max-nr sysctl (default 65536) limits the total
+ * number of AIO events system-wide. Each io_setup call adds nr_events to
+ * the aio_nr counter. If aio_nr + nr_events would exceed aio_max_nr, the
+ * call fails with EAGAIN. Administrators can increase aio-max-nr if needed.
+ * expr: aio_nr + nr_events <= aio_max_nr
+ *
+ * constraint: Per-process context limit
+ * desc: Each process can have multiple AIO contexts, limited only by the
+ * system-wide aio-max-nr limit and available memory. The ioctx_table grows
+ * dynamically to accommodate new contexts.
+ *
+ * constraint: CONFIG_AIO required
+ * desc: The kernel must be compiled with CONFIG_AIO=y for this syscall to be
+ * available. If not configured, the syscall returns -ENOSYS. This is typically
+ * enabled by default but may be disabled on embedded systems.
+ *
+ * constraint: Memory for ring buffer
+ * desc: The kernel must be able to allocate sufficient pages for the
+ * ring buffer and establish the memory mapping. Large nr_events values require
+ * more memory and may fail with ENOMEM on memory-constrained systems.
+ *
+ * examples: aio_context_t ctx = 0; io_setup(128, &ctx); // Create context for 128 events
+ * aio_context_t ctx = 0; io_setup(1024, &ctx); // Create context for 1024 events
+ *
+ * notes: The returned context handle is actually the virtual address of the ring
+ * buffer mapping in the process address space. This allows userspace libraries
+ * to directly access completion events without syscall overhead in some cases.
+ *
+ * The kernel internally doubles nr_events and ensures a minimum of num_cpus * 8
+ * events for percpu batching efficiency. This means the actual ring capacity may
+ * be significantly larger than requested.
+ *
+ * Historical note: A race condition between io_setup and io_destroy was fixed
+ * in commit 86b62a2cb4fc ("aio: fix io_setup/io_destroy race"). Earlier kernels
+ * could have the context freed while io_setup was still completing.
+ *
+ * io_uring (since Linux 5.1) is a more modern alternative that provides better
+ * performance and more features. Consider using io_uring for new applications.
+ *
+ * There is no glibc wrapper for this syscall. Use syscall(SYS_io_setup, ...) or
+ * the libaio library wrapper (note: libaio has slightly different error semantics,
+ * returning negative error numbers directly instead of -1 with errno).
+ *
+ * since-version: 2.5
*/
SYSCALL_DEFINE2(io_setup, unsigned, nr_events, aio_context_t __user *, ctxp)
{
--
2.51.0
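
The admission checks and ring sizing described in the io_setup
specification above can be sketched in userspace C. This is a minimal model
under stated assumptions, not the code in fs/aio.c: the names
model_io_setup_check() and model_min_ring_slots() are hypothetical, and the
sizing helper only gives a lower bound on ring capacity (the larger of twice
nr_events and eight slots per CPU, as the notes describe).

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model of the io_setup admission checks; not kernel code.
 * Returns 0 if the request would be admitted, or a negative errno.
 */
static int model_io_setup_check(unsigned long aio_nr, unsigned long aio_max_nr,
				unsigned int nr_events)
{
	if (nr_events == 0)
		return -EINVAL;
	/* Overflow-safe form of: aio_nr + nr_events > aio_max_nr */
	if (aio_nr > aio_max_nr || nr_events > aio_max_nr - aio_nr)
		return -EAGAIN;
	return 0;
}

/*
 * Hypothetical lower bound on ring slots: nr_events is doubled, with a
 * floor of eight slots per CPU. Any further rounding is not modeled.
 */
static unsigned long model_min_ring_slots(unsigned int nr_events,
					  unsigned int ncpus)
{
	unsigned long doubled, percpu_min;

	assert(ncpus > 0);
	doubled = 2UL * nr_events;
	percpu_min = 8UL * ncpus;
	return doubled > percpu_min ? doubled : percpu_min;
}
```

With the default aio-max-nr of 65536, a request for 128 events on an idle
system passes the check, while the same request with aio_nr already at
65500 is refused with -EAGAIN.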
* [RFC PATCH v5 06/15] kernel/api: add API specification for io_destroy
2025-12-18 20:42 [RFC PATCH v5 00/15] Kernel API Specification Framework Sasha Levin
` (4 preceding siblings ...)
2025-12-18 20:42 ` [RFC PATCH v5 05/15] kernel/api: add API specification for io_setup Sasha Levin
@ 2025-12-18 20:42 ` Sasha Levin
2025-12-18 20:42 ` [RFC PATCH v5 07/15] kernel/api: add API specification for io_submit Sasha Levin
` (8 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/aio.c | 189 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 184 insertions(+), 5 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index 36556e7a8e2c0..ff2a8527e1b85 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1646,11 +1646,190 @@ COMPAT_SYSCALL_DEFINE2(io_setup, unsigned, nr_events, u32 __user *, ctx32p)
}
#endif
-/* sys_io_destroy:
- * Destroy the aio_context specified. May cancel any outstanding
- * AIOs and block on completion. Will fail with -ENOSYS if not
- * implemented. May fail with -EINVAL if the context pointed to
- * is invalid.
+/**
+ * sys_io_destroy - Destroy an asynchronous I/O context
+ * @ctx: AIO context handle returned by io_setup
+ *
+ * long-desc: Destroys the asynchronous I/O context identified by ctx. This
+ * syscall will attempt to cancel all outstanding asynchronous I/O operations
+ * against the context and block until all operations have completed. Once
+ * this syscall returns successfully, the context handle becomes invalid and
+ * must not be used with any other io_* syscalls.
+ *
+ * The context's memory-mapped ring buffer is unmapped from the process address
+ * space, and all associated kernel resources are freed. The system-wide AIO
+ * event counter (aio_nr) is decremented by the original nr_events value that
+ * was passed to io_setup when creating this context.
+ *
+ * This syscall blocks until all in-flight I/O operations have completed. This
+ * ensures that userspace buffers passed to io_submit are no longer accessed
+ * by the kernel after io_destroy returns. The wait is NOT interruptible by
+ * signals, so callers cannot cancel this blocking behavior.
+ *
+ * If two threads call io_destroy on the same context simultaneously, only the
+ * first call will succeed; subsequent calls return -EINVAL as the context is
+ * already marked as dead.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: ctx
+ * type: KAPI_TYPE_UINT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_CUSTOM
+ * constraint: Must be a valid context handle previously returned by io_setup.
+ * The handle is actually the virtual address of the ring buffer mapping in
+ * the calling process's address space. A value of 0 is always invalid.
+ * The context must not have been previously destroyed.
+ *
+ * return:
+ * type: KAPI_TYPE_INT
+ * check-type: KAPI_RETURN_ERROR_CHECK
+ * success: 0
+ * desc: Returns 0 on success. After successful return, the context handle is
+ * invalid and all resources have been released. All outstanding I/O
+ * operations have completed.
+ *
+ * error: EINVAL, Invalid context
+ * desc: The ctx argument does not refer to a valid AIO context in the calling
+ * process. This can occur if: (1) ctx was never returned by io_setup,
+ * (2) ctx was returned by io_setup in a different process, (3) ctx was
+ * already destroyed by a previous io_destroy call, (4) ctx is 0 or an
+ * arbitrary invalid value, or (5) the ring buffer at the ctx address has
+ * been corrupted (e.g., the id field no longer matches).
+ *
+ * lock: mm->ioctx_lock
+ * type: KAPI_LOCK_SPINLOCK
+ * desc: Per-mm spinlock protecting the ioctx_table. Held briefly while
+ * marking the context as dead and removing it from the process's AIO
+ * context table.
+ *
+ * lock: RCU read lock
+ * type: KAPI_LOCK_RCU
+ * desc: RCU read-side critical section held during context lookup in
+ * lookup_ioctx(). Protects against concurrent modification of the
+ * ioctx_table.
+ *
+ * lock: ctx->ctx_lock
+ * type: KAPI_LOCK_SPINLOCK
+ * desc: Per-context spinlock held while cancelling outstanding I/O requests
+ * in free_ioctx_users(). Protects the active_reqs list.
+ *
+ * lock: mmap_lock
+ * type: KAPI_LOCK_RWLOCK
+ * desc: Process memory map write lock acquired during vm_munmap() when
+ * unmapping the ring buffer. May contend with other memory operations
+ * in the same process.
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: ctx->dead flag
+ * desc: Atomically sets the context's dead flag to 1, marking it as being
+ * destroyed. This prevents new I/O submissions and ensures subsequent
+ * io_destroy calls return -EINVAL.
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: mm->ioctx_table
+ * desc: Removes the context from the process's AIO context table by setting
+ * the corresponding table entry to NULL. After this, lookup_ioctx will
+ * no longer find this context.
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: aio_nr (global counter)
+ * desc: Decrements the system-wide AIO event counter by the context's
+ * max_reqs value (the nr_events originally passed to io_setup). This
+ * counter is visible via /proc/sys/fs/aio-nr.
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: process virtual memory
+ * desc: Unmaps the ring buffer from the process's address space via
+ * vm_munmap(). The memory region at ctx becomes invalid.
+ * condition: ctx->mmap_size > 0
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FREE_MEMORY
+ * target: kioctx structure and associated resources
+ * desc: Frees the AIO context structure, percpu data, ring buffer pages, and
+ * the anonymous file backing the ring buffer. Deferred via RCU work queue
+ * to ensure safe cleanup after all references are dropped.
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_SIGNAL_SEND
+ * target: outstanding AIO operations
+ * desc: Cancels all outstanding asynchronous I/O operations by invoking their
+ * ki_cancel callbacks. The specific effect depends on the operation type
+ * (read, write, fsync, poll).
+ * condition: active_reqs list is not empty
+ * reversible: no
+ *
+ * state-trans: AIO context state
+ * from: alive (ctx->dead == 0)
+ * to: dead (ctx->dead == 1)
+ * condition: successful atomic exchange in kill_ioctx
+ * desc: The context transitions from usable to destroyed. Once dead, the
+ * context cannot be used for any operations and will be freed after all
+ * references are dropped.
+ *
+ * state-trans: process AIO state
+ * from: has AIO context(s)
+ * to: context removed (or no contexts)
+ * condition: successful io_destroy
+ * desc: The destroyed context is removed from the process's context table.
+ * If this was the only context, the process no longer has any active
+ * AIO contexts.
+ *
+ * state-trans: system AIO resources
+ * from: aio_nr = N
+ * to: aio_nr = N - max_reqs
+ * condition: successful io_destroy
+ * desc: System-wide AIO resource counter decreases, making room for other
+ * processes to create new AIO contexts.
+ *
+ * constraint: CONFIG_AIO required
+ * desc: The kernel must be compiled with CONFIG_AIO=y for this syscall to be
+ * available. If not configured, the syscall returns -ENOSYS. This is
+ * typically enabled by default but may be disabled on embedded systems.
+ *
+ * constraint: Context must belong to calling process
+ * desc: Each AIO context is bound to a specific process (mm_struct). A context
+ * created by one process cannot be destroyed by another process, even if
+ * the context handle value is somehow known.
+ * expr: ctx belongs to current->mm
+ *
+ * examples: io_destroy(ctx); // Destroy context and wait for completion
+ * if (io_destroy(ctx) == -EINVAL) handle_error(); // libaio wrapper: invalid context
+ *
+ * notes: The man page documents EFAULT as a possible error, but code analysis
+ * shows that EFAULT conditions (e.g., invalid ring buffer pointer) actually
+ * result in EINVAL being returned, as lookup_ioctx returns NULL on any
+ * failure to access the ring buffer header.
+ *
+ * This syscall blocks in TASK_UNINTERRUPTIBLE state while waiting for
+ * outstanding I/O operations to complete. This means the process cannot be
+ * interrupted by signals during this wait. In extreme cases with very slow
+ * I/O devices, this could cause the process to appear hung.
+ *
+ * Historical note: Before kernel 3.11, io_destroy blocked waiting for I/O
+ * completion. A refactoring in 3.11 accidentally removed this behavior,
+ * creating a race where userspace buffers could be freed while the kernel
+ * was still using them. This was fixed by commit e02ba72aabfa that blocks
+ * io_destroy until all context requests are completed.
+ *
+ * Race condition handling: A race between io_destroy and io_submit was fixed
+ * by commit 7137c6bd4552. A race between io_setup and io_destroy was fixed
+ * by commit 86b62a2cb4fc. Both fixes ensure proper synchronization via
+ * reference counting.
+ *
+ * io_uring (since Linux 5.1) is a more modern alternative that provides better
+ * performance and more features. Consider using io_uring for new applications.
+ *
+ * There is no glibc wrapper for this syscall. Use syscall(SYS_io_destroy, ctx)
+ * or the libaio library wrapper io_destroy(). Note: libaio has slightly
+ * different error semantics, returning negative error numbers directly instead
+ * of -1 with errno.
+ *
+ * since-version: 2.5
*/
SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx)
{
--
2.51.0
* [RFC PATCH v5 07/15] kernel/api: add API specification for io_submit
2025-12-18 20:42 [RFC PATCH v5 00/15] Kernel API Specification Framework Sasha Levin
` (5 preceding siblings ...)
2025-12-18 20:42 ` [RFC PATCH v5 06/15] kernel/api: add API specification for io_destroy Sasha Levin
@ 2025-12-18 20:42 ` Sasha Levin
2025-12-18 20:42 ` [RFC PATCH v5 08/15] kernel/api: add API specification for io_cancel Sasha Levin
` (7 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/aio.c | 319 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 308 insertions(+), 11 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index ff2a8527e1b85..f6f1b3790c88b 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -2450,17 +2450,314 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
return err;
}
-/* sys_io_submit:
- * Queue the nr iocbs pointed to by iocbpp for processing. Returns
- * the number of iocbs queued. May return -EINVAL if the aio_context
- * specified by ctx_id is invalid, if nr is < 0, if the iocb at
- * *iocbpp[0] is not properly initialized, if the operation specified
- * is invalid for the file descriptor in the iocb. May fail with
- * -EFAULT if any of the data structures point to invalid data. May
- * fail with -EBADF if the file descriptor specified in the first
- * iocb is invalid. May fail with -EAGAIN if insufficient resources
- * are available to queue any iocbs. Will return 0 if nr is 0. Will
- * fail with -ENOSYS if not implemented.
+/**
+ * sys_io_submit - Submit asynchronous I/O operations for processing
+ * @ctx_id: AIO context handle returned by io_setup
+ * @nr: Number of I/O control blocks to submit
+ * @iocbpp: Array of pointers to iocb structures describing the operations
+ *
+ * long-desc: Submits one or more asynchronous I/O operations for processing
+ * against a previously created AIO context. Each iocb structure describes
+ * a single I/O operation including the operation type, file descriptor,
+ * buffer, size, and offset.
+ *
+ * The syscall processes iocbs sequentially from the array. If an error
+ * occurs while processing an iocb, submission stops at that point and
+ * the number of successfully submitted operations is returned. This means
+ * partial submission is possible: if submitting 10 iocbs and the 5th fails,
+ * 4 is returned and iocbs 0-3 are queued for processing.
+ *
+ * Supported operations (specified via aio_lio_opcode):
+ * - IOCB_CMD_PREAD (0): Positioned read from file
+ * - IOCB_CMD_PWRITE (1): Positioned write to file
+ * - IOCB_CMD_FSYNC (2): Sync file data and metadata
+ * - IOCB_CMD_FDSYNC (3): Sync file data only
+ * - IOCB_CMD_POLL (5): Poll for events on file descriptor
+ * - IOCB_CMD_NOOP (6): Defined in the UAPI header but rejected with EINVAL
+ * - IOCB_CMD_PREADV (7): Positioned scatter read
+ * - IOCB_CMD_PWRITEV (8): Positioned gather write
+ *
+ * The iocb structure fields include:
+ * - aio_data: User data copied to io_event on completion
+ * - aio_lio_opcode: Operation type (one of IOCB_CMD_*)
+ * - aio_fildes: File descriptor for the operation
+ * - aio_buf: Buffer address (or iovec array for vectored ops)
+ * - aio_nbytes: Buffer size (or iovec count for vectored ops)
+ * - aio_offset: File offset for positioned operations
+ * - aio_flags: Optional flags (IOCB_FLAG_RESFD, IOCB_FLAG_IOPRIO)
+ * - aio_resfd: eventfd to signal on completion (if IOCB_FLAG_RESFD set)
+ * - aio_rw_flags: Per-operation RWF_* flags
+ * - aio_reqprio: I/O priority (if IOCB_FLAG_IOPRIO set)
+ *
+ * After successful submission, operations complete asynchronously. Results
+ * are delivered to the completion ring buffer and can be retrieved via
+ * io_getevents(). If aio_resfd specifies a valid eventfd, it is signaled
+ * when each operation completes.
+ *
+ * The actual I/O may complete synchronously if the data is cached or if
+ * the underlying filesystem doesn't support truly asynchronous I/O. In
+ * such cases, the operation is still reported via the completion ring.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: ctx_id
+ * type: KAPI_TYPE_UINT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_CUSTOM
+ * constraint: Must be a valid AIO context handle previously returned by
+ * io_setup() for the current process. The context must not have been
+ * destroyed. A value of 0 is always invalid. The handle is actually
+ * the virtual address of the ring buffer mapping.
+ *
+ * param: nr
+ * type: KAPI_TYPE_INT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_RANGE
+ * range: 0, LONG_MAX
+ * constraint: Must be >= 0. If 0, the syscall returns immediately with 0.
+ * The actual number processed is capped to ctx->nr_events (the context's
+ * capacity). Very large values are effectively limited by the context
+ * capacity and available ring buffer slots.
+ *
+ * param: iocbpp
+ * type: KAPI_TYPE_USER_PTR
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ * constraint-type: KAPI_CONSTRAINT_CUSTOM
+ * constraint: Must be a valid userspace pointer to an array of nr pointers
+ * to struct iocb. Each iocb pointer must itself be valid and point to a
+ * properly initialized iocb structure. The iocb structures must have
+ * aio_reserved2 set to 0 for forward compatibility.
+ *
+ * return:
+ * type: KAPI_TYPE_INT
+ * check-type: KAPI_RETURN_RANGE
+ * success: >= 0
+ * desc: Returns the number of iocbs successfully submitted (0 to nr). If
+ * partial submission occurs due to an error, returns the count of
+ * successfully submitted operations. Returns 0 if nr is 0.
+ *
+ * error: EINVAL, Invalid context or parameter
+ * desc: Returned if ctx_id is invalid, nr is negative, aio_reserved2 is
+ * non-zero, aio_lio_opcode is invalid, aio_buf/aio_nbytes overflow,
+ * aio_resfd is not an eventfd, conflicting aio_rw_flags, file lacks
+ * required operation support, invalid POLL/FSYNC parameters, or
+ * invalid aio_reqprio class.
+ *
+ * error: EFAULT, Invalid memory access
+ * desc: Returned if: (1) iocbpp is not a valid userspace pointer, (2) any
+ * pointer in the iocbpp array is invalid, (3) the iocb data cannot be
+ * copied from userspace, (4) aio_buf points to invalid memory, or
+ * (5) the kernel cannot write the aio_key field back to userspace.
+ *
+ * error: EBADF, Bad file descriptor
+ * desc: Returned if: (1) aio_fildes in an iocb does not refer to an open
+ * file, (2) aio_resfd does not refer to a valid file descriptor when
+ * IOCB_FLAG_RESFD is set, (3) the file is not opened with appropriate
+ * mode for the operation (e.g., read on write-only file).
+ *
+ * error: EAGAIN, Resource temporarily unavailable
+ * desc: Returned if insufficient slots are available in the completion
+ * ring buffer. This typically means too many operations are already
+ * in flight and the application should call io_getevents() to consume
+ * completed events before submitting more.
+ *
+ * error: EPERM, Operation not permitted
+ * desc: Returned if: (1) IOCB_FLAG_IOPRIO is set and aio_reqprio specifies
+ * IOPRIO_CLASS_RT (real-time I/O priority) but the process lacks
+ * CAP_SYS_ADMIN or CAP_SYS_NICE capability, or (2) RWF_NOAPPEND is
+ * specified but the file has the append-only attribute (IS_APPEND).
+ *
+ * error: EOPNOTSUPP, Operation not supported
+ * desc: Returned if: (1) unsupported aio_rw_flags are specified, (2)
+ * RWF_NOWAIT is specified but the file doesn't support non-blocking I/O
+ * (FMODE_NOWAIT not set), (3) RWF_ATOMIC is specified for a read or
+ * the file doesn't support atomic writes, or (4) RWF_DONTCACHE is
+ * specified but not supported by the filesystem or file is DAX-mapped.
+ *
+ * error: EOVERFLOW, Value too large
+ * desc: Returned if aio_offset plus aio_nbytes would overflow and the
+ * file does not support unsigned offsets. This check prevents reading
+ * or writing past the maximum representable file position.
+ *
+ * error: ENOMEM, Out of memory
+ * desc: Returned if memory allocation fails when preparing credentials
+ * for IOCB_CMD_FSYNC operations, or if vectored I/O (preadv/pwritev)
+ * requires allocating iovec arrays larger than the stack buffer.
+ *
+ * lock: RCU read lock
+ * type: KAPI_LOCK_RCU
+ * desc: Acquired during context lookup in lookup_ioctx(). Protects against
+ * concurrent modification of the ioctx_table while looking up the
+ * context. Released before processing any iocbs.
+ *
+ * lock: ctx->completion_lock
+ * type: KAPI_LOCK_SPINLOCK
+ * desc: Per-context spinlock acquired briefly during request slot allocation
+ * via user_refill_reqs_available() if the percpu request counter is empty.
+ * Protects the ring buffer tail and completed_events counters.
+ *
+ * lock: ctx->ctx_lock
+ * type: KAPI_LOCK_SPINLOCK
+ * desc: Per-context spinlock acquired when adding cancellable requests to
+ * the active_reqs list. This enables io_cancel() to find and cancel
+ * in-flight operations.
+ *
+ * lock: blk_plug
+ * type: KAPI_LOCK_CUSTOM
+ * desc: Block layer plugging is enabled when nr > 2 (AIO_PLUG_THRESHOLD)
+ * to batch block I/O requests for better performance. This is not a
+ * traditional lock but affects I/O scheduling.
+ *
+ * signal: any
+ * direction: KAPI_SIGNAL_RECEIVE
+ * action: KAPI_SIGNAL_ACTION_TRANSFORM
+ * condition: Signal arrives during underlying read/write operation
+ * desc: If a signal arrives during the underlying file read/write operation
+ * and the operation returns ERESTARTSYS/ERESTARTNOINTR/etc., the error
+ * is transformed to EINTR for the completion event. AIO operations cannot
+ * be restarted in the traditional sense because other operations may have
+ * already been submitted. The syscall itself (io_submit) is NOT interrupted
+ * by signals - only the individual async operations can be.
+ * error: -EINTR (in io_event.res, not syscall return)
+ * timing: KAPI_SIGNAL_TIME_DURING
+ * restartable: no
+ *
+ * side-effect: KAPI_EFFECT_ALLOC_MEMORY
+ * target: aio_kiocb structures
+ * desc: Allocates one aio_kiocb structure per submitted operation from the
+ * kiocb_cachep slab cache. These structures track the in-flight operations
+ * and are freed after completion is recorded in the ring buffer.
+ * reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: AIO context request counters
+ * desc: Decrements the available request slot counter in the context.
+ * Slots are reclaimed when completion events are consumed from the ring
+ * buffer via io_getevents().
+ * reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: ctx->active_reqs list
+ * desc: Cancellable operations (reads, writes, polls) are added to the
+ * context's active_reqs list, enabling cancellation via io_cancel().
+ * condition: Operation supports cancellation
+ * reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: iocb->aio_key field
+ * desc: The kernel writes KIOCB_KEY (0) to the aio_key field of each
+ * submitted iocb in userspace memory. This marks the iocb as submitted
+ * and is checked by io_cancel() to validate the iocb.
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: file reference count
+ * desc: Increments the reference count of the file descriptor's struct file
+ * via fget() for each submitted operation. The reference is released
+ * when the operation completes (via fput() in iocb_destroy()).
+ * reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ * target: target file(s)
+ * desc: For write operations, the file content may be modified. For fsync
+ * operations, dirty data is flushed to storage. The actual I/O may
+ * complete synchronously or asynchronously depending on the filesystem.
+ * condition: IOCB_CMD_PWRITE, IOCB_CMD_PWRITEV, IOCB_CMD_FSYNC, IOCB_CMD_FDSYNC
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_SCHEDULE
+ * target: fsync work queue
+ * desc: FSYNC and FDSYNC operations are scheduled to run on a workqueue
+ * because vfs_fsync() can block. The operation runs asynchronously and
+ * completion is signaled via the ring buffer.
+ * condition: IOCB_CMD_FSYNC or IOCB_CMD_FDSYNC
+ * reversible: no
+ *
+ * state-trans: iocb state
+ * from: user-prepared iocb
+ * to: submitted (aio_key set to KIOCB_KEY)
+ * condition: successful submission of each iocb
+ * desc: Each successfully submitted iocb transitions from user-prepared
+ * state to submitted state, marked by the kernel writing KIOCB_KEY to
+ * aio_key. The iocb remains in submitted state until completion.
+ *
+ * state-trans: AIO context slot availability
+ * from: slots_available = N
+ * to: slots_available = N - submitted_count
+ * condition: successful submission
+ * desc: Available slots in the context decrease by the number of successfully
+ * submitted operations. Slots are reclaimed when io_getevents() consumes
+ * completion events.
+ *
+ * capability: CAP_SYS_ADMIN
+ * type: KAPI_CAP_GRANT_PERMISSION
+ * allows: Use of IOPRIO_CLASS_RT (real-time I/O priority class)
+ * without: Returns EPERM when attempting to use RT I/O priority
+ * condition: IOCB_FLAG_IOPRIO set and aio_reqprio specifies IOPRIO_CLASS_RT
+ *
+ * capability: CAP_SYS_NICE
+ * type: KAPI_CAP_GRANT_PERMISSION
+ * allows: Use of IOPRIO_CLASS_RT (alternative to CAP_SYS_ADMIN)
+ * without: Returns EPERM when attempting to use RT I/O priority
+ * condition: IOCB_FLAG_IOPRIO set and aio_reqprio specifies IOPRIO_CLASS_RT
+ *
+ * constraint: Ring buffer slot availability
+ * desc: There must be available slots in the completion ring buffer for
+ * each operation to be submitted. If all slots are occupied by pending
+ * completion events, submission fails with EAGAIN. The number of slots
+ * is determined by nr_events passed to io_setup(), though internal
+ * doubling means more slots may be available.
+ * expr: available_slots >= 1 for each submission
+ *
+ * constraint: Valid file descriptor per iocb
+ * desc: Each iocb must reference a valid, open file descriptor via
+ * aio_fildes. The file must be opened with appropriate access mode
+ * for the requested operation (read access for PREAD, write access
+ * for PWRITE, etc.).
+ *
+ * constraint: File must support operation
+ * desc: For read/write operations, the underlying file must implement
+ * read_iter/write_iter file operations. For fsync, the file must
+ * implement fsync. For poll, the file must support vfs_poll().
+ *
+ * constraint: CONFIG_AIO required
+ * desc: The kernel must be compiled with CONFIG_AIO=y for this syscall
+ * to be available. If not configured, returns -ENOSYS.
+ *
+ * examples: struct iocb iocb, *iocbp = &iocb; io_submit(ctx, 1, &iocbp);
+ * struct iocb iocbs[10], *ptrs[10]; io_submit(ctx, 10, ptrs); // Batch submit
+ *
+ * notes: Unlike traditional synchronous I/O, errors from io_submit() indicate
+ * submission failures, not I/O errors. Actual I/O errors are reported via
+ * the res field of struct io_event when retrieved via io_getevents().
+ *
+ * The return value indicates how many iocbs were successfully submitted.
+ * If this is less than nr, the application should check which operation
+ * failed (it's the one at index = return_value) and handle the error.
+ * Previously submitted operations in the batch are still queued.
+ *
+ * For vectored operations (PREADV/PWRITEV), aio_buf points to an array
+ * of struct iovec and aio_nbytes contains the iovec count. The maximum
+ * iovec count is UIO_MAXIOV (1024).
+ *
+ * Block layer plugging is automatically enabled for batches larger than
+ * 2 operations, improving I/O merging and reducing per-I/O overhead.
+ *
+ * The COMPAT_SYSCALL variant handles 32-bit userspace on 64-bit kernels,
+ * using compat_uptr_t for the iocbpp array elements.
+ *
+ * Historical note: commit d6b2615f7d31d ("aio: simplify - and fix - fget/fput
+ * for io_submit()") fixed file descriptor reference counting issues. Earlier
+ * kernels could leak file references on certain error paths.
+ *
+ * io_uring (since Linux 5.1) is a more modern and performant alternative.
+ * Consider using io_uring_enter() for new applications requiring async I/O.
+ *
+ * There is no glibc wrapper; use syscall(SYS_io_submit, ...) or the libaio
+ * library. The libaio wrapper io_submit() returns negative error numbers
+ * directly rather than returning -1 and setting errno.
+ *
+ * since-version: 2.5
*/
SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
struct iocb __user * __user *, iocbpp)
--
2.51.0
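
The partial-submission rule described in the io_submit specification above
(stop at the first failing iocb, return the count submitted so far, and
return the error itself only when nothing was submitted) can be sketched as
a small userspace model. This is an illustration under stated assumptions,
not the kernel's implementation: model_io_submit() is a hypothetical name,
and the status array stands in for the per-iocb result of submission.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model of io_submit's return value.
 * status[i] is 0 if the i-th iocb would be accepted, or a negative errno.
 * ctx_nr_events models the context capacity that caps nr.
 */
static long model_io_submit(const int *status, long nr, long ctx_nr_events)
{
	long i;

	if (nr < 0)
		return -EINVAL;
	if (nr > ctx_nr_events)
		nr = ctx_nr_events;
	assert(nr <= 0 || status != NULL);

	/* Process sequentially and stop at the first failure. */
	for (i = 0; i < nr; i++) {
		if (status[i] != 0)
			break;
	}

	/* Nothing submitted and the first iocb failed: report its error. */
	if (i == 0 && nr > 0)
		return status[0];
	return i;
}
```

For ten iocbs where the fifth fails, the model returns 4, and the failing
iocb is the one at index 4, matching the "index = return_value" note.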
* [RFC PATCH v5 08/15] kernel/api: add API specification for io_cancel
2025-12-18 20:42 [RFC PATCH v5 00/15] Kernel API Specification Framework Sasha Levin
` (6 preceding siblings ...)
2025-12-18 20:42 ` [RFC PATCH v5 07/15] kernel/api: add API specification for io_submit Sasha Levin
@ 2025-12-18 20:42 ` Sasha Levin
2025-12-18 20:42 ` [RFC PATCH v5 09/15] kernel/api: add API specification for setxattr Sasha Levin
` (6 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/aio.c | 246 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 237 insertions(+), 9 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index f6f1b3790c88b..710517c9a990d 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -2843,15 +2843,243 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id,
}
#endif
-/* sys_io_cancel:
- * Attempts to cancel an iocb previously passed to io_submit. If
- * the operation is successfully cancelled, the resulting event is
- * copied into the memory pointed to by result without being placed
- * into the completion queue and 0 is returned. May fail with
- * -EFAULT if any of the data structures pointed to are invalid.
- * May fail with -EINVAL if aio_context specified by ctx_id is
- * invalid. May fail with -EAGAIN if the iocb specified was not
- * cancelled. Will fail with -ENOSYS if not implemented.
+/**
+ * sys_io_cancel - Attempt to cancel an outstanding asynchronous I/O operation
+ * @ctx_id: AIO context handle returned by io_setup
+ * @iocb: Pointer to the iocb structure that was previously submitted
+ * @result: Unused parameter (historically for result storage, now ignored)
+ *
+ * long-desc: Attempts to cancel an asynchronous I/O operation that was
+ * previously submitted via io_submit(). The syscall searches for the
+ * specified iocb in the context's active request list and invokes the
+ * operation-specific cancellation callback if found.
+ *
+ * The cancellation behavior depends on the type of I/O operation:
+ * - For poll operations (IOCB_CMD_POLL): The request is marked as cancelled
+ * and a work item is scheduled to complete the cancellation.
+ * - For USB gadget I/O: The USB endpoint dequeue function is called, which
+ * triggers the completion callback with -ECONNRESET status.
+ * - For most direct I/O operations: Cancellation is typically not supported
+ * as these operations do not register a cancel callback.
+ *
+ * If the iocb is found and has a registered cancellation callback, that
+ * callback is invoked and the iocb is removed from the active request list.
+ * The completion event is delivered via the ring buffer (not via the result
+ * parameter, which is now unused for this purpose).
+ *
+ * On successful cancellation initiation, the syscall returns -EINPROGRESS
+ * (not 0) to indicate that cancellation is in progress. This is because
+ * the actual completion may occur asynchronously via the cancel callback.
+ *
+ * Important limitations:
+ * - Most file I/O operations do not support cancellation
+ * - The iocb must still be pending (not yet completed)
+ * - The iocb must have been submitted via io_submit (aio_key == KIOCB_KEY)
+ * - Only operations that register a ki_cancel callback can be cancelled
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_ATOMIC
+ *
+ * param: ctx_id
+ * type: KAPI_TYPE_UINT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_CUSTOM
+ * constraint: Must be a valid AIO context handle previously returned by
+ * io_setup() for the current process. The context must not have been
+ * destroyed via io_destroy(). A value of 0 is always invalid. The handle
+ * is actually the virtual address of the ring buffer mapping, and must
+ * belong to the calling process's address space.
+ *
+ * param: iocb
+ * type: KAPI_TYPE_USER_PTR
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ * size: sizeof(struct iocb)
+ * constraint-type: KAPI_CONSTRAINT_USER_PTR
+ * constraint: Must be a valid userspace pointer to a struct iocb that was
+ * previously submitted via io_submit(). The iocb's aio_key field must
+ * contain KIOCB_KEY (0), which is written by the kernel during io_submit.
+ * A NULL pointer will result in EFAULT. The iocb must still be pending
+ * (present in the context's active_reqs list) for cancellation to succeed.
+ *
+ * param: result
+ * type: KAPI_TYPE_USER_PTR
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL
+ * constraint-type: KAPI_CONSTRAINT_NONE
+ * constraint: This parameter is no longer used by the kernel. It was
+ * historically intended to receive the io_event result on successful
+ * cancellation, but completion events are now always delivered via the
+ * ring buffer. May be NULL.
+ *
+ * return:
+ * type: KAPI_TYPE_INT
+ * check-type: KAPI_RETURN_ERROR_CHECK
+ * success: -EINPROGRESS
+ * desc: Returns -EINPROGRESS when the cancellation callback was successfully
+ * invoked and the request is being cancelled. This is the expected return
+ * value on successful cancellation initiation. The completion event will
+ * be delivered via the ring buffer. This differs from the man page, which
+ * states that 0 is returned on success.
+ *
+ * error: EFAULT, Cannot read iocb from userspace
+ * desc: Returned if the iocb pointer is invalid or points to memory that
+ * cannot be read. Specifically, the kernel attempts to read the aio_key
+ * field from the iocb via get_user() and returns EFAULT if this fails.
+ * A NULL iocb pointer will trigger this error.
+ *
+ * error: EINVAL, iocb not submitted via io_submit
+ * desc: Returned if the aio_key field of the iocb does not contain KIOCB_KEY
+ * (which is 0). The kernel sets aio_key to KIOCB_KEY when an iocb is
+ * successfully submitted via io_submit(). If aio_key contains a different
+ * value, it indicates the iocb was never successfully submitted, is
+ * corrupted, or the memory has been reused.
+ *
+ * error: EINVAL, Invalid AIO context
+ * desc: Returned if ctx_id does not refer to a valid AIO context. This can
+ * occur if: (1) the context was never created, (2) the context was
+ * destroyed via io_destroy(), (3) the ctx_id is 0, (4) the ring buffer
+ * header cannot be read from userspace, (5) the context belongs to a
+ * different process, or (6) the context's internal ID doesn't match.
+ *
+ * error: EINVAL, iocb not found or not cancellable
+ * desc: Returned if the specified iocb is not present in the context's
+ * active request list. This occurs when: (1) the operation has already
+ * completed and the completion event is in the ring buffer, (2) the
+ * operation was never submitted to this context, (3) the iocb pointer
+ * does not match any pending operation (comparison is by pointer value
+ * converted to u64), or (4) the operation did not register a cancellation
+ * callback, so it was never listed. In all cases ret keeps its initial -EINVAL.
+ * Note: The man page documents EAGAIN for this case, but the actual
+ * implementation returns EINVAL.
+ *
+ * error: ENOSYS, AIO not implemented
+ * desc: Returned if the kernel was compiled without CONFIG_AIO support.
+ * This error is returned by the syscall dispatch mechanism before the
+ * io_cancel implementation is even reached.
+ *
+ * error: (driver-specific), Cancellation callback failed
+ * desc: If the iocb is found and its ki_cancel callback is invoked, the
+ * callback's return value is propagated to userspace if non-zero. For
+ * USB gadget operations, usb_ep_dequeue() may return various errors
+ * including EINVAL if the request wasn't queued. The aio_poll_cancel
+ * callback always returns 0. Driver-specific cancellation functions
+ * may return other error codes.
+ *
+ * lock: RCU read lock
+ * type: KAPI_LOCK_RCU
+ * desc: Acquired in lookup_ioctx() during context lookup. Protects against
+ * concurrent modification of the mm->ioctx_table while searching for the
+ * context. Released before any spinlocks are acquired.
+ *
+ * lock: ctx->ctx_lock
+ * type: KAPI_LOCK_SPINLOCK
+ * desc: Per-context spinlock acquired with interrupts disabled via
+ * spin_lock_irq(). Held while iterating through the active_reqs list
+ * searching for the iocb, while invoking the ki_cancel callback, and
+ * while removing the iocb from the list. The cancel callback is invoked
+ * with this lock held, so callbacks must not sleep and must be IRQ-safe.
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: ctx->active_reqs list
+ * desc: If the iocb is found and its cancellation callback is invoked, the
+ * kiocb is removed from the context's active_reqs list via list_del_init().
+ * This prevents the iocb from being found by subsequent io_cancel calls.
+ * condition: iocb found and ki_cancel callback invoked
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: Pending I/O operation
+ * desc: The cancellation callback may modify the state of the underlying
+ * I/O operation. For poll operations, the cancelled flag is set. For USB
+ * operations, the USB request is dequeued which triggers the completion
+ * callback. The completion event is delivered via the ring buffer.
+ * condition: ki_cancel callback is invoked
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_SCHEDULE
+ * target: aio_poll work queue
+ * desc: For poll operations (IOCB_CMD_POLL), the aio_poll_cancel callback
+ * schedules a work item via schedule_work() to complete the cancellation
+ * asynchronously. This work item will eventually deliver the completion
+ * event to the ring buffer.
+ * condition: Cancelling a poll operation
+ * reversible: no
+ *
+ * state-trans: kiocb state
+ * from: in_flight (in active_reqs list)
+ * to: cancelling (removed from list, cancel callback invoked)
+ * condition: iocb found and ki_cancel invoked
+ * desc: When the iocb is found in the active_reqs list and its cancellation
+ * callback is invoked, the kiocb transitions from in-flight to cancelling
+ * state. The kiocb is removed from the active_reqs list, preventing
+ * duplicate cancellation attempts. Final completion occurs asynchronously.
+ *
+ * state-trans: poll_iocb cancelled flag
+ * from: false
+ * to: true
+ * condition: aio_poll_cancel is invoked
+ * desc: For poll operations, the aio_poll_cancel callback sets the cancelled
+ * flag on the poll_iocb structure. This signals to the poll completion
+ * handler that the operation was cancelled rather than completed normally.
+ *
+ * constraint: Operation must support cancellation
+ * desc: Only operations that register a ki_cancel callback can be cancelled.
+ * Operations that don't set this callback (most direct I/O operations)
+ * will never appear in the active_reqs list and thus cannot be cancelled.
+ * Currently, only IOCB_CMD_POLL operations in the kernel AIO subsystem
+ * and USB gadget operations support cancellation.
+ *
+ * constraint: Timing window for cancellation
+ * desc: The iocb must still be pending at the time io_cancel is called.
+ * There is an inherent race condition: the operation may complete
+ * naturally between the time the application decides to cancel and when
+ * io_cancel is invoked. In this case, EINVAL is returned because the
+ * iocb is no longer in the active_reqs list.
+ *
+ * constraint: CONFIG_AIO required
+ * desc: The kernel must be compiled with CONFIG_AIO=y for this syscall
+ * to be available. If not configured, ENOSYS is returned.
+ *
+ * examples: io_cancel(ctx, &iocb, NULL); // Cancel with unused result param
+ * if (io_cancel(ctx, &iocb, NULL) == -EINPROGRESS) handle_cancellation();
+ * ret = io_cancel(ctx, &iocb, NULL); if (ret && ret != -EINPROGRESS) error();
+ *
+ * notes: The return value semantics are unusual: -EINPROGRESS indicates
+ * successful cancellation initiation, not an error. This is because the
+ * actual cancellation may complete asynchronously, with the completion
+ * event delivered via the ring buffer.
+ *
+ * The result parameter is completely ignored by current kernels. It was
+ * historically used to return the io_event directly, but since commit
+ * 28468cbed92e ("Revert 'fs/aio: Make io_cancel() generate completions
+ * again'"), completion events are always delivered via the ring buffer.
+ * Applications should use io_getevents() to retrieve the cancelled
+ * operation's completion event.
+ *
+ * The man page documents EAGAIN as a possible error when "the iocb specified
+ * was not cancelled", but code analysis shows that EINVAL is actually
+ * returned in this case. The man page is outdated in this regard.
+ *
+ * The aio_key field must equal KIOCB_KEY (0) because the kernel writes this
+ * value during io_submit. If an application attempts to cancel an iocb
+ * before submitting it, or after the memory has been reused, this check
+ * will fail with EINVAL.
+ *
+ * For poll operations specifically, the cancellation is marked but the
+ * actual completion may be delayed until a worker processes it. The
+ * -EINPROGRESS return value reflects this asynchronous completion model.
+ *
+ * USB gadget operations are an exception: when usb_ep_dequeue() is called,
+ * it typically completes the request synchronously with -ECONNRESET status
+ * in the completion callback.
+ *
+ * There is no glibc wrapper for this syscall. Applications must use
+ * syscall(SYS_io_cancel, ...) or the libaio library. The libaio wrapper
+ * returns negative error numbers directly rather than returning -1 and
+ * setting errno.
+ *
+ * io_uring (since Linux 5.1) provides a more capable and widely supported
+ * async I/O interface with better cancellation support via IORING_OP_ASYNC_CANCEL.
+ *
+ * since-version: 2.5
*/
SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
struct io_event __user *, result)
--
2.51.0
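Because the specification above treats -EINPROGRESS as the success value,
callers need a slightly unusual return-value check. The sketch below maps the
documented return values to outcomes. It is an illustration only: enum
cancel_outcome and classify_io_cancel() are hypothetical names, and ret is
assumed to be a libaio-style negative error number rather than the -1/errno
pair that a raw syscall() would produce.

```c
#include <errno.h>

/* Hypothetical outcome codes; not part of the patch or of libaio. */
enum cancel_outcome {
	CANCEL_STARTED,        /* -EINPROGRESS: reap the event via io_getevents() */
	CANCEL_LEGACY_DONE,    /* 0: the success value the man page documents */
	CANCEL_NOT_PENDING,    /* -EINVAL: bad ctx or key, completed, or not cancellable */
	CANCEL_BAD_POINTER,    /* -EFAULT: iocb could not be read */
	CANCEL_UNAVAILABLE,    /* -ENOSYS: kernel built without CONFIG_AIO */
	CANCEL_CALLBACK_ERROR  /* any other error propagated from ki_cancel */
};

static enum cancel_outcome classify_io_cancel(long ret)
{
	switch (ret) {
	case -EINPROGRESS:
		return CANCEL_STARTED;
	case 0:
		return CANCEL_LEGACY_DONE;
	case -EINVAL:
		/* May also come from a driver callback such as usb_ep_dequeue(). */
		return CANCEL_NOT_PENDING;
	case -EFAULT:
		return CANCEL_BAD_POINTER;
	case -ENOSYS:
		return CANCEL_UNAVAILABLE;
	default:
		return CANCEL_CALLBACK_ERROR;
	}
}
```

With this mapping, CANCEL_NOT_PENDING is usually not fatal: it covers the
race described under "Timing window for cancellation", where the operation
completed before io_cancel ran and its event is already in the ring buffer.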
* [RFC PATCH v5 09/15] kernel/api: add API specification for setxattr
2025-12-18 20:42 [RFC PATCH v5 00/15] Kernel API Specification Framework Sasha Levin
` (7 preceding siblings ...)
2025-12-18 20:42 ` [RFC PATCH v5 08/15] kernel/api: add API specification for io_cancel Sasha Levin
@ 2025-12-18 20:42 ` Sasha Levin
2025-12-18 20:42 ` [RFC PATCH v5 10/15] kernel/api: add API specification for lsetxattr Sasha Levin
` (5 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/xattr.c | 310 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 310 insertions(+)
diff --git a/fs/xattr.c b/fs/xattr.c
index 32d445fb60aaf..02a946227129e 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -740,6 +740,316 @@ SYSCALL_DEFINE6(setxattrat, int, dfd, const char __user *, pathname, unsigned in
args.flags);
}
+/**
+ * sys_setxattr - Set an extended attribute value on a file
+ * @pathname: Path to the file on which to set the extended attribute
+ * @name: Null-terminated name of the extended attribute (includes namespace prefix)
+ * @value: Buffer containing the attribute value to set
+ * @size: Size of the value buffer in bytes
+ * @flags: Flags controlling attribute creation/replacement behavior
+ *
+ * long-desc: Sets the value of an extended attribute identified by name on
+ * the file specified by pathname. Extended attributes are name:value pairs
+ * associated with inodes (files, directories, symbolic links, etc.) that
+ * extend the normal attributes (stat data) associated with all inodes.
+ *
+ * The attribute name must include a namespace prefix. Valid namespaces are:
+ * - "user." - User-defined attributes (regular files and directories only)
+ * - "trusted." - Trusted attributes (requires CAP_SYS_ADMIN)
+ * - "security." - Security module attributes (e.g., SELinux, Smack, capabilities)
+ * - "system." - System attributes (e.g., POSIX ACLs via system.posix_acl_access)
+ *
+ * The value can be arbitrary binary data or text. A zero-length value is
+ * permitted and creates an attribute with an empty value (different from
+ * removing the attribute).
+ *
+ * This syscall follows symbolic links. Use lsetxattr() to operate on the
+ * symbolic link itself, or fsetxattr() to operate on an open file descriptor.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: pathname
+ * type: KAPI_TYPE_PATH
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ * constraint-type: KAPI_CONSTRAINT_USER_PATH
+ * constraint: Must be a valid null-terminated path string in user memory.
+ * The path is resolved following symbolic links. Maximum path length is
+ * PATH_MAX (4096 bytes). The file must exist and the caller must have
+ * appropriate permissions.
+ *
+ * param: name
+ * type: KAPI_TYPE_USER_PTR
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ * constraint-type: KAPI_CONSTRAINT_USER_STRING
+ * range: 1, 255
+ * constraint: Must be a valid null-terminated string in user memory containing
+ * the extended attribute name with namespace prefix (e.g., "user.myattr").
+ * The name (including prefix) must be between 1 and XATTR_NAME_MAX (255)
+ * characters. An empty name returns ERANGE.
+ *
+ * param: value
+ * type: KAPI_TYPE_USER_PTR
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL
+ * constraint-type: KAPI_CONSTRAINT_CUSTOM
+ * constraint: Must be a valid pointer to user memory containing the attribute
+ * value, or NULL if size is 0. When size is non-zero, the pointer must be
+ * valid and accessible for size bytes.
+ *
+ * param: size
+ * type: KAPI_TYPE_UINT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_RANGE
+ * range: 0, 65536
+ * constraint: Size of the value in bytes. Must not exceed XATTR_SIZE_MAX
+ * (65536 bytes). Zero is permitted and creates an attribute with empty value.
+ * Filesystem-specific limits may be smaller (e.g., ext4 limits total xattr
+ * space to one filesystem block, typically 4KB).
+ *
+ * param: flags
+ * type: KAPI_TYPE_INT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_MASK
+ * valid-mask: XATTR_CREATE | XATTR_REPLACE
+ * constraint: Controls creation/replacement behavior. Valid values are 0,
+ * XATTR_CREATE (0x1), or XATTR_REPLACE (0x2). XATTR_CREATE fails if the
+ * attribute already exists. XATTR_REPLACE fails if the attribute does not
+ * exist. With flags=0, the attribute is created if it doesn't exist or
+ * replaced if it does. XATTR_CREATE and XATTR_REPLACE are mutually exclusive.
+ *
+ * return:
+ * type: KAPI_TYPE_INT
+ * check-type: KAPI_RETURN_ERROR_CHECK
+ * success: 0
+ * desc: Returns 0 on success. The extended attribute is set with the specified
+ * value. Any previous value for the attribute is replaced.
+ *
+ * error: ENOENT, File not found
+ * desc: The file specified by pathname does not exist, or a directory component
+ * in the path does not exist. Returned from path lookup (filename_lookup).
+ *
+ * error: EACCES, Permission denied
+ * desc: Permission denied during path resolution (search permission on a directory
+ * component) or write access to the file is denied based on DAC permissions.
+ *
+ * error: EPERM, Operation not permitted
+ * desc: Returned in several cases: (1) The file is marked immutable (chattr +i)
+ * or append-only (chattr +a). (2) For trusted.* namespace, caller lacks
+ * CAP_SYS_ADMIN in the initial user namespace. (3) For security.* namespace
+ * (except security.capability), caller lacks CAP_SYS_ADMIN over the filesystem.
+ * (4) For user.* namespace on sticky directories, caller is not the owner
+ * and lacks CAP_FOWNER. (5) The inode has an unmapped ID in an idmapped mount.
+ *
+ * error: ENODATA, Attribute not found
+ * desc: XATTR_REPLACE was specified but the named attribute does not exist on
+ * the file. (ENODATA is also what reads of trusted.* return without
+ * CAP_SYS_ADMIN, but that path applies to getxattr(), not to this syscall.)
+ *
+ * error: EEXIST, Attribute already exists
+ * desc: XATTR_CREATE was specified but the named attribute already exists on
+ * the file.
+ *
+ * error: ERANGE, Name out of range
+ * desc: The attribute name is empty (zero length) or exceeds XATTR_NAME_MAX
+ * (255 characters). Returned from import_xattr_name() via strncpy_from_user().
+ *
+ * error: E2BIG, Value too large
+ * desc: The size parameter exceeds XATTR_SIZE_MAX (65536 bytes). Returned from
+ * setxattr_copy() before attempting to copy the value from userspace.
+ *
+ * error: EINVAL, Invalid argument
+ * desc: The flags parameter contains bits other than XATTR_CREATE and
+ * XATTR_REPLACE. Also returned for malformed capability values when setting
+ * security.capability, or when the xattr name doesn't match any handler prefix.
+ *
+ * error: EFAULT, Bad address
+ * desc: One of the user pointers (pathname, name, or value) is invalid or
+ * points to memory that cannot be accessed. Returned from strncpy_from_user()
+ * for pathname/name or vmemdup_user()/copy_from_user() for value.
+ *
+ * error: ENOMEM, Out of memory
+ * desc: Kernel could not allocate memory to copy the attribute value from
+ * userspace (via vmemdup_user), or for namespace capability conversion
+ * (cap_convert_nscap allocates memory for v3 capability format).
+ *
+ * error: EOPNOTSUPP, Operation not supported
+ * desc: The filesystem does not support extended attributes (IOP_XATTR not set),
+ * or no xattr handler exists for the given namespace prefix, or the handler
+ * does not implement the set operation. Also returned for POSIX ACL xattrs
+ * (system.posix_acl_*) when CONFIG_FS_POSIX_ACL is disabled.
+ *
+ * error: EROFS, Read-only filesystem
+ * desc: The filesystem containing the file is mounted read-only. Returned from
+ * mnt_want_write() before attempting any modification.
+ *
+ * error: EIO, I/O error
+ * desc: The inode is marked as bad (is_bad_inode), indicating filesystem
+ * corruption or I/O failure. Also may be returned by filesystem-specific
+ * xattr handler operations.
+ *
+ * error: EDQUOT, Disk quota exceeded
+ * desc: The user's disk quota for extended attributes has been exceeded.
+ * Filesystem-specific error returned from the handler's set operation.
+ *
+ * error: ENOSPC, No space left on device
+ * desc: The filesystem has insufficient space to store the extended attribute.
+ * Filesystem-specific error from handler's set operation.
+ *
+ * error: ELOOP, Too many symbolic links
+ * desc: Too many symbolic links were encountered during path resolution
+ * (more than MAXSYMLINKS, typically 40).
+ *
+ * error: ENAMETOOLONG, Filename too long
+ * desc: The pathname or a component of the pathname exceeds the system limit
+ * (PATH_MAX or NAME_MAX).
+ *
+ * error: ENOTDIR, Not a directory
+ * desc: A component of the path prefix is not a directory.
+ *
+ * error: ESTALE, Stale file handle
+ * desc: The file handle became stale during the operation (NFS). The syscall
+ * automatically retries with LOOKUP_REVAL in this case.
+ *
+ * lock: inode->i_rwsem
+ * type: KAPI_LOCK_MUTEX
+ * desc: The inode's read-write semaphore is acquired exclusively via inode_lock()
+ * before calling __vfs_setxattr_locked() and released via inode_unlock() after.
+ * This serializes concurrent xattr modifications on the same inode.
+ *
+ * lock: sb->s_writers (superblock freeze protection)
+ * type: KAPI_LOCK_SEMAPHORE
+ * desc: Write access to the mount is acquired via mnt_want_write() which calls
+ * sb_start_write(). This prevents filesystem freeze during the operation.
+ * Released via mnt_drop_write() after the operation completes.
+ *
+ * lock: file_rwsem (delegation breaking)
+ * type: KAPI_LOCK_SEMAPHORE
+ * desc: If the file has NFSv4 delegations, the percpu file_rwsem is acquired
+ * during delegation breaking in __break_lease(). The syscall may wait for
+ * delegation holders to acknowledge the break.
+ *
+ * signal: Any
+ * direction: KAPI_SIGNAL_RECEIVE
+ * action: KAPI_SIGNAL_ACTION_RESTART
+ * condition: Signal arrives during interruptible waits (delegation breaking)
+ * desc: The syscall may wait for NFSv4 delegation holders to release their
+ * delegations. During this wait, signals can interrupt the operation. If a
+ * signal is pending, the wait may be interrupted and the operation retried.
+ * Most blocking points in this syscall use non-interruptible waits.
+ * timing: KAPI_SIGNAL_TIME_DURING
+ * restartable: yes
+ *
+ * side-effect: KAPI_EFFECT_ALLOC_MEMORY
+ * target: Kernel buffer for attribute value
+ * desc: The attribute value is copied from userspace to a kernel buffer
+ * allocated via vmemdup_user(). This memory is freed (kvfree) after the
+ * operation completes, regardless of success or failure.
+ * reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ * target: File's extended attributes
+ * desc: On success, the specified extended attribute is created or modified.
+ * The change is typically persisted to storage synchronously or asynchronously
+ * depending on filesystem and mount options.
+ * reversible: yes
+ * condition: Operation succeeds
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: Inode flags (S_NOSEC)
+ * desc: When setting security.* attributes, the S_NOSEC flag is cleared from
+ * the inode. This flag is an optimization that indicates no security xattrs
+ * exist; clearing it ensures proper security checks on subsequent accesses.
+ * condition: Setting security.* namespace attribute
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: fsnotify event
+ * desc: On success, fsnotify_xattr() is called to notify any registered
+ * watchers (inotify, fanotify) of the extended attribute modification.
+ * This generates an IN_ATTRIB event.
+ * condition: Operation succeeds
+ *
+ * state-trans: extended attribute
+ * from: nonexistent or has old value
+ * to: has new value
+ * condition: Operation succeeds with flags=0 or appropriate flags
+ * desc: The extended attribute transitions from not existing (or having its
+ * previous value) to containing the new value. With XATTR_CREATE, the
+ * attribute must not exist beforehand. With XATTR_REPLACE, it must exist.
+ *
+ * capability: CAP_SYS_ADMIN
+ * type: KAPI_CAP_GRANT_PERMISSION
+ * allows: Setting trusted.* namespace attributes and most security.* attributes
+ * without: Setting trusted.* returns EPERM. Setting security.* (except
+ * security.capability) returns EPERM. trusted.* is checked with capable();
+ * security.* is checked with ns_capable() against the filesystem's user namespace.
+ * condition: Attribute name starts with "trusted." or "security." (except
+ * security.capability)
+ *
+ * capability: CAP_SETFCAP
+ * type: KAPI_CAP_GRANT_PERMISSION
+ * allows: Setting the security.capability extended attribute
+ * without: Setting security.capability returns EPERM
+ * condition: Attribute name is "security.capability". Checked via
+ * capable_wrt_inode_uidgid() which considers the inode's ownership.
+ *
+ * capability: CAP_FOWNER
+ * type: KAPI_CAP_BYPASS_CHECK
+ * allows: Bypassing owner check for user.* on sticky directories
+ * without: Non-owners cannot set user.* attributes on a directory that has
+ * the sticky bit set
+ * condition: Setting user.* namespace attribute on a sticky directory
+ *
+ * constraint: Filesystem support
+ * desc: The filesystem must support extended attributes (have IOP_XATTR flag
+ * set and provide xattr handlers). Common filesystems supporting xattrs
+ * include ext4, XFS, Btrfs, and tmpfs. Some filesystems (e.g., FAT, older
+ * ext2) do not support extended attributes.
+ *
+ * constraint: Filesystem-specific size limits
+ * desc: While the VFS limit is 64KB (XATTR_SIZE_MAX), filesystems may impose
+ * smaller limits. For example, ext4 limits all xattrs on an inode to fit
+ * in a single filesystem block (typically 4KB). XFS and ReiserFS support
+ * the full 64KB. Exceeding filesystem limits returns ENOSPC or E2BIG.
+ *
+ * constraint: user.* namespace restrictions
+ * desc: The user.* namespace is only supported on regular files and directories.
+ * Attempting to set user.* attributes on other file types (symlinks, devices,
+ * sockets, FIFOs) returns EPERM (for write) or ENODATA (for read).
+ *
+ * constraint: LSM checks
+ * desc: Linux Security Modules (SELinux, Smack, AppArmor) may impose additional
+ * restrictions via security_inode_setxattr() hook. These can return various
+ * error codes depending on the security policy. The LSM is called after
+ * permission checks but before the actual xattr modification.
+ *
+ * examples: setxattr("/path/file", "user.comment", "test", 4, 0); // Set user attr
+ * setxattr("/path/file", "user.new", "val", 3, XATTR_CREATE); // Create only
+ * setxattr("/path/file", "user.existing", "new", 3, XATTR_REPLACE); // Replace
+ *
+ * notes: Extended attributes provide a way to associate arbitrary metadata with
+ * files beyond the standard stat attributes. They are commonly used for:
+ * - SELinux security contexts (security.selinux)
+ * - File capabilities (security.capability)
+ * - POSIX ACLs (system.posix_acl_access, system.posix_acl_default)
+ * - User-defined metadata (user.* namespace)
+ *
+ * The trusted.* namespace is designed for use by privileged processes to store
+ * data that should not be accessible to unprivileged users (e.g., during
+ * backup/restore operations).
+ *
+ * NFSv4 delegation support means this syscall may need to wait for remote
+ * clients to release their delegations before the operation can complete.
+ * This can introduce unbounded delays in pathological cases.
+ *
+ * For security.capability specifically, the kernel may convert between v2
+ * (non-namespaced) and v3 (namespaced) capability formats depending on the
+ * filesystem's user namespace and caller's capabilities.
+ *
+ * The setxattrat() syscall (added in Linux 6.13) provides more flexibility,
+ * taking a directory file descriptor (or AT_FDCWD) and AT_* flags to locate the file.
+ *
+ * since-version: 2.4
+ */
SYSCALL_DEFINE5(setxattr, const char __user *, pathname,
const char __user *, name, const void __user *, value,
size_t, size, int, flags)
--
2.51.0
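The argument constraints in the setxattr specification above (valid flag
bits, name length, value size) can be mirrored in a userspace pre-check. The
sketch below is an illustration under stated assumptions: the function name
and the SPEC_* constants are hypothetical and simply restate the values the
specification documents (XATTR_CREATE 0x1, XATTR_REPLACE 0x2, XATTR_NAME_MAX
255, XATTR_SIZE_MAX 65536). The order of the checks is also an assumption,
and the kernel remains the authority on what it accepts.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Local copies of the limits quoted in the specification (assumed values). */
#define SPEC_XATTR_CREATE   0x1
#define SPEC_XATTR_REPLACE  0x2
#define SPEC_XATTR_NAME_MAX 255
#define SPEC_XATTR_SIZE_MAX 65536

/*
 * Hypothetical pre-check: returns 0 if the arguments pass the documented
 * constraints, otherwise the positive errno value the specification
 * associates with the violated constraint.
 */
static int precheck_setxattr_args(const char *name, size_t size, int flags)
{
	size_t len;

	/* Bits other than XATTR_CREATE and XATTR_REPLACE: EINVAL. */
	if (flags & ~(SPEC_XATTR_CREATE | SPEC_XATTR_REPLACE))
		return EINVAL;

	/* Empty name, or longer than XATTR_NAME_MAX: ERANGE. */
	len = strlen(name);
	if (len == 0 || len > SPEC_XATTR_NAME_MAX)
		return ERANGE;

	/* Value larger than XATTR_SIZE_MAX: E2BIG. A size of zero is allowed. */
	if (size > SPEC_XATTR_SIZE_MAX)
		return E2BIG;

	return 0;
}
```

A check like this only catches errors that do not depend on the file.
Permission, namespace, and filesystem-specific limits (for example the
smaller per-inode budget described for ext4) can still make the real call
fail.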
* [RFC PATCH v5 10/15] kernel/api: add API specification for lsetxattr
2025-12-18 20:42 [RFC PATCH v5 00/15] Kernel API Specification Framework Sasha Levin
` (8 preceding siblings ...)
2025-12-18 20:42 ` [RFC PATCH v5 09/15] kernel/api: add API specification for setxattr Sasha Levin
@ 2025-12-18 20:42 ` Sasha Levin
2025-12-18 20:42 ` [RFC PATCH v5 11/15] kernel/api: add API specification for fsetxattr Sasha Levin
` (4 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/xattr.c | 327 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 327 insertions(+)
diff --git a/fs/xattr.c b/fs/xattr.c
index 02a946227129e..466dcaf7ba83e 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -1057,6 +1057,333 @@ SYSCALL_DEFINE5(setxattr, const char __user *, pathname,
return path_setxattrat(AT_FDCWD, pathname, 0, name, value, size, flags);
}
+/**
+ * sys_lsetxattr - Set an extended attribute value without following symbolic links
+ * @pathname: Path to the file or symbolic link on which to set the attribute
+ * @name: Null-terminated name of the extended attribute (includes namespace prefix)
+ * @value: Buffer containing the attribute value to set
+ * @size: Size of the value buffer in bytes
+ * @flags: Flags controlling attribute creation/replacement behavior
+ *
+ * long-desc: Sets the value of an extended attribute identified by name on
+ * the file specified by pathname. Unlike setxattr(), this syscall does not
+ * follow symbolic links - if pathname refers to a symbolic link, the
+ * extended attribute is set on the link itself, not on the file it refers to.
+ *
+ * Extended attributes are name:value pairs associated with inodes (files,
+ * directories, symbolic links, etc.) that extend the normal attributes
+ * (stat data) associated with all inodes.
+ *
+ * The attribute name must include a namespace prefix. Valid namespaces are:
+ * - "user." - User-defined attributes (regular files and directories only)
+ * - "trusted." - Trusted attributes (requires CAP_SYS_ADMIN)
+ * - "security." - Security module attributes (e.g., SELinux, Smack, capabilities)
+ * - "system." - System attributes (e.g., POSIX ACLs via system.posix_acl_access)
+ *
+ * The value can be arbitrary binary data or text. A zero-length value is
+ * permitted and creates an attribute with an empty value (different from
+ * removing the attribute).
+ *
+ * Note that not all filesystems support extended attributes on symbolic links.
+ * Additionally, the user.* namespace is not available on symbolic links since
+ * they are not regular files or directories.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: pathname
+ * type: KAPI_TYPE_PATH
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ * constraint-type: KAPI_CONSTRAINT_USER_PATH
+ * constraint: Must be a valid null-terminated path string in user memory.
+ * The path is resolved WITHOUT following symbolic links - if the final
+ * component is a symbolic link, the operation applies to the link itself.
+ * Maximum path length is PATH_MAX (4096 bytes). The file or link must
+ * exist and the caller must have appropriate permissions.
+ *
+ * param: name
+ * type: KAPI_TYPE_USER_PTR
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ * constraint-type: KAPI_CONSTRAINT_USER_STRING
+ * range: 1, 255
+ * constraint: Must be a valid null-terminated string in user memory containing
+ * the extended attribute name with namespace prefix (e.g., "security.selinux").
+ * The name (including prefix) must be between 1 and XATTR_NAME_MAX (255)
+ * characters. An empty name returns ERANGE. Note that user.* namespace is
+ * not supported on symbolic links.
+ *
+ * param: value
+ * type: KAPI_TYPE_USER_PTR
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL
+ * constraint-type: KAPI_CONSTRAINT_CUSTOM
+ * constraint: Must be a valid pointer to user memory containing the attribute
+ * value, or NULL if size is 0. When size is non-zero, the pointer must be
+ * valid and accessible for size bytes.
+ *
+ * param: size
+ * type: KAPI_TYPE_UINT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_RANGE
+ * range: 0, 65536
+ * constraint: Size of the value in bytes. Must not exceed XATTR_SIZE_MAX
+ * (65536 bytes). Zero is permitted and creates an attribute with empty value.
+ * Filesystem-specific limits may be smaller (e.g., ext4 limits total xattr
+ * space to one filesystem block, typically 4KB).
+ *
+ * param: flags
+ * type: KAPI_TYPE_INT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_MASK
+ * valid-mask: XATTR_CREATE | XATTR_REPLACE
+ * constraint: Controls creation/replacement behavior. Valid values are 0,
+ * XATTR_CREATE (0x1), or XATTR_REPLACE (0x2). XATTR_CREATE fails if the
+ * attribute already exists. XATTR_REPLACE fails if the attribute does not
+ * exist. With flags=0, the attribute is created if it doesn't exist or
+ * replaced if it does. XATTR_CREATE and XATTR_REPLACE are mutually exclusive.
+ *
+ * return:
+ * type: KAPI_TYPE_INT
+ * check-type: KAPI_RETURN_ERROR_CHECK
+ * success: 0
+ * desc: Returns 0 on success. The extended attribute is set with the specified
+ * value on the symbolic link itself. Any previous value for the attribute
+ * is replaced.
+ *
+ * error: ENOENT, File or symlink not found
+ * desc: The file or symbolic link specified by pathname does not exist, or a
+ * directory component in the path does not exist. Returned from path lookup.
+ *
+ * error: EACCES, Permission denied
+ * desc: Permission denied during path resolution (search permission on a directory
+ * component) or write access to the file is denied based on DAC permissions.
+ *
+ * error: EPERM, Operation not permitted
+ * desc: Returned in several cases: (1) The file is marked immutable (chattr +i)
+ * or append-only (chattr +a). (2) For trusted.* namespace, caller lacks
+ * CAP_SYS_ADMIN in the filesystem's user namespace. (3) For security.*
+ * namespace (except security.capability), caller lacks CAP_SYS_ADMIN.
+ * (4) For user.* namespace on sticky directories, caller is not the owner
+ * and lacks CAP_FOWNER. (5) The inode has an unmapped ID in an idmapped mount.
+ * (6) Attempting to set user.* namespace on a symbolic link (not supported).
+ *
+ * error: ENODATA, Attribute not found
+ * desc: XATTR_REPLACE was specified but the named attribute does not exist on
+ * the symbolic link.
+ *
+ * error: EEXIST, Attribute already exists
+ * desc: XATTR_CREATE was specified but the named attribute already exists on
+ * the symbolic link.
+ *
+ * error: ERANGE, Name out of range
+ * desc: The attribute name is empty (zero length) or exceeds XATTR_NAME_MAX
+ * (255 characters). Returned from import_xattr_name() via strncpy_from_user().
+ *
+ * error: E2BIG, Value too large
+ * desc: The size parameter exceeds XATTR_SIZE_MAX (65536 bytes). Returned from
+ * setxattr_copy() before attempting to copy the value from userspace.
+ *
+ * error: EINVAL, Invalid argument
+ * desc: The flags parameter contains bits other than XATTR_CREATE and
+ * XATTR_REPLACE. Also returned for malformed capability values when setting
+ * security.capability, or when the xattr name doesn't match any handler prefix.
+ *
+ * error: EFAULT, Bad address
+ * desc: One of the user pointers (pathname, name, or value) is invalid or
+ * points to memory that cannot be accessed. Returned from strncpy_from_user()
+ * for pathname/name or vmemdup_user()/copy_from_user() for value.
+ *
+ * error: ENOMEM, Out of memory
+ * desc: Kernel could not allocate memory to copy the attribute value from
+ * userspace (via vmemdup_user), or for namespace capability conversion
+ * (cap_convert_nscap allocates memory for v3 capability format).
+ *
+ * error: EOPNOTSUPP, Operation not supported
+ * desc: The filesystem does not support extended attributes on symbolic links,
+ * or no xattr handler exists for the given namespace prefix, or the handler
+ * does not implement the set operation. Many filesystems do not support
+ * setting xattrs on symbolic links.
+ *
+ * error: EROFS, Read-only filesystem
+ * desc: The filesystem containing the symbolic link is mounted read-only.
+ * Returned from mnt_want_write() before attempting any modification.
+ *
+ * error: EIO, I/O error
+ * desc: The inode is marked as bad (is_bad_inode), indicating filesystem
+ * corruption or I/O failure. Also may be returned by filesystem-specific
+ * xattr handler operations.
+ *
+ * error: EDQUOT, Disk quota exceeded
+ * desc: The user's disk quota for extended attributes has been exceeded.
+ * Filesystem-specific error returned from the handler's set operation.
+ *
+ * error: ENOSPC, No space left on device
+ * desc: The filesystem has insufficient space to store the extended attribute.
+ * Filesystem-specific error from handler's set operation.
+ *
+ * error: ELOOP, Too many symbolic links
+ * desc: Too many symbolic links were encountered during path resolution of
+ * directory components (more than MAXSYMLINKS, typically 40). Note that the
+ * final component (the target of the operation) is not followed.
+ *
+ * error: ENAMETOOLONG, Filename too long
+ * desc: The pathname or a component of the pathname exceeds the system limit
+ * (PATH_MAX or NAME_MAX).
+ *
+ * error: ENOTDIR, Not a directory
+ * desc: A component of the path prefix is not a directory.
+ *
+ * error: ESTALE, Stale file handle
+ * desc: The file handle became stale during the operation (NFS). The syscall
+ * automatically retries with LOOKUP_REVAL in this case.
+ *
+ * lock: inode->i_rwsem
+ * type: KAPI_LOCK_MUTEX
+ * acquired: true
+ * released: true
+ * desc: The inode's read-write semaphore is acquired exclusively via inode_lock()
+ * before calling __vfs_setxattr_locked() and released via inode_unlock() after.
+ * This serializes concurrent xattr modifications on the same inode.
+ *
+ * lock: sb->s_writers (superblock freeze protection)
+ * type: KAPI_LOCK_SEMAPHORE
+ * acquired: true
+ * released: true
+ * desc: Write access to the mount is acquired via mnt_want_write() which calls
+ * sb_start_write(). This prevents filesystem freeze during the operation.
+ * Released via mnt_drop_write() after the operation completes.
+ *
+ * lock: file_rwsem (delegation breaking)
+ * type: KAPI_LOCK_SEMAPHORE
+ * acquired: true
+ * released: true
+ * desc: If the file has NFSv4 delegations, the percpu file_rwsem is acquired
+ * during delegation breaking in __break_lease(). The syscall may wait for
+ * delegation holders to acknowledge the break.
+ *
+ * signal: Any
+ * direction: KAPI_SIGNAL_RECEIVE
+ * action: KAPI_SIGNAL_ACTION_RESTART
+ * condition: Signal arrives during interruptible waits (delegation breaking)
+ * desc: The syscall may wait for NFSv4 delegation holders to release their
+ * delegations. During this wait, signals can interrupt the operation. If a
+ * signal is pending, the wait may be interrupted and the operation retried.
+ * Most blocking points in this syscall use non-interruptible waits.
+ * timing: KAPI_SIGNAL_TIME_DURING
+ * restartable: yes
+ *
+ * side-effect: KAPI_EFFECT_ALLOC_MEMORY
+ * target: Kernel buffer for attribute value
+ * desc: The attribute value is copied from userspace to a kernel buffer
+ * allocated via vmemdup_user(). This memory is freed (kvfree) after the
+ * operation completes, regardless of success or failure.
+ * reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ * target: Symbolic link's extended attributes
+ * desc: On success, the specified extended attribute is created or modified
+ * on the symbolic link itself. The change is typically persisted to storage
+ * synchronously or asynchronously depending on filesystem and mount options.
+ * reversible: yes
+ * condition: Operation succeeds
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: Inode flags (S_NOSEC)
+ * desc: When setting security.* attributes, the S_NOSEC flag is cleared from
+ * the inode. This flag is an optimization that indicates no security xattrs
+ * exist; clearing it ensures proper security checks on subsequent accesses.
+ * condition: Setting security.* namespace attribute
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: fsnotify event
+ * desc: On success, fsnotify_xattr() is called to notify any registered
+ * watchers (inotify, fanotify) of the extended attribute modification.
+ * This generates an IN_ATTRIB event.
+ * condition: Operation succeeds
+ *
+ * state-trans: extended attribute
+ * from: nonexistent or has old value
+ * to: has new value
+ * condition: Operation succeeds with flags=0 or appropriate flags
+ * desc: The extended attribute on the symbolic link transitions from not
+ * existing (or having its previous value) to containing the new value.
+ * With XATTR_CREATE, the attribute must not exist beforehand. With
+ * XATTR_REPLACE, it must exist.
+ *
+ * capability: CAP_SYS_ADMIN
+ * type: KAPI_CAP_GRANT_PERMISSION
+ * allows: Setting trusted.* namespace attributes and most security.* attributes
+ * without: Setting trusted.* returns EPERM. Setting security.* (except
+ * security.capability) returns EPERM. The check uses ns_capable() against
+ * the filesystem's user namespace.
+ * condition: Attribute name starts with "trusted." or "security." (except
+ * security.capability)
+ *
+ * capability: CAP_SETFCAP
+ * type: KAPI_CAP_GRANT_PERMISSION
+ * allows: Setting the security.capability extended attribute
+ * without: Setting security.capability returns EPERM
+ * condition: Attribute name is "security.capability". Checked via
+ * capable_wrt_inode_uidgid() which considers the inode's ownership.
+ *
+ * capability: CAP_FOWNER
+ * type: KAPI_CAP_BYPASS_CHECK
+ * allows: Bypassing the owner check for user.* attributes on a sticky directory
+ * without: Non-owners cannot set user.* attributes on a directory that has
+ * the sticky bit set without this capability
+ * condition: Setting user.* namespace attribute on a directory with the sticky bit set
+ *
+ * constraint: Filesystem support for symlinks
+ * desc: Not all filesystems support extended attributes on symbolic links.
+ * Some filesystems (like ext4) may only support certain xattr namespaces
+ * on symlinks. The user.* namespace is explicitly not supported on symbolic
+ * links since they are not regular files or directories.
+ *
+ * constraint: Filesystem-specific size limits
+ * desc: While the VFS limit is 64KB (XATTR_SIZE_MAX), filesystems may impose
+ * smaller limits. For example, ext4 limits all xattrs on an inode to fit
+ * in a single filesystem block (typically 4KB). XFS and ReiserFS support
+ * the full 64KB. Exceeding filesystem limits returns ENOSPC or E2BIG.
+ *
+ * constraint: user.* namespace restrictions on symlinks
+ * desc: The user.* namespace is only supported on regular files and directories.
+ * Attempting to set user.* attributes on symbolic links returns EPERM.
+ * This is because user.* xattrs have permission semantics that don't apply
+ * to symbolic links which anyone can follow.
+ *
+ * constraint: LSM checks
+ * desc: Linux Security Modules (SELinux, Smack, AppArmor) may impose additional
+ * restrictions via security_inode_setxattr() hook. These can return various
+ * error codes depending on the security policy. The LSM is called after
+ * permission checks but before the actual xattr modification.
+ *
+ * examples: lsetxattr("/path/symlink", "security.selinux", ctx, len, 0); // Set SELinux context on link
+ * lsetxattr("/path/symlink", "trusted.overlay.opaque", "y", 1, XATTR_CREATE); // Set overlay attr
+ *
+ * notes: This syscall is primarily used for security labeling of symbolic links
+ * themselves (as opposed to their targets). Common use cases include:
+ * - SELinux security contexts on symbolic links (security.selinux)
+ * - Overlay filesystem metadata (trusted.overlay.*)
+ * - IMA/EVM integrity metadata (security.ima, security.evm)
+ *
+ * Unlike regular files and directories, symbolic links do not support the
+ * user.* xattr namespace. This is because user.* xattrs require ownership
+ * or capability checks that don't make sense for symlinks which can be
+ * followed by anyone with directory access.
+ *
+ * The trusted.* namespace on symbolic links requires CAP_SYS_ADMIN and is
+ * commonly used by overlay filesystems to store metadata about redirected
+ * or opaque directories.
+ *
+ * NFSv4 delegation support means this syscall may need to wait for remote
+ * clients to release their delegations before the operation can complete.
+ *
+ * This syscall was introduced alongside setxattr(), fsetxattr(), and the
+ * corresponding get/list/remove variants in Linux 2.4 to provide the
+ * non-following behavior needed for backup/restore tools and security
+ * labeling of links.
+ *
+ * since-version: 2.4
+ */
SYSCALL_DEFINE5(lsetxattr, const char __user *, pathname,
const char __user *, name, const void __user *, value,
size_t, size, int, flags)
--
2.51.0
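
The namespace rules in the lsetxattr specification above can be sketched in
userspace C. This is only an illustration of the documented behaviour, not
the kernel's implementation: the enum names and the helpers xattr_ns_of() and
xattr_set_type_check() are hypothetical and do not exist in the kernel or in
libc. The sketch covers just the file-type rule for user.* (EPERM on anything
other than a regular file or directory) and ignores capability, LSM, and
filesystem checks.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical labels for the four namespace prefixes listed in the spec. */
enum xattr_ns { NS_USER, NS_TRUSTED, NS_SECURITY, NS_SYSTEM, NS_UNKNOWN };

/* Hypothetical, simplified file types; the kernel inspects inode->i_mode. */
enum file_type { FT_REGULAR, FT_DIRECTORY, FT_SYMLINK, FT_OTHER };

/* Classify an attribute name by its namespace prefix. */
static enum xattr_ns xattr_ns_of(const char *name)
{
	if (strncmp(name, "user.", 5) == 0)
		return NS_USER;
	if (strncmp(name, "trusted.", 8) == 0)
		return NS_TRUSTED;
	if (strncmp(name, "security.", 9) == 0)
		return NS_SECURITY;
	if (strncmp(name, "system.", 7) == 0)
		return NS_SYSTEM;
	return NS_UNKNOWN;
}

/*
 * Apply only the documented file-type rule for a set operation:
 * user.* is accepted on regular files and directories and yields
 * -EPERM on every other file type, including symbolic links.
 * Returns 0 when this particular rule does not object.
 */
static int xattr_set_type_check(const char *name, enum file_type type)
{
	if (xattr_ns_of(name) == NS_USER &&
	    type != FT_REGULAR && type != FT_DIRECTORY)
		return -EPERM;
	return 0;
}
```

Under these assumptions, xattr_set_type_check("user.comment", FT_SYMLINK)
reports -EPERM, while a trusted.* or security.* name on a symlink passes this
rule and is left to the capability checks described in the specification.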
* [RFC PATCH v5 11/15] kernel/api: add API specification for fsetxattr
2025-12-18 20:42 [RFC PATCH v5 00/15] Kernel API Specification Framework Sasha Levin
` (9 preceding siblings ...)
2025-12-18 20:42 ` [RFC PATCH v5 10/15] kernel/api: add API specification for lsetxattr Sasha Levin
@ 2025-12-18 20:42 ` Sasha Levin
2025-12-18 20:42 ` [RFC PATCH v5 12/15] kernel/api: add API specification for sys_open Sasha Levin
` (3 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/xattr.c | 322 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 322 insertions(+)
diff --git a/fs/xattr.c b/fs/xattr.c
index 466dcaf7ba83e..8a27c11905f7e 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -1392,6 +1392,328 @@ SYSCALL_DEFINE5(lsetxattr, const char __user *, pathname,
value, size, flags);
}
+/**
+ * sys_fsetxattr - Set an extended attribute value on an open file descriptor
+ * @fd: File descriptor of the file on which to set the extended attribute
+ * @name: Null-terminated name of the extended attribute (includes namespace prefix)
+ * @value: Buffer containing the attribute value to set
+ * @size: Size of the value buffer in bytes
+ * @flags: Flags controlling attribute creation/replacement behavior
+ *
+ * long-desc: Sets the value of an extended attribute identified by name on
+ * the file referred to by the open file descriptor fd. Extended attributes
+ * are name:value pairs associated with inodes (files, directories, symbolic
+ * links, etc.) that extend the normal attributes (stat data) associated with
+ * all inodes.
+ *
+ * This syscall is similar to setxattr() but operates on an already-open file
+ * descriptor rather than a pathname. This is useful when the file is already
+ * open, when the caller wants to avoid race conditions between opening and
+ * setting attributes, or when operating on file descriptors that cannot be
+ * easily reopened.
+ *
+ * The attribute name must include a namespace prefix. Valid namespaces are:
+ * - "user." - User-defined attributes (regular files and directories only)
+ * - "trusted." - Trusted attributes (requires CAP_SYS_ADMIN)
+ * - "security." - Security module attributes (e.g., SELinux, Smack, capabilities)
+ * - "system." - System attributes (e.g., POSIX ACLs via system.posix_acl_access)
+ *
+ * The value can be arbitrary binary data or text. A zero-length value is
+ * permitted and creates an attribute with an empty value (different from
+ * removing the attribute).
+ *
+ * The file descriptor does not need to be open for writing; permission is
+ * checked against the inode. An O_PATH file descriptor is rejected with EBADF.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: fd
+ * type: KAPI_TYPE_FD
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_CUSTOM
+ * constraint: Must be a valid file descriptor returned by open(), creat(),
+ * or similar syscalls. The file descriptor cannot be an O_PATH file
+ * descriptor. The file must be on a filesystem that is not mounted
+ * read-only. AT_FDCWD (-100) is NOT valid for this syscall as it operates
+ * on file descriptors, not directory handles.
+ *
+ * param: name
+ * type: KAPI_TYPE_USER_PTR
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ * constraint-type: KAPI_CONSTRAINT_USER_STRING
+ * range: 1, 255
+ * constraint: Must be a valid null-terminated string in user memory containing
+ * the extended attribute name with namespace prefix (e.g., "user.myattr").
+ * The name (including prefix) must be between 1 and XATTR_NAME_MAX (255)
+ * characters. An empty name returns ERANGE.
+ *
+ * param: value
+ * type: KAPI_TYPE_USER_PTR
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL
+ * constraint-type: KAPI_CONSTRAINT_CUSTOM
+ * constraint: Must be a valid pointer to user memory containing the attribute
+ * value, or NULL if size is 0. When size is non-zero, the pointer must be
+ * valid and accessible for size bytes.
+ *
+ * param: size
+ * type: KAPI_TYPE_UINT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_RANGE
+ * range: 0, 65536
+ * constraint: Size of the value in bytes. Must not exceed XATTR_SIZE_MAX
+ * (65536 bytes). Zero is permitted and creates an attribute with empty value.
+ * Filesystem-specific limits may be smaller (e.g., ext4 limits total xattr
+ * space to one filesystem block, typically 4KB).
+ *
+ * param: flags
+ * type: KAPI_TYPE_INT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_MASK
+ * valid-mask: XATTR_CREATE | XATTR_REPLACE
+ * constraint: Controls creation/replacement behavior. Valid values are 0,
+ * XATTR_CREATE (0x1), or XATTR_REPLACE (0x2). XATTR_CREATE fails if the
+ * attribute already exists. XATTR_REPLACE fails if the attribute does not
+ * exist. With flags=0, the attribute is created if it doesn't exist or
+ * replaced if it does. XATTR_CREATE and XATTR_REPLACE are mutually exclusive.
+ *
+ * return:
+ * type: KAPI_TYPE_INT
+ * check-type: KAPI_RETURN_ERROR_CHECK
+ * success: 0
+ * desc: Returns 0 on success. The extended attribute is set with the specified
+ * value. Any previous value for the attribute is replaced.
+ *
+ * error: EBADF, Bad file descriptor
+ * desc: The file descriptor fd is not valid, or was opened with O_PATH. This
+ * is returned from the fd class lookup, which rejects both unopened
+ * descriptors and O_PATH descriptors.
+ *
+ * error: EPERM, Operation not permitted
+ * desc: Returned when: (1) file is immutable or append-only, (2) trusted.*
+ * without CAP_SYS_ADMIN, (3) security.* (except capability) without
+ * CAP_SYS_ADMIN, (4) user.* on sticky dir without ownership/CAP_FOWNER,
+ * (5) unmapped ID in idmapped mount, (6) user.* on non-regular/non-dir.
+ *
+ * error: ENODATA, Attribute not found
+ * desc: XATTR_REPLACE was specified but the named attribute does not exist on
+ * the file.
+ *
+ * error: EEXIST, Attribute already exists
+ * desc: XATTR_CREATE was specified but the named attribute already exists on
+ * the file.
+ *
+ * error: ERANGE, Name out of range
+ * desc: The attribute name is empty (zero length) or exceeds XATTR_NAME_MAX
+ * (255 characters). Returned from import_xattr_name() via strncpy_from_user().
+ *
+ * error: E2BIG, Value too large
+ * desc: The size parameter exceeds XATTR_SIZE_MAX (65536 bytes). Returned from
+ * setxattr_copy() before attempting to copy the value from userspace.
+ *
+ * error: EINVAL, Invalid argument
+ * desc: The flags parameter contains bits other than XATTR_CREATE and
+ * XATTR_REPLACE. Also returned for malformed capability values when setting
+ * security.capability (invalid header format, invalid rootid mapping), or
+ * when the xattr name doesn't match any handler prefix.
+ *
+ * error: EFAULT, Bad address
+ * desc: One of the user pointers (name or value) is invalid or points to
+ * memory that cannot be accessed. Returned from strncpy_from_user() for
+ * name or vmemdup_user()/copy_from_user() for value.
+ *
+ * error: ENOMEM, Out of memory
+ * desc: Kernel could not allocate memory to copy the attribute value from
+ * userspace (via vmemdup_user), or for namespace capability conversion
+ * (cap_convert_nscap allocates memory for v3 capability format).
+ *
+ * error: EOPNOTSUPP, Operation not supported
+ * desc: The filesystem does not support extended attributes (IOP_XATTR not set),
+ * or no xattr handler exists for the given namespace prefix, or the handler
+ * does not implement the set operation. Also returned for POSIX ACL xattrs
+ * (system.posix_acl_*) when CONFIG_FS_POSIX_ACL is disabled.
+ *
+ * error: EROFS, Read-only filesystem
+ * desc: The filesystem containing the file is mounted read-only. Returned from
+ * mnt_want_write_file() before attempting any modification.
+ *
+ * error: EIO, I/O error
+ * desc: The inode is marked as bad (is_bad_inode), indicating filesystem
+ * corruption or I/O failure. Also may be returned by filesystem-specific
+ * xattr handler operations.
+ *
+ * error: EDQUOT, Disk quota exceeded
+ * desc: The user's disk quota for extended attributes has been exceeded.
+ * Filesystem-specific error returned from the handler's set operation.
+ *
+ * error: ENOSPC, No space left on device
+ * desc: The filesystem has insufficient space to store the extended attribute.
+ * Filesystem-specific error from handler's set operation.
+ *
+ * error: EACCES, Permission denied
+ * desc: Write access to the file is denied based on DAC permissions. The caller
+ * does not have appropriate permission to modify xattrs on this file.
+ *
+ * lock: inode->i_rwsem
+ * type: KAPI_LOCK_MUTEX
+ * acquired: true
+ * released: true
+ * desc: The inode's read-write semaphore is acquired exclusively via inode_lock()
+ * before calling __vfs_setxattr_locked() and released via inode_unlock() after.
+ * This serializes concurrent xattr modifications on the same inode.
+ *
+ * lock: sb->s_writers (superblock freeze protection)
+ * type: KAPI_LOCK_SEMAPHORE
+ * acquired: true
+ * released: true
+ * desc: Write access to the mount is acquired via mnt_want_write_file() which
+ * calls sb_start_write(). This prevents filesystem freeze during the operation.
+ * Released via mnt_drop_write_file() after the operation completes.
+ *
+ * lock: file_rwsem (delegation breaking)
+ * type: KAPI_LOCK_SEMAPHORE
+ * acquired: true
+ * released: true
+ * desc: If the file has NFSv4 delegations, the percpu file_rwsem is acquired
+ * during delegation breaking in __break_lease(). The syscall may wait for
+ * delegation holders to acknowledge the break.
+ *
+ * signal: Any
+ * direction: KAPI_SIGNAL_RECEIVE
+ * action: KAPI_SIGNAL_ACTION_RESTART
+ * condition: Signal arrives during interruptible wait for delegation breaking
+ * desc: The syscall may wait for NFSv4 delegation holders to release their
+ * delegations via wait_event_interruptible_timeout() in __break_lease().
+ * During this wait, signals can interrupt the operation. If a signal is
+ * pending, the wait is interrupted and the operation may be retried by
+ * the kernel automatically if the signal disposition allows (SA_RESTART).
+ * error: -ERESTARTSYS
+ * timing: KAPI_SIGNAL_TIME_DURING
+ * restartable: yes
+ *
+ * side-effect: KAPI_EFFECT_ALLOC_MEMORY
+ * target: Kernel buffer for attribute value
+ * desc: The attribute value is copied from userspace to a kernel buffer
+ * allocated via vmemdup_user(). This memory is freed (kvfree) after the
+ * operation completes, regardless of success or failure.
+ * reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ * target: File's extended attributes
+ * desc: On success, the specified extended attribute is created or modified.
+ * The change is typically persisted to storage synchronously or asynchronously
+ * depending on filesystem and mount options.
+ * reversible: yes
+ * condition: Operation succeeds
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: Inode flags (S_NOSEC)
+ * desc: When setting security.* attributes, the S_NOSEC flag is cleared from
+ * the inode. This flag is an optimization that indicates no security xattrs
+ * exist; clearing it ensures proper security checks on subsequent accesses.
+ * condition: Setting security.* namespace attribute
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: fsnotify event
+ * desc: On success, fsnotify_xattr() is called to notify any registered
+ * watchers (inotify, fanotify) of the extended attribute modification.
+ * This generates an IN_ATTRIB event.
+ * condition: Operation succeeds
+ *
+ * state-trans: extended attribute
+ * from: nonexistent or has old value
+ * to: has new value
+ * condition: Operation succeeds with flags=0 or appropriate flags
+ * desc: The extended attribute transitions from not existing (or having its
+ * previous value) to containing the new value. With XATTR_CREATE, the
+ * attribute must not exist beforehand. With XATTR_REPLACE, it must exist.
+ *
+ * capability: CAP_SYS_ADMIN
+ * type: KAPI_CAP_GRANT_PERMISSION
+ * allows: Setting trusted.* namespace attributes and most security.* attributes
+ * without: Setting trusted.* returns EPERM. Setting security.* (except
+ * security.capability) returns EPERM. The check uses ns_capable() against
+ * the filesystem's user namespace.
+ * condition: Attribute name starts with "trusted." or "security." (except
+ * security.capability)
+ *
+ * capability: CAP_SETFCAP
+ * type: KAPI_CAP_GRANT_PERMISSION
+ * allows: Setting the security.capability extended attribute
+ * without: Setting security.capability returns EPERM
+ * condition: Attribute name is "security.capability". Checked via
+ * capable_wrt_inode_uidgid() which considers the inode's ownership.
+ *
+ * capability: CAP_FOWNER
+ * type: KAPI_CAP_BYPASS_CHECK
+ * allows: Bypassing the owner check for user.* attributes on a sticky directory
+ * without: Non-owners cannot set user.* attributes on a directory that has
+ * the sticky bit set without this capability
+ * condition: Setting user.* namespace attribute on a directory with the sticky bit set
+ *
+ * constraint: Filesystem support
+ * desc: The filesystem must support extended attributes (have IOP_XATTR flag
+ * set and provide xattr handlers). Common filesystems supporting xattrs
+ * include ext4, XFS, Btrfs, and tmpfs. Some filesystems (e.g., FAT, older
+ * ext2) do not support extended attributes.
+ *
+ * constraint: Filesystem-specific size limits
+ * desc: While the VFS limit is 64KB (XATTR_SIZE_MAX), filesystems may impose
+ * smaller limits. For example, ext4 limits all xattrs on an inode to fit
+ * in a single filesystem block (typically 4KB). XFS and ReiserFS support
+ * the full 64KB. Exceeding filesystem limits returns ENOSPC or E2BIG.
+ *
+ * constraint: user.* namespace restrictions
+ * desc: The user.* namespace is only supported on regular files and directories.
+ * Attempting to set user.* attributes on other file types (symlinks, devices,
+ * sockets, FIFOs) returns EPERM (for write) or ENODATA (for read).
+ *
+ * constraint: LSM checks
+ * desc: Linux Security Modules (SELinux, Smack, AppArmor) may impose additional
+ * restrictions via security_inode_setxattr() hook. These can return various
+ * error codes depending on the security policy. The LSM is called after
+ * permission checks but before the actual xattr modification.
+ *
+ * constraint: File descriptor must not be O_PATH
+ * desc: The file descriptor must be a regular file descriptor, not one opened
+ * with O_PATH. O_PATH file descriptors do not provide access to the file
+ * contents or metadata modification operations.
+ *
+ * examples: fsetxattr(fd, "user.comment", "test", 4, 0); // Set user attr
+ * fsetxattr(fd, "user.new", "val", 3, XATTR_CREATE); // Create only, fail if exists
+ * fsetxattr(fd, "user.existing", "new", 3, XATTR_REPLACE); // Replace only
+ * fsetxattr(fd, "user.empty", "", 0, 0); // Create attribute with empty value
+ *
+ * notes: Extended attributes provide a way to associate arbitrary metadata with
+ * files beyond the standard stat attributes. They are commonly used for:
+ * - SELinux security contexts (security.selinux)
+ * - File capabilities (security.capability)
+ * - POSIX ACLs (system.posix_acl_access, system.posix_acl_default)
+ * - User-defined metadata (user.* namespace)
+ *
+ * Using fsetxattr() with an already-open file descriptor avoids potential
+ * TOCTOU (time-of-check-time-of-use) race conditions that can occur when
+ * using setxattr() with a pathname, where the file might be replaced between
+ * opening and setting the attribute.
+ *
+ * The trusted.* namespace is designed for use by privileged processes to store
+ * data that should not be accessible to unprivileged users (e.g., during
+ * backup/restore operations).
+ *
+ * NFSv4 delegation support means this syscall may need to wait for remote
+ * clients to release their delegations before the operation can complete.
+ * This can introduce unbounded delays in pathological cases.
+ *
+ * For security.capability specifically, the kernel may convert between v2
+ * (non-namespaced) and v3 (namespaced) capability formats depending on the
+ * filesystem's user namespace and caller's capabilities.
+ *
+ * Unlike setxattr() and lsetxattr(), fsetxattr() does not involve path
+ * resolution, so errors related to path traversal (ENOENT, ENOTDIR,
+ * ENAMETOOLONG, ELOOP, ESTALE) are not possible.
+ *
+ * since-version: 2.4
+ */
SYSCALL_DEFINE5(fsetxattr, int, fd, const char __user *, name,
const void __user *,value, size_t, size, int, flags)
{
--
2.51.0
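
The argument constraints in the fsetxattr specification above (valid-mask for
flags, the 1..255 name range, the 0..65536 size range) can be sketched as a
small userspace checker. This is a hedged illustration, not kernel code: the
SPEC_* constants are local mirrors of the values quoted in the spec, and
spec_check_setxattr_args() is a hypothetical helper. The order of the checks
(flags, then name, then size) is an assumption; the specification only says
which error each violation produces.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Local mirrors of the values quoted in the spec (assumed, not UAPI). */
#define SPEC_XATTR_NAME_MAX	255
#define SPEC_XATTR_SIZE_MAX	65536
#define SPEC_XATTR_CREATE	0x1
#define SPEC_XATTR_REPLACE	0x2

/*
 * Return 0 if the arguments satisfy the documented VFS-level limits,
 * or the negative errno the specification associates with the first
 * violation found. Pointer validity (EFAULT) is not modelled here.
 */
static int spec_check_setxattr_args(const char *name, size_t size, int flags)
{
	size_t len;

	/* Any bit outside XATTR_CREATE | XATTR_REPLACE is EINVAL. */
	if (flags & ~(SPEC_XATTR_CREATE | SPEC_XATTR_REPLACE))
		return -EINVAL;

	/* An empty or over-long name is ERANGE. */
	len = strlen(name);
	if (len == 0 || len > SPEC_XATTR_NAME_MAX)
		return -ERANGE;

	/* A value larger than XATTR_SIZE_MAX is E2BIG; zero is allowed. */
	if (size > SPEC_XATTR_SIZE_MAX)
		return -E2BIG;

	return 0;
}
```

A caller could run such a check before invoking fsetxattr() to get an early,
filesystem-independent diagnosis; filesystem-specific limits (for example the
smaller ext4 limit mentioned in the spec) would still only surface from the
real syscall.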
* [RFC PATCH v5 12/15] kernel/api: add API specification for sys_open
2025-12-18 20:42 [RFC PATCH v5 00/15] Kernel API Specification Framework Sasha Levin
` (10 preceding siblings ...)
2025-12-18 20:42 ` [RFC PATCH v5 11/15] kernel/api: add API specification for fsetxattr Sasha Levin
@ 2025-12-18 20:42 ` Sasha Levin
2025-12-18 20:42 ` [RFC PATCH v5 13/15] kernel/api: add API specification for sys_close Sasha Levin
` (2 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/open.c | 318 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 318 insertions(+)
diff --git a/fs/open.c b/fs/open.c
index f328622061c56..343e6d3798ec3 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1437,6 +1437,324 @@ int do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
}
+/**
+ * sys_open - Open or create a file
+ * @filename: Pathname of the file to open or create
+ * @flags: File access mode and behavior flags (O_RDONLY, O_WRONLY, O_RDWR, etc.)
+ * @mode: File permission bits for newly created files (only with O_CREAT/O_TMPFILE)
+ *
+ * long-desc: Opens the file specified by pathname. If O_CREAT or O_TMPFILE is
+ * specified in flags, the file is created if it does not exist; its mode is
+ * set according to the mode parameter modified by the process's umask.
+ *
+ * The flags argument must include one of the following access modes: O_RDONLY
+ * (read-only), O_WRONLY (write-only), or O_RDWR (read/write). These are the
+ * low-order two bits of flags. In addition, zero or more file creation and
+ * file status flags can be bitwise-ORed in flags.
+ *
+ * File creation flags: O_CREAT, O_EXCL, O_NOCTTY, O_TRUNC, O_DIRECTORY,
+ * O_NOFOLLOW, O_CLOEXEC, O_TMPFILE. These flags affect open behavior.
+ *
+ * File status flags: O_APPEND, O_ASYNC, O_DIRECT, O_DSYNC, O_LARGEFILE,
+ * O_NOATIME, O_NONBLOCK (O_NDELAY), O_PATH, O_SYNC. These become part of the
+ * file's open file description and can be retrieved/modified with fcntl().
+ *
+ * The return value is a file descriptor, a small nonnegative integer used in
+ * subsequent system calls (read, write, lseek, fcntl, etc.) to refer to the
+ * open file. The file descriptor returned by a successful open is the lowest-
+ * numbered file descriptor not currently open for the process.
+ *
+ * On 64-bit systems, O_LARGEFILE is automatically added to the flags. On 32-bit
+ * systems, files larger than 2GB require O_LARGEFILE to be explicitly set.
+ *
+ * This syscall is a legacy interface. Modern code should prefer openat() for
+ * relative path operations and openat2() for additional control via resolve
+ * flags. open() is equivalent to openat(AT_FDCWD, pathname, flags, mode).
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: filename
+ * type: KAPI_TYPE_PATH
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ * constraint-type: KAPI_CONSTRAINT_USER_PATH
+ * constraint: Must be a valid null-terminated path string in user memory.
+ * Maximum path length is PATH_MAX (4096 bytes) including null terminator.
+ * For relative paths, resolution starts from current working directory.
+ * The path is followed (symlinks resolved) unless O_NOFOLLOW is specified.
+ *
+ * param: flags
+ * type: KAPI_TYPE_INT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_MASK
+ * valid-mask: O_RDONLY | O_WRONLY | O_RDWR | O_CREAT | O_EXCL | O_NOCTTY |
+ * O_TRUNC | O_APPEND | O_NONBLOCK | O_DSYNC | O_SYNC | FASYNC |
+ * O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | O_NOATIME |
+ * O_CLOEXEC | O_PATH | O_TMPFILE
+ * constraint: Must include exactly one of O_RDONLY (0), O_WRONLY (1), or
+ * O_RDWR (2) as the access mode. Additional flags may be ORed. Invalid flag
+ * combinations (e.g., O_DIRECTORY|O_CREAT, O_PATH with incompatible flags,
+ * O_TMPFILE without O_DIRECTORY, O_TMPFILE with read-only mode) return
+ * EINVAL. Unknown flags are silently ignored for backward compatibility
+ * (unlike openat2 which rejects them).
+ *
+ * param: mode
+ * type: KAPI_TYPE_UINT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_MASK
+ * valid-mask: S_ISUID | S_ISGID | S_ISVTX | S_IRWXU | S_IRWXG | S_IRWXO
+ * constraint: Only meaningful when O_CREAT or O_TMPFILE is specified in
+ * flags. Specifies the file mode bits (permissions and setuid/setgid/sticky
+ * bits) for a newly created file. The effective mode is (mode & ~umask).
+ * When O_CREAT/O_TMPFILE is not set, mode is ignored. Mode values exceeding
+ * S_IALLUGO (07777) are masked off.
+ *
+ * return:
+ * type: KAPI_TYPE_INT
+ * check-type: KAPI_RETURN_FD
+ * success: >= 0
+ * desc: On success, returns a new file descriptor (non-negative integer), the
+ * lowest-numbered one not currently open for the process. On error, returns
+ * a negative error code (the C library wrapper returns -1 and sets errno).
+ *
+ * error: EACCES, Permission denied
+ * desc: The requested access to the file is not allowed, or search permission
+ * is denied for one of the directories in the path prefix of pathname, or
+ * the file did not exist yet and write access to the parent directory is
+ * not allowed, or O_TRUNC is specified but write permission is denied, or
+ * the file is on a filesystem mounted with noexec and MAY_EXEC was implied.
+ *
+ * error: EBUSY, Device or resource busy
+ * desc: O_EXCL was specified in flags and pathname refers to a block device
+ * that is in use by the system (e.g., it is mounted).
+ *
+ * error: EDQUOT, Disk quota exceeded
+ * desc: O_CREAT is specified and the file does not exist, and the user's quota
+ * of disk blocks or inodes on the filesystem has been exhausted.
+ *
+ * error: EEXIST, File exists
+ * desc: O_CREAT and O_EXCL were specified in flags, but pathname already exists.
+ * This error is atomic with respect to file creation - it prevents race
+ * conditions (TOCTOU) when creating files.
+ *
+ * error: EFAULT, Bad address
+ * desc: pathname points outside the process's accessible address space.
+ *
+ * error: EINTR, Interrupted system call
+ * desc: The call was interrupted by a signal handler before completing file
+ * open. This can occur during lock acquisition or when breaking leases.
+ *
+ * error: EINVAL, Invalid argument
+ * desc: Returned for several conditions: (1) Invalid O_* flag combinations
+ * (O_DIRECTORY|O_CREAT, O_TMPFILE without O_DIRECTORY, O_TMPFILE with
+ * read-only access, O_PATH with flags other than O_DIRECTORY|O_NOFOLLOW|
+ * O_CLOEXEC). (2) mode contains bits outside S_IALLUGO when O_CREAT/O_TMPFILE
+ * is set (openat2 only). (3) O_DIRECT requested but filesystem doesn't
+ * support it. (4) The filesystem does not support O_SYNC or O_DSYNC.
+ *
+ * error: EISDIR, Is a directory
+ * desc: pathname refers to a directory and the access requested involved
+ * writing (O_WRONLY, O_RDWR, or O_TRUNC). Also returned when O_TMPFILE is
+ * used on a directory that doesn't support tmpfile operations.
+ *
+ * error: ELOOP, Too many symbolic links
+ * desc: Too many symbolic links were encountered in resolving pathname, or
+ * O_NOFOLLOW was specified but pathname refers to a symbolic link.
+ *
+ * error: EMFILE, Too many open files
+ * desc: The per-process limit on the number of open file descriptors has been
+ * reached. This limit is RLIMIT_NOFILE (default typically 1024, max set by
+ * /proc/sys/fs/nr_open).
+ *
+ * error: ENAMETOOLONG, File name too long
+ * desc: pathname was too long, exceeding PATH_MAX (4096) bytes, or a single
+ * path component exceeded NAME_MAX (usually 255) bytes.
+ *
+ * error: ENFILE, Too many open files in system
+ * desc: The system-wide limit on the total number of open files has been
+ * reached (/proc/sys/fs/file-max). Processes with CAP_SYS_ADMIN can exceed
+ * this limit.
+ *
+ * error: ENODEV, No such device
+ * desc: pathname refers to a special file that has no corresponding device, or
+ * the file's inode has no file operations assigned.
+ *
+ * error: ENOENT, No such file or directory
+ * desc: A directory component in pathname does not exist or is a dangling
+ * symbolic link, or O_CREAT is not set and the named file does not exist,
+ * or pathname is an empty string.
+ *
+ * error: ENOMEM, Out of memory
+ * desc: The kernel could not allocate sufficient memory for the file structure,
+ * path lookup structures, or the filename buffer.
+ *
+ * error: ENOSPC, No space left on device
+ * desc: O_CREAT was specified and the file does not exist, and the directory
+ * or filesystem containing the file has no room for a new file entry.
+ *
+ * error: ENOTDIR, Not a directory
+ * desc: A component used as a directory in pathname is not actually a directory,
+ * or O_DIRECTORY was specified and pathname was not a directory.
+ *
+ * error: ENXIO, No such device or address
+ * desc: O_NONBLOCK | O_WRONLY is set and the named file is a FIFO and no
+ * process has the FIFO open for reading. Also returned when opening a device
+ * special file that does not exist.
+ *
+ * error: EOPNOTSUPP, Operation not supported
+ * desc: The filesystem containing pathname does not support O_TMPFILE.
+ *
+ * error: EOVERFLOW, Value too large for defined data type
+ * desc: pathname refers to a regular file that is too large to be opened.
+ * This occurs on 32-bit systems without O_LARGEFILE when the file size
+ * exceeds 2GB (2^31 - 1 bytes).
+ *
+ * error: EPERM, Operation not permitted
+ * desc: O_NOATIME flag was specified but the effective UID of the caller did
+ * not match the owner of the file and the caller is not privileged, or the
+ * file is append-only and O_TRUNC was specified or write mode without
+ * O_APPEND, or the file is immutable, or a seal prevents the operation.
+ *
+ * error: EROFS, Read-only file system
+ * desc: pathname refers to a file on a read-only filesystem and write access
+ * was requested.
+ *
+ * error: ETXTBSY, Text file busy
+ * desc: pathname refers to an executable image which is currently being
+ * executed, or to a swap file, and write access or truncation was requested.
+ *
+ * error: EWOULDBLOCK, Resource temporarily unavailable
+ * desc: O_NONBLOCK was specified and an incompatible lease is held on the file.
+ *
+ * lock: files->file_lock
+ * type: KAPI_LOCK_SPINLOCK
+ * acquired: true
+ * released: true
+ * desc: Acquired when allocating a file descriptor slot. Held briefly during
+ * fd allocation via alloc_fd() and released before the syscall returns.
+ *
+ * lock: inode->i_rwsem (parent directory)
+ * type: KAPI_LOCK_RWLOCK
+ * acquired: conditional
+ * released: true
+ * desc: Write lock acquired on parent directory inode when creating a new file
+ * (O_CREAT). Acquired via inode_lock_nested() in lookup path. May use
+ * killable variant which can return EINTR on fatal signal.
+ *
+ * lock: RCU read-side
+ * type: KAPI_LOCK_RCU
+ * acquired: true
+ * released: true
+ * desc: Path lookup uses RCU mode initially for performance. If RCU lookup
+ * fails (returns -ECHILD), falls back to reference-based lookup.
+ *
+ * signal: Any signal
+ * direction: KAPI_SIGNAL_RECEIVE
+ * action: KAPI_SIGNAL_ACTION_RETURN
+ * condition: When blocked on interruptible or killable operations
+ * desc: The syscall may be interrupted during path lookup, lock acquisition,
+ * or lease breaking. Fatal signals (SIGKILL, etc.) will interrupt killable
+ * operations. Non-fatal signals may interrupt interruptible operations.
+ * error: -EINTR
+ * timing: KAPI_SIGNAL_TIME_DURING
+ * restartable: yes
+ *
+ * side-effect: KAPI_EFFECT_RESOURCE_CREATE | KAPI_EFFECT_ALLOC_MEMORY
+ * target: file descriptor, file structure, dentry cache
+ * desc: Allocates a new file descriptor in the process's fd table. Allocates
+ * a struct file from the filp slab cache. May allocate dentries and inodes
+ * during path lookup. System-wide file count (nr_files) is incremented.
+ * reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ * target: filesystem, inode
+ * condition: When O_CREAT is specified and file doesn't exist
+ * desc: Creates a new file on the filesystem. Creates new inode, allocates
+ * data blocks as needed, and creates directory entry. Updates parent
+ * directory mtime and ctime.
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ * target: file content
+ * condition: When O_TRUNC is specified for existing file
+ * desc: Truncates the file to zero length, releasing data blocks. Updates
+ * file mtime and ctime. May trigger notifications to lease holders.
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: inode timestamps
+ * condition: Unless O_NOATIME is specified
+ * desc: Opens for reading may update inode access time (atime) unless mounted
+ * with noatime/relatime or O_NOATIME is specified. Opens for writing that
+ * truncate or create update mtime and ctime.
+ *
+ * capability: CAP_DAC_OVERRIDE
+ * type: KAPI_CAP_BYPASS_CHECK
+ * allows: Bypass file read, write, and execute permission checks
+ * without: Standard DAC (discretionary access control) checks are applied
+ * condition: Checked when file permission would otherwise deny access
+ *
+ * capability: CAP_DAC_READ_SEARCH
+ * type: KAPI_CAP_BYPASS_CHECK
+ * allows: Bypass read permission on files and search permission on directories
+ * without: Must have read permission on file or search permission on directory
+ * condition: Checked during path traversal and file open
+ *
+ * capability: CAP_FOWNER
+ * type: KAPI_CAP_BYPASS_CHECK
+ * allows: Use O_NOATIME on files not owned by caller
+ * without: O_NOATIME returns EPERM if caller is not file owner
+ * condition: Checked when O_NOATIME is specified and caller is not owner
+ *
+ * capability: CAP_SYS_ADMIN
+ * type: KAPI_CAP_INCREASE_LIMIT
+ * allows: Exceed the system-wide file limit (file-max)
+ * without: Returns ENFILE when system limit is reached
+ * condition: Checked in alloc_empty_file() when nr_files >= max_files
+ *
+ * constraint: RLIMIT_NOFILE (per-process fd limit)
+ * desc: The returned file descriptor must be less than the process's
+ * RLIMIT_NOFILE limit. Default is typically 1024, maximum is controlled
+ * by /proc/sys/fs/nr_open (default 1048576). Exceeding returns EMFILE.
+ * expr: fd < rlimit(RLIMIT_NOFILE)
+ *
+ * constraint: file-max (system-wide limit)
+ * desc: System-wide limit on open files in /proc/sys/fs/file-max. Processes
+ * without CAP_SYS_ADMIN receive ENFILE when this limit is reached. The
+ * limit is computed based on system memory at boot time.
+ * expr: nr_files < files_stat.max_files || capable(CAP_SYS_ADMIN)
+ *
+ * constraint: PATH_MAX
+ * desc: Maximum length of pathname including null terminator is PATH_MAX
+ * (4096 bytes). Individual path components must not exceed NAME_MAX (255).
+ *
+ * examples: fd = open("/etc/passwd", O_RDONLY); // Read existing file
+ * fd = open("/tmp/newfile", O_WRONLY | O_CREAT | O_TRUNC, 0644); // Create/truncate
+ * fd = open("/tmp/lockfile", O_WRONLY | O_CREAT | O_EXCL, 0600); // Exclusive create
+ * fd = open("/dev/null", O_RDWR); // Open device
+ * fd = open("/tmp", O_RDONLY | O_DIRECTORY); // Open directory
+ * fd = open("/tmp", O_TMPFILE | O_RDWR, 0600); // Anonymous temp file
+ *
+ * notes: The distinction between O_RDONLY, O_WRONLY, and O_RDWR is critical.
+ * O_RDONLY is defined as 0, so (flags & O_RDONLY) is always 0 (false).
+ * Test access mode using (flags & O_ACCMODE) == O_RDONLY.
+ *
+ * O_CREAT without O_EXCL opens the file if it already exists, so checking for
+ * existence first and then calling open() is racy. Use O_CREAT | O_EXCL for
+ * atomic exclusive file creation.
+ *
+ * O_CLOEXEC should be used in multithreaded programs to prevent file descriptor
+ * leaks to child processes between fork() and execve().
+ *
+ * O_DIRECT has alignment requirements that vary by filesystem. Use statx()
+ * with STATX_DIOALIGN (Linux 6.1+) to query requirements. Unaligned I/O may
+ * fail with EINVAL or fall back to buffered I/O.
+ *
+ * O_PATH opens a file descriptor that can be used only for certain operations
+ * (fstat, dup, fcntl, close, fchdir on directories, as dirfd for *at() calls).
+ * I/O operations will fail with EBADF.
+ *
+ * since-version: 1.0
+ */
SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)
{
if (force_o_largefile())
--
2.51.0
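
The two notes in the specification above (testing the access mode through
O_ACCMODE, and using O_CREAT | O_EXCL for atomic creation) can be sketched
from user space as follows. This is only an illustration, not part of the
patch: is_read_only() and create_marker() are hypothetical helper names,
and the 0600 mode is an arbitrary choice.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if flags request read-only access. */
static int is_read_only(int flags)
{
	/*
	 * O_RDONLY is 0, so (flags & O_RDONLY) is always 0. Mask out the
	 * access-mode bits and compare instead.
	 */
	return (flags & O_ACCMODE) == O_RDONLY;
}

/*
 * Hypothetical helper: create path only if it does not exist yet.
 * Returns 0 on success or a negative errno value; -EEXIST means some
 * other creator got there first.
 */
static int create_marker(const char *path)
{
	int fd;

	assert(path != NULL);

	/* Existence check and creation happen atomically in the kernel. */
	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd < 0)
		return -errno;

	/* Nothing was written, so the close() result is not inspected. */
	close(fd);
	return 0;
}
```

Calling create_marker() twice on the same path should succeed the first
time and return -EEXIST the second time, with no separate existence check
needed in between.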
* [RFC PATCH v5 13/15] kernel/api: add API specification for sys_close
2025-12-18 20:42 [RFC PATCH v5 00/15] Kernel API Specification Framework Sasha Levin
` (11 preceding siblings ...)
2025-12-18 20:42 ` [RFC PATCH v5 12/15] kernel/api: add API specification for sys_open Sasha Levin
@ 2025-12-18 20:42 ` Sasha Levin
2025-12-18 20:42 ` [RFC PATCH v5 14/15] kernel/api: add API specification for sys_read Sasha Levin
2025-12-18 20:42 ` [RFC PATCH v5 15/15] kernel/api: add API specification for sys_write Sasha Levin
14 siblings, 0 replies; 16+ messages in thread
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/open.c | 247 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 243 insertions(+), 4 deletions(-)
diff --git a/fs/open.c b/fs/open.c
index 343e6d3798ec3..26d8ee8336405 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1868,10 +1868,249 @@ int filp_close(struct file *filp, fl_owner_t id)
}
EXPORT_SYMBOL(filp_close);
-/*
- * Careful here! We test whether the file pointer is NULL before
- * releasing the fd. This ensures that one clone task can't release
- * an fd while another clone is opening it.
+/**
+ * sys_close - Close a file descriptor
+ * @fd: The file descriptor to close
+ *
+ * long-desc: Terminates access to an open file descriptor, releasing the file
+ * descriptor for reuse by subsequent open(), dup(), or similar syscalls. POSIX
+ * advisory record locks held by the process on the file are released on any
+ * close; OFD and flock locks are released when the last file descriptor
+ * referring to the open file description is closed, at which point associated
+ * resources are freed. If the file was previously unlinked, the file itself is
+ * deleted when the last reference is closed.
+ *
+ * CRITICAL: The file descriptor is ALWAYS closed, even when close() returns
+ * an error. This differs from POSIX semantics where the state of the file
+ * descriptor is unspecified after EINTR. On Linux, the fd is released early
+ * in close() processing before flush operations that may fail. Therefore,
+ * retrying close() after an error return is DANGEROUS and may close an
+ * unrelated file descriptor that was assigned to another thread.
+ *
+ * Errors returned from close() (EIO, ENOSPC, EDQUOT) indicate that the final
+ * flush of buffered data failed. These errors commonly occur on network
+ * filesystems like NFS when write errors are deferred to close time. A
+ * successful return from close() does NOT guarantee that data has been
+ * successfully written to disk; the kernel defers writes via the page cache.
+ * To ensure data persistence, call fsync() before close().
+ *
+ * On close, the following cleanup operations are performed: POSIX advisory
+ * locks are removed, dnotify registrations are cleaned up, the file is
+ * flushed if the file operations define a flush callback, and the file
+ * reference is released. If this was the last reference, additional cleanup
+ * includes: fsnotify close notification, epoll cleanup, flock and lease
+ * removal, FASYNC cleanup, the file's release callback invocation, and
+ * the file structure deallocation.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: fd
+ * type: KAPI_TYPE_FD
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_RANGE
+ * range: 0, INT_MAX
+ * constraint: Must be a valid, open file descriptor for the current process.
+ * The values 0, 1, and 2 (stdin, stdout, stderr) may be closed like any other
+ * fd, though this is unusual and may cause issues with libraries that assume
+ * these descriptors are valid. The parameter is unsigned int to match kernel
+ * file descriptor table indexing, but values exceeding INT_MAX are effectively
+ * invalid due to internal checks.
+ *
+ * return:
+ * type: KAPI_TYPE_INT
+ * check-type: KAPI_RETURN_EXACT
+ * success: 0
+ * desc: Returns 0 on success. On error, returns a negative error code.
+ * IMPORTANT: Even when an error is returned, the file descriptor is still
+ * closed and must not be used again. The error indicates a problem with
+ * the final flush operation, not that the fd remains open.
+ *
+ * error: EBADF, Bad file descriptor
+ * desc: The file descriptor fd is not a valid open file descriptor: it is out
+ * of range, has no file assigned, or was already closed. This is the only
+ * error that indicates the fd was NOT closed (because it was not open to
+ * begin with).
+ *
+ * error: EINTR, Interrupted system call
+ * desc: The flush operation was interrupted by a signal before completion.
+ * This occurs when a file's flush callback (e.g., NFS) performs an
+ * interruptible wait that receives a signal. IMPORTANT: Despite this error,
+ * the file descriptor IS closed and must not be used again. This error
+ * is generated by converting kernel-internal restart codes (ERESTARTSYS,
+ * ERESTARTNOINTR, ERESTARTNOHAND, ERESTART_RESTARTBLOCK) to EINTR because
+ * restarting the syscall would be incorrect once the fd is freed.
+ *
+ * error: EIO, I/O error
+ * desc: An I/O error occurred during the flush of buffered data to the
+ * underlying storage. This typically indicates a hardware error, network
+ * failure on NFS, or other storage system error. The file descriptor is
+ * still closed. Previously buffered write data may have been lost.
+ *
+ * error: ENOSPC, No space left on device
+ * desc: There was insufficient space on the storage device to flush buffered
+ * writes. This is common on NFS when the server runs out of space between
+ * write() and close(). The file descriptor is still closed.
+ *
+ * error: EDQUOT, Disk quota exceeded
+ * desc: The user's disk quota was exceeded while attempting to flush buffered
+ * writes. Common on NFS when quota is exceeded between write() and close().
+ * The file descriptor is still closed.
+ *
+ * lock: files->file_lock
+ * type: KAPI_LOCK_SPINLOCK
+ * acquired: true
+ * released: true
+ * desc: Acquired via file_close_fd() to atomically look up and remove the fd
+ * from the file descriptor table. Held only during the table manipulation;
+ * released before flush and final cleanup operations. Once the lock is
+ * dropped, another thread may be allocated the same fd number, even before
+ * close() returns.
+ *
+ * lock: file->f_lock
+ * type: KAPI_LOCK_SPINLOCK
+ * acquired: true
+ * released: true
+ * desc: Acquired during epoll cleanup (eventpoll_release_file) and dnotify
+ * cleanup to safely unlink the file from monitoring structures. May also
+ * be acquired during lock context operations.
+ *
+ * lock: ep->mtx
+ * type: KAPI_LOCK_MUTEX
+ * acquired: true
+ * released: true
+ * desc: Acquired during epoll cleanup if the file was monitored by epoll.
+ * Used to safely remove the file from epoll interest lists.
+ *
+ * lock: flc_lock
+ * type: KAPI_LOCK_SPINLOCK
+ * acquired: true
+ * released: true
+ * desc: File lock context spinlock, acquired during locks_remove_file() to
+ * safely remove POSIX, flock, and lease locks associated with the file.
+ *
+ * signal: pending_signals
+ * direction: KAPI_SIGNAL_RECEIVE
+ * action: KAPI_SIGNAL_ACTION_RETURN
+ * condition: When flush callback performs interruptible wait
+ * desc: If the file's flush callback (e.g., nfs_file_flush) performs an
+ * interruptible wait and a signal is pending, the wait is interrupted.
+ * Any kernel restart codes are converted to EINTR since close cannot be
+ * restarted after the fd is freed.
+ * error: -EINTR
+ * timing: KAPI_SIGNAL_TIME_DURING
+ * restartable: no
+ *
+ * side-effect: KAPI_EFFECT_RESOURCE_DESTROY | KAPI_EFFECT_IRREVERSIBLE
+ * target: File descriptor table entry
+ * desc: The file descriptor is removed from the process's file descriptor
+ * table, making the fd number available for reuse by subsequent open(),
+ * dup(), or similar calls. This occurs BEFORE any flush or cleanup that
+ * might fail, making the operation irreversible regardless of return value.
+ * condition: Always (when fd is valid)
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_LOCK_RELEASE
+ * target: POSIX advisory locks, OFD locks, flock locks
+ * desc: Advisory locks associated with the file are removed in two stages.
+ * POSIX locks are removed via locks_remove_posix() during filp_flush().
+ * All lock types (POSIX, OFD, flock) are removed via locks_remove_file()
+ * during __fput() when this is the last reference.
+ * condition: File has FMODE_OPENED and !(FMODE_PATH)
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_RESOURCE_DESTROY
+ * target: File leases
+ * desc: Any file leases held on the file are removed during locks_remove_file()
+ * when this is the last reference to the open file description.
+ * condition: File had leases and this is the last close
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: dnotify registrations
+ * desc: Directory notification (dnotify) registrations associated with this
+ * file are cleaned up via dnotify_flush(). This only applies to directories.
+ * condition: File is a directory with dnotify registrations
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: epoll interest lists
+ * desc: If the file was being monitored by epoll instances, it is removed
+ * from those interest lists via eventpoll_release().
+ * condition: File was added to epoll instances
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ * target: Buffered data
+ * desc: The file's flush callback is invoked if defined (e.g., NFS calls
+ * nfs_file_flush). This attempts to write any buffered data to storage
+ * and may return errors (EIO, ENOSPC, EDQUOT) if the flush fails. The
+ * success of this flush is NOT guaranteed even with a 0 return; use
+ * fsync() before close() to ensure data persistence.
+ * condition: File has a flush callback and was opened for writing
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FREE_MEMORY
+ * target: struct file and related structures
+ * desc: When this is the last reference to the file, __fput() is called
+ * synchronously (fput_close_sync), which frees the file structure, releases
+ * the dentry and mount references, and invokes the file's release callback.
+ * condition: This is the last reference to the file
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ * target: Unlinked file deletion
+ * desc: If the file was previously unlinked (deleted) but kept open, closing
+ * the last reference causes the actual file data to be removed from the
+ * filesystem and the inode to be freed.
+ * condition: File was unlinked and this is the last reference
+ * reversible: no
+ *
+ * state-trans: file_descriptor
+ * from: open
+ * to: closed/free
+ * condition: Valid fd passed to close
+ * desc: The file descriptor transitions from open (usable) to closed (invalid).
+ * The fd number becomes available for reuse. This transition occurs early
+ * in close() processing, before any operations that might fail.
+ *
+ * state-trans: file_reference_count
+ * from: n
+ * to: n-1 (or freed if n was 1)
+ * condition: Always on successful fd lookup
+ * desc: The file's reference count is decremented. If this was the last
+ * reference, the file is fully cleaned up and freed.
+ *
+ * constraint: File Descriptor Reuse Race
+ * desc: Because the fd is freed early in close() processing, another thread
+ * may receive the same fd number from a concurrent open() before close()
+ * returns. Applications must not retry close() after an error return, as
+ * this could close an unrelated file opened by another thread.
+ * expr: After close(fd) returns (even with error), fd is invalid
+ *
+ * examples: close(fd); // Basic usage - ignore errors (common but not ideal)
+ * if (close(fd) == -1) perror("close"); // Log errors for debugging
+ * fsync(fd); close(fd); // Ensure data persistence before closing
+ *
+ * notes: This syscall has subtle non-POSIX semantics: the fd is ALWAYS closed
+ * regardless of the return value. POSIX specifies that on EINTR, the state
+ * of the fd is unspecified, but Linux always closes it. HP-UX requires
+ * retrying close() on EINTR, but doing so on Linux may close an unrelated
+ * fd that was reassigned by another thread. For portable code, the safest
+ * approach is to check for errors but never retry close().
+ *
+ * Error codes from the flush callback (EIO, ENOSPC, EDQUOT) indicate that
+ * previously written data may have been lost. These errors are particularly
+ * common on NFS where write errors are often deferred to close time.
+ *
+ * The driver's release() callback errors are explicitly ignored by the
+ * kernel, so device driver cleanup errors are not propagated to userspace.
+ *
+ * Calling close() on a file descriptor while another thread is using it
+ * (e.g., in a blocking read() or write()) has implementation-defined
+ * behavior. On Linux, the blocked operation continues on the underlying
+ * file and may complete even after close() returns.
+ *
+ * since-version: 1.0
*/
SYSCALL_DEFINE1(close, unsigned int, fd)
{
--
2.51.0
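
The "never retry close()" rule and the "fsync() before close()" advice from
the specification above can be sketched from user space as follows. This is
only an illustration under stated assumptions, not part of the patch:
flush_and_close() and save_buffer() are hypothetical helper names, a short
write is simply reported as -EIO, and fsync() can fail with EINVAL on
special files that do not support synchronization.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush, then close exactly once.
 * Returns 0 or the negative errno of the first failure. *fdp is set to -1
 * up front because on Linux the fd is gone whatever close() returns.
 */
static int flush_and_close(int *fdp)
{
	int fd, err = 0;

	assert(fdp != NULL);
	fd = *fdp;
	*fdp = -1;

	/* Persistence errors are best caught here, while the fd is valid. */
	if (fsync(fd) == -1)
		err = -errno;

	/* One attempt only: no retry on EINTR or any other error. */
	if (close(fd) == -1 && err == 0)
		err = -errno;

	return err;
}

/* Hypothetical usage: write len bytes to path, report the first error. */
static int save_buffer(const char *path, const void *buf, size_t len)
{
	int fd, err = 0, close_err;
	ssize_t n;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd == -1)
		return -errno;

	n = write(fd, buf, len);
	if (n == -1)
		err = -errno;
	else if ((size_t)n != len)
		err = -EIO;	/* short write: treated as failure in this sketch */

	close_err = flush_and_close(&fd);
	return err ? err : close_err;
}
```

Invalidating the caller's copy of the fd before calling close() makes an
accidental second close() fail with EBADF instead of closing a descriptor
that another thread may have just been handed.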
* [RFC PATCH v5 14/15] kernel/api: add API specification for sys_read
2025-12-18 20:42 [RFC PATCH v5 00/15] Kernel API Specification Framework Sasha Levin
` (12 preceding siblings ...)
2025-12-18 20:42 ` [RFC PATCH v5 13/15] kernel/api: add API specification for sys_close Sasha Levin
@ 2025-12-18 20:42 ` Sasha Levin
2025-12-18 20:42 ` [RFC PATCH v5 15/15] kernel/api: add API specification for sys_write Sasha Levin
14 siblings, 0 replies; 16+ messages in thread
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/read_write.c | 287 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 287 insertions(+)
diff --git a/fs/read_write.c b/fs/read_write.c
index 833bae068770a..422046a666b1d 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -719,6 +719,293 @@ ssize_t ksys_read(unsigned int fd, char __user *buf, size_t count)
return ret;
}
+/**
+ * sys_read - Read data from a file descriptor
+ * @fd: File descriptor to read from
+ * @buf: User-space buffer to read data into
+ * @count: Maximum number of bytes to read
+ *
+ * long-desc: Attempts to read up to count bytes from file descriptor fd into
+ * the buffer starting at buf. For seekable files (regular files, block
+ * devices), the read begins at the current file offset, and the file offset
+ * is advanced by the number of bytes read. For non-seekable files (pipes,
+ * FIFOs, sockets, character devices), the file offset is not used.
+ *
+ * If count is zero and fd refers to a regular file, read() may detect errors
+ * as described below. In the absence of errors, or if read() does not check
+ * for errors, a read() with a count of 0 returns zero and has no other effects.
+ *
+ * On success, the number of bytes read is returned (zero indicates end of
+ * file for regular files). It is not an error if this number is smaller than
+ * the number of bytes requested; this may happen because fewer bytes are
+ * actually available right now (maybe because we were close to end-of-file,
+ * or because we are reading from a pipe, socket, or terminal), or because
+ * read() was interrupted by a signal.
+ *
+ * On Linux, read() transfers at most MAX_RW_COUNT (0x7ffff000, approximately
+ * 2GB) bytes per call, regardless of whether the filesystem would allow more.
+ * This is to avoid issues with signed arithmetic overflow on 32-bit systems.
+ *
+ * POSIX allows reads that are interrupted after reading some data to either
+ * return -1 (with errno set to EINTR) or return the number of bytes already
+ * read. Linux follows the latter behavior: if data has been read before a
+ * signal arrives, the call returns the bytes read rather than failing.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: fd
+ * type: KAPI_TYPE_FD
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_RANGE
+ * range: 0, INT_MAX
+ * constraint: Must be a valid, open file descriptor with read permission.
+ * The file must have been opened with O_RDONLY or O_RDWR. Special values
+ * like AT_FDCWD are not valid. File descriptors for directories return
+ * EISDIR. Standard file descriptors 0 (stdin), 1 (stdout), 2 (stderr) are
+ * valid if open and readable.
+ *
+ * param: buf
+ * type: KAPI_TYPE_USER_PTR
+ * flags: KAPI_PARAM_OUT | KAPI_PARAM_USER
+ * constraint-type: KAPI_CONSTRAINT_CUSTOM
+ * constraint: Must point to a valid, writable user-space memory region of at
+ * least count bytes. The buffer is validated via access_ok() before any
+ * read operation. NULL is invalid and will return EFAULT. The buffer may
+ * be partially written if an error occurs mid-read. For O_DIRECT reads,
+ * the buffer may need to be aligned to the filesystem's block size (varies
+ * by filesystem, check via statx() with STATX_DIOALIGN).
+ *
+ * param: count
+ * type: KAPI_TYPE_UINT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_RANGE
+ * range: 0, SIZE_MAX
+ * constraint: Maximum number of bytes to read. Clamped internally to
+ * MAX_RW_COUNT (INT_MAX & PAGE_MASK, 0x7ffff000 bytes with 4 KiB pages) to
+ * prevent signed overflow issues. A count of 0 returns 0 in the absence of
+ * errors (errors may still be detected). Values above MAX_RW_COUNT are
+ * clamped, not rejected, unless negative as ssize_t, which yields EINVAL.
+ *
+ * return:
+ * type: KAPI_TYPE_INT
+ * check-type: KAPI_RETURN_RANGE
+ * success: >= 0
+ * desc: On success, returns the number of bytes read (non-negative). Zero
+ * indicates end-of-file (EOF) for regular files, or no data available
+ * from a device that does not block. The return value may be less than
+ * count if fewer bytes were available (short read). Partial reads are
+ * not errors. On error, returns a negative error code.
+ *
+ * error: EBADF, Bad file descriptor
+ * desc: fd is not a valid file descriptor, or fd was not opened for reading.
+ * This includes file descriptors opened with O_WRONLY, O_PATH, or file
+ * descriptors that have been closed. Also returned if the file structure
+ * does not have FMODE_READ set.
+ *
+ * error: EFAULT, Bad address
+ * desc: buf points outside the accessible address space. The buffer address
+ * failed access_ok() validation. Can also occur if a fault happens during
+ * copy_to_user() when transferring data to user space after the read
+ * completes in kernel space.
+ *
+ * error: EINVAL, Invalid argument
+ * desc: Returned in several cases: (1) The file descriptor refers to an
+ * object that is not suitable for reading (no read or read_iter method).
+ * (2) The file was opened with O_DIRECT and the buffer alignment, offset,
+ * or count does not meet the filesystem's alignment requirements. (3) For
+ * timerfd file descriptors, the buffer is smaller than 8 bytes. (4) The
+ * count argument, when cast to ssize_t, is negative.
+ *
+ * error: EISDIR, Is a directory
+ * desc: fd refers to a directory. Directories cannot be read using read();
+ * use getdents64() instead. This error is returned by the generic_read_dir()
+ * handler installed for directory file operations.
+ *
+ * error: EAGAIN, Resource temporarily unavailable
+ * desc: fd refers to a file (pipe, socket, device) that is marked non-blocking
+ * (O_NONBLOCK) and the read would block. Also returned with IOCB_NOWAIT
+ * when data is not immediately available. Equivalent to EWOULDBLOCK.
+ * The application should retry the read later or use select/poll/epoll.
+ *
+ * error: EINTR, Interrupted system call
+ * desc: The call was interrupted by a signal before any data was read. This
+ * only occurs if no data has been transferred; if some data was read before
+ * the signal, the call returns the number of bytes read. The caller should
+ * typically restart the read.
+ *
+ * error: EIO, Input/output error
+ * desc: A low-level I/O error occurred. For regular files, this typically
+ * indicates a hardware error on the storage device, a filesystem error,
+ * or a network filesystem timeout. For terminals, this may indicate the
+ * controlling terminal has been closed for a background process.
+ *
+ * error: EOVERFLOW, Value too large for defined data type
+ * desc: The file position plus count would exceed LLONG_MAX. Also returned
+ * when reading from certain files (e.g., some /proc files) where the file
+ * position would overflow. For files without FOP_UNSIGNED_OFFSET flag,
+ * negative file positions are not allowed.
+ *
+ * error: ENOBUFS, No buffer space available
+ * desc: Returned when reading from pipe-based watch queues (CONFIG_WATCH_QUEUE)
+ * when the buffer is too small to hold a complete notification, or when
+ * reading packets from pipes with PIPE_BUF_FLAG_WHOLE set.
+ *
+ * error: ERESTARTSYS, Restart system call (internal)
+ * desc: Internal error code indicating the syscall should be restarted. This
+ * is typically translated to EINTR if SA_RESTART is not set on the signal
+ * handler, or the syscall is transparently restarted if SA_RESTART is set.
+ * User space should not see this error code directly.
+ *
+ * error: EACCES, Permission denied
+ * desc: The security subsystem (LSM such as SELinux or AppArmor) denied
+ * the read operation via security_file_permission(). This can occur even
+ * if the file was successfully opened, as LSM policies may enforce per-
+ * operation checks.
+ *
+ * error: EPERM, Operation not permitted
+ * desc: Returned by fanotify permission events (CONFIG_FANOTIFY_ACCESS_PERMISSIONS)
+ * when a user-space fanotify listener denies the read operation via
+ * fsnotify_file_area_perm().
+ *
+ * lock: file->f_pos_lock
+ * type: KAPI_LOCK_MUTEX
+ * acquired: conditional
+ * released: true
+ * desc: For regular files that require atomic position updates (FMODE_ATOMIC_POS),
+ * the f_pos_lock mutex is acquired by fdget_pos() at syscall entry and released
+ * by fdput_pos() at syscall exit. This serializes concurrent reads that share
+ * the same file description. Not acquired for stream files (FMODE_STREAM,
+ * such as pipes and sockets) or when the file is not shared.
+ *
+ * lock: Filesystem-specific locks
+ * type: KAPI_LOCK_CUSTOM
+ * acquired: conditional
+ * released: true
+ * desc: The filesystem's read_iter or read method may acquire additional locks.
+ * For regular files, this typically includes the inode's i_rwsem for certain
+ * operations. For pipes, the pipe->mutex is acquired. For sockets, socket
+ * lock is acquired. These are internal to the file operation and released
+ * before return.
+ *
+ * lock: RCU read-side
+ * type: KAPI_LOCK_RCU
+ * acquired: conditional
+ * released: true
+ * desc: Used during file descriptor lookup via fdget(). RCU read lock protects
+ * access to the file descriptor table and is dropped once the lookup is done.
+ *
+ * signal: Any signal
+ * direction: KAPI_SIGNAL_RECEIVE
+ * action: KAPI_SIGNAL_ACTION_RETURN
+ * condition: When blocked waiting for data on interruptible operations
+ * desc: The syscall may be interrupted by signals while waiting for data to
+ * become available (pipes, sockets, terminals) or waiting for locks. If
+ * interrupted before any data is read, returns -EINTR or -ERESTARTSYS.
+ * If data has already been read, returns the number of bytes read.
+ * error: -EINTR
+ * timing: KAPI_SIGNAL_TIME_DURING
+ * restartable: yes
+ *
+ * side-effect: KAPI_EFFECT_FILE_POSITION
+ * target: file->f_pos
+ * condition: For seekable files when read succeeds (returns > 0)
+ * desc: The file offset (f_pos) is advanced by the number of bytes read.
+ * For stream files (FMODE_STREAM such as pipes and sockets), the offset
+ * is not used or modified. The offset update is protected by f_pos_lock
+ * when the file is shared between threads/processes.
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: inode access time (atime)
+ * condition: When read succeeds and O_NOATIME is not set
+ * desc: Updates the file's access time (atime) via touch_atime(). The update
+ * may be suppressed by mount options (noatime, relatime), the O_NOATIME
+ * flag, or if the filesystem does not support atime. Relatime only updates
+ * atime if it is older than mtime or ctime, or more than a day old.
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: task I/O accounting
+ * condition: Always
+ * desc: Updates the current task's I/O accounting statistics. The rchar field
+ * (read characters) is incremented by bytes read via add_rchar() when the
+ * read returns > 0. The syscr field (syscall read count) is incremented via
+ * inc_syscr() once the read is attempted, even on failure. See /proc/[pid]/io.
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: fsnotify events
+ * condition: When read returns > 0
+ * desc: Generates an FS_ACCESS fsnotify event via fsnotify_access() allowing
+ * inotify, fanotify, and dnotify watchers to be notified of the read. This
+ * occurs after data transfer completes successfully.
+ * reversible: no
+ *
+ * capability: CAP_DAC_OVERRIDE
+ * type: KAPI_CAP_BYPASS_CHECK
+ * allows: Bypass discretionary access control on read permission
+ * without: Standard DAC checks are enforced
+ * condition: Checked via security_file_permission() during rw_verify_area()
+ *
+ * capability: CAP_DAC_READ_SEARCH
+ * type: KAPI_CAP_BYPASS_CHECK
+ * allows: Bypass read permission checks on regular files
+ * without: Must have read permission on file
+ * condition: Checked by LSM hooks during the read operation
+ *
+ * constraint: MAX_RW_COUNT
+ * desc: The count parameter is silently clamped to MAX_RW_COUNT (INT_MAX &
+ * PAGE_MASK, approximately 2GB minus one page) to prevent integer overflow
+ * in internal calculations. This is transparent to the caller; the syscall
+ * succeeds but reads at most MAX_RW_COUNT bytes.
+ * expr: actual_count = min(count, MAX_RW_COUNT)
+ *
+ * constraint: File must be open for reading
+ * desc: The file descriptor must have been opened with O_RDONLY or O_RDWR.
+ * Files opened with O_WRONLY or O_PATH cannot be read and return EBADF.
+ * The file must have both FMODE_READ and FMODE_CAN_READ flags set.
+ * expr: (file->f_mode & FMODE_READ) && (file->f_mode & FMODE_CAN_READ)
+ *
+ * examples: n = read(fd, buf, sizeof(buf)); // Basic read
+ * n = read(STDIN_FILENO, buf, 1024); // Read from stdin
+ * while ((n = read(fd, buf, 4096)) > 0) { process(buf, n); } // Read loop
+ * if (read(fd, buf, count) == 0) { handle_eof(); } // Check for EOF
+ *
+ * notes: The behavior of read() varies significantly depending on the type of
+ * file descriptor:
+ *
+ * - Regular files: Reads from current position, advances position, returns 0
+ * at EOF. Short reads are rare but possible near EOF or on signal.
+ *
+ * - Pipes and FIFOs: Blocking by default. Returns available data (up to count)
+ * or blocks until data is available. Returns 0 when all writers have closed.
+ * O_NONBLOCK returns EAGAIN when empty instead of blocking.
+ *
+ * - Sockets: Similar to pipes. Specific behavior depends on socket type and
+ * protocol. MSG_* flags can be specified via recv() for more control.
+ *
+ * - Terminals: Line-buffered in canonical mode; read returns when newline is
+ * entered or buffer is full. Raw mode returns as soon as data is available.
+ * Special handling for signals (SIGINT on Ctrl+C, etc.).
+ *
+ * - Device special files: Behavior is device-specific. Some devices support
+ * seeking, others do not. Read size may be constrained by device.
+ *
+ * Race condition: Concurrent reads from the same file description (not just
+ * file descriptor) can race on the file position. Linux 3.14+ provides atomic
+ * position updates for regular files via f_pos_lock, but applications should
+ * use pread() for concurrent positioned reads.
+ *
+ * O_DIRECT reads bypass the page cache and typically require aligned buffers
+ * and positions. Alignment requirements are filesystem-specific; use statx()
+ * with STATX_DIOALIGN (Linux 6.1+) to query. Unaligned O_DIRECT reads fail
+ * with EINVAL on most filesystems.
+ *
+ * For zero-copy transfers between file descriptors, consider using splice(),
+ * sendfile(), or copy_file_range() instead of read() + write().
+ *
+ * since-version: 1.0
+ */
SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
{
return ksys_read(fd, buf, count);
--
2.51.0
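
The short-read and EINTR behavior documented in the sys_read specification above can be exercised from user space with a loop along these lines. This is only a sketch under stated assumptions: read_full() is a hypothetical helper name, not something this series adds, and it assumes a blocking file descriptor (a non-blocking one would surface EAGAIN as an error here).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * read_full() - hypothetical helper, not part of this patch.
 *
 * Keeps calling read() until len bytes are stored, EOF is seen, or an
 * error other than EINTR occurs. Returns the number of bytes stored
 * (possibly fewer than len at EOF), or -1 with errno set if an error
 * occurred before any data was stored.
 */
static ssize_t read_full(int fd, void *buf, size_t len)
{
	size_t total = 0;

	while (total < len) {
		ssize_t n = read(fd, (char *)buf + total, len - total);

		if (n > 0) {
			/* Short reads are not errors; just keep going. */
			total += (size_t)n;
		} else if (n == 0) {
			/* End of file, or all pipe writers have closed. */
			break;
		} else if (errno == EINTR) {
			/* Interrupted before any data was read: retry. */
			continue;
		} else {
			/* Report data already stored before the error. */
			return total ? (ssize_t)total : -1;
		}
	}
	return (ssize_t)total;
}
```

A caller would typically compare the return value against len to tell a complete read from an early EOF. Retrying on EINTR is a policy choice; a program that wants signals to cancel the read would return instead.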
* [RFC PATCH v5 15/15] kernel/api: add API specification for sys_write
2025-12-18 20:42 [RFC PATCH v5 00/15] Kernel API Specification Framework Sasha Levin
` (13 preceding siblings ...)
2025-12-18 20:42 ` [RFC PATCH v5 14/15] kernel/api: add API specification for sys_read Sasha Levin
@ 2025-12-18 20:42 ` Sasha Levin
14 siblings, 0 replies; 16+ messages in thread
From: Sasha Levin @ 2025-12-18 20:42 UTC (permalink / raw)
To: linux-api; +Cc: linux-doc, linux-kernel, tools, gpaoloni, Sasha Levin
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/read_write.c | 377 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 377 insertions(+)
diff --git a/fs/read_write.c b/fs/read_write.c
index 422046a666b1d..685bf6b9bd3b1 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1030,6 +1030,383 @@ ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
return ret;
}
+/**
+ * sys_write - Write data to a file descriptor
+ * @fd: File descriptor to write to
+ * @buf: User-space buffer containing data to write
+ * @count: Maximum number of bytes to write
+ *
+ * long-desc: Attempts to write up to count bytes from the buffer starting at
+ * buf to the file referred to by the file descriptor fd. For seekable files
+ * (regular files, block devices), the write begins at the current file offset,
+ * and the file offset is advanced by the number of bytes written. If the file
+ * was opened with O_APPEND, the file offset is first set to the end of the
+ * file before writing. For non-seekable files (pipes, FIFOs, sockets, character
+ * devices), the file offset is not used and writing occurs at the current
+ * position as defined by the device.
+ *
+ * The number of bytes written may be less than count if, for example, there is
+ * insufficient space on the underlying physical medium, or the RLIMIT_FSIZE
+ * resource limit is encountered, or the call was interrupted by a signal
+ * handler after having written less than count bytes. In the event of a
+ * successful partial write, the caller should make another write() call to
+ * transfer the remaining bytes. This behavior is called a "short write."
+ *
+ * On Linux, write() transfers at most MAX_RW_COUNT (0x7ffff000, approximately
+ * 2GB minus one page) bytes per call, regardless of whether the file or
+ * filesystem would allow more. This prevents signed arithmetic overflow.
+ *
+ * For regular files, a successful write() does not guarantee that data has been
+ * committed to disk. Use fsync(2) or fdatasync(2) if durability is required.
+ * For O_SYNC or O_DSYNC files, the kernel automatically syncs data on write.
+ *
+ * POSIX permits writes that are interrupted after partial writes to either
+ * return -1 with errno=EINTR, or to return the count of bytes already written.
+ * Linux implements the latter behavior: if some data has been written before
+ * a signal arrives, write() returns the number of bytes written rather than
+ * failing with EINTR.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: fd
+ * type: KAPI_TYPE_FD
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_RANGE
+ * range: 0, INT_MAX
+ * constraint: Must be a valid, open file descriptor with write permission.
+ * The file must have been opened with O_WRONLY or O_RDWR. File descriptors
+ * opened with O_RDONLY, O_PATH, or that have been closed return EBADF.
+ * Standard file descriptors 0 (stdin), 1 (stdout), 2 (stderr) are valid if
+ * open and writable. AT_FDCWD and other special values are not valid.
+ *
+ * param: buf
+ * type: KAPI_TYPE_USER_PTR
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ * constraint-type: KAPI_CONSTRAINT_CUSTOM
+ * constraint: Must point to a valid, readable user-space memory region of at
+ * least count bytes. The buffer is validated via access_ok() before any
+ * write operation. NULL is invalid and returns EFAULT. For O_DIRECT writes,
+ * the buffer may need to be aligned to the filesystem's block size (varies
+ * by filesystem; query with statx() using STATX_DIOALIGN on Linux 6.1+).
+ *
+ * param: count
+ * type: KAPI_TYPE_UINT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_RANGE
+ * range: 0, SIZE_MAX
+ * constraint: Maximum number of bytes to write. Clamped internally to
+ * MAX_RW_COUNT (INT_MAX & PAGE_MASK, 0x7ffff000 with 4 KiB pages) to prevent
+ * signed overflow. For regular files, a count of 0 returns 0 with no other
+ * effect unless an error is detected. Must not be negative as ssize_t.
+ *
+ * return:
+ * type: KAPI_TYPE_INT
+ * check-type: KAPI_RETURN_RANGE
+ * success: >= 0
+ * desc: On success, returns the number of bytes written (non-negative). Zero
+ * indicates that nothing was written (count was 0, or no space available
+ * for non-blocking writes). The return value may be less than count due to
+ * resource limits, signal interruption, or device constraints (short write).
+ * On error, returns a negative error code.
+ *
+ * error: EBADF, Bad file descriptor
+ * desc: fd is not a valid file descriptor, or fd was not opened for writing.
+ * This includes file descriptors opened with O_RDONLY, O_PATH, or file
+ * descriptors that have been closed. Also returned if the file structure
+ * does not have FMODE_WRITE set (a missing FMODE_CAN_WRITE yields EINVAL).
+ *
+ * error: EFAULT, Bad address
+ * desc: buf points outside the accessible address space. The buffer address
+ * failed access_ok() validation. Can also occur if a fault happens during
+ * copy_from_user() when reading data from user space.
+ *
+ * error: EINVAL, Invalid argument
+ * desc: Returned in several cases: (1) The file descriptor refers to an
+ * object that is not suitable for writing (no write or write_iter method).
+ * (2) The file was opened with O_DIRECT and the buffer alignment, offset,
+ * or count does not meet the filesystem's alignment requirements. (3) The
+ * count argument, when cast to ssize_t, is negative. (4) For IOCB_NOWAIT
+ * operations on non-O_DIRECT files that lack buffered async write support.
+ *
+ * error: EAGAIN, Resource temporarily unavailable
+ * desc: fd refers to a file (pipe, socket, device) that is marked non-blocking
+ * (O_NONBLOCK) and the write would block because the buffer is full. Also
+ * returned with IOCB_NOWAIT when data cannot be written immediately.
+ * Equivalent to EWOULDBLOCK. The application should retry later or use
+ * select/poll/epoll to wait for writability.
+ *
+ * error: EINTR, Interrupted system call
+ * desc: The call was interrupted by a signal before any data was written. This
+ * only occurs if no data has been transferred; if some data was written
+ * before the signal, the call returns the number of bytes written. The
+ * caller should typically restart the write.
+ *
+ * error: EPIPE, Broken pipe
+ * desc: fd refers to a pipe or socket whose reading end has been closed.
+ * When this condition occurs, the calling process also receives a SIGPIPE
+ * signal unless MSG_NOSIGNAL is used (for sockets) or IOCB_NOSIGNAL is set.
+ * If the signal is caught or ignored, EPIPE is still returned.
+ *
+ * error: EFBIG, File too large
+ * desc: An attempt was made to write a file that exceeds the implementation-
+ * defined maximum file size or the file size limit (RLIMIT_FSIZE) of the
+ * process. When RLIMIT_FSIZE is exceeded, the process also receives SIGXFSZ.
+ * For files not opened with O_LARGEFILE on 32-bit systems, the limit is 2GB.
+ *
+ * error: ENOSPC, No space left on device
+ * desc: The device containing the file has no room for the data. This can
+ * occur mid-write resulting in a short write followed by ENOSPC on retry.
+ *
+ * error: EDQUOT, Disk quota exceeded
+ * desc: The user's quota of disk blocks on the filesystem has been exhausted.
+ * Like ENOSPC, this can result in a short write.
+ *
+ * error: EIO, Input/output error
+ * desc: A low-level I/O error occurred while modifying the inode or writing
+ * data. This typically indicates hardware failure, filesystem corruption,
+ * or network filesystem timeout. Some data may have been written.
+ *
+ * error: EPERM, Operation not permitted
+ * desc: The operation was prevented: (1) by a file seal (F_SEAL_WRITE or
+ * F_SEAL_FUTURE_WRITE on memfd/shmem), (2) writing to an immutable inode
+ * (IS_IMMUTABLE), (3) by an LSM hook denying the operation, or (4) by a
+ * fanotify permission event denying the write.
+ *
+ * error: EOVERFLOW, Value too large for defined data type
+ * desc: The file position plus count would exceed LLONG_MAX. Also returned
+ * when the offset would exceed filesystem limits after the write.
+ *
+ * error: EDESTADDRREQ, Destination address required
+ * desc: fd is a datagram socket for which no peer address has been set using
+ * connect(2). Use sendto(2) to specify the destination address.
+ *
+ * error: ETXTBSY, Text file busy
+ * desc: The file is being used as a swap file (IS_SWAPFILE).
+ *
+ * error: EXDEV, Cross-device link
+ * desc: When writing to a pipe that has been configured as a watch queue
+ * (CONFIG_WATCH_QUEUE), direct write() calls are not supported.
+ *
+ * error: ENOMEM, Out of memory
+ * desc: Insufficient kernel memory was available for the write operation.
+ * For pipes, this occurs when allocating pages for the pipe buffer.
+ *
+ * error: ERESTARTSYS, Restart system call (internal)
+ * desc: Internal error code indicating the syscall should be restarted. This
+ * is converted to EINTR if SA_RESTART is not set on the signal handler, or
+ * the syscall is transparently restarted if SA_RESTART is set. User space
+ * should not see this error code directly.
+ *
+ * error: EACCES, Permission denied
+ * desc: The security subsystem (LSM such as SELinux or AppArmor) denied the
+ * write operation via security_file_permission(). This can occur even if
+ * the file was successfully opened.
+ *
+ * lock: file->f_pos_lock
+ * type: KAPI_LOCK_MUTEX
+ * acquired: conditional
+ * released: true
+ * desc: For regular files that require atomic position updates (FMODE_ATOMIC_POS),
+ * the f_pos_lock mutex is acquired by fdget_pos() at syscall entry and released
+ * by fdput_pos() at syscall exit. This serializes concurrent writes sharing
+ * the same file description. Not acquired for stream files (FMODE_STREAM like
+ * pipes and sockets) or when the file is not shared.
+ *
+ * lock: sb->s_writers (freeze protection)
+ * type: KAPI_LOCK_CUSTOM
+ * acquired: conditional
+ * released: true
+ * desc: For regular files, file_start_write() acquires freeze protection on
+ * the superblock via sb_start_write() before the write, and file_end_write()
+ * releases it after. This prevents writes during filesystem freeze. Not
+ * acquired for non-regular files (pipes, sockets, devices).
+ *
+ * lock: inode->i_rwsem
+ * type: KAPI_LOCK_RWLOCK
+ * acquired: conditional
+ * released: true
+ * desc: For regular files using generic_file_write_iter(), the inode's i_rwsem
+ * is acquired in write mode before modifying file data. This is internal to
+ * the filesystem and released before return. Not all filesystems use this
+ * pattern.
+ *
+ * lock: pipe->mutex
+ * type: KAPI_LOCK_MUTEX
+ * acquired: conditional
+ * released: true
+ * desc: For pipes and FIFOs, the pipe's mutex is held while modifying pipe
+ * buffers. Released temporarily while waiting for space, then reacquired.
+ *
+ * lock: RCU read-side
+ * type: KAPI_LOCK_RCU
+ * acquired: conditional
+ * released: true
+ * desc: Used during file descriptor lookup via fdget(). RCU read lock protects
+ * access to the file descriptor table and is dropped once the lookup is done.
+ *
+ * signal: SIGPIPE
+ * direction: KAPI_SIGNAL_SEND
+ * action: KAPI_SIGNAL_ACTION_TERMINATE
+ * condition: Writing to a pipe or socket with no readers
+ * desc: When writing to a pipe whose read end is closed, or a socket whose
+ * peer has closed, SIGPIPE is sent to the calling process. The default
+ * action terminates the process. Use signal(SIGPIPE, SIG_IGN) or set
+ * IOCB_NOSIGNAL/MSG_NOSIGNAL to suppress. EPIPE is returned regardless.
+ * timing: KAPI_SIGNAL_TIME_DURING
+ *
+ * signal: SIGXFSZ
+ * direction: KAPI_SIGNAL_SEND
+ * action: KAPI_SIGNAL_ACTION_COREDUMP
+ * condition: Writing exceeds RLIMIT_FSIZE
+ * desc: When a write would exceed the soft file size limit (RLIMIT_FSIZE),
+ * SIGXFSZ is sent. The default action terminates with a core dump. The
+ * write returns EFBIG. If RLIMIT_FSIZE is RLIM_INFINITY, no signal is sent.
+ * timing: KAPI_SIGNAL_TIME_DURING
+ *
+ * signal: Any signal
+ * direction: KAPI_SIGNAL_RECEIVE
+ * action: KAPI_SIGNAL_ACTION_RETURN
+ * condition: While blocked waiting for space (pipes, sockets)
+ * desc: The syscall may be interrupted by signals while waiting for buffer
+ * space to become available. If interrupted before any data is written,
+ * returns -EINTR or -ERESTARTSYS. If data was already written, returns the
+ * byte count. Restartable if SA_RESTART is set and no data was written.
+ * error: -EINTR
+ * timing: KAPI_SIGNAL_TIME_DURING
+ * restartable: yes
+ *
+ * side-effect: KAPI_EFFECT_FILE_POSITION
+ * target: file->f_pos
+ * condition: For seekable files when write succeeds (returns > 0)
+ * desc: The file offset (f_pos) is advanced by the number of bytes written.
+ * For files opened with O_APPEND, f_pos is first set to file size. For
+ * stream files (FMODE_STREAM such as pipes and sockets), the offset is not
+ * used or modified. Position updates are protected by f_pos_lock when
+ * shared.
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: inode timestamps (mtime, ctime)
+ * condition: When write succeeds (returns > 0)
+ * desc: Updates the file's modification time (mtime) and change time (ctime)
+ * via file_update_time(). The update precision depends on filesystem mount
+ * options (fine-grained timestamps for multigrain inodes).
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: SUID/SGID bits (mode)
+ * condition: When writing to a setuid/setgid file
+ * desc: The SUID bit is cleared when a non-root user writes to a file with
+ * the bit set. The SGID bit may also be cleared. This is a security feature
+ * to prevent privilege escalation via modified setuid binaries. Done via
+ * file_remove_privs().
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: file data
+ * condition: When write succeeds (returns > 0)
+ * desc: Modifies the file's data content. For regular files, data is written
+ * to the page cache (buffered I/O) or directly to storage (O_DIRECT).
+ * Data may not be persistent until fsync() is called or the file is closed.
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: task I/O accounting
+ * condition: Always
+ * desc: Updates the current task's I/O accounting statistics. The wchar field
+ * (write characters) is incremented by bytes written via add_wchar(). The
+ * syscw field (syscall write count) is incremented via inc_syscw(). These
+ * statistics are visible in /proc/[pid]/io.
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: fsnotify events
+ * condition: When write returns > 0
+ * desc: Generates an FS_MODIFY fsnotify event via fsnotify_modify(), allowing
+ * inotify, fanotify, and dnotify watchers to be notified of the write.
+ *
+ * capability: CAP_DAC_OVERRIDE
+ * type: KAPI_CAP_BYPASS_CHECK
+ * allows: Bypass discretionary access control on write permission
+ * without: Standard DAC checks are enforced
+ * condition: Checked via security_file_permission() during rw_verify_area()
+ *
+ * capability: CAP_FOWNER
+ * type: KAPI_CAP_BYPASS_CHECK
+ * allows: Bypass ownership checks for SUID/SGID clearing
+ * without: SUID/SGID bits are cleared on write by non-owner
+ * condition: Checked during file_remove_privs()
+ *
+ * constraint: MAX_RW_COUNT
+ * desc: The count parameter is silently clamped to MAX_RW_COUNT (INT_MAX &
+ * PAGE_MASK, approximately 2GB minus one page) to prevent integer overflow
+ * in internal calculations. This is transparent to the caller.
+ * expr: actual_count = min(count, MAX_RW_COUNT)
+ *
+ * constraint: File must be open for writing
+ * desc: The file descriptor must have been opened with O_WRONLY or O_RDWR.
+ * Files opened with O_RDONLY or O_PATH cannot be written and return EBADF.
+ * The file must have both FMODE_WRITE and FMODE_CAN_WRITE flags set.
+ * expr: (file->f_mode & FMODE_WRITE) && (file->f_mode & FMODE_CAN_WRITE)
+ *
+ * constraint: RLIMIT_FSIZE
+ * desc: The size of data written is constrained by the RLIMIT_FSIZE resource
+ * limit. A write that starts at or beyond the limit sends SIGXFSZ and returns
+ * EFBIG. A write that starts below the limit but would cross it is shortened
+ * to end at the limit and returns a short count.
+ * expr: pos < rlimit(RLIMIT_FSIZE) || rlimit(RLIMIT_FSIZE) == RLIM_INFINITY
+ *
+ * constraint: File seals
+ * desc: For memfd or shmem files with F_SEAL_WRITE or F_SEAL_FUTURE_WRITE
+ * seals applied, all write operations fail with EPERM. With F_SEAL_GROW,
+ * writes that would extend file size fail with EPERM.
+ *
+ * examples: n = write(fd, buf, sizeof(buf)); // Basic write
+ * n = write(STDOUT_FILENO, msg, strlen(msg)); // Write to stdout
+ * while (total < len) { n = write(fd, buf+total, len-total); if (n<0) break; total += n; } // Handle short writes
+ * if (write(pipefd[1], &byte, 1) < 0 && errno == EPIPE) { handle_broken_pipe(); } // Pipe error handling
+ *
+ * notes: The behavior of write() varies significantly depending on the type of
+ * file descriptor:
+ *
+ * - Regular files: Writes to the page cache (buffered) or directly to storage
+ * (O_DIRECT). Short writes are rare except near RLIMIT_FSIZE or disk full.
+ * With O_APPEND, the seek to end-of-file and the write are one atomic step.
+ *
+ * - Pipes and FIFOs: Blocking by default. Writes up to PIPE_BUF (4096 bytes
+ * on Linux) are guaranteed atomic. Larger writes may be interleaved with
+ * writes from other processes. Blocks if pipe is full; returns EAGAIN with
+ * O_NONBLOCK. SIGPIPE/EPIPE if no readers.
+ *
+ * - Sockets: Behavior depends on socket type and protocol. Stream sockets
+ * (TCP) may return partial writes. Datagram sockets (UDP) typically write
+ * complete messages or fail. SIGPIPE/EPIPE for broken connections (unless
+ * MSG_NOSIGNAL). EDESTADDRREQ for unconnected datagram sockets.
+ *
+ * - Terminals: May block on flow control. Canonical vs raw mode affects
+ * behavior. Special characters may be interpreted.
+ *
+ * - Device special files: Behavior is device-specific. Block devices behave
+ * similarly to regular files. Character device behavior varies.
+ *
+ * Race condition considerations: Concurrent writes from threads sharing a
+ * file description race on the file position. Linux 3.14+ provides atomic
+ * position updates via f_pos_lock for regular files (FMODE_ATOMIC_POS), but
+ * for maximum safety, use pwrite() for concurrent positioned writes.
+ *
+ * O_DIRECT writes bypass the page cache and typically require buffer and
+ * offset alignment to filesystem block size. Query requirements via statx()
+ * with STATX_DIOALIGN (Linux 6.1+). Unaligned O_DIRECT writes return EINVAL
+ * on most filesystems.
+ *
+ * For zero-copy writes, consider using splice(2), sendfile(2), or vmsplice(2)
+ * instead of copying data through user-space buffers with write().
+ *
+ * Partial writes (short writes) must be handled by application code.
+ * Applications should loop until all data is written or an error occurs.
+ *
+ * since-version: 1.0
+ */
SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf,
size_t, count)
{
--
2.51.0
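
The short-write and EPIPE behavior documented in the sys_write specification above can be handled from user space roughly as follows. This is a sketch, not a definitive implementation: write_full() and ignore_sigpipe() are hypothetical helper names that do not appear in this series, and the loop assumes a blocking file descriptor.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * ignore_sigpipe() - hypothetical helper, not part of this patch.
 * With SIGPIPE ignored, a write to a pipe or socket with no reader
 * fails with EPIPE instead of terminating the process.
 */
static void ignore_sigpipe(void)
{
	signal(SIGPIPE, SIG_IGN);
}

/*
 * write_full() - hypothetical helper, not part of this patch.
 *
 * Repeats write() until all len bytes have been accepted, retrying on
 * EINTR. Returns 0 on success, or -1 with errno set on any other error
 * (some bytes may already have been written in that case).
 */
static int write_full(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* nothing written yet: retry */
			return -1;		/* EPIPE, ENOSPC, EFBIG, ... */
		}
		/* Short write: advance past the bytes that were accepted. */
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

Calling ignore_sigpipe() once at startup is one way to get EPIPE as an ordinary error return; for sockets, send() with MSG_NOSIGNAL is a per-call alternative.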