From: Sasha Levin <sashal@kernel.org>
To: linux-api@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kbuild@vger.kernel.org, linux-kselftest@vger.kernel.org,
workflows@vger.kernel.org, tools@kernel.org, x86@kernel.org,
Thomas Gleixner <tglx@kernel.org>,
"Paul E . McKenney" <paulmck@kernel.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Jonathan Corbet <corbet@lwn.net>,
Dmitry Vyukov <dvyukov@google.com>,
Randy Dunlap <rdunlap@infradead.org>,
Cyril Hrubis <chrubis@suse.cz>, Kees Cook <kees@kernel.org>,
Jake Edge <jake@lwn.net>,
David Laight <david.laight.linux@gmail.com>,
Askar Safin <safinaskar@zohomail.com>,
Gabriele Paoloni <gpaoloni@redhat.com>,
Mauro Carvalho Chehab <mchehab@kernel.org>,
Christian Brauner <brauner@kernel.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Andrew Morton <akpm@linux-foundation.org>,
Masahiro Yamada <masahiroy@kernel.org>,
Shuah Khan <skhan@linuxfoundation.org>,
Ingo Molnar <mingo@redhat.com>, Arnd Bergmann <arnd@arndb.de>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH v2 6/9] kernel/api: add API specification for sys_close
Date: Sun, 22 Mar 2026 08:10:20 -0400
Message-ID: <20260322121026.869758-7-sashal@kernel.org>
In-Reply-To: <20260322121026.869758-1-sashal@kernel.org>
Add KAPI-annotated kerneldoc for the sys_close system call in fs/open.c.
The specification documents the file descriptor parameter, error
conditions, locking requirements, side effects on pending I/O, and
the close-on-exec relationship.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
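The no-retry rule documented in this specification can be illustrated with
a small userspace sketch. It is only an illustration and not part of the
patch: close_once() is a hypothetical helper name, and the code assumes the
Linux behavior described in the kerneldoc below, where the fd is released
even when close() reports an error.

```c
#include <errno.h>
#include <unistd.h>

/*
 * close_once() is a hypothetical helper, not part of this patch.
 * It calls close() exactly once and never retries, because on Linux the
 * fd number is already free by the time an error such as EINTR or EIO
 * is reported.
 *
 * Returns 0 on success or a negative errno value on failure.
 */
static int close_once(int *fdp)
{
	int fd = *fdp;

	if (fd < 0)
		return -EBADF;

	/* Invalidate the caller's copy first so no later path reuses it. */
	*fdp = -1;

	if (close(fd) == -1)
		return -errno;	/* report the error, but do not retry */

	return 0;
}
```

A caller would typically log a non-zero return value and carry on, treating
the descriptor as gone in every case.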
fs/open.c | 238 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 234 insertions(+), 4 deletions(-)
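The specification also states that a 0 return from close() does not imply
that data reached storage, and recommends fsync() before close(). A minimal
sketch of that ordering follows. It is an illustration only: write_durably()
is a hypothetical helper name, and the error-reporting policy (first error
wins) is an assumption rather than something this patch mandates.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * write_durably() is a hypothetical helper, not part of this patch.
 * It writes len bytes to path, calls fsync() so that write-back errors
 * surface before close(), and then calls close() exactly once.
 *
 * Returns 0 on success or a negative errno value for the first failure.
 */
static int write_durably(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int ret = 0;
	int fd;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return -errno;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n == -1) {
			if (errno == EINTR)
				continue;	/* fd is still open, retry is safe */
			ret = -errno;
			break;
		}
		p += n;
		len -= (size_t)n;
	}

	/* Surface deferred write errors while the fd is still valid. */
	if (ret == 0 && fsync(fd) == -1)
		ret = -errno;

	/* Never retry close(); keep an earlier error if there was one. */
	if (close(fd) == -1 && ret == 0)
		ret = -errno;

	return ret;
}
```

Note the asymmetry: retrying write() after EINTR is fine because the fd is
still open, whereas close() is attempted once regardless of its result.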
diff --git a/fs/open.c b/fs/open.c
index 8e805233a277b..cf74912d15eb5 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1808,10 +1808,240 @@ int filp_close(struct file *filp, fl_owner_t id)
}
EXPORT_SYMBOL(filp_close);
-/*
- * Careful here! We test whether the file pointer is NULL before
- * releasing the fd. This ensures that one clone task can't release
- * an fd while another clone is opening it.
+/**
+ * sys_close - Close a file descriptor
+ * @fd: The file descriptor to close
+ *
+ * long-desc: Terminates access to an open file descriptor, releasing the file
+ * descriptor for reuse by subsequent open(), dup(), or similar syscalls. POSIX
+ * advisory record locks held by the process on the associated file are
+ * released on every close. OFD locks and flock locks are released when the
+ * last file descriptor referring to the open file description is closed, at
+ * which point associated resources are freed. If the file was previously
+ * unlinked, the file itself is deleted when the last reference is closed.
+ *
+ * CRITICAL: The file descriptor is ALWAYS closed, even when close() returns
+ * an error. This differs from POSIX semantics where the state of the file
+ * descriptor is unspecified after EINTR. On Linux, the fd is released early
+ * in close() processing before flush operations that may fail. Therefore,
+ * retrying close() after an error return is DANGEROUS and may close an
+ * unrelated file that another thread has since opened on the same fd number.
+ *
+ * Errors returned from close() (EIO, ENOSPC, EDQUOT) indicate that the final
+ * flush of buffered data failed. These errors commonly occur on network
+ * filesystems like NFS when write errors are deferred to close time. A
+ * successful return from close() does NOT guarantee that data has reached
+ * stable storage; the kernel defers writes through the page cache.
+ * To ensure data persistence, call fsync() before close().
+ *
+ * On close, the following cleanup operations are performed: POSIX advisory
+ * locks are removed, dnotify registrations are cleaned up, the file is
+ * flushed to storage if applicable, and the file reference is released. If
+ * this was the last reference, additional cleanup includes: fsnotify close
+ * notification, epoll cleanup, flock and lease removal, FASYNC cleanup, and
+ * deallocation of the file structure.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: fd
+ * type: KAPI_TYPE_FD
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_RANGE
+ * range: 0, INT_MAX
+ * cdesc: Must be a valid, open file descriptor for the current process.
+ * The value 0, 1, or 2 (stdin, stdout, stderr) may be closed like any other
+ * fd, though this is unusual and may cause issues with libraries that assume
+ * these descriptors are valid. The parameter is unsigned int to match kernel
+ * file descriptor table indexing, but values exceeding INT_MAX are effectively
+ * invalid due to internal checks.
+ *
+ * return:
+ * type: KAPI_TYPE_INT
+ * check-type: KAPI_RETURN_EXACT
+ * success: 0
+ * desc: Returns 0 on success. On error, returns a negative error code.
+ * IMPORTANT: Even when an error is returned, the file descriptor is still
+ * closed and must not be used again. The error indicates a problem with
+ * the final flush operation, not that the fd remains open.
+ *
+ * error: EBADF, Bad file descriptor
+ * desc: The file descriptor fd is not a valid open file descriptor: it is out
+ * of range, has no file assigned, or was already closed. This is the only
+ * error that indicates the fd was NOT closed by this call (because it was
+ * not open to begin with).
+ *
+ * error: EINTR, Interrupted system call
+ * desc: The flush operation was interrupted by a signal before completion.
+ * This occurs when the close-time flush operation (e.g., on NFS) performs an
+ * interruptible wait that receives a signal. IMPORTANT: Despite this error,
+ * the file descriptor IS closed and must not be used again. This error
+ * is generated by converting kernel-internal restart codes (ERESTARTSYS,
+ * ERESTARTNOINTR, ERESTARTNOHAND, ERESTART_RESTARTBLOCK) to EINTR because
+ * restarting the syscall would be incorrect once the fd is freed.
+ *
+ * error: EIO, I/O error
+ * desc: An I/O error occurred during the flush of buffered data to the
+ * underlying storage. This typically indicates a hardware error, network
+ * failure on NFS, or other storage system error. The file descriptor is
+ * still closed. Previously buffered write data may have been lost.
+ *
+ * error: ENOSPC, No space left on device
+ * desc: There was insufficient space on the storage device to flush buffered
+ * writes. This is common on NFS when the server runs out of space between
+ * write() and close(). The file descriptor is still closed.
+ *
+ * error: EDQUOT, Disk quota exceeded
+ * desc: The user's disk quota was exceeded while attempting to flush buffered
+ * writes. Common on NFS when quota is exceeded between write() and close().
+ * The file descriptor is still closed.
+ *
+ * lock: files->file_lock
+ * type: KAPI_LOCK_SPINLOCK
+ * acquired: true
+ * released: true
+ * desc: Acquired via file_close_fd() to atomically look up and remove the fd
+ * from the file descriptor table. Held only during the table manipulation;
+ * released before flush and final cleanup operations. Once the lock is
+ * dropped the fd number is free, so another thread may be allocated the
+ * same number while the remainder of close() is still in progress.
+ *
+ * lock: file->f_lock
+ * type: KAPI_LOCK_SPINLOCK
+ * acquired: true
+ * released: true
+ * desc: Acquired during epoll cleanup (eventpoll_release_file) and dnotify
+ * cleanup to safely unlink the file from monitoring structures. May also
+ * be acquired during lock context operations.
+ *
+ * lock: ep->mtx
+ * type: KAPI_LOCK_MUTEX
+ * acquired: true
+ * released: true
+ * desc: Acquired during epoll cleanup if the file was monitored by epoll.
+ * Used to safely remove the file from epoll interest lists.
+ *
+ * lock: flc_lock
+ * type: KAPI_LOCK_SPINLOCK
+ * acquired: true
+ * released: true
+ * desc: File lock context spinlock, acquired during locks_remove_file() to
+ * safely remove POSIX, flock, and lease locks associated with the file.
+ *
+ * signal: pending_signals
+ * direction: KAPI_SIGNAL_RECEIVE
+ * action: KAPI_SIGNAL_ACTION_RETURN
+ * condition: When close-time flush performs interruptible wait
+ * desc: If the close-time flush operation (e.g., on NFS) performs an
+ * interruptible wait and a signal is pending, the wait is interrupted.
+ * Any kernel restart codes are converted to EINTR since close cannot be
+ * restarted after the fd is freed.
+ * error: -EINTR
+ * timing: KAPI_SIGNAL_TIME_DURING
+ * restartable: no
+ *
+ * side-effect: KAPI_EFFECT_RESOURCE_DESTROY | KAPI_EFFECT_IRREVERSIBLE
+ * target: File descriptor table entry
+ * desc: The file descriptor is removed from the process's file descriptor
+ * table, making the fd number available for reuse by subsequent open(),
+ * dup(), or similar calls. This occurs BEFORE any flush or cleanup that
+ * might fail, making the operation irreversible regardless of return value.
+ * condition: Always (when fd is valid)
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_LOCK_RELEASE
+ * target: POSIX advisory locks, OFD locks, flock locks
+ * desc: All advisory locks held on the file by this process are removed.
+ * POSIX locks are removed via locks_remove_posix() during filp_flush().
+ * All lock types (POSIX, OFD, flock) are removed via locks_remove_file()
+ * during __fput() when this is the last reference.
+ * condition: File has FMODE_OPENED and !(FMODE_PATH)
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_RESOURCE_DESTROY
+ * target: File leases
+ * desc: Any file leases held on the file are removed during locks_remove_file()
+ * when this is the last reference to the open file description.
+ * condition: File had leases and this is the last close
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: dnotify registrations
+ * desc: Directory notification (dnotify) registrations associated with this
+ * file are cleaned up via dnotify_flush(). This only applies to directories.
+ * condition: File is a directory with dnotify registrations
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: epoll interest lists
+ * desc: If the file was being monitored by epoll instances, it is removed
+ * from those interest lists via eventpoll_release().
+ * condition: File was added to epoll instances
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ * target: Buffered data
+ * desc: Any buffered data is flushed if applicable (e.g., on NFS). The flush
+ * attempts to write buffered data to storage and may return errors (EIO,
+ * ENOSPC, EDQUOT) if it fails. A 0 return does NOT guarantee that the data
+ * reached stable storage; use fsync() before close() to ensure data
+ * persistence.
+ * condition: File was opened for writing and has buffered data
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FREE_MEMORY
+ * target: struct file and related structures
+ * desc: When this is the last reference to the file, the file structure is
+ * freed and the dentry and mount references are released.
+ * condition: This is the last reference to the file
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ * target: Unlinked file deletion
+ * desc: If the file was previously unlinked (deleted) but kept open, closing
+ * the last reference causes the actual file data to be removed from the
+ * filesystem and the inode to be freed.
+ * condition: File was unlinked and this is the last reference
+ * reversible: no
+ *
+ * state-trans: file_descriptor
+ * from: open
+ * to: closed/free
+ * condition: Valid fd passed to close
+ * desc: The file descriptor transitions from open (usable) to closed (invalid).
+ * The fd number becomes available for reuse. This transition occurs early
+ * in close() processing, before any operations that might fail.
+ *
+ * state-trans: file_reference_count
+ * from: n
+ * to: n-1 (or freed if n was 1)
+ * condition: Always on successful fd lookup
+ * desc: The file's reference count is decremented. If this was the last
+ * reference, the file is fully cleaned up and freed.
+ *
+ * constraint: File Descriptor Reuse Race
+ * desc: Because the fd is freed early in close() processing, another thread
+ * may receive the same fd number from a concurrent open() before close()
+ * returns. Applications must not retry close() after an error return, as
+ * this could close an unrelated file opened by another thread.
+ * expr: After close(fd) returns (even with error), fd is invalid
+ *
+ * examples: close(fd); // Basic usage - ignore errors (common but not ideal)
+ * if (close(fd) == -1) perror("close"); // Log errors for debugging
+ * fsync(fd); close(fd); // Ensure data persistence before closing
+ *
+ * notes: The fd is always freed regardless of the return value. POSIX
+ * specifies that on EINTR the state of the fd is unspecified, but Linux
+ * always closes it. Retrying close() after an error may close an unrelated
+ * fd that was reassigned by another thread, so callers should never retry.
+ *
+ * Error codes like EIO, ENOSPC, and EDQUOT indicate that previously buffered
+ * writes may have failed to reach storage. These errors are particularly
+ * common on NFS where write errors are often deferred to close time.
+ *
+ * Calling close() on a file descriptor while another thread is using it
+ * (e.g., in a blocking read() or write()) does not interrupt the blocked
+ * operation. The blocked operation continues on the underlying file and
+ * may complete even after close() returns.
*/
SYSCALL_DEFINE1(close, unsigned int, fd)
{
--
2.51.0
Thread overview:
2026-03-22 12:10 [PATCH v2 0/9] Kernel API Specification Framework Sasha Levin
2026-03-22 12:10 ` [PATCH v2 1/9] kernel/api: introduce kernel API specification framework Sasha Levin
2026-03-22 12:10 ` [PATCH v2 2/9] kernel/api: enable kerneldoc-based API specifications Sasha Levin
2026-03-22 12:10 ` [PATCH v2 3/9] kernel/api: add debugfs interface for kernel " Sasha Levin
2026-03-23 13:52 ` Greg Kroah-Hartman
2026-03-23 23:58 ` Sasha Levin
2026-03-24 8:20 ` Greg Kroah-Hartman
2026-03-24 11:33 ` Sasha Levin
2026-03-24 11:45 ` Greg Kroah-Hartman
2026-03-24 9:49 ` Mauro Carvalho Chehab
2026-03-22 12:10 ` [PATCH v2 4/9] tools/kapi: Add kernel API specification extraction tool Sasha Levin
2026-03-22 12:10 ` [PATCH v2 5/9] kernel/api: add API specification for sys_open Sasha Levin
2026-03-22 12:10 ` Sasha Levin [this message]
2026-03-22 12:10 ` [PATCH v2 7/9] kernel/api: add API specification for sys_read Sasha Levin
2026-03-22 12:10 ` [PATCH v2 8/9] kernel/api: add API specification for sys_write Sasha Levin
2026-03-22 12:10 ` [PATCH v2 9/9] kernel/api: add runtime verification selftest Sasha Levin