* [RFC PATCH v3 0/4] avoid the extra atomic on a ref when closing a fd
@ 2025-03-05 12:36 Mateusz Guzik
2025-03-05 12:36 ` [PATCH v3 1/4] file: add fput and file_ref_put routines optimized for use " Mateusz Guzik
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Mateusz Guzik @ 2025-03-05 12:36 UTC (permalink / raw)
To: brauner; +Cc: viro, jack, linux-kernel, linux-fsdevel, Mateusz Guzik
The stock kernel transitioning the file to no refs held penalizes the
caller with an extra atomic to block any increments.
For cases where the file is highly likely to be going away this is
easily avoidable.
In the open+close case the win is very modest because of the following
problems:
- kmem and memcg having terrible performance
- putname using an atomic (I have a wip to whack that)
- open performing an extra ref/unref on the dentry (there are patches to
do it, including by Al. I mailed about them in [1])
- creds using atomics (I have a wip to whack that)
- apparmor using atomics (ditto, same mechanism)
On top of that I have a WIP patch to dodge some of the work at lookup
itself.
All in all there is several % avoidably lost here.
stats colected during a kernel build with:
bpftrace -e 'kprobe:filp_close,kprobe:fput,kprobe:fput_close* { @[probe] = hist(((struct file *)arg0)->f_ref.refcnt.counter > 0); }'
@[kprobe:filp_close]:
[0] 32195 |@@@@@@@@@@ |
[1] 164567 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
@[kprobe:fput]:
[0] 339240 |@@@@@@ |
[1] 2888064 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
@[kprobe:fput_close]:
[0] 5116767 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1] 164544 |@ |
@[kprobe:fput_close_sync]:
[0] 5340660 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1] 358943 |@@@ |
0 indicates the last reference, 1 that there is more.
filp_close is largely skewed because of close_on_exec.
vast majority of last fputs are from remove_vma. I think that code wants
to be patched to batch them (as in something like fput_many should be
added -- something for later).
[1] https://lore.kernel.org/linux-fsdevel/20250304165728.491785-1-mjguzik@gmail.com/T/#u
v3:
- inline file_ref_put_close
- unexport the new routines and move their declaration to fs/internal.h
- s/__fput_defer_free/__fput_deferred/
v2:
- patch filp_close
- patch failing open
Mateusz Guzik (4):
file: add fput and file_ref_put routines optimized for use when
closing a fd
fs: use fput_close_sync() in close()
fs: use fput_close() in filp_close()
fs: use fput_close() in path_openat()
fs/file.c | 41 ++++++++++++-----------
fs/file_table.c | 70 ++++++++++++++++++++++++++++------------
fs/internal.h | 3 ++
fs/namei.c | 2 +-
fs/open.c | 4 +--
include/linux/file_ref.h | 34 +++++++++++++++++++
6 files changed, 112 insertions(+), 42 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v3 1/4] file: add fput and file_ref_put routines optimized for use when closing a fd
2025-03-05 12:36 [RFC PATCH v3 0/4] avoid the extra atomic on a ref when closing a fd Mateusz Guzik
@ 2025-03-05 12:36 ` Mateusz Guzik
2025-03-05 12:36 ` [PATCH v3 2/4] fs: use fput_close_sync() in close() Mateusz Guzik
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Mateusz Guzik @ 2025-03-05 12:36 UTC (permalink / raw)
To: brauner; +Cc: viro, jack, linux-kernel, linux-fsdevel, Mateusz Guzik
Vast majority of the time closing a file descriptor also operates on the
last reference, where a regular fput usage will result in 2 atomics.
This can be changed to only suffer 1.
See commentary above file_ref_put_close() for more information.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
fs/file.c | 41 ++++++++++++-----------
fs/file_table.c | 70 ++++++++++++++++++++++++++++------------
fs/internal.h | 3 ++
include/linux/file_ref.h | 34 +++++++++++++++++++
4 files changed, 109 insertions(+), 39 deletions(-)
diff --git a/fs/file.c b/fs/file.c
index 44efdc8c1e27..6c159ede55f1 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -26,6 +26,28 @@
#include "internal.h"
+bool __file_ref_put_badval(file_ref_t *ref, unsigned long cnt)
+{
+ /*
+ * If the reference count was already in the dead zone, then this
+ * put() operation is imbalanced. Warn, put the reference count back to
+ * DEAD and tell the caller to not deconstruct the object.
+ */
+ if (WARN_ONCE(cnt >= FILE_REF_RELEASED, "imbalanced put on file reference count")) {
+ atomic_long_set(&ref->refcnt, FILE_REF_DEAD);
+ return false;
+ }
+
+ /*
+ * This is a put() operation on a saturated refcount. Restore the
+ * mean saturation value and tell the caller to not deconstruct the
+ * object.
+ */
+ if (cnt > FILE_REF_MAXREF)
+ atomic_long_set(&ref->refcnt, FILE_REF_SATURATED);
+ return false;
+}
+
/**
* __file_ref_put - Slowpath of file_ref_put()
* @ref: Pointer to the reference count
@@ -67,24 +89,7 @@ bool __file_ref_put(file_ref_t *ref, unsigned long cnt)
return true;
}
- /*
- * If the reference count was already in the dead zone, then this
- * put() operation is imbalanced. Warn, put the reference count back to
- * DEAD and tell the caller to not deconstruct the object.
- */
- if (WARN_ONCE(cnt >= FILE_REF_RELEASED, "imbalanced put on file reference count")) {
- atomic_long_set(&ref->refcnt, FILE_REF_DEAD);
- return false;
- }
-
- /*
- * This is a put() operation on a saturated refcount. Restore the
- * mean saturation value and tell the caller to not deconstruct the
- * object.
- */
- if (cnt > FILE_REF_MAXREF)
- atomic_long_set(&ref->refcnt, FILE_REF_SATURATED);
- return false;
+ return __file_ref_put_badval(ref, cnt);
}
EXPORT_SYMBOL_GPL(__file_ref_put);
diff --git a/fs/file_table.c b/fs/file_table.c
index 5c00dc38558d..5884cc659ea4 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -511,31 +511,37 @@ void flush_delayed_fput(void)
}
EXPORT_SYMBOL_GPL(flush_delayed_fput);
-void fput(struct file *file)
+static void __fput_deferred(struct file *file)
{
- if (file_ref_put(&file->f_ref)) {
- struct task_struct *task = current;
+ struct task_struct *task = current;
- if (unlikely(!(file->f_mode & (FMODE_BACKING | FMODE_OPENED)))) {
- file_free(file);
+ if (unlikely(!(file->f_mode & (FMODE_BACKING | FMODE_OPENED)))) {
+ file_free(file);
+ return;
+ }
+ if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
+ init_task_work(&file->f_task_work, ____fput);
+ if (!task_work_add(task, &file->f_task_work, TWA_RESUME))
return;
- }
- if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
- init_task_work(&file->f_task_work, ____fput);
- if (!task_work_add(task, &file->f_task_work, TWA_RESUME))
- return;
- /*
- * After this task has run exit_task_work(),
- * task_work_add() will fail. Fall through to delayed
- * fput to avoid leaking *file.
- */
- }
-
- if (llist_add(&file->f_llist, &delayed_fput_list))
- schedule_delayed_work(&delayed_fput_work, 1);
+ /*
+ * After this task has run exit_task_work(),
+ * task_work_add() will fail. Fall through to delayed
+ * fput to avoid leaking *file.
+ */
}
+
+ if (llist_add(&file->f_llist, &delayed_fput_list))
+ schedule_delayed_work(&delayed_fput_work, 1);
}
+void fput(struct file *file)
+{
+ if (unlikely(file_ref_put(&file->f_ref))) {
+ __fput_deferred(file);
+ }
+}
+EXPORT_SYMBOL(fput);
+
/*
* synchronous analog of fput(); for kernel threads that might be needed
* in some umount() (and thus can't use flush_delayed_fput() without
@@ -549,10 +555,32 @@ void __fput_sync(struct file *file)
if (file_ref_put(&file->f_ref))
__fput(file);
}
-
-EXPORT_SYMBOL(fput);
EXPORT_SYMBOL(__fput_sync);
+/*
+ * Equivalent to __fput_sync(), but optimized for being called with the last
+ * reference.
+ *
+ * See file_ref_put_close() for details.
+ */
+void fput_close_sync(struct file *file)
+{
+ if (likely(file_ref_put_close(&file->f_ref)))
+ __fput(file);
+}
+
+/*
+ * Equivalent to fput(), but optimized for being called with the last
+ * reference.
+ *
+ * See file_ref_put_close() for details.
+ */
+void fput_close(struct file *file)
+{
+ if (file_ref_put_close(&file->f_ref))
+ __fput_deferred(file);
+}
+
void __init files_init(void)
{
struct kmem_cache_args args = {
diff --git a/fs/internal.h b/fs/internal.h
index 3d05a989e4fa..3b3f315394ba 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -118,6 +118,9 @@ static inline void put_file_access(struct file *file)
}
}
+void fput_close_sync(struct file *);
+void fput_close(struct file *);
+
/*
* super.c
*/
diff --git a/include/linux/file_ref.h b/include/linux/file_ref.h
index 9b3a8d9b17ab..6ef92d765a66 100644
--- a/include/linux/file_ref.h
+++ b/include/linux/file_ref.h
@@ -61,6 +61,7 @@ static inline void file_ref_init(file_ref_t *ref, unsigned long cnt)
atomic_long_set(&ref->refcnt, cnt - 1);
}
+bool __file_ref_put_badval(file_ref_t *ref, unsigned long cnt);
bool __file_ref_put(file_ref_t *ref, unsigned long cnt);
/**
@@ -160,6 +161,39 @@ static __always_inline __must_check bool file_ref_put(file_ref_t *ref)
return __file_ref_put(ref, cnt);
}
+/**
+ * file_ref_put_close - drop a reference expecting it would transition to FILE_REF_NOREF
+ * @ref: Pointer to the reference count
+ *
+ * Semantically it is equivalent to calling file_ref_put(), but it trades lower
+ * performance in face of other CPUs also modifying the refcount for higher
+ * performance when this happens to be the last reference.
+ *
+ * For the last reference file_ref_put() issues 2 atomics. One to drop the
+ * reference and another to transition it to FILE_REF_DEAD. This routine does
+ * the work in one step, but in order to do it has to pre-read the variable which
+ * decreases scalability.
+ *
+ * Use with close() et al, stick to file_ref_put() by default.
+ */
+static __always_inline __must_check bool file_ref_put_close(file_ref_t *ref)
+{
+ long old, new;
+
+ old = atomic_long_read(&ref->refcnt);
+ do {
+ if (unlikely(old < 0))
+ return __file_ref_put_badval(ref, old);
+
+ if (old == FILE_REF_ONEREF)
+ new = FILE_REF_DEAD;
+ else
+ new = old - 1;
+ } while (!atomic_long_try_cmpxchg(&ref->refcnt, &old, new));
+
+ return new == FILE_REF_DEAD;
+}
+
/**
* file_ref_read - Read the number of file references
* @ref: Pointer to the reference count
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v3 2/4] fs: use fput_close_sync() in close()
2025-03-05 12:36 [RFC PATCH v3 0/4] avoid the extra atomic on a ref when closing a fd Mateusz Guzik
2025-03-05 12:36 ` [PATCH v3 1/4] file: add fput and file_ref_put routines optimized for use " Mateusz Guzik
@ 2025-03-05 12:36 ` Mateusz Guzik
2025-03-05 12:36 ` [PATCH v3 3/4] fs: use fput_close() in filp_close() Mateusz Guzik
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Mateusz Guzik @ 2025-03-05 12:36 UTC (permalink / raw)
To: brauner; +Cc: viro, jack, linux-kernel, linux-fsdevel, Mateusz Guzik
This bumps open+close rate by 1% on Sapphire Rapids by eliding one
atomic.
It would be higher if it was not for several other slowdowns of the same
nature.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
fs/open.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/open.c b/fs/open.c
index a5def5611b2f..f2fcfaeb2232 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1577,7 +1577,7 @@ SYSCALL_DEFINE1(close, unsigned int, fd)
* We're returning to user space. Don't bother
* with any delayed fput() cases.
*/
- __fput_sync(file);
+ fput_close_sync(file);
if (likely(retval == 0))
return 0;
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v3 3/4] fs: use fput_close() in filp_close()
2025-03-05 12:36 [RFC PATCH v3 0/4] avoid the extra atomic on a ref when closing a fd Mateusz Guzik
2025-03-05 12:36 ` [PATCH v3 1/4] file: add fput and file_ref_put routines optimized for use " Mateusz Guzik
2025-03-05 12:36 ` [PATCH v3 2/4] fs: use fput_close_sync() in close() Mateusz Guzik
@ 2025-03-05 12:36 ` Mateusz Guzik
2025-03-05 12:36 ` [PATCH v3 4/4] fs: use fput_close() in path_openat() Mateusz Guzik
2025-03-05 17:31 ` [RFC PATCH v3 0/4] avoid the extra atomic on a ref when closing a fd Christian Brauner
4 siblings, 0 replies; 6+ messages in thread
From: Mateusz Guzik @ 2025-03-05 12:36 UTC (permalink / raw)
To: brauner; +Cc: viro, jack, linux-kernel, linux-fsdevel, Mateusz Guzik
When tracing a kernel build over refcounts seen this is a wash:
@[kprobe:filp_close]:
[0] 32195 |@@@@@@@@@@ |
[1] 164567 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
I verified vast majority of the skew comes from do_close_on_exec() which
could be changed to use a different variant instead.
Even without changing that, the 19.5% of calls which got here still can
save the extra atomic. Calls here are borderline non-existent compared
to fput (over 3.2 mln!), so they should not negatively affect
scalability.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
fs/open.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/open.c b/fs/open.c
index f2fcfaeb2232..bdbf03f799a1 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1551,7 +1551,7 @@ int filp_close(struct file *filp, fl_owner_t id)
int retval;
retval = filp_flush(filp, id);
- fput(filp);
+ fput_close(filp);
return retval;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v3 4/4] fs: use fput_close() in path_openat()
2025-03-05 12:36 [RFC PATCH v3 0/4] avoid the extra atomic on a ref when closing a fd Mateusz Guzik
` (2 preceding siblings ...)
2025-03-05 12:36 ` [PATCH v3 3/4] fs: use fput_close() in filp_close() Mateusz Guzik
@ 2025-03-05 12:36 ` Mateusz Guzik
2025-03-05 17:31 ` [RFC PATCH v3 0/4] avoid the extra atomic on a ref when closing a fd Christian Brauner
4 siblings, 0 replies; 6+ messages in thread
From: Mateusz Guzik @ 2025-03-05 12:36 UTC (permalink / raw)
To: brauner; +Cc: viro, jack, linux-kernel, linux-fsdevel, Mateusz Guzik
This bumps failing open rate by 1.7% on Sapphire Rapids by avoiding one
atomic.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
fs/namei.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/namei.c b/fs/namei.c
index d00443e38d3a..06765d320e7e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4005,7 +4005,7 @@ static struct file *path_openat(struct nameidata *nd,
WARN_ON(1);
error = -EINVAL;
}
- fput(file);
+ fput_close(file);
if (error == -EOPENSTALE) {
if (flags & LOOKUP_RCU)
error = -ECHILD;
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [RFC PATCH v3 0/4] avoid the extra atomic on a ref when closing a fd
2025-03-05 12:36 [RFC PATCH v3 0/4] avoid the extra atomic on a ref when closing a fd Mateusz Guzik
` (3 preceding siblings ...)
2025-03-05 12:36 ` [PATCH v3 4/4] fs: use fput_close() in path_openat() Mateusz Guzik
@ 2025-03-05 17:31 ` Christian Brauner
4 siblings, 0 replies; 6+ messages in thread
From: Christian Brauner @ 2025-03-05 17:31 UTC (permalink / raw)
To: Mateusz Guzik; +Cc: Christian Brauner, viro, jack, linux-kernel, linux-fsdevel
On Wed, 05 Mar 2025 13:36:40 +0100, Mateusz Guzik wrote:
> The stock kernel transitioning the file to no refs held penalizes the
> caller with an extra atomic to block any increments.
>
> For cases where the file is highly likely to be going away this is
> easily avoidable.
>
> In the open+close case the win is very modest because of the following
> problems:
> - kmem and memcg having terrible performance
> - putname using an atomic (I have a wip to whack that)
> - open performing an extra ref/unref on the dentry (there are patches to
> do it, including by Al. I mailed about them in [1])
> - creds using atomics (I have a wip to whack that)
> - apparmor using atomics (ditto, same mechanism)
>
> [...]
Applied to the vfs-6.15.file branch of the vfs/vfs.git tree.
Patches in the vfs-6.15.file branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.15.file
[1/4] file: add fput and file_ref_put routines optimized for use when closing a fd
https://git.kernel.org/vfs/vfs/c/e83588458f65
[2/4] fs: use fput_close_sync() in close()
https://git.kernel.org/vfs/vfs/c/3e46a92a27c2
[3/4] fs: use fput_close() in filp_close()
https://git.kernel.org/vfs/vfs/c/a914bd93f3ed
[4/4] fs: use fput_close() in path_openat()
https://git.kernel.org/vfs/vfs/c/606623de503f
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-03-05 17:31 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-05 12:36 [RFC PATCH v3 0/4] avoid the extra atomic on a ref when closing a fd Mateusz Guzik
2025-03-05 12:36 ` [PATCH v3 1/4] file: add fput and file_ref_put routines optimized for use " Mateusz Guzik
2025-03-05 12:36 ` [PATCH v3 2/4] fs: use fput_close_sync() in close() Mateusz Guzik
2025-03-05 12:36 ` [PATCH v3 3/4] fs: use fput_close() in filp_close() Mateusz Guzik
2025-03-05 12:36 ` [PATCH v3 4/4] fs: use fput_close() in path_openat() Mateusz Guzik
2025-03-05 17:31 ` [RFC PATCH v3 0/4] avoid the extra atomic on a ref when closing a fd Christian Brauner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox