From: Max Kellermann <max.kellermann@ionos.com>
To: idryomov@gmail.com, amarkuze@redhat.com,
ceph-devel@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Max Kellermann <max.kellermann@ionos.com>
Subject: [PATCH] fs/ceph/caps: skip __touch_cap() most of the time
Date: Fri, 3 Jul 2026 13:32:52 +0200
Message-ID: <20260703113253.3635335-1-max.kellermann@ionos.com>

__touch_cap() moves one capability to the end of the LRU list. The
list is kept sorted by access time for a single consumer:
ceph_trim_caps(), which is supposed to discard the least recently
used capabilities.
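
For readers unfamiliar with the pattern, here is a minimal userspace
sketch of an LRU list with a "touch" and a "trim" operation. All names
(struct cap, struct session, touch_cap(), trim_one()) are made up for
illustration and are not the kernel's types or functions; the sketch
also assumes that trimming starts at the head of the list and leaves
out the locking that the kernel does around the list update.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real ceph types. */
struct cap {
	int id;
	struct cap *prev, *next;
};

struct session {
	struct cap head;	/* sentinel: head.next is the least recently used */
};

static void session_init(struct session *s)
{
	s->head.prev = s->head.next = &s->head;
}

static void list_unlink(struct cap *c)
{
	c->prev->next = c->next;
	c->next->prev = c->prev;
}

/* Append at the tail, i.e. the most recently used end. */
static void list_append(struct session *s, struct cap *c)
{
	c->prev = s->head.prev;
	c->next = &s->head;
	s->head.prev->next = c;
	s->head.prev = c;
}

/* "Touch": move an entry that is already on the list to the tail.
 * In the kernel this step runs under s_cap_lock.
 */
static void touch_cap(struct session *s, struct cap *c)
{
	assert(c->prev != NULL && c->next != NULL);
	list_unlink(c);
	list_append(s, c);
}

/* "Trim": remove and return the least recently used entry, or NULL. */
static struct cap *trim_one(struct session *s)
{
	struct cap *c = s->head.next;

	if (c == &s->head)
		return NULL;
	list_unlink(c);
	return c;
}
```

With entries appended in the order a, b, c, a call to touch_cap() on a
makes trim_one() return b first, then c, then a.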

__touch_cap() is called extremely often (several times for every
system call), whereas ceph_trim_caps() is called only rarely.

__touch_cap() causes considerable lock contention on
`ceph_mds_session.s_cap_lock`. This is /proc/lock_stat output
captured over 5 minutes on one of our web servers:

class name       con-bounces  contentions  waittime-min  waittime-max  waittime-total  waittime-avg  acq-bounces  acquisitions  holdtime-min  holdtime-max  holdtime-total  holdtime-avg
&s->s_cap_lock:    336304046    341686597          0.04       4905.76    418498578.76          1.22    892783632    1957814739          0.04        959.40    355752146.24          0.18
--------------
&s->s_cap_lock  339379730  [<00000000a2197200>] __ceph_caps_issued_mask+0x1bc/0x240
&s->s_cap_lock    1268054  [<00000000c96a24b7>] ceph_add_cap+0x234/0x3e0
&s->s_cap_lock    1021360  [<00000000aa76f996>] ceph_add_cap+0x108/0x3e0
&s->s_cap_lock      16042  [<0000000099463548>] __ceph_remove_cap+0x1f4/0x270
--------------
&s->s_cap_lock  338509619  [<00000000a2197200>] __ceph_caps_issued_mask+0x1bc/0x240
&s->s_cap_lock    1937864  [<00000000c96a24b7>] ceph_add_cap+0x234/0x3e0
&s->s_cap_lock    1203451  [<00000000aa76f996>] ceph_add_cap+0x108/0x3e0
&s->s_cap_lock        202  [<00000000888f212a>] __ceph_remove_cap+0x7c/0x270

In this /proc/lock_stat output, __touch_cap() is inlined into
__ceph_caps_issued_mask(). It is responsible for 99% of all
contentions.

Since __touch_cap() is called so often, it is acceptable to skip
most calls. The busiest capabilities will still gravitate towards
the end of the linked list, and where they do not, the less accurate
ordering hurts less than the lock contention does. This is still
good enough for ceph_trim_caps().

This patch adds a static 8-bit counter that is incremented on each
call; the function proceeds only when the counter wraps around to
zero, so 255 out of 256 calls are skipped. I didn't bother to make
the increment atomic or to use READ_ONCE() because I don't think
that makes a practical difference for this use case.
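
The effect of the wrapping counter can be checked in isolation with a
small userspace sketch. touch_sampled() and slow_path_calls are
hypothetical names used only here; the increment of slow_path_calls
stands in for taking the lock and moving the list entry.

```c
#include <stdint.h>

/* Hypothetical counter of calls that reach the expensive path. */
static unsigned long slow_path_calls;

static void touch_sampled(void)
{
	static uint8_t skip_counter;

	/* ++skip_counter is non-zero for 255 of every 256 calls; it is
	 * zero only when the 8-bit value wraps from 255 back to 0.
	 */
	if (++skip_counter)
		return;

	/* stands in for the locked list update */
	slow_path_calls++;
}
```

Calling touch_sampled() 1024 times from a single thread reaches the
slow path on calls 256, 512, 768 and 1024 only. The sketch is
single-threaded, so it says nothing about how racing increments
behave.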

Another /proc/lock_stat capture over 5 minutes with this patch applied
(__touch_cap() is no longer inlined, probably because it contains a
static variable):

class name       con-bounces  contentions  waittime-min  waittime-max  waittime-total  waittime-avg  acq-bounces  acquisitions  holdtime-min  holdtime-max  holdtime-total  holdtime-avg
&s->s_cap_lock:      1043711      1065182          0.04        502.72       737472.88          0.69     10522578      25069948          0.04        796.44     11053669.64          0.44
--------------
&s->s_cap_lock  1043074  [<00000000f4367d73>] __touch_cap.isra.0+0x50/0xa8
&s->s_cap_lock    12147  [<0000000096f45706>] ceph_add_cap+0x234/0x3e0
&s->s_cap_lock     9472  [<0000000038a23e0f>] ceph_add_cap+0x108/0x3e0
&s->s_cap_lock      471  [<00000000e2eba934>] __ceph_remove_cap+0x1f4/0x270
--------------
&s->s_cap_lock   978499  [<00000000f4367d73>] __touch_cap.isra.0+0x50/0xa8
&s->s_cap_lock    57794  [<0000000038a23e0f>] ceph_add_cap+0x108/0x3e0
&s->s_cap_lock    27226  [<0000000096f45706>] ceph_add_cap+0x234/0x3e0
&s->s_cap_lock     1581  [<00000000e2eba934>] __ceph_remove_cap+0x1f4/0x270

__touch_cap() is still responsible for 91% of all contentions, but the
number of contentions has been reduced by a factor of 320 and the
total wait time by a factor of 567.
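
The two reduction factors can be recomputed from the "contentions" and
"waittime-total" columns of the two captures. A minimal sketch, with
reduction_factor() and the constant names chosen only for this
illustration:

```c
/* Totals copied from the two lock_stat captures in this message. */
static const double contentions_before = 341686597.0;
static const double contentions_after  = 1065182.0;
static const double waittime_before    = 418498578.76;
static const double waittime_after     = 737472.88;

/* Ratio of the value before the patch to the value after it. */
static double reduction_factor(double before, double after)
{
	return before / after;
}
```

reduction_factor(contentions_before, contentions_after) comes out at
about 320.8 and reduction_factor(waittime_before, waittime_after) at
about 567.5, which matches the figures quoted above after truncation.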

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
 fs/ceph/caps.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 4b37d9ffdf7f..a5e2b00d91a0 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -871,6 +871,14 @@ static void __touch_cap(struct ceph_cap *cap)
 	struct inode *inode = &cap->ci->netfs.inode;
 	struct ceph_mds_session *s = cap->session;
 	struct ceph_client *cl = s->s_mdsc->fsc->client;
+	static uint8_t skip_counter;
+
+	if (++skip_counter)
+		/* skip this call most of the time to reduce lock
+		 * contention; the LRU list is still accurate enough
+		 * for ceph_trim_caps()
+		 */
+		return;
 
 	spin_lock(&s->s_cap_lock);
 	if (!s->s_cap_iterator) {
--
2.47.3