The Linux Kernel Mailing List
* [PATCH] fs/ceph/caps: skip __touch_cap() most of the time
@ 2026-07-03 11:32 Max Kellermann
From: Max Kellermann @ 2026-07-03 11:32 UTC
  To: idryomov, amarkuze, ceph-devel, linux-kernel; +Cc: Max Kellermann

__touch_cap() moves a capability to the end of the session's LRU list.
This list is kept in access-time order for a single consumer,
ceph_trim_caps(), which is supposed to discard the least recently used
capabilities.

__touch_cap() is called extremely often (several times per system
call), whereas ceph_trim_caps() is called only rarely.

__touch_cap() causes considerable lock contention on
`ceph_mds_session.s_cap_lock`.  Here is /proc/lock_stat output
captured over 5 minutes on one of our web servers:

      class name    con-bounces    contentions   waittime-min   waittime-max waittime-total   waittime-avg    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg

  &s->s_cap_lock:     336304046      341686597           0.04        4905.76   418498578.76           1.22      892783632     1957814739           0.04         959.40   355752146.24           0.18
  --------------
  &s->s_cap_lock      339379730          [<00000000a2197200>] __ceph_caps_issued_mask+0x1bc/0x240
  &s->s_cap_lock        1268054          [<00000000c96a24b7>] ceph_add_cap+0x234/0x3e0
  &s->s_cap_lock        1021360          [<00000000aa76f996>] ceph_add_cap+0x108/0x3e0
  &s->s_cap_lock          16042          [<0000000099463548>] __ceph_remove_cap+0x1f4/0x270
  --------------
  &s->s_cap_lock      338509619          [<00000000a2197200>] __ceph_caps_issued_mask+0x1bc/0x240
  &s->s_cap_lock        1937864          [<00000000c96a24b7>] ceph_add_cap+0x234/0x3e0
  &s->s_cap_lock        1203451          [<00000000aa76f996>] ceph_add_cap+0x108/0x3e0
  &s->s_cap_lock            202          [<00000000888f212a>] __ceph_remove_cap+0x7c/0x270

In this /proc/lock_stat output, __touch_cap() is inlined into
__ceph_caps_issued_mask(), so the __ceph_caps_issued_mask+0x1bc call
site is really __touch_cap().  It is responsible for 99% of all
contentions.

Since __touch_cap() is called so often, it is acceptable to skip most
calls.  The busiest capabilities will still gravitate towards the end
of the list, and where they don't, the less accurate LRU order hurts
less than the lock contention does.  The ordering remains good enough
for ceph_trim_caps().

This patch adds a static 8-bit counter that is incremented on every
call; the list is only touched when the counter wraps around to zero,
so 255 out of 256 calls are skipped.  I didn't bother to make the
increment atomic or to use READ_ONCE(), because I don't think that
makes a practical difference for this use case.

Here is another 5-minute /proc/lock_stat capture with this patch
applied (__touch_cap() is no longer inlined, probably because it now
contains a static variable):

      class name    con-bounces    contentions   waittime-min   waittime-max waittime-total   waittime-avg    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg

  &s->s_cap_lock:       1043711        1065182           0.04         502.72      737472.88           0.69       10522578       25069948           0.04         796.44    11053669.64           0.44
  --------------
  &s->s_cap_lock        1043074          [<00000000f4367d73>] __touch_cap.isra.0+0x50/0xa8
  &s->s_cap_lock          12147          [<0000000096f45706>] ceph_add_cap+0x234/0x3e0
  &s->s_cap_lock           9472          [<0000000038a23e0f>] ceph_add_cap+0x108/0x3e0
  &s->s_cap_lock            471          [<00000000e2eba934>] __ceph_remove_cap+0x1f4/0x270
  --------------
  &s->s_cap_lock         978499          [<00000000f4367d73>] __touch_cap.isra.0+0x50/0xa8
  &s->s_cap_lock          57794          [<0000000038a23e0f>] ceph_add_cap+0x108/0x3e0
  &s->s_cap_lock          27226          [<0000000096f45706>] ceph_add_cap+0x234/0x3e0
  &s->s_cap_lock           1581          [<00000000e2eba934>] __ceph_remove_cap+0x1f4/0x270

__touch_cap() is still responsible for 91% of all contentions, but the
number of contentions has been reduced by a factor of 320 and the
total wait time by a factor of 567.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
 fs/ceph/caps.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 4b37d9ffdf7f..a5e2b00d91a0 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -871,6 +871,14 @@ static void __touch_cap(struct ceph_cap *cap)
 	struct inode *inode = &cap->ci->netfs.inode;
 	struct ceph_mds_session *s = cap->session;
 	struct ceph_client *cl = s->s_mdsc->fsc->client;
+	static u8 skip_counter;
+
+	/* skip this call most of the time to reduce lock
+	 * contention; the LRU list is still accurate enough
+	 * for ceph_trim_caps()
+	 */
+	if (++skip_counter)
+		return;
 
 	spin_lock(&s->s_cap_lock);
 	if (!s->s_cap_iterator) {
-- 
2.47.3
