From: Viacheslav Dubeyko <vdubeyko@redhat.com>
To: Alex Markuze <amarkuze@redhat.com>, ceph-devel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, idryomov@gmail.com
Subject: Re: [EXTERNAL] [PATCH v3 06/11] ceph: add manual reset debugfs control and tracepoints
Date: Thu, 30 Apr 2026 11:38:54 -0700
Message-ID: <765228488ac0a9025df8116790cedbbc85264d8f.camel@redhat.com>
In-Reply-To: <20260429125206.1512203-7-amarkuze@redhat.com>
On Wed, 2026-04-29 at 12:52 +0000, Alex Markuze wrote:
> Add the debugfs and trace plumbing used to trigger and observe
> manual client reset.
>
> The reset interface exposes a trigger file for operator-initiated
> reset and a status file for tracking the most recent run. The
> tracepoints record scheduling, completion, and blocked caller
> behavior so reset progress can be diagnosed from the client side.
>
> debugfs layout under /sys/kernel/debug/ceph/<client>/reset/:
> trigger - write to initiate a manual reset
> status - read to see the most recent reset result
>
> The reset directory is cleaned up via debugfs_remove_recursive()
> on the parent, so individual file dentries are not stored.
>
> Tracepoints:
> ceph_client_reset_schedule - reset queued
> ceph_client_reset_complete - reset finished (success or failure)
> ceph_client_reset_blocked - caller blocked waiting for reset
> ceph_client_reset_unblocked - caller unblocked after reset
>
> All tracepoints use a null-safe access for monc.auth->global_id
> to guard against early-init or late-teardown edge cases.
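
As a side note for anyone wanting to try this out: based on the layout described
above, the interface could presumably be exercised along these lines (the
<client> path component is the per-client debugfs directory name, which varies
per mount):

```shell
# Trigger a manual reset, passing a free-form reason string
echo "mds session wedged" > /sys/kernel/debug/ceph/<client>/reset/trigger

# Read back the result of the most recent reset
cat /sys/kernel/debug/ceph/<client>/reset/status
```
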
>
> Signed-off-by: Alex Markuze <amarkuze@redhat.com>
> ---
> fs/ceph/debugfs.c | 102 ++++++++++++++++++++++++++++++++++++
> fs/ceph/mds_client.c | 8 +++
> fs/ceph/super.h | 1 +
> include/trace/events/ceph.h | 67 +++++++++++++++++++++++
> 4 files changed, 178 insertions(+)
>
> diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
> index 7dc307790240..beee4cfe8b18 100644
> --- a/fs/ceph/debugfs.c
> +++ b/fs/ceph/debugfs.c
> @@ -9,6 +9,7 @@
> #include <linux/seq_file.h>
> #include <linux/math64.h>
> #include <linux/ktime.h>
> +#include <linux/uaccess.h>
>
> #include <linux/ceph/libceph.h>
> #include <linux/ceph/mon_client.h>
> @@ -360,16 +361,107 @@ static int status_show(struct seq_file *s, void *p)
> return 0;
> }
>
> +static int reset_status_show(struct seq_file *s, void *p)
> +{
> + struct ceph_fs_client *fsc = s->private;
> + struct ceph_mds_client *mdsc = fsc->mdsc;
> + struct ceph_client_reset_state *st;
> + u64 trigger = 0, success = 0, failure = 0;
> + unsigned long last_start = 0, last_finish = 0;
> + int last_errno = 0;
> + enum ceph_client_reset_phase phase = CEPH_CLIENT_RESET_IDLE;
> + bool drain_timed_out = false;
> + int sessions_reset = 0;
> + int blocked_requests = 0;
> + char reason[CEPH_CLIENT_RESET_REASON_LEN];
> +
> + if (!mdsc)
> + return 0;
> +
> + st = &mdsc->reset_state;
> +
> + spin_lock(&st->lock);
> + trigger = st->trigger_count;
> + success = st->success_count;
> + failure = st->failure_count;
> + last_start = st->last_start;
> + last_finish = st->last_finish;
> + last_errno = st->last_errno;
> + phase = st->phase;
> + drain_timed_out = st->drain_timed_out;
> + sessions_reset = st->sessions_reset;
> + strscpy(reason, st->last_reason, sizeof(reason));
> + spin_unlock(&st->lock);
> +
> + blocked_requests = atomic_read(&st->blocked_requests);
> +
> + seq_printf(s, "phase: %s\n", ceph_reset_phase_name(phase));
> + seq_printf(s, "trigger_count: %llu\n", trigger);
> + seq_printf(s, "success_count: %llu\n", success);
> + seq_printf(s, "failure_count: %llu\n", failure);
> + if (last_start)
> + seq_printf(s, "last_start_ms_ago: %u\n",
> + jiffies_to_msecs(jiffies - last_start));
> + else
> + seq_puts(s, "last_start_ms_ago: (never)\n");
> + if (last_finish)
> + seq_printf(s, "last_finish_ms_ago: %u\n",
> + jiffies_to_msecs(jiffies - last_finish));
> + else
> + seq_puts(s, "last_finish_ms_ago: (never)\n");
> + seq_printf(s, "last_errno: %d\n", last_errno);
> + seq_printf(s, "last_reason: %s\n",
> + reason[0] ? reason : "(none)");
> + seq_printf(s, "drain_timed_out: %s\n",
> + drain_timed_out ? "yes" : "no");
> + seq_printf(s, "sessions_reset: %d\n", sessions_reset);
> + seq_printf(s, "blocked_requests: %d\n", blocked_requests);
> +
> + return 0;
> +}
> +
> +static ssize_t reset_trigger_write(struct file *file, const char __user *buf,
> + size_t len, loff_t *ppos)
> +{
> + struct ceph_fs_client *fsc = file->private_data;
> + struct ceph_mds_client *mdsc = fsc->mdsc;
> + char reason[CEPH_CLIENT_RESET_REASON_LEN];
> + size_t copy;
> + int ret;
> +
> + if (!mdsc)
> + return -ENODEV;
> +
> + copy = min_t(size_t, len, sizeof(reason) - 1);
> + if (copy && copy_from_user(reason, buf, copy))
> + return -EFAULT;
> + reason[copy] = '\0';
> + strim(reason);
> +
> + ret = ceph_mdsc_schedule_reset(mdsc, reason);
> + if (ret)
> + return ret;
> +
> + return len;
> +}
> +
> DEFINE_SHOW_ATTRIBUTE(mdsmap);
> DEFINE_SHOW_ATTRIBUTE(mdsc);
> DEFINE_SHOW_ATTRIBUTE(caps);
> DEFINE_SHOW_ATTRIBUTE(mds_sessions);
> DEFINE_SHOW_ATTRIBUTE(status);
> +DEFINE_SHOW_ATTRIBUTE(reset_status);
> DEFINE_SHOW_ATTRIBUTE(metrics_file);
> DEFINE_SHOW_ATTRIBUTE(metrics_latency);
> DEFINE_SHOW_ATTRIBUTE(metrics_size);
> DEFINE_SHOW_ATTRIBUTE(metrics_caps);
>
> +static const struct file_operations ceph_reset_trigger_fops = {
> + .owner = THIS_MODULE,
> + .open = simple_open,
> + .write = reset_trigger_write,
> + .llseek = noop_llseek,
> +};
>
> /*
> * debugfs
> @@ -404,6 +496,7 @@ void ceph_fs_debugfs_cleanup(struct ceph_fs_client *fsc)
> debugfs_remove(fsc->debugfs_caps);
> debugfs_remove(fsc->debugfs_status);
> debugfs_remove(fsc->debugfs_mdsc);
> + debugfs_remove_recursive(fsc->debugfs_reset_dir);
I started running into trouble applying the patches from the 3rd one onward.
Also, the latest kernel version contains:
debugfs_remove(fsc->debugfs_subvolume_metrics);
So, the patchset needs to be rebased on the latest state of the CephFS kernel client.
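For reference, something along these lines should pick up the current client
tree to rebase on (repo URL and branch name from memory, please double-check
against the tree Ilya maintains):

```shell
# Fetch the current CephFS kernel client state and rebase the series on it
git fetch https://github.com/ceph/ceph-client.git testing
git rebase FETCH_HEAD
```
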
Thanks,
Slava.
> debugfs_remove_recursive(fsc->debugfs_metrics_dir);
> doutc(fsc->client, "done\n");
> }
> @@ -451,6 +544,15 @@ void ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
> fsc,
> &caps_fops);
>
> + fsc->debugfs_reset_dir = debugfs_create_dir("reset",
> + fsc->client->debugfs_dir);
> + debugfs_create_file("trigger", 0200,
> + fsc->debugfs_reset_dir, fsc,
> + &ceph_reset_trigger_fops);
> + debugfs_create_file("status", 0400,
> + fsc->debugfs_reset_dir, fsc,
> + &reset_status_fops);
> +
> fsc->debugfs_status = debugfs_create_file("status",
> 0400,
> fsc->client->debugfs_dir,
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 777af51ec8d8..8339c2c72f9a 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -5261,6 +5261,7 @@ int ceph_mdsc_wait_for_reset(struct ceph_mds_client *mdsc)
> blocked_count = atomic_inc_return(&st->blocked_requests);
> doutc(cl, "request blocked during reset, %d total blocked\n",
> blocked_count);
> + trace_ceph_client_reset_blocked(mdsc, blocked_count);
>
> retry:
> remaining = max_t(long, deadline - jiffies, 1);
> @@ -5272,10 +5273,12 @@ int ceph_mdsc_wait_for_reset(struct ceph_mds_client *mdsc)
> if (wait_ret == 0) {
> atomic_dec(&st->blocked_requests);
> pr_warn_client(cl, "timed out waiting for reset to complete\n");
> + trace_ceph_client_reset_unblocked(mdsc, -ETIMEDOUT);
> return -ETIMEDOUT;
> }
> if (wait_ret < 0) {
> atomic_dec(&st->blocked_requests);
> + trace_ceph_client_reset_unblocked(mdsc, (int)wait_ret);
> return (int)wait_ret; /* -ERESTARTSYS */
> }
>
> @@ -5290,12 +5293,14 @@ int ceph_mdsc_wait_for_reset(struct ceph_mds_client *mdsc)
> if (time_before(jiffies, deadline))
> goto retry;
> atomic_dec(&st->blocked_requests);
> + trace_ceph_client_reset_unblocked(mdsc, -ETIMEDOUT);
> return -ETIMEDOUT;
> }
> ret = st->last_errno;
> spin_unlock(&st->lock);
>
> atomic_dec(&st->blocked_requests);
> + trace_ceph_client_reset_unblocked(mdsc, ret);
> return ret ? -EIO : 0;
> }
>
> @@ -5324,6 +5329,8 @@ static void ceph_mdsc_reset_complete(struct ceph_mds_client *mdsc, int ret)
>
> /* Wake up all requests that were blocked waiting for reset */
> wake_up_all(&st->blocked_wq);
> +
> + trace_ceph_client_reset_complete(mdsc, ret);
> }
>
> static void ceph_mdsc_reset_workfn(struct work_struct *work)
> @@ -5633,6 +5640,7 @@ int ceph_mdsc_schedule_reset(struct ceph_mds_client *mdsc,
> pr_info_client(mdsc->fsc->client,
> "manual session reset scheduled (reason=\"%s\")\n",
> msg);
> + trace_ceph_client_reset_schedule(mdsc, msg);
> return 0;
> }
>
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index 9aca42c89ea0..5bf976b6c4fe 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -179,6 +179,7 @@ struct ceph_fs_client {
> struct dentry *debugfs_status;
> struct dentry *debugfs_mds_sessions;
> struct dentry *debugfs_metrics_dir;
> + struct dentry *debugfs_reset_dir;
> #endif
>
> #ifdef CONFIG_CEPH_FSCACHE
> diff --git a/include/trace/events/ceph.h b/include/trace/events/ceph.h
> index 08cb0659fbfc..1b990632f62b 100644
> --- a/include/trace/events/ceph.h
> +++ b/include/trace/events/ceph.h
> @@ -226,6 +226,73 @@ TRACE_EVENT(ceph_handle_caps,
> __entry->mseq)
> );
>
> +/*
> + * Client reset tracepoints - identify the client by its monitor-
> + * assigned global_id so traces remain meaningful when kernel pointer
> + * hashing is enabled.
> + */
> +TRACE_EVENT(ceph_client_reset_schedule,
> + TP_PROTO(const struct ceph_mds_client *mdsc, const char *reason),
> + TP_ARGS(mdsc, reason),
> + TP_STRUCT__entry(
> + __field(u64, client_id)
> + __string(reason, reason ? reason : "")
> + ),
> + TP_fast_assign(
> + __entry->client_id = mdsc->fsc->client->monc.auth ?
> + mdsc->fsc->client->monc.auth->global_id : 0;
> + __assign_str(reason);
> + ),
> + TP_printk("client_id=%llu reason=%s",
> + __entry->client_id, __get_str(reason))
> +);
> +
> +TRACE_EVENT(ceph_client_reset_complete,
> + TP_PROTO(const struct ceph_mds_client *mdsc, int ret),
> + TP_ARGS(mdsc, ret),
> + TP_STRUCT__entry(
> + __field(u64, client_id)
> + __field(int, ret)
> + ),
> + TP_fast_assign(
> + __entry->client_id = mdsc->fsc->client->monc.auth ?
> + mdsc->fsc->client->monc.auth->global_id : 0;
> + __entry->ret = ret;
> + ),
> + TP_printk("client_id=%llu ret=%d", __entry->client_id, __entry->ret)
> +);
> +
> +TRACE_EVENT(ceph_client_reset_blocked,
> + TP_PROTO(const struct ceph_mds_client *mdsc, int blocked_count),
> + TP_ARGS(mdsc, blocked_count),
> + TP_STRUCT__entry(
> + __field(u64, client_id)
> + __field(int, blocked_count)
> + ),
> + TP_fast_assign(
> + __entry->client_id = mdsc->fsc->client->monc.auth ?
> + mdsc->fsc->client->monc.auth->global_id : 0;
> + __entry->blocked_count = blocked_count;
> + ),
> + TP_printk("client_id=%llu blocked_count=%d", __entry->client_id,
> + __entry->blocked_count)
> +);
> +
> +TRACE_EVENT(ceph_client_reset_unblocked,
> + TP_PROTO(const struct ceph_mds_client *mdsc, int ret),
> + TP_ARGS(mdsc, ret),
> + TP_STRUCT__entry(
> + __field(u64, client_id)
> + __field(int, ret)
> + ),
> + TP_fast_assign(
> + __entry->client_id = mdsc->fsc->client->monc.auth ?
> + mdsc->fsc->client->monc.auth->global_id : 0;
> + __entry->ret = ret;
> + ),
> + TP_printk("client_id=%llu ret=%d", __entry->client_id, __entry->ret)
> +);
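
A note for testers: once this lands, the four events above should be observable
live via tracefs, roughly like so (path is /sys/kernel/debug/tracing on older
kernels without a separate tracefs mount):

```shell
# Enable the reset tracepoints and watch them fire
cd /sys/kernel/tracing
echo 1 > events/ceph/ceph_client_reset_schedule/enable
echo 1 > events/ceph/ceph_client_reset_complete/enable
echo 1 > events/ceph/ceph_client_reset_blocked/enable
echo 1 > events/ceph/ceph_client_reset_unblocked/enable
cat trace_pipe
```
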
> +
> #undef EM
> #undef E_
> #endif /* _TRACE_CEPH_H */