From: Chuck Lever <chuck.lever@oracle.com>
To: Jeff Layton <jlayton@kernel.org>, Neil Brown <neilb@suse.de>,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
Trond Myklebust <trondmy@kernel.org>,
Anna Schumaker <anna@kernel.org>
Cc: Sargun Dillon <sargun@sargun.me>,
linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 02/12] sunrpc: add info about xprt queue times to svc_xprt_dequeue tracepoint
Date: Fri, 11 Apr 2025 09:24:59 -0400 [thread overview]
Message-ID: <3d16719d-11f8-4016-a311-6ee8dc11667e@oracle.com> (raw)
In-Reply-To: <91a99432d2b72ddcb88de0e86dadbfbfbf4590ab.camel@kernel.org>
On 4/11/25 9:10 AM, Jeff Layton wrote:
> On Wed, 2025-04-09 at 11:26 -0400, Jeff Layton wrote:
>> On Wed, 2025-04-09 at 11:00 -0400, Chuck Lever wrote:
>>> On 4/9/25 10:32 AM, Jeff Layton wrote:
>>>> Currently, this tracepoint displays "wakeup-us", which is the time that
>>>> the woken thread spent sleeping, before dequeueing the next xprt. Add a
>>>> new statistic that shows how long the xprt sat on the queue before being
>>>> serviced.
>>>
>>> I don't understand the difference between "waiting on queue" and
>>> "sleeping". When are those two latency measurements not the same?
>>>
>>
>> These are measuring two different things:
>>
>> svc_rqst->rq_qtime represents the time between when thread on the
>> sp_idle_threads list was woken. This patch adds svc_xprt->xpt_qtime,
>> which represents the time that the svc_xprt was added to the lwq.
>>
>> The first tells us how long the interval was between the thread being
>> woken and the xprt being dequeued. The new statistic tells us how long
>> between the xprt being enqueued and dequeued.
>>
>> They could easily diverge if there were not enough threads available to
>> service all of the queued xprts.
>>
>
> Hi Chuck! If you're OK with my rationale above, I'd like to expedite
> merging this patch in particular.
>
> The reason is that we have clients with the nfs_layout_flexfiles
> dataserver_timeo module parameter set for 6s. This helps them switch to
> an alternate mirror when a DS goes down, but we see a lot of RPC
> timeouts when this is set.
>
> My theory is that the xprts are getting queued and it's taking a long
> time for a thread to pick it up. That should show up as a large value
> in the qtime field in this tracepoint if I'm correct.
>
> Would you be amenable to that?
No objection, repost this one with the beefier rationale.
But it depends on what you mean by "expedite" -- v6.16 would be the
next "normal" opportunity, since this change doesn't qualify as a
bug fix.
>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>> ---
>>>> include/linux/sunrpc/svc_xprt.h | 1 +
>>>> include/trace/events/sunrpc.h | 13 +++++++------
>>>> net/sunrpc/svc_xprt.c | 1 +
>>>> 3 files changed, 9 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
>>>> index 72be609525796792274d5b8cb5ff37f73723fc23..369a89aea18618748607ee943247c327bf62c8d5 100644
>>>> --- a/include/linux/sunrpc/svc_xprt.h
>>>> +++ b/include/linux/sunrpc/svc_xprt.h
>>>> @@ -53,6 +53,7 @@ struct svc_xprt {
>>>> struct svc_xprt_class *xpt_class;
>>>> const struct svc_xprt_ops *xpt_ops;
>>>> struct kref xpt_ref;
>>>> + ktime_t xpt_qtime;
>>>> struct list_head xpt_list;
>>>> struct lwq_node xpt_ready;
>>>> unsigned long xpt_flags;
>>>> diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
>>>> index 5d331383047b79b9f6dcd699c87287453c1a5f49..b5a0f0bc1a3b7cfd90ce0181a8a419db810988bb 100644
>>>> --- a/include/trace/events/sunrpc.h
>>>> +++ b/include/trace/events/sunrpc.h
>>>> @@ -2040,19 +2040,20 @@ TRACE_EVENT(svc_xprt_dequeue,
>>>>
>>>> TP_STRUCT__entry(
>>>> SVC_XPRT_ENDPOINT_FIELDS(rqst->rq_xprt)
>>>> -
>>>> __field(unsigned long, wakeup)
>>>> + __field(unsigned long, qtime)
>>>> ),
>>>>
>>>> TP_fast_assign(
>>>> - SVC_XPRT_ENDPOINT_ASSIGNMENTS(rqst->rq_xprt);
>>>> + ktime_t ktime = ktime_get();
>>>>
>>>> - __entry->wakeup = ktime_to_us(ktime_sub(ktime_get(),
>>>> - rqst->rq_qtime));
>>>> + SVC_XPRT_ENDPOINT_ASSIGNMENTS(rqst->rq_xprt);
>>>> + __entry->wakeup = ktime_to_us(ktime_sub(ktime, rqst->rq_qtime));
>>>> + __entry->qtime = ktime_to_us(ktime_sub(ktime, rqst->rq_xprt->xpt_qtime));
>>>> ),
>>>>
>>>> - TP_printk(SVC_XPRT_ENDPOINT_FORMAT " wakeup-us=%lu",
>>>> - SVC_XPRT_ENDPOINT_VARARGS, __entry->wakeup)
>>>> + TP_printk(SVC_XPRT_ENDPOINT_FORMAT " wakeup-us=%lu qtime=%lu",
>>>> + SVC_XPRT_ENDPOINT_VARARGS, __entry->wakeup, __entry->qtime)
>>>> );
>>>>
>>>> DECLARE_EVENT_CLASS(svc_xprt_event,
>>>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>>>> index ae25405d8bd22672a361d1fd3adfdcebb403f90f..32018557797b1f683d8b7259f5fccd029aebcd71 100644
>>>> --- a/net/sunrpc/svc_xprt.c
>>>> +++ b/net/sunrpc/svc_xprt.c
>>>> @@ -488,6 +488,7 @@ void svc_xprt_enqueue(struct svc_xprt *xprt)
>>>> pool = svc_pool_for_cpu(xprt->xpt_server);
>>>>
>>>> percpu_counter_inc(&pool->sp_sockets_queued);
>>>> + xprt->xpt_qtime = ktime_get();
>>>> lwq_enqueue(&xprt->xpt_ready, &pool->sp_xprts);
>>>>
>>>> svc_pool_wake_idle_thread(pool);
>>>>
>>>
>>>
>>
>
--
Chuck Lever
next prev parent reply other threads:[~2025-04-11 13:25 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-09 14:32 [PATCH v2 00/12] nfsd: observability improvements Jeff Layton
2025-04-09 14:32 ` [PATCH v2 01/12] nfsd: add commit start/done tracepoints around nfsd_commit() Jeff Layton
2025-04-09 14:32 ` [PATCH v2 02/12] sunrpc: add info about xprt queue times to svc_xprt_dequeue tracepoint Jeff Layton
2025-04-09 15:00 ` Chuck Lever
2025-04-09 15:26 ` Jeff Layton
2025-04-11 13:10 ` Jeff Layton
2025-04-11 13:24 ` Chuck Lever [this message]
2025-04-11 13:40 ` Jeff Layton
2025-04-09 14:32 ` [PATCH v2 03/12] sunrpc: move the SVC_RQST_EVENT_*() macros to common header Jeff Layton
2025-04-09 14:32 ` [PATCH v2 04/12] nfsd: add a tracepoint for nfsd_setattr Jeff Layton
2025-04-09 15:03 ` Chuck Lever
2025-04-09 14:32 ` [PATCH v2 05/12] nfsd: add a tracepoint to nfsd_lookup_dentry Jeff Layton
2025-04-09 14:32 ` [PATCH v2 06/12] nfsd: add tracepoints around nfsd_create events Jeff Layton
2025-04-09 15:09 ` Chuck Lever
2025-04-09 15:36 ` Jeff Layton
2025-04-09 15:38 ` Chuck Lever
2025-04-09 15:40 ` Chuck Lever
2025-04-09 16:50 ` Jeff Layton
2025-04-09 16:59 ` Chuck Lever
2025-04-09 14:32 ` [PATCH v2 07/12] nfsd: add tracepoints for symlink events Jeff Layton
2025-04-09 15:15 ` Chuck Lever
2025-04-09 14:32 ` [PATCH v2 08/12] nfsd: add tracepoints for hardlink events Jeff Layton
2025-04-09 14:32 ` [PATCH v2 09/12] nfsd: add tracepoints for unlink events Jeff Layton
2025-04-09 14:32 ` [PATCH v2 10/12] nfsd: add tracepoints to rename events Jeff Layton
2025-04-09 14:32 ` [PATCH v2 11/12] nfsd: add tracepoints for readdir events Jeff Layton
2025-04-09 14:32 ` [PATCH v2 12/12] nfsd: add tracepoint for getattr events Jeff Layton
2025-04-11 14:52 ` (subset) [PATCH v2 00/12] nfsd: observability improvements cel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3d16719d-11f8-4016-a311-6ee8dc11667e@oracle.com \
--to=chuck.lever@oracle.com \
--cc=Dai.Ngo@oracle.com \
--cc=anna@kernel.org \
--cc=jlayton@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=neilb@suse.de \
--cc=okorniev@redhat.com \
--cc=sargun@sargun.me \
--cc=tom@talpey.com \
--cc=trondmy@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox