linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chuck Lever <chuck.lever@oracle.com>
To: Bruce Fields <bfields@fieldses.org>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH 09/15] sunrpc: Report per-RPC execution stats
Date: Wed, 14 Mar 2018 21:11:49 -0400	[thread overview]
Message-ID: <D5995467-0177-4E26-8CCF-5E8150F20687@oracle.com> (raw)
In-Reply-To: <20180314205042.GB7241@fieldses.org>



> On Mar 14, 2018, at 4:50 PM, J. Bruce Fields <bfields@fieldses.org> =
wrote:
>=20
> On Tue, Mar 13, 2018 at 11:44:27AM -0400, Chuck Lever wrote:
>> Introduce a mechanism to report the server-side execution latency of
>> each RPC. The goal is to enable user space to filter the trace
>> record for latency outliers, build histograms, etc.
>=20
> Neato.
>=20
> To be useful I'd think you'd want more information about each RPC.
> (E.g. you'd like to know that the latency outliers all did reads).  I
> guess you could use the address and xid to cross-reference with
> information collected from somewhere else?

Yes. You can enable other trace points (like the nfsd_compound ones)
to see what class each operation is in.

And yes, I would like to have all the relevant information for each
RPC in a single trace record; I just haven't figured out a way to
extract it as nicely as I did it on the client (patch forthcoming).
On the client side, there is a table set up for each RPC program that
contains an RPC procedure name to procedure number mapping. I was not
able to find a similar convenience on the server side.


> What's our commitment to backwards compatibility?  Trace points seem =
to
> be the wild west compared to the rest of the kernel interface, but if
> we want to encourage tooling on top of this then I guess we'll need to
> be strict.

That has been discussed elsewhere (LWN.net and more recently on
linux-rdma). The only compatibility issues are with trace points that
have user tools and infrastructure that depends on them, such as the
scheduler trace points used for latencyTOP. The NFS and sunrpc trace
points do not have this constraint, as they are processed currently
only by generic tools like trace-cmd. So we are free to innovate for
the time being.


> Looking at the tcp case, I think it's measuring from the time
> tcp_recvfrom receives the last fragment making up an rpc request till
> the last sendpage() of the reply returns.  Did you consider other =
spots?
> (E.g. why after the send instead of before?)

Yes, I've considered other spots. I don't consider the spots I'm
proposing here to be written in stone. I welcome help for placing the
socket-based timestamp capture points.

Some sendto implementations are more complex than others. For instance,
RPC/RDMA can post RDMA Writes containing data content first, then in the
final step post the RDMA Send carrying the RPC Reply header. The RDMA
Write step can be considered server-side processing, and thus is part
of the latency. Or, if we ever decide to move the RDMA Write step into
the XDR layer, it will definitely be counted as processing latency.

One thing I would like to keep in the latency measurement is how long
this rqstp has waited to acquire the send mutex. But otherwise, I'm
open to other ideas about how to measure this latency.


> --b.
>=20
>>=20
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>> include/linux/sunrpc/svc.h               |    1 +
>> include/linux/sunrpc/svc_rdma.h          |    1 +
>> include/trace/events/sunrpc.h            |   20 ++++++++++++++++++++
>> net/sunrpc/svc_xprt.c                    |    2 +-
>> net/sunrpc/svcsock.c                     |    2 ++
>> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |    1 +
>> net/sunrpc/xprtrdma/svc_rdma_transport.c |    1 +
>> 7 files changed, 27 insertions(+), 1 deletion(-)
>>=20
>> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
>> index 786ae22..1939709 100644
>> --- a/include/linux/sunrpc/svc.h
>> +++ b/include/linux/sunrpc/svc.h
>> @@ -245,6 +245,7 @@ struct svc_rqst {
>> 	void *			rq_xprt_ctxt;	/* transport specific =
context ptr */
>> 	struct svc_deferred_req*rq_deferred;	/* deferred request we =
are replaying */
>>=20
>> +	ktime_t			rq_rtime;	/* receive time */
>> 	size_t			rq_xprt_hlen;	/* xprt header len */
>> 	struct xdr_buf		rq_arg;
>> 	struct xdr_buf		rq_res;
>> diff --git a/include/linux/sunrpc/svc_rdma.h =
b/include/linux/sunrpc/svc_rdma.h
>> index 4b731b0..8045981 100644
>> --- a/include/linux/sunrpc/svc_rdma.h
>> +++ b/include/linux/sunrpc/svc_rdma.h
>> @@ -79,6 +79,7 @@ struct svc_rdma_op_ctxt {
>> 	struct list_head list;
>> 	struct xdr_buf arg;
>> 	struct ib_cqe cqe;
>> +	ktime_t rtime;
>> 	u32 byte_len;
>> 	struct svcxprt_rdma *xprt;
>> 	enum dma_data_direction direction;
>> diff --git a/include/trace/events/sunrpc.h =
b/include/trace/events/sunrpc.h
>> index 5a8157c..50de512 100644
>> --- a/include/trace/events/sunrpc.h
>> +++ b/include/trace/events/sunrpc.h
>> @@ -705,6 +705,26 @@
>> 		__entry->len, show_svc_xprt_flags(__entry->flags))
>> );
>>=20
>> +TRACE_EVENT(svc_stats_latency,
>> +	TP_PROTO(const struct svc_rqst *rqst, ktime_t end),
>> +
>> +	TP_ARGS(rqst, end),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(u32, xid)
>> +		__field(unsigned long, execute)
>> +		__string(addr, rqst->rq_xprt->xpt_remotebuf)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->xid =3D be32_to_cpu(rqst->rq_xid);
>> +		__entry->execute =3D ktime_to_us(ktime_sub(end, =
rqst->rq_rtime));
>> +		__assign_str(addr, rqst->rq_xprt->xpt_remotebuf);
>> +	),
>> +
>> +	TP_printk("addr=3D%s xid=3D0x%08x execute=3D%lu",
>> +		__get_str(addr), __entry->xid, __entry->execute)
>> +);
>>=20
>> DECLARE_EVENT_CLASS(svc_deferred_event,
>> 	TP_PROTO(struct svc_deferred_req *dr),
>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>> index f745754..2556e6e 100644
>> --- a/net/sunrpc/svc_xprt.c
>> +++ b/net/sunrpc/svc_xprt.c
>> @@ -782,7 +782,6 @@ static int svc_handle_xprt(struct svc_rqst =
*rqstp, struct svc_xprt *xprt)
>> 			len =3D svc_deferred_recv(rqstp);
>> 		else
>> 			len =3D xprt->xpt_ops->xpo_recvfrom(rqstp);
>> -		dprintk("svc: got len=3D%d\n", len);
>> 		rqstp->rq_reserved =3D serv->sv_max_mesg;
>> 		atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
>> 	}
>> @@ -893,6 +892,7 @@ int svc_send(struct svc_rqst *rqstp)
>> 		len =3D -ENOTCONN;
>> 	else
>> 		len =3D xprt->xpt_ops->xpo_sendto(rqstp);
>> +	trace_svc_stats_latency(rqstp, ktime_get_real());
>> 	mutex_unlock(&xprt->xpt_mutex);
>> 	rpc_wake_up(&xprt->xpt_bc_pending);
>> 	trace_svc_send(rqstp, len);
>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>> index 4ca1d92..f616962 100644
>> --- a/net/sunrpc/svcsock.c
>> +++ b/net/sunrpc/svcsock.c
>> @@ -574,6 +574,7 @@ static int svc_udp_recvfrom(struct svc_rqst =
*rqstp)
>> 		/* Don't enable netstamp, sunrpc doesn't
>> 		   need that much accuracy */
>> 	}
>> +	rqstp->rq_rtime =3D skb->tstamp;
>> 	svsk->sk_sk->sk_stamp =3D skb->tstamp;
>> 	set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); /* there may be =
more data... */
>>=20
>> @@ -1143,6 +1144,7 @@ static int svc_tcp_recvfrom(struct svc_rqst =
*rqstp)
>> 	if (len < 0)
>> 		goto error;
>>=20
>> +	rqstp->rq_rtime =3D ktime_get_real();
>> 	svc_xprt_copy_addrs(rqstp, &svsk->sk_xprt);
>> 	if (serv->sv_stats)
>> 		serv->sv_stats->nettcpcnt++;
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c =
b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> index 19e9c6b..446b9d6 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> @@ -558,6 +558,7 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
>> 		goto out_readchunk;
>>=20
>> complete:
>> +	rqstp->rq_rtime =3D ctxt->rtime;
>> 	svc_rdma_put_context(ctxt, 0);
>> 	dprintk("svcrdma: recvfrom: xprt=3D%p, rqstp=3D%p, =
rq_arg.len=3D%u\n",
>> 		rdma_xprt, rqstp, rqstp->rq_arg.len);
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c =
b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> index ca211dc..08fa7de 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> @@ -316,6 +316,7 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, =
struct ib_wc *wc)
>> 		goto flushed;
>>=20
>> 	/* All wc fields are now known to be valid */
>> +	ctxt->rtime =3D ktime_get_real();
>> 	ctxt->byte_len =3D wc->byte_len;
>> 	spin_lock(&xprt->sc_rq_dto_lock);
>> 	list_add_tail(&ctxt->list, &xprt->sc_rq_dto_q);

--
Chuck Lever




  reply	other threads:[~2018-03-15  1:11 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-13 15:43 [PATCH 00/15] NFSD trace point clean up Chuck Lever
2018-03-13 15:43 ` [PATCH 01/15] sunrpc: Remove unneeded pointer dereference Chuck Lever
2018-03-13 15:43 ` [PATCH 02/15] svc: Simplify ->xpo_secure_port Chuck Lever
2018-03-13 15:43 ` [PATCH 03/15] sunrpc: Update show_svc_xprt_flags() to include recently added flags Chuck Lever
2018-03-13 15:44 ` [PATCH 04/15] sunrpc: Move trace_svc_xprt_dequeue() Chuck Lever
2018-03-13 15:44 ` [PATCH 05/15] sunrpc: Simplify do_enqueue tracing Chuck Lever
2018-03-13 15:44 ` [PATCH 06/15] sunrpc: Simplify trace_svc_recv Chuck Lever
2018-03-13 15:44 ` [PATCH 07/15] sunrpc: Save remote presentation address in svc_xprt for trace events Chuck Lever
2018-03-14 20:35   ` J. Bruce Fields
2018-03-15  0:54     ` Chuck Lever
2018-03-15  1:43       ` Bruce Fields
2018-03-13 15:44 ` [PATCH 08/15] sunrpc: Re-purpose trace_svc_process Chuck Lever
2018-03-13 15:44 ` [PATCH 09/15] sunrpc: Report per-RPC execution stats Chuck Lever
2018-03-14 20:50   ` J. Bruce Fields
2018-03-15  1:11     ` Chuck Lever [this message]
2018-03-22 20:32       ` Chuck Lever
2018-03-22 21:01         ` Bruce Fields
2018-03-22 21:18           ` Chuck Lever
2018-03-13 15:44 ` [PATCH 10/15] nfsd: Fix NFSD trace points Chuck Lever
2018-03-13 15:44 ` [PATCH 11/15] nfsd: Record request byte count, not count of vectors Chuck Lever
2018-03-14 20:52   ` J. Bruce Fields
2018-03-15  1:18     ` Chuck Lever
2018-03-13 15:44 ` [PATCH 12/15] nfsd: Add "nfsd_" to trace point names Chuck Lever
2018-03-13 15:44 ` [PATCH 13/15] nfsd: Add I/O trace points in the NFSv4 write path Chuck Lever
2018-03-13 15:44 ` [PATCH 14/15] nfsd: Add I/O trace points in the NFSv4 read proc Chuck Lever
2018-03-13 15:44 ` [PATCH 15/15] nfsd: Trace NFSv4 COMPOUND execution Chuck Lever
2018-03-14 20:55   ` J. Bruce Fields
2018-03-15  1:29     ` Chuck Lever

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=D5995467-0177-4E26-8CCF-5E8150F20687@oracle.com \
    --to=chuck.lever@oracle.com \
    --cc=bfields@fieldses.org \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).