Linux DTrace development list
 help / color / mirror / Atom feed
From: Kris Van Hees <kris.van.hees@oracle.com>
To: Eugene Loh <eugene.loh@oracle.com>
Cc: Kris Van Hees <kris.van.hees@oracle.com>,
	DTrace mailing lists <dtrace@lists.linux.dev>,
	dtrace-devel@oss.oracle.com
Subject: Re: [PATCH] io: adjust io provider for NFS tracepoint variants
Date: Thu, 9 Jan 2025 13:15:27 -0500	[thread overview]
Message-ID: <Z4ASPwJPQUeQwAad@oracle.com> (raw)
In-Reply-To: <474ca7db-5d3c-0426-7b17-fb04043544c5@oracle.com>

On Wed, Jan 08, 2025 at 01:01:28AM -0500, Eugene Loh wrote:
> You already have a R-b, it seems the code is right (woo hoo! cool), and
> we're trying to get a release out the door.  But here are my "late to the
> party" comments.  Feel free to ignore.
> 
> The names v1 and v2 strike me as funny;  they just make up new numbers.  Not
> a big deal, but ideally the names might more closely reflect the actual
> version changes we're tracking.

The point of using v1 and v2 is really to reflect that there are two variants
of the tracepoints.  I deliberately did not associate them (through their name)
with specific kernel versions because that fails very quickly when people start
doing things like backports.  With the current naming, the implementation is
reflecting the 1st variant of the tracepoints vs the 2nd variant.  The logic
to use one vs the other is handled where that choice needs to be made.

This naming also allows for a future choice to perhaps differentiate based on
e.g. prototype of the tracepoint rather than a kernel version (again, that
would handle backported patches much more graceafully).

> Breaking the function out into a new "v1" version is some unneeded
> copy-and-paste code bloat, I think.  The function is kind of large and the
> only difference between v1 and v2 is I guess in the two, relatively short,
> "start" sections.  So, keeping this stuff as one function -- with extra
> logic in the two "start" sections -- might be more compact and clearer than
> having two rather similar functions.

The main point of the bloat is to make it very clear what we are doing here
and why.  There certainly is opportunity for refactoring the code, but this was
not the time for it.

> On 1/7/25 17:44, Kris Van Hees wrote:
> > Kernels prior to 5.6.0 pass 3 arguments (derived from the NFS hdr)
> > to the nfs_initiate_read raw tracepoint, whereas kernels as of 5.6.0
> > pass just the NFS hdr.
> 
> Is the same true of write?  If so, then maybe say so and point out that what
> we're really doing is changing the handling of nfs "start" while leaving nfs
> "done" alone...  that would correspond more closely to what is happening in
> the code, which talks of start and done.

Yes, and the code handles that also.  So yes, should have mentioned write also.
Mea culpa.

> > Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
> > ---
> >   libdtrace/dt_prov_io.c | 127 +++++++++++++++++++++++++++++++++++------
> >   1 file changed, 108 insertions(+), 19 deletions(-)
> > 
> > diff --git a/libdtrace/dt_prov_io.c b/libdtrace/dt_prov_io.c
> > @@ -155,12 +147,109 @@ static void deref_r3(dt_irlist_t *dlp, uint_t exitlbl, int addend, int width,
> > +/*
> > + * For NFS events, we have to construct a fake struct bio, which we have to
> > + * populate from the inode (arg0) and hdr->good_bytes (arg2) arguments the
> > + * underlying probe provides.
> > + */
> > +static void io_nfs_args_v1(dt_pcb_t *pcb, dt_irlist_t *dlp, uint_t exitlbl,
> > +			   const char *prb, const char *uprb)
> > +{
> > +	int	off;
> > +	size_t	siz;
> > +
> > +	/*
> > +	 * Determine the various sizes and offsets we want.
> > +	 *
> > +	 *     // Access these fields relative to &bio.
> > +	 *     struct bio bio = {
> > +	 *         .bi_opf = ...,
> > +	 *         .bi_iter.bi_size = ...,      // struct bvec_iter bi_iter
> > +	 *         .bi_iter.bi_sector = ...,
> > +	 *         .bi_bdev = 0,		// -or- .bi_disk = 0
> > +	 *     };
> > +	 *
> > +	 *     // Access these fields relative to hdr.
> > +	 *     struct nfs_pgio_header *hdr;
> > +	 *     ... = hdr->res.count;            // struct nfs_pgio_res  res
> > +	 */
> > +
> > +	/*
> > +	 * Declare the -io-bio variable and store its address in %r6.
> > +	 */
> > +	dt_cg_tramp_decl_var(pcb, &v_bio);
> > +	dt_cg_tramp_get_var(pcb, "this->-io-bio", 1, BPF_REG_6);
> > +
> > +	/* Fill in bi_opf */
> > +	off = dt_cg_ctf_offsetof("struct bio", "bi_opf", &siz, 0);
> > +	siz = bpf_ldst_size(siz, 1);
> > +	if (strstr(uprb, "read"))
> > +		emit(dlp, BPF_STORE_IMM(siz, BPF_REG_6, off, REQ_OP_READ));
> > +	else
> > +		emit(dlp, BPF_STORE_IMM(siz, BPF_REG_6, off, REQ_OP_WRITE));
> > +
> > +	/*
> > +	 * bio.bi_iter.bi_size = hdr->foo.count;
> 
> If "v1" is broken out, then this hdr->foo.count no longer makes sense. 
> Plus, the comment is more or less superseded by the following two comment
> lines.
> 
> Or, mimic how the comments are handled below for inode.

Yes, clean up will be needed.

> > +	 *
> > +	 * For the 'start' probe, count is arg2
> > +	 * For the 'done' probe, count is hdr->res.count (hdr in arg1)
> > +	 */
> > +	if (strcmp(prb, "start") == 0) {
> > +		emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_7, DMST_ARG(2)));
> > +	} else {
> > +		emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(1)));
> > +		off = dt_cg_ctf_offsetof("struct nfs_pgio_header", "res", NULL, 0)
> > +		    + dt_cg_ctf_offsetof("struct nfs_pgio_res", "count", &siz, 0);
> > +		deref_r3(dlp, exitlbl, off, siz, BPF_REG_0);
> > +	}
> > +
> > +	off = dt_cg_ctf_offsetof("struct bio", "bi_iter", NULL, 0)
> > +	    + dt_cg_ctf_offsetof("struct bvec_iter", "bi_size", &siz, 0);
> > +	siz = bpf_ldst_size(siz, 1);
> > +	emit(dlp, BPF_STORE(siz, BPF_REG_6, off, BPF_REG_0));
> > +
> > +	/*
> > +	 * bio.bi_iter.bi_sector = inode;
> > +	 */
> > +	if (strcmp(prb, "start") == 0) {
> > +		/* inode is arg0 */
> > +		emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(0)));
> > +	} else {
> > +		/* use hdr->inode, hdr is arg1 */
> > +		emit(dlp, BPF_LOAD(BPF_DW, BPF_REG_3, BPF_REG_7, DMST_ARG(1)));
> > +
> > +		off = dt_cg_ctf_offsetof("struct nfs_pgio_header", "inode", &siz, 0);
> > +		deref_r3(dlp, exitlbl, off, siz, BPF_REG_3);
> > +	}
> > +
> > +	off = dt_cg_ctf_offsetof("struct nfs_inode", "fileid", &siz, 0)
> > +	    - dt_cg_ctf_offsetof("struct nfs_inode", "vfs_inode", NULL, 0);
> > +	deref_r3(dlp, exitlbl, off, siz, BPF_REG_0);
> > +
> > +	off = dt_cg_ctf_offsetof("struct bio", "bi_iter", NULL, 0)
> > +	    + dt_cg_ctf_offsetof("struct bvec_iter", "bi_sector", &siz, 0);
> > +	siz = bpf_ldst_size(siz, 1);
> > +	emit(dlp, BPF_STORE(siz, BPF_REG_6, off, BPF_REG_0));
> > +
> > +	/*
> > +	 * bio.bi_bdev = 0;
> > +	 */
> > +	off = dt_cg_ctf_offsetof("struct bio", "bi_bdev", &siz, 1);
> > +	if (off == -1)
> > +		off = dt_cg_ctf_offsetof("struct bio", "bi_disk", &siz, 0);
> > +	siz = bpf_ldst_size(siz, 1);
> > +	emit(dlp, BPF_STORE_IMM(siz, BPF_REG_6, off, 0));
> > +
> > +	/* Store a pointer to the fake bio in arg0. */
> > +	emit(dlp, BPF_STORE(BPF_DW, BPF_REG_7, DMST_ARG(0), BPF_REG_6));
> > +}

      parent reply	other threads:[~2025-01-09 18:15 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-07 22:44 [PATCH] io: adjust io provider for NFS tracepoint variants Kris Van Hees
2025-01-08  1:47 ` Elena Zannoni
2025-01-08  6:01 ` Eugene Loh
2025-01-08 11:34   ` Nick Alcock
2025-01-09 18:18     ` Kris Van Hees
2025-01-09 18:15   ` Kris Van Hees [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z4ASPwJPQUeQwAad@oracle.com \
    --to=kris.van.hees@oracle.com \
    --cc=dtrace-devel@oss.oracle.com \
    --cc=dtrace@lists.linux.dev \
    --cc=eugene.loh@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox