netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Cc: <intel-wired-lan@lists.osuosl.org>,
	Michal Kubiak <michal.kubiak@intel.com>,
	Tony Nguyen <anthony.l.nguyen@intel.com>,
	"Przemek Kitszel" <przemyslaw.kitszel@intel.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	Simon Horman <horms@kernel.org>, <bpf@vger.kernel.org>,
	<netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH net-next 03/16] libeth: add a couple of XDP helpers (libeth_xdp)
Date: Tue, 8 Apr 2025 15:22:48 +0200	[thread overview]
Message-ID: <cba0216c-c87a-421d-bc4d-bc199165edbd@intel.com> (raw)
In-Reply-To: <Z9BDMrydhXrNlhVV@boxer>

From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: Tue, 11 Mar 2025 15:05:38 +0100

> On Wed, Mar 05, 2025 at 05:21:19PM +0100, Alexander Lobakin wrote:
>> "Couple" is a bit humbly... Add the following functionality to libeth:

[...]

>> +struct libeth_rq_napi_stats {
>> +	union {
>> +		struct {
>> +							u32 packets;
>> +							u32 bytes;
>> +							u32 fragments;
>> +							u32 hsplit;
>> +		};
>> +		DECLARE_FLEX_ARRAY(u32, raw);
> 
> The @raw approach is never used throughout the patchset, right?

Right, but

> Could you explain the reason for introducing this and potential use case?

initially, when my tree contained libeth generic stats also, I used this
field to update queue stats in a loop (unrolled by 4 fields) instead of
field-by-field.
Generic stats are still planned, and since ::raw is already present in
&libeth_sq_napi_stats, I'd like to keep it :z

> 
>> +	};
>> +};
>>  
>>  /**
>>   * struct libeth_sq_napi_stats - "hot" counters to update in Tx completion loop
>> @@ -22,4 +44,84 @@ struct libeth_sq_napi_stats {
>>  	};
>>  };
>>  
>> +/**
>> + * struct libeth_xdpsq_napi_stats - "hot" counters to update in XDP Tx
>> + *				    completion loop
>> + * @packets: completed frames counter
>> + * @bytes: sum of bytes of completed frames above
>> + * @fragments: sum of fragments of completed S/G frames
>> + * @raw: alias to access all the fields as an array
>> + */
>> +struct libeth_xdpsq_napi_stats {
> 
> what's the delta between this and libeth_sq_napi_stats ? couldn't you have
> a single struct for purpose of tx napi stats?

Same as previous, future-proof. &libeth_sq{,_napi}_stats will contain
stuff XDP queues will never need and vice versa.

> 
>> +	union {
>> +		struct {
>> +							u32 packets;
>> +							u32 bytes;
>> +							u32 fragments;
>> +		};
>> +		DECLARE_FLEX_ARRAY(u32, raw);
>> +	};
>> +};

[...]

>> @@ -71,7 +84,10 @@ struct libeth_sqe {
>>  /**
>>   * struct libeth_cq_pp - completion queue poll params
>>   * @dev: &device to perform DMA unmapping
>> + * @bq: XDP frame bulk to combine return operations
>>   * @ss: onstack NAPI stats to fill
>> + * @xss: onstack XDPSQ NAPI stats to fill
>> + * @xdp_tx: number of XDP frames processed
>>   * @napi: whether it's called from the NAPI context
>>   *
>>   * libeth uses this structure to access objects needed for performing full
>> @@ -80,7 +96,13 @@ struct libeth_sqe {
>>   */
>>  struct libeth_cq_pp {
>>  	struct device			*dev;
>> -	struct libeth_sq_napi_stats	*ss;
>> +	struct xdp_frame_bulk		*bq;
>> +
>> +	union {
>> +		struct libeth_sq_napi_stats	*ss;
>> +		struct libeth_xdpsq_napi_stats	*xss;
>> +	};
>> +	u32				xdp_tx;
> 
> you have this counted in xss::packets?

Nope, it's the same as in ice: you have separate ::packets AND ::xdp_tx
on the ring to speed up XSk completion when there are no XDP (non-XSk) buffers.

> 
>>  
>>  	bool				napi;
>>  };

[...]

>> +/* Common Tx bits */
>> +
>> +/**
>> + * enum - libeth_xdp internal Tx flags
>> + * @LIBETH_XDP_TX_BULK: one bulk size at which it will be flushed to the queue
>> + * @LIBETH_XDP_TX_BATCH: batch size for which the queue fill loop is unrolled
>> + * @LIBETH_XDP_TX_DROP: indicates the send function must drop frames not sent
>> + * @LIBETH_XDP_TX_NDO: whether the send function is called from .ndo_xdp_xmit()
>> + */
>> +enum {
>> +	LIBETH_XDP_TX_BULK		= DEV_MAP_BULK_SIZE,
>> +	LIBETH_XDP_TX_BATCH		= 8,
>> +
>> +	LIBETH_XDP_TX_DROP		= BIT(0),
>> +	LIBETH_XDP_TX_NDO		= BIT(1),
> 
> what's the reason to group these random values onto enum?

They then will be visible in BTFs (not sure anyone will need them there).

> 
>> +};
>> +
>> +/**
>> + * enum - &libeth_xdp_tx_frame and &libeth_xdp_tx_desc flags
>> + * @LIBETH_XDP_TX_LEN: only for ``XDP_TX``, [15:0] of ::len_fl is actual length
>> + * @LIBETH_XDP_TX_FIRST: indicates the frag is the first one of the frame
>> + * @LIBETH_XDP_TX_LAST: whether the frag is the last one of the frame
>> + * @LIBETH_XDP_TX_MULTI: whether the frame contains several frags
> 
> would be good to have some extended description around usage of these
> flags.

They are internal to libeth functions anyway, hence no detailed description.

> 
>> + * @LIBETH_XDP_TX_FLAGS: only for ``XDP_TX``, [31:16] of ::len_fl is flags
>> + */
>> +enum {
>> +	LIBETH_XDP_TX_LEN		= GENMASK(15, 0),
>> +
>> +	LIBETH_XDP_TX_FIRST		= BIT(16),
>> +	LIBETH_XDP_TX_LAST		= BIT(17),
>> +	LIBETH_XDP_TX_MULTI		= BIT(18),
>> +
>> +	LIBETH_XDP_TX_FLAGS		= GENMASK(31, 16),
>> +};

[...]

>> +/**
>> + * libeth_xdp_tx_queue_head - internal helper for queueing one ``XDP_TX`` head
>> + * @bq: XDP Tx bulk to queue the head frag to
>> + * @xdp: XDP buffer with the head to queue
>> + *
>> + * Return: false if it's the only frag of the frame, true if it's an S/G frame.
>> + */
>> +static inline bool libeth_xdp_tx_queue_head(struct libeth_xdp_tx_bulk *bq,
>> +					    const struct libeth_xdp_buff *xdp)
>> +{
>> +	const struct xdp_buff *base = &xdp->base;
>> +
>> +	bq->bulk[bq->count++] = (typeof(*bq->bulk)){
>> +		.data	= xdp->data,
>> +		.len_fl	= (base->data_end - xdp->data) | LIBETH_XDP_TX_FIRST,
>> +		.soff	= xdp_data_hard_end(base) - xdp->data,
>> +	};
>> +
>> +	if (!xdp_buff_has_frags(base))
> 
> likely() ?

With the header split enabled and getting more and more popular -- not
really. likely() hurts perf here actually.

> 
>> +		return false;
>> +
>> +	bq->bulk[bq->count - 1].len_fl |= LIBETH_XDP_TX_MULTI;
>> +
>> +	return true;
>> +}
>> +
>> +/**
>> + * libeth_xdp_tx_queue_frag - internal helper for queueing one ``XDP_TX`` frag
>> + * @bq: XDP Tx bulk to queue the frag to
>> + * @frag: frag to queue
>> + */
>> +static inline void libeth_xdp_tx_queue_frag(struct libeth_xdp_tx_bulk *bq,
>> +					    const skb_frag_t *frag)
>> +{
>> +	bq->bulk[bq->count++].frag = *frag;
> 
> IMHO this helper is not providing anything useful

That's why it's stated "internal helper" :D

> 
>> +}

[...]

>> +#define libeth_xdp_tx_fill_stats(sqe, desc, sinfo)			      \
>> +	__libeth_xdp_tx_fill_stats(sqe, desc, sinfo, __UNIQUE_ID(sqe_),	      \
>> +				   __UNIQUE_ID(desc_), __UNIQUE_ID(sinfo_))
>> +
>> +#define __libeth_xdp_tx_fill_stats(sqe, desc, sinfo, ue, ud, us) do {	      \
>> +	const struct libeth_xdp_tx_desc *ud = (desc);			      \
>> +	const struct skb_shared_info *us;				      \
>> +	struct libeth_sqe *ue = (sqe);					      \
>> +									      \
>> +	ue->nr_frags = 1;						      \
>> +	ue->bytes = ud->len;						      \
>> +									      \
>> +	if (ud->flags & LIBETH_XDP_TX_MULTI) {				      \
>> +		us = (sinfo);						      \
> 
> why? what 'u' stands for? ue us don't tell the reader much from the first
> glance. sinfo tells me everything.

ue -- "unique element"
ud -- "unique descriptor"
us -- "unique sinfo"

All of them are purely internal to pass __UNIQUE_ID() in the
non-underscored version to avoid variable shadowing.

> 
>> +		ue->nr_frags += us->nr_frags;				      \
>> +		ue->bytes += us->xdp_frags_size;			      \
>> +	}								      \
>> +} while (0)

[...]

>> +/**
>> + * libeth_xdp_xmit_do_bulk - implement full .ndo_xdp_xmit() in driver
>> + * @dev: target &net_device
>> + * @n: number of frames to send
>> + * @fr: XDP frames to send
>> + * @f: flags passed by the stack
>> + * @xqs: array of XDPSQs driver structs
>> + * @nqs: number of active XDPSQs, the above array length
>> + * @fl: driver callback to flush an XDP xmit bulk
>> + * @fin: driver callback to finalize the queue
>> + *
>> + * If the driver has active XDPSQs, perform common checks and send the frames.
>> + * Finalize the queue, if requested.
>> + *
>> + * Return: number of frames sent or -errno on error.
>> + */
>> +#define libeth_xdp_xmit_do_bulk(dev, n, fr, f, xqs, nqs, fl, fin)	      \
>> +	_libeth_xdp_xmit_do_bulk(dev, n, fr, f, xqs, nqs, fl, fin,	      \
>> +				 __UNIQUE_ID(bq_), __UNIQUE_ID(ret_),	      \
>> +				 __UNIQUE_ID(nqs_))
> 
> why __UNIQUE_ID() is needed?

As above, variable shadowing.

> 
>> +
>> +#define _libeth_xdp_xmit_do_bulk(d, n, fr, f, xqs, nqs, fl, fin, ub, ur, un)  \
> 
> why single underscore? usually we do __ for internal funcs as you did
> somewhere above.

Double-underscored is defined above already :D
So it would be either like this or __ + ___

> 
> also, why define and not inlined func?

I'll double check, but if you look at its usage in idpf/xdp.c, you'll
see that some arguments are non-trivial to obtain, IOW they cost some
cycles. Macro ensures they won't be fetched prior to
`likely(number_of_xdpsqs)`.
I'll convert to an inline and check if the compiler handles this itself.
It didn't behave in {,__}libeth_xdp_tx_fill_stats() unfortunately, hence
macro there as well =\

> 
>> +({									      \
>> +	u32 un = (nqs);							      \
>> +	int ur;								      \
>> +									      \
>> +	if (likely(un)) {						      \
>> +		struct libeth_xdp_tx_bulk ub;				      \
>> +									      \
>> +		libeth_xdp_xmit_init_bulk(&ub, d, xqs, un);		      \
>> +		ur = __libeth_xdp_xmit_do_bulk(&ub, fr, n, f, fl, fin);	      \
>> +	} else {							      \
>> +		ur = -ENXIO;						      \
>> +	}								      \
>> +									      \
>> +	ur;								      \
>> +})

[...]

>> +static inline void
>> +libeth_xdp_init_buff(struct libeth_xdp_buff *dst,
>> +		     const struct libeth_xdp_buff_stash *src,
>> +		     struct xdp_rxq_info *rxq)
> 
> what is the rationale for storing/loading xdp_buff onto/from driver's Rx
> queue? could we work directly on xdp_buff from Rx queue? ice is doing so
> currently.

Stack vs heap. I was getting lower numbers working on the queue directly.
Also note that &libeth_xdp_buff_stash is 16 bytes, while
&libeth_xdp_buff is 64. I don't think it makes sense to waste +48
bytes in the structure.

Load-store of the stash is rare anyway. It can happen *only* if the HW
for some reason hasn't written the whole multi-buffer frame yet, since
NAPI budget is counted by packets, not fragments.

> 
>> +{
>> +	if (likely(!src->data))
>> +		dst->data = NULL;
>> +	else
>> +		libeth_xdp_load_stash(dst, src);
>> +
>> +	dst->base.rxq = rxq;
>> +}

[...]

>> +static inline bool libeth_xdp_process_buff(struct libeth_xdp_buff *xdp,
>> +					   const struct libeth_fqe *fqe,
>> +					   u32 len)
>> +{
>> +	if (!libeth_rx_sync_for_cpu(fqe, len))
>> +		return false;
>> +
>> +	if (xdp->data)
> 
> unlikely() ?

Same as for libeth_xdp_tx_queue_head(): with the header split, you'll
hit this branch every frame.

> 
>> +		return libeth_xdp_buff_add_frag(xdp, fqe, len);
>> +
>> +	libeth_xdp_prepare_buff(xdp, fqe, len);
>> +
>> +	prefetch(xdp->data);
>> +
>> +	return true;
>> +}

[...]

>> +/**
>> + * libeth_xdp_run_prog - run XDP program and handle all verdicts
>> + * @xdp: XDP buffer to process
>> + * @bq: XDP Tx bulk to queue ``XDP_TX`` buffers
>> + * @fl: driver ``XDP_TX`` bulk flush callback
>> + *
>> + * Run the attached XDP program and handle all possible verdicts.
>> + * Prefer using it via LIBETH_XDP_DEFINE_RUN{,_PASS,_PROG}().
>> + *
>> + * Return: true if the buffer should be passed up the stack, false if the poll
>> + * should go to the next buffer.
>> + */
>> +#define libeth_xdp_run_prog(xdp, bq, fl)				      \
> 
> is this used in idpf in this patchset?

Sure. __LIBETH_XDP_DEFINE_RUN() builds two functions, one of which uses it.
Same for __LIBETH_XDP_DEFINE_RUN_PROG(). I know they are poor to read,
but otherwise I'd need to duplicate them for XDP and XSk separately.

> 
>> +	(__libeth_xdp_run_flush(xdp, bq, __libeth_xdp_run_prog,		      \
>> +				libeth_xdp_tx_queue_bulk,		      \
>> +				fl) == LIBETH_XDP_PASS)
>> +
>> +/**
>> + * __libeth_xdp_run_pass - helper to run XDP program and handle the result
>> + * @xdp: XDP buffer to process
>> + * @bq: XDP Tx bulk to queue ``XDP_TX`` frames
>> + * @napi: NAPI to build an skb and pass it up the stack
>> + * @rs: onstack libeth RQ stats
>> + * @md: metadata that should be filled to the XDP buffer
>> + * @prep: callback for filling the metadata
>> + * @run: driver wrapper to run XDP program
> 
> I see it's NULLed on idpf? why have this?

Only for singleq which we don't support. splitq uses
LIBETH_XDP_DEFINE_RUN() to build idpf_xdp_run_prog() and
idpf_xdp_run_pass().

> 
>> + * @populate: driver callback to populate an skb with the HW descriptor data
>> + *
>> + * Inline abstraction that does the following:
>> + * 1) adds frame size and frag number (if needed) to the onstack stats;
>> + * 2) fills the descriptor metadata to the onstack &libeth_xdp_buff
>> + * 3) runs XDP program if present;
>> + * 4) handles all possible verdicts;
>> + * 5) on ``XDP_PASS``, builds an skb from the buffer;
>> + * 6) populates it with the descriptor metadata;
>> + * 7) passes it up the stack.

[...]

>> +void __cold libeth_xdp_tx_exception(struct libeth_xdp_tx_bulk *bq, u32 sent,
>> +				    u32 flags)
>> +{
>> +	const struct libeth_xdp_tx_frame *pos = &bq->bulk[sent];
>> +	u32 left = bq->count - sent;
>> +
>> +	if (!(flags & LIBETH_XDP_TX_NDO))
>> +		libeth_trace_xdp_exception(bq->dev, bq->prog, XDP_TX);
>> +
>> +	if (!(flags & LIBETH_XDP_TX_DROP)) {
>> +		memmove(bq->bulk, pos, left * sizeof(*bq->bulk));
> 
> can this overflow? if queue got stuck for some reason.

memmove() is safe to call even when src == dst. As for the XDP Tx logic, if
the queue is stuck, the bulk will never overflow; libeth will just try to
send it again and again. At the end, both XDP Tx and xmit call it with
DROP to make sure no memleaks or other issues can take place.

> 
>> +		bq->count = left;
>> +
>> +		return;
>> +	}
>> +
>> +	if (!(flags & LIBETH_XDP_TX_NDO))
>> +		libeth_xdp_tx_return_bulk(pos, left);
>> +	else
>> +		libeth_xdp_xmit_return_bulk(pos, left, bq->dev);
>> +
>> +	bq->count = 0;
>> +}
>> +EXPORT_SYMBOL_GPL(libeth_xdp_tx_exception);

Thanks,
Olek

  parent reply	other threads:[~2025-04-08 13:23 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-05 16:21 [PATCH net-next 00/16] idpf: add XDP support Alexander Lobakin
2025-03-05 16:21 ` [PATCH net-next 01/16] libeth: convert to netmem Alexander Lobakin
2025-03-06  0:13   ` Mina Almasry
2025-03-11 17:22     ` Alexander Lobakin
2025-03-11 17:43       ` Mina Almasry
2025-03-05 16:21 ` [PATCH net-next 02/16] libeth: support native XDP and register memory model Alexander Lobakin
2025-03-05 16:21 ` [PATCH net-next 03/16] libeth: add a couple of XDP helpers (libeth_xdp) Alexander Lobakin
2025-03-11 14:05   ` Maciej Fijalkowski
2025-03-17 15:26     ` Alexander Lobakin
2025-03-19 16:19       ` Maciej Fijalkowski
2025-04-01 13:11         ` Alexander Lobakin
2025-04-08 13:38           ` Alexander Lobakin
2025-04-08 13:22     ` Alexander Lobakin [this message]
2025-04-08 13:51       ` Alexander Lobakin
2025-03-05 16:21 ` [PATCH net-next 04/16] libeth: add XSk helpers Alexander Lobakin
2025-03-07 10:15   ` Maciej Fijalkowski
2025-03-12 17:03     ` Alexander Lobakin
2025-03-05 16:21 ` [PATCH net-next 05/16] idpf: fix Rx descriptor ready check barrier in splitq Alexander Lobakin
2025-03-07 10:17   ` Maciej Fijalkowski
2025-03-12 17:10     ` Alexander Lobakin
2025-03-05 16:21 ` [PATCH net-next 06/16] idpf: a use saner limit for default number of queues to allocate Alexander Lobakin
2025-03-07 10:32   ` Maciej Fijalkowski
2025-03-12 17:22     ` Alexander Lobakin
2025-03-05 16:21 ` [PATCH net-next 07/16] idpf: link NAPIs to queues Alexander Lobakin
2025-03-07 10:28   ` Eric Dumazet
2025-03-12 17:16     ` Alexander Lobakin
2025-03-18 17:10       ` Alexander Lobakin
2025-03-07 10:51   ` Maciej Fijalkowski
2025-03-12 17:25     ` Alexander Lobakin
2025-03-05 16:21 ` [PATCH net-next 08/16] idpf: make complq cleaning dependent on scheduling mode Alexander Lobakin
2025-03-07 11:11   ` Maciej Fijalkowski
2025-03-13 16:16     ` Alexander Lobakin
2025-03-05 16:21 ` [PATCH net-next 09/16] idpf: remove SW marker handling from NAPI Alexander Lobakin
2025-03-07 11:42   ` Maciej Fijalkowski
2025-03-13 16:50     ` Alexander Lobakin
2025-03-05 16:21 ` [PATCH net-next 10/16] idpf: add support for nointerrupt queues Alexander Lobakin
2025-03-07 12:10   ` Maciej Fijalkowski
2025-03-13 16:19     ` Alexander Lobakin
2025-03-05 16:21 ` [PATCH net-next 11/16] idpf: prepare structures to support XDP Alexander Lobakin
2025-03-07  1:12   ` Jakub Kicinski
2025-03-12 14:00     ` [Intel-wired-lan] " Alexander Lobakin
2025-03-07 13:27   ` Maciej Fijalkowski
2025-03-17 14:50     ` Alexander Lobakin
2025-03-19 16:29       ` Maciej Fijalkowski
2025-04-08 13:42         ` Alexander Lobakin
2025-03-05 16:21 ` [PATCH net-next 12/16] idpf: implement XDP_SETUP_PROG in ndo_bpf for splitq Alexander Lobakin
2025-03-07 14:16   ` Maciej Fijalkowski
2025-03-17 14:58     ` Alexander Lobakin
2025-03-19 16:23       ` Maciej Fijalkowski
2025-03-05 16:21 ` [PATCH net-next 13/16] idpf: use generic functions to build xdp_buff and skb Alexander Lobakin
2025-03-05 16:21 ` [PATCH net-next 14/16] idpf: add support for XDP on Rx Alexander Lobakin
2025-03-11 15:50   ` Maciej Fijalkowski
2025-04-08 13:28     ` Alexander Lobakin
2025-04-08 15:53       ` Maciej Fijalkowski
2025-03-05 16:21 ` [PATCH net-next 15/16] idpf: add support for .ndo_xdp_xmit() Alexander Lobakin
2025-03-11 16:08   ` Maciej Fijalkowski
2025-04-08 13:31     ` Alexander Lobakin
2025-03-05 16:21 ` [PATCH net-next 16/16] idpf: add XDP RSS hash hint Alexander Lobakin
2025-03-11 15:28 ` [PATCH net-next 00/16] idpf: add XDP support Alexander Lobakin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cba0216c-c87a-421d-bc4d-bc199165edbd@intel.com \
    --to=aleksander.lobakin@intel.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=anthony.l.nguyen@intel.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hawk@kernel.org \
    --cc=horms@kernel.org \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maciej.fijalkowski@intel.com \
    --cc=michal.kubiak@intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=przemyslaw.kitszel@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).