All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sabrina Dubroca <sd@queasysnail.net>
To: Christian Hopps <chopps@chopps.org>
Cc: devel@linux-ipsec.org,
	Steffen Klassert <steffen.klassert@secunet.com>,
	netdev@vger.kernel.org, Christian Hopps <chopps@labn.net>
Subject: Re: [PATCH ipsec-next v8 10/16] xfrm: iptfs: add fragmenting of larger than MTU user packets
Date: Mon, 5 Aug 2024 00:25:57 +0200	[thread overview]
Message-ID: <Zq__9Z4ckXNdR-Ec@hog> (raw)
In-Reply-To: <20240804203346.3654426-11-chopps@chopps.org>

Please CC the reviewers from previous versions of the patchset. It's
really hard to keep track of discussions and reposts otherwise.


2024-08-04, 16:33:39 -0400, Christian Hopps wrote:
> From: Christian Hopps <chopps@labn.net>
> 
> Add support for tunneling user (inner) packets that are larger than the
> tunnel's path MTU (outer) using IP-TFS fragmentation.
> 
> Signed-off-by: Christian Hopps <chopps@labn.net>
> ---
>  net/xfrm/xfrm_iptfs.c | 407 +++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 381 insertions(+), 26 deletions(-)
> 
> diff --git a/net/xfrm/xfrm_iptfs.c b/net/xfrm/xfrm_iptfs.c
> index 20c19894720e..38735e2d64c3 100644
> --- a/net/xfrm/xfrm_iptfs.c
> +++ b/net/xfrm/xfrm_iptfs.c
> @@ -46,12 +46,23 @@
>   */
>  #define IPTFS_DEFAULT_MAX_QUEUE_SIZE	(1024 * 10240)
>  
> +/* 1) skb->head should be cache aligned.
> + * 2) when resv is for L2 headers (i.e., ethernet) we want the cacheline to
> + * start -16 from data.
> + * 3) when resv is for L3+L2 headers IOW skb->data points at the IPTFS payload
> + * we want data to be cache line aligned so all the pushed headers will be in
> + * another cacheline.
> + */
> +#define XFRM_IPTFS_MIN_L3HEADROOM 128
> +#define XFRM_IPTFS_MIN_L2HEADROOM (64 + 16)

How did you pick those values?



> +static struct sk_buff *iptfs_alloc_skb(struct sk_buff *tpl, u32 len,
> +				       bool l3resv)
> +{
> +	struct sk_buff *skb;
> +	u32 resv;
> +
> +	if (!l3resv) {
> +		resv = XFRM_IPTFS_MIN_L2HEADROOM;
> +	} else {
> +		resv = skb_headroom(tpl);
> +		if (resv < XFRM_IPTFS_MIN_L3HEADROOM)
> +			resv = XFRM_IPTFS_MIN_L3HEADROOM;
> +	}
> +
> +	skb = alloc_skb(len + resv, GFP_ATOMIC);
> +	if (!skb) {
> +		XFRM_INC_STATS(dev_net(tpl->dev), LINUX_MIB_XFRMNOSKBERROR);

Hmpf, so we've gone from incrementing the wrong counter to
incrementing a new counter that doesn't have a precise meaning.

> +		return NULL;
> +	}
> +
> +	skb_reserve(skb, resv);
> +
> +	/* We do not want any of the tpl->headers copied over, so we do
> +	 * not use `skb_copy_header()`.
> +	 */

This is a bit of a bad sign for the implementation. It also worries
me, as this may not be updated when changes are made to
__copy_skb_header().
(c/p'd from v1 review since this was still not answered)


> +/**
> + * skb_copy_bits_seq - copy bits from a skb_seq_state to kernel buffer
> + * @st: source skb_seq_state
> + * @offset: offset in source
> + * @to: destination buffer
> + * @len: number of bytes to copy
> + *
> + * Copy @len bytes from @offset bytes into the source @st to the destination
> + * buffer @to. `offset` should increase (or be unchanged) with each subsequent
> + * call to this function. If offset needs to decrease from the previous use `st`
> + * should be reset first.
> + *
> + * Return: 0 on success or a negative error code on failure
> + */
> +static int skb_copy_bits_seq(struct skb_seq_state *st, int offset, void *to,
> +			     int len)

Probably belongs in net/core/skbuff.c, although I'm really not
convinced copying data around is the right way to implement the type
of packet splitting IPTFS does (which sounds a bit like a kind of
GSO). And there are helpers in net/core/skbuff.c (such as
pskb_carve/pskb_extract) that seem to do similar things to what you
need here, without as much data copying.


> +static int iptfs_first_skb(struct sk_buff **skbp, struct xfrm_iptfs_data *xtfs,
> +			   u32 mtu)
> +{
> +	struct sk_buff *skb = *skbp;
> +	int err;
> +
> +	/* Classic ESP skips the don't fragment ICMP error if DF is clear on
> +	 * the inner packet or ignore_df is set. Otherwise it will send an ICMP
> +	 * or local error if the inner packet won't fit it's MTU.
> +	 *
> +	 * With IPTFS we do not care about the inner packet DF bit. If the
> +	 * tunnel is configured to "don't fragment" we error back if things
> +	 * don't fit in our max packet size. Otherwise we iptfs-fragment as
> +	 * normal.
> +	 */
> +
> +	/* The opportunity for HW offload has ended */
> +	if (skb->ip_summed == CHECKSUM_PARTIAL) {
> +		err = skb_checksum_help(skb);
> +		if (err)
> +			return err;
> +	}
> +
> +	/* We've split these up before queuing */
> +	BUG_ON(skb_is_gso(skb));

As I've said previously, I don't think that's a valid reason to
crash. BUG_ON should be used very rarely:

https://elixir.bootlin.com/linux/v6.10/source/Documentation/process/coding-style.rst#L1230

Dropping a bogus packet is an easy way to recover from this situation,
so we should not crash here (and probably in all of IPTFS).

-- 
Sabrina


  reply	other threads:[~2024-08-04 22:27 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-04 20:33 [PATCH ipsec-next v8 00/16] Add IP-TFS mode to xfrm Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 01/16] xfrm: config: add CONFIG_XFRM_IPTFS Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 02/16] include: uapi: add ip_tfs_*_hdr packet formats Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 03/16] include: uapi: add IPPROTO_AGGFRAG for AGGFRAG in ESP Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 04/16] xfrm: netlink: add config (netlink) options Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 05/16] xfrm: add mode_cbs module functionality Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 06/16] xfrm: add generic iptfs defines and functionality Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 07/16] xfrm: iptfs: add new iptfs xfrm mode impl Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 08/16] xfrm: iptfs: add user packet (tunnel ingress) handling Christian Hopps
2024-08-05 17:10   ` Simon Horman
2024-08-06 10:19     ` [devel-ipsec] " Christian Hopps
2024-08-06 15:24       ` Simon Horman
2024-08-04 20:33 ` [PATCH ipsec-next v8 09/16] xfrm: iptfs: share page fragments of inner packets Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 10/16] xfrm: iptfs: add fragmenting of larger than MTU user packets Christian Hopps
2024-08-04 22:25   ` Sabrina Dubroca [this message]
2024-08-05  2:33     ` Christian Hopps
2024-08-05  4:19       ` Christian Hopps
2024-08-06  8:47       ` Sabrina Dubroca
2024-08-06  8:54         ` Christian Hopps
2024-08-06 10:03           ` Florian Westphal
2024-08-06 10:05             ` Christian Hopps
2024-08-06 11:05           ` Sabrina Dubroca
2024-08-06 11:07             ` Christian Hopps
2024-08-08 11:30             ` Christian Hopps
2024-08-08 13:28               ` Sabrina Dubroca
2024-08-08 13:35                 ` Christian Hopps
2024-08-08 14:01                   ` Sabrina Dubroca
2024-08-08 21:42                     ` Christian Hopps
2024-08-06 11:07           ` Steffen Klassert
2024-08-07 16:23             ` Christian Hopps
2024-08-06 11:32     ` Steffen Klassert
2024-08-07 19:40       ` Christian Hopps
2024-08-08  9:26         ` Sabrina Dubroca
2024-08-08 11:23           ` Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 11/16] xfrm: iptfs: add basic receive packet (tunnel egress) handling Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 12/16] xfrm: iptfs: handle received fragmented inner packets Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 13/16] xfrm: iptfs: add reusing received skb for the tunnel egress packet Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 14/16] xfrm: iptfs: add skb-fragment sharing code Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 15/16] xfrm: iptfs: handle reordering of received packets Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 16/16] xfrm: iptfs: add tracepoint functionality Christian Hopps

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zq__9Z4ckXNdR-Ec@hog \
    --to=sd@queasysnail.net \
    --cc=chopps@chopps.org \
    --cc=chopps@labn.net \
    --cc=devel@linux-ipsec.org \
    --cc=netdev@vger.kernel.org \
    --cc=steffen.klassert@secunet.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.