All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christian Hopps <chopps@labn.net>
To: Simon Horman <horms@kernel.org>
Cc: Christian Hopps <chopps@chopps.org>,
	devel@linux-ipsec.org,
	Steffen Klassert <steffen.klassert@secunet.com>,
	netdev@vger.kernel.org, Christian Hopps <chopps@labn.net>
Subject: Re: [PATCH ipsec-next v1 8/8] iptfs: impl: add new iptfs xfrm mode impl
Date: Thu, 22 Feb 2024 15:23:36 -0500	[thread overview]
Message-ID: <m24je013cj.fsf@ja.int.chopps.org> (raw)
In-Reply-To: <20240219201349.GO40273@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 11263 bytes --]


Simon Horman <horms@kernel.org> writes:

> On Mon, Feb 19, 2024 at 03:57:35AM -0500, Christian Hopps wrote:
>> From: Christian Hopps <chopps@labn.net>
>>
>> Add a new xfrm mode implementing AggFrag/IP-TFS from RFC9347.
>>
>> This utilizes the new xfrm_mode_cbs to implement demand-driven IP-TFS
>> functionality. This functionality can be used to increase bandwidth
>> utilization through small packet aggregation, as well as help solve PMTU
>> issues through it's efficient use of fragmentation.
>>
>> Link: https://www.rfc-editor.org/rfc/rfc9347.txt
>>
>> Signed-off-by: Christian Hopps <chopps@labn.net>
>
> ...
>
>> diff --git a/net/xfrm/xfrm_iptfs.c b/net/xfrm/xfrm_iptfs.c
>
> ...
>
>> +/**
>> + * skb_head_to_frag() - initialize a skb_frag_t based on skb head data
>> + * @skb: skb with the head data
>> + * @frag: frag to initialize
>> + */
>> +static void skb_head_to_frag(const struct sk_buff *skb, skb_frag_t *frag)
>> +{
>> +	struct page *page = virt_to_head_page(skb->data);
>> +	unsigned char *addr = (unsigned char *)page_address(page);
>> +
>> +	BUG_ON(!skb->head_frag);
>
> Is it strictly necessary to crash the Kernel here?
> Likewise, many other places in this patch.

In all use cases it represents a programming error (bug) if the condition is met.

What is the correct use of BUG_ON?

>> +	skb_frag_fill_page_desc(frag, page, skb->data - addr, skb_headlen(skb));
>> +}
>
> ...
>
>> +/**
>> + * skb_add_frags() - add a range of fragment references into an skb
>> + * @skb: skb to add references into
>> + * @walk: the walk to add referenced fragments from.
>> + * @offset: offset from beginning of original skb to start from.
>> + * @len: amount of data to add frag references to in @skb.
>> + *
>> + * skb_can_add_frags() should be called before this function to verify that the
>> + * destination @skb is compatible with the walk and has space in the array for
>> + * the to be added frag refrences.
>
> nit: references

Fixed.

>> + *
>> + * Return: The number of bytes not added to @skb b/c we reached the end of the
>> + * walk before adding all of @len.
>> + */
>
> ...
>
>> +/**
>> + * iptfs_reassem_done() - In-progress packet is aborted free the state.
>
> nit: This does not match the name of the function it documents.
>
>      Flagged by W=1 build with gcc-13.

Fixed.

>
>> + * @xtfs: xtfs state
>> + */
>> +static void iptfs_reassem_abort(struct xfrm_iptfs_data *xtfs)
>> +{
>> +	__iptfs_reassem_done(xtfs, true);
>> +}
>
> ...
>
>> +/**
>> + * iptfs_input_ordered() - handle next in order IPTFS payload.
>> + * @x: xfrm state
>> + * @skb: current packet
>> + *
>> + * Process the IPTFS payload in `skb` and consume it afterwards.
>> + */
>> +static int iptfs_input_ordered(struct xfrm_state *x, struct sk_buff *skb)
>> +{
>> +	u8 hbytes[sizeof(struct ipv6hdr)];
>> +	struct ip_iptfs_cc_hdr iptcch;
>> +	struct skb_seq_state skbseq;
>> +	struct skb_frag_walk _fragwalk;
>> +	struct skb_frag_walk *fragwalk = NULL;
>> +	struct list_head sublist; /* rename this it's just a list */
>> +	struct sk_buff *first_skb, *defer, *next;
>> +	const unsigned char *old_mac;
>> +	struct xfrm_iptfs_data *xtfs;
>> +	struct ip_iptfs_hdr *ipth;
>> +	struct iphdr *iph;
>> +	struct net *net;
>> +	u32 remaining, first_iplen, iplen, iphlen, data, tail;
>> +	u32 blkoff, capturelen;
>> +	u64 seq;
>> +
>> +	xtfs = x->mode_data;
>> +	net = dev_net(skb->dev);
>> +	first_skb = NULL;
>> +	defer = NULL;
>> +
>> +	seq = __esp_seq(skb);
>> +
>> +	/* Large enough to hold both types of header */
>> +	ipth = (struct ip_iptfs_hdr *)&iptcch;
>> +
>> +	/* Save the old mac header if set */
>> +	old_mac = skb_mac_header_was_set(skb) ? skb_mac_header(skb) : NULL;
>> +
>> +	skb_prepare_seq_read(skb, 0, skb->len, &skbseq);
>> +
>> +	/* Get the IPTFS header and validate it */
>> +
>> +	if (skb_copy_bits_seq(&skbseq, 0, ipth, sizeof(*ipth))) {
>> +		XFRM_INC_STATS(net, LINUX_MIB_XFRMINBUFFERERROR);
>> +		goto done;
>> +	}
>> +	data = sizeof(*ipth);
>> +
>> +	trace_iptfs_egress_recv(skb, xtfs, htons(ipth->block_offset));
>
> Maybe this is backwards, because the argument to htons should be
> in host byte order, but the type of ipth->block_offset is __be16.
>
> Also, personally, i would suggest using be16_to_cpu as it better
> describes the types involved.
>
> This is flagged by Sparse along with some other problems.
> Please take care not to introduce new Sparse warnings.

Cleaned these up. Switched to be16 macros..

> ...
>
>> +static u32 __reorder_future_shifts(struct xfrm_iptfs_data *xtfs,
>> +				   struct sk_buff *inskb,
>> +				   struct list_head *list,
>> +				   struct list_head *freelist, u32 *fcount)
>> +{
>> +	const u32 nslots = xtfs->cfg.reorder_win_size + 1;
>> +	const u64 inseq = __esp_seq(inskb);
>> +	u32 savedlen = xtfs->w_savedlen;
>> +	u64 wantseq = xtfs->w_wantseq;
>> +	struct sk_buff *slot0 = NULL;
>> +	u64 last_drop_seq = xtfs->w_wantseq;
>> +	u64 distance, extra_drops, missed, s0seq;
>
> Missed is set but otherwise unused in this function.
>
> Flagged by W=1 build with clang-17.

I've removed `missed`; however, it will be needed for congestion control if that gets implemented.

>
>> +	u32 count = 0;
>> +	struct skb_wseq *wnext;
>> +	u32 beyond, shifting, slot;
>> +
>> +	BUG_ON(inseq <= wantseq);
>> +	distance = inseq - wantseq;
>> +	BUG_ON(distance <= nslots - 1);
>> +	beyond = distance - (nslots - 1);
>> +	missed = 0;
>> +
>> +	/* Handle future sequence number received.
>> +	 *
>> +	 * IMPORTANT: we are at least advancing w_wantseq (i.e., wantseq) by 1
>> +	 * b/c we are beyond the window boundary.
>> +	 *
>> +	 * We know we don't have the wantseq so that counts as a drop.
>> +	 */
>> +
>> +	/* ex: slot count is 4, array size is 3 savedlen is 2, slot 0 is the
>> +	 * missing sequence number.
>> +	 *
>> +	 * the final slot at savedlen (index savedlen - 1) is always occupied.
>> +	 *
>> +	 * beyond is "beyond array size" not savedlen.
>> +	 *
>> +	 *          +--------- array length (savedlen == 2)
>> +	 *          |   +----- array size (nslots - 1 == 3)
>> +	 *          |   |   +- window boundary (nslots == 4)
>> +	 *          V   V | V
>> +	 *                |
>> +	 *  0   1   2   3 |   slot number
>> +	 * ---  0   1   2 |   array index
>> +	 *     [b] [c] : :|   array
>> +	 *                |
>> +	 * "2" "3" "4" "5"|*6*  seq numbers
>> +	 *
>> +	 * We receive seq number 6
>> +	 * distance == 4 [inseq(6) - w_wantseq(2)]
>> +	 * newslot == distance
>> +	 * index == 3 [distance(4) - 1]
>> +	 * beyond == 1 [newslot(4) - lastslot((nslots(4) - 1))]
>> +	 * shifting == 1 [min(savedlen(2), beyond(1)]
>> +	 * slot0_skb == [b], and should match w_wantseq
>> +	 *
>> +	 *                +--- window boundary (nslots == 4)
>> +	 *  0   1   2   3 | 4   slot number
>> +	 * ---  0   1   2 | 3   array index
>> +	 *     [b] : : : :|     array
>> +	 * "2" "3" "4" "5" *6*  seq numbers
>> +	 *
>> +	 * We receive seq number 6
>> +	 * distance == 4 [inseq(6) - w_wantseq(2)]
>> +	 * newslot == distance
>> +	 * index == 3 [distance(4) - 1]
>> +	 * beyond == 1 [newslot(4) - lastslot((nslots(4) - 1))]
>> +	 * shifting == 1 [min(savedlen(1), beyond(1)]
>> +	 * slot0_skb == [b] and should match w_wantseq
>> +	 *
>> +	 *                +-- window boundary (nslots == 4)
>> +	 *  0   1   2   3 | 4   5   6   slot number
>> +	 * ---  0   1   2 | 3   4   5   array index
>> +	 *     [-] [c] : :|             array
>> +	 * "2" "3" "4" "5" "6" "7" *8*  seq numbers
>> +	 *
>> +	 * savedlen = 2, beyond = 3
>> +	 * iter 1: slot0 == NULL, missed++, lastdrop = 2 (2+1-1), slot0 = [-]
>> +	 * iter 2: slot0 == NULL, missed++, lastdrop = 3 (2+2-1), slot0 = [c]
>> +	 * 2 < 3, extra = 1 (3-2), missed += extra, lastdrop = 4 (2+2+1-1)
>> +	 *
>> +	 * We receive seq number 8
>> +	 * distance == 6 [inseq(8) - w_wantseq(2)]
>> +	 * newslot == distance
>> +	 * index == 5 [distance(6) - 1]
>> +	 * beyond == 3 [newslot(6) - lastslot((nslots(4) - 1))]
>> +	 * shifting == 2 [min(savedlen(2), beyond(3)]
>> +	 *
>> +	 * slot0_skb == NULL changed from [b] when "savedlen < beyond" is true.
>> +	 */
>> +
>> +	/* Now send any packets that are being shifted out of saved, and account
>> +	 * for missing packets that are exiting the window as we shift it.
>> +	 */
>> +
>> +	/* If savedlen > beyond we are shifting some, else all. */
>> +	shifting = min(savedlen, beyond);
>> +
>> +	/* slot0 is the buf that just shifted out and into slot0 */
>> +	slot0 = NULL;
>> +	s0seq = wantseq;
>> +	last_drop_seq = s0seq;
>> +	wnext = xtfs->w_saved;
>> +	for (slot = 1; slot <= shifting; slot++, wnext++) {
>> +		/* handle what was in slot0 before we occupy it */
>> +		if (!slot0) {
>> +			last_drop_seq = s0seq;
>> +			missed++;
>> +		} else {
>> +			list_add_tail(&slot0->list, list);
>> +			count++;
>> +		}
>> +		s0seq++;
>> +		slot0 = wnext->skb;
>> +		wnext->skb = NULL;
>> +	}
>> +
>> +	/* slot0 is now either NULL (in which case it's what we now are waiting
>> +	 * for, or a buf in which case we need to handle it like we received it;
>> +	 * however, we may be advancing past that buffer as well..
>> +	 */
>> +
>> +	/* Handle case where we need to shift more than we had saved, slot0 will
>> +	 * be NULL iff savedlen is 0, otherwise slot0 will always be
>> +	 * non-NULL b/c we shifted the final element, which is always set if
>> +	 * there is any saved, into slot0.
>> +	 */
>> +	if (savedlen < beyond) {
>> +		extra_drops = beyond - savedlen;
>> +		if (savedlen == 0) {
>> +			BUG_ON(slot0);
>> +			s0seq += extra_drops;
>> +			last_drop_seq = s0seq - 1;
>> +		} else {
>> +			extra_drops--; /* we aren't dropping what's in slot0 */
>> +			BUG_ON(!slot0);
>> +			list_add_tail(&slot0->list, list);
>> +			/* if extra_drops then we are going past this slot0
>> +			 * so we can safely advance last_drop_seq
>> +			 */
>> +			if (extra_drops)
>> +				last_drop_seq = s0seq + extra_drops;
>> +			s0seq += extra_drops + 1;
>> +			count++;
>> +		}
>> +		missed += extra_drops;
>> +		slot0 = NULL;
>> +		/* slot0 has had an empty slot pushed into it */
>> +	}
>> +	(void)last_drop_seq;	/* we want this for CC code */
>> +
>> +	/* Remove the entries */
>> +	__vec_shift(xtfs, beyond);
>> +
>> +	/* Advance want seq */
>> +	xtfs->w_wantseq += beyond;
>> +
>> +	/* Process drops here when implementing congestion control */
>> +
>> +	/* We've shifted. plug the packet in at the end. */
>> +	xtfs->w_savedlen = nslots - 1;
>> +	xtfs->w_saved[xtfs->w_savedlen - 1].skb = inskb;
>> +	iptfs_set_window_drop_times(xtfs, xtfs->w_savedlen - 1);
>> +
>> +	/* if we don't have a slot0 then we must wait for it */
>> +	if (!slot0)
>> +		return count;
>> +
>> +	/* If slot0, seq must match new want seq */
>> +	BUG_ON(xtfs->w_wantseq != __esp_seq(slot0));
>> +
>> +	/* slot0 is valid, treat like we received expected. */
>> +	count += __reorder_this(xtfs, slot0, list);
>> +	return count;
>> +}
>
> ...
>
>> +/**
>> + * iptfs_get_mtu() - return the inner MTU for an IPTFS xfrm.
>
> nit: This does not match the name of the function it documents.

Fixed.

Thanks for your review!
Chris.

>> + * @x: xfrm state.
>> + * @outer_mtu: Outer MTU for the encapsulated packet.
>> + *
>> + * Return: Correct MTU taking in to account the encap overhead.
>> + */
>> +static u32 iptfs_get_inner_mtu(struct xfrm_state *x, int outer_mtu)
>
> ...

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 857 bytes --]

  reply	other threads:[~2024-02-22 23:33 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-19  8:57 [PATCH ipsec-next v1 0/8] Add IP-TFS mode to xfrm Christian Hopps
2024-02-19  8:57 ` [PATCH ipsec-next v1 1/8] iptfs: config: add CONFIG_XFRM_IPTFS Christian Hopps
2024-02-19  8:57 ` [PATCH ipsec-next v1 2/8] iptfs: uapi: ip: add ip_tfs_*_hdr packet formats Christian Hopps
2024-02-19  8:57 ` [PATCH ipsec-next v1 3/8] iptfs: uapi: IPPROTO_AGGFRAG AGGFRAG in ESP Christian Hopps
2024-02-19  8:57 ` [PATCH ipsec-next v1 4/8] iptfs: sysctl: allow configuration of global default values Christian Hopps
2024-02-19  8:57 ` [PATCH ipsec-next v1 5/8] iptfs: netlink: add config (netlink) options Christian Hopps
2024-02-19  8:57 ` [PATCH ipsec-next v1 6/8] iptfs: xfrm: Add mode_cbs module functionality Christian Hopps
2024-02-25 12:17   ` [devel-ipsec] " Antony Antony
2024-03-08 22:21     ` Christian Hopps
2024-02-19  8:57 ` [PATCH ipsec-next v1 7/8] iptfs: xfrm: add generic iptfs defines and functionality Christian Hopps
2024-02-19  8:57 ` [PATCH ipsec-next v1 8/8] iptfs: impl: add new iptfs xfrm mode impl Christian Hopps
2024-02-19 20:13   ` Simon Horman
2024-02-22 20:23     ` Christian Hopps [this message]
2024-02-26 20:57       ` Simon Horman
2024-02-29  9:12         ` Christian Hopps
2024-02-20 23:16   ` kernel test robot
2024-02-25 12:16   ` [devel-ipsec] " Antony Antony
2024-03-06 13:57   ` Sabrina Dubroca
2024-03-06 15:30     ` Christian Hopps

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m24je013cj.fsf@ja.int.chopps.org \
    --to=chopps@labn.net \
    --cc=chopps@chopps.org \
    --cc=devel@linux-ipsec.org \
    --cc=horms@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=steffen.klassert@secunet.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.