From: Simon Horman <horms@kernel.org>
To: Christian Hopps <chopps@chopps.org>
Cc: devel@linux-ipsec.org,
Steffen Klassert <steffen.klassert@secunet.com>,
netdev@vger.kernel.org, Christian Hopps <chopps@labn.net>
Subject: Re: [PATCH ipsec-next v1 8/8] iptfs: impl: add new iptfs xfrm mode impl
Date: Mon, 19 Feb 2024 20:13:49 +0000 [thread overview]
Message-ID: <20240219201349.GO40273@kernel.org> (raw)
In-Reply-To: <20240219085735.1220113-9-chopps@chopps.org>
On Mon, Feb 19, 2024 at 03:57:35AM -0500, Christian Hopps wrote:
> From: Christian Hopps <chopps@labn.net>
>
> Add a new xfrm mode implementing AggFrag/IP-TFS from RFC9347.
>
> This utilizes the new xfrm_mode_cbs to implement demand-driven IP-TFS
> functionality. This functionality can be used to increase bandwidth
> utilization through small packet aggregation, as well as help solve PMTU
> issues through it's efficient use of fragmentation.
>
> Link: https://www.rfc-editor.org/rfc/rfc9347.txt
>
> Signed-off-by: Christian Hopps <chopps@labn.net>
...
> diff --git a/net/xfrm/xfrm_iptfs.c b/net/xfrm/xfrm_iptfs.c
...
> +/**
> + * skb_head_to_frag() - initialize a skb_frag_t based on skb head data
> + * @skb: skb with the head data
> + * @frag: frag to initialize
> + */
> +static void skb_head_to_frag(const struct sk_buff *skb, skb_frag_t *frag)
> +{
> + struct page *page = virt_to_head_page(skb->data);
> + unsigned char *addr = (unsigned char *)page_address(page);
> +
> + BUG_ON(!skb->head_frag);
Is it strictly necessary to crash the Kernel here?
Likewise, many other places in this patch.
> + skb_frag_fill_page_desc(frag, page, skb->data - addr, skb_headlen(skb));
> +}
...
> +/**
> + * skb_add_frags() - add a range of fragment references into an skb
> + * @skb: skb to add references into
> + * @walk: the walk to add referenced fragments from.
> + * @offset: offset from beginning of original skb to start from.
> + * @len: amount of data to add frag references to in @skb.
> + *
> + * skb_can_add_frags() should be called before this function to verify that the
> + * destination @skb is compatible with the walk and has space in the array for
> + * the to be added frag refrences.
nit: references
> + *
> + * Return: The number of bytes not added to @skb b/c we reached the end of the
> + * walk before adding all of @len.
> + */
...
> +/**
> + * iptfs_reassem_done() - In-progress packet is aborted free the state.
nit: This does not match the name of the function it documents.
Flagged by W=1 build with gcc-13.
> + * @xtfs: xtfs state
> + */
> +static void iptfs_reassem_abort(struct xfrm_iptfs_data *xtfs)
> +{
> + __iptfs_reassem_done(xtfs, true);
> +}
...
> +/**
> + * iptfs_input_ordered() - handle next in order IPTFS payload.
> + * @x: xfrm state
> + * @skb: current packet
> + *
> + * Process the IPTFS payload in `skb` and consume it afterwards.
> + */
> +static int iptfs_input_ordered(struct xfrm_state *x, struct sk_buff *skb)
> +{
> + u8 hbytes[sizeof(struct ipv6hdr)];
> + struct ip_iptfs_cc_hdr iptcch;
> + struct skb_seq_state skbseq;
> + struct skb_frag_walk _fragwalk;
> + struct skb_frag_walk *fragwalk = NULL;
> + struct list_head sublist; /* rename this it's just a list */
> + struct sk_buff *first_skb, *defer, *next;
> + const unsigned char *old_mac;
> + struct xfrm_iptfs_data *xtfs;
> + struct ip_iptfs_hdr *ipth;
> + struct iphdr *iph;
> + struct net *net;
> + u32 remaining, first_iplen, iplen, iphlen, data, tail;
> + u32 blkoff, capturelen;
> + u64 seq;
> +
> + xtfs = x->mode_data;
> + net = dev_net(skb->dev);
> + first_skb = NULL;
> + defer = NULL;
> +
> + seq = __esp_seq(skb);
> +
> + /* Large enough to hold both types of header */
> + ipth = (struct ip_iptfs_hdr *)&iptcch;
> +
> + /* Save the old mac header if set */
> + old_mac = skb_mac_header_was_set(skb) ? skb_mac_header(skb) : NULL;
> +
> + skb_prepare_seq_read(skb, 0, skb->len, &skbseq);
> +
> + /* Get the IPTFS header and validate it */
> +
> + if (skb_copy_bits_seq(&skbseq, 0, ipth, sizeof(*ipth))) {
> + XFRM_INC_STATS(net, LINUX_MIB_XFRMINBUFFERERROR);
> + goto done;
> + }
> + data = sizeof(*ipth);
> +
> + trace_iptfs_egress_recv(skb, xtfs, htons(ipth->block_offset));
Maybe this is backwards, because the argument to htons should be
in host byte order, but the type of ipth->block_offset is __be16.
Also, personally, i would suggest using be16_to_cpu as it better
describes the types involved.
This is flagged by Sparse along with some other problems.
Please take care not to introduce new Sparse warnings.
...
> +static u32 __reorder_future_shifts(struct xfrm_iptfs_data *xtfs,
> + struct sk_buff *inskb,
> + struct list_head *list,
> + struct list_head *freelist, u32 *fcount)
> +{
> + const u32 nslots = xtfs->cfg.reorder_win_size + 1;
> + const u64 inseq = __esp_seq(inskb);
> + u32 savedlen = xtfs->w_savedlen;
> + u64 wantseq = xtfs->w_wantseq;
> + struct sk_buff *slot0 = NULL;
> + u64 last_drop_seq = xtfs->w_wantseq;
> + u64 distance, extra_drops, missed, s0seq;
Missed is set but otherwise unused in this function.
Flagged by W=1 build with clang-17.
> + u32 count = 0;
> + struct skb_wseq *wnext;
> + u32 beyond, shifting, slot;
> +
> + BUG_ON(inseq <= wantseq);
> + distance = inseq - wantseq;
> + BUG_ON(distance <= nslots - 1);
> + beyond = distance - (nslots - 1);
> + missed = 0;
> +
> + /* Handle future sequence number received.
> + *
> + * IMPORTANT: we are at least advancing w_wantseq (i.e., wantseq) by 1
> + * b/c we are beyond the window boundary.
> + *
> + * We know we don't have the wantseq so that counts as a drop.
> + */
> +
> + /* ex: slot count is 4, array size is 3 savedlen is 2, slot 0 is the
> + * missing sequence number.
> + *
> + * the final slot at savedlen (index savedlen - 1) is always occupied.
> + *
> + * beyond is "beyond array size" not savedlen.
> + *
> + * +--------- array length (savedlen == 2)
> + * | +----- array size (nslots - 1 == 3)
> + * | | +- window boundary (nslots == 4)
> + * V V | V
> + * |
> + * 0 1 2 3 | slot number
> + * --- 0 1 2 | array index
> + * [b] [c] : :| array
> + * |
> + * "2" "3" "4" "5"|*6* seq numbers
> + *
> + * We receive seq number 6
> + * distance == 4 [inseq(6) - w_wantseq(2)]
> + * newslot == distance
> + * index == 3 [distance(4) - 1]
> + * beyond == 1 [newslot(4) - lastslot((nslots(4) - 1))]
> + * shifting == 1 [min(savedlen(2), beyond(1)]
> + * slot0_skb == [b], and should match w_wantseq
> + *
> + * +--- window boundary (nslots == 4)
> + * 0 1 2 3 | 4 slot number
> + * --- 0 1 2 | 3 array index
> + * [b] : : : :| array
> + * "2" "3" "4" "5" *6* seq numbers
> + *
> + * We receive seq number 6
> + * distance == 4 [inseq(6) - w_wantseq(2)]
> + * newslot == distance
> + * index == 3 [distance(4) - 1]
> + * beyond == 1 [newslot(4) - lastslot((nslots(4) - 1))]
> + * shifting == 1 [min(savedlen(1), beyond(1)]
> + * slot0_skb == [b] and should match w_wantseq
> + *
> + * +-- window boundary (nslots == 4)
> + * 0 1 2 3 | 4 5 6 slot number
> + * --- 0 1 2 | 3 4 5 array index
> + * [-] [c] : :| array
> + * "2" "3" "4" "5" "6" "7" *8* seq numbers
> + *
> + * savedlen = 2, beyond = 3
> + * iter 1: slot0 == NULL, missed++, lastdrop = 2 (2+1-1), slot0 = [-]
> + * iter 2: slot0 == NULL, missed++, lastdrop = 3 (2+2-1), slot0 = [c]
> + * 2 < 3, extra = 1 (3-2), missed += extra, lastdrop = 4 (2+2+1-1)
> + *
> + * We receive seq number 8
> + * distance == 6 [inseq(8) - w_wantseq(2)]
> + * newslot == distance
> + * index == 5 [distance(6) - 1]
> + * beyond == 3 [newslot(6) - lastslot((nslots(4) - 1))]
> + * shifting == 2 [min(savedlen(2), beyond(3)]
> + *
> + * slot0_skb == NULL changed from [b] when "savedlen < beyond" is true.
> + */
> +
> + /* Now send any packets that are being shifted out of saved, and account
> + * for missing packets that are exiting the window as we shift it.
> + */
> +
> + /* If savedlen > beyond we are shifting some, else all. */
> + shifting = min(savedlen, beyond);
> +
> + /* slot0 is the buf that just shifted out and into slot0 */
> + slot0 = NULL;
> + s0seq = wantseq;
> + last_drop_seq = s0seq;
> + wnext = xtfs->w_saved;
> + for (slot = 1; slot <= shifting; slot++, wnext++) {
> + /* handle what was in slot0 before we occupy it */
> + if (!slot0) {
> + last_drop_seq = s0seq;
> + missed++;
> + } else {
> + list_add_tail(&slot0->list, list);
> + count++;
> + }
> + s0seq++;
> + slot0 = wnext->skb;
> + wnext->skb = NULL;
> + }
> +
> + /* slot0 is now either NULL (in which case it's what we now are waiting
> + * for, or a buf in which case we need to handle it like we received it;
> + * however, we may be advancing past that buffer as well..
> + */
> +
> + /* Handle case where we need to shift more than we had saved, slot0 will
> + * be NULL iff savedlen is 0, otherwise slot0 will always be
> + * non-NULL b/c we shifted the final element, which is always set if
> + * there is any saved, into slot0.
> + */
> + if (savedlen < beyond) {
> + extra_drops = beyond - savedlen;
> + if (savedlen == 0) {
> + BUG_ON(slot0);
> + s0seq += extra_drops;
> + last_drop_seq = s0seq - 1;
> + } else {
> + extra_drops--; /* we aren't dropping what's in slot0 */
> + BUG_ON(!slot0);
> + list_add_tail(&slot0->list, list);
> + /* if extra_drops then we are going past this slot0
> + * so we can safely advance last_drop_seq
> + */
> + if (extra_drops)
> + last_drop_seq = s0seq + extra_drops;
> + s0seq += extra_drops + 1;
> + count++;
> + }
> + missed += extra_drops;
> + slot0 = NULL;
> + /* slot0 has had an empty slot pushed into it */
> + }
> + (void)last_drop_seq; /* we want this for CC code */
> +
> + /* Remove the entries */
> + __vec_shift(xtfs, beyond);
> +
> + /* Advance want seq */
> + xtfs->w_wantseq += beyond;
> +
> + /* Process drops here when implementing congestion control */
> +
> + /* We've shifted. plug the packet in at the end. */
> + xtfs->w_savedlen = nslots - 1;
> + xtfs->w_saved[xtfs->w_savedlen - 1].skb = inskb;
> + iptfs_set_window_drop_times(xtfs, xtfs->w_savedlen - 1);
> +
> + /* if we don't have a slot0 then we must wait for it */
> + if (!slot0)
> + return count;
> +
> + /* If slot0, seq must match new want seq */
> + BUG_ON(xtfs->w_wantseq != __esp_seq(slot0));
> +
> + /* slot0 is valid, treat like we received expected. */
> + count += __reorder_this(xtfs, slot0, list);
> + return count;
> +}
...
> +/**
> + * iptfs_get_mtu() - return the inner MTU for an IPTFS xfrm.
nit: This does not match the name of the function it documents.
> + * @x: xfrm state.
> + * @outer_mtu: Outer MTU for the encapsulated packet.
> + *
> + * Return: Correct MTU taking in to account the encap overhead.
> + */
> +static u32 iptfs_get_inner_mtu(struct xfrm_state *x, int outer_mtu)
...
next prev parent reply other threads:[~2024-02-19 20:13 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-19 8:57 [PATCH ipsec-next v1 0/8] Add IP-TFS mode to xfrm Christian Hopps
2024-02-19 8:57 ` [PATCH ipsec-next v1 1/8] iptfs: config: add CONFIG_XFRM_IPTFS Christian Hopps
2024-02-19 8:57 ` [PATCH ipsec-next v1 2/8] iptfs: uapi: ip: add ip_tfs_*_hdr packet formats Christian Hopps
2024-02-19 8:57 ` [PATCH ipsec-next v1 3/8] iptfs: uapi: IPPROTO_AGGFRAG AGGFRAG in ESP Christian Hopps
2024-02-19 8:57 ` [PATCH ipsec-next v1 4/8] iptfs: sysctl: allow configuration of global default values Christian Hopps
2024-02-19 8:57 ` [PATCH ipsec-next v1 5/8] iptfs: netlink: add config (netlink) options Christian Hopps
2024-02-19 8:57 ` [PATCH ipsec-next v1 6/8] iptfs: xfrm: Add mode_cbs module functionality Christian Hopps
2024-02-25 12:17 ` [devel-ipsec] " Antony Antony
2024-03-08 22:21 ` Christian Hopps
2024-02-19 8:57 ` [PATCH ipsec-next v1 7/8] iptfs: xfrm: add generic iptfs defines and functionality Christian Hopps
2024-02-19 8:57 ` [PATCH ipsec-next v1 8/8] iptfs: impl: add new iptfs xfrm mode impl Christian Hopps
2024-02-19 20:13 ` Simon Horman [this message]
2024-02-22 20:23 ` Christian Hopps
2024-02-26 20:57 ` Simon Horman
2024-02-29 9:12 ` Christian Hopps
2024-02-20 23:16 ` kernel test robot
2024-02-25 12:16 ` [devel-ipsec] " Antony Antony
2024-03-06 13:57 ` Sabrina Dubroca
2024-03-06 15:30 ` Christian Hopps
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240219201349.GO40273@kernel.org \
--to=horms@kernel.org \
--cc=chopps@chopps.org \
--cc=chopps@labn.net \
--cc=devel@linux-ipsec.org \
--cc=netdev@vger.kernel.org \
--cc=steffen.klassert@secunet.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.