Re: [PATCH ipsec-next v8 10/16] xfrm: iptfs: add fragmenting of larger than MTU user packets

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sabrina Dubroca <sd@queasysnail.net>
To: Christian Hopps <chopps@chopps.org>
Cc: devel@linux-ipsec.org,
	Steffen Klassert <steffen.klassert@secunet.com>,
	netdev@vger.kernel.org, Christian Hopps <chopps@labn.net>
Subject: Re: [PATCH ipsec-next v8 10/16] xfrm: iptfs: add fragmenting of larger than MTU user packets
Date: Tue, 6 Aug 2024 13:05:32 +0200	[thread overview]
Message-ID: <ZrIDfN2uFpktGJYD@hog> (raw)
In-Reply-To: <m2le1aouzf.fsf@ja-home.int.chopps.org>

2024-08-06, 04:54:53 -0400, Christian Hopps wrote:
> 
> Sabrina Dubroca <sd@queasysnail.net> writes:
> 
> > 2024-08-04, 22:33:05 -0400, Christian Hopps wrote:
> > > > > +/* 1) skb->head should be cache aligned.
> > > > > + * 2) when resv is for L2 headers (i.e., ethernet) we want the cacheline to
> > > > > + * start -16 from data.
> > > > > + * 3) when resv is for L3+L2 headers IOW skb->data points at the IPTFS payload
> > > > > + * we want data to be cache line aligned so all the pushed headers will be in
> > > > > + * another cacheline.
> > > > > + */
> > > > > +#define XFRM_IPTFS_MIN_L3HEADROOM 128
> > > > > +#define XFRM_IPTFS_MIN_L2HEADROOM (64 + 16)
> > > >
> > > > How did you pick those values?
> > > 
> > > That's what the comment is talking to. When reserving space for L2 headers we
> > > pick 64 + 16 (a 2^(<=6) cacheline + 16 bytes so the the cacheline should start
> > > -16 from where skb->data will point at.
> > 
> > Hard-coding the x86 cacheline size is not a good idea. And what's the
> > 16B for? You don't know that it's enough for the actual L2 headers.
> 
> I am not hard coding the x86 cacheline. I am picking 64 as the largest cacheline that this is optimized for, it also works for smaller cachelines.

At least use SMP_CACHE_BYTES then?

> 16B is to allow for the incredibly common 14B L2 header to fit.

Why not use skb->dev->needed_headroom, like a bunch of tunnels are
already doing? No guessing required. ethernet is the most common, but
there's no reason to penalize other protocols when the information is
available.

> > > For L3 we reserve double the power of 2 space we reserved for L2 only.
> > 
> > But that's the core of my question. Why is that correct/enough?
> 
> I have to pick a value. There is no magically perfect number that I can pick. I've given you technical reasons and justifications for the numbers I have chosen -- not sure what else I can say. Do you have better suggestions for the sizes which would be more optimal on more architectures? If not then let's use the numbers that I have given technical reasons for choosing.

Yes, now you've spelled it out, and we can evaluate your choices.

> Put this another way. I could just pick 128 b/c it's 2 cachelines
> and fits lots of different headers and would be "good
> enough". That's plenty justification too. I think you looking for
> too much here -- this isn't a precision thing, it's a "Good Enough"
> thing.

I'm asking questions. That's kind of the reviewer's job, understanding
how the thing they're reviewing works. ¯\_(ツ)_/¯

> > > We have to reserve some amount of space for pushed headers, so the above made sense to me for good performance/cache locality.
> > > 
> > > > > +static struct sk_buff *iptfs_alloc_skb(struct sk_buff *tpl, u32 len,
> > > > > +				       bool l3resv)
> > > > > +{
> > > > > +	struct sk_buff *skb;
> > > > > +	u32 resv;
> > > > > +
> > > > > +	if (!l3resv) {
> > > > > +		resv = XFRM_IPTFS_MIN_L2HEADROOM;
> > > > > +	} else {
> > > > > +		resv = skb_headroom(tpl);
> > > > > +		if (resv < XFRM_IPTFS_MIN_L3HEADROOM)
> > > > > +			resv = XFRM_IPTFS_MIN_L3HEADROOM;
> > > > > +	}
> > > > > +
> > > > > +	skb = alloc_skb(len + resv, GFP_ATOMIC);
> > > > > +	if (!skb) {
> > > > > +		XFRM_INC_STATS(dev_net(tpl->dev), LINUX_MIB_XFRMNOSKBERROR);
> > > >
> > > > Hmpf, so we've gone from incrementing the wrong counter to
> > > > incrementing a new counter that doesn't have a precise meaning.
> > > 
> > > The new "No SKB" counter is supposed to mean "couldn't get an SKB",
> > > given plenty of other errors are logged under "OutErr" or "InErr"
> > > i'm not sure what level of precision you're looking for here. :)
> > 
> > OutErr and InErr would be better than that new counter IMO.
> 
> Why?
> 
> My counter tracks the SKB depletion failure that is actually happening. Would you have me now pass in the direction argument just so I can tick the correct overly general MIB counter that provides less value to the user in identifying the actual problem? How is that good design?
> 
> I'm inclined to just delete the thing altogether rather than block on this thing that will almost never happen.

Fine.

> > > > > +		return NULL;
> > > > > +	}
> > > > > +
> > > > > +	skb_reserve(skb, resv);
> > > > > +
> > > > > +	/* We do not want any of the tpl->headers copied over, so we do
> > > > > +	 * not use `skb_copy_header()`.
> > > > > +	 */
> > > >
> > > > This is a bit of a bad sign for the implementation. It also worries
> > > > me, as this may not be updated when changes are made to
> > > > __copy_skb_header().
> > > > (c/p'd from v1 review since this was still not answered)
> > > 
> > > I don't agree that this is a bad design at all, I'm curious what you think a good design to be.
> > 
> > Strange skb manipulations hiding in a protocol module is not good
> > design.
> 
> It's a fragmentation and aggregation protocol, it's needs work with skbs by design. It's literally the function of the protocol to manipulate packet content.

packet content != cherry-picked parts of sk_buff

> I would appreciate it if you could provide technical reasons to justify referring to things as "bad" or "strange" -- it's not helpful otherwise.

I did say it's a bad sign, not a blocking issue on its own. But that
bad sign, combined with the unusual use of skb_seq and a lot of
copying data around, indicates that this is not the right way to
implement this part of the protocol.

> > c/p bits of core code into a module (where they will never get fixed
> > up when the core code gets updated) is always a bad idea.
> 
> I need some values from the SKB, so I copy them -- it's that simple.
> 
> > > I did specifically state why we are not re-using
> > > skb_copy_header(). The functionality is different. We are not trying
> > > to make a copy of an skb we are using an skb as a template for new
> > > skbs.
> > 
> > I saw that. That doesn't mean it's a good thing to do.
> 
> Please suggest an alternative.

A common helper in a location where people are going to know that they
need to fix it up when they modify things about sk_buff would be a
good start.

> > > > > +/**
> > > > > + * skb_copy_bits_seq - copy bits from a skb_seq_state to kernel buffer
> > > > > + * @st: source skb_seq_state
> > > > > + * @offset: offset in source
> > > > > + * @to: destination buffer
> > > > > + * @len: number of bytes to copy
> > > > > + *
> > > > > + * Copy @len bytes from @offset bytes into the source @st to the destination
> > > > > + * buffer @to. `offset` should increase (or be unchanged) with each subsequent
> > > > > + * call to this function. If offset needs to decrease from the previous use `st`
> > > > > + * should be reset first.
> > > > > + *
> > > > > + * Return: 0 on success or a negative error code on failure
> > > > > + */
> > > > > +static int skb_copy_bits_seq(struct skb_seq_state *st, int offset, void *to,
> > > > > +			     int len)
> > > >
> > > > Probably belongs in net/core/skbuff.c, although I'm really not
> > > > convinced copying data around is the right way to implement the type
> > > > of packet splitting IPTFS does (which sounds a bit like a kind of
> > > > GSO). And there are helpers in net/core/skbuff.c (such as
> > > > pskb_carve/pskb_extract) that seem to do similar things to what you
> > > > need here, without as much data copying.
> > > 
> > > I don't have an issue with moving more general skb functionality
> > > into skbuff.c; however, I do not want to gate IP-TFS on this change
> > > to the general net infra, it is appropriate for a patchset of it's
> > > own.
> > 
> > If you need helpers that don't exist, it's part of your job to make
> > the core changes that are required to implement the functionality.
> 
> This is part of a new code protocol and feature addition and it's a single use.

Of course the helper would be single use when it's introduced. You
don't know if it will remain single use. And pskb_extract is single
use, it's fine.

> Another patchset can present this code to the general network
> community to see if they think it *also* has value outside of
> IPTFS. There is *no* reason to delay IPTFS on general network
> infrastructure improvements. Please don't do this.

Sorry, I don't think that's how it works.

> > > Re copying: Let's be clear here, we are not always copying data,
> > > there are sharing code paths as well; however, there are times when
> > > it is the best (and even fastest) way to accomplish things (e.g.,
> > > b/c the packet is small or the data is arranged in skbs in a way to
> > > make sharing ridiculously complex and thus slow).
> > 
> > I'm not finding the sharing code. You mean iptfs_first_should_copy
> > returning false?
> 
> 
>        /* Try share then copy. */
>        if (fragwalk && skb_can_add_frags(newskb, fragwalk, data, copylen)) {
>        ...
>                leftover = skb_add_frags(newskb, fragwalk, data, copylen);
>        } else {
>                /* copy fragment data into newskb */
>                if (skb_copy_bits_seq(st, data, skb_put(newskb, copylen),
>                ...
>        }

You're talking about reassembly now. This patch is fragmentation/TX.

-- 
Sabrina

next prev parent reply	other threads:[~2024-08-06 11:05 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-04 20:33 [PATCH ipsec-next v8 00/16] Add IP-TFS mode to xfrm Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 01/16] xfrm: config: add CONFIG_XFRM_IPTFS Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 02/16] include: uapi: add ip_tfs_*_hdr packet formats Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 03/16] include: uapi: add IPPROTO_AGGFRAG for AGGFRAG in ESP Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 04/16] xfrm: netlink: add config (netlink) options Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 05/16] xfrm: add mode_cbs module functionality Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 06/16] xfrm: add generic iptfs defines and functionality Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 07/16] xfrm: iptfs: add new iptfs xfrm mode impl Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 08/16] xfrm: iptfs: add user packet (tunnel ingress) handling Christian Hopps
2024-08-05 17:10   ` Simon Horman
2024-08-06 10:19     ` [devel-ipsec] " Christian Hopps
2024-08-06 15:24       ` Simon Horman
2024-08-04 20:33 ` [PATCH ipsec-next v8 09/16] xfrm: iptfs: share page fragments of inner packets Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 10/16] xfrm: iptfs: add fragmenting of larger than MTU user packets Christian Hopps
2024-08-04 22:25   ` Sabrina Dubroca
2024-08-05  2:33     ` Christian Hopps
2024-08-05  4:19       ` Christian Hopps
2024-08-06  8:47       ` Sabrina Dubroca
2024-08-06  8:54         ` Christian Hopps
2024-08-06 10:03           ` Florian Westphal
2024-08-06 10:05             ` Christian Hopps
2024-08-06 11:05           ` Sabrina Dubroca [this message]
2024-08-06 11:07             ` Christian Hopps
2024-08-08 11:30             ` Christian Hopps
2024-08-08 13:28               ` Sabrina Dubroca
2024-08-08 13:35                 ` Christian Hopps
2024-08-08 14:01                   ` Sabrina Dubroca
2024-08-08 21:42                     ` Christian Hopps
2024-08-06 11:07           ` Steffen Klassert
2024-08-07 16:23             ` Christian Hopps
2024-08-06 11:32     ` Steffen Klassert
2024-08-07 19:40       ` Christian Hopps
2024-08-08  9:26         ` Sabrina Dubroca
2024-08-08 11:23           ` Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 11/16] xfrm: iptfs: add basic receive packet (tunnel egress) handling Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 12/16] xfrm: iptfs: handle received fragmented inner packets Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 13/16] xfrm: iptfs: add reusing received skb for the tunnel egress packet Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 14/16] xfrm: iptfs: add skb-fragment sharing code Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 15/16] xfrm: iptfs: handle reordering of received packets Christian Hopps
2024-08-04 20:33 ` [PATCH ipsec-next v8 16/16] xfrm: iptfs: add tracepoint functionality Christian Hopps

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZrIDfN2uFpktGJYD@hog \
    --to=sd@queasysnail.net \
    --cc=chopps@chopps.org \
    --cc=chopps@labn.net \
    --cc=devel@linux-ipsec.org \
    --cc=netdev@vger.kernel.org \
    --cc=steffen.klassert@secunet.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.