From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-44.mimecast.com (us-smtp-delivery-44.mimecast.com [207.211.30.44]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8CF9F16BE17 for ; Tue, 6 Aug 2024 08:47:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=207.211.30.44 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722934045; cv=none; b=tM+xUqmOSzVDuWxlymWVmS4dUBK92ZNrnvX7SYNKTZOloR1Ik4u9qEhKZlNBr2lqJIA+ZnM+p9YE4SF0IofaYqb2s1hlVDugZ275hI8A6bgKR6sSLNmimnoITe69IbFQF6bN1G6rag+YIRMQwghdTY1y4lncOjw7Rwov89UWS4o= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722934045; c=relaxed/simple; bh=LvXH+RYQzwHVWVapWXzQbE2zx8NoIlFfxaXKZ8trYdA=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: In-Reply-To:Content-Type:Content-Disposition; b=nQFVA7DQ4sIno0Ufq7JgmSf++U3u5Hn570A2grR/dmpIM6cr0TzIRGLSbX7iXKVrCSILEKdusEujxqdRONNR9O6l/6uMyg0mTdPhdzFa+EHK54T5RoQblL9ftsp4gkJuS1UUVeNaXoZ1PjV4qPrBwNW/LUd9+vl300PJCSg16Y4= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=queasysnail.net; spf=none smtp.mailfrom=queasysnail.net; arc=none smtp.client-ip=207.211.30.44 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=queasysnail.net Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=queasysnail.net Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-493-Uw3DicH2M96HPcT4tFP1ag-1; Tue, 06 Aug 2024 04:47:14 -0400 X-MC-Unique: Uw3DicH2M96HPcT4tFP1ag-1 Received: from mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.15]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 7C8151956064; Tue, 6 Aug 2024 08:47:12 +0000 (UTC) Received: from hog (unknown [10.39.192.3]) by mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 2D9841955D42; Tue, 6 Aug 2024 08:47:05 +0000 (UTC) Date: Tue, 6 Aug 2024 10:47:03 +0200 From: Sabrina Dubroca To: Christian Hopps Cc: devel@linux-ipsec.org, Steffen Klassert , netdev@vger.kernel.org, Christian Hopps Subject: Re: [PATCH ipsec-next v8 10/16] xfrm: iptfs: add fragmenting of larger than MTU user packets Message-ID: References: <20240804203346.3654426-1-chopps@chopps.org> <20240804203346.3654426-11-chopps@chopps.org> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 In-Reply-To: X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: queasysnail.net Content-Type: text/plain; charset=UTF-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable 2024-08-04, 22:33:05 -0400, Christian Hopps wrote: > > > +/* 1) skb->head should be cache aligned. > > > + * 2) when resv is for L2 headers (i.e., ethernet) we want the cache= line to > > > + * start -16 from data. > > > + * 3) when resv is for L3+L2 headers IOW skb->data points at the IPT= FS payload > > > + * we want data to be cache line aligned so all the pushed headers w= ill be in > > > + * another cacheline. > > > + */ > > > +#define XFRM_IPTFS_MIN_L3HEADROOM 128 > > > +#define XFRM_IPTFS_MIN_L2HEADROOM (64 + 16) > >=20 > > How did you pick those values? >=20 > That's what the comment is talking to. When reserving space for L2 header= s we pick 64 + 16 (a 2^(<=3D6) cacheline + 16 bytes so the the cacheline sh= ould start -16 from where skb->data will point at. Hard-coding the x86 cacheline size is not a good idea. And what's the 16B for? You don't know that it's enough for the actual L2 headers. > For L3 we reserve double the power of 2 space we reserved for L2 only. But that's the core of my question. Why is that correct/enough? >=20 > We have to reserve some amount of space for pushed headers, so the above = made sense to me for good performance/cache locality. >=20 > > > +static struct sk_buff *iptfs_alloc_skb(struct sk_buff *tpl, u32 len, > > > +=09=09=09=09 bool l3resv) > > > +{ > > > +=09struct sk_buff *skb; > > > +=09u32 resv; > > > + > > > +=09if (!l3resv) { > > > +=09=09resv =3D XFRM_IPTFS_MIN_L2HEADROOM; > > > +=09} else { > > > +=09=09resv =3D skb_headroom(tpl); > > > +=09=09if (resv < XFRM_IPTFS_MIN_L3HEADROOM) > > > +=09=09=09resv =3D XFRM_IPTFS_MIN_L3HEADROOM; > > > +=09} > > > + > > > +=09skb =3D alloc_skb(len + resv, GFP_ATOMIC); > > > +=09if (!skb) { > > > +=09=09XFRM_INC_STATS(dev_net(tpl->dev), LINUX_MIB_XFRMNOSKBERROR); > >=20 > > Hmpf, so we've gone from incrementing the wrong counter to > > incrementing a new counter that doesn't have a precise meaning. >=20 > The new "No SKB" counter is supposed to mean "couldn't get an SKB", > given plenty of other errors are logged under "OutErr" or "InErr" > i'm not sure what level of precision you're looking for here. :) OutErr and InErr would be better than that new counter IMO. > > > +=09=09return NULL; > > > +=09} > > > + > > > +=09skb_reserve(skb, resv); > > > + > > > +=09/* We do not want any of the tpl->headers copied over, so we do > > > +=09 * not use `skb_copy_header()`. > > > +=09 */ > >=20 > > This is a bit of a bad sign for the implementation. It also worries > > me, as this may not be updated when changes are made to > > __copy_skb_header(). > > (c/p'd from v1 review since this was still not answered) >=20 > I don't agree that this is a bad design at all, I'm curious what you thin= k a good design to be. Strange skb manipulations hiding in a protocol module is not good design. c/p bits of core code into a module (where they will never get fixed up when the core code gets updated) is always a bad idea. > I did specifically state why we are not re-using > skb_copy_header(). The functionality is different. We are not trying > to make a copy of an skb we are using an skb as a template for new > skbs. I saw that. That doesn't mean it's a good thing to do. > > > +/** > > > + * skb_copy_bits_seq - copy bits from a skb_seq_state to kernel buff= er > > > + * @st: source skb_seq_state > > > + * @offset: offset in source > > > + * @to: destination buffer > > > + * @len: number of bytes to copy > > > + * > > > + * Copy @len bytes from @offset bytes into the source @st to the des= tination > > > + * buffer @to. `offset` should increase (or be unchanged) with each = subsequent > > > + * call to this function. If offset needs to decrease from the previ= ous use `st` > > > + * should be reset first. > > > + * > > > + * Return: 0 on success or a negative error code on failure > > > + */ > > > +static int skb_copy_bits_seq(struct skb_seq_state *st, int offset, v= oid *to, > > > +=09=09=09 int len) > >=20 > > Probably belongs in net/core/skbuff.c, although I'm really not > > convinced copying data around is the right way to implement the type > > of packet splitting IPTFS does (which sounds a bit like a kind of > > GSO). And there are helpers in net/core/skbuff.c (such as > > pskb_carve/pskb_extract) that seem to do similar things to what you > > need here, without as much data copying. >=20 > I don't have an issue with moving more general skb functionality > into skbuff.c; however, I do not want to gate IP-TFS on this change > to the general net infra, it is appropriate for a patchset of it's > own. If you need helpers that don't exist, it's part of your job to make the core changes that are required to implement the functionality. > Re copying: Let's be clear here, we are not always copying data, > there are sharing code paths as well; however, there are times when > it is the best (and even fastest) way to accomplish things (e.g., > b/c the packet is small or the data is arranged in skbs in a way to > make sharing ridiculously complex and thus slow). I'm not finding the sharing code. You mean iptfs_first_should_copy returning false? --=20 Sabrina