All of lore.kernel.org
 help / color / mirror / Atom feed
From: cpaasch at apple.com
To: mptcp at lists.01.org
Subject: Re: [MPTCP] [PATCH 1/2] skbuff: Add shared control buffer
Date: Thu, 09 Nov 2017 13:48:03 +0900	[thread overview]
Message-ID: <20171109044803.GA87737@Chimay.local> (raw)
In-Reply-To: HE1PR0501MB223522222BDDAE746291152BB0570@HE1PR0501MB2235.eurprd05.prod.outlook.com

[-- Attachment #1: Type: text/plain, Size: 12371 bytes --]

On 09/11/17 - 04:32:54, Boris Pismenny wrote:
> +Ilya and Liran
> 
> Hi,
> 
> > -----Original Message-----
> > From: cpaasch(a)apple.com [mailto:cpaasch(a)apple.com]
> > Sent: Thursday, November 09, 2017 13:13
> > To: Mat Martineau <mathew.j.martineau(a)linux.intel.com>; Boris Pismenny
> > <borisp(a)mellanox.com>
> > Cc: mptcp(a)lists.01.org
> > Subject: Re: [MPTCP] [PATCH 1/2] skbuff: Add shared control buffer
> > 
> > +Boris
> > 
> > On 20/10/17 - 16:02:31, Mat Martineau wrote:
> > > The sk_buff control buffer is of limited size, and cannot be enlarged
> > > without significant impact on systemwide memory use. However, additional
> > > per-packet state is needed for some protocols, like Multipath TCP.
> > >
> > > An optional shared control buffer placed after the normal struct
> > > skb_shared_info can accomodate the necessary state without imposing
> > > extra memory usage or code changes on normal struct sk_buff
> > > users. __alloc_skb will now place a skb_shared_info_ext structure at
> > > skb->end when given the SKB_ALLOC_SHINFO_EXT flag. Most users of struct
> > > sk_buff continue to use the skb_shinfo() macro to access shared
> > > info. skb_shinfo(skb)->is_ext is set if the extended structure is
> > > available, and cleared if it is not.
> > >
> > > pskb_expand_head will preserve the shared control buffer if it is present.
> > >
> > > Signed-off-by: Mat Martineau <mathew.j.martineau(a)linux.intel.com>
> > > ---
> > >  include/linux/skbuff.h | 24 +++++++++++++++++++++-
> > >  net/core/skbuff.c      | 56 ++++++++++++++++++++++++++++++++++++++-------
> > -----
> > >  2 files changed, 66 insertions(+), 14 deletions(-)
> > 
> > Boris, below is the change I mentioned to you.
> > 
> > It allows to allocate 48 additional bytes on-demand after skb_shared_info.
> > As it is on-demand, it won't increase the size of the skb for other users.
> > 
> > For example, TLS could start using it when it creates the skb that it
> > pushes down to the TCP-stack. That way you don't need to handle the
> > tls_record lists.
> > 
> 
> One small problem is that TLS doesn't create SKBs. As a ULP it calls the transport send
> Functions (do_tcp_sendpages for TLS). This function receives a page and not a SKB.

yes, that's a good point. Mat has another patch as part of this series,
that adds an skb-arg to sendpages
(https://lists.01.org/pipermail/mptcp/2017-October/000130.html)

That should do the job for you.

> We decided not to create the SKB outside of the TCP layer to reduce the number of
> changes we made to TCP.
> 
> It would be nice if we could use something like that. Did you talk to DaveM about
> upstreaming this?

No, we didn't talk yet to anyone outside of this list here about it.

We were looking for a user of it outside of MPTCP but couldn't find one.
Now, it seems like we found one that would be interested :)


I think this infrastructure here would simplify your code quite a bit, no?

> We will definitely find it useful for the receive side, there we allocate the SKB in the driver.

Interesting! So even there you could use it. We were under the impression
that it would be of less interest for the receive-side.


Christoph

> 
> > See below for the rest of the patch.
> > 
> > 
> > Christoph
> > 
> > 
> > >
> > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > index 03634ec2f918..873910c66df9 100644
> > > --- a/include/linux/skbuff.h
> > > +++ b/include/linux/skbuff.h
> > > @@ -489,7 +489,8 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct
> > sk_buff *skb,
> > >   * the end of the header data, ie. at skb->end.
> > >   */
> > >  struct skb_shared_info {
> > > -	__u8		__unused;
> > > +	__u8		is_ext:1,
> > > +			__unused:7;
> > >  	__u8		meta_len;
> > >  	__u8		nr_frags;
> > >  	__u8		tx_flags;
> > > @@ -530,6 +531,24 @@ struct skb_shared_info {
> > >  #define SKB_DATAREF_MASK ((1 << SKB_DATAREF_SHIFT) - 1)
> > >
> > >
> > > +/* This is an extended version of skb_shared_info, also invariant across
> > > + * clones and living at the end of the header data.
> > > + */
> > > +struct skb_shared_info_ext {
> > > +	/* skb_shared_info must be the first member */
> > > +	struct skb_shared_info	shinfo;
> > > +
> > > +	/* This is the shared control buffer. It is similar to sk_buff's
> > > +	 * control buffer, but is shared across clones. It must not be
> > > +	 * modified when multiple sk_buffs are referencing this structure.
> > > +	 */
> > > +	char			shcb[48];
> > > +};
> > > +
> > > +#define SKB_SHINFO_EXT_OVERHEAD	\
> > > +	(SKB_DATA_ALIGN(sizeof(struct skb_shared_info_ext)) - \
> > > +	 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
> > > +
> > >  enum {
> > >  	SKB_FCLONE_UNAVAILABLE,	/* skb has no fclone (from head_cache)
> > */
> > >  	SKB_FCLONE_ORIG,	/* orig skb (from fclone_cache) */
> > > @@ -856,6 +875,7 @@ struct sk_buff {
> > >  #define SKB_ALLOC_FCLONE	0x01
> > >  #define SKB_ALLOC_RX		0x02
> > >  #define SKB_ALLOC_NAPI		0x04
> > > +#define SKB_ALLOC_SHINFO_EXT	0x08
> > >
> > >  /* Returns true if the skb was allocated from PFMEMALLOC reserves */
> > >  static inline bool skb_pfmemalloc(const struct sk_buff *skb)
> > > @@ -1271,6 +1291,8 @@ static inline unsigned int skb_end_offset(const
> > struct sk_buff *skb)
> > >
> > >  /* Internal */
> > >  #define skb_shinfo(SKB)	((struct skb_shared_info
> > *)(skb_end_pointer(SKB)))
> > > +#define skb_shinfo_ext(SKB)	\
> > > +	((struct skb_shared_info_ext *)(skb_end_pointer(SKB)))
> > >
> > >  static inline struct skb_shared_hwtstamps *skb_hwtstamps(struct sk_buff
> > *skb)
> > >  {
> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > index 40717501cbdd..397edd5c0613 100644
> > > --- a/net/core/skbuff.c
> > > +++ b/net/core/skbuff.c
> > > @@ -166,6 +166,8 @@ static void *__kmalloc_reserve(size_t size, gfp_t
> > flags, int node,
> > >   *		instead of head cache and allocate a cloned (child) skb.
> > >   *		If SKB_ALLOC_RX is set, __GFP_MEMALLOC will be used for
> > >   *		allocations in case the data is required for writeback
> > > + *		If SKB_ALLOC_SHINFO_EXT is set, the skb will be allocated
> > > + *		with an extended shared info struct.
> > >   *	@node: numa node to allocate memory on
> > >   *
> > >   *	Allocate a new &sk_buff. The returned buffer has no headroom and a
> > > @@ -179,9 +181,9 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t
> > gfp_mask,
> > >  			    int flags, int node)
> > >  {
> > >  	struct kmem_cache *cache;
> > > -	struct skb_shared_info *shinfo;
> > >  	struct sk_buff *skb;
> > >  	u8 *data;
> > > +	unsigned int shinfo_size;
> > >  	bool pfmemalloc;
> > >
> > >  	cache = (flags & SKB_ALLOC_FCLONE)
> > > @@ -199,18 +201,22 @@ struct sk_buff *__alloc_skb(unsigned int size,
> > gfp_t gfp_mask,
> > >  	/* We do our best to align skb_shared_info on a separate cache
> > >  	 * line. It usually works because kmalloc(X > SMP_CACHE_BYTES) gives
> > >  	 * aligned memory blocks, unless SLUB/SLAB debug is enabled.
> > > -	 * Both skb->head and skb_shared_info are cache line aligned.
> > > +	 * Both skb->head and skb_shared_info (or skb_shared_info_ext) are
> > > +	 * cache line aligned.
> > >  	 */
> > >  	size = SKB_DATA_ALIGN(size);
> > > -	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > -	data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
> > > +	if (flags & SKB_ALLOC_SHINFO_EXT)
> > > +		shinfo_size = SKB_DATA_ALIGN(sizeof(struct
> > skb_shared_info_ext));
> > > +	else
> > > +		shinfo_size = SKB_DATA_ALIGN(sizeof(struct
> > skb_shared_info));
> > > +	data = kmalloc_reserve(size + shinfo_size, gfp_mask, node,
> > &pfmemalloc);
> > >  	if (!data)
> > >  		goto nodata;
> > >  	/* kmalloc(size) might give us more room than requested.
> > >  	 * Put skb_shared_info exactly at the end of allocated zone,
> > >  	 * to allow max possible filling before reallocation.
> > >  	 */
> > > -	size = SKB_WITH_OVERHEAD(ksize(data));
> > > +	size = ksize(data) - shinfo_size;
> > >  	prefetchw(data + size);
> > >
> > >  	/*
> > > @@ -220,7 +226,10 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t
> > gfp_mask,
> > >  	 */
> > >  	memset(skb, 0, offsetof(struct sk_buff, tail));
> > >  	/* Account for allocated memory : skb + skb->head */
> > > -	skb->truesize = SKB_TRUESIZE(size);
> > > +	if (flags & SKB_ALLOC_SHINFO_EXT)
> > > +		skb->truesize = SKB_TRUESIZE(size) +
> > SKB_SHINFO_EXT_OVERHEAD;
> > > +	else
> > > +		skb->truesize = SKB_TRUESIZE(size);
> > >  	skb->pfmemalloc = pfmemalloc;
> > >  	refcount_set(&skb->users, 1);
> > >  	skb->head = data;
> > > @@ -231,10 +240,21 @@ struct sk_buff *__alloc_skb(unsigned int size,
> > gfp_t gfp_mask,
> > >  	skb->transport_header = (typeof(skb->transport_header))~0U;
> > >
> > >  	/* make sure we initialize shinfo sequentially */
> > > -	shinfo = skb_shinfo(skb);
> > > -	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
> > > -	atomic_set(&shinfo->dataref, 1);
> > > -	kmemcheck_annotate_variable(shinfo->destructor_arg);
> > > +	if (flags & SKB_ALLOC_SHINFO_EXT) {
> > > +		struct skb_shared_info_ext *shinfo_ext = skb_shinfo_ext(skb);
> > > +		shinfo_ext->shinfo.is_ext = 1;
> > > +		memset(&shinfo_ext->shinfo.meta_len, 0,
> > > +		       offsetof(struct skb_shared_info, dataref) -
> > > +		       offsetof(struct skb_shared_info, meta_len));
> > > +		atomic_set(&shinfo_ext->shinfo.dataref, 1);
> > > +		kmemcheck_annotate_variable(shinfo_ext-
> > >shinfo.destructor_arg);
> > > +		memset(&shinfo_ext->shcb, 0, sizeof(shinfo_ext->shcb));
> > > +	} else {
> > > +		struct skb_shared_info *shinfo = skb_shinfo(skb);
> > > +		memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
> > > +		atomic_set(&shinfo->dataref, 1);
> > > +		kmemcheck_annotate_variable(shinfo->destructor_arg);
> > > +	}
> > >
> > >  	if (flags & SKB_ALLOC_FCLONE) {
> > >  		struct sk_buff_fclones *fclones;
> > > @@ -1443,6 +1463,7 @@ int pskb_expand_head(struct sk_buff *skb, int
> > nhead, int ntail,
> > >  {
> > >  	int i, osize = skb_end_offset(skb);
> > >  	int size = osize + nhead + ntail;
> > > +	int shinfo_size;
> > >  	long off;
> > >  	u8 *data;
> > >
> > > @@ -1454,11 +1475,14 @@ int pskb_expand_head(struct sk_buff *skb, int
> > nhead, int ntail,
> > >
> > >  	if (skb_pfmemalloc(skb))
> > >  		gfp_mask |= __GFP_MEMALLOC;
> > > -	data = kmalloc_reserve(size + SKB_DATA_ALIGN(sizeof(struct
> > skb_shared_info)),
> > > -			       gfp_mask, NUMA_NO_NODE, NULL);
> > > +	if (skb_shinfo(skb)->is_ext)
> > > +		shinfo_size = SKB_DATA_ALIGN(sizeof(struct
> > skb_shared_info_ext));
> > > +	else
> > > +		shinfo_size = SKB_DATA_ALIGN(sizeof(struct
> > skb_shared_info));
> > > +	data = kmalloc_reserve(size + shinfo_size, gfp_mask,
> > NUMA_NO_NODE, NULL);
> > >  	if (!data)
> > >  		goto nodata;
> > > -	size = SKB_WITH_OVERHEAD(ksize(data));
> > > +	size = ksize(data) - shinfo_size;
> > >
> > >  	/* Copy only real data... and, alas, header. This should be
> > >  	 * optimized for the cases when header is void.
> > > @@ -1468,6 +1492,12 @@ int pskb_expand_head(struct sk_buff *skb, int
> > nhead, int ntail,
> > >  	memcpy((struct skb_shared_info *)(data + size),
> > >  	       skb_shinfo(skb),
> > >  	       offsetof(struct skb_shared_info, frags[skb_shinfo(skb)->nr_frags]));
> > > +	if (skb_shinfo(skb)->is_ext) {
> > > +		int offset = offsetof(struct skb_shared_info_ext, shcb);
> > > +		memcpy((struct skb_shared_info_ext *)(data + size + offset),
> > > +		       &skb_shinfo_ext(skb)->shcb,
> > > +		       sizeof(skb_shinfo_ext(skb)->shcb));
> > > +	}
> > >
> > >  	/*
> > >  	 * if shinfo is shared we must drop the old head gracefully, but if it
> > > --
> > > 2.14.2
> > >
> > > _______________________________________________
> > > mptcp mailing list
> > > mptcp(a)lists.01.org
> > >
> > https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.0
> > 1.org%2Fmailman%2Flistinfo%2Fmptcp&data=02%7C01%7Cborisp%40mellano
> > x.com%7C57765cda687e4c3ab8ed08d527283bd5%7Ca652971c7d2e4d9ba6a4
> > d149256f461b%7C0%7C0%7C636457976113353851&sdata=RtZwfWQx%2FSw
> > HoF9miFLpa2r4kA4pf0ta7la6k7F3OZI%3D&reserved=0

             reply	other threads:[~2017-11-09  4:48 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-09  4:48 cpaasch [this message]
  -- strict thread matches above, loose matches on Subject: below --
2017-11-13  6:47 [MPTCP] [PATCH 1/2] skbuff: Add shared control buffer cpaasch
2017-11-10  0:31 Mat Martineau
2017-11-09 16:26 Mat Martineau
2017-11-09  7:56 cpaasch
2017-11-09  7:51 cpaasch
2017-11-09  4:13 Christoph Paasch
2017-11-08 21:02 Christoph Paasch
2017-11-08 20:41 Rao Shoaib
2017-11-08  0:25 Christoph Paasch
2017-11-07 23:35 Rao Shoaib
2017-11-07 23:23 Rao Shoaib
2017-11-07 21:15 Christoph Paasch
2017-11-07 17:13 Rao Shoaib
2017-11-07  4:09 Christoph Paasch
2017-11-07  3:16 Rao Shoaib
2017-11-07  2:46 Rao Shoaib
2017-11-06 22:24 Christoph Paasch
2017-11-06  2:45 Rao Shoaib
2017-11-03  5:10 Christoph Paasch
2017-11-02 21:41 Mat Martineau
2017-10-31 21:58 Mat Martineau
2017-10-31  4:17 Christoph Paasch
2017-10-30 22:44 Mat Martineau
2017-10-30  4:16 Christoph Paasch
2017-10-27 19:57 Christoph Paasch
2017-10-27 18:19 Mat Martineau
2017-10-26 23:20 Rao Shoaib
2017-10-26 22:26 Rao Shoaib
2017-10-23 23:10 Mat Martineau
2017-10-23 22:51 Mat Martineau
2017-10-23 20:13 Rao Shoaib
2017-10-23 20:10 Christoph Paasch
2017-10-23 19:49 Rao Shoaib
2017-10-23 16:37 Christoph Paasch
2017-10-20 23:02 Mat Martineau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171109044803.GA87737@Chimay.local \
    --to=unknown@example.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.