netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Gregory Haskins <ghaskins@novell.com>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	herbert.xu@redhat.com
Subject: Re: [RFC PATCH] net: add dataref destructor to sk_buff
Date: Tue, 10 Nov 2009 13:53:35 +0200	[thread overview]
Message-ID: <20091110115335.GC6989@redhat.com> (raw)
In-Reply-To: <20091002141407.30224.54207.stgit@dev.haskins.net>

On Fri, Oct 02, 2009 at 10:20:00AM -0400, Gregory Haskins wrote:
> (Applies to davem/net-2.6.git:4fdb78d30)
> 
> Hi David, netdevs,
> 
> The following is an RFC for an attempt at addressing a zero-copy solution.
> 
> To be perfectly honest, I have no idea if this is the best solution, or if
> there is truly a problem with skb->destructor that requires an alternate
> mechanism.  What I do know is that this patch seems to work, and I would
> like to see some kind of solution available upstream.  So I thought I would
> send my hack out as at least a point of discussion.  FWIW: This has been
> tested heavily in my rig and is technically suitable for inclusion after
> review as is, if that is decided to be the optimal path forward here.
> 
> Thanks for your review and consideration,
> 
> Kind regards,
> -Greg
> 
> ----------------------------------------
> From: Gregory Haskins <ghaskins@novell.com>
> Subject: [RFC PATCH] net: add dataref destructor to sk_buff
> 
> What: The skb->destructor field is reportedly unreliable for ensuring
> that all shinfo users have dropped their references.  Therefore, we add
> a distinct ->release() method for the shinfo structure which is closely
> tied to the underlying page resources we want to protect.
> 
> Why: We want to add zero-copy transmit support for AlacrityVM guests.
> In order to support this, the host kernel must map guest pages directly
> into a paged-skb and send it as normal.  put_page() alone is not
> sufficient lifetime management since the pages are ultimately allocated
> from within the guest.  Therefore, we need higher-level notification
> when the skb is finally freed on the host so we can then inject a proper
> "tx-complete" event into the guest context.
> 
> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
> ---
> 
>  include/linux/skbuff.h |    2 ++
>  net/core/skbuff.c      |    9 +++++++++
>  2 files changed, 11 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index df7b23a..02cdab6 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -207,6 +207,8 @@ struct skb_shared_info {
>  	/* Intermediate layers must ensure that destructor_arg
>  	 * remains valid until skb destructor */
>  	void *		destructor_arg;
> +	void *          priv;
> +	void           (*release)(struct sk_buff *skb);
>  };
>  
>  /* We divide dataref into two halves.  The higher 16 bits hold references
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 80a9616..a7e40a9 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -219,6 +219,8 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>  	shinfo->tx_flags.flags = 0;
>  	skb_frag_list_init(skb);
>  	memset(&shinfo->hwtstamps, 0, sizeof(shinfo->hwtstamps));
> +	shinfo->release = NULL;
> +	shinfo->priv = NULL;
>  
>  	if (fclone) {
>  		struct sk_buff *child = skb + 1;
> @@ -350,6 +352,9 @@ static void skb_release_data(struct sk_buff *skb)
>  		if (skb_has_frags(skb))
>  			skb_drop_fraglist(skb);
>  
> +		if (skb_shinfo(skb)->release)
> +			skb_shinfo(skb)->release(skb);
> +
>  		kfree(skb->head);
>  	}
>  }
> @@ -514,6 +519,8 @@ int skb_recycle_check(struct sk_buff *skb, int skb_size)
>  	shinfo->tx_flags.flags = 0;
>  	skb_frag_list_init(skb);
>  	memset(&shinfo->hwtstamps, 0, sizeof(shinfo->hwtstamps));
> +	shinfo->release = NULL;
> +	shinfo->priv = NULL;
>  
>  	memset(skb, 0, offsetof(struct sk_buff, tail));
>  	skb->data = skb->head + NET_SKB_PAD;
> @@ -856,6 +863,8 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  	skb->hdr_len  = 0;
>  	skb->nohdr    = 0;
>  	atomic_set(&skb_shinfo(skb)->dataref, 1);
> +	skb_shinfo(skb)->release = NULL;
> +	skb_shinfo(skb)->priv = NULL;
>  	return 0;
>  
>  nodata:

This is basically subset of the skb data destructors patch, isn't it?
Last time this was tried, this is the objection that was voiced:

	The problem with this patch is that it's tracking skb's, while
	you want use it to track pages for zero-copy.  That just doesn't
	work.  Through mechanisms like splice, individual pages in the
	skb can be detached and metastasize to other locations, e.g.,
	the VFS.

and I think this applies here. In other words, this only *seems*
to work for you because you are not trying to do things like
guest to host communication, with host doing smart things.

Cc Herbert which was involved in the original discussion.

In the specific case, it seems that things like pskb_copy,
skb_split and others will also be broken, won't they?

-- 
MST

  parent reply	other threads:[~2009-11-10 11:56 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-10-02 14:20 [RFC PATCH] net: add dataref destructor to sk_buff Gregory Haskins
2009-11-06  5:08 ` David Miller
2009-11-06 16:08   ` Gregory Haskins
2009-11-10 11:53 ` Michael S. Tsirkin [this message]
2009-11-10 12:40   ` Gregory Haskins
2009-11-10 13:17     ` Michael S. Tsirkin
2009-11-10 14:11       ` Gregory Haskins
2009-11-10 14:36         ` Michael S. Tsirkin
2009-11-10 15:45           ` Gregory Haskins
2009-11-10 17:36             ` Michael S. Tsirkin
2009-11-10 18:36               ` Gregory Haskins
2009-11-10 21:40                 ` Evgeniy Polyakov
2009-11-14  1:12             ` Herbert Xu
2009-11-14  1:33               ` Gregory Haskins
2009-11-14  2:21                 ` Herbert Xu
2009-11-14  2:27                   ` Gregory Haskins
2009-11-14  2:43                     ` Herbert Xu
2009-11-14  2:45                     ` Stephen Hemminger
2009-11-14  2:51                       ` Herbert Xu
2009-11-14  5:27                       ` Gregory Haskins
2009-11-16 19:59                         ` Stephen Hemminger
2009-11-16 20:18                           ` Gregory Haskins
2009-11-14  3:09                     ` David Miller
2009-11-14  3:04                 ` David Miller
2009-11-16 14:22                   ` Gregory Haskins
2009-11-17  1:02                     ` Herbert Xu
2009-11-17 12:33                       ` Gregory Haskins
2009-11-16 17:08                   ` Avi Kivity
2009-11-10 14:45         ` Michael S. Tsirkin
2009-11-10 15:47           ` Gregory Haskins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091110115335.GC6989@redhat.com \
    --to=mst@redhat.com \
    --cc=ghaskins@novell.com \
    --cc=herbert.xu@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).