linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Rusty Russell <rusty@rustcorp.com.au>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Evgeniy Polyakov <zbr@ioremap.net>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Vladislav Bolkhovitin <vst@vlnb.net>,
	linux-scsi@vger.kernel.org,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>,
	Mike Christie <michaelc@cs.wisc.edu>,
	Jeff Garzik <jeff@garzik.org>,
	Boaz Harrosh <bharrosh@panasas.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org, scst-devel@lists.sourceforge.net,
	Bart Van Assche <bart.vanassche@gmail.com>,
	"Nicholas A. Bellinger" <nab@linux-iscsi.org>,
	netdev@vger.kernel.org, David Miller <davem@davemloft.net>
Subject: Re: [PATCH][RFC 23/23]: Support for zero-copy TCP transmit of user space data
Date: Mon, 22 Dec 2008 11:13:22 +1030	[thread overview]
Message-ID: <200812221113.24044.rusty@rustcorp.com.au> (raw)
In-Reply-To: <494D49E6.9000703@goop.org>

On Sunday 21 December 2008 06:09:18 Jeremy Fitzhardinge wrote:
> Evgeniy Polyakov wrote:
> > Things should work fine, since pskb_expand_head() copies whole shared
> > info structure (and thus will copy destructor), get all pages and then
> > copy all pointers into the new skb, and then release old skb's data.
> >
> > So destructor for the pages should not rely on which skb it is called on
> > and check if pages are about to be really freed (i.e. check theirs
> > reference counter).
> >   
> 
> OK.
> 
> > __pskb_pull_tail() is tricky, it just puts some pages it does not want
> > to be present in the skb, but it could be possible to add there
> > destructor callback from the original skb with partial flag (or just
> > having destructor with two parameters: skb and page, and if page is not
> > NULL, then actually only given page is freed, otherwise the whole skb).
> >   
> 
> Yes, that doesn't sound too bad.

That would be one approach.  Actually, my patch solved this by keeping a
parent ref in various cases if the parent had a destructor: we only destroy
the parent when all the clones are gone.

Here's the patch for reference:

net: add destructor for skb data.

If we want to notify something when an skb is truly finished (such as
for tun vringfd support), we need a destructor on the data.

This turns out to be slightly non-trivial as fragments from one skb
get copied to another skb: if the first skb has a destructor (or its
parent does) we need to keep a reference to it and destroy it only
when (all the) children are destroyed.  We add an 'orig' pointer to
the skb_shared_info to do this.

But there's currently no way to get from the shinfo to the head (to
kfree it), so we add a 'len' field.  A better alternative to this
might be to move the skb_shared_info to before the head of the skb data.

Note that the destructor is responsible for calling kfree: for the tun
device, this is critical since the destructor can be called from any
context and it has to do a copy_to_user, so it queues the skb.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 include/linux/skbuff.h |    9 ++++++
 net/core/skbuff.c      |   66 ++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 64 insertions(+), 11 deletions(-)

diff -r 3948a0050f81 include/linux/skbuff.h
--- a/include/linux/skbuff.h	Tue May 06 21:18:01 2008 +1000
+++ b/include/linux/skbuff.h	Thu May 08 13:12:47 2008 +1000
@@ -146,8 +146,12 @@ struct skb_shared_info {
 	unsigned short	gso_segs;
 	unsigned short  gso_type;
 	__be32          ip6_frag_id;
+	unsigned int	len; /* Subtract from this shinfo to find skb->head */
 	struct sk_buff	*frag_list;
 	skb_frag_t	frags[MAX_SKB_FRAGS];
+	struct skb_shared_info *orig;
+	/* This is responsible for kfree() of header. */
+	void		(*destructor)(struct skb_shared_info *);
 };
 
 /* We divide dataref into two halves.  The higher 16 bits hold references
@@ -827,6 +831,11 @@ static inline void skb_fill_page_desc(st
 #define SKB_PAGE_ASSERT(skb) 	BUG_ON(skb_shinfo(skb)->nr_frags)
 #define SKB_FRAG_ASSERT(skb) 	BUG_ON(skb_shinfo(skb)->frag_list)
 #define SKB_LINEAR_ASSERT(skb)  BUG_ON(skb_is_nonlinear(skb))
+
+static inline unsigned char *skb_shinfo_to_head(struct skb_shared_info *shinfo)
+{
+	return (unsigned char *)shinfo - shinfo->len;
+}
 
 #ifdef NET_SKBUFF_DATA_USES_OFFSET
 static inline unsigned char *skb_tail_pointer(const struct sk_buff *skb)
diff -r 3948a0050f81 net/core/skbuff.c
--- a/net/core/skbuff.c	Tue May 06 21:18:01 2008 +1000
+++ b/net/core/skbuff.c	Thu May 08 13:12:47 2008 +1000
@@ -218,6 +218,9 @@ struct sk_buff *__alloc_skb(unsigned int
 	shinfo->gso_type = 0;
 	shinfo->ip6_frag_id = 0;
 	shinfo->frag_list = NULL;
+	shinfo->destructor = NULL;
+	shinfo->orig = NULL;
+	shinfo->len = skb_end_pointer(skb) - skb->head;
 
 	if (fclone) {
 		struct sk_buff *child = skb + 1;
@@ -311,21 +314,53 @@ static void skb_clone_fraglist(struct sk
 		skb_get(list);
 }
 
+static void shinfo_put(struct skb_shared_info *shinfo, bool nohdr, bool clone)
+{
+	struct skb_shared_info *orig;
+
+	do {
+		if (clone &&
+		    atomic_sub_return(nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,
+				      &shinfo->dataref)) {
+			return;
+		}
+
+		if (shinfo->nr_frags) {
+			int i;
+			for (i = 0; i < shinfo->nr_frags; i++)
+				put_page(shinfo->frags[i].page);
+		}
+
+		if (shinfo->frag_list)
+			skb_drop_list(&shinfo->frag_list);
+
+		orig = shinfo->orig;
+		if (shinfo->destructor)
+			shinfo->destructor(shinfo);
+		else
+			kfree(skb_shinfo_to_head(shinfo));
+
+		/* We hold a payload reference to our parent. */
+		nohdr = true;
+		clone = true;
+	} while ((shinfo = orig) != NULL);
+}
+
 static void skb_release_data(struct sk_buff *skb)
 {
-	if (!skb->cloned ||
-	    !atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,
-			       &skb_shinfo(skb)->dataref)) {
-		if (skb_shinfo(skb)->nr_frags) {
-			int i;
-			for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
-				put_page(skb_shinfo(skb)->frags[i].page);
-		}
+	shinfo_put(skb_shinfo(skb), skb->nohdr, skb->cloned);
+}
 
-		if (skb_shinfo(skb)->frag_list)
-			skb_drop_fraglist(skb);
+/* Now hold reference to older data, if has a destructor (recursively). */
+static void skb_ref_data_parent(struct sk_buff *parent,
+				struct skb_shared_info *shinfo)
+{
+	struct skb_shared_info *pshinfo = skb_shinfo(parent);
 
-		kfree(skb->head);
+	if (pshinfo->destructor || pshinfo->orig) {
+		shinfo->orig = pshinfo;
+		atomic_add((1 << SKB_DATAREF_SHIFT) + 1, &pshinfo->dataref);
+		parent->cloned = 1;
 	}
 }
 
@@ -656,6 +691,7 @@ struct sk_buff *pskb_copy(struct sk_buff
 			get_page(skb_shinfo(n)->frags[i].page);
 		}
 		skb_shinfo(n)->nr_frags = i;
+		skb_ref_data_parent(skb, skb_shinfo(n));
 	}
 
 	if (skb_shinfo(skb)->frag_list) {
@@ -721,6 +757,8 @@ int pskb_expand_head(struct sk_buff *skb
 	if (skb_shinfo(skb)->frag_list)
 		skb_clone_fraglist(skb);
 
+	skb_ref_data_parent(skb, (void *)(data + size));
+
 	skb_release_data(skb);
 
 	off = (data + nhead) - skb->head;
@@ -743,6 +781,8 @@ int pskb_expand_head(struct sk_buff *skb
 	skb->hdr_len  = 0;
 	skb->nohdr    = 0;
 	atomic_set(&skb_shinfo(skb)->dataref, 1);
+	skb_shinfo(skb)->len = skb_end_pointer(skb) - skb->head;
+	skb_shinfo(skb)->destructor = NULL;
 	return 0;
 
 nodata:
@@ -1962,6 +2002,8 @@ void skb_split(struct sk_buff *skb, stru
 		skb_split_inside_header(skb, skb1, len, pos);
 	else		/* Second chunk has no header, nothing to copy. */
 		skb_split_no_header(skb, skb1, len, pos);
+
+	skb_ref_data_parent(skb, skb_shinfo(skb1));
 }
 
 /**
@@ -2334,6 +2376,8 @@ struct sk_buff *skb_segment(struct sk_bu
 		nskb->data_len = len - hsize;
 		nskb->len += nskb->data_len;
 		nskb->truesize += nskb->data_len;
+
+		skb_ref_data_parent(skb, skb_shinfo(nskb));
 	} while ((offset += len) < skb->len);
 
 	return segs;

  reply	other threads:[~2008-12-22  0:43 UTC|newest]

Thread overview: 106+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-12-10 18:26 [PATCH][RFC 0/23] New SCSI target framework (SCST) and 4 target drivers Vladislav Bolkhovitin
2008-12-10 18:28 ` [PATCH][RFC 1/23]: SCST public headers Vladislav Bolkhovitin
2008-12-10 18:30 ` [PATCH][RFC 2/23]: SCST core Vladislav Bolkhovitin
2008-12-10 19:12   ` Sam Ravnborg
2008-12-11 17:28     ` Vladislav Bolkhovitin
2008-12-11 21:09       ` Sam Ravnborg
2008-12-12 19:24         ` Vladislav Bolkhovitin
2008-12-12 21:50           ` Steven Rostedt
     [not found]             ` <20081212230523.GB4775@ghostprotocols.net>
2008-12-13  1:25               ` Frédéric Weisbecker
2008-12-13  1:27                 ` Frédéric Weisbecker
2008-12-13 14:46             ` Vladislav Bolkhovitin
2008-12-14  0:35               ` Frédéric Weisbecker
2008-12-16 21:49                 ` Ingo Molnar
2008-12-16 22:13                   ` Frédéric Weisbecker
2008-12-16 22:22                     ` Ingo Molnar
2008-12-16 23:46                       ` Frédéric Weisbecker
2008-12-18 11:45                 ` Vladislav Bolkhovitin
2008-12-20 13:06                   ` Frédéric Weisbecker
2008-12-23 19:11                     ` Vladislav Bolkhovitin
2008-12-27 11:20                       ` Ingo Molnar
2008-12-30 17:13                         ` Vladislav Bolkhovitin
2008-12-30 21:03                           ` Frederic Weisbecker
2008-12-30 21:35                             ` Steven Rostedt
2008-12-10 18:34 ` [PATCH][RFC 3/23]: SCST core docs Vladislav Bolkhovitin
2008-12-10 18:36 ` [PATCH][RFC 4/23]: SCST debug support Vladislav Bolkhovitin
2008-12-10 18:37 ` [PATCH][RFC 5/23]: SCST /proc interface Vladislav Bolkhovitin
2008-12-11 20:23   ` Nicholas A. Bellinger
2008-12-12 19:23     ` Vladislav Bolkhovitin
2008-12-10 18:39 ` [PATCH][RFC 6/23]: SCST SGV cache Vladislav Bolkhovitin
2008-12-10 18:40 ` [PATCH][RFC 7/23]: SCST integration into the kernel Vladislav Bolkhovitin
2008-12-10 18:42 ` [PATCH][RFC 8/23]: SCST pass-through backend handlers Vladislav Bolkhovitin
2008-12-10 18:43 ` [PATCH][RFC 9/23]: SCST virtual disk backend handler Vladislav Bolkhovitin
2008-12-10 18:44 ` [PATCH][RFC 10/23]: SCST user space " Vladislav Bolkhovitin
2008-12-10 18:46 ` [PATCH][RFC 11/23]: Makefile for SCST backend handlers Vladislav Bolkhovitin
2008-12-10 18:47 ` [PATCH][RFC 12/23]: Patch to add necessary support for SCST pass-through Vladislav Bolkhovitin
2008-12-10 18:49 ` [PATCH][RFC 13/23]: Export of alloc_io_context() function Vladislav Bolkhovitin
2008-12-11 13:34   ` Jens Axboe
2008-12-11 18:17     ` Vladislav Bolkhovitin
2008-12-11 18:41       ` Jens Axboe
2008-12-11 19:00         ` Vladislav Bolkhovitin
2008-12-11 19:06           ` Jens Axboe
2008-12-12 19:16             ` Vladislav Bolkhovitin
2008-12-10 18:50 ` [PATCH][RFC 14/23]: Necessary functionality in qla2xxx driver to support target mode Vladislav Bolkhovitin
2008-12-10 18:51 ` [PATCH][RFC 15/23]: QLogic target driver Vladislav Bolkhovitin
2008-12-10 18:54 ` [PATCH][RFC 16/23]: Documentation for " Vladislav Bolkhovitin
2008-12-10 18:55 ` [PATCH][RFC 17/23]: InfiniBand SRP " Vladislav Bolkhovitin
2008-12-10 18:57 ` [PATCH][RFC 18/23]: Documentation for " Vladislav Bolkhovitin
2008-12-10 18:58 ` [PATCH][RFC 19/23]: scst_local " Vladislav Bolkhovitin
2008-12-10 19:00 ` [PATCH][RFC 20/23]: Documentation for scst_local driver Vladislav Bolkhovitin
2008-12-10 19:01 ` [PATCH][RFC 21/23]: iSCSI target driver Vladislav Bolkhovitin
2008-12-11 22:55   ` Nicholas A. Bellinger
2008-12-11 22:59     ` Nicholas A. Bellinger
2008-12-12 19:26     ` Vladislav Bolkhovitin
2008-12-13 10:03       ` Nicholas A. Bellinger
2008-12-13 10:11         ` Bart Van Assche
2008-12-13 10:16           ` Nicholas A. Bellinger
2008-12-13 10:27             ` Bart Van Assche
2008-12-13 15:01             ` Vladislav Bolkhovitin
2008-12-13 14:57         ` Vladislav Bolkhovitin
2008-12-10 19:02 ` [PATCH][RFC 22/23]: Documentation for iSCSI-SCST Vladislav Bolkhovitin
2008-12-10 19:04 ` [PATCH][RFC 23/23]: Support for zero-copy TCP transmit of user space data Vladislav Bolkhovitin
2008-12-10 21:45   ` Evgeniy Polyakov
2008-12-11 18:16     ` Vladislav Bolkhovitin
2008-12-11 19:12       ` James Bottomley
2008-12-12 19:25         ` Vladislav Bolkhovitin
2008-12-12 19:37           ` James Bottomley
2008-12-15 17:58             ` Vladislav Bolkhovitin
2008-12-15 23:18               ` Christoph Hellwig
2008-12-16 18:57                 ` Vladislav Bolkhovitin
2008-12-18 18:35                   ` [RFC]: " Vladislav Bolkhovitin
2008-12-18 18:43                     ` David M. Lloyd
2008-12-19 17:37                       ` Vladislav Bolkhovitin
2008-12-19 19:07                         ` Jens Axboe
2008-12-19 19:17                           ` Vladislav Bolkhovitin
2008-12-19 19:27                             ` Jens Axboe
2008-12-19 21:58                               ` Evgeniy Polyakov
2008-12-23 19:11                               ` Vladislav Bolkhovitin
2008-12-19 11:27                     ` Andi Kleen
2008-12-19 17:38                       ` Vladislav Bolkhovitin
2008-12-19 18:00                         ` Andi Kleen
2008-12-19 17:57                           ` Vladislav Bolkhovitin
2008-12-16 16:00     ` [PATCH][RFC 23/23]: " Bart Van Assche
2008-12-16 17:41       ` Evgeniy Polyakov
2008-12-19 20:21   ` Jeremy Fitzhardinge
2008-12-19 22:04     ` Evgeniy Polyakov
2008-12-19 22:21       ` Jeremy Fitzhardinge
2008-12-19 22:33         ` Evgeniy Polyakov
2008-12-20  1:56           ` Jeremy Fitzhardinge
2008-12-20  2:02             ` Herbert Xu
2008-12-20  6:14               ` Jeremy Fitzhardinge
2008-12-20  6:51                 ` Herbert Xu
2008-12-20  7:43                   ` Jeremy Fitzhardinge
2008-12-20  8:10                     ` Herbert Xu
2008-12-20 10:32                       ` Evgeniy Polyakov
2008-12-20 19:39                         ` Jeremy Fitzhardinge
2008-12-22  0:43                           ` Rusty Russell [this message]
2008-12-23 19:14                             ` Vladislav Bolkhovitin
2008-12-23 19:16                         ` Vladislav Bolkhovitin
2008-12-23 21:38                           ` Evgeniy Polyakov
2008-12-24 14:37                             ` Vladislav Bolkhovitin
2008-12-24 14:44                               ` Evgeniy Polyakov
2008-12-24 17:46                                 ` Vladislav Bolkhovitin
2008-12-24 18:08                                   ` Evgeniy Polyakov
2008-12-30 17:37                                     ` Vladislav Bolkhovitin
2008-12-30 21:35                                       ` Evgeniy Polyakov
2008-12-23 19:13     ` Vladislav Bolkhovitin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200812221113.24044.rusty@rustcorp.com.au \
    --to=rusty@rustcorp.com.au \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=akpm@linux-foundation.org \
    --cc=bart.vanassche@gmail.com \
    --cc=bharrosh@panasas.com \
    --cc=davem@davemloft.net \
    --cc=fujita.tomonori@lab.ntt.co.jp \
    --cc=herbert@gondor.apana.org.au \
    --cc=jeff@garzik.org \
    --cc=jeremy@goop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=michaelc@cs.wisc.edu \
    --cc=nab@linux-iscsi.org \
    --cc=netdev@vger.kernel.org \
    --cc=scst-devel@lists.sourceforge.net \
    --cc=torvalds@linux-foundation.org \
    --cc=vst@vlnb.net \
    --cc=zbr@ioremap.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).