From: Ian Campbell <ian.campbell@citrix.com>
To: netdev@vger.kernel.org
Cc: "David Miller" <davem@davemloft.net>,
"Eric Dumazet" <eric.dumazet@gmail.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
"Wei Liu" <wei.liu2@citrix.com>,
xen-devel@lists.xen.org, "Ian Campbell" <ian.campbell@citrix.com>,
"Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Subject: [PATCH 06/10] net: add support for per-paged-fragment destructors
Date: Tue, 10 Apr 2012 15:26:20 +0100
Message-ID: <1334067984-7706-6-git-send-email-ian.campbell@citrix.com>
In-Reply-To: <1334067965.5394.22.camel@zakaz.uk.xensource.com>
Entities which care about the complete lifecycle of pages which they inject
into the network stack via an skb paged fragment can choose to set this
destructor in order to receive a callback when the stack is really finished
with a page (including all clones, retransmits, pull-ups, etc.).
This destructor will always be propagated alongside the struct page when
copying skb_frag_t->page. This is the reason I chose to embed the destructor in
a "struct { } page" within the skb_frag_t, rather than as a separate field,
since it allows existing code which propagates ->frags[N].page to Just
Work(tm).
When the destructor is present, page reference counting is done slightly
differently. No references are held by the network stack on the struct page
(it is up to the caller to manage this as necessary); instead the network
stack tracks references via the count embedded in the destructor structure.
When this reference count reaches zero the destructor is called and the
caller can take the necessary steps to release the page (i.e. release the
struct page reference itself).
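The counting scheme can be modelled in a few lines of userspace C. This is
an illustrative sketch only, not kernel code: the names loosely mirror the
patch, and `my_destroy`/`destroy_calls` are stand-ins for whatever the
caller actually does when the stack is finished with the page.

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace sketch of the destructor refcounting described above.
 * The stack holds references on the destructor's count rather than
 * on the struct page itself. */
struct frag_destructor {
	atomic_int ref;
	int (*destroy)(struct frag_destructor *destructor);
};

static int destroy_calls; /* records callback invocations for the sketch */

static int my_destroy(struct frag_destructor *destructor)
{
	(void)destructor;
	destroy_calls++; /* a real caller would release its page reference here */
	return 0;
}

/* Analogue of skb_frag_destructor_ref(): take a reference. */
static void frag_destructor_ref(struct frag_destructor *d)
{
	atomic_fetch_add(&d->ref, 1);
}

/* Analogue of skb_frag_destructor_unref(): drop a reference and invoke
 * the callback when the count reaches zero. */
static void frag_destructor_unref(struct frag_destructor *d)
{
	if (atomic_fetch_sub(&d->ref, 1) == 1)
		d->destroy(d);
}
```

The caller initialises ref to 1 before injecting the page; the stack then
refs and unrefs around clones, retransmits and pull-ups, and the callback
fires exactly once, on the final unref.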
The intention is that callers can use this callback to delay completion to
_their_ callers until the network stack has completely released the page, in
order to prevent use-after-free or modification of data pages which are still
in use by the stack.
It is allowable (indeed expected) for a caller to share a single destructor
instance between multiple pages injected into the stack, e.g. a group of
pages included in a single higher-level operation might share a destructor
which is used to complete that higher-level operation.
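A shared destructor would typically be embedded in the structure describing
the higher-level operation, so the callback can recover that structure and
complete it. The following self-contained userspace sketch illustrates the
shape; struct and function names (io_request, complete_request) are
invented for the example, and the count is simplified to a plain int:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of a shared destructor: every page of one higher-level
 * operation points at the same destructor, embedded in the request
 * that the callback completes. */
struct frag_destructor {
	int ref; /* simplified, non-atomic count */
	int (*destroy)(struct frag_destructor *destructor);
};

struct io_request {
	struct frag_destructor destructor; /* shared by all pages */
	int completed;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Called once the stack has released every fragment of the request. */
static int complete_request(struct frag_destructor *destructor)
{
	struct io_request *req =
		container_of(destructor, struct io_request, destructor);
	req->completed = 1;
	return 0;
}

static void frag_destructor_unref(struct frag_destructor *d)
{
	if (--d->ref == 0)
		d->destroy(d);
}
```

With ref initialised to the number of fragments injected, the request
completes only when the stack performs the final release, however the
individual pages are cloned or retransmitted in between.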
With this change and the previous two changes to shinfo alignment and field
ordering it is now the case that on a 64-bit system with 64-byte cache
lines, everything from nr_frags until the end of frags[0] is on the same
cacheline.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: netdev@vger.kernel.org
---
include/linux/skbuff.h |   43 +++++++++++++++++++++++++++++++++++++++++++
net/core/skbuff.c      |   17 +++++++++++++++++
2 files changed, 60 insertions(+), 0 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f0ae39c..6ac283e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -166,9 +166,15 @@ struct sk_buff;
typedef struct skb_frag_struct skb_frag_t;
+struct skb_frag_destructor {
+ atomic_t ref;
+ int (*destroy)(struct skb_frag_destructor *destructor);
+};
+
struct skb_frag_struct {
struct {
struct page *p;
+ struct skb_frag_destructor *destructor;
} page;
#if (BITS_PER_LONG > 32) || (PAGE_SIZE >= 65536)
__u32 page_offset;
@@ -1221,6 +1227,31 @@ static inline int skb_pagelen(const struct sk_buff *skb)
}
/**
+ * skb_frag_set_destructor - set destructor for a paged fragment
+ * @skb: buffer containing fragment to be initialised
+ * @i: paged fragment index to initialise
+ * @destroy: the destructor to use for this fragment
+ *
+ * Sets @destroy as the destructor to be called when all references to
+ * the frag @i in @skb (tracked over skb_clone, retransmit, pull-ups,
+ * etc) are released.
+ *
+ * When a destructor is set then reference counting is performed on
+ * @destroy->ref. When the ref reaches zero then @destroy->destroy
+ * will be called. The caller is responsible for holding and managing
+ * any other references (such as the struct page reference count).
+ *
+ * This function must be called before any use of skb_frag_ref() or
+ * skb_frag_unref().
+ */
+static inline void skb_frag_set_destructor(struct sk_buff *skb, int i,
+ struct skb_frag_destructor *destroy)
+{
+ skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+ frag->page.destructor = destroy;
+}
+
+/**
* __skb_fill_page_desc - initialise a paged fragment in an skb
* @skb: buffer containing fragment to be initialised
* @i: paged fragment index to initialise
@@ -1239,6 +1270,7 @@ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
frag->page.p = page;
+ frag->page.destructor = NULL;
frag->page_offset = off;
skb_frag_size_set(frag, size);
}
@@ -1743,6 +1775,9 @@ static inline struct page *skb_frag_page(const skb_frag_t *frag)
return frag->page.p;
}
+extern void skb_frag_destructor_ref(struct skb_frag_destructor *destroy);
+extern void skb_frag_destructor_unref(struct skb_frag_destructor *destroy);
+
/**
* __skb_frag_ref - take an addition reference on a paged fragment.
* @frag: the paged fragment
@@ -1751,6 +1786,10 @@ static inline struct page *skb_frag_page(const skb_frag_t *frag)
*/
static inline void __skb_frag_ref(skb_frag_t *frag)
{
+ if (unlikely(frag->page.destructor)) {
+ skb_frag_destructor_ref(frag->page.destructor);
+ return;
+ }
get_page(skb_frag_page(frag));
}
@@ -1774,6 +1813,10 @@ static inline void skb_frag_ref(struct sk_buff *skb, int f)
*/
static inline void __skb_frag_unref(skb_frag_t *frag)
{
+ if (unlikely(frag->page.destructor)) {
+ skb_frag_destructor_unref(frag->page.destructor);
+ return;
+ }
put_page(skb_frag_page(frag));
}
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b8a41d6..9ec88ce 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -349,6 +349,23 @@ struct sk_buff *dev_alloc_skb(unsigned int length)
}
EXPORT_SYMBOL(dev_alloc_skb);
+void skb_frag_destructor_ref(struct skb_frag_destructor *destroy)
+{
+ BUG_ON(destroy == NULL);
+ atomic_inc(&destroy->ref);
+}
+EXPORT_SYMBOL(skb_frag_destructor_ref);
+
+void skb_frag_destructor_unref(struct skb_frag_destructor *destroy)
+{
+ if (destroy == NULL)
+ return;
+
+ if (atomic_dec_and_test(&destroy->ref))
+ destroy->destroy(destroy);
+}
+EXPORT_SYMBOL(skb_frag_destructor_unref);
+
static void skb_drop_list(struct sk_buff **listp)
{
struct sk_buff *list = *listp;
--
1.7.2.5