netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Rusty Russell <rusty@rustcorp.com.au>
To: "David S. Miller" <davem@davemloft.net>
Cc: kelly@au1.ibm.com, netdev@vger.kernel.org
Subject: Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
Date: Fri, 05 May 2006 11:31:08 +1000	[thread overview]
Message-ID: <1146792668.7681.59.camel@localhost.localdomain> (raw)
In-Reply-To: <20060504.162223.44735502.davem@davemloft.net>

On Thu, 2006-05-04 at 16:22 -0700, David S. Miller wrote:
> From: Kelly Daly <kelly@au1.ibm.com>
> Date: Thu, 4 May 2006 12:59:23 +1000
> 
> > We DID write an infrastructure to resolve this issue, although it is more 
> > complex than the dynamic descriptor scheme for userspace.  And we want to 
> > keep this simple - right?
> 
> Yes.
> 
> I wonder if it is possible to manage the buffer pool just like a SLAB
> cache to deal with the variable lifetimes.  The system has a natural
> "working set" size of networking buffers at a given point in time and
> even the default net channel can grow to accomodate that with some
> kind of limit.
> 
> This is kind of what I was alluding to in the past, in that we now
> have globals limits on system TCP socket memory when really what we
> want to do is have a set of global generic system packet memory
> limits.
> 
> These two things can tie in together.

Hi Dave,

	We kept a simple "used" bitmap, but to avoid the consumer touching it,
also put a "I am masquerading as an SKB" bit in the trailer, like so:

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .16405-linux-2.6.17-rc3-git7/include/linux/skbuff.h .16405-linux-2.6.17-rc3-git7.updated/include/linux/skbuff.h
--- .16405-linux-2.6.17-rc3-git7/include/linux/skbuff.h	2006-05-03 22:07:14.000000000 +1000
+++ .16405-linux-2.6.17-rc3-git7.updated/include/linux/skbuff.h	2006-05-03 22:07:15.000000000 +1000
@@ -133,7 +133,8 @@ struct skb_frag_struct {
  */
 struct skb_shared_info {
 	atomic_t	dataref;
-	unsigned short	nr_frags;
+	unsigned short	nr_frags : 15;
+	unsigned int	chan_as_skb : 1;
 	unsigned short	tso_size;
 	unsigned short	tso_segs;
 	unsigned short  ufo_size;
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .16405-linux-2.6.17-rc3-git7/net/core/skbuff.c .16405-linux-2.6.17-rc3-git7.updated/net/core/skbuff.c
--- .16405-linux-2.6.17-rc3-git7/net/core/skbuff.c	2006-05-03 22:07:14.000000000 +1000
+++ .16405-linux-2.6.17-rc3-git7.updated/net/core/skbuff.c	2006-05-03 22:07:15.000000000 +1000
@@ -289,6 +289,7 @@ struct sk_buff *skb_netchan_graft(struct
 	skb_shinfo(skb)->ufo_size = 0;
 	skb_shinfo(skb)->ip6_frag_id = 0;
 	skb_shinfo(skb)->frag_list = NULL;
+	skb_shinfo(skb)->chan_as_skb = 1;
 
 	return skb;
 }
@@ -328,7 +329,10 @@ void skb_release_data(struct sk_buff *sk
 		if (skb_shinfo(skb)->frag_list)
 			skb_drop_fraglist(skb);
 
-		kfree(skb->head);
+		if (skb_shinfo(skb)->chan_as_skb)
+			skb_shinfo(skb)->chan_as_skb = 0;
+		else
+			kfree(skb->head);
 	}
 }
 
Buffer allocation would be: find_first_bit, check that it's not actually
inside an skb, or otherwise find_next_bit.  Assuming most buffers do not
go down default channel, this is efficient.

Problems:
1) it's still not cache-friendly with producers on multiple CPUs.  We
could divide up the bitmap into per-cpu regions to try first to improve
cache behaviour.

2) In addition, we had every buffer one page large.  This isn't
sufficient for jumbo frames, and wasteful for ethernet.  So if we
statically assign descriptors -> buffers, we need to have multiple
sizes.

3) OTOH, if descriptor table is dynamic, we have cache issues again as
multiple people are writing to it, and it's not clear what we really
gain over direct pointers.

4) Grow/shrink can be done, but needs stop_machine, or maybe tricky RCU.

5) The killer for me: we can't use our scheme straight-to-userspace
anyway, since we can't trust the (user-writable) ringbuffer in deciding
what buffers to release.  Since we need to store this somewhere, we need
a test in netchannel_enqueue.  At which point, we might as well
translate to "descriptors" at that point, anyway (since descriptors are
only really needed for userspace).  Something like:

	tail = np->netchan_tail;
	if (tail == np->netchan_head)
		return -ENOMEM;

+	/* Write to userspace?  They can't deref ptr anyway. */
+	if (np->shadow_ring && !netchan_local_buf(bp)) {
+		np->shadow_ring[tail] = bp;
+		bp = (void *)-1;
+	}
	np->netchan_queue[tail++] = bp;
	if (tail == NET_CHANNEL_ENTRIES)

(We don't have local buffers yet, but I'm assuming we'll use v. low
pointers for them).  Userspace goes "desc number is in range, we can
access directly" or "desc number isn't, call into kernel to copy them
for us".

> So, are you still sure you want to do away with the descriptors for
> the default channel?  Is the scheme I have outlined above doable or
> is there some critical barrier or some complexity issue which makes
> it undesirable?

I think it's simpler to build global alloc limiters on what we have.
The slab already has the nice lifetime and cache-friendly properties we
want, so we just have to write the limiting code.  There's enough work
there to keep us busy 8)

Cheers,
Rusty.
-- 
 ccontrol: http://ozlabs.org/~rusty/ccontrol


  reply	other threads:[~2006-05-05  1:31 UTC|newest]

Thread overview: 90+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-04-26 11:47 [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch Kelly Daly
2006-04-26  7:33 ` David S. Miller
2006-04-27  3:31   ` Kelly Daly
2006-04-27  6:25     ` David S. Miller
2006-04-27 11:51       ` Evgeniy Polyakov
2006-04-27 20:09         ` David S. Miller
2006-04-28  6:05           ` Evgeniy Polyakov
2006-05-04  2:59       ` Kelly Daly
2006-05-04 23:22         ` David S. Miller
2006-05-05  1:31           ` Rusty Russell [this message]
2006-04-26  7:59 ` David S. Miller
2006-05-04  7:28   ` Kelly Daly
2006-05-04 23:11     ` David S. Miller
2006-05-05  2:48       ` Kelly Daly
2006-05-16  1:02         ` Kelly Daly
2006-05-16  1:05           ` David S. Miller
2006-05-16  1:15             ` Kelly Daly
2006-05-16  5:16           ` David S. Miller
2006-06-22  2:05             ` Kelly Daly
2006-06-22  3:58               ` James Morris
2006-06-22  4:31                 ` Arnaldo Carvalho de Melo
2006-06-22  4:36                 ` YOSHIFUJI Hideaki / 吉藤英明
2006-07-08  0:05               ` David Miller
2006-05-16  6:19           ` [1/1] netchannel subsystem Evgeniy Polyakov
2006-05-16  6:57             ` David S. Miller
2006-05-16  6:59               ` Evgeniy Polyakov
2006-05-16  7:06                 ` David S. Miller
2006-05-16  7:15                   ` Evgeniy Polyakov
2006-05-16  7:07                 ` Evgeniy Polyakov
2006-05-16 17:34               ` [1/1] Netchannel subsyste Evgeniy Polyakov
2006-05-18 10:34                 ` Netchannel subsystem update Evgeniy Polyakov
2006-05-20 15:52                   ` Evgeniy Polyakov
2006-05-22  6:06                     ` David S. Miller
2006-05-22 16:34                       ` [Netchannel] Full TCP receiving support Evgeniy Polyakov
2006-05-24  9:38                         ` Evgeniy Polyakov
  -- strict thread matches above, loose matches on Subject: below --
2006-04-26 16:57 [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch Caitlin Bestler
2006-04-26 19:23 ` David S. Miller
2006-04-26 19:30 Caitlin Bestler
2006-04-26 19:46 ` Jeff Garzik
2006-04-26 22:40   ` David S. Miller
2006-04-27  3:40 ` Rusty Russell
2006-04-27  4:58   ` James Morris
2006-04-27  6:16     ` David S. Miller
2006-04-27  6:17   ` David S. Miller
2006-04-26 20:20 Caitlin Bestler
2006-04-26 22:35 ` David S. Miller
2006-04-26 22:53 Caitlin Bestler
2006-04-26 22:59 ` David S. Miller
2006-04-27  1:02 Caitlin Bestler
2006-04-27  6:08 ` David S. Miller
2006-04-27  6:17   ` Andi Kleen
2006-04-27  6:27     ` David S. Miller
2006-04-27  6:41       ` Andi Kleen
2006-04-27  7:52         ` David S. Miller
2006-04-27 21:12 Caitlin Bestler
2006-04-28  6:10 ` Evgeniy Polyakov
2006-04-28  7:20   ` David S. Miller
2006-04-28  7:32     ` Evgeniy Polyakov
2006-04-28 18:20       ` David S. Miller
2006-04-28  8:24 ` Rusty Russell
2006-04-28 19:21   ` David S. Miller
2006-04-28 22:04     ` Rusty Russell
2006-04-28 22:38       ` David S. Miller
2006-04-29  0:10         ` Rusty Russell
2006-04-28 15:59 Caitlin Bestler
2006-04-28 16:12 ` Evgeniy Polyakov
2006-04-28 19:09   ` David S. Miller
2006-04-28 17:02 Caitlin Bestler
2006-04-28 17:18 ` Stephen Hemminger
2006-04-28 17:29   ` Evgeniy Polyakov
2006-04-28 17:41     ` Stephen Hemminger
2006-04-28 17:55       ` Evgeniy Polyakov
2006-04-28 19:16         ` David S. Miller
2006-04-28 19:49           ` Stephen Hemminger
2006-04-28 19:59             ` Evgeniy Polyakov
2006-04-28 22:00               ` David S. Miller
2006-04-29 13:54                 ` Evgeniy Polyakov
     [not found]                 ` <20060429124451.GA19810@2ka.mipt.ru>
2006-05-01 21:32                   ` David S. Miller
2006-05-02  7:08                     ` Evgeniy Polyakov
2006-04-28 19:52           ` Evgeniy Polyakov
2006-04-28 19:10   ` David S. Miller
2006-04-28 20:46     ` Brent Cook
2006-04-28 17:25 ` Evgeniy Polyakov
2006-04-28 19:14   ` David S. Miller
2006-04-28 17:55 Caitlin Bestler
2006-04-28 22:17 ` Rusty Russell
2006-04-28 22:40   ` David S. Miller
2006-04-29  0:22     ` Rusty Russell
2006-04-29  6:46       ` David S. Miller
2006-04-28 23:45 Caitlin Bestler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1146792668.7681.59.camel@localhost.localdomain \
    --to=rusty@rustcorp.com.au \
    --cc=davem@davemloft.net \
    --cc=kelly@au1.ibm.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).