From: Rusty Russell <rusty@rustcorp.com.au>
To: "David S. Miller" <davem@davemloft.net>
Cc: kelly@au1.ibm.com, netdev@vger.kernel.org
Subject: Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
Date: Fri, 05 May 2006 11:31:08 +1000 [thread overview]
Message-ID: <1146792668.7681.59.camel@localhost.localdomain> (raw)
In-Reply-To: <20060504.162223.44735502.davem@davemloft.net>
On Thu, 2006-05-04 at 16:22 -0700, David S. Miller wrote:
> From: Kelly Daly <kelly@au1.ibm.com>
> Date: Thu, 4 May 2006 12:59:23 +1000
>
> > We DID write an infrastructure to resolve this issue, although it is more
> > complex than the dynamic descriptor scheme for userspace. And we want to
> > keep this simple - right?
>
> Yes.
>
> I wonder if it is possible to manage the buffer pool just like a SLAB
> cache to deal with the variable lifetimes. The system has a natural
> "working set" size of networking buffers at a given point in time and
> even the default net channel can grow to accomodate that with some
> kind of limit.
>
> This is kind of what I was alluding to in the past, in that we now
> have globals limits on system TCP socket memory when really what we
> want to do is have a set of global generic system packet memory
> limits.
>
> These two things can tie in together.
Hi Dave,
We kept a simple "used" bitmap, but to avoid the consumer touching it,
also put a "I am masquerading as an SKB" bit in the trailer, like so:
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .16405-linux-2.6.17-rc3-git7/include/linux/skbuff.h .16405-linux-2.6.17-rc3-git7.updated/include/linux/skbuff.h
--- .16405-linux-2.6.17-rc3-git7/include/linux/skbuff.h 2006-05-03 22:07:14.000000000 +1000
+++ .16405-linux-2.6.17-rc3-git7.updated/include/linux/skbuff.h 2006-05-03 22:07:15.000000000 +1000
@@ -133,7 +133,8 @@ struct skb_frag_struct {
*/
struct skb_shared_info {
atomic_t dataref;
- unsigned short nr_frags;
+ unsigned short nr_frags : 15;
+ unsigned int chan_as_skb : 1;
unsigned short tso_size;
unsigned short tso_segs;
unsigned short ufo_size;
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .16405-linux-2.6.17-rc3-git7/net/core/skbuff.c .16405-linux-2.6.17-rc3-git7.updated/net/core/skbuff.c
--- .16405-linux-2.6.17-rc3-git7/net/core/skbuff.c 2006-05-03 22:07:14.000000000 +1000
+++ .16405-linux-2.6.17-rc3-git7.updated/net/core/skbuff.c 2006-05-03 22:07:15.000000000 +1000
@@ -289,6 +289,7 @@ struct sk_buff *skb_netchan_graft(struct
skb_shinfo(skb)->ufo_size = 0;
skb_shinfo(skb)->ip6_frag_id = 0;
skb_shinfo(skb)->frag_list = NULL;
+ skb_shinfo(skb)->chan_as_skb = 1;
return skb;
}
@@ -328,7 +329,10 @@ void skb_release_data(struct sk_buff *sk
if (skb_shinfo(skb)->frag_list)
skb_drop_fraglist(skb);
- kfree(skb->head);
+ if (skb_shinfo(skb)->chan_as_skb)
+ skb_shinfo(skb)->chan_as_skb = 0;
+ else
+ kfree(skb->head);
}
}
Buffer allocation would be: find_first_bit, check that it's not actually
inside an skb, or otherwise find_next_bit. Assuming most buffers do not
go down default channel, this is efficient.
Problems:
1) it's still not cache-friendly with producers on multiple CPUs. We
could divide up the bitmap into per-cpu regions to try first to improve
cache behaviour.
2) In addition, we had every buffer one page large. This isn't
sufficient for jumbo frames, and wasteful for ethernet. So if we
statically assign descriptors -> buffers, we need to have multiple
sizes.
3) OTOH, if descriptor table is dynamic, we have cache issues again as
multiple people are writing to it, and it's not clear what we really
gain over direct pointers.
4) Grow/shrink can be done, but needs stop_machine, or maybe tricky RCU.
5) The killer for me: we can't use our scheme straight-to-userspace
anyway, since we can't trust the (user-writable) ringbuffer in deciding
what buffers to release. Since we need to store this somewhere, we need
a test in netchannel_enqueue. At which point, we might as well
translate to "descriptors" at that point, anyway (since descriptors are
only really needed for userspace). Something like:
tail = np->netchan_tail;
if (tail == np->netchan_head)
return -ENOMEM;
+ /* Write to userspace? They can't deref ptr anyway. */
+ if (np->shadow_ring && !netchan_local_buf(bp)) {
+ np->shadow_ring[tail] = bp;
+ bp = (void *)-1;
+ }
np->netchan_queue[tail++] = bp;
if (tail == NET_CHANNEL_ENTRIES)
(We don't have local buffers yet, but I'm assuming we'll use v. low
pointers for them). Userspace goes "desc number is in range, we can
access directly" or "desc number isn't, call into kernel to copy them
for us".
> So, are you still sure you want to do away with the descriptors for
> the default channel? Is the scheme I have outlined above doable or
> is there some critical barrier or some complexity issue which makes
> it undesirable?
I think it's simpler to build global alloc limiters on what we have.
The slab already has the nice lifetime and cache-friendly properties we
want, so we just have to write the limiting code. There's enough work
there to keep us busy 8)
Cheers,
Rusty.
--
ccontrol: http://ozlabs.org/~rusty/ccontrol
next prev parent reply other threads:[~2006-05-05 1:31 UTC|newest]
Thread overview: 90+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-04-26 11:47 [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch Kelly Daly
2006-04-26 7:33 ` David S. Miller
2006-04-27 3:31 ` Kelly Daly
2006-04-27 6:25 ` David S. Miller
2006-04-27 11:51 ` Evgeniy Polyakov
2006-04-27 20:09 ` David S. Miller
2006-04-28 6:05 ` Evgeniy Polyakov
2006-05-04 2:59 ` Kelly Daly
2006-05-04 23:22 ` David S. Miller
2006-05-05 1:31 ` Rusty Russell [this message]
2006-04-26 7:59 ` David S. Miller
2006-05-04 7:28 ` Kelly Daly
2006-05-04 23:11 ` David S. Miller
2006-05-05 2:48 ` Kelly Daly
2006-05-16 1:02 ` Kelly Daly
2006-05-16 1:05 ` David S. Miller
2006-05-16 1:15 ` Kelly Daly
2006-05-16 5:16 ` David S. Miller
2006-06-22 2:05 ` Kelly Daly
2006-06-22 3:58 ` James Morris
2006-06-22 4:31 ` Arnaldo Carvalho de Melo
2006-06-22 4:36 ` YOSHIFUJI Hideaki / 吉藤英明
2006-07-08 0:05 ` David Miller
2006-05-16 6:19 ` [1/1] netchannel subsystem Evgeniy Polyakov
2006-05-16 6:57 ` David S. Miller
2006-05-16 6:59 ` Evgeniy Polyakov
2006-05-16 7:06 ` David S. Miller
2006-05-16 7:15 ` Evgeniy Polyakov
2006-05-16 7:07 ` Evgeniy Polyakov
2006-05-16 17:34 ` [1/1] Netchannel subsyste Evgeniy Polyakov
2006-05-18 10:34 ` Netchannel subsystem update Evgeniy Polyakov
2006-05-20 15:52 ` Evgeniy Polyakov
2006-05-22 6:06 ` David S. Miller
2006-05-22 16:34 ` [Netchannel] Full TCP receiving support Evgeniy Polyakov
2006-05-24 9:38 ` Evgeniy Polyakov
-- strict thread matches above, loose matches on Subject: below --
2006-04-26 16:57 [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch Caitlin Bestler
2006-04-26 19:23 ` David S. Miller
2006-04-26 19:30 Caitlin Bestler
2006-04-26 19:46 ` Jeff Garzik
2006-04-26 22:40 ` David S. Miller
2006-04-27 3:40 ` Rusty Russell
2006-04-27 4:58 ` James Morris
2006-04-27 6:16 ` David S. Miller
2006-04-27 6:17 ` David S. Miller
2006-04-26 20:20 Caitlin Bestler
2006-04-26 22:35 ` David S. Miller
2006-04-26 22:53 Caitlin Bestler
2006-04-26 22:59 ` David S. Miller
2006-04-27 1:02 Caitlin Bestler
2006-04-27 6:08 ` David S. Miller
2006-04-27 6:17 ` Andi Kleen
2006-04-27 6:27 ` David S. Miller
2006-04-27 6:41 ` Andi Kleen
2006-04-27 7:52 ` David S. Miller
2006-04-27 21:12 Caitlin Bestler
2006-04-28 6:10 ` Evgeniy Polyakov
2006-04-28 7:20 ` David S. Miller
2006-04-28 7:32 ` Evgeniy Polyakov
2006-04-28 18:20 ` David S. Miller
2006-04-28 8:24 ` Rusty Russell
2006-04-28 19:21 ` David S. Miller
2006-04-28 22:04 ` Rusty Russell
2006-04-28 22:38 ` David S. Miller
2006-04-29 0:10 ` Rusty Russell
2006-04-28 15:59 Caitlin Bestler
2006-04-28 16:12 ` Evgeniy Polyakov
2006-04-28 19:09 ` David S. Miller
2006-04-28 17:02 Caitlin Bestler
2006-04-28 17:18 ` Stephen Hemminger
2006-04-28 17:29 ` Evgeniy Polyakov
2006-04-28 17:41 ` Stephen Hemminger
2006-04-28 17:55 ` Evgeniy Polyakov
2006-04-28 19:16 ` David S. Miller
2006-04-28 19:49 ` Stephen Hemminger
2006-04-28 19:59 ` Evgeniy Polyakov
2006-04-28 22:00 ` David S. Miller
2006-04-29 13:54 ` Evgeniy Polyakov
[not found] ` <20060429124451.GA19810@2ka.mipt.ru>
2006-05-01 21:32 ` David S. Miller
2006-05-02 7:08 ` Evgeniy Polyakov
2006-04-28 19:52 ` Evgeniy Polyakov
2006-04-28 19:10 ` David S. Miller
2006-04-28 20:46 ` Brent Cook
2006-04-28 17:25 ` Evgeniy Polyakov
2006-04-28 19:14 ` David S. Miller
2006-04-28 17:55 Caitlin Bestler
2006-04-28 22:17 ` Rusty Russell
2006-04-28 22:40 ` David S. Miller
2006-04-29 0:22 ` Rusty Russell
2006-04-29 6:46 ` David S. Miller
2006-04-28 23:45 Caitlin Bestler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1146792668.7681.59.camel@localhost.localdomain \
--to=rusty@rustcorp.com.au \
--cc=davem@davemloft.net \
--cc=kelly@au1.ibm.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).