From: Florian Weimer <fw@deneb.enyo.de>
To: "David S. Miller" <davem@redhat.com>
Cc: netdev@oss.sgi.com, linux-net@vger.kernel.org
Subject: Re: Route cache performance under stress
Date: Sun, 08 Jun 2003 13:39:49 +0200 [thread overview]
Message-ID: <87he70re62.fsf@deneb.enyo.de> (raw)
In-Reply-To: <20030526.233211.54217447.davem@redhat.com> (David S. Miller's message of "Mon, 26 May 2003 23:32:11 -0700 (PDT)")
"David S. Miller" <davem@redhat.com> writes:
> Of course, this will result in vastly decreased functionality (no
> arbitary netmasks, no policy-based routing, code will be fine-tuned
> for typical Internet routing tables), so this proposal definitely
> comes at a price.
>
> As a general purpose operating system, where people DO in fact use
> these features quite regularly,
Even non-CIDR netmasks? AFAIK, it's hard to find dedicated networking
devices (and routing protocols!) which support them. 8-/
Anyway, I've played a bit with something inspired by CEF (more
precisely speaking, one diagram in the IOS internals book and some IOS
diagnostic output).
Basically, it's a 256-way trie, with "adjacency information" at the
leaves (consisting of L2 addressing information and the prefix
length). The leaves contain a full list of child nodes which
reference to the leaf itself. This allows for branch-free routing
(see below). (A further optimization would not allocate the
self-referencing pointers for leaves which are at the fourth layer of
the trie, but this is unlikely to have a hughe performance impact.)
The trie has 7862 internal nodes for my copy of the Internet routing
table, which amounts to 8113584 bytes (excluding memory management
overhead, twice the value for 64 bit architectures). The numer of
internal nodes does not depend on the number of interfaces/peerings,
and prefix filtering based on their lengths (/27 or even /24) doesn't
make a huge difference either.
For each adjacency, space for the L2 addressing information is
required plus 256 pointers for the self-references (of course, for
each relevant prefix length, so you have a few kilobytes for a typical
peering).
The routing function looks like this:
struct cef_entry *
cef_route (struct cef_table *table, ipv4_t address)
{
unsigned char octet1 = address >> 24;
unsigned char octet2 = (address >> 16) & 0xFF;
unsigned char octet3 = (address >> 8) & 0xFF;
unsigned char octet4 = address & 0xFF;
struct cef_entry * entry1 = table->children[octet1];
struct cef_entry * entry2 = entry1->table[octet2];
struct cef_entry * entry3 = entry2->table[octet3];
struct cef_entry * entry4 = entry3->table[octet4];
return entry4;
}
For the full routing table with "maximum" adjacency information
(different L2 addressing information for each origin AS) and
"real-world" addresses (captured at the border of a medium-size
network, local addresses filtered), the function needs about 82 cycles
per routing decision on my Athlon XP (including function call
overhead). For random addresses, we have 155 cycles. In a simulation
of a moderate peering (only 94 adjacencies, simulated interfaces to
half a dozen AS concentrated in Germany), I measured 45 cycles per
routing decision for real-world traffic, and 70 cycles for random
addresses. (More peerings result in more adjacencies which lead to
fewer cache hits.)
You can save 1K (or 2K on 64-bit architectures) per adjacency if you
introduce data-dependent branches:
struct cef_entry *
cef_route (struct cef_table *table, ipv4_t address)
{
unsigned char octet1 = address >> 24;
struct cef_entry * entry1 = table->children[octet1];
if (entry1->prefix_length < 0) {
unsigned char octet2 = (address >> 16) & 0xFF;
struct cef_entry * entry2 = entry1->table[octet2];
if (entry2->prefix_length < 0) {
unsigned char octet3 = (address >> 8) & 0xFF;
struct cef_entry * entry3 = entry2->table[octet3];
if (entry3->prefix_length < 0) {
unsigned char octet4 = address & 0xFF;
struct cef_entry * entry4 = entry3->table[octet4];
return entry4;
} else {
return entry3;
}
} else {
return entry2;
}
} else {
return entry1;
}
}
However, this decreases performance (even on my Athlon XP with just
256 KB cache).
At the moment, I've got a userspace prototype for simulations which
can build the trie and make routing decisions. Removing entries is a
bit tricky and requires more data because formerly overridden prefixes
might have to be resurrected. I'm unsure which data structures should
be used to solve this problem. Memory management is a related
question, too. And locking. *sigh*
next prev parent reply other threads:[~2003-06-08 11:39 UTC|newest]
Thread overview: 217+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <87d6iit4g7.fsf@deneb.enyo.de>
[not found] ` <20030517.150933.74723581.davem@redhat.com>
[not found] ` <87iss87gqd.fsf@deneb.enyo.de>
2003-05-18 9:31 ` Route cache performance under stress David S. Miller
2003-05-19 17:36 ` Jamal Hadi
2003-05-19 19:18 ` Ralph Doncaster
2003-05-19 22:37 ` Jamal Hadi
2003-05-20 1:10 ` Simon Kirby
2003-05-20 1:14 ` David S. Miller
2003-05-20 1:23 ` Jamal Hadi
2003-05-20 1:24 ` David S. Miller
2003-05-20 2:13 ` Jamal Hadi
2003-05-20 5:01 ` Pekka Savola
2003-05-20 11:47 ` Jamal Hadi
2003-05-20 11:55 ` Pekka Savola
2003-05-20 6:46 ` David S. Miller
2003-05-20 12:04 ` Jamal Hadi
2003-05-21 0:36 ` David S. Miller
2003-05-21 13:03 ` Jamal Hadi
2003-05-23 5:42 ` David S. Miller
2003-05-22 8:40 ` Simon Kirby
2003-05-22 8:58 ` David S. Miller
2003-05-22 10:40 ` David S. Miller
2003-05-22 11:15 ` Martin Josefsson
2003-05-23 1:00 ` David S. Miller
2003-05-23 1:01 ` David S. Miller
2003-05-23 8:21 ` Andi Kleen
2003-05-23 8:22 ` David S. Miller
2003-05-23 9:03 ` Andi Kleen
2003-05-23 9:59 ` David S. Miller
2003-05-24 0:41 ` Andrew Morton
2003-05-26 2:29 ` David S. Miller
2003-05-22 11:44 ` Simon Kirby
2003-05-22 13:03 ` Martin Josefsson
2003-05-23 0:55 ` David S. Miller
2003-05-22 22:33 ` David S. Miller
2003-05-29 20:51 ` Simon Kirby
2003-06-02 10:58 ` Robert Olsson
2003-06-02 15:18 ` Simon Kirby
2003-06-02 16:36 ` Robert Olsson
2003-06-02 18:05 ` Simon Kirby
2003-06-09 17:21 ` David S. Miller
2003-06-09 17:19 ` David S. Miller
2003-05-23 0:59 ` David S. Miller
2003-05-26 7:18 ` Florian Weimer
2003-05-26 7:29 ` David S. Miller
2003-05-26 9:34 ` Florian Weimer
2003-05-27 6:32 ` David S. Miller
2003-06-08 11:39 ` Florian Weimer [this message]
2003-06-08 12:05 ` David S. Miller
2003-06-08 13:10 ` Florian Weimer
2003-06-08 23:49 ` Simon Kirby
2003-06-08 23:55 ` CIT/Paul
2003-06-09 3:15 ` Jamal Hadi
2003-06-09 5:27 ` CIT/Paul
2003-06-09 5:58 ` David S. Miller
2003-06-09 6:28 ` CIT/Paul
2003-06-09 6:28 ` David S. Miller
2003-06-09 16:23 ` Stephen Hemminger
2003-06-09 16:37 ` David S. Miller
2003-06-09 7:13 ` Simon Kirby
2003-06-09 8:10 ` CIT/Paul
2003-06-09 8:27 ` Simon Kirby
2003-06-09 19:38 ` CIT/Paul
2003-06-09 21:30 ` David S. Miller
2003-06-09 22:19 ` Simon Kirby
2003-06-09 22:54 ` Robert Olsson
2003-06-13 6:21 ` David S. Miller
2003-06-13 10:40 ` Robert Olsson
2003-06-15 6:36 ` David S. Miller
2003-06-17 17:03 ` Robert Olsson
2003-06-09 22:56 ` CIT/Paul
2003-06-09 23:05 ` David S. Miller
2003-06-10 13:41 ` Robert Olsson
2003-06-10 0:03 ` Jamal Hadi
2003-06-10 0:32 ` Ralph Doncaster
2003-06-10 1:15 ` Jamal Hadi
2003-06-10 2:45 ` Ralph Doncaster
2003-06-10 3:23 ` Ben Greear
2003-06-10 3:41 ` Ralph Doncaster
2003-06-10 18:10 ` Ralph Doncaster
2003-06-10 18:21 ` Ben Greear
2003-06-10 4:34 ` Simon Kirby
2003-06-10 11:01 ` Jamal Hadi
2003-06-10 11:28 ` Jamal Hadi
2003-06-10 13:18 ` Ralph Doncaster
2003-06-10 16:10 ` David S. Miller
2003-06-10 10:53 ` Jamal Hadi
2003-06-10 11:41 ` chas williams
2003-06-10 16:27 ` David S. Miller
2003-06-10 16:57 ` chas williams
2003-06-10 11:41 ` Pekka Savola
2003-06-10 11:58 ` John S. Denker
2003-06-10 12:12 ` Jamal Hadi
2003-06-10 16:33 ` David S. Miller
2003-06-10 12:07 ` Jamal Hadi
2003-06-10 15:29 ` Ralph Doncaster
2003-06-11 19:48 ` Florian Weimer
2003-06-11 19:40 ` CIT/Paul
2003-06-11 21:09 ` Florian Weimer
2003-06-10 13:10 ` Ralph Doncaster
2003-06-10 13:36 ` Jamal Hadi
2003-06-10 14:03 ` Ralph Doncaster
2003-06-10 16:38 ` David S. Miller
2003-06-10 16:39 ` David S. Miller
2003-06-10 18:41 ` Florian Weimer
2003-06-11 11:47 ` Was (Re: " Jamal Hadi
2003-06-11 18:41 ` Real World Routers 8-) Florian Weimer
2003-06-10 15:53 ` Route cache performance under stress David S. Miller
2003-06-10 16:15 ` 3c59x (was Route cache performance under stress) Bogdan Costescu
2003-06-10 16:20 ` Andi Kleen
2003-06-10 16:23 ` Jeff Garzik
2003-06-10 17:02 ` 3c59x David S. Miller
2003-06-10 17:16 ` 3c59x Jeff Garzik
2003-06-10 17:14 ` 3c59x David S. Miller
2003-06-10 17:25 ` 3c59x Jeff Garzik
2003-06-10 17:30 ` 3c59x David S. Miller
2003-06-10 19:20 ` 3c59x Jeff Garzik
2003-06-10 19:21 ` 3c59x David S. Miller
2003-06-10 17:18 ` 3c59x Andi Kleen
2003-06-10 17:29 ` 3c59x chas williams
2003-06-10 17:31 ` 3c59x David S. Miller
2003-06-10 17:39 ` 3c59x chas williams
2003-06-10 17:43 ` 3c59x David S. Miller
2003-06-11 17:52 ` Route cache performance under stress Robert Olsson
2003-06-10 1:53 ` Simon Kirby
2003-06-10 3:18 ` Ralph Doncaster
2003-06-10 16:06 ` David S. Miller
2003-06-10 15:56 ` David S. Miller
2003-06-10 16:45 ` 3c59x (was Route cache performance under stress) Bogdan Costescu
2003-06-10 16:49 ` Andi Kleen
2003-06-11 9:54 ` Robert Olsson
2003-06-11 10:05 ` Andi Kleen
2003-06-11 10:38 ` Robert Olsson
2003-06-11 12:08 ` Jamal Hadi
2003-06-10 17:12 ` 3c59x David S. Miller
2003-06-10 17:19 ` Route cache performance under stress Ralph Doncaster
2003-06-10 15:49 ` David S. Miller
2003-06-10 17:33 ` Ralph Doncaster
2003-06-10 17:32 ` David S. Miller
2003-06-10 18:34 ` Robert Olsson
2003-06-10 18:57 ` David S. Miller
2003-06-10 19:53 ` Robert Olsson
2003-06-10 21:36 ` CIT/Paul
2003-06-10 21:39 ` Ralph Doncaster
2003-06-10 22:20 ` David S. Miller
2003-06-10 23:58 ` Ralph Doncaster
2003-06-10 23:57 ` David S. Miller
2003-06-11 0:41 ` Ralph Doncaster
2003-06-11 0:58 ` David S. Miller
2003-06-11 0:58 ` David S. Miller
2003-06-11 0:51 ` Ben Greear
2003-06-11 1:01 ` David S. Miller
2003-06-11 1:15 ` Ben Greear
2003-06-11 1:22 ` David S. Miller
2003-06-11 1:51 ` Ben Greear
2003-06-11 3:33 ` David S. Miller
2003-06-11 11:54 ` gettime: Was (Re: " Jamal Hadi
2003-06-11 12:08 ` Andi Kleen
2003-06-12 3:30 ` David S. Miller
2003-06-12 6:32 ` Ben Greear
2003-06-12 8:46 ` David S. Miller
2003-06-11 15:57 ` Ben Greear
2003-06-12 3:29 ` David S. Miller
2003-06-11 1:17 ` Ralph Doncaster
2003-06-11 1:23 ` David S. Miller
2003-06-11 7:28 ` Andi Kleen
2003-06-11 7:25 ` Andi Kleen
2003-06-11 17:40 ` Robert Olsson
2003-06-13 5:38 ` David S. Miller
2003-06-13 10:22 ` Robert Olsson
2003-06-13 17:15 ` Robert Olsson
2003-06-12 6:45 ` David S. Miller
2003-06-12 13:56 ` Robert Olsson
2003-06-12 21:35 ` David S. Miller
2003-06-13 10:50 ` Robert Olsson
2003-06-10 0:56 ` Ralph Doncaster
2003-06-09 11:38 ` Jamal Hadi
2003-06-09 11:55 ` David S. Miller
2003-06-09 12:18 ` Jamal Hadi
2003-06-09 12:32 ` David S. Miller
2003-06-09 13:22 ` Jamal Hadi
2003-06-09 13:22 ` David S. Miller
2003-06-09 8:56 ` David S. Miller
2003-06-09 22:39 ` Robert Olsson
2003-06-09 6:25 ` David S. Miller
2003-06-09 6:59 ` Simon Kirby
2003-06-09 7:03 ` David S. Miller
2003-06-09 13:04 ` Ralph Doncaster
2003-06-09 13:26 ` Jamal Hadi
2003-06-09 5:44 ` David S. Miller
2003-06-09 5:51 ` CIT/Paul
2003-06-09 6:03 ` David S. Miller
2003-06-09 6:52 ` Simon Kirby
2003-06-09 6:56 ` David S. Miller
2003-06-09 7:36 ` Simon Kirby
2003-06-09 8:18 ` Simon Kirby
2003-06-09 8:22 ` David S. Miller
2003-06-09 8:31 ` Simon Kirby
2003-06-09 9:01 ` David S. Miller
2003-06-09 9:47 ` Andi Kleen
2003-06-09 10:03 ` David S. Miller
2003-06-09 10:13 ` Andi Kleen
2003-06-09 10:13 ` David S. Miller
2003-06-09 10:40 ` YOSHIFUJI Hideaki / 吉藤英明
2003-06-09 10:40 ` David S. Miller
2003-06-09 14:14 ` David S. Miller
2003-06-09 6:47 ` Simon Kirby
2003-06-09 6:49 ` David S. Miller
2003-06-09 13:28 ` Ralph Doncaster
2003-06-09 16:30 ` Simon Kirby
2003-06-17 20:58 ` Florian Weimer
2003-06-09 5:38 ` David S. Miller
2003-06-10 3:05 ` Steven Blake
2003-06-12 6:31 ` David S. Miller
2003-06-08 17:58 ` Pekka Savola
2003-05-21 0:09 ` Simon Kirby
2003-05-21 0:13 ` David S. Miller
2003-05-26 9:29 ` Florian Weimer
[not found] <8765pshpd4.fsf@deneb.enyo.de>
2003-04-05 18:17 ` Martin Josefsson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87he70re62.fsf@deneb.enyo.de \
--to=fw@deneb.enyo.de \
--cc=davem@redhat.com \
--cc=linux-net@vger.kernel.org \
--cc=netdev@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).