From: David Miller <davem@davemloft.net>
To: alexander.duyck@gmail.com
Cc: eric.dumazet@gmail.com, netdev@vger.kernel.org
Subject: Re: [PATCH 00/16] Remove the ipv4 routing cache
Date: Thu, 26 Jul 2012 23:02:46 -0700 (PDT)
Message-ID: <20120726.230246.219188476590178857.davem@davemloft.net>
In-Reply-To: <20120726.200846.66786272076299783.davem@davemloft.net>
From: David Miller <davem@davemloft.net>
Date: Thu, 26 Jul 2012 20:08:46 -0700 (PDT)
> A lot of the overhead comes from write traffic that results from
> filling in the "fib_result" structure onto the callers stack.
Here's the longer analysis of how things are now.
There are several components to a route lookup result, and struct
fib_result tries to encapsulate all of this.
Another aspect is that our route tables are broken up into different
data structures which reference each other, in order to save space.
So the actual objects in the FIB trie are fib_alias structures, and
those point to fib_info. There is a many-to-one relationship between
FIB trie nodes and fib_info objects.
The idea is that many routes have the same set of nexthops, metrics,
preferred source address, etc.
So one thing we return in the fib_result is a pointer to the fib_info
and an index into the nexthop array (nh_sel). That's why we have all
of these funny accessors, FIB_RES_X(res), which essentially provide
res.fi->fib_nh[res.nh_sel].X
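
To make the accessor pattern concrete, here is a minimal sketch. The
structures are trimmed-down stand-ins rather than the real kernel
definitions (the fixed-size nexthop array and the make_result() helper
are assumptions made purely for illustration); only the shape of the
res.fi->fib_nh[res.nh_sel].X expansion is the point.

```c
#include <assert.h>
#include <stdint.h>

/* Trimmed-down stand-ins for the kernel structures; illustrative only. */
struct fib_nh {
	int      nh_oif;   /* output interface index */
	uint32_t nh_gw;    /* gateway address */
};

struct fib_info {
	int           fib_nhs;     /* number of valid entries in fib_nh[] */
	struct fib_nh fib_nh[4];   /* fixed size here, for simplicity */
};

struct fib_result {
	struct fib_info *fi;       /* shared by many routes */
	unsigned char    nh_sel;   /* index into fi->fib_nh[] */
};

/* Accessors in the FIB_RES_X(res) style: pointer plus index, every time. */
#define FIB_RES_NH(res)   ((res).fi->fib_nh[(res).nh_sel])
#define FIB_RES_GW(res)   (FIB_RES_NH(res).nh_gw)
#define FIB_RES_OIF(res)  (FIB_RES_NH(res).nh_oif)

/* Hypothetical helper: fill in a result the way a lookup would. */
static struct fib_result make_result(struct fib_info *fi, unsigned char sel)
{
	struct fib_result res = { .fi = fi, .nh_sel = sel };

	assert(sel < fi->fib_nhs);
	return res;
}
```

Every FIB_RES_X() use costs a load of res.fi, a load of res.nh_sel and
an indexed access, and the lookup has to write both fields into the
caller's fib_result first.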
Therefore one area of simplification would be to just return a pointer
to the FIB nexthop, rather than the fib_info pointer and the nexthop
index. We can get to the fib_info, if we need to, via the nh_parent
pointer of the nexthop.
It seems also that the res->scope value can be cribbed from the
fib_info as well.
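
As a rough sketch of that simplification, assuming a hypothetical
slimmed-down result type (fib_result_slim and the helper functions
below are made-up names, not existing kernel code), the lookup would
hand back only the nexthop pointer, and both the fib_info and the
scope would be derived from it through nh_parent:

```c
#include <assert.h>
#include <stddef.h>

struct fib_info;

/* Trimmed-down stand-ins; only the fields needed for the sketch. */
struct fib_nh {
	int              nh_oif;
	struct fib_info *nh_parent;   /* back-pointer to the owning fib_info */
};

struct fib_info {
	unsigned char fib_scope;
	int           fib_nhs;
	struct fib_nh fib_nh[2];      /* fixed size here, for simplicity */
};

/* Hypothetical result: one pointer instead of fib_info pointer + index. */
struct fib_result_slim {
	struct fib_nh *nh;
};

/* Hypothetical setup helper: wire each nexthop back to its parent. */
static void fib_info_init(struct fib_info *fi, unsigned char scope, int nhs)
{
	int i;

	assert(nhs >= 0 && nhs <= 2);
	fi->fib_scope = scope;
	fi->fib_nhs = nhs;
	for (i = 0; i < nhs; i++)
		fi->fib_nh[i].nh_parent = fi;
}

/* The fib_info is still reachable when somebody actually needs it. */
static struct fib_info *res_fib_info(const struct fib_result_slim *res)
{
	return res->nh->nh_parent;
}

/* The scope no longer has to be stored in the result at all. */
static unsigned char res_scope(const struct fib_result_slim *res)
{
	return res->nh->nh_parent->fib_scope;
}
```

The trade-off is one extra dependent load for callers that need the
fib_info or the scope, in exchange for fewer stores into the result on
every lookup.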
res->type is embedded in the fib_alias we select hanging off of the
FIB trie node. And the res->prefixlen is taken from the FIB trie
node.
res->tclassid is problematic, because it comes from the FIB rules
tables rather than the FIB trie. We used to store a full FIB rules
pointer in the fib_result, but I reduced it down to just the u32
tclassid.
This whole area, as well as the FIB trie lookup itself, is an area
ripe for a large number of small micro-optimizations that in the end
make its overhead much more reasonable.
One thing I haven't mentioned yet is that another part of the FIB
trie's overhead is that it does backtracking. The shorter prefixes
sit at the top of the trie, so a lookup traverses down until it can't
get a match any more, and then walks back up towards the root until it
finds a node that does match.
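
The descend-then-backtrack shape can be sketched on a much simpler
structure than the real one. What follows is a plain one-bit-per-level
binary trie with parent pointers, not the level-compressed FIB trie,
and every name in it (struct node, trie_insert, lpm_lookup, the static
node pool) is made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct node {
	struct node *parent;
	struct node *child[2];
	int          has_route;   /* a prefix ends at this node */
	int          route_id;
};

/* Tiny static pool so the sketch needs no allocator. */
#define MAX_NODES 64
static struct node pool[MAX_NODES];
static int pool_used;

static struct node *node_alloc(struct node *parent)
{
	struct node *n;

	if (pool_used >= MAX_NODES)
		return NULL;
	n = &pool[pool_used++];
	n->parent = parent;
	return n;
}

/* Insert prefix/plen by walking one address bit per level, MSB first. */
static int trie_insert(struct node *root, uint32_t prefix, int plen, int id)
{
	struct node *n = root;
	int depth;

	assert(plen >= 0 && plen <= 32);
	for (depth = 0; depth < plen; depth++) {
		int bit = (prefix >> (31 - depth)) & 1;

		if (!n->child[bit]) {
			n->child[bit] = node_alloc(n);
			if (!n->child[bit])
				return -1;
		}
		n = n->child[bit];
	}
	n->has_route = 1;
	n->route_id = id;
	return 0;
}

/* Longest-prefix match: go down as far as possible, then back up. */
static const struct node *lpm_lookup(const struct node *root, uint32_t daddr)
{
	const struct node *n = root;
	int depth = 0;

	/* Descend while a child exists for the next address bit. */
	while (depth < 32) {
		const struct node *next = n->child[(daddr >> (31 - depth)) & 1];

		if (!next)
			break;
		n = next;
		depth++;
	}

	/* Backtrack towards the root until a node carries a route. */
	while (n && !n->has_route)
		n = n->parent;

	return n;   /* NULL means no matching prefix */
}
```

With 10.0.0.0/8 and 10.1.0.0/16 installed, a lookup of 10.2.3.4 walks
several levels below the /8 node before it runs out of children, and
then has to climb all the way back to the /8 node to get its answer.
Those extra node visits are the backtracking cost.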