* state of rtcache removal...
@ 2011-02-17  0:08 David Miller
From: David Miller @ 2011-02-17  0:08 UTC
  To: netdev


So I've been testing out the routing cache removal patch to see
what the impact is on performance.

I'm using a UDP flood to a single IP address over a dummy interface
with hard coded ARP entries, so that pretty much just the main IP
output and routing paths are being exercised.

I cooked up the UDP flood tool based on a description Eric Dumazet
sent me of a similar utility he uses for testing.  I've included the
code for this tool at the end of this email, along with the dummy
interface setup script.   Basically, you go:

bash# ./udpflood_setup.sh
bash# time ./udpflood -l 10000 10.2.2.11

The IP output path is about twice as slow with the routing cache
removed entirely.  Here are the numbers I have:

net-next-2.6, rt_cache on:

davem@maramba:~$ time udpflood -l 10000000 10.2.2.11
real		 1m47.012s
user		 0m8.670s
sys		 1m38.370s

net-next-2.6, rt_cache turned off via sysctl:

davem@maramba:~$ time udpflood -l 10000000 10.2.2.11
real		 3m12.662s
user		 0m9.490s
sys		 3m3.220s

net-next-2.6 + "BONUS" rt_cache deletion patch:

maramba:/home/davem# time ./bin/udpflood -l 10000000 10.2.2.11
real		     3m9.921s
user		     0m9.520s
sys		     3m0.440s

I then worked on some simplifications of the code in net/ipv4/route.c
that remains after the cache removal.  I'll post those patches after
I've chewed on them some more, but they knock a couple of seconds back
off of the benchmark.

The profile output is what you'd expect, with fib_table_lookup() topping
the charts at ~10% of the time.

What might not be initially apparent is that each output route lookup
results in two calls to fib_table_lookup() and thus two trie lookups.
Why?  Because we have two routing tables (three with IP_MULTIPLE_TABLES
enabled) that get searched: first the LOCAL table, then the MAIN table
(and, with multiple tables enabled, the DEFAULT table).  And most
external outgoing routes sit in the MAIN table.

We do this so we can store all the interface address, network,
broadcast, loopback, et al. routes in the LOCAL table, and all
globally visible routes in the MAIN table.
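
For reference, without multiple tables the fib_lookup() wrapper is
essentially just the following (paraphrased from memory, so the exact
arguments may differ from the net-next source):

static inline int fib_lookup(struct net *net, const struct flowi *flp,
			     struct fib_result *res)
{
	struct fib_table *table;

	/* First trie lookup: the LOCAL table. */
	table = fib_get_table(net, RT_TABLE_LOCAL);
	if (!fib_table_lookup(table, flp, res, FIB_LOOKUP_NOREF))
		return 0;

	/* Second trie lookup: the MAIN table, where external routes live. */
	table = fib_get_table(net, RT_TABLE_MAIN);
	if (!fib_table_lookup(table, flp, res, FIB_LOOKUP_NOREF))
		return 0;

	return -ENETUNREACH;
}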

Anyways, the long and short of this is that route lookups take two
trie lookups instead of just one.  On input there are even more, for
source address validation done by fib_validate_source().  That can be
up to 4 more fib_table_lookup() invocations.

Add in another level of complexity if you have a series of FIB rules
installed.

So, to me, this means that spending time micro-optimizing fib_trie is
not going to help much.  Getting rid of that multiplier somehow, on
the other hand, might.

I plan to play with some ideas, such as sticking fib_alias entries into
the flow cache and consulting/populating the flow cache on fib_lookup()
calls.
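
The rough shape of that idea would be something like the sketch below.
The fib_flow_cache_find()/fib_flow_cache_insert() helpers are purely
hypothetical names for whatever the flow cache hook ends up being:

/* Hypothetical sketch only: consult a cache keyed by the flow before
 * doing the full lookup, and populate it on a miss. */
static int cached_fib_lookup(struct net *net, const struct flowi *flp,
			     struct fib_result *res)
{
	struct fib_result *cached = fib_flow_cache_find(flp);	/* hypothetical */
	int err;

	if (cached) {
		*res = *cached;			/* hit: no trie walks at all */
		return 0;
	}

	err = fib_lookup(net, flp, res);	/* miss: LOCAL + MAIN trie lookups */
	if (!err)
		fib_flow_cache_insert(flp, res);	/* hypothetical */
	return err;
}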

-------------------- udpflood.c --------------------
/* An adaptation of Eric Dumazet's udpflood tool.  */

#define _GNU_SOURCE	/* must be defined before any system header */

#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>	/* malloc(), free() */
#include <string.h>
#include <unistd.h>	/* close() */
#include <errno.h>

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#include <getopt.h>

static int usage(void)
{
	printf("usage: udpflood [ -l count ] [ -m message_size ] IP_ADDRESS\n");
	return -1;
}

static int send_packets(in_addr_t addr, int port, int count, int msg_sz)
{
	char *msg = malloc(msg_sz);
	struct sockaddr_in saddr;
	int fd, i, err;

	if (!msg)
		return -ENOMEM;

	memset(msg, 0, msg_sz);

	memset(&saddr, 0, sizeof(saddr));
	saddr.sin_family = AF_INET;
	saddr.sin_port = htons(port);
	saddr.sin_addr.s_addr = addr;

	fd = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
	if (fd < 0) {
		perror("socket");
		err = fd;
		goto out_nofd;
	}
	err = connect(fd, (struct sockaddr *) &saddr, sizeof(saddr));
	if (err < 0) {
		perror("connect");
		goto out;	/* "out" closes the fd; don't close it twice */
	}
	/* Pass the destination explicitly even though the socket is
	 * connected, so every send takes a full output route lookup. */
	for (i = 0; i < count; i++) {
		err = sendto(fd, msg, msg_sz, 0,
			     (struct sockaddr *) &saddr, sizeof(saddr));
		if (err < 0) {
			perror("sendto");
			goto out;
		}
	}

	err = 0;
out:
	close(fd);
out_nofd:
	free(msg);
	return err;
}

int main(int argc, char **argv)
{
	int port, msg_sz, count, ret;
	in_addr_t addr;

	port = 6000;
	msg_sz = 32;
	count = 10000000;

	while ((ret = getopt(argc, argv, "l:s:p:")) >= 0) {
		switch (ret) {
		case 'l':
			sscanf(optarg, "%d", &count);
			break;
		case 's':
			sscanf(optarg, "%d", &msg_sz);
			break;
		case 'p':
			sscanf(optarg, "%d", &port);
			break;
		case '?':
			return usage();
		}
	}

	if (!argv[optind])
		return usage();

	addr = inet_addr(argv[optind]);
	if (addr == INADDR_NONE)
		return usage();

	return send_packets(addr, port, count, msg_sz);
}

-------------------- udpflood_setup.sh --------------------
#!/bin/sh
modprobe dummy
ifconfig dummy0 10.2.2.254 netmask 255.255.255.0 up

for f in $(seq 11 26)
do
 arp -H ether -i dummy0 -s 10.2.2.$f 00:00:0c:07:ac:$f
done

* Re: state of rtcache removal...
From: Tom Herbert @ 2011-02-17  2:59 UTC
  To: David Miller; +Cc: netdev

On Wed, Feb 16, 2011 at 4:08 PM, David Miller <davem@davemloft.net> wrote:
>
> So I've been testing out the routing cache removal patch to see
> what the impact is on performance.
>
Interesting results.

I assume that this test is purposely using sendto on a connected
socket to force sendmsg to go through the route lookup :-), so this is
showing what the benefit of the rtcache is when the cache hit rate is
100%.  For comparison, it might be interesting to see what the
performance is when the hit rate is < 100%.  For instance, we often
see hit rates < 20% on front-end servers.  This could be done by
flooding random addresses in 10/8 or even 0/0...  I'm hoping that
without the rtcache, performance actually improves in that case!
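
For example, send_packets() could be swapped for something roughly
like this (untested sketch; it assumes a covering route such as 10/8
pointing at dummy0 has been added, and it needs <stdlib.h> for
random() and calloc()):

/* Untested sketch: like send_packets(), but picks a random destination
 * in 10.0.0.0/8 for every packet so the cache hit rate stays low. */
static int send_random_packets(int port, int count, int msg_sz)
{
	struct sockaddr_in saddr;
	char *msg = calloc(1, msg_sz);
	int fd, i, err = 0;

	if (!msg)
		return -ENOMEM;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0) {
		perror("socket");
		free(msg);
		return -1;
	}

	memset(&saddr, 0, sizeof(saddr));
	saddr.sin_family = AF_INET;
	saddr.sin_port = htons(port);

	for (i = 0; i < count; i++) {
		/* Random host in 10/8; keep the low bit set to avoid .0. */
		saddr.sin_addr.s_addr =
			htonl((10u << 24) | (random() & 0x00ffffff) | 1);
		err = sendto(fd, msg, msg_sz, 0,
			     (struct sockaddr *) &saddr, sizeof(saddr));
		if (err < 0) {
			perror("sendto");
			break;
		}
	}

	close(fd);
	free(msg);
	return err < 0 ? err : 0;
}

Leaving the socket unconnected here still forces a full output route
lookup per send, the same effect as passing the address to sendto() on
the connected socket in the original tool.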

Tom

* Re: state of rtcache removal...
From: David Miller @ 2011-02-17  3:27 UTC
  To: therbert; +Cc: netdev

From: Tom Herbert <therbert@google.com>
Date: Wed, 16 Feb 2011 18:59:35 -0800

> On Wed, Feb 16, 2011 at 4:08 PM, David Miller <davem@davemloft.net> wrote:
>>
>> So I've been testing out the routing cache removal patch to see
>> what the impact is on performance.
>>
> Interesting results.
> 
> I assume that this test is purposely using sendto on a connected
> socket to force sendmsg to go through the route lookup :-), so this is
> showing what the benefit of the rtcache is when the cache hit rate is
> 100%.  For comparison, it might be interesting to see what the
> performance is when the hit rate is < 100%.  For instance, we often
> see hit rates < 20% on front-end servers.  This could be done by
> flooding random addresses in 10/8 or even 0/0...  I'm hoping that
> without the rtcache, performance actually improves in that case!

We know that the performance will be higher in the "closer to 0%"
situation, i.e. for DoS workloads, because all of the route cache
management overhead goes away.

Anyways, I'm working on some ideas to make the high hit rate case
perform acceptably again.

* Re: state of rtcache removal...
From: Eric Dumazet @ 2011-02-17  6:25 UTC
  To: David Miller; +Cc: netdev

On Wednesday 16 February 2011 at 16:08 -0800, David Miller wrote:
> So I've been testing out the routing cache removal patch to see
> what the impact is on performance.
> [...]

Thanks David for this work in progress.

If I remember my work from last October/November correctly, fib_hash
was also a bit faster than fib_trie (around 20%)...

* Re: state of rtcache removal...
From: David Miller @ 2011-02-17  6:39 UTC
  To: eric.dumazet; +Cc: netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 17 Feb 2011 07:25:15 +0100

> Thanks David for this work in progress.

It was my pleasure :-)

> If I remember my work from last October/November correctly, fib_hash
> was also a bit faster than fib_trie (around 20%)...

Right, if the table is small, a hash can be faster.

I just wrote a hack that puts fib_lookup() results into the flow
cache, and hooked it only into the one fib_lookup() call that happens
in ip_route_output_slow().

This pointed out another costly thing we do when resolving output
routes.  If the flow key's source is not INADDR_ANY, we validate that
the source address is ours by doing a fib table lookup in the LOCAL
table, with the source address placed in the flow key's destination
field.
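
That check boils down to something like this (a rough sketch from
memory of the local-table lookup described above, not the exact
kernel code):

static unsigned int source_addr_type(struct net *net, __be32 src)
{
	struct flowi fl = { .fl4_dst = src };	/* note: dst, not src */
	struct fib_table *local = fib_get_table(net, RT_TABLE_LOCAL);
	struct fib_result res;
	unsigned int type = RTN_UNICAST;

	rcu_read_lock();
	if (local && !fib_table_lookup(local, &fl, &res, FIB_LOOKUP_NOREF))
		type = res.type;	/* RTN_LOCAL, RTN_BROADCAST, ... */
	rcu_read_unlock();

	return type;
}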

So even on output we were doing 3 fib lookups :-/

Therefore, the flow cache hack gets rid of 2 of those 3.
