From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet
Subject: Re: state of rtcache removal...
Date: Thu, 17 Feb 2011 07:25:15 +0100
Message-ID: <1297923915.2645.24.camel@edumazet-laptop>
References: <20110216.160838.39164069.davem@davemloft.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: netdev@vger.kernel.org
To: David Miller
Return-path:
Received: from mail-bw0-f46.google.com ([209.85.214.46]:57320 "EHLO
	mail-bw0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752124Ab1BQGZW (ORCPT );
	Thu, 17 Feb 2011 01:25:22 -0500
Received: by bwz15 with SMTP id 15so1381221bwz.19
	for ; Wed, 16 Feb 2011 22:25:20 -0800 (PST)
In-Reply-To: <20110216.160838.39164069.davem@davemloft.net>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Wednesday, 16 February 2011 at 16:08 -0800, David Miller wrote:
> So I've been testing out the routing cache removal patch to see
> what the impact is on performance.
>
> I'm using a UDP flood to a single IP address over a dummy interface
> with hard coded ARP entries, so that pretty much just the main IP
> output and routing paths are being exercised.
>
> The UDP flood tool I cooked up based upon a description sent to me by
> Eric Dumazet of a similar utility he uses for testing.  I've included
> the code to this tool at the end of this email, as well as the dummy
> interface setup script.  Basically, you go:
>
> bash# ./udpflood_setup.sh
> bash# time ./udpflood -l 10000 10.2.2.11
>
> The IP output path is about twice as slow with the routing cache
> removed entirely.
> Here are the numbers I have:
>
> net-next-2.6, rt_cache on:
>
> davem@maramba:~$ time udpflood -l 10000000 10.2.2.11
> real		1m47.012s
> user		0m8.670s
> sys		1m38.370s
>
> net-next-2.6, rt_cache turned off via sysctl:
>
> davem@maramba:~$ time udpflood -l 10000000 10.2.2.11
> real		3m12.662s
> user		0m9.490s
> sys		3m3.220s
>
> net-next-2.6 + "BONUS" rt_cache deletion patch:
>
> maramba:/home/davem# time ./bin/udpflood -l 10000000 10.2.2.11
> real		3m9.921s
> user		0m9.520s
> sys		3m0.440s
>
> I then worked on some simplifications of the code in net/ipv4/route.c
> that remains after the cache removal.  I'll post those patches after
> I've chewed on them some more, but they knock a couple seconds back off
> of the benchmark:
>
> The profile output is what you'd expect, with fib_table_lookup() topping
> the charts taking ~10% of the time.
>
> What might not be initially apparent is that each output route lookup
> results in two calls to fib_table_lookup() and thus two trie lookups.
> Why?  Because we have two routing tables (3 with IP_MULTIPLE_TABLES
> enabled) that get searched, first the LOCAL then the MAIN table (then
> with multiple-tables enabled, the DEFAULT).  And most external
> outgoing routes sit in the MAIN table.
>
> We do this so we can store all the interface address network,
> broadcast, loopback network, et al. routes in the LOCAL table, then all
> globally visible routes in the MAIN table.
>
> Anyways, the long and short of this is that route lookups take two
> trie lookups instead of just one.  On input there are even more, for
> source address validation done by fib_validate_source().  That can be
> up to 4 more fib_table_lookup() invocations.
>
> Add in another level of complexity if you have a series of FIB rules
> installed.
>
> So, to me, this means that spending time micro-optimizing fib_trie is
> not going to help much.  Getting rid of that multiplier somehow, on
> the other hand, might.
>
> I plan to play with some ideas, such as sticking fib_alias entries into
> the flow cache and consulting/populating the flow cache on fib_lookup()
> calls.
>
> -------------------- udpflood.c --------------------
> /* An adaptation of Eric Dumazet's udpflood tool.  */
>
> /* Feature-test macros must precede every #include to take effect. */
> #define _GNU_SOURCE
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <errno.h>
> #include <unistd.h>
>
> #include <sys/types.h>
> #include <sys/socket.h>
> #include <netinet/in.h>
> #include <arpa/inet.h>
>
> #include <getopt.h>
>
> static int usage(void)
> {
> 	printf("usage: udpflood [ -l count ] [ -s message_size ] [ -p port ] IP_ADDRESS\n");
> 	return -1;
> }
>
> static int send_packets(in_addr_t addr, int port, int count, int msg_sz)
> {
> 	char *msg = malloc(msg_sz);
> 	struct sockaddr_in saddr;
> 	int fd, i, err;
>
> 	if (!msg)
> 		return -ENOMEM;
>
> 	memset(msg, 0, msg_sz);
>
> 	memset(&saddr, 0, sizeof(saddr));
> 	saddr.sin_family = AF_INET;
> 	saddr.sin_port = htons(port);	/* sin_port is in network byte order */
> 	saddr.sin_addr.s_addr = addr;
>
> 	fd = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
> 	if (fd < 0) {
> 		perror("socket");
> 		err = fd;
> 		goto out_nofd;
> 	}
> 	err = connect(fd, (struct sockaddr *) &saddr, sizeof(saddr));
> 	if (err < 0) {
> 		perror("connect");
> 		goto out;	/* "out" closes fd; do not close it twice */
> 	}
> 	for (i = 0; i < count; i++) {
> 		err = sendto(fd, msg, msg_sz, 0,
> 			     (struct sockaddr *) &saddr, sizeof(saddr));
> 		if (err < 0) {
> 			perror("sendto");
> 			goto out;
> 		}
> 	}
>
> 	err = 0;
> out:
> 	close(fd);
> out_nofd:
> 	free(msg);
> 	return err;
> }
>
> int main(int argc, char **argv, char **envp)
> {
> 	int port, msg_sz, count, ret;
> 	in_addr_t addr;
>
> 	port = 6000;
> 	msg_sz = 32;
> 	count = 10000000;
>
> 	while ((ret = getopt(argc, argv, "l:s:p:")) >= 0) {
> 		switch (ret) {
> 		case 'l':
> 			sscanf(optarg, "%d", &count);
> 			break;
> 		case 's':
> 			sscanf(optarg, "%d", &msg_sz);
> 			break;
> 		case 'p':
> 			sscanf(optarg, "%d", &port);
> 			break;
> 		case '?':
> 			return usage();
> 		}
> 	}
>
> 	if (!argv[optind])
> 		return usage();
>
> 	addr = inet_addr(argv[optind]);
> 	if (addr == INADDR_NONE)
> 		return usage();
>
> 	return send_packets(addr, port, count, msg_sz);
> }
>
> -------------------- udpflood_setup.sh --------------------
> #!/bin/sh
> modprobe dummy
> ifconfig dummy0 10.2.2.254 netmask 255.255.255.0 up
>
> for f in $(seq 11 26)
> do
> 	arp -H ether -i dummy0 -s 10.2.2.$f 00:00:0c:07:ac:$f
> done
> --

Thanks David for this work in progress.

If I remember correctly from my work last October/November, fib_hash
was also a bit faster than fib_trie (around 20%)...