* [PATCH v2] rps: optimize rps_get_cpu()
@ 2010-04-20 14:02 Changli Gao
From: Changli Gao @ 2010-04-20 14:02 UTC
To: David S. Miller; +Cc: Tom Herbert, Eric Dumazet, netdev, Changli Gao
Optimize get_rps_cpu().
Don't initialize ports on the paths where they are read from the packet.
Read both ports with one memory access instead of two. Keep ihl in bytes
to eliminate the later multiplications.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
---
net/core/dev.c | 30 ++++++++++++++----------------
1 file changed, 14 insertions(+), 16 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index b31d5d6..55ecfa9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2225,7 +2225,11 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
int cpu = -1;
u8 ip_proto;
u16 tcpu;
- u32 addr1, addr2, ports, ihl;
+ u32 addr1, addr2, ihl;
+ union {
+ u32 v32;
+ u16 v16[2];
+ } ports;
if (skb_rx_queue_recorded(skb)) {
u16 index = skb_get_rx_queue(skb);
@@ -2256,7 +2260,7 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
ip_proto = ip->protocol;
addr1 = (__force u32) ip->saddr;
addr2 = (__force u32) ip->daddr;
- ihl = ip->ihl;
+ ihl = ip->ihl << 2;
break;
case __constant_htons(ETH_P_IPV6):
if (!pskb_may_pull(skb, sizeof(*ip6)))
@@ -2266,12 +2270,11 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
ip_proto = ip6->nexthdr;
addr1 = (__force u32) ip6->saddr.s6_addr32[3];
addr2 = (__force u32) ip6->daddr.s6_addr32[3];
- ihl = (40 >> 2);
+ ihl = 40;
break;
default:
goto done;
}
- ports = 0;
switch (ip_proto) {
case IPPROTO_TCP:
case IPPROTO_UDP:
@@ -2280,26 +2283,21 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
case IPPROTO_AH:
case IPPROTO_SCTP:
case IPPROTO_UDPLITE:
- if (pskb_may_pull(skb, (ihl * 4) + 4)) {
- __be16 *hports = (__be16 *) (skb->data + (ihl * 4));
- u32 sport, dport;
-
- sport = (__force u16) hports[0];
- dport = (__force u16) hports[1];
- if (dport < sport)
- swap(sport, dport);
- ports = (sport << 16) + dport;
+ if (pskb_may_pull(skb, ihl + 4)) {
+ ports.v32 = * (__force u32 *) (skb->data + ihl);
+ if (ports.v16[0] < ports.v16[1])
+ swap(ports.v16[0], ports.v16[1]);
+ break;
}
- break;
-
default:
+ ports.v32 = 0;
break;
}
/* get a consistent hash (same value on both flow directions) */
if (addr2 < addr1)
swap(addr1, addr2);
- skb->rxhash = jhash_3words(addr1, addr2, ports, hashrnd);
+ skb->rxhash = jhash_3words(addr1, addr2, ports.v32, hashrnd);
if (!skb->rxhash)
skb->rxhash = 1;
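Editor's sketch of the port handling in the patch above, rewritten as a userspace function. It is a minimal illustration, not the kernel code: the names `rps_ports` and `canonical_ports` are hypothetical, and `memcpy()` stands in for the patch's direct `u32` dereference so the example stays well defined when the buffer is not 4-byte aligned.

```c
#include <stdint.h>
#include <string.h>

/* Same layout as the union the patch adds to get_rps_cpu(). */
union rps_ports {
	uint32_t v32;
	uint16_t v16[2];
};

/*
 * Hypothetical helper: l4 points at the first byte of the transport
 * header, where the source and destination ports sit back to back.
 */
static uint32_t canonical_ports(const unsigned char *l4)
{
	union rps_ports p;
	uint16_t tmp;

	/* One 32-bit load instead of two 16-bit loads. */
	memcpy(&p.v32, l4, sizeof(p.v32));

	/* Order the halves so both flow directions give the same value. */
	if (p.v16[0] < p.v16[1]) {
		tmp = p.v16[0];
		p.v16[0] = p.v16[1];
		p.v16[1] = tmp;
	}
	return p.v32;
}
```

Feeding the function the first four bytes of a TCP header for port 80 to port 12345, and then the same bytes with the two ports exchanged, should return the same 32-bit value. That is the property the hash relies on to map both directions of a flow to the same CPU.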
* Re: [PATCH v2] rps: optimize rps_get_cpu()
From: David Miller @ 2010-04-22 5:40 UTC
To: xiaosuo; +Cc: therbert, eric.dumazet, netdev
From: Changli Gao <xiaosuo@gmail.com>
Date: Tue, 20 Apr 2010 22:02:40 +0800
> use ihl in bytes to eliminate later multiplyings.
I'll buy you a cookie if you can find a multiply generated by the
compiler for "x * 4". It's going to use shifts and those are
basically free.
Please just change one thing at a time. It would have helped you
here. I was willing to apply the port dereference part of your
change, but not necessarily the 'ihl' changes. But because you've
combined them, I have no choice but to reject everything.
* Re: [PATCH v2] rps: optimize rps_get_cpu()
From: Changli Gao @ 2010-04-22 6:10 UTC
To: David Miller; +Cc: therbert, eric.dumazet, netdev
On Thu, Apr 22, 2010 at 1:40 PM, David Miller <davem@davemloft.net> wrote:
>
> I'll buy you a cookie if you can find a multiply generated by the
> compiler for "x * 4". It's going to use shifts and those are
> basically free.
On amd64:
if (pskb_may_pull(skb, (ihl * 4) + 4)) {
2794: 8d 34 9d 04 00 00 00 lea 0x4(,%rbx,4),%esi
279b: 4c 89 ef mov %r13,%rdi
279e: e8 a5 fd ff ff callq 2548 <pskb_may_pull>
27a3: 85 c0 test %eax,%eax
27a5: 74 28 je 27cf <get_rps_cpu+0x169>
__be16 *hports = (__be16 *) (skb->data + (ihl * 4));
27a7: 8d 04 9d 00 00 00 00 lea 0x0(,%rbx,4),%eax
The compiler uses lea instead of a multiply, which should be more
efficient, though I'm not sure. Is there an equivalent of lea on other
architectures?
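Editor's sketch for repeating this check on another architecture. The two hypothetical functions below compute the transport-header offset from an IPv4 ihl value (header length in 32-bit words), once with a multiply and once with a shift. Compiling the file with something like `gcc -O2 -S` and comparing the two function bodies shows what a given target emits. The function names are assumptions made for illustration; they do not appear in net/core/dev.c.

```c
#include <stdint.h>

/* ihl as stored in the IPv4 header: header length in 32-bit words. */
static uint32_t l4_offset_mul(uint32_t ihl_words)
{
	return ihl_words * 4;	/* form used by the original code */
}

static uint32_t l4_offset_shift(uint32_t ihl_words)
{
	return ihl_words << 2;	/* form used by the patch */
}
```

Both functions return the same value for every ihl the 4-bit header field can hold (0 through 15), so any difference between them is limited to the instructions the compiler chooses.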
>
> Please just change one thing at a time. It would have helped you
> here. I was willing to apply the port dereference part of your
> change, but not necessarily the 'ihl' changes. But because you've
> combined them, I have no choice but to reject everything.
>
OK. I think the first version is ready to apply; it contains only the
port dereference part.
--
Regards,
Changli Gao(xiaosuo@gmail.com)