netdev.vger.kernel.org archive mirror
* [PATCH v2] rps: optimize rps_get_cpu()
From: Changli Gao @ 2010-04-20 14:02 UTC
  To: David S. Miller; +Cc: Tom Herbert, Eric Dumazet, netdev, Changli Gao

Optimize get_rps_cpu().

Don't initialize ports to zero when they can be read from the packet. Read
both ports with one 32-bit memory access rather than two 16-bit ones. Use ihl
in bytes to eliminate the later multiplications.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 net/core/dev.c |   30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index b31d5d6..55ecfa9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2225,7 +2225,11 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 	int cpu = -1;
 	u8 ip_proto;
 	u16 tcpu;
-	u32 addr1, addr2, ports, ihl;
+	u32 addr1, addr2, ihl;
+	union {
+		u32 v32;
+		u16 v16[2];
+	} ports;
 
 	if (skb_rx_queue_recorded(skb)) {
 		u16 index = skb_get_rx_queue(skb);
@@ -2256,7 +2260,7 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 		ip_proto = ip->protocol;
 		addr1 = (__force u32) ip->saddr;
 		addr2 = (__force u32) ip->daddr;
-		ihl = ip->ihl;
+		ihl = ip->ihl << 2;
 		break;
 	case __constant_htons(ETH_P_IPV6):
 		if (!pskb_may_pull(skb, sizeof(*ip6)))
@@ -2266,12 +2270,11 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 		ip_proto = ip6->nexthdr;
 		addr1 = (__force u32) ip6->saddr.s6_addr32[3];
 		addr2 = (__force u32) ip6->daddr.s6_addr32[3];
-		ihl = (40 >> 2);
+		ihl = 40;
 		break;
 	default:
 		goto done;
 	}
-	ports = 0;
 	switch (ip_proto) {
 	case IPPROTO_TCP:
 	case IPPROTO_UDP:
@@ -2280,26 +2283,21 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 	case IPPROTO_AH:
 	case IPPROTO_SCTP:
 	case IPPROTO_UDPLITE:
-		if (pskb_may_pull(skb, (ihl * 4) + 4)) {
-			__be16 *hports = (__be16 *) (skb->data + (ihl * 4));
-			u32 sport, dport;
-
-			sport = (__force u16) hports[0];
-			dport = (__force u16) hports[1];
-			if (dport < sport)
-				swap(sport, dport);
-			ports = (sport << 16) + dport;
+		if (pskb_may_pull(skb, ihl + 4)) {
+			ports.v32 = * (__force u32 *) (skb->data + ihl);
+			if (ports.v16[0] < ports.v16[1])
+				swap(ports.v16[0], ports.v16[1]);
+			break;
 		}
-		break;
-
 	default:
+		ports.v32 = 0;
 		break;
 	}
 
 	/* get a consistent hash (same value on both flow directions) */
 	if (addr2 < addr1)
 		swap(addr1, addr2);
-	skb->rxhash = jhash_3words(addr1, addr2, ports, hashrnd);
+	skb->rxhash = jhash_3words(addr1, addr2, ports.v32, hashrnd);
 	if (!skb->rxhash)
 		skb->rxhash = 1;
 


* Re: [PATCH v2] rps: optimize rps_get_cpu()
From: David Miller @ 2010-04-22  5:40 UTC
  To: xiaosuo; +Cc: therbert, eric.dumazet, netdev

From: Changli Gao <xiaosuo@gmail.com>
Date: Tue, 20 Apr 2010 22:02:40 +0800

> Use ihl in bytes to eliminate the later multiplications.

I'll buy you a cookie if you can find a multiply generated by the
compiler for "x * 4".  It's going to use shifts and those are
basically free.

Please just change one thing at a time.  It would have helped you
here.  I was willing to apply the port dereference part of your
change, but not necessarily the 'ihl' changes.  But because you've
combined them, I have no choice but to reject everything.


* Re: [PATCH v2] rps: optimize rps_get_cpu()
From: Changli Gao @ 2010-04-22  6:10 UTC
  To: David Miller; +Cc: therbert, eric.dumazet, netdev

On Thu, Apr 22, 2010 at 1:40 PM, David Miller <davem@davemloft.net> wrote:
>
> I'll buy you a cookie if you can find a multiply generated by the
> compiler for "x * 4".  It's going to use shifts and those are
> basically free.

On amd64:

                if (pskb_may_pull(skb, (ihl * 4) + 4)) {
    2794:       8d 34 9d 04 00 00 00    lea    0x4(,%rbx,4),%esi
    279b:       4c 89 ef                mov    %r13,%rdi
    279e:       e8 a5 fd ff ff          callq  2548 <pskb_may_pull>
    27a3:       85 c0                   test   %eax,%eax
    27a5:       74 28                   je     27cf <get_rps_cpu+0x169>
                        __be16 *hports = (__be16 *) (skb->data + (ihl * 4));
    27a7:       8d 04 9d 00 00 00 00    lea    0x0(,%rbx,4),%eax

The compiler uses lea instead of a multiply, which should be more
efficient, but I'm not sure. Is there an equivalent of lea on other
architectures?
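
One way to check this on another compiler or architecture is to build a small test file and read the generated assembly, for example with "gcc -O2 -S". The sketch below is only an illustration under that assumption: ihl_bytes_mul(), ihl_bytes_shift() and pull_len() are hypothetical names, not kernel functions. It writes the words-to-bytes conversion both ways so the two outputs can be compared.

```c
#include <assert.h>
#include <stdint.h>

/* IPv4 IHL is a 4-bit count of 32-bit words, so it never exceeds 15. */
static uint32_t ihl_bytes_mul(uint32_t ihl_words)
{
	assert(ihl_words <= 15);
	return ihl_words * 4;	/* form used by the original code */
}

static uint32_t ihl_bytes_shift(uint32_t ihl_words)
{
	assert(ihl_words <= 15);
	return ihl_words << 2;	/* form used by the patch */
}

/* Bytes to pull so that the 4 bytes of ports are reachable. */
static uint32_t pull_len(uint32_t ihl_words)
{
	return ihl_bytes_shift(ihl_words) + 4;
}
```

For unsigned values the two forms compute the same result, so any difference is purely in the instructions the compiler picks. On 32-bit ARM and AArch64, for instance, an add can take a left-shifted register as its second operand, which covers the same "scale and add" case that lea handles on amd64.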

>
> Please just change one thing at a time.  It would have helped you
> here.  I was willing to apply the port dereference part of your
> change, but not necessarily the 'ihl' changes.  But because you've
> combined them, I have no choice but to reject everything.
>

OK. I think the first version is ready to apply; it only has the port
dereference part.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

