* [PATCH v3] rps: optimize rps_get_cpu()
@ 2010-04-24 15:17 Changli Gao
2010-04-24 16:04 ` jamal
2010-04-25 5:51 ` David Miller
0 siblings, 2 replies; 9+ messages in thread
From: Changli Gao @ 2010-04-24 15:17 UTC (permalink / raw)
To: David Miller; +Cc: Tom Herbert, Eric Dumazet, netdev, Changli Gao
Optimize get_rps_cpu().
Don't zero-initialize ports when we can read them from the packet, and use one
memory access for the ports rather than two.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
net/core/dev.c | 24 +++++++++++-------------
1 file changed, 11 insertions(+), 13 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index a4a7c36..4d43f1a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2229,7 +2229,11 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 	int cpu = -1;
 	u8 ip_proto;
 	u16 tcpu;
-	u32 addr1, addr2, ports, ihl;
+	u32 addr1, addr2, ihl;
+	union {
+		u32 v32;
+		u16 v16[2];
+	} ports;
 
 	if (skb_rx_queue_recorded(skb)) {
 		u16 index = skb_get_rx_queue(skb);
@@ -2275,7 +2279,6 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 	default:
 		goto done;
 	}
-	ports = 0;
 	switch (ip_proto) {
 	case IPPROTO_TCP:
 	case IPPROTO_UDP:
@@ -2285,25 +2288,20 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 	case IPPROTO_SCTP:
 	case IPPROTO_UDPLITE:
 		if (pskb_may_pull(skb, (ihl * 4) + 4)) {
-			__be16 *hports = (__be16 *) (skb->data + (ihl * 4));
-			u32 sport, dport;
-
-			sport = (__force u16) hports[0];
-			dport = (__force u16) hports[1];
-			if (dport < sport)
-				swap(sport, dport);
-			ports = (sport << 16) + dport;
+			ports.v32 = * (__force u32 *) (skb->data + (ihl * 4));
+			if (ports.v16[1] < ports.v16[0])
+				swap(ports.v16[0], ports.v16[1]);
+			break;
 		}
-		break;
-
 	default:
+		ports.v32 = 0;
 		break;
 	}
 
 	/* get a consistent hash (same value on both flow directions) */
 	if (addr2 < addr1)
 		swap(addr1, addr2);
-	skb->rxhash = jhash_3words(addr1, addr2, ports, hashrnd);
+	skb->rxhash = jhash_3words(addr1, addr2, ports.v32, hashrnd);
 	if (!skb->rxhash)
 		skb->rxhash = 1;
 
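For readers who want to try the idea outside the kernel, here is a minimal userspace sketch of the union trick used in the patch: load both 16-bit ports with a single 32-bit access, then order the two halves so that both directions of a flow produce the same word. The name canonical_ports() is hypothetical and not a kernel function, and memcpy() stands in for the kernel's direct cast so the sketch stays portable.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not kernel code): fold the source and destination
 * ports found in the first four bytes of a transport header into one
 * direction-independent 32-bit word.
 */
static uint32_t canonical_ports(const unsigned char *l4hdr)
{
	union {
		uint32_t v32;
		uint16_t v16[2];
	} ports;

	/* One 32-bit load; memcpy() avoids alignment and aliasing issues. */
	memcpy(&ports.v32, l4hdr, sizeof(ports.v32));

	/*
	 * Order the halves so that swapping source and destination gives
	 * the same word. The raw network-order values are compared, which
	 * is enough because only consistency matters for the hash input.
	 */
	if (ports.v16[1] < ports.v16[0]) {
		uint16_t tmp = ports.v16[0];

		ports.v16[0] = ports.v16[1];
		ports.v16[1] = tmp;
	}
	return ports.v32;
}
```

Passing the header of a packet and the header of its reply (ports swapped) should return the same value, which is the property the "consistent hash" comment in the patch relies on.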
* Re: [PATCH v3] rps: optimize rps_get_cpu()
2010-04-24 15:17 [PATCH v3] rps: optimize rps_get_cpu() Changli Gao
@ 2010-04-24 16:04 ` jamal
2010-04-24 16:19 ` Changli Gao
2010-04-25 5:51 ` David Miller
1 sibling, 1 reply; 9+ messages in thread
From: jamal @ 2010-04-24 16:04 UTC (permalink / raw)
To: Changli Gao; +Cc: David Miller, Tom Herbert, Eric Dumazet, netdev
By the time you hit this code (at least on machines that make sense for
RPS), you already have the ethernet header, IP header and transport
ports in cache, no?
I think the sport << 16 shift is avoided, but I don't think there's
any effect on memory access.
cheers,
jamal
* Re: [PATCH v3] rps: optimize rps_get_cpu()
2010-04-24 16:04 ` jamal
@ 2010-04-24 16:19 ` Changli Gao
2010-04-24 16:22 ` jamal
0 siblings, 1 reply; 9+ messages in thread
From: Changli Gao @ 2010-04-24 16:19 UTC (permalink / raw)
To: hadi; +Cc: David Miller, Tom Herbert, Eric Dumazet, netdev
On Sun, Apr 25, 2010 at 12:04 AM, jamal <hadi@cyberus.ca> wrote:
>
> By the time you hit this code (at least on machines that make sense for
> RPS), you already have the ethernet header, IP header and transport
> ports in cache, no?
> I think the sport << 16 shift is avoided, but I don't think there's
> any effect on memory access.
Maybe I have used the wrong word. Sorry. If the ports are already in
cache, the new code has only one cache access for ports, and the later
operations are in registers.
--
Regards,
Changli Gao(xiaosuo@gmail.com)
* Re: [PATCH v3] rps: optimize rps_get_cpu()
2010-04-24 16:19 ` Changli Gao
@ 2010-04-24 16:22 ` jamal
0 siblings, 0 replies; 9+ messages in thread
From: jamal @ 2010-04-24 16:22 UTC (permalink / raw)
To: Changli Gao; +Cc: David Miller, Tom Herbert, Eric Dumazet, netdev
On Sun, 2010-04-25 at 00:19 +0800, Changli Gao wrote:
> Maybe I have used the wrong word. Sorry. If the ports are already in
> cache, the new code has only one cache access for ports, and the later
> operations are in registers.
Ok, that makes more sense - so your commit log is confusing.
You are saving a shift operation per packet - probably not a big deal
but better than zero ;->
cheers,
jamal
* Re: [PATCH v3] rps: optimize rps_get_cpu()
2010-04-24 15:17 [PATCH v3] rps: optimize rps_get_cpu() Changli Gao
2010-04-24 16:04 ` jamal
@ 2010-04-25 5:51 ` David Miller
2010-04-25 6:48 ` Changli Gao
1 sibling, 1 reply; 9+ messages in thread
From: David Miller @ 2010-04-25 5:51 UTC (permalink / raw)
To: xiaosuo; +Cc: therbert, eric.dumazet, netdev
From: Changli Gao <xiaosuo@gmail.com>
Date: Sat, 24 Apr 2010 23:17:07 +0800
> Optimize get_rps_cpu().
>
> Don't zero-initialize ports when we can read them from the packet, and use one
> memory access for the ports rather than two.
>
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Applied, thanks.
We can load both addresses in one go on 64-bit btw.
It seems we're just duplicating, one by one, the optimizations
we already do in INET_COMBINED_PORTS() and INET_ADDR_COOKIE().
:-)
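As an illustration of loading both addresses in one go, the sketch below reads the IPv4 source and destination addresses with a single 8-byte copy and then splits the result in memory order. This is a userspace approximation under assumptions: load_addr_pair() and struct addr_pair are hypothetical names, and memcpy() is used so the code does not depend on the header being 64-bit aligned, which is exactly the open question in this thread.

```c
#include <stdint.h>
#include <string.h>

#define IPH_SADDR_OFFSET 12	/* saddr starts 12 bytes into the IPv4 header */

/* Hypothetical container for the two addresses, in raw network order. */
struct addr_pair {
	uint32_t saddr;
	uint32_t daddr;
};

/*
 * Hypothetical helper (not kernel code): fetch saddr and daddr with one
 * 64-bit wide access instead of two 32-bit ones.
 */
static struct addr_pair load_addr_pair(const unsigned char *iph)
{
	uint64_t both;
	struct addr_pair pair;

	/* Single 8-byte load covering saddr and daddr. */
	memcpy(&both, iph + IPH_SADDR_OFFSET, sizeof(both));

	/* Split in memory order, so the result is endian-independent. */
	memcpy(&pair, &both, sizeof(pair));
	return pair;
}
```

On a 64-bit machine a compiler will typically turn the first memcpy() into one load; whether a direct pointer dereference would also be safe depends on the alignment discussed in the following messages.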
* Re: [PATCH v3] rps: optimize rps_get_cpu()
2010-04-25 5:51 ` David Miller
@ 2010-04-25 6:48 ` Changli Gao
2010-04-25 7:38 ` David Miller
0 siblings, 1 reply; 9+ messages in thread
From: Changli Gao @ 2010-04-25 6:48 UTC (permalink / raw)
To: David Miller; +Cc: therbert, eric.dumazet, netdev
On Sun, Apr 25, 2010 at 1:51 PM, David Miller <davem@davemloft.net> wrote:
> From: Changli Gao <xiaosuo@gmail.com>
> Date: Sat, 24 Apr 2010 23:17:07 +0800
>
>> Optimize get_rps_cpu().
>>
>> Don't zero-initialize ports when we can read them from the packet, and use one
>> memory access for the ports rather than two.
>>
>> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
>
> Applied, thanks.
>
> We can load both addresses in one go on 64-bit btw.
>
Are they always aligned to a 64-bit boundary? I don't think so.
--
Regards,
Changli Gao(xiaosuo@gmail.com)
* Re: [PATCH v3] rps: optimize rps_get_cpu()
2010-04-25 6:48 ` Changli Gao
@ 2010-04-25 7:38 ` David Miller
2010-04-25 7:48 ` David Miller
0 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2010-04-25 7:38 UTC (permalink / raw)
To: xiaosuo; +Cc: therbert, eric.dumazet, netdev
From: Changli Gao <xiaosuo@gmail.com>
Date: Sun, 25 Apr 2010 14:48:49 +0800
> Are they always aligned to a 64-bit boundary? I don't think so.
If not, then the TCP stack should have been crashing for the past 15 years.
* Re: [PATCH v3] rps: optimize rps_get_cpu()
2010-04-25 7:38 ` David Miller
@ 2010-04-25 7:48 ` David Miller
2010-04-25 8:03 ` Changli Gao
0 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2010-04-25 7:48 UTC (permalink / raw)
To: xiaosuo; +Cc: therbert, eric.dumazet, netdev
From: David Miller <davem@davemloft.net>
Date: Sun, 25 Apr 2010 00:38:34 -0700 (PDT)
> From: Changli Gao <xiaosuo@gmail.com>
> Date: Sun, 25 Apr 2010 14:48:49 +0800
>
>> Are they always aligned to a 64-bit boundary? I don't think so.
>
> If not, then the TCP stack should have been crashing for the past 15 years.
Never mind. Currently we only depend upon the addresses in struct sock
being 64-bit aligned, not the protocol headers.
It shouldn't be hard to make the protocol header addresses 64-bit
aligned too. Simply setting the default NET_IP_ALIGN to '6' instead
of '2' ought to be sufficient.
skb->data upon alloc_skb() is 64-bit aligned.
So if we skb_reserve(NET_IP_ALIGN '6'), then we have the ethernet
header (14 bytes). And since 'saddr' is 12 bytes into struct iphdr it
will be (6 + 14 + 12) == 32 bytes in from the original 64-bit aligned
skb->data.
Therefore, since skb->data is 64-bit aligned, skb->data plus a
multiple of 8 (which 32 is) will also be 64-bit aligned, and that
means iph->saddr will be 64-bit aligned.
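The alignment arithmetic in the message above can be checked with a few lines of C. This is only a sketch of the calculation: saddr_offset() and saddr_is_64bit_aligned() are hypothetical helpers, the constants are written out locally rather than taken from kernel headers, and the buffer start is assumed to be 64-bit aligned as the message states.

```c
#include <stddef.h>

#define ETH_HLEN_BYTES   14	/* Ethernet header length */
#define IPH_SADDR_OFFSET 12	/* offset of saddr within the IPv4 header */

/*
 * Hypothetical helper: byte offset of iph->saddr from the 64-bit aligned
 * start of the buffer, given the headroom reserved with skb_reserve().
 */
static size_t saddr_offset(size_t ip_align)
{
	return ip_align + ETH_HLEN_BYTES + IPH_SADDR_OFFSET;
}

/* Non-zero when saddr lands on an 8-byte boundary. */
static int saddr_is_64bit_aligned(size_t ip_align)
{
	return saddr_offset(ip_align) % 8 == 0;
}
```

With a headroom of 6 the offset is 6 + 14 + 12 = 32, a multiple of 8. With the current default of 2 it is 28, which is only 4-byte aligned.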
* Re: [PATCH v3] rps: optimize rps_get_cpu()
2010-04-25 7:48 ` David Miller
@ 2010-04-25 8:03 ` Changli Gao
0 siblings, 0 replies; 9+ messages in thread
From: Changli Gao @ 2010-04-25 8:03 UTC (permalink / raw)
To: David Miller; +Cc: therbert, eric.dumazet, netdev
On Sun, Apr 25, 2010 at 3:48 PM, David Miller <davem@davemloft.net> wrote:
>
> Never mind. Currently we only depend upon the addresses in struct sock
> being 64-bit aligned, not the protocol headers.
>
> It shouldn't be hard to make the protocol header addresses 64-bit
> aligned too. Simply setting the default NET_IP_ALIGN to '6' instead
> of '2' ought to be sufficient.
>
> skb->data upon alloc_skb() is 64-bit aligned.
>
> So if we skb_reserve(NET_IP_ALIGN '6'), then we have the ethernet
> header (14 bytes). And since 'saddr' is 12 bytes into struct iphdr it
> will be (6 + 14 + 12) == 32 bytes in from the original 64-bit aligned
> skb->data.
>
> Therefore, since skb->data is 64-bit aligned, skb->data plus a
> multiple of 8 (which 32 is) will also be 64-bit aligned, and that
> means iph->saddr will be 64-bit aligned.
>
But if there is a VLAN header, an extra 4 bytes are added to the
Ethernet header, so the addresses aren't aligned to a 64-bit boundary
when we set NET_IP_ALIGN to 6.
--
Regards,
Changli Gao(xiaosuo@gmail.com)
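The VLAN objection can be worked through the same way. The sketch below extends the offset calculation with a count of 4-byte VLAN tags; saddr_offset_with_vlan() is a hypothetical helper and the constants are local assumptions, not values read from kernel headers.

```c
#include <stddef.h>

#define ETH_HLEN_BYTES   14	/* untagged Ethernet header length */
#define VLAN_TAG_BYTES   4	/* each 802.1Q tag adds 4 bytes */
#define IPH_SADDR_OFFSET 12	/* offset of saddr within the IPv4 header */

/*
 * Hypothetical helper: byte offset of iph->saddr from the 64-bit aligned
 * start of the buffer when the frame carries vlan_tags VLAN tags.
 */
static size_t saddr_offset_with_vlan(size_t ip_align, unsigned int vlan_tags)
{
	return ip_align + ETH_HLEN_BYTES +
	       (size_t)vlan_tags * VLAN_TAG_BYTES + IPH_SADDR_OFFSET;
}
```

With a headroom of 6 and one tag the offset is 6 + 18 + 12 = 36, and 36 % 8 == 4, so saddr is no longer 64-bit aligned. No single fixed headroom keeps both tagged and untagged frames aligned, since the two cases differ by 4 bytes.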