netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v5] rps: Receive Packet Steering
       [not found] <alpine.DEB.1.00.1001141353140.19018@pokey.mtv.corp.google.com>
@ 2010-01-15  2:22 ` Changli Gao
  2010-01-15  6:19   ` Eric Dumazet
  2010-01-15  8:50   ` David Miller
  0 siblings, 2 replies; 11+ messages in thread
From: Changli Gao @ 2010-01-15  2:22 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev, Netfilter Developer Mailing List

On Fri, Jan 15, 2010 at 5:56 AM, Tom Herbert <therbert@google.com> wrote:
> +
> +       if (skb->rxhash)
> +               goto got_hash; /* Skip hash computation on packet header */
> +
> +       switch (skb->protocol) {
> +       case __constant_htons(ETH_P_IP):
> +               if (!pskb_may_pull(skb, sizeof(*ip)))
> +                       goto done;
> +
> +               ip = (struct iphdr *) skb->data;
> +               ip_proto = ip->protocol;
> +               addr1 = ip->saddr;
> +               addr2 = ip->daddr;
> +               ihl = ip->ihl;
> +               break;
> +       case __constant_htons(ETH_P_IPV6):
> +               if (!pskb_may_pull(skb, sizeof(*ip6)))
> +                       goto done;
> +
> +               ip6 = (struct ipv6hdr *) skb->data;
> +               ip_proto = ip6->nexthdr;
This code can't work, when there are extra headers. ipv6_skip_exthdr()
can be used to get the l4 header.

> +               addr1 = ip6->saddr.s6_addr32[3];
> +               addr2 = ip6->daddr.s6_addr32[3];
> +               ihl = (40 >> 2);
> +               break;
> +       default:
> +               goto done;
> +       }
> +       ports = 0;
> +       switch (ip_proto) {
> +       case IPPROTO_TCP:
> +       case IPPROTO_UDP:
> +       case IPPROTO_DCCP:
> +       case IPPROTO_ESP:
> +       case IPPROTO_AH:
> +       case IPPROTO_SCTP:
> +       case IPPROTO_UDPLITE:
> +               if (pskb_may_pull(skb, (ihl * 4) + 4))
> +                       ports = *((u32 *) (skb->data + (ihl * 4)));
> +               break;
> +
> +       default:
> +               break;
> +       }
> +
> +       skb->rxhash = jhash_3words(addr1, addr2, ports, hashrnd);
For connection based packet processing, such as netfilter,
distributing the packets in two directions into one CPU will reduce
cache miss, when NAT isn't used. I think the code bellow will help:
if (addr1 > addr2)
  swap(addr1, addr2);

> +       if (!skb->rxhash)
> +               skb->rxhash = 1;

Why not put the above code into a new function, and add more protocols
support, such as 802.1Q.  Though rxhash is based on 4-tuple, I think
netfilter will benefit from it.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5] rps: Receive Packet Steering
  2010-01-15  2:22 ` [PATCH v5] rps: Receive Packet Steering Changli Gao
@ 2010-01-15  6:19   ` Eric Dumazet
  2010-01-15  6:39     ` Changli Gao
  2010-01-15  8:50   ` David Miller
  1 sibling, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2010-01-15  6:19 UTC (permalink / raw)
  To: Changli Gao; +Cc: Tom Herbert, davem, netdev, Netfilter Developer Mailing List

Le 15/01/2010 03:22, Changli Gao a écrit :
> On Fri, Jan 15, 2010 at 5:56 AM, Tom Herbert <therbert@google.com> wrote:
>> +
>> +       if (skb->rxhash)
>> +               goto got_hash; /* Skip hash computation on packet header */
>> +
>> +       switch (skb->protocol) {
>> +       case __constant_htons(ETH_P_IP):
>> +               if (!pskb_may_pull(skb, sizeof(*ip)))
>> +                       goto done;
>> +
>> +               ip = (struct iphdr *) skb->data;
>> +               ip_proto = ip->protocol;
>> +               addr1 = ip->saddr;
>> +               addr2 = ip->daddr;
>> +               ihl = ip->ihl;
>> +               break;
>> +       case __constant_htons(ETH_P_IPV6):
>> +               if (!pskb_may_pull(skb, sizeof(*ip6)))
>> +                       goto done;
>> +
>> +               ip6 = (struct ipv6hdr *) skb->data;
>> +               ip_proto = ip6->nexthdr;
> This code can't work, when there are extra headers. ipv6_skip_exthdr()
> can be used to get the l4 header.

Could you give exact code please ?

> 
>> +               addr1 = ip6->saddr.s6_addr32[3];
>> +               addr2 = ip6->daddr.s6_addr32[3];
>> +               ihl = (40 >> 2);
>> +               break;
>> +       default:
>> +               goto done;
>> +       }
>> +       ports = 0;
>> +       switch (ip_proto) {
>> +       case IPPROTO_TCP:
>> +       case IPPROTO_UDP:
>> +       case IPPROTO_DCCP:
>> +       case IPPROTO_ESP:
>> +       case IPPROTO_AH:
>> +       case IPPROTO_SCTP:
>> +       case IPPROTO_UDPLITE:
>> +               if (pskb_may_pull(skb, (ihl * 4) + 4))
>> +                       ports = *((u32 *) (skb->data + (ihl * 4)));
>> +               break;
>> +
>> +       default:
>> +               break;
>> +       }
>> +
>> +       skb->rxhash = jhash_3words(addr1, addr2, ports, hashrnd);
> For connection based packet processing, such as netfilter,
> distributing the packets in two directions into one CPU will reduce
> cache miss, when NAT isn't used. I think the code bellow will help:
> if (addr1 > addr2)
>   swap(addr1, addr2);

Yes, I already gave this hint in a previous review, but this adds a test
and I suspect Google is not using NAT :)

> 
>> +       if (!skb->rxhash)
>> +               skb->rxhash = 1;
> 
> Why not put the above code into a new function, and add more protocols
> support, such as 802.1Q.  Though rxhash is based on 4-tuple, I think
> netfilter will benefit from it.
> 

Sure, this can be done in a followup patch.

Thanks
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5] rps: Receive Packet Steering
  2010-01-15  6:19   ` Eric Dumazet
@ 2010-01-15  6:39     ` Changli Gao
  2010-01-15  6:57       ` Eric Dumazet
  0 siblings, 1 reply; 11+ messages in thread
From: Changli Gao @ 2010-01-15  6:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tom Herbert, davem, netdev, Netfilter Developer Mailing List

On Fri, Jan 15, 2010 at 2:19 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le 15/01/2010 03:22, Changli Gao a écrit :
>> On Fri, Jan 15, 2010 at 5:56 AM, Tom Herbert <therbert@google.com> wrote:
>>> +
>>> +       if (skb->rxhash)
>>> +               goto got_hash; /* Skip hash computation on packet header */
>>> +
>>> +       switch (skb->protocol) {
>>> +       case __constant_htons(ETH_P_IP):
>>> +               if (!pskb_may_pull(skb, sizeof(*ip)))
>>> +                       goto done;
>>> +
>>> +               ip = (struct iphdr *) skb->data;
>>> +               ip_proto = ip->protocol;
>>> +               addr1 = ip->saddr;
>>> +               addr2 = ip->daddr;
>>> +               ihl = ip->ihl;
>>> +               break;
>>> +       case __constant_htons(ETH_P_IPV6):
>>> +               if (!pskb_may_pull(skb, sizeof(*ip6)))
>>> +                       goto done;
>>> +
>>> +               ip6 = (struct ipv6hdr *) skb->data;
>>> +               ip_proto = ip6->nexthdr;
>> This code can't work, when there are extra headers. ipv6_skip_exthdr()
>> can be used to get the l4 header.
>
> Could you give exact code please ?
>

The code bellow is from my ifb-mq.patch
#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
        case __constant_htons(ETH_P_IPV6):
process_ipv6:
                if (unlikely(!pskb_may_pull(skb, sizeof(struct ipv6hdr))))
                        goto process_other;
                addr1 = ipv6_hdr(skb)->saddr.s6_addr32[3];
                addr2 = ipv6_hdr(skb)->daddr.s6_addr32[3];
                ihl = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &ip_proto);
                if (unlikely(ihl < 0))
                        goto process_other_trans;
                break;
#endif


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5] rps: Receive Packet Steering
  2010-01-15  6:39     ` Changli Gao
@ 2010-01-15  6:57       ` Eric Dumazet
  2010-01-15  8:49         ` David Miller
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2010-01-15  6:57 UTC (permalink / raw)
  To: Changli Gao; +Cc: Tom Herbert, davem, netdev, Netfilter Developer Mailing List

Le 15/01/2010 07:39, Changli Gao a écrit :
> The code bellow is from my ifb-mq.patch
> #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
>         case __constant_htons(ETH_P_IPV6):
> process_ipv6:
>                 if (unlikely(!pskb_may_pull(skb, sizeof(struct ipv6hdr))))
>                         goto process_other;
>                 addr1 = ipv6_hdr(skb)->saddr.s6_addr32[3];
>                 addr2 = ipv6_hdr(skb)->daddr.s6_addr32[3];
>                 ihl = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &ip_proto);
>                 if (unlikely(ihl < 0))
>                         goto process_other_trans;
>                 break;
> #endif
> 
> 

Thanks Changli !


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5] rps: Receive Packet Steering
  2010-01-15  6:57       ` Eric Dumazet
@ 2010-01-15  8:49         ` David Miller
  2010-01-15  9:20           ` Changli Gao
  2010-01-15  9:45           ` David Miller
  0 siblings, 2 replies; 11+ messages in thread
From: David Miller @ 2010-01-15  8:49 UTC (permalink / raw)
  To: eric.dumazet; +Cc: xiaosuo, therbert, netdev, netfilter-devel

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 15 Jan 2010 07:57:56 +0100

> Le 15/01/2010 07:39, Changli Gao a écrit :
>> The code bellow is from my ifb-mq.patch
>> #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
>>         case __constant_htons(ETH_P_IPV6):
>> process_ipv6:
>>                 if (unlikely(!pskb_may_pull(skb, sizeof(struct ipv6hdr))))
>>                         goto process_other;
>>                 addr1 = ipv6_hdr(skb)->saddr.s6_addr32[3];
>>                 addr2 = ipv6_hdr(skb)->daddr.s6_addr32[3];
>>                 ihl = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &ip_proto);
>>                 if (unlikely(ihl < 0))
>>                         goto process_other_trans;
>>                 break;
>> #endif
>> 
>> 
> 
> Thanks Changli !

Actually, no thanks.  Have you actually taken a look at
ipv6_skip_exthdr()?

Do that, then tell me that you want the extra function call, plus all
of the processing and data touching that that function does, just to
handle the case that there "might" be ipv6 extension headers there.

It is the exception rather than the rule, and I think it's just
assume we have a real protocol header next.

And that's what skb_tx_hash() used to do too before we started using
the recorded RX queue and socket hash values.

Nobody cared and nobody complained.  Guess why?  Because in practice
it doesn't matter.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5] rps: Receive Packet Steering
  2010-01-15  2:22 ` [PATCH v5] rps: Receive Packet Steering Changli Gao
  2010-01-15  6:19   ` Eric Dumazet
@ 2010-01-15  8:50   ` David Miller
  2010-01-15  9:05     ` Changli Gao
  1 sibling, 1 reply; 11+ messages in thread
From: David Miller @ 2010-01-15  8:50 UTC (permalink / raw)
  To: xiaosuo; +Cc: therbert, netdev, netfilter-devel

From: Changli Gao <xiaosuo@gmail.com>
Date: Fri, 15 Jan 2010 10:22:03 +0800

> For connection based packet processing, such as netfilter,
> distributing the packets in two directions into one CPU will reduce
> cache miss, when NAT isn't used. I think the code bellow will help:
> if (addr1 > addr2)
>   swap(addr1, addr2);

You can't just do the addresses, the ports will swap too.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5] rps: Receive Packet Steering
  2010-01-15  8:50   ` David Miller
@ 2010-01-15  9:05     ` Changli Gao
  0 siblings, 0 replies; 11+ messages in thread
From: Changli Gao @ 2010-01-15  9:05 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, netdev, netfilter-devel

On Fri, Jan 15, 2010 at 4:50 PM, David Miller <davem@davemloft.net> wrote:
> From: Changli Gao <xiaosuo@gmail.com>
> Date: Fri, 15 Jan 2010 10:22:03 +0800
>
>> For connection based packet processing, such as netfilter,
>> distributing the packets in two directions into one CPU will reduce
>> cache miss, when NAT isn't used. I think the code bellow will help:
>> if (addr1 > addr2)
>>   swap(addr1, addr2);
>
> You can't just do the addresses, the ports will swap too.
>

Yea, and it is just an example.


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5] rps: Receive Packet Steering
  2010-01-15  8:49         ` David Miller
@ 2010-01-15  9:20           ` Changli Gao
  2010-01-15  9:26             ` David Miller
  2010-01-15  9:45           ` David Miller
  1 sibling, 1 reply; 11+ messages in thread
From: Changli Gao @ 2010-01-15  9:20 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, therbert, netdev, netfilter-devel

On Fri, Jan 15, 2010 at 4:49 PM, David Miller <davem@davemloft.net> wrote:
>
> Actually, no thanks.  Have you actually taken a look at
> ipv6_skip_exthdr()?
>
> Do that, then tell me that you want the extra function call, plus all
> of the processing and data touching that that function does, just to
> handle the case that there "might" be ipv6 extension headers there.
>

I don't think ipv6_skip_exthdr() is too weight. If there isn't any
extra header, only some compare and jump instruments are added, and no
more data references. If there are some headers, I think distributing
packets among CPUs is more important than the extra cost introduced by
calling ipv6_skip_exthdr().

> It is the exception rather than the rule, and I think it's just
> assume we have a real protocol header next.
>
> And that's what skb_tx_hash() used to do too before we started using
> the recorded RX queue and socket hash values.
>
> Nobody cared and nobody complained.  Guess why?  Because in practice
> it doesn't matter.
>

Maybe they don't know it.If it was a performance regression, I think
more people might pay attention on it.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5] rps: Receive Packet Steering
  2010-01-15  9:20           ` Changli Gao
@ 2010-01-15  9:26             ` David Miller
  2010-01-21  7:04               ` Changli Gao
  0 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2010-01-15  9:26 UTC (permalink / raw)
  To: xiaosuo; +Cc: eric.dumazet, therbert, netdev, netfilter-devel

From: Changli Gao <xiaosuo@gmail.com>
Date: Fri, 15 Jan 2010 17:20:43 +0800

> On Fri, Jan 15, 2010 at 4:49 PM, David Miller <davem@davemloft.net> wrote:
>>
>> Actually, no thanks.  Have you actually taken a look at
>> ipv6_skip_exthdr()?
>>
>> Do that, then tell me that you want the extra function call, plus all
>> of the processing and data touching that that function does, just to
>> handle the case that there "might" be ipv6 extension headers there.
>>
> 
> I don't think ipv6_skip_exthdr() is too weight. If there isn't any
> extra header, only some compare and jump instruments are added, and no
> more data references. If there are some headers, I think distributing
> packets among CPUs is more important than the extra cost introduced by
> calling ipv6_skip_exthdr().

Calling a function is expensive.

What was now a leaf function deep in the call chain, will no longer
be, so GCC will need to push all live registers onto the stack,
then reload them back into registers when ipv6_skip_exthdr() returns.

And that function is expensive, it's a lot of code that %99 of the
time serves no purpose at all.

This will be executed for every single packet we process, and Linux
can process millions of packets per second, so every cycle and every
memory reference matters.

> Maybe they don't know it.If it was a performance regression, I think
> more people might pay attention on it.

And we can address such a problem at that time.

Can you show a real life setup that sees ipv6 packets with extension
headers and would be effected by this?

Really, I do not want to bloat up this path with useless code
execution when for all practical purposes it really doesn't matter.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5] rps: Receive Packet Steering
  2010-01-15  8:49         ` David Miller
  2010-01-15  9:20           ` Changli Gao
@ 2010-01-15  9:45           ` David Miller
  1 sibling, 0 replies; 11+ messages in thread
From: David Miller @ 2010-01-15  9:45 UTC (permalink / raw)
  To: therbert, netdev, netfilter-devel


Tom your patch still doesn't apply.

I took it out of patchwork:

	http://patchwork.ozlabs.org/patch/42931/

and the patch is all corrupted in the leading line characters.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v5] rps: Receive Packet Steering
  2010-01-15  9:26             ` David Miller
@ 2010-01-21  7:04               ` Changli Gao
  0 siblings, 0 replies; 11+ messages in thread
From: Changli Gao @ 2010-01-21  7:04 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, therbert, netdev, netfilter-devel

On Fri, Jan 15, 2010 at 5:26 PM, David Miller <davem@davemloft.net> wrote:
> From: Changli Gao <xiaosuo@gmail.com>
> Date: Fri, 15 Jan 2010 17:20:43 +0800
>
> Calling a function is expensive.
>
> What was now a leaf function deep in the call chain, will no longer
> be, so GCC will need to push all live registers onto the stack,
> then reload them back into registers when ipv6_skip_exthdr() returns.
>
> And that function is expensive, it's a lot of code that %99 of the
> time serves no purpose at all.
>
> This will be executed for every single packet we process, and Linux
> can process millions of packets per second, so every cycle and every
> memory reference matters.
>

We can write a new inline function like this:

static inline int ipv6_get_ports(const struct sk_buff *skb, u16 *port1,
                u16 *port2)
{
        u8 nexthdr;
        int hdrlen;

        nexthdr = ipv6_hdr(skb)->nexthdr;
        hdrlen = sizeof(struct ipv6hdr);
        while (1) {
                switch (nexthdr) {
                case IPPROTO_TCP:
                case IPPROTO_UDP:
                case IPPROTO_DCCP:
                case IPPROTO_ESP:
                case IPPROTO_AH:
                case IPPROTO_SCTP:
                case IPPROTO_UDPLITE:
                        skb_copy_bits(skb, hdrlen, port1, 2);
                        skb_copy_bits(skb, hdrlen + 2, port2, 2);
                        return 0;
                case NEXTHDR_HOP:
                case NEXTHDR_ROUTING:
                case NEXTHDR_FRAGMENT:
                case NEXTHDR_AUTH:
                case NEXTHDR_DEST:
                        // some code like ipv6_skip_exthdr()
                        ....
                        break;
                case NEXTHDR_NONE:
                        return -1;
                default:
                        return -1;
                }
        }
}

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2010-01-21  7:04 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <alpine.DEB.1.00.1001141353140.19018@pokey.mtv.corp.google.com>
2010-01-15  2:22 ` [PATCH v5] rps: Receive Packet Steering Changli Gao
2010-01-15  6:19   ` Eric Dumazet
2010-01-15  6:39     ` Changli Gao
2010-01-15  6:57       ` Eric Dumazet
2010-01-15  8:49         ` David Miller
2010-01-15  9:20           ` Changli Gao
2010-01-15  9:26             ` David Miller
2010-01-21  7:04               ` Changli Gao
2010-01-15  9:45           ` David Miller
2010-01-15  8:50   ` David Miller
2010-01-15  9:05     ` Changli Gao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).