* [PATCH] RPS: support 802.1q and pppoe session
@ 2010-03-25  4:30 Changli Gao
  2010-03-25  4:50 ` David Miller
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Changli Gao @ 2010-03-25  4:30 UTC (permalink / raw)
  To: David S. Miller; +Cc: Tom Herbert, xiaosuo, netdev

support 802.1q and pppoe session

Support 802.1q and PPPoE session packets in get_rps_cpu(), so that
traffic carried by these two protocols can benefit from RPS.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
net/core/dev.c | 38 ++++++++++++++++++++++++++++++--------
1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index a03aab4..647ecc4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -130,6 +130,10 @@
 #include <linux/random.h>
 #include <trace/events/napi.h>
 
+#ifdef CONFIG_SMP
+#include <linux/if_pppox.h>
+#endif
+
 #include "net-sysfs.h"
 
 /* Instead of increasing this, you should create a hash table. */
@@ -2190,7 +2194,8 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
 	struct rps_map *map;
 	int cpu = -1;
 	u8 ip_proto;
-	u32 addr1, addr2, ports, ihl;
+	__be16 protocol;
+	u32 addr1, addr2, ports, offset;
 
 	rcu_read_lock();
 
@@ -2214,26 +2219,43 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
 	if (skb->rxhash)
 		goto got_hash; /* Skip hash computation on packet header */
 
-	switch (skb->protocol) {
+	offset = 0;
+	protocol = skb->protocol;
+nest:
+	switch (protocol) {
+	case __constant_htons(ETH_P_8021Q):
+		if (!pskb_may_pull(skb, offset + VLAN_HLEN))
+			goto done;
+		protocol = ((struct vlan_hdr*)(skb->data +
+				offset))->h_vlan_encapsulated_proto;
+		offset += VLAN_HLEN;
+		goto nest;
+	case __constant_htons(ETH_P_PPP_SES):
+		if (!pskb_may_pull(skb, offset + PPPOE_SES_HLEN))
+			goto done;
+		protocol = *((__be16 *)(skb->data + offset +
+				sizeof(struct pppoe_hdr)));
+		offset += PPPOE_SES_HLEN;
+		goto nest;
 	case __constant_htons(ETH_P_IP):
-		if (!pskb_may_pull(skb, sizeof(*ip)))
+		if (!pskb_may_pull(skb, offset + sizeof(*ip)))
 			goto done;
 
 		ip = (struct iphdr *) skb->data;
 		ip_proto = ip->protocol;
 		addr1 = ip->saddr;
 		addr2 = ip->daddr;
-		ihl = ip->ihl;
+		offset += ip->ihl << 2;
 		break;
 	case __constant_htons(ETH_P_IPV6):
-		if (!pskb_may_pull(skb, sizeof(*ip6)))
+		if (!pskb_may_pull(skb, offset + sizeof(*ip6)))
 			goto done;
 
 		ip6 = (struct ipv6hdr *) skb->data;
 		ip_proto = ip6->nexthdr;
 		addr1 = ip6->saddr.s6_addr32[3];
 		addr2 = ip6->daddr.s6_addr32[3];
-		ihl = (40 >> 2);
+		offset += 40;
 		break;
 	default:
 		goto done;
@@ -2247,8 +2269,8 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
 	case IPPROTO_AH:
 	case IPPROTO_SCTP:
 	case IPPROTO_UDPLITE:
-		if (pskb_may_pull(skb, (ihl * 4) + 4))
-			ports = *((u32 *) (skb->data + (ihl * 4)));
+		if (pskb_may_pull(skb, offset + 4))
+			ports = *((u32 *) (skb->data + offset));
 		break;
 
 	default:
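
The header walk that this hunk adds to get_rps_cpu() can be tried out in
user space. What follows is only a minimal sketch under stated assumptions,
not the kernel code: find_l3() and rd16() are hypothetical names, and the
buffer is assumed to start just past the Ethernet header. It differs from
the posted hunk in two ways. It reads the L3 header position as data + off
(the hunk still casts skb->data directly for IPv4 and IPv6), and it
translates the PPP protocol number (0x0021 for IPv4, 0x0057 for IPv6) into
an EtherType before looping, since PPP protocol numbers are not EtherTypes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_IP       0x0800
#define ETH_P_8021Q    0x8100
#define ETH_P_PPP_SES  0x8864
#define ETH_P_IPV6     0x86DD
#define PPP_IP         0x0021  /* PPP protocol number for IPv4 */
#define PPP_IPV6       0x0057  /* PPP protocol number for IPv6 */
#define VLAN_HLEN      4       /* TCI (2) + encapsulated EtherType (2) */
#define PPPOE_SES_HLEN 8       /* PPPoE header (6) + PPP protocol (2) */

/* Read a big-endian 16-bit field. */
static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical helper: walk nested 802.1Q / PPPoE session headers.
 * data points just past the Ethernet header and proto is the outer
 * EtherType in host order. Returns the L3 EtherType and stores the
 * L3 header offset in *l3_off, or returns 0 if the buffer is too
 * short or the protocol is not handled.
 */
static uint16_t find_l3(const uint8_t *data, size_t len, uint16_t proto,
			size_t *l3_off)
{
	size_t off = 0;

	assert(data != NULL && l3_off != NULL);
	for (;;) {
		switch (proto) {
		case ETH_P_8021Q:
			if (len < off + VLAN_HLEN)
				return 0;
			proto = rd16(data + off + 2);
			off += VLAN_HLEN;
			break;		/* re-examine the inner protocol */
		case ETH_P_PPP_SES:
			if (len < off + PPPOE_SES_HLEN)
				return 0;
			/* Assumption: map the PPP protocol to an EtherType. */
			switch (rd16(data + off + 6)) {
			case PPP_IP:
				proto = ETH_P_IP;
				break;
			case PPP_IPV6:
				proto = ETH_P_IPV6;
				break;
			default:
				return 0;
			}
			off += PPPOE_SES_HLEN;
			break;
		case ETH_P_IP:
		case ETH_P_IPV6:
			*l3_off = off;	/* the IP header starts at data + off */
			return proto;
		default:
			return 0;
		}
	}
}
```

A caller would pass the bytes following the Ethernet header together with
the outer EtherType, then read the addresses and ports relative to the
returned offset.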




* Re: [PATCH] RPS: support 802.1q and pppoe session
  2010-03-25  4:30 [PATCH] RPS: support 802.1q and pppoe session Changli Gao
@ 2010-03-25  4:50 ` David Miller
  2010-03-25  5:03 ` Eric Dumazet
  2010-03-25  5:13 ` Eric Dumazet
  2 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2010-03-25  4:50 UTC (permalink / raw)
  To: xiaosuo; +Cc: therbert, netdev

From: Changli Gao <xiaosuo@gmail.com>
Date: Thu, 25 Mar 2010 12:30:33 +0800

> support 802.1q and pppoe session
> 
> Support 802.1q and PPPoE session packets in get_rps_cpu(), so that
> traffic carried by these two protocols can benefit from RPS.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>

This is getting ridiculous.

The TX hasher doesn't support this, neither should RPS.

Most of the patches you've been posting to RPS are very specialized
hacks and frankly not very welcome.


* Re: [PATCH] RPS: support 802.1q and pppoe session
  2010-03-25  4:30 [PATCH] RPS: support 802.1q and pppoe session Changli Gao
  2010-03-25  4:50 ` David Miller
@ 2010-03-25  5:03 ` Eric Dumazet
  2010-03-25  5:12   ` Changli Gao
  2010-03-25  5:13 ` Eric Dumazet
  2 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2010-03-25  5:03 UTC (permalink / raw)
  To: xiaosuo; +Cc: David S. Miller, Tom Herbert, netdev

On Thursday, 25 March 2010 at 12:30 +0800, Changli Gao wrote:
> support 802.1q and pppoe session
> 
> Support 802.1q and PPPoE session packets in get_rps_cpu(), so that
> traffic carried by these two protocols can benefit from RPS.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ----
> net/core/dev.c | 38 ++++++++++++++++++++++++++++++--------
> 1 file changed, 30 insertions(+), 8 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index a03aab4..647ecc4 100644

While this might sound like a good idea, you really should split this
into two parts.

By the way, why not handle IPIP too?

I ask because I believe the 802.1q part, for instance, adds no value:
a packet handled by CPU X is decoded and passed to the VLAN device,
where RPS gets a chance to take it fully, since we go back to netif_rx().

The same can probably be argued for IPIP / PPPoE.

I agree we might need a flag or something to reset rxhash to 0 somewhere
(probably in non-accelerated VLAN rx handling) to force the second
get_rps_cpu() invocation to recompute it. This small correction has no
cost if put outside of get_rps_cpu().

If get_rps_cpu() is too complex, it might become too slow for typical
use. We should find smart ways to solve your performance problems, if
they exist at all.





* Re: [PATCH] RPS: support 802.1q and pppoe session
  2010-03-25  5:03 ` Eric Dumazet
@ 2010-03-25  5:12   ` Changli Gao
  2010-03-25  5:24     ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Changli Gao @ 2010-03-25  5:12 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Tom Herbert, netdev

On Thu, Mar 25, 2010 at 1:03 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> While this might sound like a good idea, you really should split this
> into two parts.
>
> By the way, why not handle IPIP too?

I'm not sure whether it is a good idea to support VLAN and PPPoE, and
David actually doesn't like it. :(

>
> I ask because I believe the 802.1q part, for instance, adds no value:
> a packet handled by CPU X is decoded and passed to the VLAN device,
> where RPS gets a chance to take it fully, since we go back to netif_rx().
>
> The same can probably be argued for IPIP / PPPoE.

It is useful when Linux is run as a bridge.

>
> I agree we might need a flag or something to reset rxhash to 0 somewhere
> (probably in non-accelerated VLAN rx handling) to force the second
> get_rps_cpu() invocation to recompute it. This small correction has no
> cost if put outside of get_rps_cpu().
>
> If get_rps_cpu() is too complex, it might become too slow for typical
> use. We should find smart ways to solve your performance problems, if
> they exist at all.
>

That means more than one IPI will be sent for a single packet. I don't
think that cost is acceptable.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* Re: [PATCH] RPS: support 802.1q and pppoe session
  2010-03-25  4:30 [PATCH] RPS: support 802.1q and pppoe session Changli Gao
  2010-03-25  4:50 ` David Miller
  2010-03-25  5:03 ` Eric Dumazet
@ 2010-03-25  5:13 ` Eric Dumazet
  2 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2010-03-25  5:13 UTC (permalink / raw)
  To: xiaosuo, David Miller; +Cc: Tom Herbert, netdev

On Thursday, 25 March 2010 at 12:30 +0800, Changli Gao wrote:
> +#ifdef CONFIG_SMP
> +#include <linux/if_pppox.h>
> +#endif

BTW, when I see this kind of illogical thing, I really do think we should
have a CONFIG_RPS setting...

David, may I resubmit my earlier patch adding CONFIG_RPS, but as a
boolean that is not user-selectable?

[PATCH net-next-2.6] rps: add CONFIG_RPS

RPS currently depends on SMP and SYSFS

Adding a CONFIG_RPS makes sense in case this requirement changes in the
future. This patch saves about 1500 bytes of kernel text in case SMP is
on but SYSFS is off.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/netdevice.h |    4 ++++
 net/Kconfig               |    5 +++++
 net/core/dev.c            |   29 +++++++++++++++++++----------
 3 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c96c41e..53c272f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -531,6 +531,7 @@ struct netdev_queue {
 	unsigned long		tx_dropped;
 } ____cacheline_aligned_in_smp;
 
+#ifdef CONFIG_RPS
 /*
  * This structure holds an RPS map which can be of variable length.  The
  * map is an array of CPUs.
@@ -549,6 +550,7 @@ struct netdev_rx_queue {
 	struct netdev_rx_queue *first;
 	atomic_t count;
 } ____cacheline_aligned_in_smp;
+#endif
 
 /*
  * This structure defines the management hooks for network devices.
@@ -897,12 +899,14 @@ struct net_device {
 
 	unsigned char		broadcast[MAX_ADDR_LEN];	/* hw bcast add	*/
 
+#ifdef CONFIG_RPS
 	struct kset		*queues_kset;
 
 	struct netdev_rx_queue	*_rx;
 
 	/* Number of RX queues allocated at alloc_netdev_mq() time  */
 	unsigned int		num_rx_queues;
+#endif
 
 	struct netdev_queue	rx_queue;
 
diff --git a/net/Kconfig b/net/Kconfig
index 041c35e..6851464 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -203,6 +203,11 @@ source "net/ieee802154/Kconfig"
 source "net/sched/Kconfig"
 source "net/dcb/Kconfig"
 
+config RPS
+	boolean
+	depends on SMP && SYSFS
+	default y
+
 menu "Network testing"
 
 config NET_PKTGEN
diff --git a/net/core/dev.c b/net/core/dev.c
index 5e3dc28..bcb3ed2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2177,7 +2177,7 @@ int weight_p __read_mostly = 64;            /* old backlog weight */
 
 DEFINE_PER_CPU(struct netif_rx_stats, netdev_rx_stat) = { 0, };
 
-#ifdef CONFIG_SMP
+#ifdef CONFIG_RPS
 /*
  * get_rps_cpu is called from netif_receive_skb and returns the target
  * CPU from the RPS map of the receiving queue for a given skb.
@@ -2325,7 +2325,7 @@ enqueue:
 
 		/* Schedule NAPI for backlog device */
 		if (napi_schedule_prep(&queue->backlog)) {
-#ifdef CONFIG_SMP
+#ifdef CONFIG_RPS
 			if (cpu != smp_processor_id()) {
 				struct rps_remote_softirq_cpus *rcpus =
 				    &__get_cpu_var(rps_remote_softirq_cpus);
@@ -2376,7 +2376,7 @@ int netif_rx(struct sk_buff *skb)
 	if (!skb->tstamp.tv64)
 		net_timestamp(skb);
 
-#ifdef CONFIG_SMP
+#ifdef CONFIG_RPS
 	cpu = get_rps_cpu(skb->dev, skb);
 	if (cpu < 0)
 		cpu = smp_processor_id();
@@ -2750,7 +2750,7 @@ out:
  */
 int netif_receive_skb(struct sk_buff *skb)
 {
-#ifdef CONFIG_SMP
+#ifdef CONFIG_RPS
 	int cpu;
 
 	cpu = get_rps_cpu(skb->dev, skb);
@@ -3189,7 +3189,7 @@ void netif_napi_del(struct napi_struct *napi)
 }
 EXPORT_SYMBOL(netif_napi_del);
 
-#ifdef CONFIG_SMP
+#ifdef CONFIG_RPS
 /*
  * net_rps_action sends any pending IPI's for rps.  This is only called from
  * softirq and interrupts must be enabled.
@@ -3214,7 +3214,7 @@ static void net_rx_action(struct softirq_action *h)
 	unsigned long time_limit = jiffies + 2;
 	int budget = netdev_budget;
 	void *have;
-#ifdef CONFIG_SMP
+#ifdef CONFIG_RPS
 	int select;
 	struct rps_remote_softirq_cpus *rcpus;
 #endif
@@ -3280,7 +3280,7 @@ static void net_rx_action(struct softirq_action *h)
 		netpoll_poll_unlock(have);
 	}
 out:
-#ifdef CONFIG_SMP
+#ifdef CONFIG_RPS
 	rcpus = &__get_cpu_var(rps_remote_softirq_cpus);
 	select = rcpus->select;
 	rcpus->select ^= 1;
@@ -5277,6 +5277,7 @@ int register_netdevice(struct net_device *dev)
 
 	dev->iflink = -1;
 
+#ifdef CONFIG_RPS
 	if (!dev->num_rx_queues) {
 		/*
 		 * Allocate a single RX queue if driver never called
@@ -5293,7 +5294,7 @@ int register_netdevice(struct net_device *dev)
 		atomic_set(&dev->_rx->count, 1);
 		dev->num_rx_queues = 1;
 	}
-
+#endif
 	/* Init, if this function is available */
 	if (dev->netdev_ops->ndo_init) {
 		ret = dev->netdev_ops->ndo_init(dev);
@@ -5653,11 +5654,13 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 		void (*setup)(struct net_device *), unsigned int queue_count)
 {
 	struct netdev_queue *tx;
-	struct netdev_rx_queue *rx;
 	struct net_device *dev;
 	size_t alloc_size;
 	struct net_device *p;
+#ifdef CONFIG_RPS
+	struct netdev_rx_queue *rx;
 	int i;
+#endif
 
 	BUG_ON(strlen(name) >= sizeof(dev->name));
 
@@ -5683,6 +5686,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 		goto free_p;
 	}
 
+#ifdef CONFIG_RPS
 	rx = kcalloc(queue_count, sizeof(struct netdev_rx_queue), GFP_KERNEL);
 	if (!rx) {
 		printk(KERN_ERR "alloc_netdev: Unable to allocate "
@@ -5698,6 +5702,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 	 */
 	for (i = 0; i < queue_count; i++)
 		rx[i].first = rx;
+#endif
 
 	dev = PTR_ALIGN(p, NETDEV_ALIGN);
 	dev->padded = (char *)dev - (char *)p;
@@ -5713,8 +5718,10 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 	dev->num_tx_queues = queue_count;
 	dev->real_num_tx_queues = queue_count;
 
+#ifdef CONFIG_RPS
 	dev->_rx = rx;
 	dev->num_rx_queues = queue_count;
+#endif
 
 	dev->gso_max_size = GSO_MAX_SIZE;
 
@@ -5731,8 +5738,10 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 	return dev;
 
 free_rx:
+#ifdef CONFIG_RPS
 	kfree(rx);
 free_tx:
+#endif
 	kfree(tx);
 free_p:
 	kfree(p);
@@ -6236,7 +6245,7 @@ static int __init net_dev_init(void)
 		queue->completion_queue = NULL;
 		INIT_LIST_HEAD(&queue->poll_list);
 
-#ifdef CONFIG_SMP
+#ifdef CONFIG_RPS
 		queue->csd.func = trigger_softirq;
 		queue->csd.info = queue;
 		queue->csd.flags = 0;




* Re: [PATCH] RPS: support 802.1q and pppoe session
  2010-03-25  5:12   ` Changli Gao
@ 2010-03-25  5:24     ` Eric Dumazet
  2010-03-25  5:47       ` Changli Gao
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2010-03-25  5:24 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, Tom Herbert, netdev

On Thursday, 25 March 2010 at 13:12 +0800, Changli Gao wrote:
> On Thu, Mar 25, 2010 at 1:03 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > While this might sound like a good idea, you really should split this
> > into two parts.
> >
> > By the way, why not handle IPIP too?
> 
> I'm not sure whether it is a good idea to support VLAN and PPPoE, and
> David actually doesn't like it. :(
> 
> >
> > I ask because I believe the 802.1q part, for instance, adds no value:
> > a packet handled by CPU X is decoded and passed to the VLAN device,
> > where RPS gets a chance to take it fully, since we go back to netif_rx().
> >
> > The same can probably be argued for IPIP / PPPoE.
> 
> It is useful when Linux is run as a bridge.
> 

I am not saying it's not useful. BTW, for routers/bridges, RPS is not a
good idea unless you need complex netfilter rules.

> >
> > I agree we might need a flag or something to reset rxhash to 0 somewhere
> > (probably in non-accelerated VLAN rx handling) to force the second
> > get_rps_cpu() invocation to recompute it. This small correction has no
> > cost if put outside of get_rps_cpu().
> >
> > If get_rps_cpu() is too complex, it might become too slow for typical
> > use. We should find smart ways to solve your performance problems, if
> > they exist at all.
> >
> 
> That means more than one IPI will be sent for a single packet. I don't
> think that cost is acceptable.
> 

I believe you don't _fully_ understand how RPS currently works.

I am very surprised that you send RPS patches without having mastered it.

Please read get_rps_cpu() again, line 2238:

        default:
                goto done;   <<<< HERE skb->rxhash unchanged >>>>
        }
        ports = 0;
        switch (ip_proto) {


This means that unknown protocols are handled directly by THIS CPU and
not handed to another CPU. No IPI is involved.

In the case of tunnels or VLANs, we then re-enter the low-level stack,
and at that point we can fully use RPS because we are able to find the
IPv4/IPv6 headers.

Your patch is not necessary: the next time we call get_rps_cpu(), rxhash
is still 0, so we compute the correct non-zero hash and can then choose
an appropriate target CPU.

(All you need is to set /sys/class/net/vlan.825/queues/rx-0/rps_cpus to
the needed mask.)






* Re: [PATCH] RPS: support 802.1q and pppoe session
  2010-03-25  5:24     ` Eric Dumazet
@ 2010-03-25  5:47       ` Changli Gao
  2010-03-25  5:58         ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Changli Gao @ 2010-03-25  5:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Tom Herbert, netdev

On Thu, Mar 25, 2010 at 1:24 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thursday, 25 March 2010 at 13:12 +0800, Changli Gao wrote:
>>
>> It is useful when Linux is run as a bridge.
>>
>
> I am not saying it's not useful. BTW, for routers/bridges, RPS is not a
> good idea unless you need complex netfilter rules.
>

Yeah, we do DPI in netfilter. And for a stateful firewall, connection
tracking isn't cheap. As bandwidth increases, we find that one CPU can't
handle all the traffic from a single NIC. We currently use a dynamic
weighted packet-distribution algorithm on a patched Linux 2.6.18, and
it works very well.

>> >
>> > I agree we might need a flag or something to reset rxhash to 0 somewhere
>> > (probably in non-accelerated VLAN rx handling) to force the second
>> > get_rps_cpu() invocation to recompute it. This small correction has no
>> > cost if put outside of get_rps_cpu().
>> >
>> > If get_rps_cpu() is too complex, it might become too slow for typical
>> > use. We should find smart ways to solve your performance problems, if
>> > they exist at all.
>> >
>>
>> That means more than one IPI will be sent for a single packet. I don't
>> think that cost is acceptable.
>>
>
> I believe you don't _fully_ understand how RPS currently works.
>
> I am very surprised that you send RPS patches without having mastered it.
>
> Please read get_rps_cpu() again, line 2238:
>
>        default:
>                goto done;   <<<< HERE skb->rxhash unchanged >>>>
>        }
>        ports = 0;
>        switch (ip_proto) {
>
>
> This means that unknown protocols are handled directly by THIS CPU and
> not handed to another CPU. No IPI is involved.
>
> In the case of tunnels or VLANs, we then re-enter the low-level stack,
> and at that point we can fully use RPS because we are able to find the
> IPv4/IPv6 headers.
>
> Your patch is not necessary: the next time we call get_rps_cpu(), rxhash
> is still 0, so we compute the correct non-zero hash and can then choose
> an appropriate target CPU.
>
> (All you need is to set /sys/class/net/vlan.825/queues/rx-0/rps_cpus to
> the needed mask.)
>

I know the current code is OK and that no additional IPI is needed. I said
that because you said:

>> > I agree we might need a flag or something to reset rxhash to 0 somewhere
>> > (probably in non-accelerated VLAN rx handling) to force the second
>> > get_rps_cpu() invocation to recompute it. This small correction has no
>> > cost if put outside of get_rps_cpu().

Oh, maybe I misunderstood your words. I thought the rxhash you wanted to
clear was the one computed by get_rps_cpu()? I remember that some NICs can
themselves extract the 5-tuple from VLAN and PPPoE packets to compute the
hash. In that case, we should not clear rxhash.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* Re: [PATCH] RPS: support 802.1q and pppoe session
  2010-03-25  5:47       ` Changli Gao
@ 2010-03-25  5:58         ` Eric Dumazet
  2010-03-25  6:09           ` Changli Gao
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2010-03-25  5:58 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, Tom Herbert, netdev

On Thursday, 25 March 2010 at 13:47 +0800, Changli Gao wrote:

> Yeah, we do DPI in netfilter. And for a stateful firewall, connection
> tracking isn't cheap. As bandwidth increases, we find that one CPU can't
> handle all the traffic from a single NIC. We currently use a dynamic
> weighted packet-distribution algorithm on a patched Linux 2.6.18, and
> it works very well.
> 

Hmm... we added RCU to conntrack only last year, so with 2.6.18
conntrack hits a global lock.

TCP conntrack also uses another global lock, which has not been converted
yet, even in 2.6.33.
How can this scale?


> Oh, maybe I misunderstood your words. I thought the rxhash you wanted to
> clear was the one computed by get_rps_cpu()? I remember that some NICs can
> themselves extract the 5-tuple from VLAN and PPPoE packets to compute the
> hash. In that case, we should not clear rxhash.
> 

rxhash is provided/computed only if it's possible, and stays 0 if it's not
possible to compute it :)

AFAIK, at this point, no network driver provides an rxhash. If
you know of a NIC that can provide it, please submit a patch :)





* Re: [PATCH] RPS: support 802.1q and pppoe session
  2010-03-25  5:58         ` Eric Dumazet
@ 2010-03-25  6:09           ` Changli Gao
  0 siblings, 0 replies; 9+ messages in thread
From: Changli Gao @ 2010-03-25  6:09 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Tom Herbert, netdev

On Thu, Mar 25, 2010 at 1:58 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thursday, 25 March 2010 at 13:47 +0800, Changli Gao wrote:
>
>
> Hmm... we added RCU to conntrack only last year, so with 2.6.18
> conntrack hits a global lock.
>
> TCP conntrack also uses another global lock, which has not been converted
> yet, even in 2.6.33.
> How can this scale?
>

In our case, conntrack isn't a big issue, but DPI is. So we use a
per-conntrack lock instead of the global conntrack lock, and distribute
both directions of the traffic belonging to the same conntrack to the
same CPU, so in the normal case there isn't any lock contention.
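
One common way to land both directions of a connection on the same CPU is
a direction-independent flow hash. The sketch below is not the out-of-tree
code mentioned above, which is not shown in this thread. It is a minimal
illustration with hypothetical names (struct endpoint, flow_hash_sym(),
flow_cpu()) and a toy mixing function: the two endpoints are put into a
canonical order before hashing, so swapping source and destination gives
the same value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical IPv4 endpoint: address and L4 port, host order. */
struct endpoint {
	uint32_t addr;
	uint16_t port;
};

/* Order endpoints by address, then by port. */
static int ep_less(const struct endpoint *a, const struct endpoint *b)
{
	if (a->addr != b->addr)
		return a->addr < b->addr;
	return a->port < b->port;
}

/* Same result for (src, dst) and (dst, src). Toy mixer, illustrative only. */
static uint32_t flow_hash_sym(struct endpoint src, struct endpoint dst)
{
	struct endpoint lo = src, hi = dst;
	uint32_t h;

	if (ep_less(&dst, &src)) {	/* canonical order: lo <= hi */
		lo = dst;
		hi = src;
	}
	h = lo.addr * 0x9E3779B1u;
	h ^= hi.addr + 0x7F4A7C15u + (h << 6) + (h >> 2);
	h ^= (((uint32_t)lo.port << 16) | hi.port) * 0x85EBCA6Bu;
	h ^= h >> 15;
	return h ? h : 1;		/* 0 is reserved for "no hash" */
}

/* Scale the 32-bit hash into a CPU index in [0, ncpus). */
static uint32_t flow_cpu(uint32_t hash, uint32_t ncpus)
{
	assert(ncpus > 0);
	return (uint32_t)(((uint64_t)hash * ncpus) >> 32);
}
```

With such a hash, request and reply packets of one connection map to the
same CPU index, so per-connection state is only touched from one CPU in
the common case.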

>
> rxhash is provided/computed only if its possible, and stay 0 if its not
> possible to compute it :)
>
> AFAIK, at this point, no network driver currently provides a rxhash. If
> you know some NIC can provide it, please submit a patch :)
>

It is a part of a private NP, and its code isn't in mainline. :(

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
