Netdev List
* Re: [PATCH] deal with if frags[0].size is pulled to 0 in dev_gro_receive()
From: David Miller @ 2010-08-03  5:03 UTC (permalink / raw)
  To: herbert; +Cc: xiaohui.xin, netdev
In-Reply-To: <20100803045637.GA14173@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 3 Aug 2010 12:56:38 +0800

> On Tue, Aug 03, 2010 at 11:17:19AM +0800, xiaohui.xin@intel.com wrote:
>> From: Xin Xiaohui <xiaohui.xin@intel.com>
>> 
>> Currently in dev_gro_receive(), if frags[0].size is pulled to 0, memmove is
>> called and the null page is released. But that is not enough: we should also
>> reset the size of each remaining frag. A simpler alternative is to not do
>> anything at all.
>> 
>> Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
> 
> This patch can only work if you audit everything that uses skb
> frags to ensure that they can tolerate a zero-sized frag.
> 
> I think it's much easier to just fix the memmove.

Agreed.

^ permalink raw reply

* Re: [PATCH] Fixes a typo from "dev" to "ndev"
From: David Miller @ 2010-08-03  5:03 UTC (permalink / raw)
  To: henrique.camargo
  Cc: chaithrika, srk, khilman, jpirko, netdev, linux-kernel, segooon
In-Reply-To: <1280805042.2089.6.camel@lemming>

From: Henrique Camargo <henrique.camargo@ensitec.com.br>
Date: Tue, 03 Aug 2010 00:10:42 -0300

> The typo was causing compilation errors since "dev" was not defined.
> 
> Signed-off-by: Henrique Camargo <henrique.camargo@ensitec.com.br>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH] deal with if frags[0].size is pulled to 0 in dev_gro_receive()
From: Herbert Xu @ 2010-08-03  4:56 UTC (permalink / raw)
  To: xiaohui.xin; +Cc: netdev, davem
In-Reply-To: <1280805439-18988-1-git-send-email-xiaohui.xin@intel.com>

On Tue, Aug 03, 2010 at 11:17:19AM +0800, xiaohui.xin@intel.com wrote:
> From: Xin Xiaohui <xiaohui.xin@intel.com>
> 
> Currently in dev_gro_receive(), if frags[0].size is pulled to 0, memmove is
> called and the null page is released. But that is not enough: we should also
> reset the size of each remaining frag. A simpler alternative is to not do
> anything at all.
> 
> Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>

This patch can only work if you audit everything that uses skb
frags to ensure that they can tolerate a zero-sized frag.

I think it's much easier to just fix the memmove.
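
As a rough userspace illustration of what fixing the memmove involves: memmove() takes a length in bytes, so passing the bare frag count moves only a few bytes instead of whole entries. The struct below is a hypothetical stand-in for skb_frag_t, and drop_first_frag() is a made-up helper name, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for skb_frag_t; field names are illustrative. */
struct frag {
	void *page;
	unsigned int page_offset;
	unsigned int size;
};

/*
 * Drop frags[0] and shift the remaining entries down by one slot.
 * memmove() counts bytes, so the element count must be scaled by the
 * element size. Returns the new number of frags.
 */
static unsigned int drop_first_frag(struct frag *frags, unsigned int nr_frags)
{
	assert(nr_frags > 0);
	--nr_frags;
	memmove(frags, frags + 1, nr_frags * sizeof(*frags));
	return nr_frags;
}
```

Because whole entries are moved, each remaining frag keeps its own page_offset and size after the shift.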

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [net-next-2.6 PATCH] ixgbevf: fix null pointer dereference due to filter being set for VLAN 0
From: David Miller @ 2010-08-03  4:21 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bphilips, alexander.h.duyck
In-Reply-To: <20100803005849.4678.10583.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 02 Aug 2010 17:59:04 -0700

> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> This change corrects an issue that resulted in a null pointer dereference
> for the addition of VLAN 0 without any VLANs being registered.  Also this
> code removes some unnecessary checks for defines and the unnecessary setting
> of VLAN flags since that is now handled within the kernel via the
> vlan_features.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH] igb: Use irq_synchronize per vector when using MSI-X
From: David Miller @ 2010-08-03  4:21 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bphilips, jdelvare, emil.s.tantilov
In-Reply-To: <20100803004044.4441.34335.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 02 Aug 2010 17:40:52 -0700

> From: Emil Tantilov <emil.s.tantilov@intel.com>
> 
> Synchronize all IRQs when using MSI-X. Similar to ixgbe.
> Issue was reported on e1000e, but the patch is also valid for igb.
> 
> CC: Jean Delvare <jdelvare@suse.de>
> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH 3/3] e1000e: update to workaround for jumbo frames on 82577
From: David Miller @ 2010-08-03  4:21 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bphilips, bruce.w.allan
In-Reply-To: <AANLkTi=TvRmXxJu1+y9B8x7G55Ur9wNhYzDZ8Q=+HJRJ@mail.gmail.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 2 Aug 2010 17:36:56 -0700

> On Mon, Aug 2, 2010 at 17:27, Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
>> From: Bruce Allan <bruce.w.allan@intel.com>
>>
>> For OEM systems with this part that also has Spread Spectrum Clocking (SSC)
>> enabled in the BIOS, there is an Rx performance issue with 4K jumbo frames.
>> Leaving the defaults in PHY page 770 register 26 resolves the issue, and
>> does not negatively impact jumbo frames on systems with SSC disabled.
>>
>> Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
>> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>> ---
>>
>>  drivers/net/e1000e/netdev.c |    5 -----
>>  1 files changed, 0 insertions(+), 5 deletions(-)
>>
> 
> Please disregard this patch, it was sent out accidentally (my bad).
> During testing issues were found and changes need to be made to this
> patch.

Ok.

^ permalink raw reply

* Re: [net-next-2.6 PATCH 2/3] e1000e: Fix irq_synchronize in MSI-X case
From: David Miller @ 2010-08-03  4:21 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: netdev, gospo, bphilips, jdelvare, jesse.brandeburg,
	bruce.w.allan
In-Reply-To: <20100803002721.4179.75916.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 02 Aug 2010 17:27:23 -0700

> Based on original patch/work from Jean Delvare <jdelvare@suse.de>
> Synchronize all IRQs when in MSI-X IRQ mode.
> 
> Jean's original patch hard coded the sync with the 3 possible vectors,
> this patch incorporates more flexibility for the future and aligns
> with how igb stores the number of vectors into the adapter structure.
> 
> CC: Jean Delvare <jdelvare@suse.de>
> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
> Acked-by: Bruce Allan <bruce.w.allan@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH 1/3] e1000e: register pm_qos request on hardware activation
From: David Miller @ 2010-08-03  4:21 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bphilips, florian
In-Reply-To: <20100803002622.4179.31850.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 02 Aug 2010 17:27:00 -0700

> From: Florian Mickler <florian@mickler.org>
> 
> The pm_qos_add_request call has to register the pm_qos request with the pm_qos
> subsystem before first use of the pm_qos request via
> pm_qos_update_request.
> 
> As pm_qos changed to use plists there is no benefit in registering and
> unregistering the pm_qos request on ifup/ifdown and thus we move the
> registering into e1000_open and the unregistering in e1000_close.
> 
> This fixes the following warning:
 ...
> Signed-off-by: Florian Mickler <florian@mickler.org>
> Tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-2.6 PATCH] e1000e: 82577/82578 PHY register access issues
From: David Miller @ 2010-08-03  4:10 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bphilips, stable, bruce.w.allan
In-Reply-To: <AANLkTi=P_+3wpmAA5+eqaWoGE39OkZPFhRXEuXGypAco@mail.gmail.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 2 Aug 2010 18:04:43 -0700

> Have you sync'd up your net-next-2.6 tree with net-2.6 tree?  Because
> I do not see this change in the net-next-2.6 tree.

No, I plan to do that tonight or tomorrow.

^ permalink raw reply

* Re: [PATCH v3 1/2] core: Factor out flow calculation from get_rps_cpu
From: Changli Gao @ 2010-08-03  4:05 UTC (permalink / raw)
  To: Krishna Kumar; +Cc: davem, arnd, bhutchings, netdev, therbert, mst
In-Reply-To: <20100803030256.8486.82622.sendpatchset@krkumar2.in.ibm.com>

On Tue, Aug 3, 2010 at 11:02 AM, Krishna Kumar <krkumar2@in.ibm.com> wrote:
> From: Krishna Kumar <krkumar2@in.ibm.com>
>
> Factor out flow calculation code from get_rps_cpu, since macvtap
> driver can use the same code.
>
> Revisions:
>
> v2 - Ben: Separate flow calculation out and use in select queue
> v3 - Arnd: Don't re-implement MIN
>
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> ---
>  include/linux/netdevice.h |    1
>  net/core/dev.c            |   94 ++++++++++++++++++++++--------------
>  2 files changed, 59 insertions(+), 36 deletions(-)
>
> diff -ruNp org/include/linux/netdevice.h new/include/linux/netdevice.h
> --- org/include/linux/netdevice.h       2010-08-03 08:19:57.000000000 +0530
> +++ new/include/linux/netdevice.h       2010-08-03 08:19:57.000000000 +0530
> @@ -2253,6 +2253,7 @@ static inline const char *netdev_name(co
>        return dev->name;
>  }
>
> +extern int skb_calculate_flow(struct net_device *dev, struct sk_buff *skb);
>  extern int netdev_printk(const char *level, const struct net_device *dev,
>                         const char *format, ...)
>        __attribute__ ((format (printf, 3, 4)));
> diff -ruNp org/net/core/dev.c new/net/core/dev.c
> --- org/net/core/dev.c  2010-08-03 08:19:57.000000000 +0530
> +++ new/net/core/dev.c  2010-08-03 08:19:57.000000000 +0530
> @@ -2263,51 +2263,24 @@ static inline void ____napi_schedule(str
>        __raise_softirq_irqoff(NET_RX_SOFTIRQ);
>  }
>
> -#ifdef CONFIG_RPS
> -
> -/* One global table that all flow-based protocols share. */
> -struct rps_sock_flow_table *rps_sock_flow_table __read_mostly;
> -EXPORT_SYMBOL(rps_sock_flow_table);
> -
>  /*
> - * get_rps_cpu is called from netif_receive_skb and returns the target
> - * CPU from the RPS map of the receiving queue for a given skb.
> - * rcu_read_lock must be held on entry.
> + * skb_calculate_flow: calculate a flow hash based on src/dst addresses
> + * and src/dst port numbers. On success, returns a hash number (> 0),
> + * otherwise -1.
>  */
> -static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
> -                      struct rps_dev_flow **rflowp)
> +int skb_calculate_flow(struct net_device *dev, struct sk_buff *skb)
>  {
> +       int hash = skb->rxhash;
>        struct ipv6hdr *ip6;
>        struct iphdr *ip;
> -       struct netdev_rx_queue *rxqueue;
> -       struct rps_map *map;
> -       struct rps_dev_flow_table *flow_table;
> -       struct rps_sock_flow_table *sock_flow_table;
> -       int cpu = -1;
>        u8 ip_proto;
> -       u16 tcpu;
>        u32 addr1, addr2, ihl;
>        union {
>                u32 v32;
>                u16 v16[2];
>        } ports;
>
> -       if (skb_rx_queue_recorded(skb)) {
> -               u16 index = skb_get_rx_queue(skb);
> -               if (unlikely(index >= dev->num_rx_queues)) {
> -                       WARN_ONCE(dev->num_rx_queues > 1, "%s received packet "
> -                               "on queue %u, but number of RX queues is %u\n",
> -                               dev->name, index, dev->num_rx_queues);
> -                       goto done;
> -               }
> -               rxqueue = dev->_rx + index;
> -       } else
> -               rxqueue = dev->_rx;
> -
> -       if (!rxqueue->rps_map && !rxqueue->rps_flow_table)
> -               goto done;
> -
> -       if (skb->rxhash)
> +       if (hash)
>                goto got_hash; /* Skip hash computation on packet header */
>
>        switch (skb->protocol) {
> @@ -2334,6 +2307,7 @@ static int get_rps_cpu(struct net_device
>        default:
>                goto done;
>        }
> +
>        switch (ip_proto) {
>        case IPPROTO_TCP:
>        case IPPROTO_UDP:
> @@ -2356,11 +2330,59 @@ static int get_rps_cpu(struct net_device
>        /* get a consistent hash (same value on both flow directions) */
>        if (addr2 < addr1)
>                swap(addr1, addr2);
> -       skb->rxhash = jhash_3words(addr1, addr2, ports.v32, hashrnd);
> -       if (!skb->rxhash)
> -               skb->rxhash = 1;
> +
> +       hash = jhash_3words(addr1, addr2, ports.v32, hashrnd);
> +       if (!hash)
> +               hash = 1;
>
>  got_hash:
> +       return hash;
> +
> +done:
> +       return -1;
> +}
> +EXPORT_SYMBOL(skb_calculate_flow);

I have noticed that you use skb_calculate_flow() in
macvtap_get_queue(), where skb->data points not to the network
header but to the Ethernet header. However, skb_calculate_flow() assumes
that skb->data points to the network header. There are two choices:
 * update skb_calculate_flow() so that it can be called at the Ethernet layer.
 * pull the skb before calling skb_calculate_flow(), and push it back
afterwards, in macvtap_get_queue().

I prefer the former way.

BTW: the function name skb_calculate_flow isn't good. How about
skb_get_rxhash()? Maybe we can implement two versions: a fast path and
a slow path, and implement the fast path version as an inline function in
skbuff.h.

static inline u32 skb_get_rxhash(struct sk_buff *skb)
{
        u32 rxhash;

        rxhash = skb->rxhash;
        if (!rxhash)
                return __skb_get_rxhash(skb);
        return rxhash;
}
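
To make the fast path/slow path split concrete, here is a minimal userspace model of the same idea. Everything in it is an assumption for illustration: struct pkt stands in for struct sk_buff, pkt_compute_rxhash() plays the role of __skb_get_rxhash(), and the mixing function is a placeholder rather than jhash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct sk_buff. */
struct pkt {
	uint32_t rxhash;	/* 0 means "not computed yet" */
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Counts slow-path invocations so the caching can be observed. */
static unsigned int slow_path_calls;

/* Slow path: hash the headers, force a nonzero result, cache it. */
static uint32_t pkt_compute_rxhash(struct pkt *p)
{
	/* Order both pairs so each flow direction hashes identically. */
	uint32_t lo = p->saddr < p->daddr ? p->saddr : p->daddr;
	uint32_t hi = p->saddr < p->daddr ? p->daddr : p->saddr;
	uint16_t plo = p->sport < p->dport ? p->sport : p->dport;
	uint16_t phi = p->sport < p->dport ? p->dport : p->sport;
	uint32_t h;

	slow_path_calls++;
	/* Placeholder mixing, not jhash_3words(). */
	h = (lo * 0x9e3779b1u) ^ (hi * 0x85ebca6bu) ^
	    (((uint32_t)phi << 16) | plo);
	if (!h)
		h = 1;	/* keep 0 reserved for "no hash yet" */
	p->rxhash = h;
	return h;
}

/* Fast path: return the cached hash when there is one. */
static inline uint32_t pkt_get_rxhash(struct pkt *p)
{
	assert(p != NULL);
	if (p->rxhash)
		return p->rxhash;
	return pkt_compute_rxhash(p);
}
```

Returning an unsigned value and reserving 0 for "not computed" means callers do not need a negative error code to tell a cached hash from a missing one.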


> +
> +#ifdef CONFIG_RPS
> +
> +/* One global table that all flow-based protocols share. */
> +struct rps_sock_flow_table *rps_sock_flow_table __read_mostly;
> +EXPORT_SYMBOL(rps_sock_flow_table);
> +
> +/*
> + * get_rps_cpu is called from netif_receive_skb and returns the target
> + * CPU from the RPS map of the receiving queue for a given skb.
> + * rcu_read_lock must be held on entry.
> + */
> +static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
> +                      struct rps_dev_flow **rflowp)
> +{
> +       struct netdev_rx_queue *rxqueue;
> +       struct rps_map *map;
> +       struct rps_dev_flow_table *flow_table;
> +       struct rps_sock_flow_table *sock_flow_table;
> +       int cpu = -1;
> +       u16 tcpu;
> +
> +       if (skb_rx_queue_recorded(skb)) {
> +               u16 index = skb_get_rx_queue(skb);
> +               if (unlikely(index >= dev->num_rx_queues)) {
> +                       WARN_ONCE(dev->num_rx_queues > 1, "%s received packet "
> +                               "on queue %u, but number of RX queues is %u\n",
> +                               dev->name, index, dev->num_rx_queues);
> +                       goto done;
> +               }
> +               rxqueue = dev->_rx + index;
> +       } else
> +               rxqueue = dev->_rx;
> +
> +       if (!rxqueue->rps_map && !rxqueue->rps_flow_table)
> +               goto done;
> +
> +       skb->rxhash = skb_calculate_flow(dev, skb);
> +       if (skb->rxhash < 0)
> +               goto done;
> +
>        flow_table = rcu_dereference(rxqueue->rps_flow_table);
>        sock_flow_table = rcu_dereference(rps_sock_flow_table);
>        if (flow_table && sock_flow_table) {
>



-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* [PATCH] Fixes a typo from "dev" to "ndev"
From: Henrique Camargo @ 2010-08-03  3:10 UTC (permalink / raw)
  To: David Miller
  Cc: chaithrika, srk, khilman, jpirko, netdev, linux-kernel, segooon
In-Reply-To: <20100730.222034.183058824.davem@davemloft.net>

The typo was causing compilation errors since "dev" was not defined.

Signed-off-by: Henrique Camargo <henrique.camargo@ensitec.com.br>
---
 drivers/net/davinci_emac.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/davinci_emac.c b/drivers/net/davinci_emac.c
index 25e14d2..b89b7bf 100644
--- a/drivers/net/davinci_emac.c
+++ b/drivers/net/davinci_emac.c
@@ -1182,8 +1182,8 @@ static int emac_net_tx_complete(struct emac_priv *priv,
 	struct net_device *ndev = priv->ndev;
 	u32 cnt;
 
-	if (unlikely(num_tokens && netif_queue_stopped(dev)))
-		netif_start_queue(dev);
+	if (unlikely(num_tokens && netif_queue_stopped(ndev)))
+		netif_start_queue(ndev);
 	for (cnt = 0; cnt < num_tokens; cnt++) {
 		struct sk_buff *skb = (struct sk_buff *)net_data_tokens[cnt];
 		if (skb == NULL)
-- 
1.7.0.4

^ permalink raw reply related

* [PATCH v3 2/2] macvtap: Implement multiqueue macvtap driver
From: Krishna Kumar @ 2010-08-03  3:03 UTC (permalink / raw)
  To: davem, arnd; +Cc: bhutchings, netdev, mst, Krishna Kumar, therbert
In-Reply-To: <20100803030256.8486.82622.sendpatchset@krkumar2.in.ibm.com>

From: Krishna Kumar <krkumar2@in.ibm.com>

Implement a multiqueue facility for the macvtap driver. The idea is that
a macvtap device can be opened multiple times and the fds can be
registered, e.g. as a backend for vhost.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
 drivers/net/macvtap.c      |   89 ++++++++++++++++++++++++++++-------
 include/linux/if_macvlan.h |    9 +++
 2 files changed, 80 insertions(+), 18 deletions(-)

diff -ruNp org/include/linux/if_macvlan.h new/include/linux/if_macvlan.h
--- org/include/linux/if_macvlan.h	2010-08-03 08:19:57.000000000 +0530
+++ new/include/linux/if_macvlan.h	2010-08-03 08:20:39.000000000 +0530
@@ -40,6 +40,12 @@ struct macvlan_rx_stats {
 	unsigned long		rx_errors;
 };
 
+/*
+ * Maximum times a macvtap device can be opened. This can be used to
+ * configure the number of receive queue, e.g. for multiqueue virtio.
+ */
+#define MAX_MACVTAP_QUEUES	(NR_CPUS < 16 ? NR_CPUS : 16)
+
 struct macvlan_dev {
 	struct net_device	*dev;
 	struct list_head	list;
@@ -50,7 +56,8 @@ struct macvlan_dev {
 	enum macvlan_mode	mode;
 	int (*receive)(struct sk_buff *skb);
 	int (*forward)(struct net_device *dev, struct sk_buff *skb);
-	struct macvtap_queue	*tap;
+	struct macvtap_queue	*taps[MAX_MACVTAP_QUEUES];
+	int			numvtaps;
 };
 
 static inline void macvlan_count_rx(const struct macvlan_dev *vlan,
diff -ruNp org/drivers/net/macvtap.c new/drivers/net/macvtap.c
--- org/drivers/net/macvtap.c	2010-08-03 08:19:57.000000000 +0530
+++ new/drivers/net/macvtap.c	2010-08-03 08:19:57.000000000 +0530
@@ -84,26 +84,45 @@ static const struct proto_ops macvtap_so
 static DEFINE_SPINLOCK(macvtap_lock);
 
 /*
- * Choose the next free queue, for now there is only one
+ * get_slot: return a [unused/occupied] slot in vlan->taps[]:
+ *	- if 'q' is NULL, return the first empty slot;
+ *	- otherwise, return the slot this pointer occupies.
  */
+static int get_slot(struct macvlan_dev *vlan, struct macvtap_queue *q)
+{
+	int i;
+
+	for (i = 0; i < MAX_MACVTAP_QUEUES; i++) {
+		if (rcu_dereference(vlan->taps[i]) == q)
+			return i;
+	}
+
+	/* Should never happen */
+	BUG_ON(1);
+}
+
 static int macvtap_set_queue(struct net_device *dev, struct file *file,
 				struct macvtap_queue *q)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
+	int index;
 	int err = -EBUSY;
 
 	spin_lock(&macvtap_lock);
-	if (rcu_dereference(vlan->tap))
+	if (vlan->numvtaps == MAX_MACVTAP_QUEUES)
 		goto out;
 
 	err = 0;
+	index = get_slot(vlan, NULL);
 	rcu_assign_pointer(q->vlan, vlan);
-	rcu_assign_pointer(vlan->tap, q);
+	rcu_assign_pointer(vlan->taps[index], q);
 	sock_hold(&q->sk);
 
 	q->file = file;
 	file->private_data = q;
 
+	vlan->numvtaps++;
+
 out:
 	spin_unlock(&macvtap_lock);
 	return err;
@@ -124,9 +143,12 @@ static void macvtap_put_queue(struct mac
 	spin_lock(&macvtap_lock);
 	vlan = rcu_dereference(q->vlan);
 	if (vlan) {
-		rcu_assign_pointer(vlan->tap, NULL);
+		int index = get_slot(vlan, q);
+
+		rcu_assign_pointer(vlan->taps[index], NULL);
 		rcu_assign_pointer(q->vlan, NULL);
 		sock_put(&q->sk);
+		--vlan->numvtaps;
 	}
 
 	spin_unlock(&macvtap_lock);
@@ -136,39 +158,72 @@ static void macvtap_put_queue(struct mac
 }
 
 /*
- * Since we only support one queue, just dereference the pointer.
+ * Select a queue based on the rxq of the device on which this packet
+ * arrived. If the incoming device is not mq, calculate a flow hash to
+ * select a queue. vlan->numvtaps is cached in case it reduces during
+ * the execution of this function.
  */
 static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
 					       struct sk_buff *skb)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
+	struct macvtap_queue *tap = NULL;
+	int numvtaps = vlan->numvtaps;
+	u16 rxq;
+
+	if (!numvtaps)
+		goto out;
+
+	if (likely(skb_rx_queue_recorded(skb))) {
+		rxq = skb_get_rx_queue(skb);
+
+		while (unlikely(rxq >= numvtaps))
+			rxq -= numvtaps;
 
-	return rcu_dereference(vlan->tap);
+		tap = rcu_dereference(vlan->taps[rxq]);
+		if (tap)
+			goto out;
+	}
+
+	rxq = skb_calculate_flow(dev, skb);
+	if (rxq < 0)
+		rxq = smp_processor_id();
+
+	tap = rcu_dereference(vlan->taps[rxq & (numvtaps - 1)]);
+
+out:
+	return tap;
 }
 
 /*
  * The net_device is going away, give up the reference
- * that it holds on the queue (all the queues one day)
- * and safely set the pointer from the queues to NULL.
+ * that it holds on all queues and safely set the pointer
+ * from the queues to NULL.
  */
 static void macvtap_del_queues(struct net_device *dev)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
-	struct macvtap_queue *q;
+	struct macvtap_queue *q, *qlist[MAX_MACVTAP_QUEUES];
+	int i, j = 0;
 
+	/* macvtap_put_queue can free some slots, so go through all slots */
 	spin_lock(&macvtap_lock);
-	q = rcu_dereference(vlan->tap);
-	if (!q) {
-		spin_unlock(&macvtap_lock);
-		return;
+	for (i = 0; i < MAX_MACVTAP_QUEUES && vlan->numvtaps; i++) {
+		q = rcu_dereference(vlan->taps[i]);
+		if (q) {
+			qlist[j++] = q;
+			rcu_assign_pointer(vlan->taps[i], NULL);
+			rcu_assign_pointer(q->vlan, NULL);
+			vlan->numvtaps--;
+		}
 	}
-
-	rcu_assign_pointer(vlan->tap, NULL);
-	rcu_assign_pointer(q->vlan, NULL);
+	BUG_ON(vlan->numvtaps != 0);
 	spin_unlock(&macvtap_lock);
 
 	synchronize_rcu();
-	sock_put(&q->sk);
+
+	for (--j; j >= 0; j--)
+		sock_put(&qlist[j]->sk);
 }
 
 /*
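
For readers following the queue selection in macvtap_get_queue() above, here is a simplified userspace model of it. It is only a sketch under stated assumptions: select_queue() and its parameters are hypothetical names, the fallback for an empty slot is left out, and MAX_QUEUES merely mirrors the upper bound of MAX_MACVTAP_QUEUES.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_QUEUES 16	/* mirrors the upper bound of MAX_MACVTAP_QUEUES */

/*
 * Pick a slot index in [0, numvtaps), or -1 if no queue is open.
 * rxq_recorded and rxq model skb_rx_queue_recorded() and
 * skb_get_rx_queue(); hash models the flow hash used otherwise.
 */
static int select_queue(int numvtaps, int rxq_recorded, uint16_t rxq,
			uint32_t hash)
{
	if (numvtaps <= 0)
		return -1;
	assert(numvtaps <= MAX_QUEUES);

	/* Recorded rx queue: wrap it into the range of open queues. */
	if (rxq_recorded)
		return rxq % numvtaps;

	/* Otherwise spread flows across the open queues by hash. */
	return (int)(hash % (uint32_t)numvtaps);
}
```

The model uses a modulo for the hash case on purpose. Masking with (numvtaps - 1) matches a modulo only when numvtaps is a power of two; with three open queues, for example, the mask is binary 10 and index 1 would never be chosen.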

^ permalink raw reply

* [PATCH v3 1/2] core: Factor out flow calculation from get_rps_cpu
From: Krishna Kumar @ 2010-08-03  3:02 UTC (permalink / raw)
  To: davem, arnd; +Cc: bhutchings, netdev, therbert, Krishna Kumar, mst

From: Krishna Kumar <krkumar2@in.ibm.com>

Factor out flow calculation code from get_rps_cpu, since macvtap
driver can use the same code.

Revisions:

v2 - Ben: Separate flow calculation out and use in select queue
v3 - Arnd: Don't re-implement MIN

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
 include/linux/netdevice.h |    1 
 net/core/dev.c            |   94 ++++++++++++++++++++++--------------
 2 files changed, 59 insertions(+), 36 deletions(-)

diff -ruNp org/include/linux/netdevice.h new/include/linux/netdevice.h
--- org/include/linux/netdevice.h	2010-08-03 08:19:57.000000000 +0530
+++ new/include/linux/netdevice.h	2010-08-03 08:19:57.000000000 +0530
@@ -2253,6 +2253,7 @@ static inline const char *netdev_name(co
 	return dev->name;
 }
 
+extern int skb_calculate_flow(struct net_device *dev, struct sk_buff *skb);
 extern int netdev_printk(const char *level, const struct net_device *dev,
 			 const char *format, ...)
 	__attribute__ ((format (printf, 3, 4)));
diff -ruNp org/net/core/dev.c new/net/core/dev.c
--- org/net/core/dev.c	2010-08-03 08:19:57.000000000 +0530
+++ new/net/core/dev.c	2010-08-03 08:19:57.000000000 +0530
@@ -2263,51 +2263,24 @@ static inline void ____napi_schedule(str
 	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
 }
 
-#ifdef CONFIG_RPS
-
-/* One global table that all flow-based protocols share. */
-struct rps_sock_flow_table *rps_sock_flow_table __read_mostly;
-EXPORT_SYMBOL(rps_sock_flow_table);
-
 /*
- * get_rps_cpu is called from netif_receive_skb and returns the target
- * CPU from the RPS map of the receiving queue for a given skb.
- * rcu_read_lock must be held on entry.
+ * skb_calculate_flow: calculate a flow hash based on src/dst addresses
+ * and src/dst port numbers. On success, returns a hash number (> 0),
+ * otherwise -1.
  */
-static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
-		       struct rps_dev_flow **rflowp)
+int skb_calculate_flow(struct net_device *dev, struct sk_buff *skb)
 {
+	int hash = skb->rxhash;
 	struct ipv6hdr *ip6;
 	struct iphdr *ip;
-	struct netdev_rx_queue *rxqueue;
-	struct rps_map *map;
-	struct rps_dev_flow_table *flow_table;
-	struct rps_sock_flow_table *sock_flow_table;
-	int cpu = -1;
 	u8 ip_proto;
-	u16 tcpu;
 	u32 addr1, addr2, ihl;
 	union {
 		u32 v32;
 		u16 v16[2];
 	} ports;
 
-	if (skb_rx_queue_recorded(skb)) {
-		u16 index = skb_get_rx_queue(skb);
-		if (unlikely(index >= dev->num_rx_queues)) {
-			WARN_ONCE(dev->num_rx_queues > 1, "%s received packet "
-				"on queue %u, but number of RX queues is %u\n",
-				dev->name, index, dev->num_rx_queues);
-			goto done;
-		}
-		rxqueue = dev->_rx + index;
-	} else
-		rxqueue = dev->_rx;
-
-	if (!rxqueue->rps_map && !rxqueue->rps_flow_table)
-		goto done;
-
-	if (skb->rxhash)
+	if (hash)
 		goto got_hash; /* Skip hash computation on packet header */
 
 	switch (skb->protocol) {
@@ -2334,6 +2307,7 @@ static int get_rps_cpu(struct net_device
 	default:
 		goto done;
 	}
+
 	switch (ip_proto) {
 	case IPPROTO_TCP:
 	case IPPROTO_UDP:
@@ -2356,11 +2330,59 @@ static int get_rps_cpu(struct net_device
 	/* get a consistent hash (same value on both flow directions) */
 	if (addr2 < addr1)
 		swap(addr1, addr2);
-	skb->rxhash = jhash_3words(addr1, addr2, ports.v32, hashrnd);
-	if (!skb->rxhash)
-		skb->rxhash = 1;
+
+	hash = jhash_3words(addr1, addr2, ports.v32, hashrnd);
+	if (!hash)
+		hash = 1;
 
 got_hash:
+	return hash;
+
+done:
+	return -1;
+}
+EXPORT_SYMBOL(skb_calculate_flow);
+
+#ifdef CONFIG_RPS
+
+/* One global table that all flow-based protocols share. */
+struct rps_sock_flow_table *rps_sock_flow_table __read_mostly;
+EXPORT_SYMBOL(rps_sock_flow_table);
+
+/*
+ * get_rps_cpu is called from netif_receive_skb and returns the target
+ * CPU from the RPS map of the receiving queue for a given skb.
+ * rcu_read_lock must be held on entry.
+ */
+static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
+		       struct rps_dev_flow **rflowp)
+{
+	struct netdev_rx_queue *rxqueue;
+	struct rps_map *map;
+	struct rps_dev_flow_table *flow_table;
+	struct rps_sock_flow_table *sock_flow_table;
+	int cpu = -1;
+	u16 tcpu;
+
+	if (skb_rx_queue_recorded(skb)) {
+		u16 index = skb_get_rx_queue(skb);
+		if (unlikely(index >= dev->num_rx_queues)) {
+			WARN_ONCE(dev->num_rx_queues > 1, "%s received packet "
+				"on queue %u, but number of RX queues is %u\n",
+				dev->name, index, dev->num_rx_queues);
+			goto done;
+		}
+		rxqueue = dev->_rx + index;
+	} else
+		rxqueue = dev->_rx;
+
+	if (!rxqueue->rps_map && !rxqueue->rps_flow_table)
+		goto done;
+
+	skb->rxhash = skb_calculate_flow(dev, skb);
+	if (skb->rxhash < 0)
+		goto done;
+
 	flow_table = rcu_dereference(rxqueue->rps_flow_table);
 	sock_flow_table = rcu_dereference(rps_sock_flow_table);
 	if (flow_table && sock_flow_table) {

^ permalink raw reply

* [PATCH] deal with if frags[0].size is pulled to 0 in dev_gro_receive()
From: xiaohui.xin @ 2010-08-03  3:17 UTC (permalink / raw)
  To: netdev, herbert, davem; +Cc: Xin Xiaohui

From: Xin Xiaohui <xiaohui.xin@intel.com>

Currently in dev_gro_receive(), if frags[0].size is pulled to 0, memmove is
called and the null page is released. But that is not enough: we should also
reset the size of each remaining frag. A simpler alternative is to not do
anything at all.

Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>

---
 net/core/dev.c |    7 -------
 1 files changed, 0 insertions(+), 7 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 264137f..28cdbbf 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2730,13 +2730,6 @@ pull:
 
 		skb_shinfo(skb)->frags[0].page_offset += grow;
 		skb_shinfo(skb)->frags[0].size -= grow;
-
-		if (unlikely(!skb_shinfo(skb)->frags[0].size)) {
-			put_page(skb_shinfo(skb)->frags[0].page);
-			memmove(skb_shinfo(skb)->frags,
-				skb_shinfo(skb)->frags + 1,
-				--skb_shinfo(skb)->nr_frags);
-		}
 	}
 
 ok:
-- 
1.5.4.4


^ permalink raw reply related

* RE: Is it a possible bug in dev_gro_receive()?
From: Xin, Xiaohui @ 2010-08-03  2:33 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: netdev@vger.kernel.org, herbert@gondor.apana.org.au,
	davem@davemloft.net
In-Reply-To: <20100802102906.GA8439@ff.dom.local>

>-----Original Message-----
>From: Jarek Poplawski [mailto:jarkao2@gmail.com]
>Sent: Monday, August 02, 2010 6:29 PM
>To: Xin, Xiaohui
>Cc: netdev@vger.kernel.org; herbert@gondor.apana.org.au; davem@davemloft.net
>Subject: Re: Is it a possible bug in dev_gro_receive()?
>
>Xin Xiaohui wrote:
>> I looked into the code of dev_gro_receive() and found this:
>> if frags[0] is pulled to 0, the page is released and the
>> remaining frags are moved with memmove().
>> Is that right? I'm not sure whether the memmove is correct, but
>> at least frags[0].size is never set after the memmove. I think
>> a simple way is to not do anything if we find frags[0].size == 0.
>> The patch is as follows.
>>
>> Or am I missing something here?
>
>I think, you're right, but fixing memmove looks nicer to me:
>
> -	--skb_shinfo(skb)->nr_frags);
> +	--skb_shinfo(skb)->nr_frags * sizeof(skb_frag_t));
>
>Jarek P.

Is there a small performance cost to doing the memmove() if skb_shinfo(skb)->nr_frags is large?
We are now working on the zero-copy patches based on the napi_gro_frags() interface, and in
this case we have found a lot of skbs whose frags[0] is pulled to 0. And after the memmove is
fixed, each frags[x].size needs to be modified too.
So I think not doing anything is better. Or is there any side effect of having a null page in the stack?

Thanks
Xiaohui
>
>>
>> ---
>>  net/core/dev.c |    7 -------
>>  1 files changed, 0 insertions(+), 7 deletions(-)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 264137f..28cdbbf 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -2730,13 +2730,6 @@ pull:
>>
>>  		skb_shinfo(skb)->frags[0].page_offset += grow;
>>  		skb_shinfo(skb)->frags[0].size -= grow;
>> -
>> -		if (unlikely(!skb_shinfo(skb)->frags[0].size)) {
>> -			put_page(skb_shinfo(skb)->frags[0].page);
>> -			memmove(skb_shinfo(skb)->frags,
>> -				skb_shinfo(skb)->frags + 1,
>> -				--skb_shinfo(skb)->nr_frags);
>> -		}
>>  	}
>>
>>  ok:
>
>


^ permalink raw reply

* Re: [PATCH -mmotm 05/30] mm: sl[au]b: add knowledge of reserve pages
From: Neil Brown @ 2010-08-03  1:44 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Xiaotian Feng, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	riel-H+wXaHxf7aLQT0dZR+AlfA, cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, lwang-H+wXaHxf7aLQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <AANLkTilj5GrhbRJZfSsfXP1v9cQSRlARFmxpys1vUelr-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Tue, 13 Jul 2010 23:33:14 +0300
Pekka Enberg <penberg-bbCR+/B0CizivPeTLB3BmA@public.gmane.org> wrote:

> Hi Xiaotian!
> 
> I would actually prefer that the SLAB, SLOB, and SLUB changes were in
> separate patches to make reviewing easier.
> 
> Looking at SLUB:
> 
> On Tue, Jul 13, 2010 at 1:17 PM, Xiaotian Feng <dfeng-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 7bb7940..7a5d6dc 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -27,6 +27,8 @@
> >  #include <linux/memory.h>
> >  #include <linux/math64.h>
> >  #include <linux/fault-inject.h>
> > +#include "internal.h"
> > +
> >
> >  /*
> >  * Lock order:
> > @@ -1139,7 +1141,8 @@ static void setup_object(struct kmem_cache *s, struct page *page,
> >                s->ctor(object);
> >  }
> >
> > -static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
> > +static
> > +struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node, int *reserve)
> >  {
> >        struct page *page;
> >        void *start;
> > @@ -1153,6 +1156,8 @@ static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
> >        if (!page)
> >                goto out;
> >
> > +       *reserve = page->reserve;
> > +
> >        inc_slabs_node(s, page_to_nid(page), page->objects);
> >        page->slab = s;
> >        page->flags |= 1 << PG_slab;
> > @@ -1606,10 +1611,20 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> >  {
> >        void **object;
> >        struct page *new;
> > +       int reserve;
> >
> >        /* We handle __GFP_ZERO in the caller */
> >        gfpflags &= ~__GFP_ZERO;
> >
> > +       if (unlikely(c->reserve)) {
> > +               /*
> > +                * If the current slab is a reserve slab and the current
> > +                * allocation context does not allow access to the reserves we
> > +                * must force an allocation to test the current levels.
> > +                */
> > +               if (!(gfp_to_alloc_flags(gfpflags) & ALLOC_NO_WATERMARKS))
> > +                       goto grow_slab;
> 
> OK, so assume that:
> 
>   (1) c->reserve is set to one
> 
>   (2) GFP flags don't allow dipping into the reserves
> 
>   (3) we've managed to free enough pages so normal
>        allocations are fine
> 
>   (4) the page from reserves is not yet empty
> 
> we will call flush_slab() and put the "emergency page" on partial list
> and clear c->reserve. This effectively means that now some other
> allocation can fetch the partial page and start to use it. Is this OK?
> Who makes sure the emergency reserves are large enough for the next
> out-of-memory condition where we swap over NFS?

Yes, this is OK.  The emergency reserves are maintained at a lower level -
within alloc_page.
The fact that (3) normal allocations are fine means that there are enough
free pages to satisfy any swap-out allocation - so any pages that were
previously allocated as 'emergency' pages can have their emergency status
forgotten (the emergency has passed).

This is a subtle but important aspect of the emergency reservation scheme in
swap-over-NFS.  It is the act-of-allocating that is emergency-or-not.  The
memory itself, once allocated, is not special.

c->reserve means "the last page allocated required an emergency allocation".
This means that parts of that page, or any other page, can only be given as
emergency allocations.  Once the slab succeeds at a non-emergency allocation,
the flag should obviously be cleared.

Similarly the page->reserve flag does not mean "this is a reserve page", but
simply "when this page was allocated, it was an emergency allocation".  The
flag is often soon lost as it is in a union with e.g. freelist.  But that
doesn't matter as it is only really meaningful at the moment of allocation.

I hope that clarifies the situation,

NeilBrown
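
The point that the act of allocating, not the memory, is what counts as an emergency can be illustrated with a small user-space model. This is only a sketch under stated assumptions: pool_model, cpu_slab_model, alloc_page_model() and slab_alloc_model() are hypothetical names, the watermark check is reduced to a single comparison, and none of it is the actual SLUB or page allocator code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page pool: pages at or below the watermark are the reserve. */
struct pool_model {
	unsigned int free_pages;
	unsigned int watermark;
};

/* Hypothetical per-cpu slab state; reserve mirrors c->reserve above. */
struct cpu_slab_model {
	bool reserve;	/* last page allocation needed the emergency reserve */
};

/*
 * Model of a page allocation. On success, *emergency reports whether the
 * allocation had to dip below the watermark.
 */
bool alloc_page_model(struct pool_model *p, bool may_use_reserves,
		      bool *emergency)
{
	assert(p != NULL && emergency != NULL);

	if (p->free_pages > p->watermark) {
		p->free_pages--;
		*emergency = false;
		return true;
	}
	if (may_use_reserves && p->free_pages > 0) {
		p->free_pages--;
		*emergency = true;
		return true;
	}
	return false;
}

/*
 * Model of the slab slow path: while the reserve flag is set, a context
 * without access to the reserves must force a page allocation to test
 * the current levels. If that succeeds, the emergency has passed and
 * the flag is cleared.
 */
bool slab_alloc_model(struct pool_model *p, struct cpu_slab_model *c,
		      bool may_use_reserves)
{
	bool emergency;

	if (!c->reserve || may_use_reserves)
		return true;	/* serve from the current slab */

	if (!alloc_page_model(p, false, &emergency))
		return false;

	c->reserve = emergency;
	return true;
}
```

In this model, nothing about the pages already handed out changes when the flag is cleared; only later allocation decisions are affected.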

> 
> > +       }
> >        if (!c->page)
> >                goto new_slab;
> >
> > @@ -1623,8 +1638,8 @@ load_freelist:
> >        object = c->page->freelist;
> >        if (unlikely(!object))
> >                goto another_slab;
> > -       if (unlikely(SLABDEBUG && PageSlubDebug(c->page)))
> > -               goto debug;
> > +       if (unlikely(SLABDEBUG && PageSlubDebug(c->page) || c->reserve))
> > +               goto slow_path;
> >
> >        c->freelist = get_freepointer(s, object);
> >        c->page->inuse = c->page->objects;
> > @@ -1646,16 +1661,18 @@ new_slab:
> >                goto load_freelist;
> >        }
> >
> > +grow_slab:
> >        if (gfpflags & __GFP_WAIT)
> >                local_irq_enable();
> >
> > -       new = new_slab(s, gfpflags, node);
> > +       new = new_slab(s, gfpflags, node, &reserve);
> >
> >        if (gfpflags & __GFP_WAIT)
> >                local_irq_disable();
> >
> >        if (new) {
> >                c = __this_cpu_ptr(s->cpu_slab);
> > +               c->reserve = reserve;
> >                stat(s, ALLOC_SLAB);
> >                if (c->page)
> >                        flush_slab(s, c);
> > @@ -1667,10 +1684,20 @@ new_slab:
> >        if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit())
> >                slab_out_of_memory(s, gfpflags, node);
> >        return NULL;
> > -debug:
> > -       if (!alloc_debug_processing(s, c->page, object, addr))
> > +
> > +slow_path:
> > +       if (!c->reserve && !alloc_debug_processing(s, c->page, object, addr))
> >                goto another_slab;
> >
> > +       /*
> > +        * Avoid the slub fast path in slab_alloc() by not setting
> > +        * c->freelist and the fast path in slab_free() by making
> > +        * node_match() fail by setting c->node to -1.
> > +        *
> > +        * We use this for debug and reserve checks which need
> > +        * to be done for each allocation.
> > +        */
> > +
> >        c->page->inuse++;
> >        c->page->freelist = get_freepointer(s, object);
> >        c->node = -1;
> > @@ -2095,10 +2122,11 @@ static void early_kmem_cache_node_alloc(gfp_t gfpflags, int node)
> >        struct page *page;
> >        struct kmem_cache_node *n;
> >        unsigned long flags;
> > +       int reserve;
> >
> >        BUG_ON(kmalloc_caches->size < sizeof(struct kmem_cache_node));
> >
> > -       page = new_slab(kmalloc_caches, gfpflags, node);
> > +       page = new_slab(kmalloc_caches, gfpflags, node, &reserve);
> >
> >        BUG_ON(!page);
> >        if (page_to_nid(page) != node) {
> > --
> > 1.7.1.1
> >
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo-Bw31MaZKKs0EbZ0PF+XxCw@public.gmane.org  For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org"> email-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org </a>
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply

* Re: [net-2.6 PATCH] e1000e: 82577/82578 PHY register access issues
From: Jeff Kirsher @ 2010-08-03  1:04 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, gospo, bphilips, stable, bruce.w.allan
In-Reply-To: <20100727.210642.137855605.davem@davemloft.net>

On Tue, Jul 27, 2010 at 21:06, David Miller <davem@davemloft.net> wrote:
> From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Date: Tue, 27 Jul 2010 15:28:46 -0700
>
>> From: Bruce Allan <bruce.w.allan@intel.com>
>>
>> The MAC-PHY interconnect on 82577/82578 uses a power management feature
>> (called K1) which must be disabled when in 1Gbps due to a hardware issue on
>> these parts.  The #define bit setting used to enable/disable K1 is
>> incorrect and can cause PHY register accesses to stop working altogether
>> until the next device reset.  This patch sets the register correctly.
>>
>> This issue is present in kernels since 2.6.32.
>>
>> CC: stable@kernel.org
>> Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
>> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
>> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>
> Applied, thanks.
> --

Dave,
Have you synced your net-next-2.6 tree with the net-2.6 tree?  I ask because
I do not see this change in the net-next-2.6 tree.

-- 
Cheers,
Jeff

^ permalink raw reply

* [net-next-2.6 PATCH] ixgbevf: fix null pointer dereference due to filter being set for VLAN 0
From: Jeff Kirsher @ 2010-08-03  0:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, bphilips, Alexander Duyck, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change corrects an issue that resulted in a null pointer dereference
for the addition of VLAN 0 without any VLANs being registered.  Also this
code removes some unnecessary checks for defines and the unnecessary setting
of VLAN flags since that is now handled within the kernel via the
vlan_features.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/ixgbevf/ixgbevf_main.c |   12 +-----------
 1 files changed, 1 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ixgbevf/ixgbevf_main.c b/drivers/net/ixgbevf/ixgbevf_main.c
index 4867440..3e291cc 100644
--- a/drivers/net/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ixgbevf/ixgbevf_main.c
@@ -1463,18 +1463,10 @@ static void ixgbevf_vlan_rx_add_vid(struct net_device *netdev, u16 vid)
 {
 	struct ixgbevf_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_hw *hw = &adapter->hw;
-	struct net_device *v_netdev;
 
 	/* add VID to filter table */
 	if (hw->mac.ops.set_vfta)
 		hw->mac.ops.set_vfta(hw, vid, 0, true);
-	/*
-	 * Copy feature flags from netdev to the vlan netdev for this vid.
-	 * This allows things like TSO to bubble down to our vlan device.
-	 */
-	v_netdev = vlan_group_get_device(adapter->vlgrp, vid);
-	v_netdev->features |= adapter->netdev->features;
-	vlan_group_set_device(adapter->vlgrp, vid, v_netdev);
 }
 
 static void ixgbevf_vlan_rx_kill_vid(struct net_device *netdev, u16 vid)
@@ -3402,7 +3394,6 @@ static int __devinit ixgbevf_probe(struct pci_dev *pdev,
 	/* setup the private structure */
 	err = ixgbevf_sw_init(adapter);
 
-#ifdef MAX_SKB_FRAGS
 	netdev->features = NETIF_F_SG |
 			   NETIF_F_IP_CSUM |
 			   NETIF_F_HW_VLAN_TX |
@@ -3416,13 +3407,12 @@ static int __devinit ixgbevf_probe(struct pci_dev *pdev,
 	netdev->vlan_features |= NETIF_F_TSO;
 	netdev->vlan_features |= NETIF_F_TSO6;
 	netdev->vlan_features |= NETIF_F_IP_CSUM;
+	netdev->vlan_features |= NETIF_F_IPV6_CSUM;
 	netdev->vlan_features |= NETIF_F_SG;
 
 	if (pci_using_dac)
 		netdev->features |= NETIF_F_HIGHDMA;
 
-#endif /* MAX_SKB_FRAGS */
-
 	/* The HW MAC address was set and/or determined in sw_init */
 	memcpy(netdev->dev_addr, adapter->hw.mac.addr, netdev->addr_len);
 	memcpy(netdev->perm_addr, adapter->hw.mac.addr, netdev->addr_len);


^ permalink raw reply related

* [net-next-2.6 PATCH] igb: Use irq_synchronize per vector when using MSI-X
From: Jeff Kirsher @ 2010-08-03  0:40 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, bphilips, Jean Delvare, Emil Tantilov,
	Jeff Kirsher

From: Emil Tantilov <emil.s.tantilov@intel.com>

Synchronize all IRQs when using MSI-X. Similar to ixgbe.
Issue was reported on e1000e, but the patch is also valid for igb.

CC: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/igb/igb_main.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 667b527..df5dcd2 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -1290,7 +1290,13 @@ static void igb_irq_disable(struct igb_adapter *adapter)
 	wr32(E1000_IAM, 0);
 	wr32(E1000_IMC, ~0);
 	wrfl();
-	synchronize_irq(adapter->pdev->irq);
+	if (adapter->msix_entries) {
+		int i;
+		for (i = 0; i < adapter->num_q_vectors; i++)
+			synchronize_irq(adapter->msix_entries[i].vector);
+	} else {
+		synchronize_irq(adapter->pdev->irq);
+	}
 }
 
 /**


^ permalink raw reply related

* Re: [net-next-2.6 PATCH 3/3] e1000e: update to workaround for jumbo frames on 82577
From: Jeff Kirsher @ 2010-08-03  0:36 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, bphilips, Bruce Allan, Jeff Kirsher
In-Reply-To: <20100803002748.4179.85660.stgit@localhost.localdomain>

On Mon, Aug 2, 2010 at 17:27, Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
> From: Bruce Allan <bruce.w.allan@intel.com>
>
> For OEM systems with this part that also have Spread Spectrum Clocking (SSC)
> enabled in the BIOS, there is an Rx performance issue with 4K jumbo frames.
> Leaving the defaults in PHY page 770 register 26 resolves the issue, and
> does not negatively impact jumbo frames on systems with SSC disabled.
>
> Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
>
>  drivers/net/e1000e/netdev.c |    5 -----
>  1 files changed, 0 insertions(+), 5 deletions(-)
>

Please disregard this patch; it was sent out accidentally (my bad).
During testing, issues were found and changes need to be made to this
patch.

-- 
Cheers,
Jeff

^ permalink raw reply

* [net-next-2.6 PATCH 3/3] e1000e: update to workaround for jumbo frames on 82577
From: Jeff Kirsher @ 2010-08-03  0:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, bphilips, Bruce Allan, Jeff Kirsher
In-Reply-To: <20100803002622.4179.31850.stgit@localhost.localdomain>

From: Bruce Allan <bruce.w.allan@intel.com>

For OEM systems with this part that also have Spread Spectrum Clocking (SSC)
enabled in the BIOS, there is an Rx performance issue with 4K jumbo frames.
Leaving the defaults in PHY page 770 register 26 resolves the issue, and
does not negatively impact jumbo frames on systems with SSC disabled.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/e1000e/netdev.c |    5 -----
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 9e9164a..cc97b58 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -2727,11 +2727,6 @@ static void e1000_setup_rctl(struct e1000_adapter *adapter)
 	if ((hw->phy.type == e1000_phy_82577) && (rctl & E1000_RCTL_LPE)) {
 		u16 phy_data;
 
-		e1e_rphy(hw, PHY_REG(770, 26), &phy_data);
-		phy_data &= 0xfff8;
-		phy_data |= (1 << 2);
-		e1e_wphy(hw, PHY_REG(770, 26), phy_data);
-
 		e1e_rphy(hw, 22, &phy_data);
 		phy_data &= 0x0fff;
 		phy_data |= (1 << 14);


^ permalink raw reply related

* [net-next-2.6 PATCH 2/3] e1000e: Fix irq_synchronize in MSI-X case
From: Jeff Kirsher @ 2010-08-03  0:27 UTC (permalink / raw)
  To: davem
  Cc: netdev, gospo, bphilips, Jean Delvare, Jesse Brandeburg,
	Jeff Kirsher, Bruce Allan
In-Reply-To: <20100803002622.4179.31850.stgit@localhost.localdomain>

Based on original patch/work from Jean Delvare <jdelvare@suse.de>
Synchronize all IRQs when in MSI-X IRQ mode.

Jean's original patch hard coded the sync with the 3 possible vectors,
this patch incorporates more flexibility for the future and aligns
with how igb stores the number of vectors into the adapter structure.

CC: Jean Delvare <jdelvare@suse.de>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
---

 drivers/net/e1000e/e1000.h  |    1 +
 drivers/net/e1000e/netdev.c |   26 ++++++++++++++++++--------
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index 9ee133f..f9a31c8 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -348,6 +348,7 @@ struct e1000_adapter {
 	u32 test_icr;
 
 	u32 msg_enable;
+	unsigned int num_vectors;
 	struct msix_entry *msix_entries;
 	int int_mode;
 	u32 eiac_mask;
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 464c9a2..9e9164a 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -1785,25 +1785,25 @@ void e1000e_reset_interrupt_capability(struct e1000_adapter *adapter)
 void e1000e_set_interrupt_capability(struct e1000_adapter *adapter)
 {
 	int err;
-	int numvecs, i;
-
+	int i;
 
 	switch (adapter->int_mode) {
 	case E1000E_INT_MODE_MSIX:
 		if (adapter->flags & FLAG_HAS_MSIX) {
-			numvecs = 3; /* RxQ0, TxQ0 and other */
-			adapter->msix_entries = kcalloc(numvecs,
+			adapter->num_vectors = 3; /* RxQ0, TxQ0 and other */
+			adapter->msix_entries = kcalloc(adapter->num_vectors,
 						      sizeof(struct msix_entry),
 						      GFP_KERNEL);
 			if (adapter->msix_entries) {
-				for (i = 0; i < numvecs; i++)
+				for (i = 0; i < adapter->num_vectors; i++)
 					adapter->msix_entries[i].entry = i;
 
 				err = pci_enable_msix(adapter->pdev,
 						      adapter->msix_entries,
-						      numvecs);
-				if (err == 0)
+						      adapter->num_vectors);
+				if (err == 0) {
 					return;
+				}
 			}
 			/* MSI-X failed, so fall through and try MSI */
 			e_err("Failed to initialize MSI-X interrupts.  "
@@ -1825,6 +1825,9 @@ void e1000e_set_interrupt_capability(struct e1000_adapter *adapter)
 		/* Don't do anything; this is the system default */
 		break;
 	}
+
+	/* store the number of vectors being used */
+	adapter->num_vectors = 1;
 }
 
 /**
@@ -1946,7 +1949,14 @@ static void e1000_irq_disable(struct e1000_adapter *adapter)
 	if (adapter->msix_entries)
 		ew32(EIAC_82574, 0);
 	e1e_flush();
-	synchronize_irq(adapter->pdev->irq);
+
+	if (adapter->msix_entries) {
+		int i;
+		for (i = 0; i < adapter->num_vectors; i++)
+			synchronize_irq(adapter->msix_entries[i].vector);
+	} else {
+		synchronize_irq(adapter->pdev->irq);
+	}
 }
 
 /**
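
The selection logic in the last hunk can be modelled outside the kernel as follows. This is a sketch under assumptions: adapter_model, msix_entry_model and irqs_to_synchronize() are hypothetical names, and instead of calling synchronize_irq() the function only collects the IRQ numbers that would be synchronized.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct msix_entry. */
struct msix_entry_model {
	unsigned int vector;	/* IRQ number assigned to this MSI-X entry */
	unsigned short entry;
};

/* Hypothetical stand-in for the adapter fields used by the patch. */
struct adapter_model {
	struct msix_entry_model *msix_entries;	/* NULL unless MSI-X is in use */
	unsigned int num_vectors;
	unsigned int legacy_irq;		/* models adapter->pdev->irq */
};

/*
 * Collect every IRQ that must be synchronized when interrupts are
 * disabled: one per MSI-X vector, or the single device IRQ otherwise.
 * Returns the number of entries written to out (at most cap).
 */
size_t irqs_to_synchronize(const struct adapter_model *a,
			   unsigned int *out, size_t cap)
{
	size_t n = 0;
	unsigned int i;

	assert(a != NULL && out != NULL);

	if (!a->msix_entries) {
		if (cap > 0)
			out[n++] = a->legacy_irq;
		return n;
	}

	for (i = 0; i < a->num_vectors && n < cap; i++)
		out[n++] = a->msix_entries[i].vector;

	return n;
}
```

Keeping the vector count in the adapter structure means the loop bound follows whatever was set up at allocation time instead of a hard-coded 3.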


^ permalink raw reply related

* [net-next-2.6 PATCH 1/3] e1000e: register pm_qos request on hardware activation
From: Jeff Kirsher @ 2010-08-03  0:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, bphilips, Florian Mickler, Jeff Kirsher

From: Florian Mickler <florian@mickler.org>

The pm_qos_add_request call has to register the pm_qos request with the pm_qos
subsystem before first use of the pm_qos request via
pm_qos_update_request.

As pm_qos changed to use plists there is no benefit in registering and
unregistering the pm_qos request on ifup/ifdown and thus we move the
registering into e1000_open and the unregistering in e1000_close.

This fixes the following warning:

[    1.786060] WARNING: at kernel/pm_qos_params.c:264
pm_qos_update_request+0x28/0x54()
[    1.786088] Hardware name: Latitude E6500
[    1.787045] pm_qos_update_request() called for unknown object
[    1.787966] Modules linked in:
[    1.788940] Pid: 1, comm: swapper Not tainted 2.6.35-rc5-mmotm0719 #1
[    1.790035] Call Trace:
[    1.791121]  [<ffffffff81037335>] warn_slowpath_common+0x80/0x98
[    1.792205]  [<ffffffff810373e1>] warn_slowpath_fmt+0x41/0x43
[    1.793279]  [<ffffffff81057c14>] pm_qos_update_request+0x28/0x54
[    1.794347]  [<ffffffff8134889e>] e1000_configure+0x421/0x459
[    1.795393]  [<ffffffff8134afbd>] e1000_open+0xbd/0x37c
[    1.796436]  [<ffffffff8105743a>] ? raw_notifier_call_chain+0xf/0x11
[    1.797491]  [<ffffffff8145f948>] __dev_open+0xae/0xe2
[    1.798547]  [<ffffffff8145f997>] dev_open+0x1b/0x49
[    1.799612]  [<ffffffff8146e36e>] netpoll_setup+0x84/0x259
[    1.800685]  [<ffffffff81b5037c>] init_netconsole+0xbc/0x21f
[    1.801744]  [<ffffffff81b5026c>] ? sir_wq_init+0x0/0x35
[    1.802793]  [<ffffffff81b502c0>] ? init_netconsole+0x0/0x21f
[    1.803845]  [<ffffffff810002ff>] do_one_initcall+0x7a/0x12f
[    1.804885]  [<ffffffff81b2ccae>] kernel_init+0x138/0x1c2
[    1.805915]  [<ffffffff81003554>] kernel_thread_helper+0x4/0x10
[    1.806937]  [<ffffffff81590e00>] ? restore_args+0x0/0x30
[    1.807955]  [<ffffffff81b2cb76>] ? kernel_init+0x0/0x1c2
[    1.808958]  [<ffffffff81003550>] ? kernel_thread_helper+0x0/0x10
[    1.809958] ---[ end trace 84b562a00a60539e ]---

Signed-off-by: Florian Mickler <florian@mickler.org>
Tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/e1000e/netdev.c |   23 +++++++++++------------
 1 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index afd0129..464c9a2 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -3218,12 +3218,6 @@ int e1000e_up(struct e1000_adapter *adapter)
 {
 	struct e1000_hw *hw = &adapter->hw;
 
-	/* DMA latency requirement to workaround early-receive/jumbo issue */
-	if (adapter->flags & FLAG_HAS_ERT)
-		adapter->netdev->pm_qos_req =
-			pm_qos_add_request(PM_QOS_CPU_DMA_LATENCY,
-				       PM_QOS_DEFAULT_VALUE);
-
 	/* hardware has been reset, we need to reload some things */
 	e1000_configure(adapter);
 
@@ -3287,12 +3281,6 @@ void e1000e_down(struct e1000_adapter *adapter)
 	e1000_clean_tx_ring(adapter);
 	e1000_clean_rx_ring(adapter);
 
-	if (adapter->flags & FLAG_HAS_ERT) {
-		pm_qos_remove_request(
-			      adapter->netdev->pm_qos_req);
-		adapter->netdev->pm_qos_req = NULL;
-	}
-
 	/*
 	 * TODO: for power management, we could drop the link and
 	 * pci_disable_device here.
@@ -3527,6 +3515,12 @@ static int e1000_open(struct net_device *netdev)
 	     E1000_MNG_DHCP_COOKIE_STATUS_VLAN))
 		e1000_update_mng_vlan(adapter);
 
+	/* DMA latency requirement to workaround early-receive/jumbo issue */
+	if (adapter->flags & FLAG_HAS_ERT)
+		adapter->netdev->pm_qos_req =
+		                    pm_qos_add_request(PM_QOS_CPU_DMA_LATENCY,
+		                                       PM_QOS_DEFAULT_VALUE);
+
 	/*
 	 * before we allocate an interrupt, we must be ready to handle it.
 	 * Setting DEBUG_SHIRQ in the kernel makes it fire an interrupt
@@ -3631,6 +3625,11 @@ static int e1000_close(struct net_device *netdev)
 	if (adapter->flags & FLAG_HAS_AMT)
 		e1000_release_hw_control(adapter);
 
+	if (adapter->flags & FLAG_HAS_ERT) {
+		pm_qos_remove_request(adapter->netdev->pm_qos_req);
+		adapter->netdev->pm_qos_req = NULL;
+	}
+
 	pm_runtime_put_sync(&pdev->dev);
 
 	return 0;
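
The ordering requirement described in the changelog (register the request before the first update, unregister once at close) can be shown with a toy model. Everything here is hypothetical: qos_request_model, qos_add(), qos_update() and qos_remove() are illustrative stand-ins and do not reproduce the real pm_qos API or its signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request object; registered tracks add/remove state. */
struct qos_request_model {
	bool registered;
	int value;
};

/* Register the request; models what happens once at open time. */
void qos_add(struct qos_request_model *req, int value)
{
	assert(req != NULL);
	req->registered = true;
	req->value = value;
}

/*
 * Update the request. Returns false for a request that was never
 * registered, which models the "called for unknown object" warning.
 */
bool qos_update(struct qos_request_model *req, int value)
{
	assert(req != NULL);
	if (!req->registered)
		return false;
	req->value = value;
	return true;
}

/* Unregister the request; models what happens once at close time. */
void qos_remove(struct qos_request_model *req)
{
	assert(req != NULL);
	req->registered = false;
}
```

In this model, any number of updates between qos_add() and qos_remove() succeed, which matches moving registration to open/close while configuration paths keep calling update.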


^ permalink raw reply related

* Re: [PATCH] net: Add getsockopt support for TCP thin-streams
From: David Miller @ 2010-08-03  0:25 UTC (permalink / raw)
  To: apetlund; +Cc: johunt, kuznet, jmorris, kaber, netdev, linux-kernel, juhlenko
In-Reply-To: <4C56B018.8030309@simula.no>

From: Andreas Petlund <apetlund@simula.no>
Date: Mon, 02 Aug 2010 13:46:32 +0200

> On 07/31/2010 01:49 AM, Josh Hunt wrote:
>> Initial TCP thin-stream commit did not add getsockopt support for the new
>> socket options: TCP_THIN_LINEAR_TIMEOUTS and TCP_THIN_DUPACK. This adds support
>> for them.
>> 
>> Signed-off-by: Josh Hunt <johunt@akamai.com>
 ...
> Tested-by: Andreas Petlund <apetlund@simula.no>
> Acked-by: Andreas Petlund <apetlund@simula.no>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] ip_fragment: fix subtracting PPPOE_SES_HLEN from mtu twice
From: David Miller @ 2010-08-03  0:25 UTC (permalink / raw)
  To: bdschuym; +Cc: kaber, xiaosuo, netdev
In-Reply-To: <4C571A70.20103@pandora.be>

From: Bart De Schuymer <bdschuym@pandora.be>
Date: Mon, 02 Aug 2010 21:20:16 +0200

> Patrick McHardy wrote:
>> On 01.08.2010 01:25, Changli Gao wrote:
>>   
>>> 6c79bf0f2440fd250c8fce8d9b82fcf03d4e8350 subtracts PPPOE_SES_HLEN from
>>> mtu at
>>> the front of ip_fragment(). So the later subtraction should be
>>> removed. The
>>> MTU of 802.1q is also 1500, so MTU should not be changed.
 ...
> Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
>>> Signed-off-by: Changli Gao <xiaosuo@gmail.com>

Applied, thanks everyone.

^ permalink raw reply

