Netdev List
* [PATCH net-next-2.6] net: sk_add_backlog() take rmem_alloc into account
From: Eric Dumazet @ 2010-04-27 20:21 UTC (permalink / raw)
  To: David Miller; +Cc: bmb, therbert, netdev, rick.jones2
In-Reply-To: <1272389872.2295.405.camel@edumazet-laptop>

On Tuesday, 27 April 2010 at 19:37 +0200, Eric Dumazet wrote:

> We might use the ticket spinlock paradigm to let writers go in parallel
> and let the user the socket lock
> 
> Instead of having the bh_lock_sock() to protect receive_queue *and*
> backlog, writers get a unique slot in a table, that 'user' can handle
> later.
> 
> Or serialize writers (before they try to bh_lock_sock()) with a
> dedicated lock, so that user has 50% chances to get the sock lock,
> contending with at most one writer.

The following patch fixes the issue for me, with little performance hit on
the fast path.

Under huge stress from a multiqueue/RPS-enabled NIC, a single-flow UDP
receiver can now process ~200,000 pps (instead of ~100 pps before the
patch) on my dev machine.

Thanks !

[PATCH net-next-2.6] net: sk_add_backlog() take rmem_alloc into account

The current socket backlog limit is not enough to really stop DDoS attacks,
because the user thread spends a lot of time processing a full backlog each
round, and the user might spin wildly on the socket lock.

We should add the backlog size and the receive_queue size (aka rmem_alloc) to
pace writers, and let the user run without being slowed down too much.

Introduce a sk_rcvqueues_full() helper, to avoid taking socket lock in
stress situations.

Under huge stress from a multiqueue/RPS-enabled NIC, a single-flow UDP
receiver can now process ~200,000 pps (instead of ~100 pps before the
patch) on an 8-core machine.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/sock.h |   13 +++++++++++--
 net/core/sock.c    |    5 ++++-
 net/ipv4/udp.c     |    4 ++++
 net/ipv6/udp.c     |    8 ++++++++
 4 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 86a8ca1..4b0097d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -255,7 +255,6 @@ struct sock {
 		struct sk_buff *head;
 		struct sk_buff *tail;
 		int len;
-		int limit;
 	} sk_backlog;
 	wait_queue_head_t	*sk_sleep;
 	struct dst_entry	*sk_dst_cache;
@@ -604,10 +603,20 @@ static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb)
 	skb->next = NULL;
 }
 
+/*
+ * Take into account size of receive queue and backlog queue
+ */
+static inline bool sk_rcvqueues_full(const struct sock *sk, const struct sk_buff *skb)
+{
+	unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc);
+
+	return qsize + skb->truesize > sk->sk_rcvbuf;
+}
+
 /* The per-socket spinlock must be held here. */
 static inline __must_check int sk_add_backlog(struct sock *sk, struct sk_buff *skb)
 {
-	if (sk->sk_backlog.len >= max(sk->sk_backlog.limit, sk->sk_rcvbuf << 1))
+	if (sk_rcvqueues_full(sk, skb))
 		return -ENOBUFS;
 
 	__sk_add_backlog(sk, skb);
diff --git a/net/core/sock.c b/net/core/sock.c
index 58ebd14..5104175 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -327,6 +327,10 @@ int sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested)
 
 	skb->dev = NULL;
 
+	if (sk_rcvqueues_full(sk, skb)) {
+		atomic_inc(&sk->sk_drops);
+		goto discard_and_relse;
+	}
 	if (nested)
 		bh_lock_sock_nested(sk);
 	else
@@ -1885,7 +1889,6 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 	sk->sk_allocation	=	GFP_KERNEL;
 	sk->sk_rcvbuf		=	sysctl_rmem_default;
 	sk->sk_sndbuf		=	sysctl_wmem_default;
-	sk->sk_backlog.limit	=	sk->sk_rcvbuf << 1;
 	sk->sk_state		=	TCP_CLOSE;
 	sk_set_socket(sk, sock);
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 1e18f9c..776c844 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1372,6 +1372,10 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 			goto drop;
 	}
 
+
+	if (sk_rcvqueues_full(sk, skb))
+		goto drop;
+
 	rc = 0;
 
 	bh_lock_sock(sk);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 2850e35..3ead20a 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -584,6 +584,10 @@ static void flush_stack(struct sock **stack, unsigned int count,
 
 		sk = stack[i];
 		if (skb1) {
+			if (sk_rcvqueues_full(sk, skb)) {
+				kfree_skb(skb1);
+				goto drop;
+			}
 			bh_lock_sock(sk);
 			if (!sock_owned_by_user(sk))
 				udpv6_queue_rcv_skb(sk, skb1);
@@ -759,6 +763,10 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 
 	/* deliver */
 
+	if (sk_rcvqueues_full(sk, skb)) {
+		sock_put(sk);
+		goto discard;
+	}
 	bh_lock_sock(sk);
 	if (!sock_owned_by_user(sk))
 		udpv6_queue_rcv_skb(sk, skb);
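
For readers who want to play with the arithmetic outside the kernel, here is
a minimal userspace sketch of the check that sk_rcvqueues_full() performs.
The model_* names and structs are hypothetical stand-ins, not kernel API;
only the comparison itself is taken from the patch above. The point it
illustrates is that the test needs no lock, so a writer can drop early
instead of contending on the socket lock.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_buff: only the charged size. */
struct model_skb {
	unsigned int truesize;
};

/* Hypothetical stand-in for struct sock: only the fields the check reads. */
struct model_sock {
	unsigned int backlog_len;	/* bytes sitting in the backlog */
	unsigned int rmem_alloc;	/* bytes sitting in the receive queue */
	unsigned int rcvbuf;		/* receive buffer limit */
};

/* Same arithmetic as the helper in the patch: backlog plus receive queue
 * plus the incoming packet must not exceed the receive buffer limit. */
bool model_rcvqueues_full(const struct model_sock *sk,
			  const struct model_skb *skb)
{
	unsigned int qsize = sk->backlog_len + sk->rmem_alloc;

	return qsize + skb->truesize > sk->rcvbuf;
}

/* Early-drop pattern: test first, account the drop, and only touch the
 * queue when there is room. Returns 0 if queued, -1 if dropped. */
int model_enqueue(struct model_sock *sk, const struct model_skb *skb,
		  unsigned int *drops)
{
	if (model_rcvqueues_full(sk, skb)) {
		(*drops)++;
		return -1;
	}
	sk->backlog_len += skb->truesize;
	return 0;
}
```

In this model the receive queue counts against the writer as well, which is
the change in behaviour compared with the old backlog-only limit.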



^ permalink raw reply related

* Re: [net-2.6 PATCH] ixgbe: cleanup ethtool autoneg input
From: David Miller @ 2010-04-27 20:32 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, donald.c.skidmore
In-Reply-To: <u2n9929d2391004271305y9732d81r8b611fbb9f9e8b9a@mail.gmail.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 27 Apr 2010 13:05:25 -0700

> As far as sending the three unacceptable patches this
> late in the -rc series, that was poor judgement on my part, sorry.

No worries, I just pushed out the net-next-2.6 patches you sent to me,
so please respin these patches originally targeted to net-2.6 so
I can apply them to net-next-2.6.

Thanks.

^ permalink raw reply

* Re: [PATCH v2] net: reimplement softnet_data.output_queue as a FIFO queue
From: Eric Dumazet @ 2010-04-27 20:36 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netdev
In-Reply-To: <1272359184-2929-1-git-send-email-xiaosuo@gmail.com>

On Tuesday, 27 April 2010 at 17:06 +0800, Changli Gao wrote:
> reimplement softnet_data.output_queue as a FIFO queue.
> 
> reimplement softnet_data.output_queue as a FIFO queue to keep the fairness among
> the qdiscs rescheduled.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

> ----
>  include/linux/netdevice.h |    1 +
>  net/core/dev.c            |   22 ++++++++++++----------
>  2 files changed, 13 insertions(+), 10 deletions(-)
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 3c5ed5f..c04ca24 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1385,6 +1385,7 @@ static inline int unregister_gifconf(unsigned int family)
>   */
>  struct softnet_data {
>  	struct Qdisc		*output_queue;
> +	struct Qdisc		**output_queue_tailp;
>  	struct list_head	poll_list;
>  	struct sk_buff		*completion_queue;
>  
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 4d43f1a..3d31491 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1557,8 +1557,9 @@ static inline void __netif_reschedule(struct Qdisc *q)
>  
>  	local_irq_save(flags);
>  	sd = &__get_cpu_var(softnet_data);
> -	q->next_sched = sd->output_queue;
> -	sd->output_queue = q;
> +	q->next_sched = NULL;
> +	*sd->output_queue_tailp = q;
> +	sd->output_queue_tailp = &q->next_sched;
>  	raise_softirq_irqoff(NET_TX_SOFTIRQ);
>  	local_irq_restore(flags);
>  }
> @@ -2529,6 +2530,7 @@ static void net_tx_action(struct softirq_action *h)
>  		local_irq_disable();
>  		head = sd->output_queue;
>  		sd->output_queue = NULL;
> +		sd->output_queue_tailp = &sd->output_queue;
>  		local_irq_enable();
>  
>  		while (head) {
> @@ -5594,7 +5596,6 @@ static int dev_cpu_callback(struct notifier_block *nfb,
>  			    void *ocpu)
>  {
>  	struct sk_buff **list_skb;
> -	struct Qdisc **list_net;
>  	struct sk_buff *skb;
>  	unsigned int cpu, oldcpu = (unsigned long)ocpu;
>  	struct softnet_data *sd, *oldsd;
> @@ -5615,13 +5616,13 @@ static int dev_cpu_callback(struct notifier_block *nfb,
>  	*list_skb = oldsd->completion_queue;
>  	oldsd->completion_queue = NULL;
>  
> -	/* Find end of our output_queue. */
> -	list_net = &sd->output_queue;
> -	while (*list_net)
> -		list_net = &(*list_net)->next_sched;
>  	/* Append output queue from offline CPU. */
> -	*list_net = oldsd->output_queue;
> -	oldsd->output_queue = NULL;
> +	if (oldsd->output_queue) {
> +		*sd->output_queue_tailp = oldsd->output_queue;
> +		sd->output_queue_tailp = oldsd->output_queue_tailp;
> +		oldsd->output_queue = NULL;
> +		oldsd->output_queue_tailp = &oldsd->output_queue;
> +	}
>  
>  	raise_softirq_irqoff(NET_TX_SOFTIRQ);
>  	local_irq_enable();
> @@ -5851,7 +5852,8 @@ static int __init net_dev_init(void)
>  		skb_queue_head_init(&sd->input_pkt_queue);
>  		sd->completion_queue = NULL;
>  		INIT_LIST_HEAD(&sd->poll_list);
> -
> +		sd->output_queue = NULL;
> +		sd->output_queue_tailp = &sd->output_queue;
>  #ifdef CONFIG_RPS
>  		sd->csd.func = rps_trigger_softirq;
>  		sd->csd.info = sd;
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
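
As a side note, the tail-pointer technique used in the patch can be tried
in userspace. Below is a minimal sketch with hypothetical names (struct node,
struct fifo, fifo_*); it is not the kernel code, just the same idea: keep a
pointer to the last "next" slot so that append and splice are O(1) and
insertion order is preserved.

```c
#include <stddef.h>

/* Hypothetical element type; "next" plays the role of next_sched. */
struct node {
	int id;
	struct node *next;
};

/* head plus a pointer to the slot where the next element must be stored. */
struct fifo {
	struct node *head;
	struct node **tailp;
};

/* Empty queue: the tail slot is the head pointer itself. */
void fifo_init(struct fifo *q)
{
	q->head = NULL;
	q->tailp = &q->head;
}

/* Append at the tail in O(1), keeping FIFO order. */
void fifo_push(struct fifo *q, struct node *n)
{
	n->next = NULL;
	*q->tailp = n;
	q->tailp = &n->next;
}

/* Detach the whole list and reset the queue to empty. */
struct node *fifo_take_all(struct fifo *q)
{
	struct node *head = q->head;

	q->head = NULL;
	q->tailp = &q->head;
	return head;
}

/* Move every element of src to the end of dst in O(1). The empty check
 * matters: an empty src has tailp pointing at its own head. */
void fifo_splice(struct fifo *dst, struct fifo *src)
{
	if (src->head) {
		*dst->tailp = src->head;
		dst->tailp = src->tailp;
		src->head = NULL;
		src->tailp = &src->head;
	}
}
```

The splice case corresponds to what the CPU-offline path needs: no walk to
find the end of the destination list.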



^ permalink raw reply

* Re: [net-next-2.6 PATCH 1/2] Add ndo_set_vf_port_profile
From: Scott Feldman @ 2010-04-27 20:57 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Rose, Gregory V, David Miller, netdev@vger.kernel.org,
	chrisw@redhat.com, Williams, Mitch A
In-Reply-To: <201004271435.25480.arnd@arndb.de>

On 4/27/10 5:35 AM, "Arnd Bergmann" <arnd@arndb.de> wrote:

> On Tuesday 27 April 2010, Scott Feldman wrote:
>>> Yes, I believe that's there today:
>>> 
>>>     NLA_PUT_U32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent));
>>> 
>>> The number of VFs is returned in RTM_GETLINK.  But, it's only returned if:
>>> 
>>>     if (dev->netdev_ops->ndo_get_vf_config && dev->dev.parent)
>>> 
>>> For my proposal, I'll need to return IFLA_NUM_VF unconditionally so callers
>>> can get num VFs.
>> 
>> Hmmm...seems IFLA_NUM_VF assumes a PCI device supporting SR-IOV when it uses
>> dev_num_vf().  I think a better option would have been to query the device
>> for the number of VFs, without assuming SR-IOV or even PCI.
>> 
>> I see a ndo_get_num_vf() coming...
> 
> Shouldn't the number of registered port profiles be totally independent of
> the number of virtual functions?
> 
> Any of the VFs could multiplex multiple guests using macvlan, which means you
> need to register each guest separately, not each VF.
> 
> Anything that ties port profiles to VFs seems fundamentally flawed AFAICT,
> at least when we want to extend this to adapters that don't do it in firmware.

Ya, I tend to agree.  Let's just make port-profile a setting of any netdev,
an eth, macvtap, eth.x, bond, etc.  That's probably what I should have done
in the first place.  Something like:

       ip link set DEVICE [ { up | down } ]
                          [ arp { on | off } ]
                            <...clip...>
                          [ alias NAME ]
                          [ vf NUM [ mac LLADDR ]
                                   [ vlan VLANID [ qos VLAN-QOS ] ]
                                   [ rate TXRATE ] ]
                          [ port_profile [ PORT-PROFILE
                                   [ mac LLADDR ]
                                   [ host_uuid HOST_UUID ]
                                   [ client_uuid CLIENT_UUID ]
                                   [ client_name CLIENT_NAME ] ] ] ]
       ip link show [ DEVICE ]

I think I was trying to be too accommodating for models with VFs, but it
doesn't matter, as you point out.

This way, I can get the RTM_GETLINK to return the port-profile in use.

New patches coming soon...

-scott


^ permalink raw reply

* Re: [patch] ipheth: potential null dereferences on error path
From: L. Alberto Giménez @ 2010-04-27 21:00 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: Diego Giagio, David S. Miller, netdev, kernel-janitors
In-Reply-To: <20100427092012.GA29093@bicker>

On Tue, Apr 27, 2010 at 11:20:12AM +0200, Dan Carpenter wrote:
> The calls to usb_free_buffer() dereference rx_urb and tx_urb in the
> parameter list but those could be NULL.
> 
> Signed-off-by: Dan Carpenter <error27@gmail.com>

Seems good to me (should I ack it or give any other kind of sign-off?).

-- 
L. Alberto Giménez
JabberID agimenez@jabber.sysvalve.es
GnuPG key ID 0x3BAABDE1

^ permalink raw reply

* powerpc gianfar driver does not work well when PREEMPT/PREEMPT_RT is enabled
From: Xianghua Xiao @ 2010-04-27 21:01 UTC (permalink / raw)
  To: netdev, linux-rt-users

I originally posted this to the linuxppc list and hope someone here with
NAPI/COALESCE/RT experience can comment.
-----------------------------
I'm trying to get 834x/TSEC gianfar.c working with 2.6.33/RT.

When PREEMPT is disabled, the gianfar driver works well.

If PREEMPT is enabled, especially PREEMPT_RT, the network (gianfar)
disconnects after about 2-3 minutes under iperf. If NFS is used, the
whole system hangs after a while when NFS is accessed.

In an older version (2.6.18-rt) where NAPI is disabled, gianfar
performed well under PREEMPT_RT. In the new version of gianfar, NAPI
is enforced (the code is there by default and it is now hard to disable
NAPI in the code), and TX COALESCE is enabled while RX COALESCE is
disabled. It seems to me that NAPI is now the default for Rx and
COALESCE is the default for Tx.

Both NAPI and COALESCE may have negative effects on real-time systems,
where latency is more important than throughput. Unfortunately, after
some experiments, it is hard to disable either of them now.

Is anyone here using gianfar with PREEMPT_RT? Do I have to port
an older version of gianfar to at least get rid of NAPI?

Thanks,
Xianghua

^ permalink raw reply

* [PATCH 2/3] bnx2: Prevent "scheduling while atomic" warning with cnic, bonding and vlan.
From: Michael Chan @ 2010-04-27 21:28 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, jfeeney
In-Reply-To: <1272403691-2934-1-git-send-email-mchan@broadcom.com>

The bonding driver calls ndo_vlan_rx_register() while holding bond->lock.
The bnx2 driver calls bnx2_netif_stop() to stop the rx handling while
changing the vlgrp.  The call also stops the cnic driver, which sleeps
while the bond->lock is held and causes the warning.

This code path only needs to stop the NAPI rx handling while we are
changing the vlgrp.  Since no reset is going to occur, there is no need
to stop cnic in this case.  By adding a parameter to bnx2_netif_stop()
to skip stopping cnic, we can avoid the warning.

Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/bnx2.c |   38 ++++++++++++++++++++------------------
 1 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 4c1e51e..35eec2d 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -651,9 +651,10 @@ bnx2_napi_enable(struct bnx2 *bp)
 }
 
 static void
-bnx2_netif_stop(struct bnx2 *bp)
+bnx2_netif_stop(struct bnx2 *bp, bool stop_cnic)
 {
-	bnx2_cnic_stop(bp);
+	if (stop_cnic)
+		bnx2_cnic_stop(bp);
 	if (netif_running(bp->dev)) {
 		int i;
 
@@ -671,14 +672,15 @@ bnx2_netif_stop(struct bnx2 *bp)
 }
 
 static void
-bnx2_netif_start(struct bnx2 *bp)
+bnx2_netif_start(struct bnx2 *bp, bool start_cnic)
 {
 	if (atomic_dec_and_test(&bp->intr_sem)) {
 		if (netif_running(bp->dev)) {
 			netif_tx_wake_all_queues(bp->dev);
 			bnx2_napi_enable(bp);
 			bnx2_enable_int(bp);
-			bnx2_cnic_start(bp);
+			if (start_cnic)
+				bnx2_cnic_start(bp);
 		}
 	}
 }
@@ -6277,12 +6279,12 @@ bnx2_reset_task(struct work_struct *work)
 		return;
 	}
 
-	bnx2_netif_stop(bp);
+	bnx2_netif_stop(bp, true);
 
 	bnx2_init_nic(bp, 1);
 
 	atomic_set(&bp->intr_sem, 1);
-	bnx2_netif_start(bp);
+	bnx2_netif_start(bp, true);
 	rtnl_unlock();
 }
 
@@ -6324,7 +6326,7 @@ bnx2_vlan_rx_register(struct net_device *dev, struct vlan_group *vlgrp)
 	struct bnx2 *bp = netdev_priv(dev);
 
 	if (netif_running(dev))
-		bnx2_netif_stop(bp);
+		bnx2_netif_stop(bp, false);
 
 	bp->vlgrp = vlgrp;
 
@@ -6335,7 +6337,7 @@ bnx2_vlan_rx_register(struct net_device *dev, struct vlan_group *vlgrp)
 	if (bp->flags & BNX2_FLAG_CAN_KEEP_VLAN)
 		bnx2_fw_sync(bp, BNX2_DRV_MSG_CODE_KEEP_VLAN_UPDATE, 0, 1);
 
-	bnx2_netif_start(bp);
+	bnx2_netif_start(bp, false);
 }
 #endif
 
@@ -7055,9 +7057,9 @@ bnx2_set_coalesce(struct net_device *dev, struct ethtool_coalesce *coal)
 	bp->stats_ticks &= BNX2_HC_STATS_TICKS_HC_STAT_TICKS;
 
 	if (netif_running(bp->dev)) {
-		bnx2_netif_stop(bp);
+		bnx2_netif_stop(bp, true);
 		bnx2_init_nic(bp, 0);
-		bnx2_netif_start(bp);
+		bnx2_netif_start(bp, true);
 	}
 
 	return 0;
@@ -7087,7 +7089,7 @@ bnx2_change_ring_size(struct bnx2 *bp, u32 rx, u32 tx)
 		/* Reset will erase chipset stats; save them */
 		bnx2_save_stats(bp);
 
-		bnx2_netif_stop(bp);
+		bnx2_netif_stop(bp, true);
 		bnx2_reset_chip(bp, BNX2_DRV_MSG_CODE_RESET);
 		bnx2_free_skbs(bp);
 		bnx2_free_mem(bp);
@@ -7115,7 +7117,7 @@ bnx2_change_ring_size(struct bnx2 *bp, u32 rx, u32 tx)
 			bnx2_setup_cnic_irq_info(bp);
 		mutex_unlock(&bp->cnic_lock);
 #endif
-		bnx2_netif_start(bp);
+		bnx2_netif_start(bp, true);
 	}
 	return 0;
 }
@@ -7368,7 +7370,7 @@ bnx2_self_test(struct net_device *dev, struct ethtool_test *etest, u64 *buf)
 	if (etest->flags & ETH_TEST_FL_OFFLINE) {
 		int i;
 
-		bnx2_netif_stop(bp);
+		bnx2_netif_stop(bp, true);
 		bnx2_reset_chip(bp, BNX2_DRV_MSG_CODE_DIAG);
 		bnx2_free_skbs(bp);
 
@@ -7387,7 +7389,7 @@ bnx2_self_test(struct net_device *dev, struct ethtool_test *etest, u64 *buf)
 			bnx2_shutdown_chip(bp);
 		else {
 			bnx2_init_nic(bp, 1);
-			bnx2_netif_start(bp);
+			bnx2_netif_start(bp, true);
 		}
 
 		/* wait for link up */
@@ -8381,7 +8383,7 @@ bnx2_suspend(struct pci_dev *pdev, pm_message_t state)
 		return 0;
 
 	flush_scheduled_work();
-	bnx2_netif_stop(bp);
+	bnx2_netif_stop(bp, true);
 	netif_device_detach(dev);
 	del_timer_sync(&bp->timer);
 	bnx2_shutdown_chip(bp);
@@ -8403,7 +8405,7 @@ bnx2_resume(struct pci_dev *pdev)
 	bnx2_set_power_state(bp, PCI_D0);
 	netif_device_attach(dev);
 	bnx2_init_nic(bp, 1);
-	bnx2_netif_start(bp);
+	bnx2_netif_start(bp, true);
 	return 0;
 }
 
@@ -8430,7 +8432,7 @@ static pci_ers_result_t bnx2_io_error_detected(struct pci_dev *pdev,
 	}
 
 	if (netif_running(dev)) {
-		bnx2_netif_stop(bp);
+		bnx2_netif_stop(bp, true);
 		del_timer_sync(&bp->timer);
 		bnx2_reset_nic(bp, BNX2_DRV_MSG_CODE_RESET);
 	}
@@ -8487,7 +8489,7 @@ static void bnx2_io_resume(struct pci_dev *pdev)
 
 	rtnl_lock();
 	if (netif_running(dev))
-		bnx2_netif_start(bp);
+		bnx2_netif_start(bp, true);
 
 	netif_device_attach(dev);
 	rtnl_unlock();
-- 
1.6.4.GIT
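
To illustrate the idea behind the new parameter, here is a small userspace
sketch. Everything in it (struct model_nic, the model_* functions, the
in_atomic flag and the atomic_sleeps counter) is hypothetical and only
models the situation described above: a stop routine with one step that may
sleep, and a flag that lets callers in atomic context skip that step.

```c
#include <stdbool.h>

/* Hypothetical device model; not the bnx2 structures. */
struct model_nic {
	bool in_atomic;		/* caller holds a spinlock-like lock */
	bool rx_running;	/* NAPI rx handling active */
	bool cnic_running;	/* auxiliary (cnic-like) driver active */
	int atomic_sleeps;	/* counts "scheduling while atomic" events */
};

/* Models a stop routine that may sleep: calling it in atomic context
 * is recorded as a violation. */
void model_cnic_stop(struct model_nic *nic)
{
	if (nic->in_atomic)
		nic->atomic_sleeps++;
	nic->cnic_running = false;
}

/* The rx path is always stopped; the sleeping step only on request. */
void model_netif_stop(struct model_nic *nic, bool stop_cnic)
{
	if (stop_cnic)
		model_cnic_stop(nic);
	nic->rx_running = false;
}

/* Mirror image of the stop routine. */
void model_netif_start(struct model_nic *nic, bool start_cnic)
{
	nic->rx_running = true;
	if (start_cnic)
		nic->cnic_running = true;
}
```

A caller that only changes the vlgrp would pass false, and reset-style
callers would pass true, which matches how the call sites are converted in
the patch.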



^ permalink raw reply related

* [PATCH 3/3] bnx2: Update version to 2.0.9.
From: Michael Chan @ 2010-04-27 21:28 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, jfeeney
In-Reply-To: <1272403691-2934-2-git-send-email-mchan@broadcom.com>

Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/bnx2.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 35eec2d..ac90a38 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -58,8 +58,8 @@
 #include "bnx2_fw.h"
 
 #define DRV_MODULE_NAME		"bnx2"
-#define DRV_MODULE_VERSION	"2.0.8"
-#define DRV_MODULE_RELDATE	"Feb 15, 2010"
+#define DRV_MODULE_VERSION	"2.0.9"
+#define DRV_MODULE_RELDATE	"April 27, 2010"
 #define FW_MIPS_FILE_06		"bnx2/bnx2-mips-06-5.0.0.j6.fw"
 #define FW_RV2P_FILE_06		"bnx2/bnx2-rv2p-06-5.0.0.j3.fw"
 #define FW_MIPS_FILE_09		"bnx2/bnx2-mips-09-5.0.0.j9.fw"
-- 
1.6.4.GIT



^ permalink raw reply related

* [PATCH 1/3] bnx2: Fix lost MSI-X problem on 5709 NICs.
From: Michael Chan @ 2010-04-27 21:28 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, jfeeney

It has been reported that under certain heavy traffic conditions in MSI-X
mode, the driver can lose an MSI-X vector causing all packets in the
associated rx/tx ring pair to be dropped.  The problem is caused by
the chip dropping the write to unmask the MSI-X vector by the kernel
(when migrating the IRQ for example).

This can be prevented by increasing the GRC timeout value for these
register read and write operations.

Thanks to Dell for helping us debug this problem.

Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/bnx2.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index a257bab..4c1e51e 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -4759,8 +4759,12 @@ bnx2_reset_chip(struct bnx2 *bp, u32 reset_code)
 		rc = bnx2_alloc_bad_rbuf(bp);
 	}
 
-	if (bp->flags & BNX2_FLAG_USING_MSIX)
+	if (bp->flags & BNX2_FLAG_USING_MSIX) {
 		bnx2_setup_msix_tbl(bp);
+		/* Prevent MSIX table reads and write from timing out */
+		REG_WR(bp, BNX2_MISC_ECO_HW_CTL,
+			BNX2_MISC_ECO_HW_CTL_LARGE_GRC_TMOUT_EN);
+	}
 
 	return rc;
 }
-- 
1.6.4.GIT



^ permalink raw reply related

* [net-next-2.6 PATCH] ixgbe: cleanup ethtool autoneg input
From: Jeff Kirsher @ 2010-04-27 21:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Don Skidmore, Jeff Kirsher

From: Don Skidmore <donald.c.skidmore@intel.com>

The way we were setting autoneg via ethtool was inconsistent with that
of our other drivers.  It will change the following:

If autoneg is off:
>ethtool -a eth0
Pause parameters for eth0:

Autonegotiate:  off
RX:             off
TX:             off

Before:
>ethtool -A eth0 autoneg on
>ethtool -a eth0
Pause parameters for eth0:

Autonegotiate:  off
RX:             off
TX:             off

Now:
>ethtool -A eth0 autoneg on
>ethtool -a eth0
Pause parameters for eth0:

Autonegotiate:  on
RX:             on
TX:             on

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/ixgbe/ixgbe_ethtool.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethtool.c b/drivers/net/ixgbe/ixgbe_ethtool.c
index 5f8c6ab..dfbfe35 100644
--- a/drivers/net/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ixgbe/ixgbe_ethtool.c
@@ -365,7 +365,7 @@ static int ixgbe_set_pauseparam(struct net_device *netdev,
 	else
 		fc.disable_fc_autoneg = false;
 
-	if (pause->rx_pause && pause->tx_pause)
+	if ((pause->rx_pause && pause->tx_pause) || pause->autoneg)
 		fc.requested_mode = ixgbe_fc_full;
 	else if (pause->rx_pause && !pause->tx_pause)
 		fc.requested_mode = ixgbe_fc_rx_pause;
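
The mode selection after this change can be sketched in userspace as
follows. The model_* names are hypothetical, and the last two branches
(tx-only and none) are not visible in the hunk above, so they are an
assumption here; only the first two conditions are taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the flow-control modes. */
enum model_fc_mode {
	MODEL_FC_NONE,
	MODEL_FC_RX_PAUSE,
	MODEL_FC_TX_PAUSE,
	MODEL_FC_FULL,
};

/* Hypothetical stand-in for the ethtool pause request. */
struct model_pause {
	bool autoneg;
	bool rx_pause;
	bool tx_pause;
};

enum model_fc_mode model_requested_mode(const struct model_pause *p)
{
	/* From the patch: autoneg on now also requests full flow control. */
	if ((p->rx_pause && p->tx_pause) || p->autoneg)
		return MODEL_FC_FULL;
	else if (p->rx_pause && !p->tx_pause)
		return MODEL_FC_RX_PAUSE;
	else if (!p->rx_pause && p->tx_pause)
		return MODEL_FC_TX_PAUSE;	/* assumed branch */
	else
		return MODEL_FC_NONE;		/* assumed branch */
}
```

With autoneg set and both pause flags clear, the old condition would not
have selected the full mode, which is the "Before" case in the changelog.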


^ permalink raw reply related

* [net-next-2.6 PATCH] ixgbe: Properly display 1 gig downshift warning for backplane
From: Jeff Kirsher @ 2010-04-27 21:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Anjali Singhai, Jeff Kirsher

From: Anjali Singhai <anjali.singhai@intel.com>

Description: When using Intel smartspeed, the patch displays a
warning when the link down shifts to 1 Gig.

Signed-off-by: Anjali Singhai <anjali.singhai@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/ixgbe/ixgbe_82599.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_82599.c b/drivers/net/ixgbe/ixgbe_82599.c
index d189ba7..38c3840 100644
--- a/drivers/net/ixgbe/ixgbe_82599.c
+++ b/drivers/net/ixgbe/ixgbe_82599.c
@@ -642,6 +642,7 @@ static s32 ixgbe_setup_mac_link_smartspeed(struct ixgbe_hw *hw,
 	s32 i, j;
 	bool link_up = false;
 	u32 autoc_reg = IXGBE_READ_REG(hw, IXGBE_AUTOC);
+	struct ixgbe_adapter *adapter = hw->back;
 
 	hw_dbg(hw, "ixgbe_setup_mac_link_smartspeed.\n");
 
@@ -726,6 +727,10 @@ static s32 ixgbe_setup_mac_link_smartspeed(struct ixgbe_hw *hw,
 					    autoneg_wait_to_complete);
 
 out:
+	if (link_up && (link_speed == IXGBE_LINK_SPEED_1GB_FULL))
+		netif_info(adapter, hw, adapter->netdev, "Smartspeed has"
+			" downgraded the link speed from the maximum"
+			" advertised\n");
 	return status;
 }
 


^ permalink raw reply related

* [net-next-2.6 PATCH] ixgbevf: Fix link speed display
From: Jeff Kirsher @ 2010-04-27 21:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Greg Rose, Jeff Kirsher

From: Greg Rose <gregory.v.rose@intel.com>

The ixgbevf driver would always report 10Gig speeds even when the link
speed is downshifted to 1Gig.  This patch fixes that problem.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/ixgbevf/defines.h |   12 +++++++-----
 drivers/net/ixgbevf/vf.c      |    3 ++-
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ixgbevf/defines.h b/drivers/net/ixgbevf/defines.h
index c44fdb0..ca2c81f 100644
--- a/drivers/net/ixgbevf/defines.h
+++ b/drivers/net/ixgbevf/defines.h
@@ -41,11 +41,13 @@ typedef u32 ixgbe_link_speed;
 #define IXGBE_LINK_SPEED_1GB_FULL       0x0020
 #define IXGBE_LINK_SPEED_10GB_FULL      0x0080
 
-#define IXGBE_CTRL_RST          0x04000000 /* Reset (SW) */
-#define IXGBE_RXDCTL_ENABLE     0x02000000 /* Enable specific Rx Queue */
-#define IXGBE_TXDCTL_ENABLE     0x02000000 /* Enable specific Tx Queue */
-#define IXGBE_LINKS_UP          0x40000000
-#define IXGBE_LINKS_SPEED       0x20000000
+#define IXGBE_CTRL_RST              0x04000000 /* Reset (SW) */
+#define IXGBE_RXDCTL_ENABLE         0x02000000 /* Enable specific Rx Queue */
+#define IXGBE_TXDCTL_ENABLE         0x02000000 /* Enable specific Tx Queue */
+#define IXGBE_LINKS_UP              0x40000000
+#define IXGBE_LINKS_SPEED_82599     0x30000000
+#define IXGBE_LINKS_SPEED_10G_82599 0x30000000
+#define IXGBE_LINKS_SPEED_1G_82599  0x20000000
 
 /* Number of Transmit and Receive Descriptors must be a multiple of 8 */
 #define IXGBE_REQ_TX_DESCRIPTOR_MULTIPLE  8
diff --git a/drivers/net/ixgbevf/vf.c b/drivers/net/ixgbevf/vf.c
index 852e9c4..f6f9299 100644
--- a/drivers/net/ixgbevf/vf.c
+++ b/drivers/net/ixgbevf/vf.c
@@ -359,7 +359,8 @@ static s32 ixgbevf_check_mac_link_vf(struct ixgbe_hw *hw,
 	else
 		*link_up = false;
 
-	if (links_reg & IXGBE_LINKS_SPEED)
+	if ((links_reg & IXGBE_LINKS_SPEED_82599) ==
+	    IXGBE_LINKS_SPEED_10G_82599)
 		*speed = IXGBE_LINK_SPEED_10GB_FULL;
 	else
 		*speed = IXGBE_LINK_SPEED_1GB_FULL;
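
Why the old test always reported 10Gig can be seen with a small userspace
sketch. The constant values are the ones from the defines.h hunk above; the
MODEL_* and model_* names are hypothetical. The old code tested a single bit
that is set in both the 1G and the 10G encodings, while the new code masks
the two-bit field and compares it against the 10G value.

```c
#include <stdint.h>

#define MODEL_LINKS_UP		0x40000000u
#define MODEL_LINKS_SPEED_OLD	0x20000000u	/* old single-bit test */
#define MODEL_LINKS_SPEED_MASK	0x30000000u	/* two-bit speed field */
#define MODEL_LINKS_SPEED_10G	0x30000000u
#define MODEL_LINKS_SPEED_1G	0x20000000u

enum model_speed {
	MODEL_SPEED_1G = 1,
	MODEL_SPEED_10G = 10,
};

/* Old behaviour: bit 29 is set for both 1G and 10G, so a link that is up
 * at either speed decodes as 10G. */
enum model_speed model_decode_old(uint32_t links_reg)
{
	return (links_reg & MODEL_LINKS_SPEED_OLD) ? MODEL_SPEED_10G
						   : MODEL_SPEED_1G;
}

/* New behaviour: mask the whole field, then compare with the 10G value. */
enum model_speed model_decode_new(uint32_t links_reg)
{
	if ((links_reg & MODEL_LINKS_SPEED_MASK) == MODEL_LINKS_SPEED_10G)
		return MODEL_SPEED_10G;
	return MODEL_SPEED_1G;
}
```

For a register value of MODEL_LINKS_UP | MODEL_LINKS_SPEED_1G the old
decoder returns 10G and the new one returns 1G.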


^ permalink raw reply related

* Re: [PATCH v2] net: reimplement softnet_data.output_queue as a FIFO queue
From: David Miller @ 2010-04-27 21:32 UTC (permalink / raw)
  To: eric.dumazet; +Cc: xiaosuo, netdev
In-Reply-To: <1272400618.2343.13.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 27 Apr 2010 22:36:58 +0200

> On Tuesday, 27 April 2010 at 17:06 +0800, Changli Gao wrote:
>> reimplement softnet_data.output_queue as a FIFO queue.
>> 
>> reimplement softnet_data.output_queue as a FIFO queue to keep the fairness among
>> the qdiscs rescheduled.
>> 
>> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks everyone.

^ permalink raw reply

* Re: [patch] ipheth: potential null dereferences on error path
From: David Miller @ 2010-04-27 21:33 UTC (permalink / raw)
  To: agimenez; +Cc: error27, diego, netdev, kernel-janitors
In-Reply-To: <20100427210003.GA13873@bart.evergreen.loc>

From: L. Alberto Giménez <agimenez@sysvalve.es>
Date: Tue, 27 Apr 2010 23:00:03 +0200

> On Tue, Apr 27, 2010 at 11:20:12AM +0200, Dan Carpenter wrote:
>> The calls to usb_free_buffer() dereference rx_urb and tx_urb in the
>> parameter list but those could be NULL.
>> 
>> Signed-off-by: Dan Carpenter <error27@gmail.com>
> 
> Seems good to me (should I ack it or give any other kind of sign-off?).

If you give it an "Acked-by: ..." that would be nice.

^ permalink raw reply

* Re: [net-next-2.6 PATCH] ixgbe: cleanup ethtool autoneg input
From: David Miller @ 2010-04-27 21:36 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, donald.c.skidmore
In-Reply-To: <20100427213002.25913.93796.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 27 Apr 2010 14:31:06 -0700

> From: Don Skidmore <donald.c.skidmore@intel.com>
> 
> The way we were setting autoneg via ethtool was inconsistent with that
> of our other drivers.  It will change the following:
...
> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH] ixgbe: Properly display 1 gig downshift warning for backplane
From: David Miller @ 2010-04-27 21:36 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, anjali.singhai
In-Reply-To: <20100427213124.25913.82475.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 27 Apr 2010 14:31:25 -0700

> From: Anjali Singhai <anjali.singhai@intel.com>
> 
> Description: When using Intel smartspeed, the patch displays a
> warning when the link down shifts to 1 Gig.
> 
> Signed-off-by: Anjali Singhai <anjali.singhai@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH] ixgbevf: Fix link speed display
From: David Miller @ 2010-04-27 21:36 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, gregory.v.rose
In-Reply-To: <20100427213143.25913.83381.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 27 Apr 2010 14:31:45 -0700

> From: Greg Rose <gregory.v.rose@intel.com>
> 
> The ixgbevf driver would always report 10Gig speeds even when the link
> speed is downshifted to 1Gig.  This patch fixes that problem.
> 
> Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [PATCH 1/3] bnx2: Fix lost MSI-X problem on 5709 NICs.
From: David Miller @ 2010-04-27 21:38 UTC (permalink / raw)
  To: mchan; +Cc: netdev, gospo, jfeeney
In-Reply-To: <1272403691-2934-1-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Tue, 27 Apr 2010 14:28:09 -0700

> It has been reported that under certain heavy traffic conditions in MSI-X
> mode, the driver can lose an MSI-X vector causing all packets in the
> associated rx/tx ring pair to be dropped.  The problem is caused by
> the chip dropping the write to unmask the MSI-X vector by the kernel
> (when migrating the IRQ for example).
> 
> This can be prevented by increasing the GRC timeout value for these
> register read and write operations.
> 
> Thanks to Dell for helping us debug this problem.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied to net-2.6

^ permalink raw reply

* Re: [PATCH 2/3] bnx2: Prevent "scheduling while atomic" warning with cnic, bonding and vlan.
From: David Miller @ 2010-04-27 21:38 UTC (permalink / raw)
  To: mchan; +Cc: netdev, gospo, jfeeney
In-Reply-To: <1272403691-2934-2-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Tue, 27 Apr 2010 14:28:10 -0700

> The bonding driver calls ndo_vlan_rx_register() while holding bond->lock.
> The bnx2 driver calls bnx2_netif_stop() to stop the rx handling while
> changing the vlgrp.  The call also stops the cnic driver, which sleeps
> while the bond->lock is held and causes the warning.
> 
> This code path only needs to stop the NAPI rx handling while we are
> changing the vlgrp.  Since no reset is going to occur, there is no need
> to stop cnic in this case.  By adding a parameter to bnx2_netif_stop()
> to skip stopping cnic, we can avoid the warning.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied to net-2.6

^ permalink raw reply

* Re: [PATCH 3/3] bnx2: Update version to 2.0.9.
From: David Miller @ 2010-04-27 21:38 UTC (permalink / raw)
  To: mchan; +Cc: netdev, gospo, jfeeney
In-Reply-To: <1272403691-2934-3-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Tue, 27 Apr 2010 14:28:11 -0700

> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied to net-2.6

^ permalink raw reply

* Re: [PATCH net-next-2.6] net: sk_add_backlog() take rmem_alloc into account
From: David Miller @ 2010-04-27 21:43 UTC (permalink / raw)
  To: eric.dumazet; +Cc: bmb, therbert, netdev, rick.jones2
In-Reply-To: <1272399662.2343.12.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 27 Apr 2010 22:21:02 +0200

> [PATCH net-next-2.6] net: sk_add_backlog() take rmem_alloc into account
> 
> The current socket backlog limit is not enough to really stop DDoS attacks,
> because the user thread spends a long time processing a full backlog each
> round, and the user might spin heavily on the socket lock.
> 
> We should add the backlog size and the receive_queue size (aka rmem_alloc)
> to pace writers, and let the user run without being slowed down too much.
> 
> Introduce a sk_rcvqueues_full() helper, to avoid taking the socket lock in
> stress situations.
> 
> Under huge stress from a multiqueue/RPS enabled NIC, a single-flow UDP
> receiver can now process ~200.000 pps (instead of ~100 pps before the
> patch) on an 8-core machine.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

This looks great, applied, thanks Eric!
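
The idea in the quoted changelog (pace writers on the sum of backlog size
and rmem_alloc, checked before taking the socket lock) might look roughly
like the user-space sketch below.  struct sock_model and
model_rcvqueues_full() are hypothetical stand-ins, and the exact comparison
against the receive buffer limit is an assumption; the real helper is in
the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-socket counters; not struct sock. */
struct sock_model {
	unsigned int backlog_len;	/* bytes queued on the backlog */
	unsigned int rmem_alloc;	/* bytes queued on the receive queue */
	unsigned int rcvbuf;		/* configured receive buffer limit */
};

/*
 * Return true when accepting a packet of 'truesize' bytes would push the
 * sum of both queues past the limit.  Reads counters only, takes no lock.
 */
static bool model_rcvqueues_full(const struct sock_model *sk,
				 unsigned int truesize)
{
	assert(sk != NULL);
	return sk->backlog_len + sk->rmem_alloc + truesize > sk->rcvbuf;
}
```

A receive path would call this first and drop the packet early when it
returns true, so a flooded socket is not locked just to discover it is full.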

* Re: [patch] ipheth: potential null dereferences on error path
From: L. Alberto Giménez @ 2010-04-27 21:43 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: Diego Giagio, David S. Miller, netdev, kernel-janitors
In-Reply-To: <20100427092012.GA29093@bicker>

On Tue, Apr 27, 2010 at 11:20:12AM +0200, Dan Carpenter wrote:
> The calls to usb_free_buffer() dereference rx_urb and tx_urb in the
> parameter list but those could be NULL.
> 
> Signed-off-by: Dan Carpenter <error27@gmail.com>

Acked-by: L. Alberto Giménez <agimenez@sysvalve.es>

-- 
L. Alberto Giménez
JabberID agimenez@jabber.sysvalve.es
GnuPG key ID 0x3BAABDE1

* Re: [PATCH kernel 2.6.34-rc5] smc91c92_cs: spin_unlock_irqrestore before calling smc_interrupt()
From: David Miller @ 2010-04-27 21:47 UTC (permalink / raw)
  To: ken_kawasaki; +Cc: netdev
In-Reply-To: <20100425053709.ec182f63.ken_kawasaki@spring.nifty.jp>

From: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>
Date: Sun, 25 Apr 2010 05:37:09 +0900

> 
> smc91c92_cs:
>   * spin_unlock_irqrestore before calling smc_interrupt() in media_check()
>      to avoid lockup.
>   * use spin_lock_irqsave for ethtool function.
> 
> Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>

Applied, thank you.

* Re: [patch] ipheth: potential null dereferences on error path
From: David Miller @ 2010-04-27 21:49 UTC (permalink / raw)
  To: agimenez; +Cc: error27, diego, netdev, kernel-janitors
In-Reply-To: <20100427214347.GA2376@bart.evergreen.loc>

From: L. Alberto Giménez <agimenez@sysvalve.es>
Date: Tue, 27 Apr 2010 23:43:47 +0200

> On Tue, Apr 27, 2010 at 11:20:12AM +0200, Dan Carpenter wrote:
>> The calls to usb_free_buffer() dereference rx_urb and tx_urb in the
>> parameter list but those could be NULL.
>> 
>> Signed-off-by: Dan Carpenter <error27@gmail.com>
> 
> Acked-by: L. Alberto Giménez <agimenez@sysvalve.es>

Applied, thanks everyone.
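
The class of bug being fixed (an error path that dereferences a pointer
which may still be NULL) can be shown with a small user-space sketch.
struct urb_model and model_release_urb() are hypothetical names used only
for illustration; the actual patch may structure the error path differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an URB that owns a transfer buffer. */
struct urb_model {
	void *transfer_buffer;
};

/*
 * Release an URB and its buffer.  The NULL check comes before any field
 * access, so the helper is safe on a partially initialised device.
 * Returns 1 if something was freed, 0 if there was nothing to free.
 */
static int model_release_urb(struct urb_model **urbp)
{
	struct urb_model *urb;

	assert(urbp != NULL);
	urb = *urbp;
	if (urb == NULL)
		return 0;		/* allocation never happened */

	free(urb->transfer_buffer);
	free(urb);
	*urbp = NULL;			/* guard against a double free */
	return 1;
}
```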

* Re: [PATCH v5] rfs: Receive Flow Steering
From: David Miller @ 2010-04-27 21:59 UTC (permalink / raw)
  To: eric.dumazet; +Cc: therbert, netdev
In-Reply-To: <1272271271.2346.16.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 26 Apr 2010 10:41:11 +0200

> On Monday 19 April 2010 at 14:19 -0700, David Miller wrote:
> 
>> 
>> I was thinking also about how we could compute rxhash in the
>> loopback driver :-)
> 
> This would be easy if rxhash was not a "struct inet_sock" field but a
> "struct sock" one
> 
> sock_alloc_send_pskb() (or skb_set_owner_w())
> 
> skb->rxhash = sk->rxhash;

Agreed.  I'll commit the following to net-next-2.6 after some build
testing.

net: Make RFS socket operations not be inet specific.

Idea from Eric Dumazet.

As for placement inside of struct sock, I tried to choose a place
that otherwise has a 32-bit hole on 64-bit systems.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index c1d4295..1653de5 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -102,7 +102,6 @@ struct rtable;
  * @uc_ttl - Unicast TTL
  * @inet_sport - Source port
  * @inet_id - ID counter for DF pkts
- * @rxhash - flow hash received from netif layer
  * @tos - TOS
  * @mc_ttl - Multicasting TTL
  * @is_icsk - is this an inet_connection_sock?
@@ -126,9 +125,6 @@ struct inet_sock {
 	__u16			cmsg_flags;
 	__be16			inet_sport;
 	__u16			inet_id;
-#ifdef CONFIG_RPS
-	__u32			rxhash;
-#endif
 
 	struct ip_options	*opt;
 	__u8			tos;
@@ -224,37 +220,4 @@ static inline __u8 inet_sk_flowi_flags(const struct sock *sk)
 	return inet_sk(sk)->transparent ? FLOWI_FLAG_ANYSRC : 0;
 }
 
-static inline void inet_rps_record_flow(const struct sock *sk)
-{
-#ifdef CONFIG_RPS
-	struct rps_sock_flow_table *sock_flow_table;
-
-	rcu_read_lock();
-	sock_flow_table = rcu_dereference(rps_sock_flow_table);
-	rps_record_sock_flow(sock_flow_table, inet_sk(sk)->rxhash);
-	rcu_read_unlock();
-#endif
-}
-
-static inline void inet_rps_reset_flow(const struct sock *sk)
-{
-#ifdef CONFIG_RPS
-	struct rps_sock_flow_table *sock_flow_table;
-
-	rcu_read_lock();
-	sock_flow_table = rcu_dereference(rps_sock_flow_table);
-	rps_reset_sock_flow(sock_flow_table, inet_sk(sk)->rxhash);
-	rcu_read_unlock();
-#endif
-}
-
-static inline void inet_rps_save_rxhash(struct sock *sk, u32 rxhash)
-{
-#ifdef CONFIG_RPS
-	if (unlikely(inet_sk(sk)->rxhash != rxhash)) {
-		inet_rps_reset_flow(sk);
-		inet_sk(sk)->rxhash = rxhash;
-	}
-#endif
-}
 #endif	/* _INET_SOCK_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index ef2f875..cf12b1e 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -198,6 +198,7 @@ struct sock_common {
   *	@sk_rcvlowat: %SO_RCVLOWAT setting
   *	@sk_rcvtimeo: %SO_RCVTIMEO setting
   *	@sk_sndtimeo: %SO_SNDTIMEO setting
+  *	@sk_rxhash: flow hash received from netif layer
   *	@sk_filter: socket filtering instructions
   *	@sk_protinfo: private area, net family specific, when not using slab
   *	@sk_timer: sock cleanup timer
@@ -278,6 +279,9 @@ struct sock {
 	int			sk_gso_type;
 	unsigned int		sk_gso_max_size;
 	int			sk_rcvlowat;
+#ifdef CONFIG_RPS
+	__u32			sk_rxhash;
+#endif
 	unsigned long 		sk_flags;
 	unsigned long	        sk_lingertime;
 	struct sk_buff_head	sk_error_queue;
@@ -629,6 +633,40 @@ static inline int sk_backlog_rcv(struct sock *sk, struct sk_buff *skb)
 	return sk->sk_backlog_rcv(sk, skb);
 }
 
+static inline void sock_rps_record_flow(const struct sock *sk)
+{
+#ifdef CONFIG_RPS
+	struct rps_sock_flow_table *sock_flow_table;
+
+	rcu_read_lock();
+	sock_flow_table = rcu_dereference(rps_sock_flow_table);
+	rps_record_sock_flow(sock_flow_table, sk->sk_rxhash);
+	rcu_read_unlock();
+#endif
+}
+
+static inline void sock_rps_reset_flow(const struct sock *sk)
+{
+#ifdef CONFIG_RPS
+	struct rps_sock_flow_table *sock_flow_table;
+
+	rcu_read_lock();
+	sock_flow_table = rcu_dereference(rps_sock_flow_table);
+	rps_reset_sock_flow(sock_flow_table, sk->sk_rxhash);
+	rcu_read_unlock();
+#endif
+}
+
+static inline void sock_rps_save_rxhash(struct sock *sk, u32 rxhash)
+{
+#ifdef CONFIG_RPS
+	if (unlikely(sk->sk_rxhash != rxhash)) {
+		sock_rps_reset_flow(sk);
+		sk->sk_rxhash = rxhash;
+	}
+#endif
+}
+
 #define sk_wait_event(__sk, __timeo, __condition)			\
 	({	int __rc;						\
 		release_sock(__sk);					\
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 9f52880..c6c43bc 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -419,7 +419,7 @@ int inet_release(struct socket *sock)
 	if (sk) {
 		long timeout;
 
-		inet_rps_reset_flow(sk);
+		sock_rps_reset_flow(sk);
 
 		/* Applications forget to leave groups before exiting */
 		ip_mc_drop_socket(sk);
@@ -722,7 +722,7 @@ int inet_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 {
 	struct sock *sk = sock->sk;
 
-	inet_rps_record_flow(sk);
+	sock_rps_record_flow(sk);
 
 	/* We may need to bind the socket. */
 	if (!inet_sk(sk)->inet_num && inet_autobind(sk))
@@ -737,7 +737,7 @@ static ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset,
 {
 	struct sock *sk = sock->sk;
 
-	inet_rps_record_flow(sk);
+	sock_rps_record_flow(sk);
 
 	/* We may need to bind the socket. */
 	if (!inet_sk(sk)->inet_num && inet_autobind(sk))
@@ -755,7 +755,7 @@ int inet_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 	int addr_len = 0;
 	int err;
 
-	inet_rps_record_flow(sk);
+	sock_rps_record_flow(sk);
 
 	err = sk->sk_prot->recvmsg(iocb, sk, msg, size, flags & MSG_DONTWAIT,
 				   flags & ~MSG_DONTWAIT, &addr_len);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 4d6717d..771f814 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1672,7 +1672,7 @@ process:
 
 	skb->dev = NULL;
 
-	inet_rps_save_rxhash(sk, skb->rxhash);
+	sock_rps_save_rxhash(sk, skb->rxhash);
 
 	bh_lock_sock_nested(sk);
 	ret = 0;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 776c844..63eb56b 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1217,7 +1217,7 @@ int udp_disconnect(struct sock *sk, int flags)
 	sk->sk_state = TCP_CLOSE;
 	inet->inet_daddr = 0;
 	inet->inet_dport = 0;
-	inet_rps_save_rxhash(sk, 0);
+	sock_rps_save_rxhash(sk, 0);
 	sk->sk_bound_dev_if = 0;
 	if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK))
 		inet_reset_saddr(sk);
@@ -1262,7 +1262,7 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	int rc;
 
 	if (inet_sk(sk)->inet_daddr)
-		inet_rps_save_rxhash(sk, skb->rxhash);
+		sock_rps_save_rxhash(sk, skb->rxhash);
 
 	rc = sock_queue_rcv_skb(sk, skb);
 	if (rc < 0) {
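
The placement note in the commit message above (putting the new field where
there is otherwise a 32-bit hole on 64-bit systems) can be checked with a
small stand-alone sketch.  The two layouts below are hypothetical cut-down
structs, not the real struct sock, and the result assumes a typical LP64
ABI where int is 4 bytes and unsigned long is 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down layouts; not the real struct sock. */
struct layout_before {
	int		rcvlowat;
	unsigned long	flags;
};

struct layout_after {
	int		rcvlowat;
	uint32_t	rxhash;		/* sits in the former padding on LP64 */
	unsigned long	flags;
};

/* The new field is expected to start right after the 4-byte int. */
static_assert(offsetof(struct layout_after, rxhash) == sizeof(int),
	      "rxhash directly follows rcvlowat");

/* Bytes of padding between rcvlowat and flags in the original layout. */
static size_t model_hole_bytes(void)
{
	return offsetof(struct layout_before, flags) -
	       (offsetof(struct layout_before, rcvlowat) + sizeof(int));
}
```

On such an ABI the hole is 4 bytes and both structs have the same size, so
the added field costs no extra memory.  On 32-bit systems there is no hole
and the struct grows by 4 bytes.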
