Netdev List
 help / color / mirror / Atom feed
* Re: [RFC PATCH] net: cgroup: null ptr dereference in netprio cgroup during init
From: Gao feng @ 2012-07-18  1:59 UTC (permalink / raw)
  To: John Fastabend; +Cc: davem, nhorman, mark.d.rustad, netdev, eric.dumazet
In-Reply-To: <20120718003316.2979.49278.stgit@jf-dev1-dcblab>

于 2012年07月18日 08:33, John Fastabend 写道:
> When the netprio cgroup is built in the kernel cgroup_init will call
> cgrp_create which eventually calls update_netdev_tables. This is
> being called before do_initcalls() so a null ptr dereference occurs
> on init_net.
> 
> This patch adds a check on init_net.count to verify the structure
> has been initialized. The failure was introduced here,
> 
> commit ef209f15980360f6945873df3cd710c5f62f2a3e
> Author: Gao feng <gaofeng@cn.fujitsu.com>
> Date:   Wed Jul 11 21:50:15 2012 +0000
> 
>     net: cgroup: fix access the unallocated memory in netprio cgroup
> 
> Tested with ping with netprio_cgroup as a module and built in.
> 
> Marked RFC for now I think DaveM might have a reason why this needs
> some improvement.
> 
> Reported-by: Mark Rustad <mark.d.rustad@intel.com>
> Cc: Neil Horman <nhorman@tuxdriver.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Gao feng <gaofeng@cn.fujitsu.com>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
> 
>  net/core/netprio_cgroup.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
> index b2e9caa..e9fd7fd 100644
> --- a/net/core/netprio_cgroup.c
> +++ b/net/core/netprio_cgroup.c
> @@ -116,6 +116,9 @@ static int update_netdev_tables(void)
>  	u32 max_len;
>  	struct netprio_map *map;


Thanks John.
It's my mistake.

Can we make sure init_net.count is zero here?
I can't find some places to initialize it to zero.

>  
> +	if (!atomic_read(&init_net.count))
> +		return ret;
> +
>  	rtnl_lock();
>  	max_len = atomic_read(&max_prioidx) + 1;
>  	for_each_netdev(&init_net, dev) {
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: [PATCH v2] b44: add 64 bit stats
From: Kevin Groeneveld @ 2012-07-18  2:02 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20120716.230806.242760837075045729.davem@davemloft.net>

On Tue, Jul 17, 2012 at 2:08 AM, David Miller <davem@davemloft.net> wrote:
> This patch was corrupted by your email client and is therefore
> unusable.

If I resend the patch should I bump the version number in the subject?

Should I include "Acked-by" lines that people have posted?

I keep sending myself test messages with the patch but the white space
is always mangled.  I am not sure if Thunderbird is mangling it in the
sent message or the received message... :(

^ permalink raw reply

* [net-next 1/6] ixgbe: Ping the VFs on link status change to trigger link change
From: Jeff Kirsher @ 2012-07-18  2:20 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1342578045-17778-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

When the link status changes on the PF we need to notify the VFs. In order
to do this we should ping all of the VFs in order to trigger a link status
change on them as well.

This fixes issues in which the PF would reset, but the VF didn't because the
NAK flag was not set in the VF mailbox.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index ee230f5..17f46f0 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -5390,6 +5390,9 @@ static void ixgbe_watchdog_link_is_up(struct ixgbe_adapter *adapter)
 
 	netif_carrier_on(netdev);
 	ixgbe_check_vf_rate_limit(adapter);
+
+	/* ping all the active vfs to let them know link has changed */
+	ixgbe_ping_all_vfs(adapter);
 }
 
 /**
@@ -5419,6 +5422,9 @@ static void ixgbe_watchdog_link_is_down(struct ixgbe_adapter *adapter)
 
 	e_info(drv, "NIC Link is Down\n");
 	netif_carrier_off(netdev);
+
+	/* ping all the active vfs to let them know link has changed */
+	ixgbe_ping_all_vfs(adapter);
 }
 
 /**
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 0/6][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2012-07-18  2:20 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann

This series contains updates to ixgbe & ixgbevf.

The following are changes since commit 5abf7f7e0f6bdbfcac737f636497d7016d9507eb:
  ipv4: fix rcu splat
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master

Alexander Duyck (6):
  ixgbe: Ping the VFs on link status change to trigger link change
  ixgbe: Handle failures in the ixgbe_setup_rx/tx_resources calls
  ixgbe: Move configuration of set_real_num_rx/tx_queues into open
  ixgbe: Update the logic for ixgbe_cache_ring_dcb and DCB RSS
    configuration
  ixgbe: Cleanup logic for MRQC and MTQC configuration
  ixgbevf: Update descriptor macros to accept pointers and drop _ADV
    suffix

 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c      |  138 ++++++---------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |  190 +++++++++++++--------
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |   12 +-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   18 +-
 4 files changed, 183 insertions(+), 175 deletions(-)

-- 
1.7.10.4

^ permalink raw reply

* [net-next 2/6] ixgbe: Handle failures in the ixgbe_setup_rx/tx_resources calls
From: Jeff Kirsher @ 2012-07-18  2:20 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1342578045-17778-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

Previously we were exiting without cleaning up the memory internally on the
ixgbe_setup_rx_resources and ixgbe_setup_tx_resources calls.  Instead of
forcing the caller to clean things up for us we should instead just unwind
the rings and free the memory as we go.  This way we can more gracefully
clean up the rings in the event of an allocation failure.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 17f46f0..373342f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -4549,10 +4549,16 @@ static int ixgbe_setup_all_tx_resources(struct ixgbe_adapter *adapter)
 		err = ixgbe_setup_tx_resources(adapter->tx_ring[i]);
 		if (!err)
 			continue;
+
 		e_err(probe, "Allocation for Tx Queue %u failed\n", i);
-		break;
+		goto err_setup_tx;
 	}
 
+	return 0;
+err_setup_tx:
+	/* rewind the index freeing the rings as we go */
+	while (i--)
+		ixgbe_free_tx_resources(adapter->tx_ring[i]);
 	return err;
 }
 
@@ -4627,10 +4633,16 @@ static int ixgbe_setup_all_rx_resources(struct ixgbe_adapter *adapter)
 		err = ixgbe_setup_rx_resources(adapter->rx_ring[i]);
 		if (!err)
 			continue;
+
 		e_err(probe, "Allocation for Rx Queue %u failed\n", i);
-		break;
+		goto err_setup_rx;
 	}
 
+	return 0;
+err_setup_rx:
+	/* rewind the index freeing the rings as we go */
+	while (i--)
+		ixgbe_free_rx_resources(adapter->rx_ring[i]);
 	return err;
 }
 
@@ -4791,10 +4803,10 @@ static int ixgbe_open(struct net_device *netdev)
 	return 0;
 
 err_req_irq:
-err_setup_rx:
 	ixgbe_free_all_rx_resources(adapter);
-err_setup_tx:
+err_setup_rx:
 	ixgbe_free_all_tx_resources(adapter);
+err_setup_tx:
 	ixgbe_reset(adapter);
 
 	return err;
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 3/6] ixgbe: Move configuration of set_real_num_rx/tx_queues into open
From: Jeff Kirsher @ 2012-07-18  2:20 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1342578045-17778-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

It makes much more sense for us to configure the real number of Tx and Rx
queues in the ixgbe_open call than it does in ixgbe_set_num_queues.  By
setting the number in ixgbe_open we can avoid a number of unecessary
updates and only have to make the calls once.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c  |   58 ++++++-------------------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   36 ++++++++++-----
 2 files changed, 38 insertions(+), 56 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index d308e71..c03d771 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -349,7 +349,7 @@ static bool ixgbe_set_rss_queues(struct ixgbe_adapter *adapter)
  * fallthrough conditions.
  *
  **/
-static int ixgbe_set_num_queues(struct ixgbe_adapter *adapter)
+static void ixgbe_set_num_queues(struct ixgbe_adapter *adapter)
 {
 	/* Start with base case */
 	adapter->num_rx_queues = 1;
@@ -358,29 +358,14 @@ static int ixgbe_set_num_queues(struct ixgbe_adapter *adapter)
 	adapter->num_rx_queues_per_pool = 1;
 
 	if (ixgbe_set_sriov_queues(adapter))
-		goto done;
+		return;
 
 #ifdef CONFIG_IXGBE_DCB
 	if (ixgbe_set_dcb_queues(adapter))
-		goto done;
+		return;
 
 #endif
-	if (ixgbe_set_rss_queues(adapter))
-		goto done;
-
-	/* fallback to base case */
-	adapter->num_rx_queues = 1;
-	adapter->num_tx_queues = 1;
-
-done:
-	if ((adapter->netdev->reg_state == NETREG_UNREGISTERED) ||
-	    (adapter->netdev->reg_state == NETREG_UNREGISTERING))
-		return 0;
-
-	/* Notify the stack of the (possibly) reduced queue counts. */
-	netif_set_real_num_tx_queues(adapter->netdev, adapter->num_tx_queues);
-	return netif_set_real_num_rx_queues(adapter->netdev,
-					    adapter->num_rx_queues);
+	ixgbe_set_rss_queues(adapter);
 }
 
 static void ixgbe_acquire_msix_vectors(struct ixgbe_adapter *adapter,
@@ -710,11 +695,10 @@ static void ixgbe_reset_interrupt_capability(struct ixgbe_adapter *adapter)
  * Attempt to configure the interrupts using the best available
  * capabilities of the hardware and the kernel.
  **/
-static int ixgbe_set_interrupt_capability(struct ixgbe_adapter *adapter)
+static void ixgbe_set_interrupt_capability(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
-	int err = 0;
-	int vector, v_budget;
+	int vector, v_budget, err;
 
 	/*
 	 * It's easy to be greedy for MSI-X vectors, but it really
@@ -747,7 +731,7 @@ static int ixgbe_set_interrupt_capability(struct ixgbe_adapter *adapter)
 		ixgbe_acquire_msix_vectors(adapter, v_budget);
 
 		if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED)
-			goto out;
+			return;
 	}
 
 	adapter->flags &= ~IXGBE_FLAG_DCB_ENABLED;
@@ -762,25 +746,17 @@ static int ixgbe_set_interrupt_capability(struct ixgbe_adapter *adapter)
 	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
 		ixgbe_disable_sriov(adapter);
 
-	err = ixgbe_set_num_queues(adapter);
-	if (err)
-		return err;
-
+	ixgbe_set_num_queues(adapter);
 	adapter->num_q_vectors = 1;
 
 	err = pci_enable_msi(adapter->pdev);
-	if (!err) {
-		adapter->flags |= IXGBE_FLAG_MSI_ENABLED;
-	} else {
+	if (err) {
 		netif_printk(adapter, hw, KERN_DEBUG, adapter->netdev,
 			     "Unable to allocate MSI interrupt, "
 			     "falling back to legacy.  Error: %d\n", err);
-		/* reset err */
-		err = 0;
+		return;
 	}
-
-out:
-	return err;
+	adapter->flags |= IXGBE_FLAG_MSI_ENABLED;
 }
 
 /**
@@ -798,15 +774,10 @@ int ixgbe_init_interrupt_scheme(struct ixgbe_adapter *adapter)
 	int err;
 
 	/* Number of supported queues */
-	err = ixgbe_set_num_queues(adapter);
-	if (err)
-		return err;
+	ixgbe_set_num_queues(adapter);
 
-	err = ixgbe_set_interrupt_capability(adapter);
-	if (err) {
-		e_dev_err("Unable to setup interrupt capabilities\n");
-		goto err_set_interrupt;
-	}
+	/* Set interrupt mode */
+	ixgbe_set_interrupt_capability(adapter);
 
 	err = ixgbe_alloc_q_vectors(adapter);
 	if (err) {
@@ -826,7 +797,6 @@ int ixgbe_init_interrupt_scheme(struct ixgbe_adapter *adapter)
 
 err_alloc_q_vectors:
 	ixgbe_reset_interrupt_capability(adapter);
-err_set_interrupt:
 	return err;
 }
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 373342f..7f2aa22 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -4798,10 +4798,26 @@ static int ixgbe_open(struct net_device *netdev)
 	if (err)
 		goto err_req_irq;
 
+	/* Notify the stack of the actual queue counts. */
+	err = netif_set_real_num_tx_queues(netdev,
+					   adapter->num_rx_pools > 1 ? 1 :
+					   adapter->num_tx_queues);
+	if (err)
+		goto err_set_queues;
+
+
+	err = netif_set_real_num_rx_queues(netdev,
+					   adapter->num_rx_pools > 1 ? 1 :
+					   adapter->num_rx_queues);
+	if (err)
+		goto err_set_queues;
+
 	ixgbe_up_complete(adapter);
 
 	return 0;
 
+err_set_queues:
+	ixgbe_free_irq(adapter);
 err_req_irq:
 	ixgbe_free_all_rx_resources(adapter);
 err_setup_rx:
@@ -4864,23 +4880,19 @@ static int ixgbe_resume(struct pci_dev *pdev)
 
 	pci_wake_from_d3(pdev, false);
 
-	rtnl_lock();
-	err = ixgbe_init_interrupt_scheme(adapter);
-	rtnl_unlock();
-	if (err) {
-		e_dev_err("Cannot initialize interrupts for device\n");
-		return err;
-	}
-
 	ixgbe_reset(adapter);
 
 	IXGBE_WRITE_REG(&adapter->hw, IXGBE_WUS, ~0);
 
-	if (netif_running(netdev)) {
+	rtnl_lock();
+	err = ixgbe_init_interrupt_scheme(adapter);
+	if (!err && netif_running(netdev))
 		err = ixgbe_open(netdev);
-		if (err)
-			return err;
-	}
+
+	rtnl_unlock();
+
+	if (err)
+		return err;
 
 	netif_device_attach(netdev);
 
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 4/6] ixgbe: Update the logic for ixgbe_cache_ring_dcb and DCB RSS configuration
From: Jeff Kirsher @ 2012-07-18  2:20 UTC (permalink / raw)
  To: davem
  Cc: Alexander Duyck, netdev, gospo, sassmann, John Fastabend,
	Jeff Kirsher
In-Reply-To: <1342578045-17778-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change cleans up some of the logic in an attempt to try and simplify
things for how we are configuring DCB w/ RSS.

In this patch I basically did 3 things.  I updated the logic for getting
the first register index.  I applied the fact that all TCs get the same
number of queues to simplify the looping logic in caching the DCB ring
register.  Finally I updated how we configure the RQTC register to match
the fact that all TCs are assigned the same number of queues.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c  |   80 ++++++++++++-------------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   12 ++--
 2 files changed, 42 insertions(+), 50 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index c03d771..4c3822f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -42,42 +42,37 @@ static void ixgbe_get_first_reg_idx(struct ixgbe_adapter *adapter, u8 tc,
 
 	switch (hw->mac.type) {
 	case ixgbe_mac_82598EB:
-		*tx = tc << 2;
-		*rx = tc << 3;
+		/* TxQs/TC: 4	RxQs/TC: 8 */
+		*tx = tc << 2; /* 0, 4,  8, 12, 16, 20, 24, 28 */
+		*rx = tc << 3; /* 0, 8, 16, 24, 32, 40, 48, 56 */
 		break;
 	case ixgbe_mac_82599EB:
 	case ixgbe_mac_X540:
 		if (num_tcs > 4) {
-			if (tc < 3) {
-				*tx = tc << 5;
-				*rx = tc << 4;
-			} else if (tc <  5) {
-				*tx = ((tc + 2) << 4);
-				*rx = tc << 4;
-			} else if (tc < num_tcs) {
-				*tx = ((tc + 8) << 3);
-				*rx = tc << 4;
-			}
+			/*
+			 * TCs    : TC0/1 TC2/3 TC4-7
+			 * TxQs/TC:    32    16     8
+			 * RxQs/TC:    16    16    16
+			 */
+			*rx = tc << 4;
+			if (tc < 3)
+				*tx = tc << 5;		/*   0,  32,  64 */
+			else if (tc < 5)
+				*tx = (tc + 2) << 4;	/*  80,  96 */
+			else
+				*tx = (tc + 8) << 3;	/* 104, 112, 120 */
 		} else {
-			*rx =  tc << 5;
-			switch (tc) {
-			case 0:
-				*tx =  0;
-				break;
-			case 1:
-				*tx = 64;
-				break;
-			case 2:
-				*tx = 96;
-				break;
-			case 3:
-				*tx = 112;
-				break;
-			default:
-				break;
-			}
+			/*
+			 * TCs    : TC0 TC1 TC2/3
+			 * TxQs/TC:  64  32    16
+			 * RxQs/TC:  32  32    32
+			 */
+			*rx = tc << 5;
+			if (tc < 2)
+				*tx = tc << 6;		/*  0,  64 */
+			else
+				*tx = (tc + 4) << 4;	/* 96, 112 */
 		}
-		break;
 	default:
 		break;
 	}
@@ -90,25 +85,26 @@ static void ixgbe_get_first_reg_idx(struct ixgbe_adapter *adapter, u8 tc,
  * Cache the descriptor ring offsets for DCB to the assigned rings.
  *
  **/
-static inline bool ixgbe_cache_ring_dcb(struct ixgbe_adapter *adapter)
+static bool ixgbe_cache_ring_dcb(struct ixgbe_adapter *adapter)
 {
 	struct net_device *dev = adapter->netdev;
-	int i, j, k;
+	unsigned int tx_idx, rx_idx;
+	int tc, offset, rss_i, i;
 	u8 num_tcs = netdev_get_num_tc(dev);
 
-	if (!num_tcs)
+	/* verify we have DCB queueing enabled before proceeding */
+	if (num_tcs <= 1)
 		return false;
 
-	for (i = 0, k = 0; i < num_tcs; i++) {
-		unsigned int tx_s, rx_s;
-		u16 count = dev->tc_to_txq[i].count;
+	rss_i = adapter->ring_feature[RING_F_RSS].indices;
 
-		ixgbe_get_first_reg_idx(adapter, i, &tx_s, &rx_s);
-		for (j = 0; j < count; j++, k++) {
-			adapter->tx_ring[k]->reg_idx = tx_s + j;
-			adapter->rx_ring[k]->reg_idx = rx_s + j;
-			adapter->tx_ring[k]->dcb_tc = i;
-			adapter->rx_ring[k]->dcb_tc = i;
+	for (tc = 0, offset = 0; tc < num_tcs; tc++, offset += rss_i) {
+		ixgbe_get_first_reg_idx(adapter, tc, &tx_idx, &rx_idx);
+		for (i = 0; i < rss_i; i++, tx_idx++, rx_idx++) {
+			adapter->tx_ring[offset + i]->reg_idx = tx_idx;
+			adapter->rx_ring[offset + i]->reg_idx = rx_idx;
+			adapter->tx_ring[offset + i]->dcb_tc = tc;
+			adapter->rx_ring[offset + i]->dcb_tc = tc;
 		}
 	}
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 7f2aa22..32c8cd6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3608,20 +3608,16 @@ static void ixgbe_configure_dcb(struct ixgbe_adapter *adapter)
 
 	/* Enable RSS Hash per TC */
 	if (hw->mac.type != ixgbe_mac_82598EB) {
-		int i;
-		u32 reg = 0;
-		u8 msb = 0;
-		u8 rss_i = adapter->netdev->tc_to_txq[0].count - 1;
+		u32 msb = 0;
+		u16 rss_i = adapter->ring_feature[RING_F_RSS].indices - 1;
 
 		while (rss_i) {
 			msb++;
 			rss_i >>= 1;
 		}
 
-		for (i = 0; i < MAX_TRAFFIC_CLASS; i++)
-			reg |= msb << IXGBE_RQTC_SHIFT_TC(i);
-
-		IXGBE_WRITE_REG(hw, IXGBE_RQTC, reg);
+		/* write msb to all 8 TCs in one write */
+		IXGBE_WRITE_REG(hw, IXGBE_RQTC, msb * 0x11111111);
 	}
 }
 #endif
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 5/6] ixgbe: Cleanup logic for MRQC and MTQC configuration
From: Jeff Kirsher @ 2012-07-18  2:20 UTC (permalink / raw)
  To: davem
  Cc: Alexander Duyck, netdev, gospo, sassmann, John Fastabend,
	Jeff Kirsher
In-Reply-To: <1342578045-17778-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change is meant to make the code much more readable for MTQC and MRQC
configuration.

The big change is that I simplified much of the logic so that we are
essentially handling just 4 cases and their variants. In the cases where
RSS is disabled we are actually just programming the RETA table with all
1s resulting in a single queue RSS. In the case of SR-IOV I am treating
that as a subset of VMDq. This all results int he following configuration
for the hardware:
         DCB
         En       Dis
VMDq En  VMDQ/DCB VMDq/RSS
     Dis DCB/RSS  RSS

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  116 ++++++++++++++-----------
 1 file changed, 66 insertions(+), 50 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 32c8cd6..2b4b791 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2719,8 +2719,7 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 static void ixgbe_setup_mtqc(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
-	u32 rttdcs;
-	u32 reg;
+	u32 rttdcs, mtqc;
 	u8 tcs = netdev_get_num_tc(adapter->netdev);
 
 	if (hw->mac.type == ixgbe_mac_82598EB)
@@ -2732,28 +2731,32 @@ static void ixgbe_setup_mtqc(struct ixgbe_adapter *adapter)
 	IXGBE_WRITE_REG(hw, IXGBE_RTTDCS, rttdcs);
 
 	/* set transmit pool layout */
-	switch (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED) {
-	case (IXGBE_FLAG_SRIOV_ENABLED):
-		IXGBE_WRITE_REG(hw, IXGBE_MTQC,
-				(IXGBE_MTQC_VT_ENA | IXGBE_MTQC_64VF));
-		break;
-	default:
-		if (!tcs)
-			reg = IXGBE_MTQC_64Q_1PB;
-		else if (tcs <= 4)
-			reg = IXGBE_MTQC_RT_ENA | IXGBE_MTQC_4TC_4TQ;
+	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED) {
+		mtqc = IXGBE_MTQC_VT_ENA;
+		if (tcs > 4)
+			mtqc |= IXGBE_MTQC_RT_ENA | IXGBE_MTQC_8TC_8TQ;
+		else if (tcs > 1)
+			mtqc |= IXGBE_MTQC_RT_ENA | IXGBE_MTQC_4TC_4TQ;
+		else if (adapter->ring_feature[RING_F_RSS].indices == 4)
+			mtqc |= IXGBE_MTQC_32VF;
 		else
-			reg = IXGBE_MTQC_RT_ENA | IXGBE_MTQC_8TC_8TQ;
+			mtqc |= IXGBE_MTQC_64VF;
+	} else {
+		if (tcs > 4)
+			mtqc = IXGBE_MTQC_RT_ENA | IXGBE_MTQC_8TC_8TQ;
+		else if (tcs > 1)
+			mtqc = IXGBE_MTQC_RT_ENA | IXGBE_MTQC_4TC_4TQ;
+		else
+			mtqc = IXGBE_MTQC_64Q_1PB;
+	}
 
-		IXGBE_WRITE_REG(hw, IXGBE_MTQC, reg);
+	IXGBE_WRITE_REG(hw, IXGBE_MTQC, mtqc);
 
-		/* Enable Security TX Buffer IFG for multiple pb */
-		if (tcs) {
-			reg = IXGBE_READ_REG(hw, IXGBE_SECTXMINIFG);
-			reg |= IXGBE_SECTX_DCB;
-			IXGBE_WRITE_REG(hw, IXGBE_SECTXMINIFG, reg);
-		}
-		break;
+	/* Enable Security TX Buffer IFG for multiple pb */
+	if (tcs) {
+		u32 sectx = IXGBE_READ_REG(hw, IXGBE_SECTXMINIFG);
+		sectx |= IXGBE_SECTX_DCB;
+		IXGBE_WRITE_REG(hw, IXGBE_SECTXMINIFG, sectx);
 	}
 
 	/* re-enable the arbiter */
@@ -2886,11 +2889,18 @@ static void ixgbe_setup_mrqc(struct ixgbe_adapter *adapter)
 	u32 mrqc = 0, reta = 0;
 	u32 rxcsum;
 	int i, j;
-	u8 tcs = netdev_get_num_tc(adapter->netdev);
-	int maxq = adapter->ring_feature[RING_F_RSS].indices;
+	u16 rss_i = adapter->ring_feature[RING_F_RSS].indices;
+
+	if (!(adapter->flags & IXGBE_FLAG_RSS_ENABLED))
+		rss_i = 1;
 
-	if (tcs)
-		maxq = min(maxq, adapter->num_tx_queues / tcs);
+	/*
+	 * Program table for at least 2 queues w/ SR-IOV so that VFs can
+	 * make full use of any rings they may have.  We will use the
+	 * PSRTYPE register to control how many rings we use within the PF.
+	 */
+	if ((adapter->flags & IXGBE_FLAG_SRIOV_ENABLED) && (rss_i < 2))
+		rss_i = 2;
 
 	/* Fill out hash function seeds */
 	for (i = 0; i < 10; i++)
@@ -2898,7 +2908,7 @@ static void ixgbe_setup_mrqc(struct ixgbe_adapter *adapter)
 
 	/* Fill out redirection table */
 	for (i = 0, j = 0; i < 128; i++, j++) {
-		if (j == maxq)
+		if (j == rss_i)
 			j = 0;
 		/* reta = 4-byte sliding window of
 		 * 0x00..(indices-1)(indices-1)00..etc. */
@@ -2912,35 +2922,36 @@ static void ixgbe_setup_mrqc(struct ixgbe_adapter *adapter)
 	rxcsum |= IXGBE_RXCSUM_PCSD;
 	IXGBE_WRITE_REG(hw, IXGBE_RXCSUM, rxcsum);
 
-	if (adapter->hw.mac.type == ixgbe_mac_82598EB &&
-	    (adapter->flags & IXGBE_FLAG_RSS_ENABLED)) {
-		mrqc = IXGBE_MRQC_RSSEN;
+	if (adapter->hw.mac.type == ixgbe_mac_82598EB) {
+		if (adapter->flags & IXGBE_FLAG_RSS_ENABLED)
+			mrqc = IXGBE_MRQC_RSSEN;
 	} else {
-		int mask = adapter->flags & (IXGBE_FLAG_RSS_ENABLED
-					     | IXGBE_FLAG_SRIOV_ENABLED);
-
-		switch (mask) {
-		case (IXGBE_FLAG_RSS_ENABLED):
-			if (!tcs)
-				mrqc = IXGBE_MRQC_RSSEN;
-			else if (tcs <= 4)
-				mrqc = IXGBE_MRQC_RTRSS4TCEN;
+		u8 tcs = netdev_get_num_tc(adapter->netdev);
+
+		if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED) {
+			if (tcs > 4)
+				mrqc = IXGBE_MRQC_VMDQRT8TCEN;	/* 8 TCs */
+			else if (tcs > 1)
+				mrqc = IXGBE_MRQC_VMDQRT4TCEN;	/* 4 TCs */
+			else if (adapter->ring_feature[RING_F_RSS].indices == 4)
+				mrqc = IXGBE_MRQC_VMDQRSS32EN;
 			else
+				mrqc = IXGBE_MRQC_VMDQRSS64EN;
+		} else {
+			if (tcs > 4)
 				mrqc = IXGBE_MRQC_RTRSS8TCEN;
-			break;
-		case (IXGBE_FLAG_SRIOV_ENABLED):
-			mrqc = IXGBE_MRQC_VMDQEN;
-			break;
-		default:
-			break;
+			else if (tcs > 1)
+				mrqc = IXGBE_MRQC_RTRSS4TCEN;
+			else
+				mrqc = IXGBE_MRQC_RSSEN;
 		}
 	}
 
 	/* Perform hash on these packet types */
-	mrqc |= IXGBE_MRQC_RSS_FIELD_IPV4
-	      | IXGBE_MRQC_RSS_FIELD_IPV4_TCP
-	      | IXGBE_MRQC_RSS_FIELD_IPV6
-	      | IXGBE_MRQC_RSS_FIELD_IPV6_TCP;
+	mrqc |= IXGBE_MRQC_RSS_FIELD_IPV4 |
+		IXGBE_MRQC_RSS_FIELD_IPV4_TCP |
+		IXGBE_MRQC_RSS_FIELD_IPV6 |
+		IXGBE_MRQC_RSS_FIELD_IPV6_TCP;
 
 	if (adapter->flags2 & IXGBE_FLAG2_RSS_FIELD_IPV4_UDP)
 		mrqc |= IXGBE_MRQC_RSS_FIELD_IPV4_UDP;
@@ -3103,8 +3114,13 @@ static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
 	if (hw->mac.type == ixgbe_mac_82598EB)
 		return;
 
-	if (adapter->flags & IXGBE_FLAG_RSS_ENABLED)
-		psrtype |= (adapter->num_rx_queues_per_pool << 29);
+	if (adapter->flags & IXGBE_FLAG_RSS_ENABLED) {
+		int rss_i = adapter->ring_feature[RING_F_RSS].indices;
+		if (rss_i > 3)
+			psrtype |= 2 << 29;
+		else if (rss_i > 1)
+			psrtype |= 1 << 29;
+	}
 
 	for (p = 0; p < adapter->num_rx_pools; p++)
 		IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(adapter->num_vfs + p),
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 6/6] ixgbevf: Update descriptor macros to accept pointers and drop _ADV suffix
From: Jeff Kirsher @ 2012-07-18  2:20 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Greg Rose, Jeff Kirsher
In-Reply-To: <1342578045-17778-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change updates the descriptor macros to accept pointers, updates the
name to drop the _ADV suffix, and include the IXGBEVF name in the macro.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |   12 ++++++------
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   18 +++++++++---------
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index f92daca..1f13765 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -164,12 +164,12 @@ struct ixgbevf_q_vector {
 	((((R)->next_to_clean > (R)->next_to_use) ? 0 : (R)->count) + \
 	(R)->next_to_clean - (R)->next_to_use - 1)
 
-#define IXGBE_RX_DESC_ADV(R, i)	    \
-	(&(((union ixgbe_adv_rx_desc *)((R).desc))[i]))
-#define IXGBE_TX_DESC_ADV(R, i)	    \
-	(&(((union ixgbe_adv_tx_desc *)((R).desc))[i]))
-#define IXGBE_TX_CTXTDESC_ADV(R, i)	    \
-	(&(((struct ixgbe_adv_tx_context_desc *)((R).desc))[i]))
+#define IXGBEVF_RX_DESC(R, i)	    \
+	(&(((union ixgbe_adv_rx_desc *)((R)->desc))[i]))
+#define IXGBEVF_TX_DESC(R, i)	    \
+	(&(((union ixgbe_adv_tx_desc *)((R)->desc))[i]))
+#define IXGBEVF_TX_CTXTDESC(R, i)	    \
+	(&(((struct ixgbe_adv_tx_context_desc *)((R)->desc))[i]))
 
 #define IXGBE_MAX_JUMBO_FRAME_SIZE        16128
 
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 8e022c6..c98cdf7 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -195,7 +195,7 @@ static bool ixgbevf_clean_tx_irq(struct ixgbevf_q_vector *q_vector,
 
 	i = tx_ring->next_to_clean;
 	eop = tx_ring->tx_buffer_info[i].next_to_watch;
-	eop_desc = IXGBE_TX_DESC_ADV(*tx_ring, eop);
+	eop_desc = IXGBEVF_TX_DESC(tx_ring, eop);
 
 	while ((eop_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)) &&
 	       (count < tx_ring->count)) {
@@ -206,7 +206,7 @@ static bool ixgbevf_clean_tx_irq(struct ixgbevf_q_vector *q_vector,
 			goto cont_loop;
 		for ( ; !cleaned; count++) {
 			struct sk_buff *skb;
-			tx_desc = IXGBE_TX_DESC_ADV(*tx_ring, i);
+			tx_desc = IXGBEVF_TX_DESC(tx_ring, i);
 			tx_buffer_info = &tx_ring->tx_buffer_info[i];
 			cleaned = (i == eop);
 			skb = tx_buffer_info->skb;
@@ -235,7 +235,7 @@ static bool ixgbevf_clean_tx_irq(struct ixgbevf_q_vector *q_vector,
 
 cont_loop:
 		eop = tx_ring->tx_buffer_info[i].next_to_watch;
-		eop_desc = IXGBE_TX_DESC_ADV(*tx_ring, eop);
+		eop_desc = IXGBEVF_TX_DESC(tx_ring, eop);
 	}
 
 	tx_ring->next_to_clean = i;
@@ -339,7 +339,7 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_adapter *adapter,
 	bi = &rx_ring->rx_buffer_info[i];
 
 	while (cleaned_count--) {
-		rx_desc = IXGBE_RX_DESC_ADV(*rx_ring, i);
+		rx_desc = IXGBEVF_RX_DESC(rx_ring, i);
 		skb = bi->skb;
 		if (!skb) {
 			skb = netdev_alloc_skb(adapter->netdev,
@@ -405,7 +405,7 @@ static bool ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 
 	i = rx_ring->next_to_clean;
-	rx_desc = IXGBE_RX_DESC_ADV(*rx_ring, i);
+	rx_desc = IXGBEVF_RX_DESC(rx_ring, i);
 	staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
 	rx_buffer_info = &rx_ring->rx_buffer_info[i];
 
@@ -432,7 +432,7 @@ static bool ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 		if (i == rx_ring->count)
 			i = 0;
 
-		next_rxd = IXGBE_RX_DESC_ADV(*rx_ring, i);
+		next_rxd = IXGBEVF_RX_DESC(rx_ring, i);
 		prefetch(next_rxd);
 		cleaned_count++;
 
@@ -2437,7 +2437,7 @@ static int ixgbevf_tso(struct ixgbevf_adapter *adapter,
 		i = tx_ring->next_to_use;
 
 		tx_buffer_info = &tx_ring->tx_buffer_info[i];
-		context_desc = IXGBE_TX_CTXTDESC_ADV(*tx_ring, i);
+		context_desc = IXGBEVF_TX_CTXTDESC(tx_ring, i);
 
 		/* VLAN MACLEN IPLEN */
 		if (tx_flags & IXGBE_TX_FLAGS_VLAN)
@@ -2497,7 +2497,7 @@ static bool ixgbevf_tx_csum(struct ixgbevf_adapter *adapter,
 	    (tx_flags & IXGBE_TX_FLAGS_VLAN)) {
 		i = tx_ring->next_to_use;
 		tx_buffer_info = &tx_ring->tx_buffer_info[i];
-		context_desc = IXGBE_TX_CTXTDESC_ADV(*tx_ring, i);
+		context_desc = IXGBEVF_TX_CTXTDESC(tx_ring, i);
 
 		if (tx_flags & IXGBE_TX_FLAGS_VLAN)
 			vlan_macip_lens |= (tx_flags &
@@ -2700,7 +2700,7 @@ static void ixgbevf_tx_queue(struct ixgbevf_adapter *adapter,
 	i = tx_ring->next_to_use;
 	while (count--) {
 		tx_buffer_info = &tx_ring->tx_buffer_info[i];
-		tx_desc = IXGBE_TX_DESC_ADV(*tx_ring, i);
+		tx_desc = IXGBEVF_TX_DESC(tx_ring, i);
 		tx_desc->read.buffer_addr = cpu_to_le64(tx_buffer_info->dma);
 		tx_desc->read.cmd_type_len =
 			cpu_to_le32(cmd_type_len | tx_buffer_info->length);
-- 
1.7.10.4

^ permalink raw reply related

* [net] ixgbevf: fix VF untagging when 802.1 prio is set
From: Jeff Kirsher @ 2012-07-18  2:23 UTC (permalink / raw)
  To: davem; +Cc: Pascal Bouchareine, netdev, gospo, sassmann, Jeff Kirsher

From: Pascal Bouchareine <pascal@gandi.net>

We have had an issue when using ixgbe+ixgbevf and 802.1 VLAN tagging.

When attaching a VLAN to a VF, frames with a 802.1q priority appeared
untagged on the VF hence not reaching the VLAN, where frames with
priority 0 where tagged as expected and seen by the VLAN device.

This seems due to the way ixgbevf is looking up the full tag
(prio+cfi+vlan) against the adapter active_vlans, as a condition to mark
the skb tagged.

Signed-off-by: Pascal Bouchareine <pascal@gandi.net>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 41e3225..c16d32f 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -304,7 +304,7 @@ static void ixgbevf_receive_skb(struct ixgbevf_q_vector *q_vector,
 	bool is_vlan = (status & IXGBE_RXD_STAT_VP);
 	u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan);
 
-	if (is_vlan && test_bit(tag, adapter->active_vlans))
+	if (is_vlan && test_bit(tag & VLAN_VID_MASK, adapter->active_vlans))
 		__vlan_hwaccel_put_tag(skb, tag);
 
 	if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL))
-- 
1.7.10.4

^ permalink raw reply related

* Re: [net] ixgbevf: fix VF untagging when 802.1 prio is set
From: Jeff Kirsher @ 2012-07-18  2:31 UTC (permalink / raw)
  To: davem; +Cc: Pascal Bouchareine, netdev, gospo, sassmann
In-Reply-To: <1342578236-20036-1-git-send-email-jeffrey.t.kirsher@intel.com>

[-- Attachment #1: Type: text/plain, Size: 981 bytes --]

On Tue, 2012-07-17 at 19:23 -0700, Jeff Kirsher wrote:
> From: Pascal Bouchareine <pascal@gandi.net>
> 
> We have had an issue when using ixgbe+ixgbevf and 802.1 VLAN tagging.
> 
> When attaching a VLAN to a VF, frames with a 802.1q priority appeared
> untagged on the VF hence not reaching the VLAN, where frames with
> priority 0 where tagged as expected and seen by the VLAN device.
> 
> This seems due to the way ixgbevf is looking up the full tag
> (prio+cfi+vlan) against the adapter active_vlans, as a condition to
> mark
> the skb tagged.
> 
> Signed-off-by: Pascal Bouchareine <pascal@gandi.net>
> Tested-by: Sibai Li <sibai.li@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-) 

Dave-

Disregard, I just read your message about "that's it for net".  I will
push this into my net-next tree.

Cheers,
Jeff

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH v2] b44: add 64 bit stats
From: Eric Dumazet @ 2012-07-18  3:18 UTC (permalink / raw)
  To: Kevin Groeneveld; +Cc: netdev
In-Reply-To: <CABF+-6WmaLjgTTX_ot=QCzsM-tYMJhVk5oRYFAVj66cKG-z31A@mail.gmail.com>

On Tue, 2012-07-17 at 22:02 -0400, Kevin Groeneveld wrote:
> On Tue, Jul 17, 2012 at 2:08 AM, David Miller <davem@davemloft.net> wrote:
> > This patch was corrupted by your email client and is therefore
> > unusable.
> 
> If I resend the patch should I bump the version number in the subject?
> 
It doesnt matter in this case, its a formatting issue

> Should I include "Acked-by" lines that people have posted?
> 

You can if no semantic change is done

> I keep sending myself test messages with the patch but the white space
> is always mangled.  I am not sure if Thunderbird is mangling it in the
> sent message or the received message... :(

Documentation/email-clients.txt

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Thunderbird (GUI)

Thunderbird is an Outlook clone that likes to mangle text, but there are ways
to coerce it into behaving.

- Allows use of an external editor:
  The easiest thing to do with Thunderbird and patches is to use an
  "external editor" extension and then just use your favorite $EDITOR
  for reading/merging patches into the body text.  To do this, download
  and install the extension, then add a button for it using
  View->Toolbars->Customize... and finally just click on it when in the
  Compose dialog.

To beat some sense out of the internal editor, do this:

- Edit your Thunderbird config settings so that it won't use format=flowed.
  Go to "edit->preferences->advanced->config editor" to bring up the
  thunderbird's registry editor, and set "mailnews.send_plaintext_flowed" to
  "false".

- Disable HTML Format: Set "mail.identity.id1.compose_html" to "false".

- Enable "preformat" mode: Set "editor.quotesPreformatted" to "true".

- Enable UTF8: Set "prefs.converted-to-utf8" to "true".

- Install the "toggle wordwrap" extension.  Download the file from:
    https://addons.mozilla.org/thunderbird/addon/2351/
  Then go to "tools->add ons", select "install" at the bottom of the screen,
  and browse to where you saved the .xul file.  This adds an "Enable
  Wordwrap" entry under the Options menu of the message composer.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

^ permalink raw reply

* Re: [PATCH v2] b44: add 64 bit stats
From: Kevin Groeneveld @ 2012-07-18  3:46 UTC (permalink / raw)
  To: netdev; +Cc: Kevin Groeneveld
In-Reply-To: <20120716.230806.242760837075045729.davem@davemloft.net>

From: Kevin Groeneveld <kgroeneveld@gmail.com>

Add support for 64 bit stats to Broadcom b44 ethernet driver.

Signed-off-by: Kevin Groeneveld <kgroeneveld@gmail.com>
---
v2: use u64_stats_fetch_begin_bh/u64_stats_fetch_retry_bh instead of
    u64_stats_fetch_begin/u64_stats_fetch_retry as stats update happens in a
    timer interrupt

 drivers/net/ethernet/broadcom/b44.c |   96 ++++++++++++++++++++---------------
 drivers/net/ethernet/broadcom/b44.h |    3 +-
 2 files changed, 58 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/b44.c b/drivers/net/ethernet/broadcom/b44.c
index d09c6b5..9786c0e 100644
--- a/drivers/net/ethernet/broadcom/b44.c
+++ b/drivers/net/ethernet/broadcom/b44.c
@@ -483,9 +483,11 @@ out:
 static void b44_stats_update(struct b44 *bp)
 {
 	unsigned long reg;
-	u32 *val;
+	u64 *val;
 
 	val = &bp->hw_stats.tx_good_octets;
+	u64_stats_update_begin(&bp->hw_stats.syncp);
+
 	for (reg = B44_TX_GOOD_O; reg <= B44_TX_PAUSE; reg += 4UL) {
 		*val++ += br32(bp, reg);
 	}
@@ -496,6 +498,8 @@ static void b44_stats_update(struct b44 *bp)
 	for (reg = B44_RX_GOOD_O; reg <= B44_RX_NPAUSE; reg += 4UL) {
 		*val++ += br32(bp, reg);
 	}
+
+	u64_stats_update_end(&bp->hw_stats.syncp);
 }
 
 static void b44_link_report(struct b44 *bp)
@@ -1635,44 +1639,49 @@ static int b44_close(struct net_device *dev)
 	return 0;
 }
 
-static struct net_device_stats *b44_get_stats(struct net_device *dev)
+static struct rtnl_link_stats64 *b44_get_stats64(struct net_device *dev,
+					struct rtnl_link_stats64 *nstat)
 {
 	struct b44 *bp = netdev_priv(dev);
-	struct net_device_stats *nstat = &dev->stats;
 	struct b44_hw_stats *hwstat = &bp->hw_stats;
-
-	/* Convert HW stats into netdevice stats. */
-	nstat->rx_packets = hwstat->rx_pkts;
-	nstat->tx_packets = hwstat->tx_pkts;
-	nstat->rx_bytes   = hwstat->rx_octets;
-	nstat->tx_bytes   = hwstat->tx_octets;
-	nstat->tx_errors  = (hwstat->tx_jabber_pkts +
-			     hwstat->tx_oversize_pkts +
-			     hwstat->tx_underruns +
-			     hwstat->tx_excessive_cols +
-			     hwstat->tx_late_cols);
-	nstat->multicast  = hwstat->tx_multicast_pkts;
-	nstat->collisions = hwstat->tx_total_cols;
-
-	nstat->rx_length_errors = (hwstat->rx_oversize_pkts +
-				   hwstat->rx_undersize);
-	nstat->rx_over_errors   = hwstat->rx_missed_pkts;
-	nstat->rx_frame_errors  = hwstat->rx_align_errs;
-	nstat->rx_crc_errors    = hwstat->rx_crc_errs;
-	nstat->rx_errors        = (hwstat->rx_jabber_pkts +
-				   hwstat->rx_oversize_pkts +
-				   hwstat->rx_missed_pkts +
-				   hwstat->rx_crc_align_errs +
-				   hwstat->rx_undersize +
-				   hwstat->rx_crc_errs +
-				   hwstat->rx_align_errs +
-				   hwstat->rx_symbol_errs);
-
-	nstat->tx_aborted_errors = hwstat->tx_underruns;
+	unsigned int start;
+
+	do {
+		start = u64_stats_fetch_begin_bh(&hwstat->syncp);
+
+		/* Convert HW stats into rtnl_link_stats64 stats. */
+		nstat->rx_packets = hwstat->rx_pkts;
+		nstat->tx_packets = hwstat->tx_pkts;
+		nstat->rx_bytes   = hwstat->rx_octets;
+		nstat->tx_bytes   = hwstat->tx_octets;
+		nstat->tx_errors  = (hwstat->tx_jabber_pkts +
+				     hwstat->tx_oversize_pkts +
+				     hwstat->tx_underruns +
+				     hwstat->tx_excessive_cols +
+				     hwstat->tx_late_cols);
+		nstat->multicast  = hwstat->tx_multicast_pkts;
+		nstat->collisions = hwstat->tx_total_cols;
+
+		nstat->rx_length_errors = (hwstat->rx_oversize_pkts +
+					   hwstat->rx_undersize);
+		nstat->rx_over_errors   = hwstat->rx_missed_pkts;
+		nstat->rx_frame_errors  = hwstat->rx_align_errs;
+		nstat->rx_crc_errors    = hwstat->rx_crc_errs;
+		nstat->rx_errors        = (hwstat->rx_jabber_pkts +
+					   hwstat->rx_oversize_pkts +
+					   hwstat->rx_missed_pkts +
+					   hwstat->rx_crc_align_errs +
+					   hwstat->rx_undersize +
+					   hwstat->rx_crc_errs +
+					   hwstat->rx_align_errs +
+					   hwstat->rx_symbol_errs);
+
+		nstat->tx_aborted_errors = hwstat->tx_underruns;
 #if 0
-	/* Carrier lost counter seems to be broken for some devices */
-	nstat->tx_carrier_errors = hwstat->tx_carrier_lost;
+		/* Carrier lost counter seems to be broken for some devices */
+		nstat->tx_carrier_errors = hwstat->tx_carrier_lost;
 #endif
+	} while (u64_stats_fetch_retry_bh(&hwstat->syncp, start));
 
 	return nstat;
 }
@@ -1993,17 +2002,24 @@ static void b44_get_ethtool_stats(struct net_device *dev,
 				  struct ethtool_stats *stats, u64 *data)
 {
 	struct b44 *bp = netdev_priv(dev);
-	u32 *val = &bp->hw_stats.tx_good_octets;
+	struct b44_hw_stats *hwstat = &bp->hw_stats;
+	u64 *data_src, *data_dst;
+	unsigned int start;
 	u32 i;
 
 	spin_lock_irq(&bp->lock);
-
 	b44_stats_update(bp);
+	spin_unlock_irq(&bp->lock);
 
-	for (i = 0; i < ARRAY_SIZE(b44_gstrings); i++)
-		*data++ = *val++;
+	do {
+		data_src = &hwstat->tx_good_octets;
+		data_dst = data;
+		start = u64_stats_fetch_begin_bh(&hwstat->syncp);
 
-	spin_unlock_irq(&bp->lock);
+		for (i = 0; i < ARRAY_SIZE(b44_gstrings); i++)
+			*data_dst++ = *data_src++;
+
+	} while (u64_stats_fetch_retry_bh(&hwstat->syncp, start));
 }
 
 static void b44_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
@@ -2113,7 +2129,7 @@ static const struct net_device_ops b44_netdev_ops = {
 	.ndo_open		= b44_open,
 	.ndo_stop		= b44_close,
 	.ndo_start_xmit		= b44_start_xmit,
-	.ndo_get_stats		= b44_get_stats,
+	.ndo_get_stats64	= b44_get_stats64,
 	.ndo_set_rx_mode	= b44_set_rx_mode,
 	.ndo_set_mac_address	= b44_set_mac_addr,
 	.ndo_validate_addr	= eth_validate_addr,
diff --git a/drivers/net/ethernet/broadcom/b44.h b/drivers/net/ethernet/broadcom/b44.h
index e1905a4..8993d72 100644
--- a/drivers/net/ethernet/broadcom/b44.h
+++ b/drivers/net/ethernet/broadcom/b44.h
@@ -338,9 +338,10 @@ struct ring_info {
  * the layout
  */
 struct b44_hw_stats {
-#define _B44(x)	u32 x;
+#define _B44(x)	u64 x;
 B44_STAT_REG_DECLARE
 #undef _B44
+	struct u64_stats_sync	syncp;
 };
 
 struct ssb_device;
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH 0/5] Long term PMTU/redirect storage in ipv4.
From: Eric Dumazet @ 2012-07-18  3:46 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: David Miller, netdev
In-Reply-To: <alpine.LFD.2.00.1207180358190.2128@ja.ssi.bg>

On Wed, 2012-07-18 at 04:06 +0300, Julian Anastasov wrote:

> 
> 	I created patch with seqlock usage. This version
> is with global seqlock because I'm not sure if 2048 locks
> per NH are good idea. This is only compile tested.
> After comments may be I have to resubmit in separate message.
> 
> 
> Subject: [PATCH] ipv4: use seqlock for nh_exceptions
> 
> From: Julian Anastasov <ja@ssi.bg>
> 
> 	Use global seqlock for the nh_exceptions. Call
> fnhe_oldest with the right hash chain. Correct the diff
> value for dst_set_expires.
> 
> Signed-off-by: Julian Anastasov <ja@ssi.bg>
> ---
>  include/net/ip_fib.h |    2 +-
>  net/ipv4/route.c     |  117 ++++++++++++++++++++++++++++++++------------------
>  2 files changed, 76 insertions(+), 43 deletions(-)
> 

...

> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index f67e702..e037c73 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1334,8 +1334,9 @@ static void ip_rt_build_flow_key(struct flowi4 *fl4, const struct sock *sk,
>  }
>  
>  static DEFINE_SPINLOCK(fnhe_lock);
> +static DEFINE_SEQLOCK(fnhe_seqlock);

Hi Julian

I find this patch too complex.

You could only change fnhe_lock to a seqlock

 net/ipv4/route.c |   35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f67e702..a96fc9d 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1333,7 +1333,7 @@ static void ip_rt_build_flow_key(struct flowi4 *fl4, const struct sock *sk,
 		build_sk_flow_key(fl4, sk);
 }
 
-static DEFINE_SPINLOCK(fnhe_lock);
+static DEFINE_SEQLOCK(fnhe_seqlock);
 
 static struct fib_nh_exception *fnhe_oldest(struct fnhe_hash_bucket *hash, __be32 daddr)
 {
@@ -1454,11 +1454,11 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
 				struct fib_nh *nh = &FIB_RES_NH(res);
 				struct fib_nh_exception *fnhe;
 
-				spin_lock_bh(&fnhe_lock);
+				write_seqlock_bh(&fnhe_seqlock);
 				fnhe = find_or_create_fnhe(nh, fl4->daddr);
 				if (fnhe)
 					fnhe->fnhe_gw = new_gw;
-				spin_unlock_bh(&fnhe_lock);
+				write_sequnlock_bh(&fnhe_seqlock);
 			}
 			rt->rt_gateway = new_gw;
 			rt->rt_flags |= RTCF_REDIRECTED;
@@ -1665,13 +1665,13 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
 		struct fib_nh *nh = &FIB_RES_NH(res);
 		struct fib_nh_exception *fnhe;
 
-		spin_lock_bh(&fnhe_lock);
+		write_seqlock_bh(&fnhe_seqlock);
 		fnhe = find_or_create_fnhe(nh, fl4->daddr);
 		if (fnhe) {
 			fnhe->fnhe_pmtu = mtu;
 			fnhe->fnhe_expires = jiffies + ip_rt_mtu_expires;
 		}
-		spin_unlock_bh(&fnhe_lock);
+		write_sequnlock_bh(&fnhe_seqlock);
 	}
 	rt->rt_pmtu = mtu;
 	dst_set_expires(&rt->dst, ip_rt_mtu_expires);
@@ -1904,18 +1904,29 @@ static void rt_bind_exception(struct rtable *rt, struct fib_nh *nh, __be32 daddr
 
 	for (fnhe = rcu_dereference(hash[hval].chain); fnhe;
 	     fnhe = rcu_dereference(fnhe->fnhe_next)) {
-		if (fnhe->fnhe_daddr == daddr) {
-			if (fnhe->fnhe_pmtu) {
-				unsigned long expires = fnhe->fnhe_expires;
-				unsigned long diff = jiffies - expires;
+		unsigned int seq;
+		__be32 fnhe_daddr, gw;
+		u32 pmtu;
+		unsigned long expires;
+
+		do {
+			seq = read_seqbegin(&fnhe_seqlock);
+			fnhe_daddr = fnhe->fnhe_daddr;
+			gw = fnhe->fnhe_gw;
+			pmtu = fnhe->fnhe_pmtu;
+			expires = fnhe->fnhe_expires;
+		} while (read_seqretry(&fnhe_seqlock, seq));
+		if (fnhe_daddr == daddr) {
+			if (pmtu) {
+				unsigned long diff = expires - jiffies;
 
 				if (time_before(jiffies, expires)) {
-					rt->rt_pmtu = fnhe->fnhe_pmtu;
+					rt->rt_pmtu = pmtu;
 					dst_set_expires(&rt->dst, diff);
 				}
 			}
-			if (fnhe->fnhe_gw)
-				rt->rt_gateway = fnhe->fnhe_gw;
+			if (gw)
+				rt->rt_gateway = gw;
 			fnhe->fnhe_stamp = jiffies;
 			break;
 		}

^ permalink raw reply related

* Re: [PATCH v2] b44: add 64 bit stats
From: Eric Dumazet @ 2012-07-18  3:50 UTC (permalink / raw)
  To: Kevin Groeneveld; +Cc: netdev
In-Reply-To: <1342583161-1184-1-git-send-email-kgroeneveld@gmail.com>

On Tue, 2012-07-17 at 23:46 -0400, Kevin Groeneveld wrote:
> From: Kevin Groeneveld <kgroeneveld@gmail.com>
> 
> Add support for 64 bit stats to Broadcom b44 ethernet driver.
> 
> Signed-off-by: Kevin Groeneveld <kgroeneveld@gmail.com>
> ---
> v2: use u64_stats_fetch_begin_bh/u64_stats_fetch_retry_bh instead of
>     u64_stats_fetch_begin/u64_stats_fetch_retry as stats update happens in a
>     timer interrupt

Seems good this time, thanks

Signed-off-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* net-next and IPv6
From: Eric Dumazet @ 2012-07-18  4:58 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120717.110314.1984735460095688066.davem@davemloft.net>


IPv6 doesnt work anymore for me, at least TCP doesnt work

ssh ::1

ping6 is ok

tcpdump shows garbled source IP and ip-proto

^ permalink raw reply

* Re: [RFC PATCH] net: cgroup: null ptr dereference in netprio cgroup during init
From: John Fastabend @ 2012-07-18  5:50 UTC (permalink / raw)
  To: Gao feng; +Cc: davem, nhorman, mark.d.rustad, netdev, eric.dumazet
In-Reply-To: <5006188B.7060606@cn.fujitsu.com>

On 7/17/2012 6:59 PM, Gao feng wrote:
> 于 2012年07月18日 08:33, John Fastabend 写道:
>> When the netprio cgroup is built in the kernel cgroup_init will call
>> cgrp_create which eventually calls update_netdev_tables. This is
>> being called before do_initcalls() so a null ptr dereference occurs
>> on init_net.
>>

[...]

>
>
> Thanks John.
> It's my mistake.
>
> Can we make sure init_net.count is zero here?
> I can't find some places to initialize it to zero.
>

Its defined in net_namespace.c so it's zeroed by virtue
of being global. And initialized in setup_net via
pure_initcall() always after cgroup_init() if I've done
my accounting correctly.

.John

^ permalink raw reply

* RE: [RFC] r8169 : why SG / TX checksum are default disabled
From: hayeswang @ 2012-07-18  6:45 UTC (permalink / raw)
  To: 'Francois Romieu', 'Eric Dumazet'; +Cc: netdev
In-Reply-To: <20120717234037.GA26972@electric-eye.fr.zoreil.com>

Francois Romieu [mailto:romieu@fr.zoreil.com] 
[...]

> Hayes, should we not add into the kernel driver something similar to
> the rtl8168_start_xmit::skb_checksum_help stuff in Realtek's 
> 8168 driver ?
> There seems to be a bug for (skb->len < 60 && RTL_GIGA_MAC_VER_34.

For RTL8168E-VL (RTL_GIGA_MAC_VER_34), the hardware wouldn't send the packet
with the length less than 60 bytes. The hardware should pad this kind of packet
to 60 bytes, but it wouldn't. Therefore, the software has to pad the packet to
60 bytes. However, the hw checksum would be incorrect for the modified packet,
so the software checksum is necessary. That is, for the packet less than 60
bytes, the software has to pad the packet and calculate the checksum, and the hw
checksum has to be disabled.
 
Best Regards,
Hayes

^ permalink raw reply

* Re: net-next and IPv6
From: Eric Dumazet @ 2012-07-18  7:04 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <1342587513.2626.1518.camel@edumazet-glaptop>

On Wed, 2012-07-18 at 06:58 +0200, Eric Dumazet wrote:
> IPv6 doesnt work anymore for me, at least TCP doesnt work
> 
> ssh ::1
> 
> ping6 is ok
> 
> tcpdump shows garbled source IP and ip-proto
> 
> 

Probable bug coming from

commit 35ad9b9cf7d8a2e6259a0d24022e910adb6f3489
Author: David S. Miller <davem@davemloft.net>
Date:   Mon Jul 16 03:44:56 2012 -0700

    ipv6: Add helper inet6_csk_update_pmtu().
    
    This is the ipv6 version of inet_csk_update_pmtu().
    
    Signed-off-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Re: [PATCH 0/5] Long term PMTU/redirect storage in ipv4.
From: Julian Anastasov @ 2012-07-18  7:28 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1342583166.2626.1367.camel@edumazet-glaptop>


	Hello,

On Wed, 18 Jul 2012, Eric Dumazet wrote:

> Hi Julian
> 
> I find this patch too complex.
> 
> You could only change fnhe_lock to a seqlock

	I'll change it, I was not sure if we are going to
use some array of seqlocks and also without adding locks in
the struct fib_nh.

> -static DEFINE_SPINLOCK(fnhe_lock);
> +static DEFINE_SEQLOCK(fnhe_seqlock);
>  
>  static struct fib_nh_exception *fnhe_oldest(struct fnhe_hash_bucket *hash, __be32 daddr)
>  {
> @@ -1454,11 +1454,11 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
>  				struct fib_nh *nh = &FIB_RES_NH(res);
>  				struct fib_nh_exception *fnhe;
>  
> -				spin_lock_bh(&fnhe_lock);
> +				write_seqlock_bh(&fnhe_seqlock);
>  				fnhe = find_or_create_fnhe(nh, fl4->daddr);
>  				if (fnhe)
>  					fnhe->fnhe_gw = new_gw;
> -				spin_unlock_bh(&fnhe_lock);
> +				write_sequnlock_bh(&fnhe_seqlock);
>  			}
>  			rt->rt_gateway = new_gw;
>  			rt->rt_flags |= RTCF_REDIRECTED;
> @@ -1665,13 +1665,13 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
>  		struct fib_nh *nh = &FIB_RES_NH(res);
>  		struct fib_nh_exception *fnhe;
>  
> -		spin_lock_bh(&fnhe_lock);
> +		write_seqlock_bh(&fnhe_seqlock);
>  		fnhe = find_or_create_fnhe(nh, fl4->daddr);
>  		if (fnhe) {
>  			fnhe->fnhe_pmtu = mtu;
>  			fnhe->fnhe_expires = jiffies + ip_rt_mtu_expires;
>  		}
> -		spin_unlock_bh(&fnhe_lock);
> +		write_sequnlock_bh(&fnhe_seqlock);
>  	}
>  	rt->rt_pmtu = mtu;
>  	dst_set_expires(&rt->dst, ip_rt_mtu_expires);
> @@ -1904,18 +1904,29 @@ static void rt_bind_exception(struct rtable *rt, struct fib_nh *nh, __be32 daddr
>  
>  	for (fnhe = rcu_dereference(hash[hval].chain); fnhe;
>  	     fnhe = rcu_dereference(fnhe->fnhe_next)) {
> -		if (fnhe->fnhe_daddr == daddr) {
> -			if (fnhe->fnhe_pmtu) {
> -				unsigned long expires = fnhe->fnhe_expires;
> -				unsigned long diff = jiffies - expires;
> +		unsigned int seq;
> +		__be32 fnhe_daddr, gw;
> +		u32 pmtu;
> +		unsigned long expires;
> +
> +		do {
> +			seq = read_seqbegin(&fnhe_seqlock);
> +			fnhe_daddr = fnhe->fnhe_daddr;
> +			gw = fnhe->fnhe_gw;
> +			pmtu = fnhe->fnhe_pmtu;
> +			expires = fnhe->fnhe_expires;
> +		} while (read_seqretry(&fnhe_seqlock, seq));

	This is going to read all values in the chain
before reaching daddr? Or may be FNHE_RECLAIM_DEPTH is
small and nobody will increase it. May be I can create
some func that searches daddr in chain instead. Do you still
prefer to remove the first daddr check or it is only
that the code is intended too much?

> +		if (fnhe_daddr == daddr) {

	Also, do we need some rcu locking in
__ip_rt_update_pmtu or may be ipv4_update_pmtu is
called always under rcu lock?

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: net-next and IPv6
From: Eric Dumazet @ 2012-07-18  7:23 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <1342595099.2626.1774.camel@edumazet-glaptop>

On Wed, 2012-07-18 at 09:05 +0200, Eric Dumazet wrote:
> On Wed, 2012-07-18 at 06:58 +0200, Eric Dumazet wrote:
> > IPv6 doesnt work anymore for me, at least TCP doesnt work
> > 
> > ssh ::1
> > 
> > ping6 is ok
> > 
> > tcpdump shows garbled source IP and ip-proto
> > 
> > 
> 
> Probable bug coming from
> 
> commit 35ad9b9cf7d8a2e6259a0d24022e910adb6f3489
> Author: David S. Miller <davem@davemloft.net>
> Date:   Mon Jul 16 03:44:56 2012 -0700
> 
>     ipv6: Add helper inet6_csk_update_pmtu().
>     
>     This is the ipv6 version of inet_csk_update_pmtu().
>     
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> 

OK, I am testing a fix

^ permalink raw reply

* Re: [PATCH 0/5] Long term PMTU/redirect storage in ipv4.
From: Eric Dumazet @ 2012-07-18  7:30 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: David Miller, netdev
In-Reply-To: <alpine.LFD.2.00.1207181014050.1652@ja.ssi.bg>

On Wed, 2012-07-18 at 10:28 +0300, Julian Anastasov wrote:

> 	This is going to read all values in the chain
> before reaching daddr? Or may be FNHE_RECLAIM_DEPTH is
> small and nobody will increase it. May be I can create
> some func that searches daddr in chain instead. Do you still
> prefer to remove the first daddr check or it is only
> that the code is intended too much?
> 

I would not bother, since real cost is the initial cache line miss.
Once you read one field, reading others is really fast.

> > +		if (fnhe_daddr == daddr) {
> 
> 	Also, do we need some rcu locking in
> __ip_rt_update_pmtu or may be ipv4_update_pmtu is
> called always under rcu lock?

Sorry, I dont understand, we use the full lock
write_seqlock_bh(&fnhe_seqlock);/write_sequnlock_bh(&fnhe_seqlock);

^ permalink raw reply

* [PATCH net-next] ipv6: fix inet6_csk_xmit()
From: Eric Dumazet @ 2012-07-18  7:38 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Neal Cardwell, Yuchung Cheng
In-Reply-To: <1342596218.2626.1813.camel@edumazet-glaptop>

From: Eric Dumazet <edumazet@google.com>

We should provide to inet6_csk_route_socket a struct flowi6 pointer,
so that net6_csk_xmit() works correctly instead of sending garbage.

Also add some consts 

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
---
 include/linux/ipv6.h             |    4 +-
 include/net/ip6_route.h          |    3 +-
 net/ipv6/inet6_connection_sock.c |   40 +++++++++++++++--------------
 3 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index bc6c8fd..379e433 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -299,9 +299,9 @@ struct ipv6_pinfo {
 	struct in6_addr 	rcv_saddr;
 	struct in6_addr		daddr;
 	struct in6_pktinfo	sticky_pktinfo;
-	struct in6_addr		*daddr_cache;
+	const struct in6_addr		*daddr_cache;
 #ifdef CONFIG_IPV6_SUBTREES
-	struct in6_addr		*saddr_cache;
+	const struct in6_addr		*saddr_cache;
 #endif
 
 	__be32			flow_label;
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index b6b6f7d..5fa2af0 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -158,7 +158,8 @@ extern void rt6_remove_prefsrc(struct inet6_ifaddr *ifp);
  *	Store a destination cache entry in a socket
  */
 static inline void __ip6_dst_store(struct sock *sk, struct dst_entry *dst,
-				   struct in6_addr *daddr, struct in6_addr *saddr)
+				   const struct in6_addr *daddr,
+				   const struct in6_addr *saddr)
 {
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct rt6_info *rt = (struct rt6_info *) dst;
diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index 4a0c4d2..0251a60 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -171,7 +171,8 @@ EXPORT_SYMBOL_GPL(inet6_csk_addr2sockaddr);
 
 static inline
 void __inet6_csk_dst_store(struct sock *sk, struct dst_entry *dst,
-			   struct in6_addr *daddr, struct in6_addr *saddr)
+			   const struct in6_addr *daddr,
+			   const struct in6_addr *saddr)
 {
 	__ip6_dst_store(sk, dst, daddr, saddr);
 
@@ -203,31 +204,31 @@ struct dst_entry *__inet6_csk_dst_check(struct sock *sk, u32 cookie)
 	return dst;
 }
 
-static struct dst_entry *inet6_csk_route_socket(struct sock *sk)
+static struct dst_entry *inet6_csk_route_socket(struct sock *sk,
+						struct flowi6 *fl6)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct in6_addr *final_p, final;
 	struct dst_entry *dst;
-	struct flowi6 fl6;
 
-	memset(&fl6, 0, sizeof(fl6));
-	fl6.flowi6_proto = sk->sk_protocol;
-	fl6.daddr = np->daddr;
-	fl6.saddr = np->saddr;
-	fl6.flowlabel = np->flow_label;
-	IP6_ECN_flow_xmit(sk, fl6.flowlabel);
-	fl6.flowi6_oif = sk->sk_bound_dev_if;
-	fl6.flowi6_mark = sk->sk_mark;
-	fl6.fl6_sport = inet->inet_sport;
-	fl6.fl6_dport = inet->inet_dport;
-	security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
+	memset(fl6, 0, sizeof(*fl6));
+	fl6->flowi6_proto = sk->sk_protocol;
+	fl6->daddr = np->daddr;
+	fl6->saddr = np->saddr;
+	fl6->flowlabel = np->flow_label;
+	IP6_ECN_flow_xmit(sk, fl6->flowlabel);
+	fl6->flowi6_oif = sk->sk_bound_dev_if;
+	fl6->flowi6_mark = sk->sk_mark;
+	fl6->fl6_sport = inet->inet_sport;
+	fl6->fl6_dport = inet->inet_dport;
+	security_sk_classify_flow(sk, flowi6_to_flowi(fl6));
 
-	final_p = fl6_update_dst(&fl6, np->opt, &final);
+	final_p = fl6_update_dst(fl6, np->opt, &final);
 
 	dst = __inet6_csk_dst_check(sk, np->dst_cookie);
 	if (!dst) {
-		dst = ip6_dst_lookup_flow(sk, &fl6, final_p, false);
+		dst = ip6_dst_lookup_flow(sk, fl6, final_p, false);
 
 		if (!IS_ERR(dst))
 			__inet6_csk_dst_store(sk, dst, NULL, NULL);
@@ -243,7 +244,7 @@ int inet6_csk_xmit(struct sk_buff *skb, struct flowi *fl_unused)
 	struct dst_entry *dst;
 	int res;
 
-	dst = inet6_csk_route_socket(sk);
+	dst = inet6_csk_route_socket(sk, &fl6);
 	if (IS_ERR(dst)) {
 		sk->sk_err_soft = -PTR_ERR(dst);
 		sk->sk_route_caps = 0;
@@ -265,12 +266,13 @@ EXPORT_SYMBOL_GPL(inet6_csk_xmit);
 
 struct dst_entry *inet6_csk_update_pmtu(struct sock *sk, u32 mtu)
 {
-	struct dst_entry *dst = inet6_csk_route_socket(sk);
+	struct flowi6 fl6;
+	struct dst_entry *dst = inet6_csk_route_socket(sk, &fl6);
 
 	if (IS_ERR(dst))
 		return NULL;
 	dst->ops->update_pmtu(dst, sk, NULL, mtu);
 
-	return inet6_csk_route_socket(sk);
+	return inet6_csk_route_socket(sk, &fl6);
 }
 EXPORT_SYMBOL_GPL(inet6_csk_update_pmtu);

^ permalink raw reply related

* Re: [RFC PATCH] net: cgroup: null ptr dereference in netprio cgroup during init
From: Gao feng @ 2012-07-18  7:58 UTC (permalink / raw)
  To: John Fastabend; +Cc: davem, nhorman, mark.d.rustad, netdev, eric.dumazet
In-Reply-To: <50064E95.7020503@intel.com>

于 2012年07月18日 13:50, John Fastabend 写道:
> On 7/17/2012 6:59 PM, Gao feng wrote:
>> 于 2012年07月18日 08:33, John Fastabend 写道:
>>> When the netprio cgroup is built in the kernel cgroup_init will call
>>> cgrp_create which eventually calls update_netdev_tables. This is
>>> being called before do_initcalls() so a null ptr dereference occurs
>>> on init_net.
>>>
> 
> [...]
> 
>>
>>
>> Thanks John.
>> It's my mistake.
>>
>> Can we make sure init_net.count is zero here?
>> I can't find some places to initialize it to zero.
>>
> 
> Its defined in net_namespace.c so it's zeroed by virtue
> of being global. And initialized in setup_net via
> pure_initcall() always after cgroup_init() if I've done
> my accounting correctly.

This looks fine to me.
Thanks John.

^ permalink raw reply

* Re: [PATCH 0/5] Long term PMTU/redirect storage in ipv4.
From: Julian Anastasov @ 2012-07-18  8:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1342596648.2626.1831.camel@edumazet-glaptop>


	Hello,

On Wed, 18 Jul 2012, Eric Dumazet wrote:

> On Wed, 2012-07-18 at 10:28 +0300, Julian Anastasov wrote:
> 
> > 	This is going to read all values in the chain
> > before reaching daddr? Or may be FNHE_RECLAIM_DEPTH is
> > small and nobody will increase it. May be I can create
> > some func that searches daddr in chain instead. Do you still
> > prefer to remove the first daddr check or it is only
> > that the code is intended too much?
> > 
> 
> I would not bother, since real cost is the initial cache line miss.
> Once you read one field, reading others is really fast.

	Is the cost of read_seqbegin a problem? Here is a
2nd version, I still keep this first check for now.

> > > +		if (fnhe_daddr == daddr) {
> > 
> > 	Also, do we need some rcu locking in
> > __ip_rt_update_pmtu or may be ipv4_update_pmtu is
> > called always under rcu lock?
> 
> Sorry, I dont understand, we use the full lock
> write_seqlock_bh(&fnhe_seqlock);/write_sequnlock_bh(&fnhe_seqlock);

	No, it is not related to the fnhe locking, I mean:

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c7a40c3..c911caf 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1686,12 +1686,14 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
 	if (mtu < ip_rt_min_pmtu)
 		mtu = ip_rt_min_pmtu;
 
+	rcu_read_lock();
 	if (fib_lookup(dev_net(rt->dst.dev), fl4, &res) == 0) {
 		struct fib_nh *nh = &FIB_RES_NH(res);
 
 		update_or_create_fnhe(nh, fl4->daddr, 0, mtu,
 				      jiffies + ip_rt_mtu_expires);
 	}
+	rcu_read_unlock();
 	rt->rt_pmtu = mtu;
 	dst_set_expires(&rt->dst, ip_rt_mtu_expires);
 }


	How about the following, is the first daddr check
still a problem?

Subject: [PATCH v2] ipv4: use seqlock for nh_exceptions

From: Julian Anastasov <ja@ssi.bg>

	Use global seqlock for the nh_exceptions. Call
fnhe_oldest with the right hash chain. Correct the diff
value for dst_set_expires.

v2: after suggestions from Eric Dumazet:
* get rid of spin lock fnhe_lock, rearrange update_or_create_fnhe
* continue daddr search in rt_bind_exception

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/net/ip_fib.h |    2 +-
 net/ipv4/route.c     |  119 +++++++++++++++++++++++++++++---------------------
 2 files changed, 70 insertions(+), 51 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index e9ee1ca..2daf096 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -51,7 +51,7 @@ struct fib_nh_exception {
 	struct fib_nh_exception __rcu	*fnhe_next;
 	__be32				fnhe_daddr;
 	u32				fnhe_pmtu;
-	u32				fnhe_gw;
+	__be32				fnhe_gw;
 	unsigned long			fnhe_expires;
 	unsigned long			fnhe_stamp;
 };
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f67e702..1485db1 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1333,9 +1333,9 @@ static void ip_rt_build_flow_key(struct flowi4 *fl4, const struct sock *sk,
 		build_sk_flow_key(fl4, sk);
 }
 
-static DEFINE_SPINLOCK(fnhe_lock);
+static DEFINE_SEQLOCK(fnhe_seqlock);
 
-static struct fib_nh_exception *fnhe_oldest(struct fnhe_hash_bucket *hash, __be32 daddr)
+static struct fib_nh_exception *fnhe_oldest(struct fnhe_hash_bucket *hash)
 {
 	struct fib_nh_exception *fnhe, *oldest;
 
@@ -1358,47 +1358,63 @@ static inline u32 fnhe_hashfun(__be32 daddr)
 	return hval & (FNHE_HASH_SIZE - 1);
 }
 
-static struct fib_nh_exception *find_or_create_fnhe(struct fib_nh *nh, __be32 daddr)
+static void update_or_create_fnhe(struct fib_nh *nh, __be32 daddr, __be32 gw,
+				  u32 pmtu, unsigned long expires)
 {
-	struct fnhe_hash_bucket *hash = nh->nh_exceptions;
+	struct fnhe_hash_bucket *hash;
 	struct fib_nh_exception *fnhe;
 	int depth;
-	u32 hval;
+	u32 hval = fnhe_hashfun(daddr);
+
+	write_seqlock_bh(&fnhe_seqlock);
 
+	hash = nh->nh_exceptions;
 	if (!hash) {
-		hash = nh->nh_exceptions = kzalloc(FNHE_HASH_SIZE * sizeof(*hash),
-						   GFP_ATOMIC);
+		hash = kzalloc(FNHE_HASH_SIZE * sizeof(*hash), GFP_ATOMIC);
 		if (!hash)
-			return NULL;
+			goto out_unlock;
+		nh->nh_exceptions = hash;
 	}
 
-	hval = fnhe_hashfun(daddr);
 	hash += hval;
 
 	depth = 0;
 	for (fnhe = rcu_dereference(hash->chain); fnhe;
 	     fnhe = rcu_dereference(fnhe->fnhe_next)) {
 		if (fnhe->fnhe_daddr == daddr)
-			goto out;
+			break;
 		depth++;
 	}
 
-	if (depth > FNHE_RECLAIM_DEPTH) {
-		fnhe = fnhe_oldest(hash + hval, daddr);
-		goto out_daddr;
+	if (fnhe) {
+		if (gw)
+			fnhe->fnhe_gw = gw;
+		if (pmtu) {
+			fnhe->fnhe_pmtu = pmtu;
+			fnhe->fnhe_expires = expires;
+		}
+	} else {
+		if (depth > FNHE_RECLAIM_DEPTH)
+			fnhe = fnhe_oldest(hash);
+		else {
+			fnhe = kzalloc(sizeof(*fnhe), GFP_ATOMIC);
+			if (!fnhe)
+				goto out_unlock;
+
+			fnhe->fnhe_next = hash->chain;
+			rcu_assign_pointer(hash->chain, fnhe);
+		}
+		fnhe->fnhe_daddr = daddr;
+		fnhe->fnhe_gw = gw;
+		fnhe->fnhe_pmtu = pmtu;
+		fnhe->fnhe_expires = expires;
 	}
-	fnhe = kzalloc(sizeof(*fnhe), GFP_ATOMIC);
-	if (!fnhe)
-		return NULL;
-
-	fnhe->fnhe_next = hash->chain;
-	rcu_assign_pointer(hash->chain, fnhe);
 
-out_daddr:
-	fnhe->fnhe_daddr = daddr;
-out:
 	fnhe->fnhe_stamp = jiffies;
-	return fnhe;
+
+out_unlock:
+	write_sequnlock_bh(&fnhe_seqlock);
+	return;
 }
 
 static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flowi4 *fl4)
@@ -1452,13 +1468,9 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
 		} else {
 			if (fib_lookup(net, fl4, &res) == 0) {
 				struct fib_nh *nh = &FIB_RES_NH(res);
-				struct fib_nh_exception *fnhe;
 
-				spin_lock_bh(&fnhe_lock);
-				fnhe = find_or_create_fnhe(nh, fl4->daddr);
-				if (fnhe)
-					fnhe->fnhe_gw = new_gw;
-				spin_unlock_bh(&fnhe_lock);
+				update_or_create_fnhe(nh, fl4->daddr, new_gw,
+						      0, 0);
 			}
 			rt->rt_gateway = new_gw;
 			rt->rt_flags |= RTCF_REDIRECTED;
@@ -1663,15 +1675,9 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
 
 	if (fib_lookup(dev_net(rt->dst.dev), fl4, &res) == 0) {
 		struct fib_nh *nh = &FIB_RES_NH(res);
-		struct fib_nh_exception *fnhe;
 
-		spin_lock_bh(&fnhe_lock);
-		fnhe = find_or_create_fnhe(nh, fl4->daddr);
-		if (fnhe) {
-			fnhe->fnhe_pmtu = mtu;
-			fnhe->fnhe_expires = jiffies + ip_rt_mtu_expires;
-		}
-		spin_unlock_bh(&fnhe_lock);
+		update_or_create_fnhe(nh, fl4->daddr, 0, mtu,
+				      jiffies + ip_rt_mtu_expires);
 	}
 	rt->rt_pmtu = mtu;
 	dst_set_expires(&rt->dst, ip_rt_mtu_expires);
@@ -1904,21 +1910,34 @@ static void rt_bind_exception(struct rtable *rt, struct fib_nh *nh, __be32 daddr
 
 	for (fnhe = rcu_dereference(hash[hval].chain); fnhe;
 	     fnhe = rcu_dereference(fnhe->fnhe_next)) {
-		if (fnhe->fnhe_daddr == daddr) {
-			if (fnhe->fnhe_pmtu) {
-				unsigned long expires = fnhe->fnhe_expires;
-				unsigned long diff = jiffies - expires;
-
-				if (time_before(jiffies, expires)) {
-					rt->rt_pmtu = fnhe->fnhe_pmtu;
-					dst_set_expires(&rt->dst, diff);
-				}
+		__be32 fnhe_daddr, gw;
+		u32 pmtu;
+		unsigned long expires;
+		unsigned int seq;
+
+		if (fnhe->fnhe_daddr != daddr)
+			continue;
+		do {
+			seq = read_seqbegin(&fnhe_seqlock);
+			fnhe_daddr = fnhe->fnhe_daddr;
+			gw = fnhe->fnhe_gw;
+			pmtu = fnhe->fnhe_pmtu;
+			expires = fnhe->fnhe_expires;
+		} while (read_seqretry(&fnhe_seqlock, seq));
+		if (daddr != fnhe_daddr)
+			continue;
+		if (pmtu) {
+			unsigned long diff = expires - jiffies;
+
+			if (time_before(jiffies, expires)) {
+				rt->rt_pmtu = pmtu;
+				dst_set_expires(&rt->dst, diff);
 			}
-			if (fnhe->fnhe_gw)
-				rt->rt_gateway = fnhe->fnhe_gw;
-			fnhe->fnhe_stamp = jiffies;
-			break;
 		}
+		if (gw)
+			rt->rt_gateway = gw;
+		fnhe->fnhe_stamp = jiffies;
+		break;
 	}
 }
 
-- 
1.7.3.4

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox