Netdev List


* Re: [RFC] bridge: add netfilter hook for forwarding 802.1D group addresses
From: Stephen Hemminger @ 2011-08-19 22:24 UTC
  To: Christian Benvenuti (benve)
  Cc: David Lamparter, Nick Carter, Ed Swierk, netdev, bridge,
	netfilter-devel
In-Reply-To: <184D23435BECB444AB6B9D4630C8EC830258A300@XMB-RCD-303.cisco.com>

On Fri, 19 Aug 2011 17:18:04 -0500
"Christian Benvenuti (benve)" <benve@cisco.com> wrote:

> The patch description and the code are clearly saying that STP is
> an exception, but I am just worried about the users.
> Maybe a proper description in the iptables help is sufficient.
> 
> Users may otherwise try to use this new hook for STP too
> (for example to generate logs or produce statistics/counters
> or divert STP traffic to userspace, etc).

STP traffic already goes to userspace and gets processed
by the LOCAL_IN chain, so I don't think that is needed.


> Out of curiosity, ... if this gets accepted, shouldn't you provide
> NF_BR_LINK_LOCAL_OUT too?
> Or maybe you should call it NF_BR_LINK_LOCAL_FWD instead of
> NF_BR_LINK_LOCAL_IN?

Thanks, that is a better name. I'll change it in the next version.


* RE: [RFC] bridge: add netfilter hook for forwarding 802.1D group addresses
From: Christian Benvenuti (benve) @ 2011-08-19 22:18 UTC
  To: Stephen Hemminger, David Lamparter
  Cc: Nick Carter, Ed Swierk, netdev, bridge, netfilter-devel
In-Reply-To: <20110819135810.1a529ab2@nehalam.ftrdhcpuser.net>

The patch description and the code are clearly saying that STP is
an exception, but I am just worried about the users.
Maybe a proper description in the iptables help is sufficient.

Users may otherwise try to use this new hook for STP too
(for example to generate logs or produce statistics/counters
or divert STP traffic to userspace, etc).

Out of curiosity, ... if this gets accepted, shouldn't you provide
NF_BR_LINK_LOCAL_OUT too?
Or maybe you should call it NF_BR_LINK_LOCAL_FWD instead of
NF_BR_LINK_LOCAL_IN?

/Chris

> -----Original Message-----
> From: netfilter-devel-owner@vger.kernel.org [mailto:netfilter-devel-owner@vger.kernel.org] On Behalf Of Stephen Hemminger
> Sent: Friday, August 19, 2011 1:58 PM
> To: David Lamparter
> Cc: Nick Carter; Ed Swierk; netdev@vger.kernel.org; bridge@linux-foundation.org; netfilter-devel@vger.kernel.org
> Subject: [RFC] bridge: add netfilter hook for forwarding 802.1D group addresses
> 
> The IEEE standard expects that link local multicast packets will not
> be forwarded by a bridge. But there are cases like 802.1X which may
> require that packets be forwarded. For maximum flexibility, implement
> this via netfilter.
> 
> The netfilter chain is slightly different from other chains in that
> if a packet is ACCEPTED by the chain, it means it should be forwarded.
> And if the packet verdict result is DROP, the packet is processed
> as a local packet. The default result for this chain is DROP and
> therefore users who do not install any rules will get the same
> result as before; i.e. packets are only processed on the local host
> and not forwarded.
> 
> Spanning Tree Packets are treated specially and do not
> go through the new chain.
> 
> This code is a conceptual design only. It compiles but
> hasn't been tested.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> ---
>  include/linux/netfilter_bridge.h      |    5 ++++-
>  net/bridge/br_input.c                 |   15 ++++++++++++---
>  net/bridge/netfilter/ebtable_filter.c |   18 ++++++++++++++++--
>  3 files changed, 32 insertions(+), 6 deletions(-)
> 
> --- a/include/linux/netfilter_bridge.h	2011-08-19 13:11:51.972125670 -0700
> +++ b/include/linux/netfilter_bridge.h	2011-08-19 13:13:36.452130443 -0700
> @@ -22,7 +22,10 @@
>  #define NF_BR_POST_ROUTING	4
>  /* Not really a hook, but used for the ebtables broute table */
>  #define NF_BR_BROUTING		5
> -#define NF_BR_NUMHOOKS		6
> +/* Packets to link local multicast addresses (01-80-C2-00-00-XX) */
> +#define NF_BR_LINK_LOCAL_IN	6
> +
> +#define NF_BR_NUMHOOKS		7
> 
>  #ifdef __KERNEL__
> 
> --- a/net/bridge/br_input.c	2011-08-18 16:12:02.576672548 -0700
> +++ b/net/bridge/br_input.c	2011-08-19 13:28:13.696170518 -0700
> @@ -166,10 +166,19 @@ rx_handler_result_t br_handle_frame(stru
>  		if (skb->protocol == htons(ETH_P_PAUSE))
>  			goto drop;
> 
> -		/* If STP is turned off, then forward */
> -		if (p->br->stp_enabled == BR_NO_STP && dest[5] == 0)
> -			goto forward;
> +		/* If this is Spanning Tree Protocol packet */
> +		if (dest[5] == 0) {
> +			/* and STP is turned off, then forward */
> +			if (p->br->stp_enabled == BR_NO_STP)
> +				goto forward;
> +		}
> +		/* Hook to allow forwarding other group MAC addresses */
> +		else if (p->state == BR_STATE_FORWARDING &&
> +			 NF_HOOK(NFPROTO_BRIDGE, NF_BR_LINK_LOCAL_IN, skb, skb->dev,
> +				 NULL, br_handle_frame_finish))
> +			return RX_HANDLER_CONSUMED;	/* forwarded */
> 
> +		/* Packet will go only to the local host. */
>  		if (NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN, skb, skb->dev,
>  			    NULL, br_handle_local_finish)) {
>  			return RX_HANDLER_CONSUMED; /* consumed by filter */
> --- a/net/bridge/netfilter/ebtable_filter.c	2011-08-19 13:14:46.232133631 -0700
> +++ b/net/bridge/netfilter/ebtable_filter.c	2011-08-19 13:27:33.436168679 -0700
> @@ -11,8 +11,10 @@
>  #include <linux/netfilter_bridge/ebtables.h>
>  #include <linux/module.h>
> 
> -#define FILTER_VALID_HOOKS ((1 << NF_BR_LOCAL_IN) | (1 << NF_BR_FORWARD) | \
> -   (1 << NF_BR_LOCAL_OUT))
> +#define FILTER_VALID_HOOKS ((1 << NF_BR_LOCAL_IN) | \
> +			    (1 << NF_BR_FORWARD) | \
> +			    (1 << NF_BR_LOCAL_OUT) | \
> +			    (1 << NF_BR_LINK_LOCAL_IN))
> 
>  static struct ebt_entries initial_chains[] =
>  {
> @@ -28,6 +30,10 @@ static struct ebt_entries initial_chains
>  		.name	= "OUTPUT",
>  		.policy	= EBT_ACCEPT,
>  	},
> +	{
> +		.name	= "LINKLOCAL",
> +		.policy = EBT_DROP,
> +	},
>  };
> 
>  static struct ebt_replace_kernel initial_table =
> @@ -39,6 +45,7 @@ static struct ebt_replace_kernel initial
>  		[NF_BR_LOCAL_IN]	= &initial_chains[0],
>  		[NF_BR_FORWARD]		= &initial_chains[1],
>  		[NF_BR_LOCAL_OUT]	= &initial_chains[2],
> +		[NF_BR_LINK_LOCAL_IN]	= &initial_chains[3],
>  	},
>  	.entries	= (char *)initial_chains,
>  };
> @@ -95,6 +102,13 @@ static struct nf_hook_ops ebt_ops_filter
>  		.hooknum	= NF_BR_LOCAL_OUT,
>  		.priority	= NF_BR_PRI_FILTER_OTHER,
>  	},
> +	{
> +		.hook		= ebt_in_hook,
> +		.owner		= THIS_MODULE,
> +		.pf		= NFPROTO_BRIDGE,
> +		.hooknum	= NF_BR_LINK_LOCAL_IN,
> +		.priority	= NF_BR_PRI_FILTER_BRIDGED,
> +	},
>  };
> 
>  static int __net_init frame_filter_net_init(struct net *net)
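
The forwarding decision described in the RFC above can be summarized in a
small userspace sketch. This is not the kernel code: the enum values,
is_link_local(), classify_link_local() and its boolean arguments are
hypothetical names chosen for illustration, with chain_accepts standing in
for the verdict of the proposed LINKLOCAL chain. The address test follows
the 01-80-C2-00-00-XX comment in the patch, and the PAUSE-frame drop is
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical outcome names; the patch uses goto labels and NF_HOOK(). */
enum ll_action {
	LL_NOT_LINK_LOCAL,	/* ordinary frame, not covered by this sketch */
	LL_FORWARD,		/* bridged like any other frame */
	LL_LOCAL_ONLY,		/* delivered to the local host only */
};

/* Matches 01-80-C2-00-00-XX, following the comment in the patch. */
static bool is_link_local(const uint8_t dest[6])
{
	static const uint8_t prefix[5] = { 0x01, 0x80, 0xc2, 0x00, 0x00 };

	return memcmp(dest, prefix, sizeof(prefix)) == 0;
}

static enum ll_action classify_link_local(const uint8_t dest[6],
					  bool stp_enabled,
					  bool port_forwarding,
					  bool chain_accepts)
{
	assert(dest != NULL);

	if (!is_link_local(dest))
		return LL_NOT_LINK_LOCAL;

	/* STP address: never consults the new chain. */
	if (dest[5] == 0)
		return stp_enabled ? LL_LOCAL_ONLY : LL_FORWARD;

	/* ACCEPT forwards; DROP (the default policy) keeps it local. */
	if (port_forwarding && chain_accepts)
		return LL_FORWARD;

	return LL_LOCAL_ONLY;
}
```

With no rules installed, chain_accepts is always false, so every non-STP
group address stays local, which matches the "same result as before"
statement in the description.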


* [PATCH] atm: br2684: Fix oops due to skb->dev being NULL
From: Daniel Schwierzeck @ 2011-08-19 22:04 UTC
  To: netdev; +Cc: stable, David S . Miller

This oops was already fixed by commit

    27141666b69f535a4d63d7bc6d9e84ee5032f82a

    atm: [br2684] Fix oops due to skb->dev being NULL

    It happens that if a packet arrives in a VC between the call to open it on
    the hardware and the call to change the backend to br2684, br2684_regvcc
    processes the packet and oopses dereferencing skb->dev because it is
    NULL before the call to br2684_push().

but was reintroduced by commit

    b6211ae7f2e56837c6a4849316396d1535606e90

    atm: Use SKB queue and list helpers instead of doing it by-hand.

Signed-off-by: Daniel Schwierzeck <daniel.schwierzeck@googlemail.com>
---
 net/atm/br2684.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/atm/br2684.c b/net/atm/br2684.c
index 52cfd0c..d07223c 100644
--- a/net/atm/br2684.c
+++ b/net/atm/br2684.c
@@ -558,12 +558,13 @@ static int br2684_regvcc(struct atm_vcc *atmvcc, void __user * arg)
 	spin_unlock_irqrestore(&rq->lock, flags);
 
 	skb_queue_walk_safe(&queue, skb, tmp) {
-		struct net_device *dev = skb->dev;
+		struct net_device *dev;
+
+		br2684_push(atmvcc, skb);
+		dev = skb->dev;
 
 		dev->stats.rx_bytes -= skb->len;
 		dev->stats.rx_packets--;
-
-		br2684_push(atmvcc, skb);
 	}
 
 	/* initialize netdev carrier state */
-- 
1.7.6
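
The ordering problem fixed above can be shown with a minimal userspace
sketch. The types and functions below (fake_dev, fake_skb, fake_push,
drain_one) are hypothetical stand-ins, not the ATM code. The stand-in for
br2684_push() is assumed to both assign skb->dev and count the packet,
which is why the loop subtracts the counters again afterwards.

```c
#include <assert.h>
#include <stddef.h>

struct fake_dev {
	long rx_bytes;
	long rx_packets;
};

struct fake_skb {
	struct fake_dev *dev;	/* NULL until the packet has been pushed */
	unsigned int len;
};

/* Stand-in for br2684_push(): assigns the device and counts the packet. */
static void fake_push(struct fake_dev *dev, struct fake_skb *skb)
{
	skb->dev = dev;
	dev->rx_bytes += skb->len;
	dev->rx_packets++;
}

/* Drain one queued packet in the order used by the fixed loop. */
static void drain_one(struct fake_dev *dev, struct fake_skb *skb)
{
	struct fake_dev *d;

	fake_push(dev, skb);	/* must run first: skb->dev is NULL before */
	d = skb->dev;
	assert(d != NULL);

	/* Undo the accounting done by the push. */
	d->rx_bytes -= skb->len;
	d->rx_packets--;
}
```

Reading skb->dev before the push, as the regressed code did, would
dereference the NULL pointer in the two counter updates.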



* [PATCH 13/13] bna: Driver Version changed to 3.0.2.1
From: Rasesh Mody @ 2011-08-19 21:39 UTC
  To: davem, netdev; +Cc: adapter_linux_open_src_team, Rasesh Mody, Gurunatha Karaje
In-Reply-To: <1313789972-22711-1-git-send-email-rmody@brocade.com>

Signed-off-by: Gurunatha Karaje <gkaraje@brocade.com>
Signed-off-by: Rasesh Mody <rmody@brocade.com>
---
 drivers/net/ethernet/brocade/bna/bnad.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad.h b/drivers/net/ethernet/brocade/bna/bnad.h
index 41c984c..7bdde74 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.h
+++ b/drivers/net/ethernet/brocade/bna/bnad.h
@@ -71,7 +71,7 @@ struct bnad_rx_ctrl {
 #define BNAD_NAME			"bna"
 #define BNAD_NAME_LEN			64
 
-#define BNAD_VERSION			"3.0.2.0"
+#define BNAD_VERSION			"3.0.2.1"
 
 #define BNAD_MAILBOX_MSIX_INDEX		0
 #define BNAD_MAILBOX_MSIX_VECTORS	1
-- 
1.7.1



* [PATCH 12/13] bna: SKB PCI UNMAP Fix
From: Rasesh Mody @ 2011-08-19 21:39 UTC
  To: davem, netdev; +Cc: adapter_linux_open_src_team, Rasesh Mody, Gurunatha Karaje
In-Reply-To: <1313789972-22711-1-git-send-email-rmody@brocade.com>

Change details:
 - Fix a leak in sk_buff unmapping of PCI DMA addresses: boundary
   conditions were not properly handled when freeing all Tx buffers.
   Freeing of all Tx buffers now takes into account that an sk_buff's data
   and fragments can be mapped at the boundary.

Signed-off-by: Gurunatha Karaje <gkaraje@brocade.com>
Signed-off-by: Rasesh Mody <rmody@brocade.com>
---
 drivers/net/ethernet/brocade/bna/bnad.c |   39 ++++++++-----------------------
 1 files changed, 10 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c
index 3f597f9..74425f5 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -145,39 +145,20 @@ bnad_free_all_txbufs(struct bnad *bnad,
 	struct bnad_unmap_q *unmap_q = tcb->unmap_q;
 	struct bnad_skb_unmap *unmap_array;
 	struct sk_buff		*skb = NULL;
-	int			i;
+	int			q;
 
 	unmap_array = unmap_q->unmap_array;
 
-	unmap_cons = 0;
-	while (unmap_cons < unmap_q->q_depth) {
-		skb = unmap_array[unmap_cons].skb;
-		if (!skb) {
-			unmap_cons++;
+	for (q = 0; q < unmap_q->q_depth; q++) {
+		skb = unmap_array[q].skb;
+		if (!skb)
 			continue;
-		}
-		unmap_array[unmap_cons].skb = NULL;
-
-		dma_unmap_single(&bnad->pcidev->dev,
-				 dma_unmap_addr(&unmap_array[unmap_cons],
-						dma_addr), skb_headlen(skb),
-						DMA_TO_DEVICE);
 
-		dma_unmap_addr_set(&unmap_array[unmap_cons], dma_addr, 0);
-		if (++unmap_cons >= unmap_q->q_depth)
-			break;
+		unmap_cons = q;
+		BNAD_PCI_UNMAP_SKB(&bnad->pcidev->dev, unmap_array, unmap_cons,
+				   unmap_q->q_depth, skb,
+				   skb_shinfo(skb)->nr_frags);
 
-		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
-			dma_unmap_page(&bnad->pcidev->dev,
-				       dma_unmap_addr(&unmap_array[unmap_cons],
-						      dma_addr),
-				       skb_shinfo(skb)->frags[i].size,
-				       DMA_TO_DEVICE);
-			dma_unmap_addr_set(&unmap_array[unmap_cons], dma_addr,
-					   0);
-			if (++unmap_cons >= unmap_q->q_depth)
-				break;
-		}
 		dev_kfree_skb_any(skb);
 	}
 }
-- 
1.7.1
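
The boundary case mentioned in the change details can be pictured with a
small ring-index sketch. It does not reproduce BNAD_PCI_UNMAP_SKB, whose
definition is not shown in this patch; Q_DEPTH, ring_next() and
unmap_skb_slots() are hypothetical, and the mapped[] array merely stands in
for the DMA addresses held in the unmap array. The assumption is that an
skb occupies one slot for its head plus one slot per fragment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define Q_DEPTH 8u	/* hypothetical depth; must be a power of two */

/* Advance a ring index by one slot, wrapping at the end of the ring. */
static uint32_t ring_next(uint32_t idx)
{
	return (idx + 1) & (Q_DEPTH - 1);
}

/*
 * Release the head slot plus one slot per fragment, starting at 'cons'.
 * Returns the index of the slot after the last one released.
 */
static uint32_t unmap_skb_slots(bool mapped[Q_DEPTH], uint32_t cons,
				uint32_t nr_frags)
{
	uint32_t i;

	assert(cons < Q_DEPTH);
	assert(nr_frags < Q_DEPTH);

	for (i = 0; i <= nr_frags; i++) {
		mapped[cons] = false;	/* dma_unmap_*() would go here */
		cons = ring_next(cons);
	}
	return cons;
}
```

An skb whose head sits in slot 6 of an 8-entry ring with three fragments
uses slots 6, 7, 0 and 1. A loop that stops at the end of the array instead
of wrapping would leave slots 0 and 1 mapped.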



* [PATCH 09/13] bna: Async Mode Tx Rx Init Fix
From: Rasesh Mody @ 2011-08-19 21:39 UTC
  To: davem, netdev; +Cc: adapter_linux_open_src_team, Rasesh Mody, Gurunatha Karaje
In-Reply-To: <1313789972-22711-1-git-send-email-rmody@brocade.com>

Change details:
 - Async mode of Tx/Rx queue initialization in BNAD from a task queue context
   runs into non-unique taskq allocation issues. Get rid of Tx/Rx
   initialization from task queue context.
 - In the attach function, wait for IOC enable, then do Tx/Rx queue
   initialization. If IOC enable from attach fails, default BNA attributes
   are used, with values set to:
   1 TxQ, 1 RxQ, 1 Unicast MAC, 1 RIT entry

Signed-off-by: Gurunatha Karaje <gkaraje@brocade.com>
Signed-off-by: Rasesh Mody <rmody@brocade.com>
---
 drivers/net/ethernet/brocade/bna/bna_enet.c    |   29 ++++++++++++++++++-----
 drivers/net/ethernet/brocade/bna/bna_hw_defs.h |    4 +++
 drivers/net/ethernet/brocade/bna/bna_types.h   |    1 +
 3 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bna_enet.c b/drivers/net/ethernet/brocade/bna/bna_enet.c
index 68a275d..26f5c5a 100644
--- a/drivers/net/ethernet/brocade/bna/bna_enet.c
+++ b/drivers/net/ethernet/brocade/bna/bna_enet.c
@@ -167,13 +167,14 @@ bna_bfi_attr_get_rsp(struct bna_ioceth *ioceth,
 	 * Store only if not set earlier, since BNAD can override the HW
 	 * attributes
 	 */
-	if (!ioceth->attr.num_txq)
+	if (!ioceth->attr.fw_query_complete) {
 		ioceth->attr.num_txq = ntohl(rsp->max_cfg);
-	if (!ioceth->attr.num_rxp)
 		ioceth->attr.num_rxp = ntohl(rsp->max_cfg);
-	ioceth->attr.num_ucmac = ntohl(rsp->max_ucmac);
-	ioceth->attr.num_mcmac = BFI_ENET_MAX_MCAM;
-	ioceth->attr.max_rit_size = ntohl(rsp->rit_size);
+		ioceth->attr.num_ucmac = ntohl(rsp->max_ucmac);
+		ioceth->attr.num_mcmac = BFI_ENET_MAX_MCAM;
+		ioceth->attr.max_rit_size = ntohl(rsp->rit_size);
+		ioceth->attr.fw_query_complete = true;
+	}
 
 	bfa_fsm_send_event(ioceth, IOCETH_E_ENET_ATTR_RESP);
 }
@@ -1693,6 +1694,16 @@ static struct bfa_ioc_cbfn bna_ioceth_cbfn = {
 	bna_cb_ioceth_reset
 };
 
+static void bna_attr_init(struct bna_ioceth *ioceth)
+{
+	ioceth->attr.num_txq = BFI_ENET_DEF_TXQ;
+	ioceth->attr.num_rxp = BFI_ENET_DEF_RXP;
+	ioceth->attr.num_ucmac = BFI_ENET_DEF_UCAM;
+	ioceth->attr.num_mcmac = BFI_ENET_MAX_MCAM;
+	ioceth->attr.max_rit_size = BFI_ENET_DEF_RITSZ;
+	ioceth->attr.fw_query_complete = false;
+}
+
 static void
 bna_ioceth_init(struct bna_ioceth *ioceth, struct bna *bna,
 		struct bna_res_info *res_info)
@@ -1738,6 +1749,8 @@ bna_ioceth_init(struct bna_ioceth *ioceth, struct bna *bna,
 	ioceth->stop_cbfn = NULL;
 	ioceth->stop_cbarg = NULL;
 
+	bna_attr_init(ioceth);
+
 	bfa_fsm_set_state(ioceth, bna_ioceth_sm_stopped);
 }
 
@@ -2036,7 +2049,8 @@ bna_uninit(struct bna *bna)
 int
 bna_num_txq_set(struct bna *bna, int num_txq)
 {
-	if (num_txq > 0 && (num_txq <= bna->ioceth.attr.num_txq)) {
+	if (bna->ioceth.attr.fw_query_complete &&
+		(num_txq <= bna->ioceth.attr.num_txq)) {
 		bna->ioceth.attr.num_txq = num_txq;
 		return BNA_CB_SUCCESS;
 	}
@@ -2047,7 +2061,8 @@ bna_num_txq_set(struct bna *bna, int num_txq)
 int
 bna_num_rxp_set(struct bna *bna, int num_rxp)
 {
-	if (num_rxp > 0 && (num_rxp <= bna->ioceth.attr.num_rxp)) {
+	if (bna->ioceth.attr.fw_query_complete &&
+		(num_rxp <= bna->ioceth.attr.num_rxp)) {
 		bna->ioceth.attr.num_rxp = num_rxp;
 		return BNA_CB_SUCCESS;
 	}
diff --git a/drivers/net/ethernet/brocade/bna/bna_hw_defs.h b/drivers/net/ethernet/brocade/bna/bna_hw_defs.h
index 7ecdca5..dde8a46 100644
--- a/drivers/net/ethernet/brocade/bna/bna_hw_defs.h
+++ b/drivers/net/ethernet/brocade/bna/bna_hw_defs.h
@@ -30,6 +30,10 @@
  * SW imposed limits
  *
  */
+#define BFI_ENET_DEF_TXQ		1
+#define BFI_ENET_DEF_RXP		1
+#define BFI_ENET_DEF_UCAM		1
+#define BFI_ENET_DEF_RITSZ		1
 
 #define BFI_ENET_MAX_MCAM		256
 
diff --git a/drivers/net/ethernet/brocade/bna/bna_types.h b/drivers/net/ethernet/brocade/bna/bna_types.h
index 59417b1..242d799 100644
--- a/drivers/net/ethernet/brocade/bna/bna_types.h
+++ b/drivers/net/ethernet/brocade/bna/bna_types.h
@@ -323,6 +323,7 @@ struct bna_qpt {
 };
 
 struct bna_attr {
+	bool			fw_query_complete;
 	int			num_txq;
 	int			num_rxp;
 	int			num_ucmac;
-- 
1.7.1
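
A reduced model of the fw_query_complete gating may help when reading the
diff above. Only the Tx queue count is modelled; fake_attr, attr_init(),
attr_fw_response() and attr_num_txq_set() are hypothetical names, and the
0 / -1 return values are placeholders for the driver's own status codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEF_TXQ 1	/* plays the role of BFI_ENET_DEF_TXQ */

struct fake_attr {
	bool fw_query_complete;
	int num_txq;
};

/* Start from the defaults, as bna_attr_init() does in the patch. */
static void attr_init(struct fake_attr *a)
{
	assert(a != NULL);
	a->num_txq = DEF_TXQ;
	a->fw_query_complete = false;
}

/* Firmware reply: stored only once, so later overrides are preserved. */
static void attr_fw_response(struct fake_attr *a, int max_cfg)
{
	if (!a->fw_query_complete) {
		a->num_txq = max_cfg;
		a->fw_query_complete = true;
	}
}

/* Override is allowed only after the firmware limit is known. */
static int attr_num_txq_set(struct fake_attr *a, int num_txq)
{
	if (a->fw_query_complete && num_txq <= a->num_txq) {
		a->num_txq = num_txq;
		return 0;
	}
	return -1;
}
```

In this model an override attempted before the firmware reply fails and
leaves the default of one queue in place, which corresponds to the fallback
described in the change details.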



* [PATCH 11/13] bna: Queue Depth and SKB Unmap Array Fix
From: Rasesh Mody @ 2011-08-19 21:39 UTC
  To: davem, netdev; +Cc: adapter_linux_open_src_team, Rasesh Mody, Gurunatha Karaje
In-Reply-To: <1313789972-22711-1-git-send-email-rmody@brocade.com>

Change details:
 - With a Tx ring of 65536, the sk_buff unmap_array grows greater than
   65536 (x2). The index used for accessing it was incorrectly declared as
   u16, so it quickly wraps around and accesses a NULL sk_buff pointer.
   Use u32 for the unmap_array index.
 - Reduce the TXQ depth and the safe (max) acking of Tx events to 32768
   (same as Rx)

Signed-off-by: Gurunatha Karaje <gkaraje@brocade.com>
Signed-off-by: Rasesh Mody <rmody@brocade.com>
---
 drivers/net/ethernet/brocade/bna/bnad.c         |    4 ++--
 drivers/net/ethernet/brocade/bna/bnad.h         |    4 ++++
 drivers/net/ethernet/brocade/bna/bnad_ethtool.c |    8 ++++----
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c
index 28864f6..3f597f9 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -194,8 +194,8 @@ static u32
 bnad_free_txbufs(struct bnad *bnad,
 		 struct bna_tcb *tcb)
 {
-	u32		sent_packets = 0, sent_bytes = 0;
-	u16		wis, unmap_cons, updated_hw_cons;
+	u32		unmap_cons, sent_packets = 0, sent_bytes = 0;
+	u16		wis, updated_hw_cons;
 	struct bnad_unmap_q *unmap_q = tcb->unmap_q;
 	struct bnad_skb_unmap *unmap_array;
 	struct sk_buff		*skb;
diff --git a/drivers/net/ethernet/brocade/bna/bnad.h b/drivers/net/ethernet/brocade/bna/bnad.h
index b31b893..41c984c 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.h
+++ b/drivers/net/ethernet/brocade/bna/bnad.h
@@ -86,6 +86,10 @@ struct bnad_rx_ctrl {
 #define BNAD_MAX_Q_DEPTH		0x10000
 #define BNAD_MIN_Q_DEPTH		0x200
 
+#define BNAD_MAX_RXQ_DEPTH		(BNAD_MAX_Q_DEPTH / bnad_rxqs_per_cq)
+/* keeping MAX TX and RX Q depth equal */
+#define BNAD_MAX_TXQ_DEPTH		BNAD_MAX_RXQ_DEPTH
+
 #define BNAD_JUMBO_MTU			9000
 
 #define BNAD_NETIF_WAKE_THRESHOLD	8
diff --git a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
index 1199f01..e85fb2b 100644
--- a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
+++ b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
@@ -407,10 +407,10 @@ bnad_get_ringparam(struct net_device *netdev,
 {
 	struct bnad *bnad = netdev_priv(netdev);
 
-	ringparam->rx_max_pending = BNAD_MAX_Q_DEPTH / bnad_rxqs_per_cq;
+	ringparam->rx_max_pending = BNAD_MAX_RXQ_DEPTH;
 	ringparam->rx_mini_max_pending = 0;
 	ringparam->rx_jumbo_max_pending = 0;
-	ringparam->tx_max_pending = BNAD_MAX_Q_DEPTH;
+	ringparam->tx_max_pending = BNAD_MAX_TXQ_DEPTH;
 
 	ringparam->rx_pending = bnad->rxq_depth;
 	ringparam->rx_mini_max_pending = 0;
@@ -434,13 +434,13 @@ bnad_set_ringparam(struct net_device *netdev,
 	}
 
 	if (ringparam->rx_pending < BNAD_MIN_Q_DEPTH ||
-	    ringparam->rx_pending > BNAD_MAX_Q_DEPTH / bnad_rxqs_per_cq ||
+	    ringparam->rx_pending > BNAD_MAX_RXQ_DEPTH ||
 	    !BNA_POWER_OF_2(ringparam->rx_pending)) {
 		mutex_unlock(&bnad->conf_mutex);
 		return -EINVAL;
 	}
 	if (ringparam->tx_pending < BNAD_MIN_Q_DEPTH ||
-	    ringparam->tx_pending > BNAD_MAX_Q_DEPTH ||
+	    ringparam->tx_pending > BNAD_MAX_TXQ_DEPTH ||
 	    !BNA_POWER_OF_2(ringparam->tx_pending)) {
 		mutex_unlock(&bnad->conf_mutex);
 		return -EINVAL;
-- 
1.7.1
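
The wraparound described in the first change detail is plain integer
truncation, which a few lines of userspace C can show. The macros and
function names are hypothetical; UNMAP_DEPTH assumes that "(x2)" means the
unmap array holds twice as many entries as the Tx ring.

```c
#include <assert.h>
#include <stdint.h>

#define TXQ_DEPTH	0x10000u		/* 65536, as BNAD_MAX_Q_DEPTH */
#define UNMAP_DEPTH	(2u * TXQ_DEPTH)	/* assumed meaning of "(x2)" */

/* Index after 'steps' increments when stored in a 16-bit variable. */
static uint16_t index_u16(uint32_t steps)
{
	return (uint16_t)steps;		/* silently wraps modulo 65536 */
}

/* Same walk with a 32-bit variable, wrapping at the real array size. */
static uint32_t index_u32(uint32_t steps)
{
	uint32_t idx = steps % UNMAP_DEPTH;

	assert(idx < UNMAP_DEPTH);
	return idx;
}
```

After 70000 increments the 16-bit index reads 4464 while the 32-bit index
reads 70000, so the u16 version lands on an entry that belongs to a
different (possibly empty) slot.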



* [PATCH 10/13] bna: MBOX IRQ Flag Check after Locking
From: Rasesh Mody @ 2011-08-19 21:39 UTC
  To: davem, netdev; +Cc: adapter_linux_open_src_team, Rasesh Mody, Gurunatha Karaje
In-Reply-To: <1313789972-22711-1-git-send-email-rmody@brocade.com>

Change details:
 - Check the BNAD_RF_MBOX_IRQ_DISABLED flag after acquiring the bna_lock.

Signed-off-by: Gurunatha Karaje <gkaraje@brocade.com>
Signed-off-by: Rasesh Mody <rmody@brocade.com>
---
 drivers/net/ethernet/brocade/bna/bnad.c |   18 +++++++++++-------
 1 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c
index bfed285..28864f6 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -594,10 +594,11 @@ bnad_msix_mbox_handler(int irq, void *data)
 	unsigned long flags;
 	struct bnad *bnad = (struct bnad *)data;
 
-	if (unlikely(test_bit(BNAD_RF_MBOX_IRQ_DISABLED, &bnad->run_flags)))
-		return IRQ_HANDLED;
-
 	spin_lock_irqsave(&bnad->bna_lock, flags);
+	if (unlikely(test_bit(BNAD_RF_MBOX_IRQ_DISABLED, &bnad->run_flags))) {
+		spin_unlock_irqrestore(&bnad->bna_lock, flags);
+		return IRQ_HANDLED;
+	}
 
 	bna_intr_status_get(&bnad->bna, intr_status);
 
@@ -620,15 +621,18 @@ bnad_isr(int irq, void *data)
 	struct bnad_rx_ctrl *rx_ctrl;
 	struct bna_tcb *tcb = NULL;
 
-	if (unlikely(test_bit(BNAD_RF_MBOX_IRQ_DISABLED, &bnad->run_flags)))
+	spin_lock_irqsave(&bnad->bna_lock, flags);
+	if (unlikely(test_bit(BNAD_RF_MBOX_IRQ_DISABLED, &bnad->run_flags))) {
+		spin_unlock_irqrestore(&bnad->bna_lock, flags);
 		return IRQ_NONE;
+	}
 
 	bna_intr_status_get(&bnad->bna, intr_status);
 
-	if (unlikely(!intr_status))
+	if (unlikely(!intr_status)) {
+		spin_unlock_irqrestore(&bnad->bna_lock, flags);
 		return IRQ_NONE;
-
-	spin_lock_irqsave(&bnad->bna_lock, flags);
+	}
 
 	if (BNA_IS_MBOX_ERR_INTR(&bnad->bna, intr_status))
 		bna_mbox_handler(&bnad->bna, intr_status);
-- 
1.7.1
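
The pattern applied above, testing a state flag only while holding the lock
that protects it, can be sketched in userspace with a pthread mutex. The
structure and functions are hypothetical, and the sketch assumes that the
code which sets the disabled flag takes the same lock, which this patch
does not show.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_bnad {
	pthread_mutex_t lock;		/* stands in for bnad->bna_lock */
	bool mbox_irq_disabled;		/* BNAD_RF_MBOX_IRQ_DISABLED */
	unsigned int handled;		/* interrupts actually processed */
};

/* Returns true if the interrupt was processed, false if it was ignored. */
static bool fake_mbox_handler(struct fake_bnad *b)
{
	bool processed = false;

	assert(b != NULL);
	pthread_mutex_lock(&b->lock);
	/*
	 * Test the flag under the lock so that a concurrent disable
	 * cannot slip in between the test and the processing.
	 */
	if (!b->mbox_irq_disabled) {
		b->handled++;
		processed = true;
	}
	pthread_mutex_unlock(&b->lock);

	return processed;
}

/* Assumed counterpart: the disable path takes the same lock. */
static void fake_mbox_irq_disable(struct fake_bnad *b)
{
	pthread_mutex_lock(&b->lock);
	b->mbox_irq_disabled = true;
	pthread_mutex_unlock(&b->lock);
}
```

Every early return now has to drop the lock first, which is why the patch
adds spin_unlock_irqrestore() calls to the exit paths in bnad_isr().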



* [PATCH 07/13] bna: Initialization and Locking Fix
From: Rasesh Mody @ 2011-08-19 21:39 UTC
  To: davem, netdev; +Cc: adapter_linux_open_src_team, Rasesh Mody, Gurunatha Karaje
In-Reply-To: <1313789972-22711-1-git-send-email-rmody@brocade.com>

Change details:
 - Initialize rx_id to 0 for bnad_cleanup_rx
 - Return -ENOMEM if bna_rx_create fails
 - Count the Rx buffer allocation failures in bnad_alloc_n_post_rxbufs()
 - Remove unnecessary initialization of using_dac to false in bnad_pci_probe
 - Release the lock on error while doing bna_num_txq_set in bnad_pci_probe
 - Release all the locks while doing free_netdev

Signed-off-by: Gurunatha Karaje <gkaraje@brocade.com>
Signed-off-by: Rasesh Mody <rmody@brocade.com>
---
 drivers/net/ethernet/brocade/bna/bna_hw_defs.h |    1 +
 drivers/net/ethernet/brocade/bna/bnad.c        |   15 ++++++++++++---
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bna_hw_defs.h b/drivers/net/ethernet/brocade/bna/bna_hw_defs.h
index 07bb792..7ecdca5 100644
--- a/drivers/net/ethernet/brocade/bna/bna_hw_defs.h
+++ b/drivers/net/ethernet/brocade/bna/bna_hw_defs.h
@@ -99,6 +99,7 @@
 	(_bna)->bits.error_status_bits = (__HFN_INT_ERR_MASK);		\
 	(_bna)->bits.error_mask_bits = (__HFN_INT_ERR_MASK);		\
 	(_bna)->bits.halt_status_bits = __HFN_INT_LL_HALT;		\
+	(_bna)->bits.halt_mask_bits = __HFN_INT_LL_HALT;		\
 }
 
 #define ct2_reg_addr_init(_bna, _pcidev)				\
diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c
index 6ee604e..7cbc88e 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -401,6 +401,7 @@ bnad_alloc_n_post_rxbufs(struct bnad *bnad, struct bna_rcb *rcb)
 						rcb->rxq->buffer_size);
 		if (unlikely(!skb)) {
 			BNAD_UPDATE_CTR(bnad, rxbuf_alloc_failed);
+			rcb->rxq->rxbuf_alloc_failed++;
 			goto finishing;
 		}
 		unmap_array[unmap_prod].skb = skb;
@@ -1892,6 +1893,7 @@ bnad_cleanup_rx(struct bnad *bnad, u32 rx_id)
 	spin_unlock_irqrestore(&bnad->bna_lock, flags);
 
 	rx_info->rx = NULL;
+	rx_info->rx_id = 0;
 
 	bnad_rx_res_free(bnad, res_info);
 }
@@ -1947,8 +1949,10 @@ bnad_setup_rx(struct bnad *bnad, u32 rx_id)
 	rx = bna_rx_create(&bnad->bna, bnad, rx_config, &rx_cbfn, res_info,
 			rx_info);
 	spin_unlock_irqrestore(&bnad->bna_lock, flags);
-	if (!rx)
+	if (!rx) {
+		err = -ENOMEM;
 		goto err_return;
+	}
 	rx_info->rx = rx;
 
 	/*
@@ -3167,7 +3171,7 @@ static int __devinit
 bnad_pci_probe(struct pci_dev *pdev,
 		const struct pci_device_id *pcidev_id)
 {
-	bool	using_dac = false;
+	bool	using_dac;
 	int	err;
 	struct bnad *bnad;
 	struct bna *bna;
@@ -3290,6 +3294,11 @@ bnad_pci_probe(struct pci_dev *pdev,
 			bna_num_rxp_set(bna, BNAD_NUM_RXP + 1))
 			err = -EIO;
 	}
+	spin_unlock_irqrestore(&bnad->bna_lock, flags);
+	if (err)
+		goto disable_ioceth;
+
+	spin_lock_irqsave(&bnad->bna_lock, flags);
 	bna_mod_res_req(&bnad->bna, &bnad->mod_res_info[0]);
 	spin_unlock_irqrestore(&bnad->bna_lock, flags);
 
@@ -3343,9 +3352,9 @@ drv_uninit:
 	bnad_uninit(bnad);
 pci_uninit:
 	bnad_pci_uninit(pdev);
+free_netdev:
 	mutex_unlock(&bnad->conf_mutex);
 	bnad_lock_uninit(bnad);
-free_netdev:
 	free_netdev(netdev);
 	return err;
 }
-- 
1.7.1



* [PATCH 08/13] bna: Ethtool Enhancements and Fix
From: Rasesh Mody @ 2011-08-19 21:39 UTC
  To: davem, netdev; +Cc: adapter_linux_open_src_team, Rasesh Mody, Gurunatha Karaje
In-Reply-To: <1313789972-22711-1-git-send-email-rmody@brocade.com>

Change details:
 - Use available bnad_dim_timer_stop macro in bnad_set_coalesce.
 - Add tx_skb counters and NAPI debug counters to ethtool stats.
 - Add rlb stats strings to bnad_net_stats_strings{} array. rlb_stats field
   was added to struct bfi_enet_stats {} but the corresponding name structure
   array for ethtool was not initialized with the right strings, even though
   the actual name structure array got expanded. This caused a NULL pointer
   dereference and a crash when running ethtool -S <if_name>.
 - While setting the ring parameter restore the rx, vlan configuration and
   set rx mode
 - Indentation fix

Signed-off-by: Gurunatha Karaje <gkaraje@brocade.com>
Signed-off-by: Rasesh Mody <rmody@brocade.com>
---
 drivers/net/ethernet/brocade/bna/bnad.c         |    8 +-
 drivers/net/ethernet/brocade/bna/bnad.h         |   10 ++-
 drivers/net/ethernet/brocade/bna/bnad_ethtool.c |   88 +++++++++++++++++++----
 3 files changed, 84 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c
index 7cbc88e..bfed285 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -2027,7 +2027,7 @@ bnad_rx_coalescing_timeo_set(struct bnad *bnad)
 /*
  * Called with bnad->bna_lock held
  */
-static int
+int
 bnad_mac_addr_set_locked(struct bnad *bnad, u8 *mac_addr)
 {
 	int ret;
@@ -2047,7 +2047,7 @@ bnad_mac_addr_set_locked(struct bnad *bnad, u8 *mac_addr)
 }
 
 /* Should be called with conf_lock held */
-static int
+int
 bnad_enable_default_bcast(struct bnad *bnad)
 {
 	struct bnad_rx_info *rx_info = &bnad->rx_info[0];
@@ -2073,7 +2073,7 @@ bnad_enable_default_bcast(struct bnad *bnad)
 }
 
 /* Called with mutex_lock(&bnad->conf_mutex) held */
-static void
+void
 bnad_restore_vlans(struct bnad *bnad, u32 rx_id)
 {
 	u16 vid;
@@ -2787,7 +2787,7 @@ bnad_get_stats64(struct net_device *netdev, struct rtnl_link_stats64 *stats)
 	return stats;
 }
 
-static void
+void
 bnad_set_rx_mode(struct net_device *netdev)
 {
 	struct bnad *bnad = netdev_priv(netdev);
diff --git a/drivers/net/ethernet/brocade/bna/bnad.h b/drivers/net/ethernet/brocade/bna/bnad.h
index b03e3a9..b31b893 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.h
+++ b/drivers/net/ethernet/brocade/bna/bnad.h
@@ -328,6 +328,12 @@ extern u32		bnad_rxqs_per_cq;
  */
 extern u32 *cna_get_firmware_buf(struct pci_dev *pdev);
 /* Netdev entry point prototypes */
+extern void bnad_set_rx_mode(struct net_device *netdev);
+extern struct net_device_stats *bnad_get_netdev_stats(
+				struct net_device *netdev);
+extern int bnad_mac_addr_set_locked(struct bnad *bnad, u8 *mac_addr);
+extern int bnad_enable_default_bcast(struct bnad *bnad);
+extern void bnad_restore_vlans(struct bnad *bnad, u32 rx_id);
 extern void bnad_set_ethtool_ops(struct net_device *netdev);
 
 /* Configuration & setup */
@@ -366,10 +372,6 @@ extern void bnad_netdev_hwstats_fill(struct bnad *bnad,
 	}							\
 }
 
-#define bnad_dim_timer_running(_bnad)				\
-	(((_bnad)->cfg_flags & BNAD_CF_DIM_ENABLED) &&		\
-	(test_bit(BNAD_RF_DIM_TIMER_RUNNING, &((_bnad)->run_flags))))
-
 /*
  * Stops the DIM timer
  * Called with bnad->bna_lock held
diff --git a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
index 1c19dce..1199f01 100644
--- a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
+++ b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
@@ -75,14 +75,25 @@ static char *bnad_net_stats_strings[BNAD_ETHTOOL_STATS_NUM] = {
 	"tcpcsum_offload",
 	"udpcsum_offload",
 	"csum_help",
-	"csum_help_err",
+	"tx_skb_too_short",
+	"tx_skb_stopping",
+	"tx_skb_max_vectors",
+	"tx_skb_mss_too_long",
+	"tx_skb_tso_too_short",
+	"tx_skb_tso_prepare",
+	"tx_skb_non_tso_too_long",
+	"tx_skb_tcp_hdr",
+	"tx_skb_udp_hdr",
+	"tx_skb_csum_err",
+	"tx_skb_headlen_too_long",
+	"tx_skb_headlen_zero",
+	"tx_skb_frag_zero",
+	"tx_skb_len_mismatch",
 	"hw_stats_updates",
-	"netif_rx_schedule",
-	"netif_rx_complete",
 	"netif_rx_dropped",
 
 	"link_toggle",
-	"cee_up",
+	"cee_toggle",
 
 	"rxp_info_alloc_failed",
 	"mbox_intr_disabled",
@@ -201,6 +212,20 @@ static char *bnad_net_stats_strings[BNAD_ETHTOOL_STATS_NUM] = {
 	"rad_rx_bcast_vlan",
 	"rad_rx_drops",
 
+	"rlb_rad_rx_frames",
+	"rlb_rad_rx_octets",
+	"rlb_rad_rx_vlan_frames",
+	"rlb_rad_rx_ucast",
+	"rlb_rad_rx_ucast_octets",
+	"rlb_rad_rx_ucast_vlan",
+	"rlb_rad_rx_mcast",
+	"rlb_rad_rx_mcast_octets",
+	"rlb_rad_rx_mcast_vlan",
+	"rlb_rad_rx_bcast",
+	"rlb_rad_rx_bcast_octets",
+	"rlb_rad_rx_bcast_vlan",
+	"rlb_rad_rx_drops",
+
 	"fc_rx_ucast_octets",
 	"fc_rx_ucast",
 	"fc_rx_ucast_vlan",
@@ -321,7 +346,6 @@ bnad_set_coalesce(struct net_device *netdev, struct ethtool_coalesce *coalesce)
 {
 	struct bnad *bnad = netdev_priv(netdev);
 	unsigned long flags;
-	int dim_timer_del = 0;
 
 	if (coalesce->rx_coalesce_usecs == 0 ||
 	    coalesce->rx_coalesce_usecs >
@@ -348,14 +372,7 @@ bnad_set_coalesce(struct net_device *netdev, struct ethtool_coalesce *coalesce)
 	} else {
 		if (bnad->cfg_flags & BNAD_CF_DIM_ENABLED) {
 			bnad->cfg_flags &= ~BNAD_CF_DIM_ENABLED;
-			dim_timer_del = bnad_dim_timer_running(bnad);
-			if (dim_timer_del) {
-				clear_bit(BNAD_RF_DIM_TIMER_RUNNING,
-							&bnad->run_flags);
-				spin_unlock_irqrestore(&bnad->bna_lock, flags);
-				del_timer_sync(&bnad->dim_timer);
-				spin_lock_irqsave(&bnad->bna_lock, flags);
-			}
+			bnad_dim_timer_stop(bnad, flags);
 			bnad_rx_coalescing_timeo_set(bnad);
 		}
 	}
@@ -407,6 +424,7 @@ bnad_set_ringparam(struct net_device *netdev,
 {
 	int i, current_err, err = 0;
 	struct bnad *bnad = netdev_priv(netdev);
+	unsigned long flags;
 
 	mutex_lock(&bnad->conf_mutex);
 	if (ringparam->rx_pending == bnad->rxq_depth &&
@@ -430,6 +448,11 @@ bnad_set_ringparam(struct net_device *netdev,
 
 	if (ringparam->rx_pending != bnad->rxq_depth) {
 		bnad->rxq_depth = ringparam->rx_pending;
+		if (!netif_running(netdev)) {
+			mutex_unlock(&bnad->conf_mutex);
+			return 0;
+		}
+
 		for (i = 0; i < bnad->num_rx; i++) {
 			if (!bnad->rx_info[i].rx)
 				continue;
@@ -437,10 +460,26 @@ bnad_set_ringparam(struct net_device *netdev,
 			current_err = bnad_setup_rx(bnad, i);
 			if (current_err && !err)
 				err = current_err;
+			if (!err)
+				bnad_restore_vlans(bnad, i);
+		}
+
+		if (!err && bnad->rx_info[0].rx) {
+			/* restore rx configuration */
+			bnad_enable_default_bcast(bnad);
+			spin_lock_irqsave(&bnad->bna_lock, flags);
+			bnad_mac_addr_set_locked(bnad, netdev->dev_addr);
+			spin_unlock_irqrestore(&bnad->bna_lock, flags);
+			bnad_set_rx_mode(netdev);
 		}
 	}
 	if (ringparam->tx_pending != bnad->txq_depth) {
 		bnad->txq_depth = ringparam->tx_pending;
+		if (!netif_running(netdev)) {
+			mutex_unlock(&bnad->conf_mutex);
+			return 0;
+		}
+
 		for (i = 0; i < bnad->num_tx; i++) {
 			if (!bnad->tx_info[i].tx)
 				continue;
@@ -578,6 +617,16 @@ bnad_get_strings(struct net_device *netdev, u32 stringset, u8 * string)
 				sprintf(string, "cq%d_hw_producer_index",
 					q_num);
 				string += ETH_GSTRING_LEN;
+				sprintf(string, "cq%d_intr", q_num);
+				string += ETH_GSTRING_LEN;
+				sprintf(string, "cq%d_poll", q_num);
+				string += ETH_GSTRING_LEN;
+				sprintf(string, "cq%d_schedule", q_num);
+				string += ETH_GSTRING_LEN;
+				sprintf(string, "cq%d_keep_poll", q_num);
+				string += ETH_GSTRING_LEN;
+				sprintf(string, "cq%d_complete", q_num);
+				string += ETH_GSTRING_LEN;
 				q_num++;
 			}
 		}
@@ -660,7 +709,7 @@ static int
 bnad_get_stats_count_locked(struct net_device *netdev)
 {
 	struct bnad *bnad = netdev_priv(netdev);
-	int i, j, count, rxf_active_num = 0, txf_active_num = 0;
+	int i, j, count = 0, rxf_active_num = 0, txf_active_num = 0;
 	u32 bmap;
 
 	bmap = bna_tx_rid_mask(&bnad->bna);
@@ -718,6 +767,17 @@ bnad_per_q_stats_fill(struct bnad *bnad, u64 *buf, int bi)
 				buf[bi++] = 0; /* ccb->consumer_index */
 				buf[bi++] = *(bnad->rx_info[i].rx_ctrl[j].
 						ccb->hw_producer_index);
+
+				buf[bi++] = bnad->rx_info[i].
+						rx_ctrl[j].rx_intr_ctr;
+				buf[bi++] = bnad->rx_info[i].
+						rx_ctrl[j].rx_poll_ctr;
+				buf[bi++] = bnad->rx_info[i].
+						rx_ctrl[j].rx_schedule;
+				buf[bi++] = bnad->rx_info[i].
+						rx_ctrl[j].rx_keep_poll;
+				buf[bi++] = bnad->rx_info[i].
+						rx_ctrl[j].rx_complete;
 			}
 	}
 	for (i = 0; i < bnad->num_rx; i++) {
-- 
1.7.1



* [PATCH 06/13] bna: Formatting and Code Cleanup
From: Rasesh Mody @ 2011-08-19 21:39 UTC (permalink / raw)
  To: davem, netdev; +Cc: adapter_linux_open_src_team, Rasesh Mody, Gurunatha Karaje
In-Reply-To: <1313789972-22711-1-git-send-email-rmody@brocade.com>

Change details:
 - Print log messages when running with a reduced number of MSI-X vectors
   and when defaulting to INTx mode.
 - Remove BUG_ONs and header file inclusions that are not needed.
 - Add and clean up comments.
 - Remove unused code.
 - Add a newline to the message printed by bfa_sm_fault.
 - Fix formatting.

Signed-off-by: Gurunatha Karaje <gkaraje@brocade.com>
Signed-off-by: Rasesh Mody <rmody@brocade.com>
---
 drivers/net/ethernet/brocade/bna/bfa_cee.c         |    2 -
 .../net/ethernet/brocade/bna/bfa_defs_mfg_comm.h   |    1 -
 drivers/net/ethernet/brocade/bna/bfi.h             |   46 --------------------
 drivers/net/ethernet/brocade/bna/bna.h             |   18 +++-----
 drivers/net/ethernet/brocade/bna/bna_types.h       |    1 -
 drivers/net/ethernet/brocade/bna/bnad.c            |   46 ++++++--------------
 drivers/net/ethernet/brocade/bna/bnad.h            |   13 +++---
 drivers/net/ethernet/brocade/bna/cna.h             |   11 ++---
 8 files changed, 31 insertions(+), 107 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bfa_cee.c b/drivers/net/ethernet/brocade/bna/bfa_cee.c
index b45b8eb..8e62718 100644
--- a/drivers/net/ethernet/brocade/bna/bfa_cee.c
+++ b/drivers/net/ethernet/brocade/bna/bfa_cee.c
@@ -16,8 +16,6 @@
  * www.brocade.com
  */
 
-#include "bfa_defs_cna.h"
-#include "cna.h"
 #include "bfa_cee.h"
 #include "bfi_cna.h"
 #include "bfa_ioc.h"
diff --git a/drivers/net/ethernet/brocade/bna/bfa_defs_mfg_comm.h b/drivers/net/ethernet/brocade/bna/bfa_defs_mfg_comm.h
index 7ddd16f..7e5df90 100644
--- a/drivers/net/ethernet/brocade/bna/bfa_defs_mfg_comm.h
+++ b/drivers/net/ethernet/brocade/bna/bfa_defs_mfg_comm.h
@@ -18,7 +18,6 @@
 #ifndef __BFA_DEFS_MFG_COMM_H__
 #define __BFA_DEFS_MFG_COMM_H__
 
-#include "cna.h"
 #include "bfa_defs.h"
 
 /**
diff --git a/drivers/net/ethernet/brocade/bna/bfi.h b/drivers/net/ethernet/brocade/bna/bfi.h
index 19654cc..4e04c14 100644
--- a/drivers/net/ethernet/brocade/bna/bfi.h
+++ b/drivers/net/ethernet/brocade/bna/bfi.h
@@ -73,20 +73,6 @@ struct bfi_mhdr {
  ****************************************************************************
  */
 
-#define BFI_SGE_INLINE	1
-#define BFI_SGE_INLINE_MAX	(BFI_SGE_INLINE + 1)
-
-/**
- * SG Flags
- */
-enum {
-	BFI_SGE_DATA		= 0,	/*!< data address, not last	     */
-	BFI_SGE_DATA_CPL	= 1,	/*!< data addr, last in current page */
-	BFI_SGE_DATA_LAST	= 3,	/*!< data address, last		     */
-	BFI_SGE_LINK		= 2,	/*!< link address		     */
-	BFI_SGE_PGDLEN		= 2,	/*!< cumulative data length for page */
-};
-
 /**
  * DMA addresses
  */
@@ -97,33 +83,6 @@ union bfi_addr_u {
 	} a32;
 };
 
-/**
- * Scatter Gather Element
- */
-struct bfi_sge {
-#ifdef __BIGENDIAN
-	u32	flags:2,
-			rsvd:2,
-			sg_len:28;
-#else
-	u32	sg_len:28,
-			rsvd:2,
-			flags:2;
-#endif
-	union bfi_addr_u sga;
-};
-
-/**
- * Scatter Gather Page
- */
-#define BFI_SGPG_DATA_SGES		7
-#define BFI_SGPG_SGES_MAX		(BFI_SGPG_DATA_SGES + 1)
-#define BFI_SGPG_RSVD_WD_LEN	8
-struct bfi_sgpg {
-	struct bfi_sge sges[BFI_SGPG_SGES_MAX];
-	u32	rsvd[BFI_SGPG_RSVD_WD_LEN];
-};
-
 /*
  * Large Message structure - 128 Bytes size Msgs
  */
@@ -131,11 +90,6 @@ struct bfi_sgpg {
 #define BFI_LMSG_PL_WSZ	\
 			((BFI_LMSG_SZ - sizeof(struct bfi_mhdr)) / 4)
 
-struct bfi_msg {
-	struct bfi_mhdr mhdr;
-	u32	pl[BFI_LMSG_PL_WSZ];
-};
-
 /**
  * Mailbox message structure
  */
diff --git a/drivers/net/ethernet/brocade/bna/bna.h b/drivers/net/ethernet/brocade/bna/bna.h
index 2a587c5..3a6e790 100644
--- a/drivers/net/ethernet/brocade/bna/bna.h
+++ b/drivers/net/ethernet/brocade/bna/bna.h
@@ -10,12 +10,17 @@
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  * General Public License for more details.
  */
+/*
+ * Copyright (c) 2005-2011 Brocade Communications Systems, Inc.
+ * All rights reserved
+ * www.brocade.com
+ */
 #ifndef __BNA_H__
 #define __BNA_H__
 
-#include "bfa_cs.h"
+#include "bfa_defs.h"
 #include "bfa_ioc.h"
-#include "cna.h"
+#include "bfi_enet.h"
 #include "bna_types.h"
 
 extern const u32 bna_napi_dim_vector[][BNA_BIAS_T_MAX];
@@ -395,12 +400,8 @@ void bna_mod_init(struct bna *bna, struct bna_res_info *res_info);
 void bna_uninit(struct bna *bna);
 int bna_num_txq_set(struct bna *bna, int num_txq);
 int bna_num_rxp_set(struct bna *bna, int num_rxp);
-void bna_stats_get(struct bna *bna);
-void bna_get_perm_mac(struct bna *bna, u8 *mac);
 void bna_hw_stats_get(struct bna *bna);
 
-/* APIs for Rx */
-
 /* APIs for RxF */
 struct bna_mac *bna_ucam_mod_mac_get(struct bna_ucam_mod *ucam_mod);
 void bna_ucam_mod_mac_put(struct bna_ucam_mod *ucam_mod,
@@ -521,11 +522,6 @@ bna_rx_mode_set(struct bna_rx *rx, enum bna_rxmode rxmode,
 void bna_rx_vlan_add(struct bna_rx *rx, int vlan_id);
 void bna_rx_vlan_del(struct bna_rx *rx, int vlan_id);
 void bna_rx_vlanfilter_enable(struct bna_rx *rx);
-void bna_rx_hds_enable(struct bna_rx *rx, struct bna_hds_config *hds_config,
-		       void (*cbfn)(struct bnad *, struct bna_rx *));
-void bna_rx_hds_disable(struct bna_rx *rx,
-			void (*cbfn)(struct bnad *, struct bna_rx *));
-
 /**
  * ENET
  */
diff --git a/drivers/net/ethernet/brocade/bna/bna_types.h b/drivers/net/ethernet/brocade/bna/bna_types.h
index 8a6da0c..59417b1 100644
--- a/drivers/net/ethernet/brocade/bna/bna_types.h
+++ b/drivers/net/ethernet/brocade/bna/bna_types.h
@@ -21,7 +21,6 @@
 #include "cna.h"
 #include "bna_hw_defs.h"
 #include "bfa_cee.h"
-#include "bfi_enet.h"
 #include "bfa_msgq.h"
 
 /**
diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c
index b53267f..6ee604e 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -394,10 +394,9 @@ bnad_alloc_n_post_rxbufs(struct bnad *bnad, struct bna_rcb *rcb)
 	BNA_RXQ_QPGE_PTR_GET(unmap_prod, rcb->sw_qpt, rxent, wi_range);
 
 	while (to_alloc--) {
-		if (!wi_range) {
+		if (!wi_range)
 			BNA_RXQ_QPGE_PTR_GET(unmap_prod, rcb->sw_qpt, rxent,
 					     wi_range);
-		}
 		skb = netdev_alloc_skb_ip_align(bnad->netdev,
 						rcb->rxq->buffer_size);
 		if (unlikely(!skb)) {
@@ -559,27 +558,6 @@ next:
 }
 
 static void
-bnad_disable_rx_irq(struct bnad *bnad, struct bna_ccb *ccb)
-{
-	if (unlikely(!test_bit(BNAD_RXQ_STARTED, &ccb->rcb[0]->flags)))
-		return;
-
-	bna_ib_coalescing_timer_set(ccb->i_dbell, 0);
-	bna_ib_ack(ccb->i_dbell, 0);
-}
-
-static void
-bnad_enable_rx_irq(struct bnad *bnad, struct bna_ccb *ccb)
-{
-	unsigned long flags;
-
-	/* Because of polling context */
-	spin_lock_irqsave(&bnad->bna_lock, flags);
-	bnad_enable_rx_irq_unsafe(ccb);
-	spin_unlock_irqrestore(&bnad->bna_lock, flags);
-}
-
-static void
 bnad_netif_rx_schedule_poll(struct bnad *bnad, struct bna_ccb *ccb)
 {
 	struct bnad_rx_ctrl *rx_ctrl = (struct bnad_rx_ctrl *)(ccb->ctrl);
@@ -1679,7 +1657,7 @@ bnad_napi_poll_rx(struct napi_struct *napi, int budget)
 		return rcvd;
 
 poll_exit:
-	napi_complete((napi));
+	napi_complete(napi);
 
 	rx_ctrl->rx_complete++;
 
@@ -2090,15 +2068,13 @@ bnad_enable_default_bcast(struct bnad *bnad)
 	return 0;
 }
 
-/* Called with bnad_conf_lock() held */
+/* Called with mutex_lock(&bnad->conf_mutex) held */
 static void
 bnad_restore_vlans(struct bnad *bnad, u32 rx_id)
 {
 	u16 vid;
 	unsigned long flags;
 
-	BUG_ON(!(VLAN_N_VID == BFI_ENET_VLAN_ID_MAX));
-
 	for_each_set_bit(vid, bnad->active_vlans, VLAN_N_VID) {
 		spin_lock_irqsave(&bnad->bna_lock, flags);
 		bna_rx_vlan_add(bnad->rx_info[rx_id].rx, vid);
@@ -2207,9 +2183,6 @@ bnad_tso_prepare(struct bnad *bnad, struct sk_buff *skb)
 {
 	int err;
 
-	/* SKB_GSO_TCPV4 and SKB_GSO_TCPV6 is defined since 2.6.18. */
-	BUG_ON(!(skb_shinfo(skb)->gso_type == SKB_GSO_TCPV4 ||
-		   skb_shinfo(skb)->gso_type == SKB_GSO_TCPV6));
 	if (skb_header_cloned(skb)) {
 		err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
 		if (err) {
@@ -2236,7 +2209,6 @@ bnad_tso_prepare(struct bnad *bnad, struct sk_buff *skb)
 	} else {
 		struct ipv6hdr *ipv6h = ipv6_hdr(skb);
 
-		BUG_ON(!(skb->protocol == htons(ETH_P_IPV6)));
 		ipv6h->payload_len = 0;
 		tcp_hdr(skb)->check =
 			~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr, 0,
@@ -2387,6 +2359,8 @@ bnad_enable_msix(struct bnad *bnad)
 	ret = pci_enable_msix(bnad->pcidev, bnad->msix_table, bnad->msix_num);
 	if (ret > 0) {
 		/* Not enough MSI-X vectors. */
+		pr_warn("BNA: %d MSI-X vectors allocated < %d requested\n",
+			ret, bnad->msix_num);
 
 		spin_lock_irqsave(&bnad->bna_lock, flags);
 		/* ret = #of vectors that we got */
@@ -2415,6 +2389,7 @@ bnad_enable_msix(struct bnad *bnad)
 	return;
 
 intx_mode:
+	pr_warn("BNA: MSI-X enable failed - operating in INTx mode\n");
 
 	kfree(bnad->msix_table);
 	bnad->msix_table = NULL;
@@ -2565,7 +2540,7 @@ bnad_start_xmit(struct sk_buff *skb, struct net_device *netdev)
 
 	/*
 	 * Takes care of the Tx that is scheduled between clearing the flag
-	 * and the netif_stop_all_queue() call.
+	 * and the netif_tx_stop_all_queues() call.
 	 */
 	BNAD_DROP_AND_RETURN_IF(!test_bit(BNAD_TXQ_TX_STARTED, &tcb->flags),
 				tx_skb_stopping);
@@ -2613,7 +2588,6 @@ bnad_start_xmit(struct sk_buff *skb, struct net_device *netdev)
 
 	txq_prod = tcb->producer_index;
 	BNA_TXQ_QPGE_PTR_GET(txq_prod, tcb->sw_qpt, txqent, wi_range);
-	BUG_ON(!(wi_range <= tcb->q_depth));
 	txqent->hdr.wi.reserved = 0;
 	txqent->hdr.wi.num_vectors = vectors;
 
@@ -2997,6 +2971,12 @@ bnad_netpoll(struct net_device *netdev)
 		bnad_isr(bnad->pcidev->irq, netdev);
 		bna_intx_enable(&bnad->bna, curr_mask);
 	} else {
+		/*
+		 * Tx processing may happen in sending context, so no need
+		 * to explicitly process completions here
+		 */
+
+		/* Rx processing */
 		for (i = 0; i < bnad->num_rx; i++) {
 			rx_info = &bnad->rx_info[i];
 			if (!rx_info->rx)
diff --git a/drivers/net/ethernet/brocade/bna/bnad.h b/drivers/net/ethernet/brocade/bna/bnad.h
index 8a31882..b03e3a9 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.h
+++ b/drivers/net/ethernet/brocade/bna/bnad.h
@@ -65,8 +65,6 @@ struct bnad_rx_ctrl {
 
 #define BNAD_RXMODE_PROMISC_DEFAULT	BNA_RXMODE_PROMISC
 
-#define BNAD_GET_TX_ID(_skb)	(0)
-
 /*
  * GLOBAL #defines (CONSTANTS)
  */
@@ -152,7 +150,6 @@ struct bnad_drv_stats {
 	u64		tcpcsum_offload;
 	u64		udpcsum_offload;
 	u64		csum_help;
-	u64		csum_help_err;
 	u64		tx_skb_too_short;
 	u64		tx_skb_stopping;
 	u64		tx_skb_max_vectors;
@@ -169,13 +166,10 @@ struct bnad_drv_stats {
 	u64		tx_skb_len_mismatch;
 
 	u64		hw_stats_updates;
-	u64		netif_rx_schedule;
-	u64		netif_rx_complete;
 	u64		netif_rx_dropped;
 
 	u64		link_toggle;
 	u64		cee_toggle;
-	u64		cee_up;
 
 	u64		rxp_info_alloc_failed;
 	u64		mbox_intr_disabled;
@@ -375,6 +369,13 @@ extern void bnad_netdev_hwstats_fill(struct bnad *bnad,
 #define bnad_dim_timer_running(_bnad)				\
 	(((_bnad)->cfg_flags & BNAD_CF_DIM_ENABLED) &&		\
 	(test_bit(BNAD_RF_DIM_TIMER_RUNNING, &((_bnad)->run_flags))))
+
+/*
+ * Stops the DIM timer
+ * Called with bnad->bna_lock held
+ * Implemented as macro, since we want to use
+ * the correct flags(on stack) while unlocking.
+ */
 #define bnad_dim_timer_stop(_bnad, _flags)		\
 do {							\
 	int to_del = 0;					\
diff --git a/drivers/net/ethernet/brocade/bna/cna.h b/drivers/net/ethernet/brocade/bna/cna.h
index 50fce15..cb48742 100644
--- a/drivers/net/ethernet/brocade/bna/cna.h
+++ b/drivers/net/ethernet/brocade/bna/cna.h
@@ -21,21 +21,18 @@
 
 #include <linux/kernel.h>
 #include <linux/types.h>
+#include <linux/mutex.h>
 #include <linux/pci.h>
 #include <linux/delay.h>
 #include <linux/bitops.h>
 #include <linux/timer.h>
 #include <linux/interrupt.h>
+#include <linux/if_vlan.h>
 #include <linux/if_ether.h>
-#include <asm/page.h>
-#include <asm/io.h>
-#include <asm/string.h>
-
-#include <linux/list.h>
 
 #define bfa_sm_fault(__event)    do {                            \
-	pr_err("SM Assertion failure: %s: %d: event = %d", __FILE__, __LINE__, \
-		__event); \
+	pr_err("SM Assertion failure: %s: %d: event = %d\n",	\
+		 __FILE__, __LINE__, __event);			\
 } while (0)
 
 extern char bfa_version[];
-- 
1.7.1



* [PATCH 05/13] bna: TX Path and RX Path Changes
From: Rasesh Mody @ 2011-08-19 21:39 UTC (permalink / raw)
  To: davem, netdev; +Cc: adapter_linux_open_src_team, Rasesh Mody, Gurunatha Karaje
In-Reply-To: <1313789972-22711-1-git-send-email-rmody@brocade.com>

Change details:
 - Disable and enable interrupts from the same polling context to prevent
   reordering in the Rx path.
 - Add Rx NAPI debug counters.
 - Make the NAPI budget check more generic (rcvd >= budget instead of
   rcvd == budget).
 - Add a macro, bnad_dim_timer_stop, to stop the DIM (Dynamic Interrupt
   Moderation) timer.
 - Handle the reduced MSI-X vectors case in bnad_enable_msix.
 - Replace existing checks with macros and add more checks for illegal skbs
   in the transmit path. Add more tx_skb counters for dropped skbs.
 - Check for single-frame TSO skbs and send them out as non-TSO.
 - Put a memory barrier after bna_txq_prod_indx_doorbell().
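
As a reading aid for the single-frame TSO item, the check in the bnad_start_xmit hunk can be restated as a standalone predicate. This is only a sketch: demo_tso_is_single_frame and its parameters are hypothetical names, with plain integers standing in for skb_is_gso(skb), skb_transport_offset(skb), tcp_hdrlen(skb) and skb->len.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not part of the driver.
 *
 * A GSO skb needs no segmentation when one MSS worth of payload plus
 * the headers up to and including TCP already covers the whole skb:
 *
 *     mss + transport_offset + tcp_hdr_len >= skb_len
 *
 * In that case the patch sends the skb with the plain send opcode and
 * lso_mss = 0 instead of the LSO opcode.
 */
bool demo_tso_is_single_frame(unsigned int mss,
			      unsigned int transport_offset,
			      unsigned int tcp_hdr_len,
			      unsigned int skb_len)
{
	return mss + transport_offset + tcp_hdr_len >= skb_len;
}
```

With an MSS of 1448, a transport offset of 34 and a 20-byte TCP header, the predicate holds for any skb of up to 1502 bytes; a longer skb still takes the LSO path.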

Signed-off-by: Gurunatha Karaje <gkaraje@brocade.com>
Signed-off-by: Rasesh Mody <rmody@brocade.com>
---
 drivers/net/ethernet/brocade/bna/bnad.c |  200 ++++++++++++++++++-------------
 drivers/net/ethernet/brocade/bna/bnad.h |   33 +++++-
 2 files changed, 147 insertions(+), 86 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c
index beeffa2..b53267f 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -547,7 +547,8 @@ next:
 	BNA_QE_INDX_ADD(ccb->producer_index, wis, ccb->q_depth);
 
 	if (likely(test_bit(BNAD_RXQ_STARTED, &ccb->rcb[0]->flags)))
-		bna_ib_ack(ccb->i_dbell, packets);
+		bna_ib_ack_disable_irq(ccb->i_dbell, packets);
+
 	bnad_refill_rxq(bnad, ccb->rcb[0]);
 	if (ccb->rcb[1])
 		bnad_refill_rxq(bnad, ccb->rcb[1]);
@@ -585,10 +586,9 @@ bnad_netif_rx_schedule_poll(struct bnad *bnad, struct bna_ccb *ccb)
 	struct napi_struct *napi = &rx_ctrl->napi;
 
 	if (likely(napi_schedule_prep(napi))) {
-		bnad_disable_rx_irq(bnad, ccb);
 		__napi_schedule(napi);
+		rx_ctrl->rx_schedule++;
 	}
-	BNAD_UPDATE_CTR(bnad, netif_rx_schedule);
 }
 
 /* MSIX Rx Path Handler */
@@ -597,8 +597,10 @@ bnad_msix_rx(int irq, void *data)
 {
 	struct bna_ccb *ccb = (struct bna_ccb *)data;
 
-	if (ccb)
+	if (ccb) {
+		((struct bnad_rx_ctrl *)(ccb->ctrl))->rx_intr_ctr++;
 		bnad_netif_rx_schedule_poll(ccb->bnad, ccb);
+	}
 
 	return IRQ_HANDLED;
 }
@@ -1667,22 +1669,23 @@ bnad_napi_poll_rx(struct napi_struct *napi, int budget)
 	struct bnad *bnad = rx_ctrl->bnad;
 	int rcvd = 0;
 
+	rx_ctrl->rx_poll_ctr++;
 
 	if (!netif_carrier_ok(bnad->netdev))
 		goto poll_exit;
 
 	rcvd = bnad_poll_cq(bnad, rx_ctrl->ccb, budget);
-	if (rcvd == budget)
+	if (rcvd >= budget)
 		return rcvd;
 
 poll_exit:
 	napi_complete((napi));
 
-	BNAD_UPDATE_CTR(bnad, netif_rx_complete);
-
+	rx_ctrl->rx_complete++;
 
 	if (rx_ctrl->ccb)
-		bnad_enable_rx_irq(bnad, rx_ctrl->ccb);
+		bnad_enable_rx_irq_unsafe(rx_ctrl->ccb);
+
 	return rcvd;
 }
 
@@ -1886,20 +1889,14 @@ bnad_cleanup_rx(struct bnad *bnad, u32 rx_id)
 	struct bna_rx_config *rx_config = &bnad->rx_config[rx_id];
 	struct bna_res_info *res_info = &bnad->rx_res_info[rx_id].res_info[0];
 	unsigned long flags;
-	int dim_timer_del = 0;
 
 	if (!rx_info->rx)
 		return;
 
-	if (0 == rx_id) {
-		spin_lock_irqsave(&bnad->bna_lock, flags);
-		dim_timer_del = bnad_dim_timer_running(bnad);
-		if (dim_timer_del)
-			clear_bit(BNAD_RF_DIM_TIMER_RUNNING, &bnad->run_flags);
-		spin_unlock_irqrestore(&bnad->bna_lock, flags);
-		if (dim_timer_del)
-			del_timer_sync(&bnad->dim_timer);
-	}
+	spin_lock_irqsave(&bnad->bna_lock, flags);
+	if (0 == rx_id)
+		bnad_dim_timer_stop(bnad, flags);
+	spin_unlock_irqrestore(&bnad->bna_lock, flags);
 
 	init_completion(&bnad->bnad_completions.rx_comp);
 	spin_lock_irqsave(&bnad->bna_lock, flags);
@@ -2393,12 +2390,11 @@ bnad_enable_msix(struct bnad *bnad)
 
 		spin_lock_irqsave(&bnad->bna_lock, flags);
 		/* ret = #of vectors that we got */
-		bnad_q_num_adjust(bnad, ret, 0);
+		bnad_q_num_adjust(bnad, (ret - BNAD_MAILBOX_MSIX_VECTORS) / 2,
+			(ret - BNAD_MAILBOX_MSIX_VECTORS) / 2);
 		spin_unlock_irqrestore(&bnad->bna_lock, flags);
 
-		bnad->msix_num = (bnad->num_tx * bnad->num_txq_per_tx)
-			+ (bnad->num_rx
-			* bnad->num_rxp_per_rx) +
+		bnad->msix_num = BNAD_NUM_TXQ + BNAD_NUM_RXP +
 			 BNAD_MAILBOX_MSIX_VECTORS;
 
 		if (bnad->msix_num > ret)
@@ -2555,17 +2551,17 @@ bnad_start_xmit(struct sk_buff *skb, struct net_device *netdev)
 	u32		unmap_prod, wis, wis_used, wi_range;
 	u32		vectors, vect_id, i, acked;
 	int			err;
+	unsigned int		len;
 
 	struct bnad_unmap_q *unmap_q = tcb->unmap_q;
 	dma_addr_t		dma_addr;
 	struct bna_txq_entry *txqent;
 	u16	flags;
 
-	if (unlikely
-	    (skb->len <= ETH_HLEN || skb->len > BFI_TX_MAX_DATA_PER_PKT)) {
-		dev_kfree_skb(skb);
-		return NETDEV_TX_OK;
-	}
+	BNAD_DROP_AND_RETURN_IF(skb->len <= ETH_HLEN, tx_skb_too_short);
+	BNAD_DROP_AND_RETURN_IF(skb_headlen(skb) > BFI_TX_MAX_DATA_PER_VECTOR,
+				tx_skb_headlen_too_long);
+	BNAD_DROP_AND_RETURN_IF(skb_headlen(skb) == 0, tx_skb_headlen_zero);
 
 	/*
 	 * Takes care of the Tx that is scheduled between clearing the flag
@@ -2613,8 +2609,6 @@ bnad_start_xmit(struct sk_buff *skb, struct net_device *netdev)
 	}
 
 	unmap_prod = unmap_q->producer_index;
-	wis_used = 1;
-	vect_id = 0;
 	flags = 0;
 
 	txq_prod = tcb->producer_index;
@@ -2622,9 +2616,6 @@ bnad_start_xmit(struct sk_buff *skb, struct net_device *netdev)
 	BUG_ON(!(wi_range <= tcb->q_depth));
 	txqent->hdr.wi.reserved = 0;
 	txqent->hdr.wi.num_vectors = vectors;
-	txqent->hdr.wi.opcode =
-		htons((skb_is_gso(skb) ? BNA_TXQ_WI_SEND_LSO :
-		       BNA_TXQ_WI_SEND));
 
 	if (vlan_tx_tag_present(skb)) {
 		vlan_tag = (u16) vlan_tx_tag_get(skb);
@@ -2639,62 +2630,74 @@ bnad_start_xmit(struct sk_buff *skb, struct net_device *netdev)
 	txqent->hdr.wi.vlan_tag = htons(vlan_tag);
 
 	if (skb_is_gso(skb)) {
-		err = bnad_tso_prepare(bnad, skb);
-		if (err) {
-			dev_kfree_skb(skb);
-			return NETDEV_TX_OK;
+		BNAD_DROP_AND_RETURN_IF(skb_is_gso(skb) > netdev->mtu,
+					tx_skb_mss_too_long);
+		if (unlikely((skb_is_gso(skb) + skb_transport_offset(skb) +
+			      tcp_hdrlen(skb)) >= skb->len)) {
+			txqent->hdr.wi.opcode =
+				__constant_htons(BNA_TXQ_WI_SEND);
+			txqent->hdr.wi.lso_mss = 0;
+			BNAD_UPDATE_CTR(bnad, tx_skb_tso_too_short);
+		} else {
+			txqent->hdr.wi.opcode =
+				__constant_htons(BNA_TXQ_WI_SEND_LSO);
+			txqent->hdr.wi.lso_mss = htons(skb_is_gso(skb));
 		}
-		txqent->hdr.wi.lso_mss = htons(skb_is_gso(skb));
+
+		err = bnad_tso_prepare(bnad, skb);
+		BNAD_DROP_AND_RETURN_IF(err, tx_skb_tso_prepare);
 		flags |= (BNA_TXQ_WI_CF_IP_CKSUM | BNA_TXQ_WI_CF_TCP_CKSUM);
 		txqent->hdr.wi.l4_hdr_size_n_offset =
 			htons(BNA_TXQ_WI_L4_HDR_N_OFFSET
 			      (tcp_hdrlen(skb) >> 2,
 			       skb_transport_offset(skb)));
-	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
-		u8 proto = 0;
-
+	} else {
+		txqent->hdr.wi.opcode =	__constant_htons(BNA_TXQ_WI_SEND);
 		txqent->hdr.wi.lso_mss = 0;
 
-		if (skb->protocol == htons(ETH_P_IP))
-			proto = ip_hdr(skb)->protocol;
-		else if (skb->protocol == htons(ETH_P_IPV6)) {
-			/* nexthdr may not be TCP immediately. */
-			proto = ipv6_hdr(skb)->nexthdr;
-		}
-		if (proto == IPPROTO_TCP) {
-			flags |= BNA_TXQ_WI_CF_TCP_CKSUM;
-			txqent->hdr.wi.l4_hdr_size_n_offset =
-				htons(BNA_TXQ_WI_L4_HDR_N_OFFSET
-				      (0, skb_transport_offset(skb)));
-
-			BNAD_UPDATE_CTR(bnad, tcpcsum_offload);
-
-			BUG_ON(!(skb_headlen(skb) >=
-				skb_transport_offset(skb) + tcp_hdrlen(skb)));
-
-		} else if (proto == IPPROTO_UDP) {
-			flags |= BNA_TXQ_WI_CF_UDP_CKSUM;
-			txqent->hdr.wi.l4_hdr_size_n_offset =
-				htons(BNA_TXQ_WI_L4_HDR_N_OFFSET
-				      (0, skb_transport_offset(skb)));
+		BNAD_DROP_AND_RETURN_IF(skb->len > (netdev->mtu + ETH_HLEN),
+					tx_skb_non_tso_too_long);
 
-			BNAD_UPDATE_CTR(bnad, udpcsum_offload);
+		if (skb->ip_summed == CHECKSUM_PARTIAL) {
+			u8 proto = 0;
 
-			BUG_ON(!(skb_headlen(skb) >=
-				   skb_transport_offset(skb) +
-				   sizeof(struct udphdr)));
-		} else {
-			err = skb_checksum_help(skb);
-			BNAD_UPDATE_CTR(bnad, csum_help);
-			if (err) {
-				dev_kfree_skb(skb);
-				BNAD_UPDATE_CTR(bnad, csum_help_err);
-				return NETDEV_TX_OK;
+			if (skb->protocol == __constant_htons(ETH_P_IP))
+				proto = ip_hdr(skb)->protocol;
+			else if (skb->protocol ==
+				 __constant_htons(ETH_P_IPV6)) {
+				/* nexthdr may not be TCP immediately. */
+				proto = ipv6_hdr(skb)->nexthdr;
+			}
+			if (proto == IPPROTO_TCP) {
+				flags |= BNA_TXQ_WI_CF_TCP_CKSUM;
+				txqent->hdr.wi.l4_hdr_size_n_offset =
+					htons(BNA_TXQ_WI_L4_HDR_N_OFFSET
+					      (0, skb_transport_offset(skb)));
+
+				BNAD_UPDATE_CTR(bnad, tcpcsum_offload);
+
+				BNAD_DROP_AND_RETURN_IF(skb_headlen(skb) <
+					    skb_transport_offset(skb) +
+					    tcp_hdrlen(skb), tx_skb_tcp_hdr);
+
+			} else if (proto == IPPROTO_UDP) {
+				flags |= BNA_TXQ_WI_CF_UDP_CKSUM;
+				txqent->hdr.wi.l4_hdr_size_n_offset =
+					htons(BNA_TXQ_WI_L4_HDR_N_OFFSET
+					      (0, skb_transport_offset(skb)));
+
+				BNAD_UPDATE_CTR(bnad, udpcsum_offload);
+				BNAD_DROP_AND_RETURN_IF(skb_headlen(skb) <
+					    skb_transport_offset(skb) +
+					    sizeof(struct udphdr),
+							tx_skb_udp_hdr);
+
+			} else {
+				BNAD_DROP_AND_RETURN(tx_skb_csum_err);
 			}
+		} else {
+			txqent->hdr.wi.l4_hdr_size_n_offset = 0;
 		}
-	} else {
-		txqent->hdr.wi.lso_mss = 0;
-		txqent->hdr.wi.l4_hdr_size_n_offset = 0;
 	}
 
 	txqent->hdr.wi.flags = htons(flags);
@@ -2702,20 +2705,36 @@ bnad_start_xmit(struct sk_buff *skb, struct net_device *netdev)
 	txqent->hdr.wi.frame_length = htonl(skb->len);
 
 	unmap_q->unmap_array[unmap_prod].skb = skb;
-	BUG_ON(!(skb_headlen(skb) <= BFI_TX_MAX_DATA_PER_VECTOR));
-	txqent->vector[vect_id].length = htons(skb_headlen(skb));
+	len = skb_headlen(skb);
+	txqent->vector[0].length = htons(len);
 	dma_addr = dma_map_single(&bnad->pcidev->dev, skb->data,
 				  skb_headlen(skb), DMA_TO_DEVICE);
 	dma_unmap_addr_set(&unmap_q->unmap_array[unmap_prod], dma_addr,
 			   dma_addr);
 
-	BNA_SET_DMA_ADDR(dma_addr, &txqent->vector[vect_id].host_addr);
+	BNA_SET_DMA_ADDR(dma_addr, &txqent->vector[0].host_addr);
 	BNA_QE_INDX_ADD(unmap_prod, 1, unmap_q->q_depth);
 
+	vect_id = 0;
+	wis_used = 1;
+
 	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
 		struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[i];
 		u16		size = frag->size;
 
+		if (unlikely(size == 0)) {
+			unmap_prod = unmap_q->producer_index;
+			prefetch(&unmap_q->unmap_array[unmap_prod + 1]);
+
+			BNAD_PCI_UNMAP_SKB(&bnad->pcidev->dev,
+					   unmap_q->unmap_array,
+					   unmap_prod, unmap_q->q_depth, skb,
+					   i);
+			BNAD_DROP_AND_RETURN(tx_skb_frag_zero);
+		}
+
+		len += size;
+
 		if (++vect_id == BFI_TX_MAX_VECTORS_PER_WI) {
 			vect_id = 0;
 			if (--wi_range)
@@ -2726,10 +2745,10 @@ bnad_start_xmit(struct sk_buff *skb, struct net_device *netdev)
 				wis_used = 0;
 				BNA_TXQ_QPGE_PTR_GET(txq_prod, tcb->sw_qpt,
 						     txqent, wi_range);
-				BUG_ON(!(wi_range <= tcb->q_depth));
 			}
 			wis_used++;
-			txqent->hdr.wi_ext.opcode = htons(BNA_TXQ_WI_EXTENSION);
+			txqent->hdr.wi_ext.opcode =
+				__constant_htons(BNA_TXQ_WI_EXTENSION);
 		}
 
 		BUG_ON(!(size <= BFI_TX_MAX_DATA_PER_VECTOR));
@@ -2742,6 +2761,16 @@ bnad_start_xmit(struct sk_buff *skb, struct net_device *netdev)
 		BNA_QE_INDX_ADD(unmap_prod, 1, unmap_q->q_depth);
 	}
 
+	if (unlikely(len != skb->len)) {
+		unmap_prod = unmap_q->producer_index;
+		prefetch(&unmap_q->unmap_array[unmap_prod + 1]);
+
+		BNAD_PCI_UNMAP_SKB(&bnad->pcidev->dev, unmap_q->unmap_array,
+				   unmap_prod, unmap_q->q_depth, skb,
+				   skb_shinfo(skb)->nr_frags);
+		BNAD_DROP_AND_RETURN(tx_skb_len_mismatch);
+	}
+
 	unmap_q->producer_index = unmap_prod;
 	BNA_QE_INDX_ADD(txq_prod, wis_used, tcb->q_depth);
 	tcb->producer_index = txq_prod;
@@ -2752,6 +2781,7 @@ bnad_start_xmit(struct sk_buff *skb, struct net_device *netdev)
 		return NETDEV_TX_OK;
 
 	bna_txq_prod_indx_doorbell(tcb);
+	smp_mb();
 
 	if ((u16) (*tcb->hw_consumer_index) != tcb->consumer_index)
 		tasklet_schedule(&bnad->tx_free_tasklet);
@@ -2818,6 +2848,9 @@ bnad_set_rx_mode(struct net_device *netdev)
 		}
 	}
 
+	if (bnad->rx_info[0].rx == NULL)
+		goto unlock;
+
 	bna_rx_mode_set(bnad->rx_info[0].rx, new_mask, valid_mask, NULL);
 
 	if (!netdev_mc_empty(netdev)) {
@@ -2970,12 +3003,9 @@ bnad_netpoll(struct net_device *netdev)
 				continue;
 			for (j = 0; j < bnad->num_rxp_per_rx; j++) {
 				rx_ctrl = &rx_info->rx_ctrl[j];
-				if (rx_ctrl->ccb) {
-					bnad_disable_rx_irq(bnad,
-							    rx_ctrl->ccb);
+				if (rx_ctrl->ccb)
 					bnad_netif_rx_schedule_poll(bnad,
 							    rx_ctrl->ccb);
-				}
 			}
 		}
 	}
diff --git a/drivers/net/ethernet/brocade/bna/bnad.h b/drivers/net/ethernet/brocade/bna/bnad.h
index c4772e3..8a31882 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.h
+++ b/drivers/net/ethernet/brocade/bna/bnad.h
@@ -56,6 +56,11 @@ struct bnad_rx_ctrl {
 	struct bnad *bnad;
 	unsigned long  flags;
 	struct napi_struct	napi;
+	u64		rx_intr_ctr;
+	u64		rx_poll_ctr;
+	u64		rx_schedule;
+	u64		rx_keep_poll;
+	u64		rx_complete;
 };
 
 #define BNAD_RXMODE_PROMISC_DEFAULT	BNA_RXMODE_PROMISC
@@ -148,8 +153,20 @@ struct bnad_drv_stats {
 	u64		udpcsum_offload;
 	u64		csum_help;
 	u64		csum_help_err;
+	u64		tx_skb_too_short;
 	u64		tx_skb_stopping;
 	u64		tx_skb_max_vectors;
+	u64		tx_skb_mss_too_long;
+	u64		tx_skb_tso_too_short;
+	u64		tx_skb_tso_prepare;
+	u64		tx_skb_non_tso_too_long;
+	u64		tx_skb_tcp_hdr;
+	u64		tx_skb_udp_hdr;
+	u64		tx_skb_csum_err;
+	u64		tx_skb_headlen_too_long;
+	u64		tx_skb_headlen_zero;
+	u64		tx_skb_frag_zero;
+	u64		tx_skb_len_mismatch;
 
 	u64		hw_stats_updates;
 	u64		netif_rx_schedule;
@@ -348,7 +365,7 @@ extern void bnad_netdev_hwstats_fill(struct bnad *bnad,
 
 #define bnad_enable_rx_irq_unsafe(_ccb)			\
 {							\
-	if (likely(test_bit(BNAD_RXQ_STARTED, &ccb->rcb[0]->flags))) {\
+	if (likely(test_bit(BNAD_RXQ_STARTED, &(_ccb)->rcb[0]->flags))) {\
 		bna_ib_coalescing_timer_set((_ccb)->i_dbell,	\
 			(_ccb)->rx_coalescing_timeo);		\
 		bna_ib_ack((_ccb)->i_dbell, 0);			\
@@ -358,5 +375,19 @@ extern void bnad_netdev_hwstats_fill(struct bnad *bnad,
 #define bnad_dim_timer_running(_bnad)				\
 	(((_bnad)->cfg_flags & BNAD_CF_DIM_ENABLED) &&		\
 	(test_bit(BNAD_RF_DIM_TIMER_RUNNING, &((_bnad)->run_flags))))
+#define bnad_dim_timer_stop(_bnad, _flags)		\
+do {							\
+	int to_del = 0;					\
+							\
+	if ((_bnad)->cfg_flags & BNAD_CF_DIM_ENABLED &&	\
+	    test_bit(BNAD_RF_DIM_TIMER_RUNNING, &(_bnad)->run_flags)) {\
+		clear_bit(BNAD_RF_DIM_TIMER_RUNNING, &(_bnad)->run_flags);\
+		to_del = 1;				\
+	}						\
+	spin_unlock_irqrestore(&(_bnad)->bna_lock, (_flags));	\
+	if (to_del)					\
+		del_timer_sync(&(_bnad)->dim_timer);	\
+	spin_lock_irqsave(&(_bnad)->bna_lock, (_flags));\
+} while (0)
 
 #endif /* __BNAD_H__ */
-- 
1.7.1



* [PATCH 04/13] bna: SKB Check and Drop Macros
From: Rasesh Mody @ 2011-08-19 21:39 UTC (permalink / raw)
  To: davem, netdev; +Cc: adapter_linux_open_src_team, Rasesh Mody, Gurunatha Karaje
In-Reply-To: <1313789972-22711-1-git-send-email-rmody@brocade.com>

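Add macros that check a condition in the transmit path and, if it holds,
drop the skb, update a drop counter, and return. Also add
BNAD_PCI_UNMAP_SKB to factor out the DMA unmap loop in bnad_free_txbufs.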
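
For readers new to the pattern, here is a minimal userspace sketch of check-and-drop macros. It is not the driver code: every demo_* name, the struct layouts and the limits are hypothetical, and free() stands in for dev_kfree_skb(). Unlike the macros in this patch, the sketch wraps each macro in do { } while (0), a common convention that keeps a multi-statement macro safe inside an unbraced if/else.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the bna driver's definitions. */
struct demo_stats {
	unsigned long tx_too_short;
	unsigned long tx_too_many_vectors;
};

struct demo_pkt {
	unsigned int len;
	unsigned int vectors;
};

#define DEMO_TX_OK		0
#define DEMO_MIN_LEN		14	/* assumed header length */
#define DEMO_MAX_VECTORS	8	/* assumed per-packet limit */

/*
 * Free the packet, bump one counter and leave the calling function.
 * Relies on locals named pkt and stats in the caller, much as the
 * macros in the patch rely on skb and bnad.
 */
#define DEMO_DROP_AND_RETURN(_counter)			\
do {							\
	free(pkt);					\
	stats->_counter++;				\
	return DEMO_TX_OK;				\
} while (0)

#define DEMO_DROP_AND_RETURN_IF(_condition, _counter)	\
do {							\
	if (_condition)					\
		DEMO_DROP_AND_RETURN(_counter);		\
} while (0)

/* Takes ownership of pkt; every path frees it exactly once. */
static int demo_xmit(struct demo_pkt *pkt, struct demo_stats *stats)
{
	DEMO_DROP_AND_RETURN_IF(pkt->len <= DEMO_MIN_LEN, tx_too_short);
	DEMO_DROP_AND_RETURN_IF(pkt->vectors > DEMO_MAX_VECTORS,
				tx_too_many_vectors);

	/* A real driver would queue the packet to hardware here. */
	free(pkt);
	return DEMO_TX_OK;
}

/* Allocate a packet with the given shape and pass it to demo_xmit(). */
int demo_send(unsigned int len, unsigned int vectors,
	      struct demo_stats *stats)
{
	struct demo_pkt *pkt = malloc(sizeof(*pkt));

	if (!pkt)
		return -1;
	pkt->len = len;
	pkt->vectors = vectors;
	return demo_xmit(pkt, stats);
}
```

Calling demo_send(10, 1, &stats) increments stats.tx_too_short, because 10 bytes does not exceed the assumed 14-byte header. A 100-byte packet with 2 vectors passes both checks and leaves the counters untouched.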

Signed-off-by: Gurunatha Karaje <gkaraje@brocade.com>
Signed-off-by: Rasesh Mody <rmody@brocade.com>
---
 drivers/net/ethernet/brocade/bna/bnad.c |   68 +++++++++++++++++-------------
 drivers/net/ethernet/brocade/bna/bnad.h |    2 +
 2 files changed, 40 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c
index 095eac9..beeffa2 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -74,6 +74,36 @@ do {								\
 
 #define BNAD_TXRX_SYNC_MDELAY	250	/* 250 msecs */
 
+#define BNAD_DROP_AND_RETURN(_counter)	\
+{ \
+	dev_kfree_skb(skb); \
+	BNAD_UPDATE_CTR(bnad, _counter); \
+	return NETDEV_TX_OK; \
+}
+
+#define BNAD_DROP_AND_RETURN_IF(_condition, _counter)	\
+if (unlikely(_condition)) { \
+	BNAD_DROP_AND_RETURN(_counter); \
+}
+
+#define BNAD_PCI_UNMAP_SKB(_pdev, _array, _index, _depth, _skb, _frag) \
+{ \
+	int j; \
+	(_array)[_index].skb = NULL; \
+	dma_unmap_single(_pdev, dma_unmap_addr(&(_array)[_index], dma_addr), \
+			skb_headlen(_skb), DMA_TO_DEVICE); \
+	dma_unmap_addr_set(&(_array)[_index], dma_addr, 0); \
+	BNA_QE_INDX_ADD(_index, 1, _depth); \
+	for (j = 0; j < (_frag); j++) { \
+		prefetch(&(_array)[(_index) + 1]); \
+		dma_unmap_page(_pdev, dma_unmap_addr(&(_array)[_index], \
+						     dma_addr), \
+			  skb_shinfo(_skb)->frags[j].size, DMA_TO_DEVICE); \
+		dma_unmap_addr_set(&(_array)[_index], dma_addr, 0); \
+		BNA_QE_INDX_ADD(_index, 1, _depth); \
+	} \
+}
+
 /*
  * Reinitialize completions in CQ, once Rx is taken down
  */
@@ -169,7 +199,6 @@ bnad_free_txbufs(struct bnad *bnad,
 	struct bnad_unmap_q *unmap_q = tcb->unmap_q;
 	struct bnad_skb_unmap *unmap_array;
 	struct sk_buff		*skb;
-	int i;
 
 	/*
 	 * Just return if TX is stopped. This check is useful
@@ -195,32 +224,14 @@ bnad_free_txbufs(struct bnad *bnad,
 	while (wis) {
 		skb = unmap_array[unmap_cons].skb;
 
-		unmap_array[unmap_cons].skb = NULL;
-
 		sent_packets++;
 		sent_bytes += skb->len;
 		wis -= BNA_TXQ_WI_NEEDED(1 + skb_shinfo(skb)->nr_frags);
 
-		dma_unmap_single(&bnad->pcidev->dev,
-				 dma_unmap_addr(&unmap_array[unmap_cons],
-						dma_addr), skb_headlen(skb),
-				 DMA_TO_DEVICE);
-		dma_unmap_addr_set(&unmap_array[unmap_cons], dma_addr, 0);
-		BNA_QE_INDX_ADD(unmap_cons, 1, unmap_q->q_depth);
-
-		prefetch(&unmap_array[unmap_cons + 1]);
-		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
-			prefetch(&unmap_array[unmap_cons + 1]);
+		BNAD_PCI_UNMAP_SKB(&bnad->pcidev->dev, unmap_array, unmap_cons,
+				   unmap_q->q_depth, skb,
+				   skb_shinfo(skb)->nr_frags);
 
-			dma_unmap_page(&bnad->pcidev->dev,
-				       dma_unmap_addr(&unmap_array[unmap_cons],
-						      dma_addr),
-				       skb_shinfo(skb)->frags[i].size,
-				       DMA_TO_DEVICE);
-			dma_unmap_addr_set(&unmap_array[unmap_cons], dma_addr,
-					   0);
-			BNA_QE_INDX_ADD(unmap_cons, 1, unmap_q->q_depth);
-		}
 		dev_kfree_skb_any(skb);
 	}
 
@@ -2560,16 +2571,13 @@ bnad_start_xmit(struct sk_buff *skb, struct net_device *netdev)
 	 * Takes care of the Tx that is scheduled between clearing the flag
 	 * and the netif_stop_all_queue() call.
 	 */
-	if (unlikely(!test_bit(BNAD_TXQ_TX_STARTED, &tcb->flags))) {
-		dev_kfree_skb(skb);
-		return NETDEV_TX_OK;
-	}
+	BNAD_DROP_AND_RETURN_IF(!test_bit(BNAD_TXQ_TX_STARTED, &tcb->flags),
+				tx_skb_stopping);
 
 	vectors = 1 + skb_shinfo(skb)->nr_frags;
-	if (vectors > BFI_TX_MAX_VECTORS_PER_PKT) {
-		dev_kfree_skb(skb);
-		return NETDEV_TX_OK;
-	}
+	BNAD_DROP_AND_RETURN_IF(vectors > BFI_TX_MAX_VECTORS_PER_PKT,
+				tx_skb_max_vectors);
+
 	wis = BNA_TXQ_WI_NEEDED(vectors);	/* 4 vectors per work item */
 	acked = 0;
 	if (unlikely(wis > BNA_QE_FREE_CNT(tcb, tcb->q_depth) ||
diff --git a/drivers/net/ethernet/brocade/bna/bnad.h b/drivers/net/ethernet/brocade/bna/bnad.h
index 60c2e9d..c4772e3 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.h
+++ b/drivers/net/ethernet/brocade/bna/bnad.h
@@ -148,6 +148,8 @@ struct bnad_drv_stats {
 	u64		udpcsum_offload;
 	u64		csum_help;
 	u64		csum_help_err;
+	u64		tx_skb_stopping;
+	u64		tx_skb_max_vectors;
 
 	u64		hw_stats_updates;
 	u64		netif_rx_schedule;
-- 
1.7.1



* [PATCH 03/13] bna: Interrupt Polling and NAPI Init Changes
From: Rasesh Mody @ 2011-08-19 21:39 UTC (permalink / raw)
  To: davem, netdev; +Cc: adapter_linux_open_src_team, Rasesh Mody, Gurunatha Karaje
In-Reply-To: <1313789972-22711-1-git-send-email-rmody@brocade.com>

Change details:
 - Remove unnecessary ccb check from bnad_poll_cq
 - Add bnad pointer to rx_ctrl structure, so that bnad can be accessed directly
   from rx_ctrl in the NAPI poll routines, even if ccb is NULL
 - Validate ccb before dereferencing it in bnad_msix_rx and bnad_napi_poll_rx
 - Fix the order of NAPI init / uninit in Tx / Rx setup / teardown path:
   a. Kill bnad tx free tasklet ahead of call to bna_tx_destroy()
   b. Call NAPI disable only after call to Rx free_irq(). This makes sure Rx
      interrupt does not schedule a poll when NAPI is already disabled
 - Fix a crash caused by the NAPI poll running before the h/w has completed
   configuration: delay enabling NAPI until after bna_rx_enable(), and split
   NAPI initialization into two steps, bnad_napi_init() and bnad_napi_enable().
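
The ordering issue in the last item can be illustrated with a small
user-space model. This is only a sketch: struct demo_napi and the demo_*
functions are hypothetical stand-ins, not driver or kernel APIs. It assumes
that a freshly added NAPI context starts out marked as scheduled and that
enabling it clears that mark, which is what the comment added in
bnad_setup_rx() relies on.

```c
#include <stdbool.h>

/* Hypothetical user-space model of one NAPI context. */
struct demo_napi {
	bool sched;		/* models NAPI_STATE_SCHED */
	bool hw_ready;		/* set once hardware configuration is done */
	int polls_run;		/* polls that ran with hardware ready */
	int polls_too_early;	/* polls that ran before hardware was ready */
};

/* Models the init step: the context starts out marked as scheduled. */
static void demo_napi_init(struct demo_napi *n)
{
	n->sched = true;
	n->hw_ready = false;
	n->polls_run = 0;
	n->polls_too_early = 0;
}

/* Models the enable step: clearing the mark lets an IRQ schedule a poll. */
static void demo_napi_enable(struct demo_napi *n)
{
	n->sched = false;
}

/* Models the Rx interrupt: a poll runs only if none is marked pending. */
static void demo_rx_irq(struct demo_napi *n)
{
	if (n->sched)
		return;
	n->sched = true;
	if (n->hw_ready)
		n->polls_run++;
	else
		n->polls_too_early++;
	n->sched = false;	/* poll completed */
}

/* Runs one setup sequence and returns the number of premature polls. */
static int demo_setup(bool enable_before_hw)
{
	struct demo_napi n;

	demo_napi_init(&n);
	if (enable_before_hw)
		demo_napi_enable(&n);	/* old ordering */
	demo_rx_irq(&n);		/* interrupt arriving mid-setup */
	n.hw_ready = true;		/* hardware configuration completes */
	if (!enable_before_hw)
		demo_napi_enable(&n);	/* new ordering */
	demo_rx_irq(&n);
	return n.polls_too_early;
}
```

In this model, demo_setup(true) lets one poll run before the hardware is
ready, while demo_setup(false) lets none through.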

Signed-off-by: Gurunatha Karaje <gkaraje@brocade.com>
Signed-off-by: Rasesh Mody <rmody@brocade.com>
---
 drivers/net/ethernet/brocade/bna/bnad.c |   83 ++++++++++++++++++++-----------
 drivers/net/ethernet/brocade/bna/bnad.h |    1 +
 2 files changed, 54 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c
index 3f19a4d..095eac9 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -535,16 +535,11 @@ next:
 
 	BNA_QE_INDX_ADD(ccb->producer_index, wis, ccb->q_depth);
 
-	if (likely(ccb)) {
-		if (likely(test_bit(BNAD_RXQ_STARTED, &ccb->rcb[0]->flags)))
-			bna_ib_ack(ccb->i_dbell, packets);
-		bnad_refill_rxq(bnad, ccb->rcb[0]);
-		if (ccb->rcb[1])
-			bnad_refill_rxq(bnad, ccb->rcb[1]);
-	} else {
-		if (likely(test_bit(BNAD_RXQ_STARTED, &ccb->rcb[0]->flags)))
-			bna_ib_ack(ccb->i_dbell, 0);
-	}
+	if (likely(test_bit(BNAD_RXQ_STARTED, &ccb->rcb[0]->flags)))
+		bna_ib_ack(ccb->i_dbell, packets);
+	bnad_refill_rxq(bnad, ccb->rcb[0]);
+	if (ccb->rcb[1])
+		bnad_refill_rxq(bnad, ccb->rcb[1]);
 
 	clear_bit(BNAD_FP_IN_RX_PATH, &rx_ctrl->flags);
 
@@ -590,9 +585,9 @@ static irqreturn_t
 bnad_msix_rx(int irq, void *data)
 {
 	struct bna_ccb *ccb = (struct bna_ccb *)data;
-	struct bnad *bnad = ccb->bnad;
 
-	bnad_netif_rx_schedule_poll(bnad, ccb);
+	if (ccb)
+		bnad_netif_rx_schedule_poll(ccb->bnad, ccb);
 
 	return IRQ_HANDLED;
 }
@@ -1658,18 +1653,14 @@ bnad_napi_poll_rx(struct napi_struct *napi, int budget)
 {
 	struct bnad_rx_ctrl *rx_ctrl =
 		container_of(napi, struct bnad_rx_ctrl, napi);
-	struct bna_ccb *ccb;
-	struct bnad *bnad;
+	struct bnad *bnad = rx_ctrl->bnad;
 	int rcvd = 0;
 
-	ccb = rx_ctrl->ccb;
-
-	bnad = ccb->bnad;
 
 	if (!netif_carrier_ok(bnad->netdev))
 		goto poll_exit;
 
-	rcvd = bnad_poll_cq(bnad, ccb, budget);
+	rcvd = bnad_poll_cq(bnad, rx_ctrl->ccb, budget);
 	if (rcvd == budget)
 		return rcvd;
 
@@ -1678,12 +1669,15 @@ poll_exit:
 
 	BNAD_UPDATE_CTR(bnad, netif_rx_complete);
 
-	bnad_enable_rx_irq(bnad, ccb);
+
+	if (rx_ctrl->ccb)
+		bnad_enable_rx_irq(bnad, rx_ctrl->ccb);
 	return rcvd;
 }
 
+#define BNAD_NAPI_POLL_QUOTA		64
 static void
-bnad_napi_enable(struct bnad *bnad, u32 rx_id)
+bnad_napi_init(struct bnad *bnad, u32 rx_id)
 {
 	struct bnad_rx_ctrl *rx_ctrl;
 	int i;
@@ -1691,9 +1685,20 @@ bnad_napi_enable(struct bnad *bnad, u32 rx_id)
 	/* Initialize & enable NAPI */
 	for (i = 0; i <	bnad->num_rxp_per_rx; i++) {
 		rx_ctrl = &bnad->rx_info[rx_id].rx_ctrl[i];
-
 		netif_napi_add(bnad->netdev, &rx_ctrl->napi,
-			       bnad_napi_poll_rx, 64);
+			       bnad_napi_poll_rx, BNAD_NAPI_POLL_QUOTA);
+	}
+}
+
+static void
+bnad_napi_enable(struct bnad *bnad, u32 rx_id)
+{
+	struct bnad_rx_ctrl *rx_ctrl;
+	int i;
+
+	/* Initialize & enable NAPI */
+	for (i = 0; i <	bnad->num_rxp_per_rx; i++) {
+		rx_ctrl = &bnad->rx_info[rx_id].rx_ctrl[i];
 
 		napi_enable(&rx_ctrl->napi);
 	}
@@ -1732,6 +1737,9 @@ bnad_cleanup_tx(struct bnad *bnad, u32 tx_id)
 		bnad_tx_msix_unregister(bnad, tx_info,
 			bnad->num_txq_per_tx);
 
+	if (0 == tx_id)
+		tasklet_kill(&bnad->tx_free_tasklet);
+
 	spin_lock_irqsave(&bnad->bna_lock, flags);
 	bna_tx_destroy(tx_info->tx);
 	spin_unlock_irqrestore(&bnad->bna_lock, flags);
@@ -1739,9 +1747,6 @@ bnad_cleanup_tx(struct bnad *bnad, u32 tx_id)
 	tx_info->tx = NULL;
 	tx_info->tx_id = 0;
 
-	if (0 == tx_id)
-		tasklet_kill(&bnad->tx_free_tasklet);
-
 	bnad_tx_res_free(bnad, res_info);
 }
 
@@ -1852,6 +1857,16 @@ bnad_init_rx_config(struct bnad *bnad, struct bna_rx_config *rx_config)
 	rx_config->vlan_strip_status = BNA_STATUS_T_ENABLED;
 }
 
+static void
+bnad_rx_ctrl_init(struct bnad *bnad, u32 rx_id)
+{
+	struct bnad_rx_info *rx_info = &bnad->rx_info[rx_id];
+	int i;
+
+	for (i = 0; i < bnad->num_rxp_per_rx; i++)
+		rx_info->rx_ctrl[i].bnad = bnad;
+}
+
 /* Called with mutex_lock(&bnad->conf_mutex) held */
 void
 bnad_cleanup_rx(struct bnad *bnad, u32 rx_id)
@@ -1875,8 +1890,6 @@ bnad_cleanup_rx(struct bnad *bnad, u32 rx_id)
 			del_timer_sync(&bnad->dim_timer);
 	}
 
-	bnad_napi_disable(bnad, rx_id);
-
 	init_completion(&bnad->bnad_completions.rx_comp);
 	spin_lock_irqsave(&bnad->bna_lock, flags);
 	bna_rx_disable(rx_info->rx, BNA_HARD_CLEANUP, bnad_cb_rx_disabled);
@@ -1886,6 +1899,8 @@ bnad_cleanup_rx(struct bnad *bnad, u32 rx_id)
 	if (rx_info->rx_ctrl[0].ccb->intr_type == BNA_INTR_T_MSIX)
 		bnad_rx_msix_unregister(bnad, rx_info, rx_config->num_paths);
 
+	bnad_napi_disable(bnad, rx_id);
+
 	spin_lock_irqsave(&bnad->bna_lock, flags);
 	bna_rx_destroy(rx_info->rx);
 	spin_unlock_irqrestore(&bnad->bna_lock, flags);
@@ -1939,6 +1954,8 @@ bnad_setup_rx(struct bnad *bnad, u32 rx_id)
 	if (err)
 		return err;
 
+	bnad_rx_ctrl_init(bnad, rx_id);
+
 	/* Ask BNA to create one Rx object, supplying required resources */
 	spin_lock_irqsave(&bnad->bna_lock, flags);
 	rx = bna_rx_create(&bnad->bna, bnad, rx_config, &rx_cbfn, res_info,
@@ -1948,6 +1965,12 @@ bnad_setup_rx(struct bnad *bnad, u32 rx_id)
 		goto err_return;
 	rx_info->rx = rx;
 
+	/*
+	 * Init NAPI, so that state is set to NAPI_STATE_SCHED,
+	 * so that IRQ handler cannot schedule NAPI at this point.
+	 */
+	bnad_napi_init(bnad, rx_id);
+
 	/* Register ISR for the Rx object */
 	if (intr_info->intr_type == BNA_INTR_T_MSIX) {
 		err = bnad_rx_msix_register(bnad, rx_info, rx_id,
@@ -1956,9 +1979,6 @@ bnad_setup_rx(struct bnad *bnad, u32 rx_id)
 			goto err_return;
 	}
 
-	/* Enable NAPI */
-	bnad_napi_enable(bnad, rx_id);
-
 	spin_lock_irqsave(&bnad->bna_lock, flags);
 	if (0 == rx_id) {
 		/* Set up Dynamic Interrupt Moderation Vector */
@@ -1975,6 +1995,9 @@ bnad_setup_rx(struct bnad *bnad, u32 rx_id)
 	bna_rx_enable(rx);
 	spin_unlock_irqrestore(&bnad->bna_lock, flags);
 
+	/* Enable scheduling of NAPI */
+	bnad_napi_enable(bnad, rx_id);
+
 	return 0;
 
 err_return:
diff --git a/drivers/net/ethernet/brocade/bna/bnad.h b/drivers/net/ethernet/brocade/bna/bnad.h
index 3c23139..60c2e9d 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.h
+++ b/drivers/net/ethernet/brocade/bna/bnad.h
@@ -53,6 +53,7 @@
  */
 struct bnad_rx_ctrl {
 	struct bna_ccb *ccb;
+	struct bnad *bnad;
 	unsigned long  flags;
 	struct napi_struct	napi;
 };
-- 
1.7.1



* [PATCH 02/13] bna: PCI Probe Fix
From: Rasesh Mody @ 2011-08-19 21:39 UTC (permalink / raw)
  To: davem, netdev; +Cc: adapter_linux_open_src_team, Rasesh Mody, Gurunatha Karaje
In-Reply-To: <1313789972-22711-1-git-send-email-rmody@brocade.com>

Change details:
 - Return error as -EIO if bnad_res_alloc fails
 - Release the configuration lock before registering with net_device layer.
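
A minimal user-space sketch of the resulting probe flow follows. All names
(struct demo_dev, demo_probe and the helpers) are hypothetical, the lock is
modelled as a plain counter, and the -ENOMEM returned by the allocation
helper is an assumption made only for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state; the lock is modelled as a depth counter. */
struct demo_dev {
	int lock_depth;
	bool registered;
	bool lock_held_at_register;
};

static void demo_lock(struct demo_dev *d)
{
	d->lock_depth++;
}

static void demo_unlock(struct demo_dev *d)
{
	d->lock_depth--;
}

/* Stand-in for resource allocation; the error code is an assumption. */
static int demo_res_alloc(bool fail)
{
	return fail ? -ENOMEM : 0;
}

/* Stand-in for registration; records whether the lock was still held. */
static int demo_register(struct demo_dev *d)
{
	d->lock_held_at_register = d->lock_depth > 0;
	d->registered = true;
	return 0;
}

static int demo_probe(struct demo_dev *d, bool alloc_fails)
{
	int err;

	demo_lock(d);

	err = demo_res_alloc(alloc_fails);
	if (err) {
		err = -EIO;	/* report a uniform error to the caller */
		goto unlock;
	}

	/* Drop the configuration lock before registering. */
	demo_unlock(d);
	return demo_register(d);

unlock:
	demo_unlock(d);
	return err;
}
```

On success the model registers with the lock released; on allocation
failure it returns -EIO and still leaves the lock balanced.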

Signed-off-by: Gurunatha Karaje <gkaraje@brocade.com>
Signed-off-by: Rasesh Mody <rmody@brocade.com>
---
 drivers/net/ethernet/brocade/bna/bnad.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c
index d18ffb3..3f19a4d 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -3253,8 +3253,10 @@ bnad_pci_probe(struct pci_dev *pdev,
 	spin_unlock_irqrestore(&bnad->bna_lock, flags);
 
 	err = bnad_res_alloc(bnad, &bnad->mod_res_info[0], BNA_MOD_RES_T_MAX);
-	if (err)
+	if (err) {
+		err = -EIO;
 		goto disable_ioceth;
+	}
 
 	spin_lock_irqsave(&bnad->bna_lock, flags);
 	bna_mod_init(&bnad->bna, &bnad->mod_res_info[0]);
@@ -3266,6 +3268,8 @@ bnad_pci_probe(struct pci_dev *pdev,
 	bnad_set_netdev_perm_addr(bnad);
 	spin_unlock_irqrestore(&bnad->bna_lock, flags);
 
+	mutex_unlock(&bnad->conf_mutex);
+
 	/* Finally, reguister with net_device layer */
 	err = register_netdev(netdev);
 	if (err) {
@@ -3274,6 +3278,8 @@ bnad_pci_probe(struct pci_dev *pdev,
 	}
 	set_bit(BNAD_RF_NETDEV_REGISTERED, &bnad->run_flags);
 
+	return 0;
+
 probe_success:
 	mutex_unlock(&bnad->conf_mutex);
 	return 0;
-- 
1.7.1



* [PATCH 01/13] bna: Naming Change and Minor Macro Fix
From: Rasesh Mody @ 2011-08-19 21:39 UTC (permalink / raw)
  To: davem, netdev; +Cc: adapter_linux_open_src_team, Rasesh Mody, Gurunatha Karaje
In-Reply-To: <1313789972-22711-1-git-send-email-rmody@brocade.com>

Naming changes: rename devid, BNAD_MAX_TXS, BNAD_MAX_RXS,
BNAD_MAX_RXPS_PER_RX to device, BNAD_MAX_TX, BNAD_MAX_RX,
BNAD_MAX_RXP_PER_RX respectively and change all the references.

Macro Fix: Make the bfa_ioc_isr_mode_set macro check that the
ioc_isr_mode_set hook is set before calling it
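
The guarded-hook pattern used by the macro fix can be sketched in user
space as follows. The demo_* names are hypothetical and only mirror the
shape of the driver's ioc/hwif structures; they are not the driver's types.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_ioc;

/* Hypothetical hardware interface table with one optional hook. */
struct demo_hwif {
	/* May be NULL on hardware that does not need it. */
	void (*isr_mode_set)(struct demo_ioc *ioc, bool msix);
};

struct demo_ioc {
	const struct demo_hwif *hwif;
	int mode_set_calls;
};

/* Call the hook only if the hardware interface provides one. */
#define demo_isr_mode_set(__ioc, __msix) do {			\
	if ((__ioc)->hwif->isr_mode_set)			\
		(__ioc)->hwif->isr_mode_set(__ioc, __msix);	\
} while (0)

/* Example hook that just counts how often it was invoked. */
static void demo_count_call(struct demo_ioc *ioc, bool msix)
{
	(void)msix;
	ioc->mode_set_calls++;
}

/* Returns how many times the hook ran for the given interface table. */
static int demo_run(const struct demo_hwif *hwif)
{
	struct demo_ioc ioc = { .hwif = hwif, .mode_set_calls = 0 };

	demo_isr_mode_set(&ioc, true);
	return ioc.mode_set_calls;
}
```

Wrapping the body in do { } while (0) keeps the macro usable as a single
statement, for example as the body of an unbraced if/else.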

Signed-off-by: Gurunatha Karaje <gkaraje@brocade.com>
Signed-off-by: Rasesh Mody <rmody@brocade.com>
---
 drivers/net/ethernet/brocade/bna/bfa_defs.h |    8 ++++----
 drivers/net/ethernet/brocade/bna/bfa_ioc.h  |    6 ++++--
 drivers/net/ethernet/brocade/bna/bnad.c     |    6 +++---
 drivers/net/ethernet/brocade/bna/bnad.h     |   20 ++++++++++----------
 4 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bfa_defs.h b/drivers/net/ethernet/brocade/bna/bfa_defs.h
index 205b92b..a81c0cc 100644
--- a/drivers/net/ethernet/brocade/bna/bfa_defs.h
+++ b/drivers/net/ethernet/brocade/bna/bfa_defs.h
@@ -251,10 +251,10 @@ struct bfa_mfg_block {
  * ---------------------- pci definitions ------------
  */
 
-#define bfa_asic_id_ct(devid)			\
-	((devid) == PCI_DEVICE_ID_BROCADE_CT ||	\
-	(devid) == PCI_DEVICE_ID_BROCADE_CT_FC)
-#define bfa_asic_id_ctc(devid) (bfa_asic_id_ct(devid))
+#define bfa_asic_id_ct(device)			\
+	((device) == PCI_DEVICE_ID_BROCADE_CT ||	\
+	 (device) == PCI_DEVICE_ID_BROCADE_CT_FC)
+#define bfa_asic_id_ctc(device) (bfa_asic_id_ct(device))
 
 enum bfa_mode {
 	BFA_MODE_HBA		= 1,
diff --git a/drivers/net/ethernet/brocade/bna/bfa_ioc.h b/drivers/net/ethernet/brocade/bna/bfa_ioc.h
index f5a3d4e..9116324 100644
--- a/drivers/net/ethernet/brocade/bna/bfa_ioc.h
+++ b/drivers/net/ethernet/brocade/bna/bfa_ioc.h
@@ -274,8 +274,10 @@ void bfa_nw_ioc_mbox_regisr(struct bfa_ioc *ioc, enum bfi_mclass mc,
 	((__ioc)->ioc_hwif->ioc_pll_init((__ioc)->pcidev.pci_bar_kva, \
 			   (__ioc)->asic_mode))
 
-#define	bfa_ioc_isr_mode_set(__ioc, __msix)			\
-			((__ioc)->ioc_hwif->ioc_isr_mode_set(__ioc, __msix))
+#define	bfa_ioc_isr_mode_set(__ioc, __msix) do {			\
+	if ((__ioc)->ioc_hwif->ioc_isr_mode_set)			\
+		((__ioc)->ioc_hwif->ioc_isr_mode_set(__ioc, __msix));	\
+} while (0)
 #define	bfa_ioc_ownership_reset(__ioc)				\
 			((__ioc)->ioc_hwif->ioc_ownership_reset(__ioc))
 
diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c
index bdfda07..d18ffb3 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -1001,7 +1001,7 @@ bnad_cb_rx_cleanup(struct bnad *bnad, struct bna_rx *rx)
 
 	mdelay(BNAD_TXRX_SYNC_MDELAY);
 
-	for (i = 0; i < BNAD_MAX_RXPS_PER_RX; i++) {
+	for (i = 0; i < BNAD_MAX_RXP_PER_RX; i++) {
 		rx_ctrl = &rx_info->rx_ctrl[i];
 		ccb = rx_ctrl->ccb;
 		if (!ccb)
@@ -1030,7 +1030,7 @@ bnad_cb_rx_post(struct bnad *bnad, struct bna_rx *rx)
 	int i;
 	int j;
 
-	for (i = 0; i < BNAD_MAX_RXPS_PER_RX; i++) {
+	for (i = 0; i < BNAD_MAX_RXP_PER_RX; i++) {
 		rx_ctrl = &rx_info->rx_ctrl[i];
 		ccb = rx_ctrl->ccb;
 		if (!ccb)
@@ -2227,7 +2227,7 @@ bnad_q_num_init(struct bnad *bnad)
 	int rxps;
 
 	rxps = min((uint)num_online_cpus(),
-			(uint)(BNAD_MAX_RXS * BNAD_MAX_RXPS_PER_RX));
+			(uint)(BNAD_MAX_RX * BNAD_MAX_RXP_PER_RX));
 
 	if (!(bnad->cfg_flags & BNAD_CF_MSIX))
 		rxps = 1;	/* INTx */
diff --git a/drivers/net/ethernet/brocade/bna/bnad.h b/drivers/net/ethernet/brocade/bna/bnad.h
index 5b5451e..3c23139 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.h
+++ b/drivers/net/ethernet/brocade/bna/bnad.h
@@ -38,12 +38,12 @@
 #define BNAD_TXQ_DEPTH		2048
 #define BNAD_RXQ_DEPTH		2048
 
-#define BNAD_MAX_TXS		1
+#define BNAD_MAX_TX		1
 #define BNAD_MAX_TXQ_PER_TX	8	/* 8 priority queues */
 #define BNAD_TXQ_NUM		1
 
-#define BNAD_MAX_RXS		1
-#define BNAD_MAX_RXPS_PER_RX	16
+#define BNAD_MAX_RX		1
+#define BNAD_MAX_RXP_PER_RX	16
 #define BNAD_MAX_RXQ_PER_RXP	2
 
 /*
@@ -190,7 +190,7 @@ struct bnad_tx_info {
 struct bnad_rx_info {
 	struct bna_rx *rx; /* 1:1 between rx_info & rx */
 
-	struct bnad_rx_ctrl rx_ctrl[BNAD_MAX_RXPS_PER_RX];
+	struct bnad_rx_ctrl rx_ctrl[BNAD_MAX_RXP_PER_RX];
 	u32 rx_id;
 } ____cacheline_aligned;
 
@@ -234,8 +234,8 @@ struct bnad {
 	struct net_device	*netdev;
 
 	/* Data path */
-	struct bnad_tx_info tx_info[BNAD_MAX_TXS];
-	struct bnad_rx_info rx_info[BNAD_MAX_RXS];
+	struct bnad_tx_info tx_info[BNAD_MAX_TX];
+	struct bnad_rx_info rx_info[BNAD_MAX_RX];
 
 	unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
 	/*
@@ -255,8 +255,8 @@ struct bnad {
 	u8			tx_coalescing_timeo;
 	u8			rx_coalescing_timeo;
 
-	struct bna_rx_config rx_config[BNAD_MAX_RXS];
-	struct bna_tx_config tx_config[BNAD_MAX_TXS];
+	struct bna_rx_config rx_config[BNAD_MAX_RX];
+	struct bna_tx_config tx_config[BNAD_MAX_TX];
 
 	void __iomem		*bar0;	/* BAR0 address */
 
@@ -283,8 +283,8 @@ struct bnad {
 	/* Control path resources, memory & irq */
 	struct bna_res_info res_info[BNA_RES_T_MAX];
 	struct bna_res_info mod_res_info[BNA_MOD_RES_T_MAX];
-	struct bnad_tx_res_info tx_res_info[BNAD_MAX_TXS];
-	struct bnad_rx_res_info rx_res_info[BNAD_MAX_RXS];
+	struct bnad_tx_res_info tx_res_info[BNAD_MAX_TX];
+	struct bnad_rx_res_info rx_res_info[BNAD_MAX_RX];
 
 	struct bnad_completion bnad_completions;
 
-- 
1.7.1



* [PATCH 00/13] bna: Update bna driver version to 3.0.2.1
From: Rasesh Mody @ 2011-08-19 21:39 UTC (permalink / raw)
  To: davem, netdev; +Cc: adapter_linux_open_src_team, Rasesh Mody

Hi Dave,

   We are re-submitting this patch set with the comments addressed.

   The following patch set contains TX and RX path changes and Ethtool
   enhancements. It also fixes bugs found in the new code, removes
   unused code, and makes naming, formatting and comment changes.

   This updates the Brocade BNA driver to v3.0.2.1.

   The driver has been compiled & tested against net-next-2.6(3.0.0-rc7)

Thanks,
Rasesh

Rasesh Mody (13):
  bna: Naming Change and Minor Macro Fix
  bna: PCI Probe Fix
  bna: Interrupt Polling and NAPI Init Changes
  bna: SKB Check and Drop Macros
  bna: TX Path and RX Path Changes
  bna: Formatting and Code Cleanup
  bna: Initialization and Locking Fix
  bna: Ethtool Enhancements and Fix
  bna: Async Mode Tx Rx Init Fix
  bna: MBOX IRQ Flag Check after Locking
  bna: Queue Depth and SKB Unmap Array Fix
  bna: SKB PCI UNMAP Fix
  bna: Driver Version changed to 3.0.2.1

 drivers/net/ethernet/brocade/bna/bfa_cee.c         |    2 -
 drivers/net/ethernet/brocade/bna/bfa_defs.h        |    8 +-
 .../net/ethernet/brocade/bna/bfa_defs_mfg_comm.h   |    1 -
 drivers/net/ethernet/brocade/bna/bfa_ioc.h         |    6 +-
 drivers/net/ethernet/brocade/bna/bfi.h             |   46 --
 drivers/net/ethernet/brocade/bna/bna.h             |   18 +-
 drivers/net/ethernet/brocade/bna/bna_enet.c        |   29 +-
 drivers/net/ethernet/brocade/bna/bna_hw_defs.h     |    5 +
 drivers/net/ethernet/brocade/bna/bna_types.h       |    2 +-
 drivers/net/ethernet/brocade/bna/bnad.c            |  485 +++++++++++---------
 drivers/net/ethernet/brocade/bna/bnad.h            |   83 +++-
 drivers/net/ethernet/brocade/bna/bnad_ethtool.c    |   96 ++++-
 drivers/net/ethernet/brocade/bna/cna.h             |   11 +-
 13 files changed, 450 insertions(+), 342 deletions(-)



* [RFC] bridge: add netfilter hook for forwarding 802.1D group addresses
From: Stephen Hemminger @ 2011-08-19 20:58 UTC (permalink / raw)
  To: David Lamparter; +Cc: Nick Carter, Ed Swierk, netdev, bridge, netfilter-devel
In-Reply-To: <20110819022731.GC180151@jupiter.n2.diac24.net>

The IEEE standard expects that link local multicast packets will not
be forwarded by a bridge. But there are cases like 802.1X which may
require that packets be forwarded. For maximum flexibility, implement
this via netfilter.

The new netfilter chain differs slightly from other chains: if a packet
is ACCEPTed by the chain, it is forwarded; if the verdict is DROP, the
packet is processed as a local packet. The default policy for this chain
is DROP, so users who do not install any rules get the same behavior as
before, i.e. packets are only processed on the local host and not
forwarded.

Spanning Tree Packets are treated specially and do not
go through the new chain.

This code is a design concept only. It compiles but
hasn't been tested.
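
The intended decision for a frame sent to 01-80-C2-00-00-XX can be
summarized in a small stand-alone sketch. The demo_* names and enums are
hypothetical and exist only for illustration; the function takes the last
octet of the destination address and the verdict the new chain would
return.

```c
#include <stdbool.h>

/* Hypothetical verdicts of the new chain and resulting bridge actions. */
enum demo_verdict { DEMO_DROP, DEMO_ACCEPT };
enum demo_action { DEMO_LOCAL_ONLY, DEMO_FORWARD };

/*
 * Decide what happens to a frame whose destination is a link-local
 * group address, given the last octet of that address.
 */
static enum demo_action demo_link_local_action(unsigned char last_octet,
					       bool stp_enabled,
					       bool port_forwarding,
					       enum demo_verdict verdict)
{
	if (last_octet == 0) {
		/* STP frame: never consults the new chain. */
		return stp_enabled ? DEMO_LOCAL_ONLY : DEMO_FORWARD;
	}

	/* Other group addresses: ACCEPT on a forwarding port means forward. */
	if (port_forwarding && verdict == DEMO_ACCEPT)
		return DEMO_FORWARD;

	/* Default (DROP): handled on the local host only, as before. */
	return DEMO_LOCAL_ONLY;
}
```

For example, an 802.1X frame (last octet 0x03) is forwarded only when the
port is in the forwarding state and a rule in the chain accepts it.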

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
 include/linux/netfilter_bridge.h      |    5 ++++-
 net/bridge/br_input.c                 |   15 ++++++++++++---
 net/bridge/netfilter/ebtable_filter.c |   18 ++++++++++++++++--
 3 files changed, 32 insertions(+), 6 deletions(-)

--- a/include/linux/netfilter_bridge.h	2011-08-19 13:11:51.972125670 -0700
+++ b/include/linux/netfilter_bridge.h	2011-08-19 13:13:36.452130443 -0700
@@ -22,7 +22,10 @@
 #define NF_BR_POST_ROUTING	4
 /* Not really a hook, but used for the ebtables broute table */
 #define NF_BR_BROUTING		5
-#define NF_BR_NUMHOOKS		6
+/* Packets to link local multicast addresses (01-80-C2-00-00-XX) */
+#define NF_BR_LINK_LOCAL_IN	6
+
+#define NF_BR_NUMHOOKS		7
 
 #ifdef __KERNEL__
 
--- a/net/bridge/br_input.c	2011-08-18 16:12:02.576672548 -0700
+++ b/net/bridge/br_input.c	2011-08-19 13:28:13.696170518 -0700
@@ -166,10 +166,19 @@ rx_handler_result_t br_handle_frame(stru
 		if (skb->protocol == htons(ETH_P_PAUSE))
 			goto drop;
 
-		/* If STP is turned off, then forward */
-		if (p->br->stp_enabled == BR_NO_STP && dest[5] == 0)
-			goto forward;
+		/* If this is Spanning Tree Protocol packet */
+		if (dest[5] == 0) {
+			/* and STP is turned off, then forward */
+			if (p->br->stp_enabled == BR_NO_STP)
+				goto forward;
+		}
+		/* Hook to allow forwarding other group MAC addresses */
+		else if (p->state == BR_STATE_FORWARDING &&
+			 NF_HOOK(NFPROTO_BRIDGE, NF_BR_LINK_LOCAL_IN, skb, skb->dev,
+				 NULL, br_handle_frame_finish))
+			return RX_HANDLER_CONSUMED;	/* forwarded */
 
+		/* Packet will go only to the local host. */
 		if (NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN, skb, skb->dev,
 			    NULL, br_handle_local_finish)) {
 			return RX_HANDLER_CONSUMED; /* consumed by filter */
--- a/net/bridge/netfilter/ebtable_filter.c	2011-08-19 13:14:46.232133631 -0700
+++ b/net/bridge/netfilter/ebtable_filter.c	2011-08-19 13:27:33.436168679 -0700
@@ -11,8 +11,10 @@
 #include <linux/netfilter_bridge/ebtables.h>
 #include <linux/module.h>
 
-#define FILTER_VALID_HOOKS ((1 << NF_BR_LOCAL_IN) | (1 << NF_BR_FORWARD) | \
-   (1 << NF_BR_LOCAL_OUT))
+#define FILTER_VALID_HOOKS ((1 << NF_BR_LOCAL_IN) | \
+			    (1 << NF_BR_FORWARD) | \
+			    (1 << NF_BR_LOCAL_OUT) | \
+			    (1 << NF_BR_LINK_LOCAL_IN))
 
 static struct ebt_entries initial_chains[] =
 {
@@ -28,6 +30,10 @@ static struct ebt_entries initial_chains
 		.name	= "OUTPUT",
 		.policy	= EBT_ACCEPT,
 	},
+	{
+		.name	= "LINKLOCAL",
+		.policy = EBT_DROP,
+	},
 };
 
 static struct ebt_replace_kernel initial_table =
@@ -39,6 +45,7 @@ static struct ebt_replace_kernel initial
 		[NF_BR_LOCAL_IN]	= &initial_chains[0],
 		[NF_BR_FORWARD]		= &initial_chains[1],
 		[NF_BR_LOCAL_OUT]	= &initial_chains[2],
+		[NF_BR_LINK_LOCAL_IN]	= &initial_chains[3],
 	},
 	.entries	= (char *)initial_chains,
 };
@@ -95,6 +102,13 @@ static struct nf_hook_ops ebt_ops_filter
 		.hooknum	= NF_BR_LOCAL_OUT,
 		.priority	= NF_BR_PRI_FILTER_OTHER,
 	},
+	{
+		.hook		= ebt_in_hook,
+		.owner		= THIS_MODULE,
+		.pf		= NFPROTO_BRIDGE,
+		.hooknum	= NF_BR_LINK_LOCAL_IN,
+		.priority	= NF_BR_PRI_FILTER_BRIDGED,
+	},
 };
 
 static int __net_init frame_filter_net_init(struct net *net)


* [PATCH v2] dm9000: define debug level as a module parameter
From: Vladimir Zapolskiy @ 2011-08-19 20:31 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Ben Dooks

This change makes the driver-specific debug output controllable through
a module parameter. Because the maximum verbosity level is too noisy,
the default level is lowered.
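
The effect of the level check can be sketched in user space as below. The
names demo_dbg, demo_emitted and demo_emit_all are hypothetical, and
fprintf() stands in for dev_dbg(); only the comparison against the debug
variable mirrors the macro in the patch.

```c
#include <stdio.h>

/* Stand-in for the module parameter; 0 disables all messages. */
static int debug;

/* Counts emitted messages so the effect of the level is observable. */
static int demo_emitted;

/* Print only messages whose level is below the current debug setting. */
#define demo_dbg(lev, ...) do {				\
	if ((lev) < debug) {				\
		fprintf(stderr, __VA_ARGS__);		\
		demo_emitted++;				\
	}						\
} while (0)

/* Emit one message at each level 0..3 and report how many got through. */
static int demo_emit_all(int level)
{
	debug = level;
	demo_emitted = 0;

	demo_dbg(0, "level 0 message\n");
	demo_dbg(1, "level 1 message\n");
	demo_dbg(2, "level 2 message\n");
	demo_dbg(3, "level 3 message\n");

	return demo_emitted;
}
```

With the patch applied, the level would be chosen at load time (for
example "modprobe dm9000 debug=2") or, given the 0644 permissions, changed
later through the parameter's sysfs file.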

Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Cc: Ben Dooks <ben-linux@fluff.org>
---
 drivers/net/Kconfig  |    8 --------
 drivers/net/dm9000.c |   11 ++++++++---
 2 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 8d0314d..bb0733d 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -985,14 +985,6 @@ config DM9000
 	  To compile this driver as a module, choose M here.  The module
 	  will be called dm9000.
 
-config DM9000_DEBUGLEVEL
-	int "DM9000 maximum debug level"
-	depends on DM9000
-	default 4
-	help
-	  The maximum level of debugging code compiled into the DM9000
-	  driver.
-
 config DM9000_FORCE_SIMPLE_PHY_POLL
 	bool "Force simple NSR based PHY polling"
 	depends on DM9000
diff --git a/drivers/net/dm9000.c b/drivers/net/dm9000.c
index 8ef31dc..4080e55 100644
--- a/drivers/net/dm9000.c
+++ b/drivers/net/dm9000.c
@@ -56,6 +56,13 @@ static int watchdog = 5000;
 module_param(watchdog, int, 0400);
 MODULE_PARM_DESC(watchdog, "transmit timeout in milliseconds");
 
+/*
+ * Debug messages level
+ */
+static int debug;
+module_param(debug, int, 0644);
+MODULE_PARM_DESC(debug, "dm9000 debug level (0-4)");
+
 /* DM9000 register address locking.
  *
  * The DM9000 uses an address register to control where data written
@@ -103,7 +110,6 @@ typedef struct board_info {
 	unsigned int	flags;
 	unsigned int	in_suspend :1;
 	unsigned int	wake_supported :1;
-	int		debug_level;
 
 	enum dm9000_type type;
 
@@ -138,8 +144,7 @@ typedef struct board_info {
 /* debug code */
 
 #define dm9000_dbg(db, lev, msg...) do {		\
-	if ((lev) < CONFIG_DM9000_DEBUGLEVEL &&		\
-	    (lev) < db->debug_level) {			\
+	if ((lev) < debug) {				\
 		dev_dbg(db->dev, msg);			\
 	}						\
 } while (0)
-- 
1.7.5.1



* Re: [PATCH] dm9000: control debug level of the driver
From: Vladimir Zapolskiy @ 2011-08-19 20:26 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, ben-linux
In-Reply-To: <20110818.220902.1663745747317150778.davem@davemloft.net>

On 19.08.2011 08:09, David Miller wrote:
> From: Vladimir Zapolskiy<vz@mleia.com>
> Date: Mon, 15 Aug 2011 19:38:34 +0300
>
>> This change allows to get driver specific debug messages output
>> setting a default value for db->debug_level. As far as the maximum
>> level of verbosity is too high, it is demoted by default.
>>
>> Signed-off-by: Vladimir Zapolskiy<vz@mleia.com>
>> Cc: Ben Dooks<ben-linux@fluff.org>
>
> I would much rather see this config option eliminated entirely.
>
> The default can be set by the user at kernel boot or module
> load time with command line settings.

This should definitely be an improvement. Initially I only wanted to fix a
particular problem, but let's do it even better.

--
With best wishes,
Vladimir


* [PATCH net-next v5 2/2] af-packet: TPACKET_V3 flexible buffer implementation.
From: Chetan Loke @ 2011-08-19 20:18 UTC (permalink / raw)
  To: netdev, davem; +Cc: Chetan Loke
In-Reply-To: <1313785096-911-1-git-send-email-loke.chetan@gmail.com>

1) Blocks can be configured with a non-static frame size.
2) Read/poll is at block level (as opposed to packet level).
3) Added a poll timeout to avoid indefinite user-space waits on idle links.
4) Added user-configurable knobs:
   4.1) block::timeout.
   4.2) tpkt_hdr::sk_rxhash.


Changes:
C1) tpacket_rcv()
    C1.1) packet_current_frame() is replaced by packet_current_rx_frame()
          The bulk of the processing is then moved in the following chain:
          packet_current_rx_frame()
            __packet_lookup_frame_in_block
              fill_curr_block()
              or
                retire_current_block
                dispatch_next_block
              or
              return NULL(queue is plugged/paused)
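
The three outcomes of the lookup chain above can be modelled in user space
as follows. This is a simplified sketch with hypothetical names (struct
demo_ring, demo_reserve, demo_release): it ignores headers, timers and
locking, and assumes every request fits into an empty block.

```c
#include <stddef.h>

#define DEMO_NUM_BLOCKS	2
#define DEMO_BLOCK_SIZE	64

/* Hypothetical block: bytes used and whether user space owns it. */
struct demo_block {
	size_t used;
	int owned_by_user;
};

struct demo_ring {
	struct demo_block blk[DEMO_NUM_BLOCKS];
	unsigned int active;
};

/* Wrap-around step, in the spirit of GET_NEXT_PRB_BLK_NUM. */
static unsigned int demo_next_block(const struct demo_ring *r)
{
	return (r->active < DEMO_NUM_BLOCKS - 1) ? r->active + 1 : 0;
}

/* Reserve len bytes; returns the block index used, or -1 if frozen. */
static int demo_reserve(struct demo_ring *r, size_t len)
{
	struct demo_block *cur = &r->blk[r->active];
	unsigned int next;

	/* Outcome 1: the packet fits into the current block. */
	if (!cur->owned_by_user && cur->used + len <= DEMO_BLOCK_SIZE) {
		cur->used += len;
		return (int)r->active;
	}

	/* Outcome 2: retire the current block and open the next one. */
	cur->owned_by_user = 1;
	next = demo_next_block(r);
	if (r->blk[next].owned_by_user)
		return -1;	/* Outcome 3: queue is frozen */

	r->active = next;
	r->blk[next].used = len;
	return (int)next;
}

/* User space hands a block back to the kernel side. */
static void demo_release(struct demo_ring *r, unsigned int idx)
{
	r->blk[idx].used = 0;
	r->blk[idx].owned_by_user = 0;
}
```

A reader consumes whole retired blocks and releases them, which is what
un-freezes the queue in this model.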

Signed-off-by: Chetan Loke <loke.chetan@gmail.com>
---
 net/packet/af_packet.c |  937 +++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 891 insertions(+), 46 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index c698cec..4371e3a 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -40,6 +40,10 @@
  *					byte arrays at the end of sockaddr_ll
  *					and packet_mreq.
  *		Johann Baudy	:	Added TX RING.
+ *		Chetan Loke	:	Implemented TPACKET_V3 block abstraction
+ *					layer.
+ *					Copyright (C) 2011, <lokec@ccs.neu.edu>
+ *
  *
  *		This program is free software; you can redistribute it and/or
  *		modify it under the terms of the GNU General Public License
@@ -161,9 +165,56 @@ struct packet_mreq_max {
 	unsigned char	mr_address[MAX_ADDR_LEN];
 };
 
-static int packet_set_ring(struct sock *sk, struct tpacket_req *req,
+static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
 		int closing, int tx_ring);
 
+
+#define V3_ALIGNMENT	(8)
+
+#define BLK_HDR_LEN	(ALIGN(sizeof(struct block_desc), V3_ALIGNMENT))
+
+#define BLK_PLUS_PRIV(sz_of_priv) \
+	(BLK_HDR_LEN + ALIGN((sz_of_priv), V3_ALIGNMENT))
+
+/* kbdq - kernel block descriptor queue */
+struct kbdq_core {
+	struct pgv	*pkbdq;
+	unsigned int	feature_req_word;
+	unsigned int	hdrlen;
+	unsigned char	reset_pending_on_curr_blk;
+	unsigned char   delete_blk_timer;
+	unsigned short	kactive_blk_num;
+	unsigned short	blk_sizeof_priv;
+
+	/* last_kactive_blk_num:
+	 * trick to see if user-space has caught up
+	 * in order to avoid refreshing timer when every single pkt arrives.
+	 */
+	unsigned short	last_kactive_blk_num;
+
+	char		*pkblk_start;
+	char		*pkblk_end;
+	int		kblk_size;
+	unsigned int	knum_blocks;
+	uint64_t	knxt_seq_num;
+	char		*prev;
+	char		*nxt_offset;
+	struct sk_buff	*skb;
+
+	atomic_t	blk_fill_in_prog;
+
+	/* Default is set to 8ms */
+#define DEFAULT_PRB_RETIRE_TOV	(8)
+
+	unsigned short  retire_blk_tov;
+	unsigned short  version;
+	unsigned long	tov_in_jiffies;
+
+	/* timer to retire an outstanding block */
+	struct timer_list retire_blk_timer;
+};
+
+#define PGV_FROM_VMALLOC 1
 struct pgv {
 	char *buffer;
 };
@@ -179,12 +230,40 @@ struct packet_ring_buffer {
 	unsigned int		pg_vec_pages;
 	unsigned int		pg_vec_len;
 
+	struct kbdq_core	prb_bdqc;
 	atomic_t		pending;
 };
 
+#define BLOCK_STATUS(x)	((x)->hdr.bh1.block_status)
+#define BLOCK_NUM_PKTS(x)	((x)->hdr.bh1.num_pkts)
+#define BLOCK_O2FP(x)		((x)->hdr.bh1.offset_to_first_pkt)
+#define BLOCK_LEN(x)		((x)->hdr.bh1.blk_len)
+#define BLOCK_SNUM(x)		((x)->hdr.bh1.seq_num)
+#define BLOCK_O2PRIV(x)	((x)->offset_to_priv)
+#define BLOCK_PRIV(x)		((void *)((char *)(x) + BLOCK_O2PRIV(x)))
+
 struct packet_sock;
 static int tpacket_snd(struct packet_sock *po, struct msghdr *msg);
 
+static void *packet_previous_frame(struct packet_sock *po,
+		struct packet_ring_buffer *rb,
+		int status);
+static void packet_increment_head(struct packet_ring_buffer *buff);
+static int prb_curr_blk_in_use(struct kbdq_core *,
+			struct block_desc *);
+static void *prb_dispatch_next_block(struct kbdq_core *,
+			struct packet_sock *);
+static void prb_retire_current_block(struct kbdq_core *,
+		struct packet_sock *, unsigned int status);
+static int prb_queue_frozen(struct kbdq_core *);
+static void prb_open_block(struct kbdq_core *, struct block_desc *);
+static void prb_retire_rx_blk_timer_expired(unsigned long);
+static void _prb_refresh_rx_retire_blk_timer(struct kbdq_core *);
+static void prb_init_blk_timer(struct packet_sock *, struct kbdq_core *,
+				void (*func) (unsigned long));
+static void prb_fill_rxhash(struct kbdq_core *, struct tpacket3_hdr *);
+static void prb_clear_rxhash(struct kbdq_core *, struct tpacket3_hdr *);
+static void prb_fill_vlan_info(struct kbdq_core *, struct tpacket3_hdr *);
 static void packet_flush_mclist(struct sock *sk);
 
 struct packet_fanout;
@@ -193,6 +272,7 @@ struct packet_sock {
 	struct sock		sk;
 	struct packet_fanout	*fanout;
 	struct tpacket_stats	stats;
+	union  tpacket_stats_u	stats_u;
 	struct packet_ring_buffer	rx_ring;
 	struct packet_ring_buffer	tx_ring;
 	int			copy_thresh;
@@ -242,6 +322,15 @@ struct packet_skb_cb {
 
 #define PACKET_SKB_CB(__skb)	((struct packet_skb_cb *)((__skb)->cb))
 
+#define GET_PBDQC_FROM_RB(x)	((struct kbdq_core *)(&(x)->prb_bdqc))
+#define GET_PBLOCK_DESC(x, bid)	\
+	((struct block_desc *)((x)->pkbdq[(bid)].buffer))
+#define GET_CURR_PBLOCK_DESC_FROM_CORE(x)	\
+	((struct block_desc *)((x)->pkbdq[(x)->kactive_blk_num].buffer))
+#define GET_NEXT_PRB_BLK_NUM(x) \
+	(((x)->kactive_blk_num < ((x)->knum_blocks-1)) ? \
+	((x)->kactive_blk_num+1) : 0)
+
 static inline struct packet_sock *pkt_sk(struct sock *sk)
 {
 	return (struct packet_sock *)sk;
@@ -325,8 +414,9 @@ static void __packet_set_status(struct packet_sock *po, void *frame, int status)
 		h.h2->tp_status = status;
 		flush_dcache_page(pgv_to_page(&h.h2->tp_status));
 		break;
+	case TPACKET_V3:
 	default:
-		pr_err("TPACKET version not supported\n");
+		WARN(1, "TPACKET version not supported.\n");
 		BUG();
 	}
 
@@ -351,8 +441,9 @@ static int __packet_get_status(struct packet_sock *po, void *frame)
 	case TPACKET_V2:
 		flush_dcache_page(pgv_to_page(&h.h2->tp_status));
 		return h.h2->tp_status;
+	case TPACKET_V3:
 	default:
-		pr_err("TPACKET version not supported\n");
+		WARN(1, "TPACKET version not supported.\n");
 		BUG();
 		return 0;
 	}
@@ -389,6 +480,665 @@ static inline void *packet_current_frame(struct packet_sock *po,
 	return packet_lookup_frame(po, rb, rb->head, status);
 }
 
+static void prb_del_retire_blk_timer(struct kbdq_core *pkc)
+{
+	del_timer_sync(&pkc->retire_blk_timer);
+}
+
+static void prb_shutdown_retire_blk_timer(struct packet_sock *po,
+		int tx_ring,
+		struct sk_buff_head *rb_queue)
+{
+	struct kbdq_core *pkc;
+
+	pkc = tx_ring ? &po->tx_ring.prb_bdqc : &po->rx_ring.prb_bdqc;
+
+	spin_lock(&rb_queue->lock);
+	pkc->delete_blk_timer = 1;
+	spin_unlock(&rb_queue->lock);
+
+	prb_del_retire_blk_timer(pkc);
+}
+
+static void prb_init_blk_timer(struct packet_sock *po,
+		struct kbdq_core *pkc,
+		void (*func) (unsigned long))
+{
+	init_timer(&pkc->retire_blk_timer);
+	pkc->retire_blk_timer.data = (long)po;
+	pkc->retire_blk_timer.function = func;
+	pkc->retire_blk_timer.expires = jiffies;
+}
+
+static void prb_setup_retire_blk_timer(struct packet_sock *po, int tx_ring)
+{
+	struct kbdq_core *pkc;
+
+	if (tx_ring)
+		BUG();
+
+	pkc = tx_ring ? &po->tx_ring.prb_bdqc : &po->rx_ring.prb_bdqc;
+	prb_init_blk_timer(po, pkc, prb_retire_rx_blk_timer_expired);
+}
+
+static int prb_calc_retire_blk_tmo(struct packet_sock *po,
+				int blk_size_in_bytes)
+{
+	struct net_device *dev;
+	unsigned int mbits = 0, msec = 0, div = 0, tmo = 0;
+
+	dev = dev_get_by_index(sock_net(&po->sk), po->ifindex);
+	if (unlikely(dev == NULL))
+		return DEFAULT_PRB_RETIRE_TOV;
+
+	if (dev->ethtool_ops && dev->ethtool_ops->get_settings) {
+		struct ethtool_cmd ecmd = { .cmd = ETHTOOL_GSET, };
+
+		if (!dev->ethtool_ops->get_settings(dev, &ecmd)) {
+			switch (ecmd.speed) {
+			case SPEED_10000:
+				msec = 1;
+				div = 10000/1000;
+				break;
+			case SPEED_1000:
+				msec = 1;
+				div = 1000/1000;
+				break;
+			/*
+			 * If the link speed is so slow you don't really
+			 * need to worry about perf anyways
+			 */
+			case SPEED_100:
+			case SPEED_10:
+			default:
+				return DEFAULT_PRB_RETIRE_TOV;
+			}
+		}
+	}
+
+	mbits = (blk_size_in_bytes * 8) / (1024 * 1024);
+
+	if (div)
+		mbits /= div;
+
+	tmo = mbits * msec;
+
+	if (div)
+		return tmo+1;
+	return tmo;
+}
+
+static void prb_init_ft_ops(struct kbdq_core *p1,
+			union tpacket_req_u *req_u)
+{
+	p1->feature_req_word = req_u->req3.tp_feature_req_word;
+}
+
+static void init_prb_bdqc(struct packet_sock *po,
+			struct packet_ring_buffer *rb,
+			struct pgv *pg_vec,
+			union tpacket_req_u *req_u, int tx_ring)
+{
+	struct kbdq_core *p1 = &rb->prb_bdqc;
+	struct block_desc *pbd;
+
+	memset(p1, 0x0, sizeof(*p1));
+
+	p1->knxt_seq_num = 1;
+	p1->pkbdq = pg_vec;
+	pbd = (struct block_desc *)pg_vec[0].buffer;
+	p1->pkblk_start	= (char *)pg_vec[0].buffer;
+	p1->kblk_size = req_u->req3.tp_block_size;
+	p1->knum_blocks	= req_u->req3.tp_block_nr;
+	p1->hdrlen = po->tp_hdrlen;
+	p1->version = po->tp_version;
+	p1->last_kactive_blk_num = 0;
+	po->stats_u.stats3.tp_freeze_q_cnt = 0;
+	if (req_u->req3.tp_retire_blk_tov)
+		p1->retire_blk_tov = req_u->req3.tp_retire_blk_tov;
+	else
+		p1->retire_blk_tov = prb_calc_retire_blk_tmo(po,
+						req_u->req3.tp_block_size);
+	p1->tov_in_jiffies = msecs_to_jiffies(p1->retire_blk_tov);
+	p1->blk_sizeof_priv = req_u->req3.tp_sizeof_priv;
+
+	prb_init_ft_ops(p1, req_u);
+	prb_setup_retire_blk_timer(po, tx_ring);
+	prb_open_block(p1, pbd);
+}
+
+/*  Do NOT update the last_blk_num first.
+ *  Assumes sk_buff_head lock is held.
+ */
+static void _prb_refresh_rx_retire_blk_timer(struct kbdq_core *pkc)
+{
+	mod_timer(&pkc->retire_blk_timer,
+			jiffies + pkc->tov_in_jiffies);
+	pkc->last_kactive_blk_num = pkc->kactive_blk_num;
+}
+
+/*
+ * Timer logic:
+ * 1) We refresh the timer only when we open a block.
+ *    By doing this we don't waste cycles refreshing the timer
+ *    on a packet-by-packet basis.
+ *
+ * With a 1MB block-size, on a 1Gbps line, it will take
+ * i) ~8 ms to fill a block + ii) memcpy etc.
+ * In this cut we are not accounting for the memcpy time.
+ *
+ * So, if the user sets the 'tmo' to 10ms then the timer
+ * will never fire while the block is still getting filled
+ * (which is what we want). However, the user could choose
+ * to close a block early and that's fine.
+ *
+ * But when the timer does fire, we check whether or not to refresh it.
+ * Since the tmo granularity is in msecs, it is not too expensive
+ * to refresh the timer, let's say every '8' msecs.
+ * Either the user can set the 'tmo' or we can derive it based on
+ * a) line-speed and b) block-size.
+ * prb_calc_retire_blk_tmo() calculates the tmo.
+ *
+ */
+static void prb_retire_rx_blk_timer_expired(unsigned long data)
+{
+	struct packet_sock *po = (struct packet_sock *)data;
+	struct kbdq_core *pkc = &po->rx_ring.prb_bdqc;
+	unsigned int frozen;
+	struct block_desc *pbd;
+
+	spin_lock(&po->sk.sk_receive_queue.lock);
+
+	frozen = prb_queue_frozen(pkc);
+	pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
+
+	if (unlikely(pkc->delete_blk_timer))
+		goto out;
+
+	/* We only need to plug the race when the block is partially filled.
+	 * tpacket_rcv:
+	 *		lock(); increment BLOCK_NUM_PKTS; unlock()
+	 *		copy_bits() is in progress ...
+	 *		timer fires on other cpu:
+	 *		we can't retire the current block because copy_bits
+	 *		is in progress.
+	 *
+	 */
+	if (BLOCK_NUM_PKTS(pbd)) {
+		while (atomic_read(&pkc->blk_fill_in_prog)) {
+			/* Waiting for skb_copy_bits to finish... */
+			cpu_relax();
+		}
+	}
+
+	if (pkc->last_kactive_blk_num == pkc->kactive_blk_num) {
+		if (!frozen) {
+			prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
+			if (!prb_dispatch_next_block(pkc, po))
+				goto refresh_timer;
+			else
+				goto out;
+		} else {
+			/* Case 1. Queue was frozen because user-space was
+			 *	   lagging behind.
+			 */
+			if (prb_curr_blk_in_use(pkc, pbd)) {
+				/*
+				 * Ok, user-space is still behind.
+				 * So just refresh the timer.
+				 */
+				goto refresh_timer;
+			} else {
+				/* Case 2. Queue was frozen, user-space caught
+				 * up, then the link went idle and the timer
+				 * fired. We don't have a block to close, so we
+				 * open this block. Opening a block thaws the
+				 * queue and restarts the timer as a side
+				 * effect.
+				 */
+				prb_open_block(pkc, pbd);
+				goto out;
+			}
+		}
+	}
+
+refresh_timer:
+	_prb_refresh_rx_retire_blk_timer(pkc);
+
+out:
+	spin_unlock(&po->sk.sk_receive_queue.lock);
+}
+
+static inline void prb_flush_block(struct kbdq_core *pkc1,
+		struct block_desc *pbd1, __u32 status)
+{
+	/* Flush everything minus the block header */
+
+#if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE == 1
+	u8 *start, *end;
+
+	start = (u8 *)pbd1;
+
+	/* Skip the block header (we know the header WILL fit in 4K) */
+	start += PAGE_SIZE;
+
+	end = (u8 *)PAGE_ALIGN((unsigned long)pkc1->pkblk_end);
+	for (; start < end; start += PAGE_SIZE)
+		flush_dcache_page(pgv_to_page(start));
+
+	smp_wmb();
+#endif
+
+	/* Now update the block status. */
+
+	BLOCK_STATUS(pbd1) = status;
+
+	/* Flush the block header */
+
+#if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE == 1
+	start = (u8 *)pbd1;
+	flush_dcache_page(pgv_to_page(start));
+
+	smp_wmb();
+#endif
+}
+
+/*
+ * Side effect:
+ *
+ * 1) flush the block
+ * 2) Increment active_blk_num
+ *
+ * Note: We deliberately DON'T refresh the timer here,
+ *	because almost always the next block will be opened.
+ */
+static void prb_close_block(struct kbdq_core *pkc1, struct block_desc *pbd1,
+		struct packet_sock *po, unsigned int stat)
+{
+	__u32 status = TP_STATUS_USER | stat;
+
+	struct tpacket3_hdr *last_pkt;
+	struct hdr_v1 *h1 = &pbd1->hdr.bh1;
+
+	if (po->stats.tp_drops)
+		status |= TP_STATUS_LOSING;
+
+	last_pkt = (struct tpacket3_hdr *)pkc1->prev;
+	last_pkt->tp_next_offset = 0;
+
+	/* Get the ts of the last pkt */
+	if (BLOCK_NUM_PKTS(pbd1)) {
+		h1->ts_last_pkt.ts_sec = last_pkt->tp_sec;
+		h1->ts_last_pkt.ts_nsec	= last_pkt->tp_nsec;
+	} else {
+		/* Ok, we tmo'd - so get the current time */
+		struct timespec ts;
+		getnstimeofday(&ts);
+		h1->ts_last_pkt.ts_sec = ts.tv_sec;
+		h1->ts_last_pkt.ts_nsec	= ts.tv_nsec;
+	}
+
+	smp_wmb();
+
+	/* Flush the block */
+	prb_flush_block(pkc1, pbd1, status);
+
+	pkc1->kactive_blk_num = GET_NEXT_PRB_BLK_NUM(pkc1);
+}
+
+static inline void prb_thaw_queue(struct kbdq_core *pkc)
+{
+	pkc->reset_pending_on_curr_blk = 0;
+}
+
+/*
+ * Side effect of opening a block:
+ *
+ * 1) prb_queue is thawed.
+ * 2) retire_blk_timer is refreshed.
+ *
+ */
+static void prb_open_block(struct kbdq_core *pkc1, struct block_desc *pbd1)
+{
+	struct timespec ts;
+	struct hdr_v1 *h1 = &pbd1->hdr.bh1;
+
+	smp_rmb();
+
+	if (likely(TP_STATUS_KERNEL == BLOCK_STATUS(pbd1))) {
+
+		/* We could have just memset this but we will lose the
+		 * flexibility of making the priv area sticky
+		 */
+		BLOCK_SNUM(pbd1) = pkc1->knxt_seq_num++;
+		BLOCK_NUM_PKTS(pbd1) = 0;
+		BLOCK_LEN(pbd1) = BLK_PLUS_PRIV(pkc1->blk_sizeof_priv);
+		getnstimeofday(&ts);
+		h1->ts_first_pkt.ts_sec = ts.tv_sec;
+		h1->ts_first_pkt.ts_nsec = ts.tv_nsec;
+		pkc1->pkblk_start = (char *)pbd1;
+		pkc1->nxt_offset = (char *)(pkc1->pkblk_start +
+		BLK_PLUS_PRIV(pkc1->blk_sizeof_priv));
+		BLOCK_O2FP(pbd1) = (__u32)BLK_PLUS_PRIV(pkc1->blk_sizeof_priv);
+		BLOCK_O2PRIV(pbd1) = BLK_HDR_LEN;
+		pbd1->version = pkc1->version;
+		pkc1->prev = pkc1->nxt_offset;
+		pkc1->pkblk_end = pkc1->pkblk_start + pkc1->kblk_size;
+		prb_thaw_queue(pkc1);
+		_prb_refresh_rx_retire_blk_timer(pkc1);
+
+		smp_wmb();
+
+		return;
+	}
+
+	WARN(1, "ERROR block:%p is NOT FREE status:%d kactive_blk_num:%d\n",
+		pbd1, BLOCK_STATUS(pbd1), pkc1->kactive_blk_num);
+	dump_stack();
+	BUG();
+}
+
+/*
+ * Queue freeze logic:
+ * 1) Assume tp_block_nr = 8 blocks.
+ * 2) At time 't0', user opens Rx ring.
+ * 3) Some time past 't0', kernel starts filling blocks starting from 0 .. 7
+ * 4) user-space is either sleeping or processing block '0'.
+ * 5) tpacket_rcv is currently filling block '7', since there is no space left,
+ *    it will close block-7, loop around and try to fill block '0'.
+ *    call-flow:
+ *    __packet_lookup_frame_in_block
+ *      prb_retire_current_block()
+ *      prb_dispatch_next_block()
+ *        |->(BLOCK_STATUS == USER) evaluates to true
+ *    5.1) Since block-0 is currently in-use, we just freeze the queue.
+ * 6) Now there are two cases:
+ *    6.1) Link goes idle right after the queue is frozen.
+ *         But remember, the last open_block() refreshed the timer.
+ *         When this timer expires, it will refresh itself so that we can
+ *         re-open block-0 in the near future.
+ *    6.2) Link is busy and keeps on receiving packets. This is a simple
+ *         case and __packet_lookup_frame_in_block will check if block-0
+ *         is free and can now be re-used.
+ */
+static inline void prb_freeze_queue(struct kbdq_core *pkc,
+				  struct packet_sock *po)
+{
+	pkc->reset_pending_on_curr_blk = 1;
+	po->stats_u.stats3.tp_freeze_q_cnt++;
+}
+
+#define TOTAL_PKT_LEN_INCL_ALIGN(length) (ALIGN((length), V3_ALIGNMENT))
+
+/*
+ * If the next block is free then we will dispatch it
+ * and return a good offset.
+ * Else, we will freeze the queue.
+ * So, caller must check the return value.
+ */
+static void *prb_dispatch_next_block(struct kbdq_core *pkc,
+		struct packet_sock *po)
+{
+	struct block_desc *pbd;
+
+	smp_rmb();
+
+	/* 1. Get current block num */
+	pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
+
+	/* 2. If this block is currently in_use then freeze the queue */
+	if (TP_STATUS_USER & BLOCK_STATUS(pbd)) {
+		prb_freeze_queue(pkc, po);
+		return NULL;
+	}
+
+	/*
+	 * 3.
+	 * open this block and return the offset where the first packet
+	 * needs to get stored.
+	 */
+	prb_open_block(pkc, pbd);
+	return (void *)pkc->nxt_offset;
+}
+
+static void prb_retire_current_block(struct kbdq_core *pkc,
+		struct packet_sock *po, unsigned int status)
+{
+	struct block_desc *pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
+
+	/* retire/close the current block */
+	if (likely(TP_STATUS_KERNEL == BLOCK_STATUS(pbd))) {
+		/*
+		 * Plug the case where copy_bits() is in progress on
+		 * cpu-0 and tpacket_rcv() got invoked on cpu-1, didn't
+		 * have space to copy the pkt in the current block and
+		 * called prb_retire_current_block()
+		 *
+		 * We don't need to worry about the TMO case because
+		 * the timer-handler already handled this case.
+		 */
+		if (!(status & TP_STATUS_BLK_TMO)) {
+			while (atomic_read(&pkc->blk_fill_in_prog)) {
+				/* Waiting for skb_copy_bits to finish... */
+				cpu_relax();
+			}
+		}
+		prb_close_block(pkc, pbd, po, status);
+		return;
+	}
+
+	WARN(1, "ERROR-pbd[%d]:%p\n", pkc->kactive_blk_num, pbd);
+	dump_stack();
+	BUG();
+}
+
+static inline int prb_curr_blk_in_use(struct kbdq_core *pkc,
+				      struct block_desc *pbd)
+{
+	return TP_STATUS_USER & BLOCK_STATUS(pbd);
+}
+
+static inline int prb_queue_frozen(struct kbdq_core *pkc)
+{
+	return pkc->reset_pending_on_curr_blk;
+}
+
+static inline void prb_clear_blk_fill_status(struct packet_ring_buffer *rb)
+{
+	struct kbdq_core *pkc  = GET_PBDQC_FROM_RB(rb);
+	atomic_dec(&pkc->blk_fill_in_prog);
+}
+
+static inline void prb_fill_rxhash(struct kbdq_core *pkc,
+			struct tpacket3_hdr *ppd)
+{
+	ppd->hv1.tp_rxhash = skb_get_rxhash(pkc->skb);
+}
+
+static inline void prb_clear_rxhash(struct kbdq_core *pkc,
+			struct tpacket3_hdr *ppd)
+{
+	ppd->hv1.tp_rxhash = 0;
+}
+
+static inline void prb_fill_vlan_info(struct kbdq_core *pkc,
+			struct tpacket3_hdr *ppd)
+{
+	if (vlan_tx_tag_present(pkc->skb)) {
+		ppd->hv1.tp_vlan_tci = vlan_tx_tag_get(pkc->skb);
+		ppd->tp_status = TP_STATUS_VLAN_VALID;
+	} else {
+		ppd->hv1.tp_vlan_tci = ppd->tp_status = 0;
+	}
+}
+
+static void prb_run_all_ft_ops(struct kbdq_core *pkc,
+			struct tpacket3_hdr *ppd)
+{
+	prb_fill_vlan_info(pkc, ppd);
+
+	if (pkc->feature_req_word & TP_FT_REQ_FILL_RXHASH)
+		prb_fill_rxhash(pkc, ppd);
+	else
+		prb_clear_rxhash(pkc, ppd);
+}
+
+static inline void prb_fill_curr_block(char *curr, struct kbdq_core *pkc,
+				struct block_desc *pbd,
+				unsigned int len)
+{
+	struct tpacket3_hdr *ppd;
+
+	ppd  = (struct tpacket3_hdr *)curr;
+	ppd->tp_next_offset = TOTAL_PKT_LEN_INCL_ALIGN(len);
+	pkc->prev = curr;
+	pkc->nxt_offset += TOTAL_PKT_LEN_INCL_ALIGN(len);
+	BLOCK_LEN(pbd) += TOTAL_PKT_LEN_INCL_ALIGN(len);
+	BLOCK_NUM_PKTS(pbd) += 1;
+	atomic_inc(&pkc->blk_fill_in_prog);
+	prb_run_all_ft_ops(pkc, ppd);
+}
+
+/* Assumes caller holds sk->sk_receive_queue.lock */
+static void *__packet_lookup_frame_in_block(struct packet_sock *po,
+					    struct sk_buff *skb,
+						int status,
+					    unsigned int len
+					    )
+{
+	struct kbdq_core *pkc;
+	struct block_desc *pbd;
+	char *curr, *end;
+
+	pkc = GET_PBDQC_FROM_RB(((struct packet_ring_buffer *)&po->rx_ring));
+	pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
+
+	/* Queue is frozen when user space is lagging behind */
+	if (prb_queue_frozen(pkc)) {
+		/*
+		 * Check if that last block which caused the queue to freeze,
+		 * is still in_use by user-space.
+		 */
+		if (prb_curr_blk_in_use(pkc, pbd)) {
+			/* Can't record this packet */
+			return NULL;
+		} else {
+			/*
+			 * Ok, the block was released by user-space.
+			 * Now let's open that block.
+			 * opening a block also thaws the queue.
+			 * Thawing is a side effect.
+			 */
+			prb_open_block(pkc, pbd);
+		}
+	}
+
+	smp_mb();
+	curr = pkc->nxt_offset;
+	pkc->skb = skb;
+	end = (char *) ((char *)pbd + pkc->kblk_size);
+
+	/* first try the current block */
+	if (curr+TOTAL_PKT_LEN_INCL_ALIGN(len) < end) {
+		prb_fill_curr_block(curr, pkc, pbd, len);
+		return (void *)curr;
+	}
+
+	/* Ok, close the current block */
+	prb_retire_current_block(pkc, po, 0);
+
+	/* Now, try to dispatch the next block */
+	curr = (char *)prb_dispatch_next_block(pkc, po);
+	if (curr) {
+		pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
+		prb_fill_curr_block(curr, pkc, pbd, len);
+		return (void *)curr;
+	}
+
+	/*
+	 * No free blocks are available. User-space hasn't caught up yet.
+	 * Queue was just frozen and now this packet will get dropped.
+	 */
+	return NULL;
+}
+
+static inline void *packet_current_rx_frame(struct packet_sock *po,
+					    struct sk_buff *skb,
+					    int status, unsigned int len)
+{
+	char *curr = NULL;
+	switch (po->tp_version) {
+	case TPACKET_V1:
+	case TPACKET_V2:
+		curr = packet_lookup_frame(po, &po->rx_ring,
+					po->rx_ring.head, status);
+		return curr;
+	case TPACKET_V3:
+		return __packet_lookup_frame_in_block(po, skb, status, len);
+	default:
+		WARN(1, "TPACKET version not supported\n");
+		BUG();
+		return 0;
+	}
+}
+
+static inline void *prb_lookup_block(struct packet_sock *po,
+				     struct packet_ring_buffer *rb,
+				     unsigned int previous,
+				     int status)
+{
+	struct kbdq_core *pkc  = GET_PBDQC_FROM_RB(rb);
+	struct block_desc *pbd = GET_PBLOCK_DESC(pkc, previous);
+
+	if (status != BLOCK_STATUS(pbd))
+		return NULL;
+	return pbd;
+}
+
+static inline int prb_previous_blk_num(struct packet_ring_buffer *rb)
+{
+	unsigned int prev;
+	if (rb->prb_bdqc.kactive_blk_num)
+		prev = rb->prb_bdqc.kactive_blk_num-1;
+	else
+		prev = rb->prb_bdqc.knum_blocks-1;
+	return prev;
+}
+
+/* Assumes caller holds sk->sk_receive_queue.lock */
+static inline void *__prb_previous_block(struct packet_sock *po,
+					 struct packet_ring_buffer *rb,
+					 int status)
+{
+	unsigned int previous = prb_previous_blk_num(rb);
+	return prb_lookup_block(po, rb, previous, status);
+}
+
+static inline void *packet_previous_rx_frame(struct packet_sock *po,
+					     struct packet_ring_buffer *rb,
+					     int status)
+{
+	if (po->tp_version <= TPACKET_V2)
+		return packet_previous_frame(po, rb, status);
+
+	return __prb_previous_block(po, rb, status);
+}
+
+static inline void packet_increment_rx_head(struct packet_sock *po,
+					    struct packet_ring_buffer *rb)
+{
+	switch (po->tp_version) {
+	case TPACKET_V1:
+	case TPACKET_V2:
+		return packet_increment_head(rb);
+	case TPACKET_V3:
+	default:
+		WARN(1, "TPACKET version not supported.\n");
+		BUG();
+		return;
+	}
+}
+
 static inline void *packet_previous_frame(struct packet_sock *po,
 		struct packet_ring_buffer *rb,
 		int status)
@@ -982,12 +1732,13 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 	union {
 		struct tpacket_hdr *h1;
 		struct tpacket2_hdr *h2;
+		struct tpacket3_hdr *h3;
 		void *raw;
 	} h;
 	u8 *skb_head = skb->data;
 	int skb_len = skb->len;
 	unsigned int snaplen, res;
-	unsigned long status = TP_STATUS_LOSING|TP_STATUS_USER;
+	unsigned long status = TP_STATUS_USER;
 	unsigned short macoff, netoff, hdrlen;
 	struct sk_buff *copy_skb = NULL;
 	struct timeval tv;
@@ -1033,37 +1784,46 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 			po->tp_reserve;
 		macoff = netoff - maclen;
 	}
-
-	if (macoff + snaplen > po->rx_ring.frame_size) {
-		if (po->copy_thresh &&
-		    atomic_read(&sk->sk_rmem_alloc) + skb->truesize <
-		    (unsigned)sk->sk_rcvbuf) {
-			if (skb_shared(skb)) {
-				copy_skb = skb_clone(skb, GFP_ATOMIC);
-			} else {
-				copy_skb = skb_get(skb);
-				skb_head = skb->data;
+	if (po->tp_version <= TPACKET_V2) {
+		if (macoff + snaplen > po->rx_ring.frame_size) {
+			if (po->copy_thresh &&
+				atomic_read(&sk->sk_rmem_alloc) + skb->truesize
+				< (unsigned)sk->sk_rcvbuf) {
+				if (skb_shared(skb)) {
+					copy_skb = skb_clone(skb, GFP_ATOMIC);
+				} else {
+					copy_skb = skb_get(skb);
+					skb_head = skb->data;
+				}
+				if (copy_skb)
+					skb_set_owner_r(copy_skb, sk);
 			}
-			if (copy_skb)
-				skb_set_owner_r(copy_skb, sk);
+			snaplen = po->rx_ring.frame_size - macoff;
+			if ((int)snaplen < 0)
+				snaplen = 0;
 		}
-		snaplen = po->rx_ring.frame_size - macoff;
-		if ((int)snaplen < 0)
-			snaplen = 0;
 	}
-
 	spin_lock(&sk->sk_receive_queue.lock);
-	h.raw = packet_current_frame(po, &po->rx_ring, TP_STATUS_KERNEL);
+	h.raw = packet_current_rx_frame(po, skb,
+					TP_STATUS_KERNEL, (macoff+snaplen));
 	if (!h.raw)
 		goto ring_is_full;
-	packet_increment_head(&po->rx_ring);
+	if (po->tp_version <= TPACKET_V2) {
+		packet_increment_rx_head(po, &po->rx_ring);
+	/*
+	 * LOSING will be reported till you read the stats,
+	 * because it's COR - Clear On Read.
+	 * Anyways, moving it for V1/V2 only as V3 doesn't need this
+	 * at packet level.
+	 */
+		if (po->stats.tp_drops)
+			status |= TP_STATUS_LOSING;
+	}
 	po->stats.tp_packets++;
 	if (copy_skb) {
 		status |= TP_STATUS_COPY;
 		__skb_queue_tail(&sk->sk_receive_queue, copy_skb);
 	}
-	if (!po->stats.tp_drops)
-		status &= ~TP_STATUS_LOSING;
 	spin_unlock(&sk->sk_receive_queue.lock);
 
 	skb_copy_bits(skb, 0, h.raw + macoff, snaplen);
@@ -1114,6 +1874,29 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 		h.h2->tp_padding = 0;
 		hdrlen = sizeof(*h.h2);
 		break;
+	case TPACKET_V3:
+		/* tp_next_offset and the VLAN fields are already populated
+		 * above, so DON'T clear those fields here.
+		 */
+		h.h3->tp_status |= status;
+		h.h3->tp_len = skb->len;
+		h.h3->tp_snaplen = snaplen;
+		h.h3->tp_mac = macoff;
+		h.h3->tp_net = netoff;
+		if ((po->tp_tstamp & SOF_TIMESTAMPING_SYS_HARDWARE)
+				&& shhwtstamps->syststamp.tv64)
+			ts = ktime_to_timespec(shhwtstamps->syststamp);
+		else if ((po->tp_tstamp & SOF_TIMESTAMPING_RAW_HARDWARE)
+				&& shhwtstamps->hwtstamp.tv64)
+			ts = ktime_to_timespec(shhwtstamps->hwtstamp);
+		else if (skb->tstamp.tv64)
+			ts = ktime_to_timespec(skb->tstamp);
+		else
+			getnstimeofday(&ts);
+		h.h3->tp_sec  = ts.tv_sec;
+		h.h3->tp_nsec = ts.tv_nsec;
+		hdrlen = sizeof(*h.h3);
+		break;
 	default:
 		BUG();
 	}
@@ -1134,13 +1917,19 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 	{
 		u8 *start, *end;
 
-		end = (u8 *)PAGE_ALIGN((unsigned long)h.raw + macoff + snaplen);
-		for (start = h.raw; start < end; start += PAGE_SIZE)
-			flush_dcache_page(pgv_to_page(start));
+		if (po->tp_version <= TPACKET_V2) {
+			end = (u8 *)PAGE_ALIGN((unsigned long)h.raw
+				+ macoff + snaplen);
+			for (start = h.raw; start < end; start += PAGE_SIZE)
+				flush_dcache_page(pgv_to_page(start));
+		}
 		smp_wmb();
 	}
 #endif
-	__packet_set_status(po, h.raw, status);
+	if (po->tp_version <= TPACKET_V2)
+		__packet_set_status(po, h.raw, status);
+	else
+		prb_clear_blk_fill_status(&po->rx_ring);
 
 	sk->sk_data_ready(sk, 0);
 
@@ -1631,7 +2420,7 @@ static int packet_release(struct socket *sock)
 	struct sock *sk = sock->sk;
 	struct packet_sock *po;
 	struct net *net;
-	struct tpacket_req req;
+	union tpacket_req_u req_u;
 
 	if (!sk)
 		return 0;
@@ -1654,13 +2443,13 @@ static int packet_release(struct socket *sock)
 
 	packet_flush_mclist(sk);
 
-	memset(&req, 0, sizeof(req));
+	memset(&req_u, 0, sizeof(req_u));
 
 	if (po->rx_ring.pg_vec)
-		packet_set_ring(sk, &req, 1, 0);
+		packet_set_ring(sk, &req_u, 1, 0);
 
 	if (po->tx_ring.pg_vec)
-		packet_set_ring(sk, &req, 1, 1);
+		packet_set_ring(sk, &req_u, 1, 1);
 
 	fanout_release(sk);
 
@@ -2280,15 +3069,27 @@ packet_setsockopt(struct socket *sock, int level, int optname, char __user *optv
 	case PACKET_RX_RING:
 	case PACKET_TX_RING:
 	{
-		struct tpacket_req req;
+		union tpacket_req_u req_u;
+		int len;
 
-		if (optlen < sizeof(req))
+		switch (po->tp_version) {
+		case TPACKET_V1:
+		case TPACKET_V2:
+			len = sizeof(req_u.req);
+			break;
+		case TPACKET_V3:
+		default:
+			len = sizeof(req_u.req3);
+			break;
+		}
+		if (optlen < len)
 			return -EINVAL;
 		if (pkt_sk(sk)->has_vnet_hdr)
 			return -EINVAL;
-		if (copy_from_user(&req, optval, sizeof(req)))
+		if (copy_from_user(&req_u.req, optval, len))
 			return -EFAULT;
-		return packet_set_ring(sk, &req, 0, optname == PACKET_TX_RING);
+		return packet_set_ring(sk, &req_u, 0,
+			optname == PACKET_TX_RING);
 	}
 	case PACKET_COPY_THRESH:
 	{
@@ -2315,6 +3116,7 @@ packet_setsockopt(struct socket *sock, int level, int optname, char __user *optv
 		switch (val) {
 		case TPACKET_V1:
 		case TPACKET_V2:
+		case TPACKET_V3:
 			po->tp_version = val;
 			return 0;
 		default:
@@ -2424,6 +3226,7 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
 	struct packet_sock *po = pkt_sk(sk);
 	void *data;
 	struct tpacket_stats st;
+	union tpacket_stats_u st_u;
 
 	if (level != SOL_PACKET)
 		return -ENOPROTOOPT;
@@ -2436,15 +3239,27 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
 
 	switch (optname) {
 	case PACKET_STATISTICS:
-		if (len > sizeof(struct tpacket_stats))
-			len = sizeof(struct tpacket_stats);
+		if (po->tp_version == TPACKET_V3) {
+			len = sizeof(struct tpacket_stats_v3);
+		} else {
+			if (len > sizeof(struct tpacket_stats))
+				len = sizeof(struct tpacket_stats);
+		}
 		spin_lock_bh(&sk->sk_receive_queue.lock);
-		st = po->stats;
+		if (po->tp_version == TPACKET_V3) {
+			memcpy(&st_u.stats3, &po->stats,
+			sizeof(struct tpacket_stats));
+			st_u.stats3.tp_freeze_q_cnt =
+			po->stats_u.stats3.tp_freeze_q_cnt;
+			st_u.stats3.tp_packets += po->stats.tp_drops;
+			data = &st_u.stats3;
+		} else {
+			st = po->stats;
+			st.tp_packets += st.tp_drops;
+			data = &st;
+		}
 		memset(&po->stats, 0, sizeof(st));
 		spin_unlock_bh(&sk->sk_receive_queue.lock);
-		st.tp_packets += st.tp_drops;
-
-		data = &st;
 		break;
 	case PACKET_AUXDATA:
 		if (len > sizeof(int))
@@ -2485,6 +3300,9 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
 		case TPACKET_V2:
 			val = sizeof(struct tpacket2_hdr);
 			break;
+		case TPACKET_V3:
+			val = sizeof(struct tpacket3_hdr);
+			break;
 		default:
 			return -EINVAL;
 		}
@@ -2641,7 +3459,8 @@ static unsigned int packet_poll(struct file *file, struct socket *sock,
 
 	spin_lock_bh(&sk->sk_receive_queue.lock);
 	if (po->rx_ring.pg_vec) {
-		if (!packet_previous_frame(po, &po->rx_ring, TP_STATUS_KERNEL))
+		if (!packet_previous_rx_frame(po, &po->rx_ring,
+			TP_STATUS_KERNEL))
 			mask |= POLLIN | POLLRDNORM;
 	}
 	spin_unlock_bh(&sk->sk_receive_queue.lock);
@@ -2760,7 +3579,7 @@ out_free_pgvec:
 	goto out;
 }
 
-static int packet_set_ring(struct sock *sk, struct tpacket_req *req,
+static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
 		int closing, int tx_ring)
 {
 	struct pgv *pg_vec = NULL;
@@ -2769,7 +3588,15 @@ static int packet_set_ring(struct sock *sk, struct tpacket_req *req,
 	struct packet_ring_buffer *rb;
 	struct sk_buff_head *rb_queue;
 	__be16 num;
-	int err;
+	int err = -EINVAL;
+	/* Added to minimize code churn */
+	struct tpacket_req *req = &req_u->req;
+
+	/* Opening a Tx-ring is NOT supported in TPACKET_V3 */
+	if (!closing && tx_ring && (po->tp_version > TPACKET_V2)) {
+		WARN(1, "Tx-ring is not supported.\n");
+		goto out;
+	}
 
 	rb = tx_ring ? &po->tx_ring : &po->rx_ring;
 	rb_queue = tx_ring ? &sk->sk_write_queue : &sk->sk_receive_queue;
@@ -2795,6 +3622,9 @@ static int packet_set_ring(struct sock *sk, struct tpacket_req *req,
 		case TPACKET_V2:
 			po->tp_hdrlen = TPACKET2_HDRLEN;
 			break;
+		case TPACKET_V3:
+			po->tp_hdrlen = TPACKET3_HDRLEN;
+			break;
 		}
 
 		err = -EINVAL;
@@ -2820,6 +3650,17 @@ static int packet_set_ring(struct sock *sk, struct tpacket_req *req,
 		pg_vec = alloc_pg_vec(req, order);
 		if (unlikely(!pg_vec))
 			goto out;
+		switch (po->tp_version) {
+		case TPACKET_V3:
+		/* Transmit path is not supported. We checked
+		 * it above but just being paranoid
+		 */
+			if (!tx_ring)
+				init_prb_bdqc(po, rb, pg_vec, req_u, tx_ring);
+			break;
+		default:
+			break;
+		}
 	}
 	/* Done */
 	else {
@@ -2872,7 +3713,11 @@ static int packet_set_ring(struct sock *sk, struct tpacket_req *req,
 		register_prot_hook(sk);
 	}
 	spin_unlock(&po->bind_lock);
-
+	if (closing && (po->tp_version > TPACKET_V2)) {
+		/* Because we don't support block-based V3 on tx-ring */
+		if (!tx_ring)
+			prb_shutdown_retire_blk_timer(po, tx_ring, rb_queue);
+	}
 	release_sock(sk);
 
 	if (pg_vec)
-- 
1.7.5.2


^ permalink raw reply related
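
The retire-timeout arithmetic in prb_calc_retire_blk_tmo() above can be
illustrated outside the kernel. What follows is only a minimal user-space
sketch, not the patch's code: sketch_retire_tmo_ms() and
SKETCH_DEFAULT_TOV_MS are hypothetical names, and the 8 ms fallback is an
assumption, because the value of DEFAULT_PRB_RETIRE_TOV is not shown in
this excerpt. It models only the path where the link speed is known.

```c
#include <assert.h>
#include <limits.h>

/* Assumed fallback; stands in for DEFAULT_PRB_RETIRE_TOV (value not shown). */
#define SKETCH_DEFAULT_TOV_MS 8u

/*
 * Hypothetical mirror of the timeout arithmetic: block size in megabits,
 * divided by the link speed in Gbps, plus 1 ms of slack.
 */
static unsigned int sketch_retire_tmo_ms(unsigned int blk_size_bytes,
					 unsigned int speed_mbps)
{
	unsigned int mbits, div;

	/* Keep the multiplication by 8 below from overflowing. */
	assert(blk_size_bytes <= UINT_MAX / 8);

	switch (speed_mbps) {
	case 10000:
		div = 10;
		break;
	case 1000:
		div = 1;
		break;
	default:
		/* Slow or unknown links fall back to the default. */
		return SKETCH_DEFAULT_TOV_MS;
	}

	mbits = (blk_size_bytes * 8) / (1024 * 1024);
	return mbits / div + 1;
}
```

With a 1MB block this yields 9 ms at 1Gbps (the ~8 ms fill time from the
"Timer logic" comment plus 1 ms) and 1 ms at 10Gbps, because the integer
division truncates 8/10 to 0.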

* [PATCH net-next v5 1/2] af-packet: Added TPACKET_V3 headers.
From: Chetan Loke @ 2011-08-19 20:18 UTC (permalink / raw)
  To: netdev, davem; +Cc: Chetan Loke
In-Reply-To: <1313785096-911-1-git-send-email-loke.chetan@gmail.com>

Added TPACKET_V3 definitions.

Signed-off-by: Chetan Loke <loke.chetan@gmail.com>
---
 include/linux/if_packet.h |  119 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 119 insertions(+), 0 deletions(-)

diff --git a/include/linux/if_packet.h b/include/linux/if_packet.h
index c148606..5926d59 100644
--- a/include/linux/if_packet.h
+++ b/include/linux/if_packet.h
@@ -61,6 +61,17 @@ struct tpacket_stats {
 	unsigned int	tp_drops;
 };
 
+struct tpacket_stats_v3 {
+	unsigned int	tp_packets;
+	unsigned int	tp_drops;
+	unsigned int	tp_freeze_q_cnt;
+};
+
+union tpacket_stats_u {
+	struct tpacket_stats stats1;
+	struct tpacket_stats_v3 stats3;
+};
+
 struct tpacket_auxdata {
 	__u32		tp_status;
 	__u32		tp_len;
@@ -78,6 +89,7 @@ struct tpacket_auxdata {
 #define TP_STATUS_LOSING	0x4
 #define TP_STATUS_CSUMNOTREADY	0x8
 #define TP_STATUS_VLAN_VALID   0x10 /* auxdata has valid tp_vlan_tci */
+#define TP_STATUS_BLK_TMO	0x20
 
 /* Tx ring - header status */
 #define TP_STATUS_AVAILABLE	0x0
@@ -85,6 +97,9 @@ struct tpacket_auxdata {
 #define TP_STATUS_SENDING	0x2
 #define TP_STATUS_WRONG_FORMAT	0x4
 
+/* Rx ring - feature request bits */
+#define TP_FT_REQ_FILL_RXHASH	0x1
+
 struct tpacket_hdr {
 	unsigned long	tp_status;
 	unsigned int	tp_len;
@@ -111,11 +126,100 @@ struct tpacket2_hdr {
 	__u16		tp_padding;
 };
 
+struct hdr_variant1 {
+	__u32	tp_rxhash;
+	__u32	tp_vlan_tci;
+};
+
+struct tpacket3_hdr {
+	__u32		tp_next_offset;
+	__u32		tp_sec;
+	__u32		tp_nsec;
+	__u32		tp_snaplen;
+	__u32		tp_len;
+	__u32		tp_status;
+	__u16		tp_mac;
+	__u16		tp_net;
+	/* pkt_hdr variants */
+	union {
+		struct hdr_variant1 hv1;
+	};
+};
+
+struct bd_ts {
+	unsigned int ts_sec;
+	union {
+		unsigned int ts_usec;
+		unsigned int ts_nsec;
+	};
+};
+
+struct hdr_v1 {
+	__u32	block_status;
+	__u32	num_pkts;
+	__u32	offset_to_first_pkt;
+
+	/* Number of valid bytes (including padding)
+	 * blk_len <= tp_block_size
+	 */
+	__u32	blk_len;
+
+	/*
+	 * Quite a few uses of sequence number:
+	 * 1. Make sure cache flush etc worked.
+	 *    Well, one can argue - why not use the increasing ts below?
+	 *    But look at 2. below first.
+	 * 2. When you pass around blocks to other user space decoders,
+	 *    you can see which blk[s] is[are] outstanding etc.
+	 * 3. Validate kernel code.
+	 */
+	aligned_u64	seq_num;
+
+	/*
+	 * ts_last_pkt:
+	 *
+	 * Case 1.	Block has 'N' (N >= 1) packets and TMO'd (timed out)
+	 *		ts_last_pkt == 'time-stamp of last packet' and NOT the
+	 *		time when the timer fired and the block was closed.
+	 *		By providing the ts of the last packet we can absolutely
+	 *		guarantee that time-stamp wise, the first packet in the
+	 *		next block will never precede the last packet of the
+	 *		previous block.
+	 * Case 2.	Block has zero packets and TMO'd
+	 *		ts_last_pkt = time when the timer fired and the block
+	 *		was closed.
+	 * Case 3.	Block has 'N' packets and NO TMO.
+	 *		ts_last_pkt = time-stamp of the last pkt in the block.
+	 *
+	 * ts_first_pkt:
+	 *		Is always the time-stamp when the block was opened.
+	 *		Case a)	ZERO packets
+	 *			No packets to deal with but at least you know the
+	 *			time-interval of this block.
+	 *		Case b) Non-zero packets
+	 *			Use the ts of the first packet in the block.
+	 *
+	 */
+	struct bd_ts	ts_first_pkt, ts_last_pkt;
+};
+
+union bd_header_u {
+	struct hdr_v1 bh1;
+};
+
+struct block_desc {
+	__u32 version;
+	__u32 offset_to_priv;
+	union bd_header_u hdr;
+};
+
 #define TPACKET2_HDRLEN		(TPACKET_ALIGN(sizeof(struct tpacket2_hdr)) + sizeof(struct sockaddr_ll))
+#define TPACKET3_HDRLEN		(TPACKET_ALIGN(sizeof(struct tpacket3_hdr)) + sizeof(struct sockaddr_ll))
 
 enum tpacket_versions {
 	TPACKET_V1,
 	TPACKET_V2,
+	TPACKET_V3
 };
 
 /*
@@ -138,6 +242,21 @@ struct tpacket_req {
 	unsigned int	tp_frame_nr;	/* Total number of frames */
 };
 
+struct tpacket_req3 {
+	unsigned int	tp_block_size;	/* Minimal size of contiguous block */
+	unsigned int	tp_block_nr;	/* Number of blocks */
+	unsigned int	tp_frame_size;	/* Size of frame */
+	unsigned int	tp_frame_nr;	/* Total number of frames */
+	unsigned int	tp_retire_blk_tov; /* timeout in msecs */
+	unsigned int	tp_sizeof_priv; /* offset to private data area */
+	unsigned int	tp_feature_req_word;
+};
+
+union tpacket_req_u {
+	struct tpacket_req	req;
+	struct tpacket_req3	req3;
+};
+
 struct packet_mreq {
 	int		mr_ifindex;
 	unsigned short	mr_type;
-- 
1.7.5.2
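
The comments in the hunk above describe how seq_num, ts_first_pkt and
ts_last_pkt are meant to be used. A user-space consumer could check them
roughly as follows. This is only a sketch: struct blk_info and
first_inconsistent() are hypothetical names, the timestamps are flattened to
nanoseconds instead of using struct bd_ts, and the assumption that seq_num
advances by exactly one per block is not something the patch states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block summary copied out of a block header by the
 * consumer. Not part of the patch. */
struct blk_info {
	uint64_t seq_num;	/* block sequence number */
	uint64_t ts_first_ns;	/* ts_first_pkt, flattened to nanoseconds */
	uint64_t ts_last_ns;	/* ts_last_pkt, flattened to nanoseconds */
};

/*
 * Return the index of the first block that violates the expected ordering,
 * or n if all n blocks are consistent.
 *
 * Checks performed:
 *  - within a block, ts_last must not precede ts_first
 *  - seq_num is assumed (not confirmed by the patch) to grow by one
 *  - the next block must not start before the previous block's ts_last
 */
static size_t first_inconsistent(const struct blk_info *b, size_t n)
{
	size_t i;

	assert(b != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (b[i].ts_last_ns < b[i].ts_first_ns)
			return i;
		if (i == 0)
			continue;
		if (b[i].seq_num != b[i - 1].seq_num + 1)
			return i;
		if (b[i].ts_first_ns < b[i - 1].ts_last_ns)
			return i;
	}
	return n;
}
```

A decoder that hands blocks to other threads could run such a check when the
blocks come back, which is roughly use 2 in the seq_num comment.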


^ permalink raw reply related

* [PATCH net-next v5 0/2] af-packet: Enhance af-packet to provide a flexible mmap ring buffer scheme.
From: Chetan Loke @ 2011-08-19 20:18 UTC (permalink / raw)
  To: netdev, davem; +Cc: Chetan Loke

Changes in v5:
1) Provide accurate patch description.			(Dave Miller)
   Tightened up patch descriptions.
2) Replace indirect calls with inline tests.		(Dave Miller)
3) Use distinct subject-line per patch.			(Dave Miller)

Changes in v4:
1) Used ALIGN macro                                     (Joe Perches)
2) Deleted duplicate field                              (Eric Dumazet)
3) Re-aligned tpacket fields for disk-capture

Changes in v3:
1) Stripped __packed__ attribute.                       (Dave Miller)
   Replaced with aligned_u64 and padding.
2) Added 'feature_request_word'.
3) Added rx_hash field to the v3-header.                (Chetan L)

Changes in v2:

1) Aligned bdqc members, pr_err to WARN, sob email      (Joe Perches)
2) Added tp_padding                                     (Eric Dumazet)
3) Nuked useless ;) white space                         (Stephen H)
4) Use __u types in headers                             (Ben Hutchings)
5) Added field for creating private area                (Chetan Loke)

Enhanced af-packet to provide a flexible mmap ring buffer scheme by:
A) eliminating fixed frame-size requirement.
B) providing block-level read/poll.

Benefits:
  B1) ~15-20% reduction in cpu-usage.
  B2) ~20% increase in packet capture rate.
  B3) ~2x increase in packet density (higher capture visibility).
  B4) Capture entire packet payload.
  B5) Captures 99% of 64-byte traffic as seen by the kernel.

Detailed description of the enhancement-need/test-setup/etc can be viewed at:
http://thread.gmane.org/gmane.linux.kernel/1158216

Test-suite:
git://lolpcap.git.sourceforge.net/gitroot/lolpcap/lolpcap

----------------------------------------

 include/linux/if_packet.h |  119 ++++++
 net/packet/af_packet.c    |  937 ++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 1010 insertions(+), 46 deletions(-)

-- 
1.7.5.2
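
For readers who want to see how the block-based request might be filled in,
here is a minimal sketch. To compile without patched headers it uses
struct req3_sketch, a local copy of the struct tpacket_req3 fields from the
if_packet.h patch in this series. fill_req3() is a hypothetical helper. The
rule that tp_frame_nr must equal frames-per-block times tp_block_nr is
carried over from the existing ring setup checks and is assumed to still
apply here.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Local mirror of struct tpacket_req3 as posted in this series. */
struct req3_sketch {
	unsigned int	tp_block_size;	/* Minimal size of contiguous block */
	unsigned int	tp_block_nr;	/* Number of blocks */
	unsigned int	tp_frame_size;	/* Size of frame */
	unsigned int	tp_frame_nr;	/* Total number of frames */
	unsigned int	tp_retire_blk_tov; /* timeout in msecs */
	unsigned int	tp_sizeof_priv; /* offset to private data area */
	unsigned int	tp_feature_req_word;
};

/*
 * Hypothetical helper: fill in a request so that the frame count is
 * consistent with the block geometry. Returns 0 on success, -1 if the
 * sizes cannot describe a valid ring. Page alignment of block_size and
 * frame alignment are NOT checked in this sketch.
 */
static int fill_req3(struct req3_sketch *r, unsigned int block_size,
		     unsigned int block_nr, unsigned int frame_size,
		     unsigned int retire_msecs)
{
	unsigned int frames_per_block;

	if (!r || !block_size || !block_nr || !frame_size)
		return -1;
	if (frame_size > block_size)
		return -1;

	frames_per_block = block_size / frame_size;
	if (frames_per_block > UINT_MAX / block_nr)
		return -1;	/* frame count would overflow */

	memset(r, 0, sizeof(*r));	/* no private area, no feature bits */
	r->tp_block_size = block_size;
	r->tp_block_nr = block_nr;
	r->tp_frame_size = frame_size;
	r->tp_frame_nr = frames_per_block * block_nr;
	r->tp_retire_blk_tov = retire_msecs;

	assert(r->tp_frame_nr >= block_nr);
	return 0;
}
```

With the real header, the filled-in request would presumably be handed to
setsockopt() with PACKET_RX_RING after selecting TPACKET_V3 through
PACKET_VERSION, the same way the earlier versions are set up.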

^ permalink raw reply

* Re: [PATCH] net: add Documentation/networking/scaling.txt
From: Will de Bruijn @ 2011-08-19 19:50 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Rick Jones, Tom Herbert, rdunlap, linux-doc, davem, netdev
In-Reply-To: <1313524404.2725.50.camel@bwh-desktop>

> As I'm sure you're aware, there is often a trade-off between throughput
> and latency.  It might be useful to provide some guidelines for
> optimising each of the above.

The suggested configuration section on RSS already gives some vague
heuristics. In general, I hesitate to write down best guesses, and I
lack the hard experimental data that would make such advice more
sound.

> The default affinity for most IRQs is all-CPUs.  At least on x86, that
> really means CPU 0 only, so far as I can see.

Yes, I believe so, but since the specified behavior is all-CPUs and
other architectures are free to implement this differently, I prefer
to leave the weaker statement. Also, even on x86 there is the
possibility of migration, so starting on CPU0 is not equivalent to
having the affinity set to that CPU.
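
To make the distinction concrete: the affinity setting is exposed as a hex
bitmask (for example in /proc/irq/<N>/smp_affinity), and a mask with several
bits set only says where the interrupt may run, not where it does run. A
small sketch that counts the CPUs allowed by such a mask string follows.
affinity_cpu_count() is a hypothetical helper, and the comma-separated
grouping it accepts is an assumption about the file format on large systems.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Count the CPUs set in a hex affinity mask string such as "ff" or
 * "00000000,00000003". Commas and a trailing newline are skipped.
 * Returns -1 if any other non-hex character is found.
 */
static int affinity_cpu_count(const char *mask)
{
	int count = 0;

	assert(mask != NULL);

	for (; *mask; mask++) {
		unsigned char c = (unsigned char)*mask;
		int v;

		if (c == ',' || c == '\n')
			continue;
		if (!isxdigit(c))
			return -1;
		v = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
		for (; v; v &= v - 1)	/* clear lowest set bit */
			count++;
	}
	return count;
}
```

A result greater than one corresponds to the "all-CPUs" style of setting
discussed above. It does not tell you that interrupts are actually spread
over those CPUs.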

>> Requires that ntuple filtering be enabled?
> [...]
>
> As a matter of fact, n-tuple filtering is enabled by default where
> available.  So it might actually make more sense to say that RFS
> acceleration can be *disabled* by disabling n-tuple filtering using
> ethtool.

I didn't know (or check, clearly). I'll clarify that in the next
revision. Since the text is still factually correct and I don't want
to spam the list with frequent minor changes, I'll batch them.

Thanks, Ben.

^ permalink raw reply

* Re: Linux vs FreeBSD Which is correct.
From: Stephen Clark @ 2011-08-19 19:10 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Pascal Hambourg, Rémi Denis-Courmont,
	Linux Kernel Network Developers
In-Reply-To: <4E4E8CEE.102@genband.com>

On 08/19/2011 12:18 PM, Chris Friesen wrote:
> On 08/18/2011 06:42 AM, Stephen Clark wrote:
>
>> I guess I don't really understand what reverse path filter stuff is all
>> about, much less making it weaker.
>> But using 2 made the ping responses show up.
>
> It's described in RFC3704.  The idea is to block spoofed packets.
>
> From Documentation/networking/ip-sysctl.txt:
>
> rp_filter - INTEGER
> 0 - No source validation.
> 1 - Strict mode as defined in RFC3704 Strict Reverse Path
>     Each incoming packet is tested against the FIB and if the interface
>     is not the best reverse path the packet check will fail.
>     By default failed packets are discarded.
> 2 - Loose mode as defined in RFC3704 Loose Reverse Path
>     Each incoming packet's source address is also tested against the FIB
>     and if the source address is not reachable via any interface
>     the packet check will fail.
>
>    Current recommended practice in RFC3704 is to enable strict mode
>    to prevent IP spoofing from DDoS attacks. If using asymmetric routing
>    or other complicated routing, then loose mode is recommended.
>
>    The max value from conf/{all,interface}/rp_filter is used
>    when doing source validation on the {interface}.
>
>    Default value is 0. Note that some distributions enable it
>    in startup scripts.
>
>
>
Thanks for taking the time to explain this. Much appreciated.
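
For anyone else reading along, the "max value" rule quoted above can be
sketched in a few lines. effective_rp_filter() and read_sysctl_int() are
hypothetical helper names. The /proc paths in the usage note are the usual
sysctl locations, and "eth0" is just a placeholder interface.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Effective mode used for source validation on an interface: the larger
 * of conf/all/rp_filter and conf/<interface>/rp_filter, per the quoted
 * ip-sysctl.txt text.
 */
static int effective_rp_filter(int all_val, int iface_val)
{
	assert(all_val >= 0 && iface_val >= 0);
	return all_val > iface_val ? all_val : iface_val;
}

/* Read one integer sysctl from a /proc path; returns -1 if unreadable. */
static int read_sysctl_int(const char *path)
{
	FILE *f = fopen(path, "r");
	int v = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &v) != 1)
		v = -1;
	fclose(f);
	return v;
}
```

Reading /proc/sys/net/ipv4/conf/all/rp_filter and
/proc/sys/net/ipv4/conf/eth0/rp_filter with read_sysctl_int() and passing
both values to effective_rp_filter() gives the mode in effect on eth0. So
setting only the per-interface value to 2 still leaves loose mode in effect
when "all" is 0 or 1.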


-- 

"They that give up essential liberty to obtain temporary safety,
deserve neither liberty nor safety."  (Ben Franklin)

"The course of history shows that as a government grows, liberty
decreases."  (Thomas Jefferson)




^ permalink raw reply
