Netdev List


* Re: include/linux/netlink.h: problem when included by an application
From: David Miller @ 2011-08-08  5:48 UTC (permalink / raw)
  To: michel; +Cc: ben, netdev
In-Reply-To: <1312755263.2908.6.camel@Thor>

From: Michel Machado <michel@digirati.com.br>
Date: Sun, 07 Aug 2011 18:14:23 -0400

>> >    The simplest solution that I came up with was replacing sa_family_t in
>> > include/linux/netlink.h with 'unsigned short', as header
>> > include/linux/socket.h does for struct __kernel_sockaddr_storage
>> > available to applications.
>> 
>> Maybe we should do something like this in <linux/socket.h>:
>> 
>> typedef unsigned short __kernel_sa_family_t;
>> #ifdef __KERNEL__
>> typedef __kernel_sa_family_t sa_family_t;
>> #endif
>> 
>> and then use __kernel_sa_family_t in <linux/netlink.h>.
>> 
>> Ben.
> 
>    I like this solution, it solves both struct __kernel_sockaddr_storage
> in include/linux/socket.h, and struct sockaddr_nl in
> include/linux/netlink.h.

Ok, I've applied the following patch:

--------------------
net: Make userland include of netlink.h more sane.

Currently userland will barf when including linux/netlink.h unless it
precisely includes sys/socket.h first.  The issue is where the
definition of "sa_family_t" comes from.

We've been back and forth on how to fix this issue in the past, see:

http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/622621
http://thread.gmane.org/gmane.linux.network/143380

Ben Hutchings suggested we take a hint from how we handle the
sockaddr_storage type.  First we add a "__kernel_sa_family_t" typedef
to linux/socket.h that is always defined.

Then if __KERNEL__ is defined, we also define "sa_family_t" as
equal to "__kernel_sa_family_t".

Then in places like linux/netlink.h we use __kernel_sa_family_t
in user-visible data structures.

Reported-by: Michel Machado <michel@digirati.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/netlink.h |    2 +-
 include/linux/socket.h  |    6 ++++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 2e17c5d..180540a 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -29,7 +29,7 @@
 #define MAX_LINKS 32		
 
 struct sockaddr_nl {
-	sa_family_t	nl_family;	/* AF_NETLINK	*/
+	__kernel_sa_family_t	nl_family;	/* AF_NETLINK	*/
 	unsigned short	nl_pad;		/* zero		*/
 	__u32		nl_pid;		/* port ID	*/
        	__u32		nl_groups;	/* multicast groups mask */
diff --git a/include/linux/socket.h b/include/linux/socket.h
index e17f822..d0e77f6 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -8,8 +8,10 @@
 #define _K_SS_ALIGNSIZE	(__alignof__ (struct sockaddr *))
 				/* Implementation specific desired alignment */
 
+typedef unsigned short __kernel_sa_family_t;
+
 struct __kernel_sockaddr_storage {
-	unsigned short	ss_family;		/* address family */
+	__kernel_sa_family_t	ss_family;		/* address family */
 	/* Following field(s) are implementation specific */
 	char		__data[_K_SS_MAXSIZE - sizeof(unsigned short)];
 				/* space to achieve desired size, */
@@ -35,7 +37,7 @@ struct seq_file;
 extern void socket_seq_show(struct seq_file *seq);
 #endif
 
-typedef unsigned short	sa_family_t;
+typedef __kernel_sa_family_t	sa_family_t;
 
 /*
  *	1003.1g requires sa_family_t and that sa_data is char.
-- 
1.7.6
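
A minimal userland sketch of what the change is meant to allow, assuming
a Linux system whose installed headers already carry this patch.  The
helper make_nl_addr() is hypothetical and only for illustration.
linux/netlink.h is deliberately included before sys/socket.h, an order
that should no longer matter once nl_family stops depending on
sa_family_t:

```c
#include <assert.h>         /* for the checks in the usage note below */
#include <string.h>
#include <linux/netlink.h>  /* struct sockaddr_nl, included first on purpose */
#include <sys/socket.h>     /* AF_NETLINK */

/* make_nl_addr() is an illustrative helper, not part of any header. */
static struct sockaddr_nl make_nl_addr(unsigned int pid, unsigned int groups)
{
	struct sockaddr_nl addr;

	memset(&addr, 0, sizeof(addr));	/* also zeroes nl_pad */
	addr.nl_family = AF_NETLINK;
	addr.nl_pid = pid;		/* port ID */
	addr.nl_groups = groups;	/* multicast groups mask */
	return addr;
}
```

A caller could then check, for example, that
assert(make_nl_addr(1234, 0).nl_family == AF_NETLINK) holds before
passing the address to bind().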


* Re: [PATCH] ipv4: Fix ip_getsockopt for IP_PKTOPTIONS
From: David Miller @ 2011-08-08  5:31 UTC (permalink / raw)
  To: dbaluta; +Cc: kuznet, pekkas, yoshfuji, kaber, netdev, tszocs
In-Reply-To: <1312672850-13676-1-git-send-email-dbaluta@ixiacom.com>

From: Daniel Baluta <dbaluta@ixiacom.com>
Date: Sun,  7 Aug 2011 02:20:50 +0300

> IP_PKTOPTIONS is broken for 32-bit applications running
> in COMPAT mode on 64-bit kernels.
> 
> This happens because msghdr's msg_flags field is always
> set to zero. When running in COMPAT mode this should be
> set to MSG_CMSG_COMPAT instead.
> 
> Signed-off-by: Tiberiu Szocs-Mihai <tszocs@ixiacom.com>
> Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>

Applied, thanks.

* Re: [PATCH] net/usb: Add IPv6 support to the LG-VL600 LTE USB modem driver
From: David Miller @ 2011-08-08  5:29 UTC (permalink / raw)
  To: prox; +Cc: gregkh, linux-usb, netdev, linux-kernel
In-Reply-To: <1312669694-28800-1-git-send-email-prox@prolixium.com>

From: Mark Kamichoff <prox@prolixium.com>
Date: Sat,  6 Aug 2011 18:28:14 -0400

> The LG-VL600 LTE USB modem supports IPv6, but uses and expects an IPv4
> ethertype (0x800) for these packets instead of the standard 0x86dd.
> This patch peeks at the IP version in the L3 header and sets the
> ethertype appropriately for IPv6 packets.
> 
> Signed-off-by: Mark Kamichoff <prox@prolixium.com>

Applied, thanks.
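
The approach described in the quoted text can be sketched in plain C.
This is only an illustration under stated assumptions: the driver code
is not quoted in this thread, so pick_ethertype() and the two constants
below are hypothetical names, and the sketch only shows the idea of
peeking at the IP version nibble:

```c
#include <assert.h>   /* for the checks in the usage note below */
#include <stddef.h>
#include <stdint.h>

#define ETH_TYPE_IPV4 0x0800u	/* what the modem reports */
#define ETH_TYPE_IPV6 0x86ddu	/* the standard IPv6 ethertype */

/*
 * Hypothetical helper: look at the first byte of the L3 header and
 * return the IPv6 ethertype when the version nibble is 6.  Anything
 * else keeps the ethertype that was reported.
 */
static uint16_t pick_ethertype(const uint8_t *l3, size_t len,
			       uint16_t reported)
{
	if (len == 0)
		return reported;	/* nothing to peek at */
	if ((l3[0] >> 4) == 6)
		return ETH_TYPE_IPV6;
	return reported;
}
```

A buffer starting with 0x60 would be relabelled as 0x86dd, while one
starting with 0x45 keeps the reported 0x0800.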

* Re: return of ip_rt_bug()
From: David Miller @ 2011-08-08  5:20 UTC (permalink / raw)
  To: ja; +Cc: selinux, davej, netdev
In-Reply-To: <alpine.LFD.2.00.1108070104440.1413@ja.ssi.bg>

From: Julian Anastasov <ja@ssi.bg>
Date: Sun, 7 Aug 2011 01:14:22 +0300 (EEST)

> 	The problem: if we have an input route in the cache,
> it can be returned to callers that request an output route.
> That is why dst_output points to ip_rt_bug.

Good spotting Julian.

This is my fault entirely.  First I removed the thing we now call
->rt_route_iif, which led to this bug fix:

commit 1b86a58f9d7ce4fe2377687f378fbfb53bdc9b6c
Author: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Date:   Thu Apr 7 14:04:08 2011 -0700

    ipv4: Fix "Set rt->rt_iif more sanely on output routes."

but I forgot to make sure we also added back the key comparison
on lookups :-/

Applied and queued up for -stable, thanks!


* Re: [PATCH v3] bonding: document two undocumented options.
From: David Miller @ 2011-08-08  5:16 UTC (permalink / raw)
  To: nicolas.2p.debian; +Cc: fubar, andy, netdev
In-Reply-To: <1312650399-5165-1-git-send-email-nicolas.2p.debian@free.fr>

From: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Date: Sat,  6 Aug 2011 19:06:39 +0200

> Commit 655f8919d549ad1872e24d826b6ce42530516d2e
>     bonding: add min links parameter to 802.3ad
> 
> and commit ebd8e4977a87cb81d93c62a9bff0102a9713722f
>     bonding: add all_slaves_active parameter
> 
> introduced new options to bonding, but didn't provide the documentation
> for those options.
> 
> V2: add the default value for both options.
> V3: document the exact behavior of min_links default value.
> 
> Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>

Applied, thanks.

* Re: [PATCH] slip: fix NOHZ local_softirq_pending 08 warning
From: David Miller @ 2011-08-08  5:14 UTC (permalink / raw)
  To: matvejchikov; +Cc: netdev
In-Reply-To: <CAKh5naYrNPG3UO02VzG13QUP-SOP8-Td+O0G5LAr-+npduP49A@mail.gmail.com>

From: Matvejchikov Ilya <matvejchikov@gmail.com>
Date: Fri, 5 Aug 2011 23:23:51 +0400

> When using nanosleep() in a userspace application we get a rate-limited warning:
> 
> 	NOHZ: local_softirq_pending 08
> 
> According to commit 481a8199142c050b72bff8a1956a49fd0a75bbe0 the problem is
> caused by the netif_rx() function. This patch replaces netif_rx() with
> netif_rx_ni(), which has to be used from process/softirq context.
> 
> Signed-off-by: Matvejchikov Ilya <matvejchikov@gmail.com>

Applied, thanks.

* Re: [PATCH] netfilter: avoid double free in nf_reinject
From: David Miller @ 2011-08-08  5:11 UTC (permalink / raw)
  To: ja; +Cc: kaber, netfilter-devel, netdev, kswamy
In-Reply-To: <alpine.LFD.2.00.1108051326300.1494@ja.ssi.bg>

From: Julian Anastasov <ja@ssi.bg>
Date: Fri, 5 Aug 2011 13:36:28 +0300 (EEST)

> 
> 	NF_STOLEN means skb was already freed
> 
> Signed-off-by: Julian Anastasov <ja@ssi.bg>
> ---
> 
> 	Maybe fixes the IPVS+ip_queue problem reported by Kumar Swamy:
> 
> 	http://marc.info/?l=linux-virtual-server&m=131098073717449&w=2

Since the netfilter maintainers are taking too damn long to integrate
bug fixes (and this has been happening for months), I'm going to apply
this directly.

Thanks Julian.
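
The ownership rule behind the fix can be modelled in a few lines of
userspace C.  This is a sketch under loud assumptions: the nf_reinject()
code is not quoted in this thread, so struct pkt, the verdict names and
reinject() are all hypothetical stand-ins, and freeing is only counted
rather than performed:

```c
#include <assert.h>   /* for the checks in the usage note below */
#include <stddef.h>

/* Hypothetical stand-ins for the hook verdicts. */
enum verdict { V_DROP, V_ACCEPT, V_STOLEN };

/* Model packet: we only count how often it has been "freed". */
struct pkt {
	int free_calls;
};

static void pkt_free(struct pkt *p)
{
	p->free_calls++;	/* more than 1 would be a double free */
}

/*
 * Model of the reinject step.  Returns the packet when processing
 * continues, NULL when the packet is gone.
 */
static struct pkt *reinject(struct pkt *p, enum verdict v)
{
	switch (v) {
	case V_ACCEPT:
		return p;	/* continue with the packet */
	case V_STOLEN:
		return NULL;	/* hook took ownership: do not free again */
	default:
		pkt_free(p);	/* we still own it, so we free it */
		return NULL;
	}
}
```

If a hook has already disposed of the packet and reports V_STOLEN,
free_calls stays at 1 after reinject(); treating that verdict like
V_DROP would push it to 2.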

* Re: [PATCH] ucc_geth: Add SUPPORTED_MII and SUPPORTED_Autoneg
From: David Miller @ 2011-08-08  5:09 UTC (permalink / raw)
  To: Joakim.Tjernlund; +Cc: netdev, cbouatmailru
In-Reply-To: <1312463985-2230-1-git-send-email-Joakim.Tjernlund@transmode.se>

From: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Date: Thu, 4 Aug 2011 15:19:45 +0200

> The driver supports Autoneg and at least MII. Tell the PHY
> that to avoid any confusion in the PHY code.
> 
> Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>

Applied.

* Re: [PATCH 3/3] via-velocity : cleanups.
From: David Miller @ 2011-08-08  5:09 UTC (permalink / raw)
  To: romieu; +Cc: netdev, jnelson
In-Reply-To: <20110804123903.GC14858@electric-eye.fr.zoreil.com>

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Thu, 4 Aug 2011 14:39:03 +0200

> - empty lines
> - tabs / spaces
> - ETHTOOL_GWOL _is_ defined
> - useless cast from void *
> 
> Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>

Applied.

* Re: [PATCH 2/3] via-velocity : ethtool statistics support.
From: David Miller @ 2011-08-08  5:09 UTC (permalink / raw)
  To: romieu; +Cc: netdev, jnelson
In-Reply-To: <20110804123828.GB14858@electric-eye.fr.zoreil.com>

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Thu, 4 Aug 2011 14:38:28 +0200

> Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
> Tested-by: Jon Nelson <jnelson@jamponi.net>

Applied.

* Re: [PATCH 1/3] via-velocity : update receive packets statistics.
From: David Miller @ 2011-08-08  5:09 UTC (permalink / raw)
  To: romieu; +Cc: netdev, jnelson
In-Reply-To: <20110804123755.GA14858@electric-eye.fr.zoreil.com>

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Thu, 4 Aug 2011 14:37:55 +0200

> Addresses https://bugzilla.kernel.org/show_bug.cgi?id=14076.
> 
> Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
> Tested-by: Jon Nelson <jnelson@jamponi.net>

Applied.

* Re: [PATCH net-next 2/6] be2net: add be_cmd_set_port_speed_v1 to set port speed
From: David Miller @ 2011-08-08  5:08 UTC (permalink / raw)
  To: ajit.khaparde; +Cc: netdev
In-Reply-To: <20110805195958.GA13539@akhaparde-VBox>

From: Ajit Khaparde <ajit.khaparde@Emulex.Com>
Date: Fri, 5 Aug 2011 14:59:58 -0500

> diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
> index 8d178d2..863ae67 100644
> --- a/drivers/net/benet/be_cmds.c
> +++ b/drivers/net/benet/be_cmds.c
> @@ -2367,3 +2367,38 @@ err:
>  	mutex_unlock(&adapter->mbox_lock);
 ...
> +	status = be_mcc_notify_wait(adapter);
> +err:
> +	spin_unlock_bh(&adapter->mcc_lock);
> +	return status;
> +}
> +

Please do not add trailing empty lines to source files; git complains
about this and will abort when I try to apply your patch.

* [RFC PATCH v2 9/9] sfc: Support for byte queue limits
From: Tom Herbert @ 2011-08-08  4:53 UTC (permalink / raw)
  To: davem, netdev

Changes to sfc to use byte queue limits.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 drivers/net/sfc/tx.c |   27 +++++++++++++++++++++------
 1 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/net/sfc/tx.c b/drivers/net/sfc/tx.c
index 84eb99e..9aa4339 100644
--- a/drivers/net/sfc/tx.c
+++ b/drivers/net/sfc/tx.c
@@ -31,7 +31,9 @@
 #define EFX_TXQ_THRESHOLD(_efx) ((_efx)->txq_entries / 2u)
 
 static void efx_dequeue_buffer(struct efx_tx_queue *tx_queue,
-			       struct efx_tx_buffer *buffer)
+			       struct efx_tx_buffer *buffer,
+			       unsigned int *pkts_compl,
+			       unsigned int *bytes_compl)
 {
 	if (buffer->unmap_len) {
 		struct pci_dev *pci_dev = tx_queue->efx->pci_dev;
@@ -48,6 +50,8 @@ static void efx_dequeue_buffer(struct efx_tx_queue *tx_queue,
 	}
 
 	if (buffer->skb) {
+		(*pkts_compl)++;
+		(*bytes_compl) += buffer->skb->len;
 		dev_kfree_skb_any((struct sk_buff *) buffer->skb);
 		buffer->skb = NULL;
 		netif_vdbg(tx_queue->efx, tx_done, tx_queue->efx->net_dev,
@@ -254,6 +258,8 @@ netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
 	buffer->skb = skb;
 	buffer->continuation = false;
 
+	netdev_tx_sent_queue(tx_queue->core_txq, 1, skb->len);
+
 	/* Pass off to hardware */
 	efx_nic_push_buffers(tx_queue);
 
@@ -271,10 +277,11 @@ netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
  unwind:
 	/* Work backwards until we hit the original insert pointer value */
 	while (tx_queue->insert_count != tx_queue->write_count) {
+		unsigned int pkts_compl = 0, bytes_compl = 0;
 		--tx_queue->insert_count;
 		insert_ptr = tx_queue->insert_count & tx_queue->ptr_mask;
 		buffer = &tx_queue->buffer[insert_ptr];
-		efx_dequeue_buffer(tx_queue, buffer);
+		efx_dequeue_buffer(tx_queue, buffer, &pkts_compl, &bytes_compl);
 		buffer->len = 0;
 	}
 
@@ -297,7 +304,9 @@ netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
  * specified index.
  */
 static void efx_dequeue_buffers(struct efx_tx_queue *tx_queue,
-				unsigned int index)
+				unsigned int index,
+				unsigned int *pkts_compl,
+				unsigned int *bytes_compl)
 {
 	struct efx_nic *efx = tx_queue->efx;
 	unsigned int stop_index, read_ptr;
@@ -315,7 +324,7 @@ static void efx_dequeue_buffers(struct efx_tx_queue *tx_queue,
 			return;
 		}
 
-		efx_dequeue_buffer(tx_queue, buffer);
+		efx_dequeue_buffer(tx_queue, buffer, pkts_compl, bytes_compl);
 		buffer->continuation = true;
 		buffer->len = 0;
 
@@ -426,10 +435,12 @@ void efx_xmit_done(struct efx_tx_queue *tx_queue, unsigned int index)
 {
 	unsigned fill_level;
 	struct efx_nic *efx = tx_queue->efx;
+	unsigned int pkts_compl = 0, bytes_compl = 0;
 
 	EFX_BUG_ON_PARANOID(index > tx_queue->ptr_mask);
 
-	efx_dequeue_buffers(tx_queue, index);
+	efx_dequeue_buffers(tx_queue, index, &pkts_compl, &bytes_compl);
+	netdev_tx_completed_queue(tx_queue->core_txq, pkts_compl, bytes_compl);
 
 	/* See if we need to restart the netif queue.  This barrier
 	 * separates the update of read_count from the test of the
@@ -519,13 +530,15 @@ void efx_release_tx_buffers(struct efx_tx_queue *tx_queue)
 
 	/* Free any buffers left in the ring */
 	while (tx_queue->read_count != tx_queue->write_count) {
+		unsigned int pkts_compl = 0, bytes_compl = 0;
 		buffer = &tx_queue->buffer[tx_queue->read_count & tx_queue->ptr_mask];
-		efx_dequeue_buffer(tx_queue, buffer);
+		efx_dequeue_buffer(tx_queue, buffer, &pkts_compl, &bytes_compl);
 		buffer->continuation = true;
 		buffer->len = 0;
 
 		++tx_queue->read_count;
 	}
+	netdev_tx_reset_queue(tx_queue->core_txq);
 }
 
 void efx_fini_tx_queue(struct efx_tx_queue *tx_queue)
@@ -1168,6 +1181,8 @@ static int efx_enqueue_skb_tso(struct efx_tx_queue *tx_queue,
 	/* Pass off to hardware */
 	efx_nic_push_buffers(tx_queue);
 
+	netdev_tx_sent_queue(tx_queue->core_txq, 1, skb->len);
+
 	tx_queue->tso_bursts++;
 	return NETDEV_TX_OK;
 
-- 
1.7.3.1


* [RFC PATCH v2 8/9] bnx2x: Support for byte queue limits
From: Tom Herbert @ 2011-08-08  4:53 UTC (permalink / raw)
  To: davem, netdev

Changes to bnx2x to use byte queue limits.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 drivers/net/bnx2x/bnx2x_cmn.c |   26 ++++++++++++++++++++++----
 1 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x_cmn.c b/drivers/net/bnx2x/bnx2x_cmn.c
index 5b0dba6..d4f921a 100644
--- a/drivers/net/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/bnx2x/bnx2x_cmn.c
@@ -97,7 +97,8 @@ int load_count[2][3] = { {0} }; /* per-path: 0-common, 1-port0, 2-port1 */
  * return idx of last bd freed
  */
 static u16 bnx2x_free_tx_pkt(struct bnx2x *bp, struct bnx2x_fp_txdata *txdata,
-			     u16 idx)
+			     u16 idx, unsigned int *pkts_compl,
+			     unsigned int *bytes_compl)
 {
 	struct sw_tx_bd *tx_buf = &txdata->tx_buf_ring[idx];
 	struct eth_tx_start_bd *tx_start_bd;
@@ -154,6 +155,10 @@ static u16 bnx2x_free_tx_pkt(struct bnx2x *bp, struct bnx2x_fp_txdata *txdata,
 
 	/* release skb */
 	WARN_ON(!skb);
+	if (skb) {
+		(*pkts_compl)++;
+		(*bytes_compl) += skb->len;
+	}
 	dev_kfree_skb_any(skb);
 	tx_buf->first_bd = 0;
 	tx_buf->skb = NULL;
@@ -165,6 +170,7 @@ int bnx2x_tx_int(struct bnx2x *bp, struct bnx2x_fp_txdata *txdata)
 {
 	struct netdev_queue *txq;
 	u16 hw_cons, sw_cons, bd_cons = txdata->tx_bd_cons;
+	unsigned int pkts_compl = 0, bytes_compl = 0;
 
 #ifdef BNX2X_STOP_ON_ERROR
 	if (unlikely(bp->panic))
@@ -184,10 +190,13 @@ int bnx2x_tx_int(struct bnx2x *bp, struct bnx2x_fp_txdata *txdata)
 				      " pkt_cons %u\n",
 		   txdata->txq_index, hw_cons, sw_cons, pkt_cons);
 
-		bd_cons = bnx2x_free_tx_pkt(bp, txdata, pkt_cons);
+		bd_cons = bnx2x_free_tx_pkt(bp, txdata, pkt_cons,
+		    &pkts_compl, &bytes_compl);
 		sw_cons++;
 	}
 
+	netdev_tx_completed_queue(txq, pkts_compl, bytes_compl);
+
 	txdata->tx_pkt_cons = sw_cons;
 	txdata->tx_bd_cons = bd_cons;
 
@@ -1088,6 +1097,7 @@ static void bnx2x_free_tx_skbs(struct bnx2x *bp)
 		struct bnx2x_fastpath *fp = &bp->fp[i];
 		for_each_cos_in_tx_queue(fp, cos) {
 			struct bnx2x_fp_txdata *txdata = &fp->txdata[cos];
+			unsigned pkts_compl = 0, bytes_compl = 0;
 
 			u16 bd_cons = txdata->tx_bd_cons;
 			u16 sw_prod = txdata->tx_pkt_prod;
@@ -1095,9 +1105,13 @@ static void bnx2x_free_tx_skbs(struct bnx2x *bp)
 
 			while (sw_cons != sw_prod) {
 				bd_cons = bnx2x_free_tx_pkt(bp, txdata,
-							    TX_BD(sw_cons));
+							    TX_BD(sw_cons),
+							    &pkts_compl,
+							    &bytes_compl);
 				sw_cons++;
 			}
+			netdev_tx_reset_queue(
+			    netdev_get_tx_queue(bp->dev, txdata->txq_index));
 		}
 	}
 }
@@ -2771,6 +2785,7 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev)
 				       frag->page_offset, frag->size,
 				       DMA_TO_DEVICE);
 		if (unlikely(dma_mapping_error(&bp->pdev->dev, mapping))) {
+			unsigned int pkts_compl = 0, bytes_compl = 0;
 
 			DP(NETIF_MSG_TX_QUEUED, "Unable to map page - "
 						"dropping packet...\n");
@@ -2782,7 +2797,8 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev)
 			 */
 			first_bd->nbd = cpu_to_le16(nbd);
 			bnx2x_free_tx_pkt(bp, txdata,
-					  TX_BD(txdata->tx_pkt_prod));
+					  TX_BD(txdata->tx_pkt_prod),
+					  &pkts_compl, &bytes_compl);
 			return NETDEV_TX_OK;
 		}
 
@@ -2843,6 +2859,8 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		   pbd_e2->parsing_data);
 	DP(NETIF_MSG_TX_QUEUED, "doorbell: nbd %d  bd %u\n", nbd, bd_prod);
 
+	netdev_tx_sent_queue(txq, 1, skb->len);
+
 	txdata->tx_pkt_prod++;
 	/*
 	 * Make sure that the BD data is updated before updating the producer
-- 
1.7.3.1


* [RFC PATCH v2 7/9] tg3: Support for byte queue limits
From: Tom Herbert @ 2011-08-08  4:53 UTC (permalink / raw)
  To: davem, netdev

Changes to tg3 to use byte queue limits.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 drivers/net/tg3.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index dc3fbf6..ad06c40 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -4818,6 +4818,7 @@ static void tg3_tx(struct tg3_napi *tnapi)
 	u32 sw_idx = tnapi->tx_cons;
 	struct netdev_queue *txq;
 	int index = tnapi - tp->napi;
+	unsigned int pkts_compl = 0, bytes_compl = 0;
 
 	if (tg3_flag(tp, ENABLE_TSS))
 		index--;
@@ -4868,6 +4869,9 @@ static void tg3_tx(struct tg3_napi *tnapi)
 			sw_idx = NEXT_TX(sw_idx);
 		}
 
+		pkts_compl++;
+		bytes_compl += skb->len;
+
 		dev_kfree_skb(skb);
 
 		if (unlikely(tx_bug)) {
@@ -4876,6 +4880,8 @@ static void tg3_tx(struct tg3_napi *tnapi)
 		}
 	}
 
+	netdev_completed_queue(tp->dev, pkts_compl, bytes_compl);
+
 	tnapi->tx_cons = sw_idx;
 
 	/* Need to make the tx_cons update visible to tg3_start_xmit()
@@ -6313,6 +6319,7 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	skb_tx_timestamp(skb);
+	netdev_sent_queue(tp->dev, 1, skb->len);
 
 	/* Packets are ready, update Tx producer idx local and on card. */
 	tw32_tx_mbox(tnapi->prodmbox, entry);
@@ -6680,6 +6687,7 @@ static void tg3_free_rings(struct tg3 *tp)
 
 			dev_kfree_skb_any(skb);
 		}
+		netdev_reset_queue(tp->dev);
 	}
 }
 
-- 
1.7.3.1


* [RFC PATCH v2 6/9] forcedeth: Support for byte queue limits
From: Tom Herbert @ 2011-08-08  4:51 UTC (permalink / raw)
  To: davem, netdev

Changes to forcedeth to use byte queue limits.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 drivers/net/forcedeth.c |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index e55df30..fcd664a 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -1924,6 +1924,7 @@ static void nv_drain_tx(struct net_device *dev)
 		np->tx_skb[i].first_tx_desc = NULL;
 		np->tx_skb[i].next_tx_ctx = NULL;
 	}
+	netdev_reset_queue(np->dev);
 	np->tx_pkts_in_progress = 0;
 	np->tx_change_owner = NULL;
 	np->tx_end_flip = NULL;
@@ -2178,6 +2179,9 @@ static netdev_tx_t nv_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	/* set tx flags */
 	start_tx->flaglen |= cpu_to_le32(tx_flags | tx_flags_extra);
+
+	netdev_sent_queue(np->dev, 1, skb->len);
+
 	np->put_tx.orig = put_tx;
 
 	spin_unlock_irqrestore(&np->lock, flags);
@@ -2317,6 +2321,9 @@ static netdev_tx_t nv_start_xmit_optimized(struct sk_buff *skb,
 
 	/* set tx flags */
 	start_tx->flaglen |= cpu_to_le32(tx_flags | tx_flags_extra);
+
+	netdev_sent_queue(np->dev, 1, skb->len);
+
 	np->put_tx.ex = put_tx;
 
 	spin_unlock_irqrestore(&np->lock, flags);
@@ -2354,6 +2361,7 @@ static int nv_tx_done(struct net_device *dev, int limit)
 	u32 flags;
 	int tx_work = 0;
 	struct ring_desc *orig_get_tx = np->get_tx.orig;
+	unsigned int bytes_compl = 0;
 
 	while ((np->get_tx.orig != np->put_tx.orig) &&
 	       !((flags = le32_to_cpu(np->get_tx.orig->flaglen)) & NV_TX_VALID) &&
@@ -2375,6 +2383,7 @@ static int nv_tx_done(struct net_device *dev, int limit)
 					dev->stats.tx_packets++;
 					dev->stats.tx_bytes += np->get_tx_ctx->skb->len;
 				}
+				bytes_compl += np->get_tx_ctx->skb->len;
 				dev_kfree_skb_any(np->get_tx_ctx->skb);
 				np->get_tx_ctx->skb = NULL;
 				tx_work++;
@@ -2393,6 +2402,7 @@ static int nv_tx_done(struct net_device *dev, int limit)
 					dev->stats.tx_packets++;
 					dev->stats.tx_bytes += np->get_tx_ctx->skb->len;
 				}
+				bytes_compl += np->get_tx_ctx->skb->len;
 				dev_kfree_skb_any(np->get_tx_ctx->skb);
 				np->get_tx_ctx->skb = NULL;
 				tx_work++;
@@ -2403,6 +2413,9 @@ static int nv_tx_done(struct net_device *dev, int limit)
 		if (unlikely(np->get_tx_ctx++ == np->last_tx_ctx))
 			np->get_tx_ctx = np->first_tx_ctx;
 	}
+
+	netdev_completed_queue(np->dev, tx_work, bytes_compl);
+
 	if (unlikely((np->tx_stop == 1) && (np->get_tx.orig != orig_get_tx))) {
 		np->tx_stop = 0;
 		netif_wake_queue(dev);
@@ -2416,6 +2429,7 @@ static int nv_tx_done_optimized(struct net_device *dev, int limit)
 	u32 flags;
 	int tx_work = 0;
 	struct ring_desc_ex *orig_get_tx = np->get_tx.ex;
+	unsigned long bytes_cleaned = 0;
 
 	while ((np->get_tx.ex != np->put_tx.ex) &&
 	       !((flags = le32_to_cpu(np->get_tx.ex->flaglen)) & NV_TX2_VALID) &&
@@ -2435,6 +2449,7 @@ static int nv_tx_done_optimized(struct net_device *dev, int limit)
 				}
 			}
 
+			bytes_cleaned += np->get_tx_ctx->skb->len;
 			dev_kfree_skb_any(np->get_tx_ctx->skb);
 			np->get_tx_ctx->skb = NULL;
 			tx_work++;
@@ -2447,6 +2462,9 @@ static int nv_tx_done_optimized(struct net_device *dev, int limit)
 		if (unlikely(np->get_tx_ctx++ == np->last_tx_ctx))
 			np->get_tx_ctx = np->first_tx_ctx;
 	}
+
+	netdev_completed_queue(np->dev, tx_work, bytes_cleaned);
+
 	if (unlikely((np->tx_stop == 1) && (np->get_tx.ex != orig_get_tx))) {
 		np->tx_stop = 0;
 		netif_wake_queue(dev);
-- 
1.7.3.1


* [RFC PATCH v2 5/9] e1000e: Support for byte queue limits
From: Tom Herbert @ 2011-08-08  4:49 UTC (permalink / raw)
  To: davem, netdev

Changes to e1000e to use byte queue limits.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 drivers/net/e1000e/netdev.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 4353ad5..4ce114c 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -998,6 +998,7 @@ static bool e1000_clean_tx_irq(struct e1000_adapter *adapter)
 	unsigned int i, eop;
 	unsigned int count = 0;
 	unsigned int total_tx_bytes = 0, total_tx_packets = 0;
+	unsigned int bytes_compl = 0, pkts_compl = 0;
 
 	i = tx_ring->next_to_clean;
 	eop = tx_ring->buffer_info[i].next_to_watch;
@@ -1015,6 +1016,11 @@ static bool e1000_clean_tx_irq(struct e1000_adapter *adapter)
 			if (cleaned) {
 				total_tx_packets += buffer_info->segs;
 				total_tx_bytes += buffer_info->bytecount;
+				if (buffer_info->skb) {
+					bytes_compl += buffer_info->skb->len;
+					pkts_compl++;
+				}
+
 			}
 
 			e1000_put_txbuf(adapter, buffer_info);
@@ -1033,6 +1039,8 @@ static bool e1000_clean_tx_irq(struct e1000_adapter *adapter)
 
 	tx_ring->next_to_clean = i;
 
+	netdev_completed_queue(netdev, pkts_compl, bytes_compl);
+
 #define TX_WAKE_THRESHOLD 32
 	if (count && netif_carrier_ok(netdev) &&
 	    e1000_desc_unused(tx_ring) >= TX_WAKE_THRESHOLD) {
@@ -2164,6 +2172,7 @@ static void e1000_clean_tx_ring(struct e1000_adapter *adapter)
 		e1000_put_txbuf(adapter, buffer_info);
 	}
 
+	netdev_reset_queue(adapter->netdev);
 	size = sizeof(struct e1000_buffer) * tx_ring->count;
 	memset(tx_ring->buffer_info, 0, size);
 
@@ -4882,6 +4891,7 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
 	/* if count is 0 then mapping error has occurred */
 	count = e1000_tx_map(adapter, skb, first, max_per_txd, nr_frags, mss);
 	if (count) {
+		netdev_sent_queue(netdev, 1, skb->len);
 		e1000_tx_queue(adapter, tx_flags, count);
 		/* Make sure there is space in the ring for the next send. */
 		e1000_maybe_stop_tx(netdev, MAX_SKB_FRAGS + 2);
-- 
1.7.3.1


* [RFC PATCH v2 4/9] bql: Byte queue limits
From: Tom Herbert @ 2011-08-08  4:48 UTC (permalink / raw)
  To: davem, netdev

Add networking stack support for byte queue limits, using the dynamic
queue limits library.  Byte queue limits are maintained per transmit
queue, and a dql structure has been added to the netdev_queue structure
for this purpose.
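
The per-queue accounting can be pictured with a small userspace model.
This is a simplified sketch, not the dql library: struct bql_model and
its functions are hypothetical, the limit is held fixed instead of being
adjusted dynamically, and the boolean stands in for the
__QUEUE_STATE_STACK_XOFF bit:

```c
#include <assert.h>    /* for the checks in the usage note below */
#include <stdbool.h>

/* Hypothetical model of one transmit queue's byte accounting. */
struct bql_model {
	unsigned long limit;		/* fixed here; dynamic in dql */
	unsigned long num_queued;	/* bytes handed to the device */
	unsigned long num_completed;	/* bytes the device has finished */
	bool stack_xoff;		/* stack must stop queuing */
};

/* Bytes still allowed in flight; negative once over the limit. */
static long bql_avail(const struct bql_model *q)
{
	return (long)q->limit - (long)(q->num_queued - q->num_completed);
}

/* Transmit path: account the bytes, stop the stack when over limit. */
static void bql_sent(struct bql_model *q, unsigned long bytes)
{
	q->num_queued += bytes;
	if (bql_avail(q) < 0)
		q->stack_xoff = true;
}

/*
 * Completion path: returns true when the queue was stopped and may be
 * rescheduled.  Zero-byte completions are ignored.
 */
static bool bql_completed(struct bql_model *q, unsigned long bytes)
{
	if (!bytes)
		return false;
	q->num_completed += bytes;
	if (bql_avail(q) >= 0 && q->stack_xoff) {
		q->stack_xoff = false;
		return true;
	}
	return false;
}
```

With a limit of 1000, two sends of 600 bytes leave the model 200 bytes
over the limit and stopped; it restarts only once enough completions
bring the in-flight count back to 1000 or below.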

Configuration of bql is in the tx-<n> sysfs directory for the queue
under the byte_queue_limits directory.  Configuration includes:
limit_min, bql minimum limit
limit_max, bql maximum limit
hold_time, bql slack hold time

Also under the directory are:
limit, current byte limit
inflight, current number of bytes on the queue
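
A sketch of how a userspace tool might locate and read one of these
files.  It assumes the queue directories live in the usual place,
/sys/class/net/<iface>/queues/tx-<n>/, which this description does not
spell out; bql_attr_path() and bql_read_attr() are hypothetical helpers:

```c
#include <assert.h>   /* for the checks in the usage note below */
#include <stdio.h>
#include <string.h>

/*
 * Build the path of one byte_queue_limits attribute.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int bql_attr_path(char *buf, size_t len, const char *ifname,
			 unsigned int txq, const char *attr)
{
	int n = snprintf(buf, len,
			 "/sys/class/net/%s/queues/tx-%u/byte_queue_limits/%s",
			 ifname, txq, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Read a single unsigned decimal value from an attribute file.
 * Returns 0 on success, -1 if the file is missing or unparsable.
 */
static int bql_read_attr(const char *path, unsigned long *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%lu", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

For interface "eth0" and queue 0, the "limit" attribute would resolve to
/sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit under that
assumption.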

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/linux/netdevice.h |   16 +++
 net/core/dev.c            |    1 +
 net/core/net-sysfs.c      |  230 ++++++++++++++++++++++++++++++++++-----------
 3 files changed, 192 insertions(+), 55 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 74e8862..d49265b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -43,6 +43,7 @@
 #include <linux/rculist.h>
 #include <linux/dmaengine.h>
 #include <linux/workqueue.h>
+#include <linux/dynamic_queue_limits.h>
 
 #include <linux/ethtool.h>
 #include <net/net_namespace.h>
@@ -536,6 +537,7 @@ struct netdev_queue {
 #if defined(CONFIG_RPS) || defined(CONFIG_XPS)
 	struct kobject		kobj;
 #endif
+	struct dql		dql;
 #if defined(CONFIG_XPS) && defined(CONFIG_NUMA)
 	int			numa_node;
 #endif
@@ -1913,29 +1915,43 @@ static inline int netif_xmit_frozen_or_stopped(const struct netdev_queue *dev_qu
 static inline void netdev_tx_sent_queue(struct netdev_queue *dev_queue,
 					unsigned int pkts, unsigned int bytes)
 {
+	dql_queued(&dev_queue->dql, bytes);
+	if (dql_avail(&dev_queue->dql) < 0)
+		set_bit(__QUEUE_STATE_STACK_XOFF, &dev_queue->state);
 }
 
 static inline void netdev_sent_queue(struct net_device *dev,
 				     unsigned int pkts, unsigned int bytes)
 {
+	netdev_tx_sent_queue(netdev_get_tx_queue(dev, 0), pkts, bytes);
 }
 
 static inline void netdev_tx_completed_queue(struct netdev_queue *dev_queue,
 					     unsigned pkts, unsigned bytes)
 {
+	if (bytes) {
+		dql_completed(&dev_queue->dql, bytes);
+		if (dql_avail(&dev_queue->dql) >= 0 &&
+		    test_and_clear_bit(__QUEUE_STATE_STACK_XOFF,
+		     &dev_queue->state))
+			netif_schedule_queue(dev_queue);
+	}
 }
 
 static inline void netdev_completed_queue(struct net_device *dev,
 					  unsigned pkts, unsigned bytes)
 {
+	netdev_tx_completed_queue(netdev_get_tx_queue(dev, 0), pkts, bytes);
 }
 
 static inline void netdev_tx_reset_queue(struct netdev_queue *q)
 {
+	dql_reset(&q->dql);
 }
 
 static inline void netdev_reset_queue(struct net_device *dev_queue)
 {
+	netdev_tx_reset_queue(netdev_get_tx_queue(dev_queue, 0));
 }
 
 /**
diff --git a/net/core/dev.c b/net/core/dev.c
index a7f8c38..bd5cd15 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5395,6 +5395,7 @@ static void netdev_init_one_queue(struct net_device *dev,
 	queue->xmit_lock_owner = -1;
 	netdev_queue_numa_node_write(queue, NUMA_NO_NODE);
 	queue->dev = dev;
+	dql_init(&queue->dql, 1000);
 }
 
 static int netif_alloc_netdev_queues(struct net_device *dev)
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 1683e5d..eca8684 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -20,6 +20,7 @@
 #include <linux/rtnetlink.h>
 #include <linux/wireless.h>
 #include <linux/vmalloc.h>
+#include <linux/jiffies.h>
 #include <net/wext.h>
 
 #include "net-sysfs.h"
@@ -779,7 +780,6 @@ net_rx_queue_update_kobjects(struct net_device *net, int old_num, int new_num)
 #endif
 }
 
-#ifdef CONFIG_XPS
 /*
  * netdev_queue sysfs structures and functions.
  */
@@ -839,7 +839,121 @@ static inline unsigned int get_netdev_queue_index(struct netdev_queue *queue)
 	return i;
 }
 
+static ssize_t bql_show(char *buf, unsigned long value)
+{
+	int p = 0;
+
+	p = sprintf(buf, "%lu\n", value);
+	return p;
+}
+
+static ssize_t bql_set(const char *buf, const size_t count,
+		       unsigned long *pvalue)
+{
+	unsigned long value;
+	int err;
+
+	if (!strcmp(buf, "max") || !strcmp(buf, "max\n"))
+		value = DQL_MAX_LIMIT;
+	else {
+		err = kstrtoul(buf, 10, &value);
+		if (err < 0)
+			return err;
+		if (value > DQL_MAX_LIMIT)
+			return -EINVAL;
+	}
+
+	*pvalue = value;
+
+	return count;
+}
+
+static ssize_t bql_show_hold_time(struct netdev_queue *queue,
+				  struct netdev_queue_attribute *attr,
+				  char *buf)
+{
+	struct dql *dql = &queue->dql;
+	int p = 0;
+
+	p = sprintf(buf, "%u\n", jiffies_to_msecs(dql->slack_hold_time));
+
+	return p;
+}
+
+static ssize_t bql_set_hold_time(struct netdev_queue *queue,
+				 struct netdev_queue_attribute *attribute,
+				 const char *buf, size_t len)
+{
+	struct dql *dql = &queue->dql;
+	unsigned value;
+	int err;
+
+	err = kstrtouint(buf, 10, &value);
+	if (err < 0)
+		return err;
+
+	dql->slack_hold_time = msecs_to_jiffies(value);
+
+	return len;
+}
+
+static struct netdev_queue_attribute bql_hold_time_attribute =
+	__ATTR(hold_time, S_IRUGO | S_IWUSR, bql_show_hold_time,
+	    bql_set_hold_time);
+
+static ssize_t bql_show_inflight(struct netdev_queue *queue,
+				 struct netdev_queue_attribute *attr,
+				 char *buf)
+{
+	struct dql *dql = &queue->dql;
+	int p = 0;
+
+	p = sprintf(buf, "%lu\n", dql->num_queued - dql->num_completed);
+
+	return p;
+}
+
+static struct netdev_queue_attribute bql_inflight_attribute =
+	__ATTR(inflight, S_IRUGO | S_IWUSR, bql_show_inflight, NULL);
+
+#define BQL_ATTR(NAME, FIELD)						\
+static ssize_t bql_show_ ## NAME(struct netdev_queue *queue,		\
+				 struct netdev_queue_attribute *attr,	\
+				 char *buf)				\
+{									\
+	return bql_show(buf, queue->dql.FIELD);				\
+}									\
+									\
+static ssize_t bql_set_ ## NAME(struct netdev_queue *queue,		\
+				struct netdev_queue_attribute *attr,	\
+				const char *buf, size_t len)		\
+{									\
+	return bql_set(buf, len, &queue->dql.FIELD);			\
+}									\
+									\
+static struct netdev_queue_attribute bql_ ## NAME ## _attribute =	\
+	__ATTR(NAME, S_IRUGO | S_IWUSR, bql_show_ ## NAME,		\
+	    bql_set_ ## NAME);
+
+BQL_ATTR(limit, limit)
+BQL_ATTR(limit_max, max_limit)
+BQL_ATTR(limit_min, min_limit)
+
+static struct attribute *dql_attrs[] = {
+	&bql_limit_attribute.attr,
+	&bql_limit_max_attribute.attr,
+	&bql_limit_min_attribute.attr,
+	&bql_hold_time_attribute.attr,
+	&bql_inflight_attribute.attr,
+	NULL
+};
+
+static struct attribute_group dql_group = {
+	.name  = "byte_queue_limits",
+	.attrs  = dql_attrs,
+};
 
+#ifdef CONFIG_XPS
 static ssize_t show_xps_map(struct netdev_queue *queue,
 			    struct netdev_queue_attribute *attribute, char *buf)
 {
@@ -889,6 +1003,51 @@ static DEFINE_MUTEX(xps_map_mutex);
 #define xmap_dereference(P)		\
 	rcu_dereference_protected((P), lockdep_is_held(&xps_map_mutex))
 
+static void xps_queue_release(struct netdev_queue *queue)
+{
+	struct net_device *dev = queue->dev;
+	struct xps_dev_maps *dev_maps;
+	struct xps_map *map;
+	unsigned long index;
+	int i, pos, nonempty = 0;
+
+	index = get_netdev_queue_index(queue);
+
+	mutex_lock(&xps_map_mutex);
+	dev_maps = xmap_dereference(dev->xps_maps);
+
+	if (dev_maps) {
+		for_each_possible_cpu(i) {
+			map = xmap_dereference(dev_maps->cpu_map[i]);
+			if (!map)
+				continue;
+
+			for (pos = 0; pos < map->len; pos++)
+				if (map->queues[pos] == index)
+					break;
+
+			if (pos < map->len) {
+				if (map->len > 1)
+					map->queues[pos] =
+					    map->queues[--map->len];
+				else {
+					RCU_INIT_POINTER(dev_maps->cpu_map[i],
+					    NULL);
+					kfree_rcu(map, rcu);
+					map = NULL;
+				}
+			}
+			if (map)
+				nonempty = 1;
+		}
+
+		if (!nonempty) {
+			RCU_INIT_POINTER(dev->xps_maps, NULL);
+			kfree_rcu(dev_maps, rcu);
+		}
+	}
+	mutex_unlock(&xps_map_mutex);
+}
 static ssize_t store_xps_map(struct netdev_queue *queue,
 		      struct netdev_queue_attribute *attribute,
 		      const char *buf, size_t len)
@@ -1024,53 +1183,13 @@ static struct attribute *netdev_queue_default_attrs[] = {
 	&xps_cpus_attribute.attr,
 	NULL
 };
+#endif
 
 static void netdev_queue_release(struct kobject *kobj)
 {
 	struct netdev_queue *queue = to_netdev_queue(kobj);
-	struct net_device *dev = queue->dev;
-	struct xps_dev_maps *dev_maps;
-	struct xps_map *map;
-	unsigned long index;
-	int i, pos, nonempty = 0;
-
-	index = get_netdev_queue_index(queue);
-
-	mutex_lock(&xps_map_mutex);
-	dev_maps = xmap_dereference(dev->xps_maps);
-
-	if (dev_maps) {
-		for_each_possible_cpu(i) {
-			map = xmap_dereference(dev_maps->cpu_map[i]);
-			if (!map)
-				continue;
 
-			for (pos = 0; pos < map->len; pos++)
-				if (map->queues[pos] == index)
-					break;
-
-			if (pos < map->len) {
-				if (map->len > 1)
-					map->queues[pos] =
-					    map->queues[--map->len];
-				else {
-					RCU_INIT_POINTER(dev_maps->cpu_map[i],
-					    NULL);
-					kfree_rcu(map, rcu);
-					map = NULL;
-				}
-			}
-			if (map)
-				nonempty = 1;
-		}
-
-		if (!nonempty) {
-			RCU_INIT_POINTER(dev->xps_maps, NULL);
-			kfree_rcu(dev_maps, rcu);
-		}
-	}
-
-	mutex_unlock(&xps_map_mutex);
+	xps_queue_release(queue);
 
 	memset(kobj, 0, sizeof(*kobj));
 	dev_put(queue->dev);
@@ -1091,22 +1210,26 @@ static int netdev_queue_add_kobject(struct net_device *net, int index)
 	kobj->kset = net->queues_kset;
 	error = kobject_init_and_add(kobj, &netdev_queue_ktype, NULL,
 	    "tx-%u", index);
+	if (error)
+		goto exit;
+
+	error = sysfs_create_group(kobj, &dql_group);
 	if (error) {
 		kobject_put(kobj);
-		return error;
+		goto exit;
 	}
 
 	kobject_uevent(kobj, KOBJ_ADD);
 	dev_hold(queue->dev);
 
+	return 0;
+exit:
 	return error;
 }
-#endif /* CONFIG_XPS */
 
 int
 netdev_queue_update_kobjects(struct net_device *net, int old_num, int new_num)
 {
-#ifdef CONFIG_XPS
 	int i;
 	int error = 0;
 
@@ -1118,25 +1241,24 @@ netdev_queue_update_kobjects(struct net_device *net, int old_num, int new_num)
 		}
 	}
 
-	while (--i >= new_num)
-		kobject_put(&net->_tx[i].kobj);
+	while (--i >= new_num) {
+		struct netdev_queue *queue = net->_tx + i;
+
+		sysfs_remove_group(&queue->kobj, &dql_group);
+		kobject_put(&queue->kobj);
+	}
 
 	return error;
-#else
-	return 0;
-#endif
 }
 
 static int register_queue_kobjects(struct net_device *net)
 {
 	int error = 0, txq = 0, rxq = 0, real_rx = 0, real_tx = 0;
 
-#if defined(CONFIG_RPS) || defined(CONFIG_XPS)
 	net->queues_kset = kset_create_and_add("queues",
 	    NULL, &net->dev.kobj);
 	if (!net->queues_kset)
 		return -ENOMEM;
-#endif
 
 #ifdef CONFIG_RPS
 	real_rx = net->real_num_rx_queues;
@@ -1172,9 +1294,7 @@ static void remove_queue_kobjects(struct net_device *net)
 
 	net_rx_queue_update_kobjects(net, real_rx, 0);
 	netdev_queue_update_kobjects(net, real_tx, 0);
-#if defined(CONFIG_RPS) || defined(CONFIG_XPS)
 	kset_unregister(net->queues_kset);
-#endif
 }
 
 static void *net_grab_current_ns(void)
-- 
1.7.3.1


^ permalink raw reply related

* [RFC PATCH v2 3/9] net: Add netdev interfaces for recording sends and completions
From: Tom Herbert @ 2011-08-08  4:48 UTC (permalink / raw)
  To: davem, netdev

Add interfaces for drivers to call to record the number of packets and
bytes at send time and at transmit completion.  Also add a function to
"reset" a queue.  These will be used by Byte Queue Limits.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/linux/netdevice.h |   26 ++++++++++++++++++++++++++
 1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4115b4d..74e8862 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1910,6 +1910,32 @@ static inline int netif_xmit_frozen_or_stopped(const struct netdev_queue *dev_qu
 	return dev_queue->state & QUEUE_STATE_ANY_XOFF_OR_FROZEN;
 }
 
+static inline void netdev_tx_sent_queue(struct netdev_queue *dev_queue,
+					unsigned int pkts, unsigned int bytes)
+{
+}
+
+static inline void netdev_sent_queue(struct net_device *dev,
+				     unsigned int pkts, unsigned int bytes)
+{
+}
+
+static inline void netdev_tx_completed_queue(struct netdev_queue *dev_queue,
+					     unsigned pkts, unsigned bytes)
+{
+}
+
+static inline void netdev_completed_queue(struct net_device *dev,
+					  unsigned pkts, unsigned bytes)
+{
+}
+
+static inline void netdev_tx_reset_queue(struct netdev_queue *q)
+{
+}
+
+static inline void netdev_reset_queue(struct net_device *dev_queue)
+{
 }
 
 /**
-- 
1.7.3.1


^ permalink raw reply related

* [RFC PATCH v2 2/9] net: Add queue state xoff flag for stack
From: Tom Herbert @ 2011-08-08  4:44 UTC (permalink / raw)
  To: davem, netdev

From c3a8c0ace2322f9ccf78089936a504af9c9e0c7f Mon Sep 17 00:00:00 2001
From: Tom Herbert <therbert@google.com>
Date: Thu, 14 Jul 2011 22:08:27 -0700
Subject: [PATCH 2/9] net: Add queue state xoff flag for stack

Create separate queue state flags so that either the stack or drivers
can turn on XOFF.  Add a set of functions used in the stack to determine
whether a queue is really stopped (either by the stack or by the driver).

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/linux/netdevice.h |   32 +++++++++++++++++++++-----------
 net/core/dev.c            |    4 ++--
 net/core/netpoll.c        |    4 ++--
 net/core/pktgen.c         |    2 +-
 net/sched/sch_generic.c   |    8 ++++----
 net/sched/sch_multiq.c    |    6 ++++--
 net/sched/sch_teql.c      |    6 +++---
 7 files changed, 37 insertions(+), 25 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ddee79b..4115b4d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -516,10 +516,13 @@ static inline void napi_synchronize(const struct napi_struct *n)
 #endif
 
 enum netdev_queue_state_t {
-	__QUEUE_STATE_XOFF,
+	__QUEUE_STATE_DRV_XOFF,
+	__QUEUE_STATE_STACK_XOFF,
 	__QUEUE_STATE_FROZEN,
-#define QUEUE_STATE_XOFF_OR_FROZEN ((1 << __QUEUE_STATE_XOFF)		| \
-				    (1 << __QUEUE_STATE_FROZEN))
+#define QUEUE_STATE_ANY_XOFF ((1 << __QUEUE_STATE_DRV_XOFF)		| \
+			      (1 << __QUEUE_STATE_STACK_XOFF))
+#define QUEUE_STATE_ANY_XOFF_OR_FROZEN (QUEUE_STATE_ANY_XOFF		| \
+					(1 << __QUEUE_STATE_FROZEN))
 };
 
 struct netdev_queue {
@@ -1778,7 +1781,7 @@ extern void __netif_schedule(struct Qdisc *q);
 
 static inline void netif_schedule_queue(struct netdev_queue *txq)
 {
-	if (!test_bit(__QUEUE_STATE_XOFF, &txq->state))
+	if (!(txq->state & QUEUE_STATE_ANY_XOFF))
 		__netif_schedule(txq->qdisc);
 }
 
@@ -1792,7 +1795,7 @@ static inline void netif_tx_schedule_all(struct net_device *dev)
 
 static inline void netif_tx_start_queue(struct netdev_queue *dev_queue)
 {
-	clear_bit(__QUEUE_STATE_XOFF, &dev_queue->state);
+	clear_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state);
 }
 
 /**
@@ -1824,7 +1827,7 @@ static inline void netif_tx_wake_queue(struct netdev_queue *dev_queue)
 		return;
 	}
 #endif
-	if (test_and_clear_bit(__QUEUE_STATE_XOFF, &dev_queue->state))
+	if (test_and_clear_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state))
 		__netif_schedule(dev_queue->qdisc);
 }
 
@@ -1856,7 +1859,7 @@ static inline void netif_tx_stop_queue(struct netdev_queue *dev_queue)
 		pr_info("netif_stop_queue() cannot be called before register_netdev()\n");
 		return;
 	}
-	set_bit(__QUEUE_STATE_XOFF, &dev_queue->state);
+	set_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state);
 }
 
 /**
@@ -1883,7 +1886,7 @@ static inline void netif_tx_stop_all_queues(struct net_device *dev)
 
 static inline int netif_tx_queue_stopped(const struct netdev_queue *dev_queue)
 {
-	return test_bit(__QUEUE_STATE_XOFF, &dev_queue->state);
+	return test_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state);
 }
 
 /**
@@ -1897,9 +1900,16 @@ static inline int netif_queue_stopped(const struct net_device *dev)
 	return netif_tx_queue_stopped(netdev_get_tx_queue(dev, 0));
 }
 
-static inline int netif_tx_queue_frozen_or_stopped(const struct netdev_queue *dev_queue)
+static inline int netif_xmit_stopped(const struct netdev_queue *dev_queue)
 {
-	return dev_queue->state & QUEUE_STATE_XOFF_OR_FROZEN;
+	return dev_queue->state & QUEUE_STATE_ANY_XOFF;
+}
+
+static inline int netif_xmit_frozen_or_stopped(const struct netdev_queue *dev_queue)
+{
+	return dev_queue->state & QUEUE_STATE_ANY_XOFF_OR_FROZEN;
+}
+
 }
 
 /**
@@ -1986,7 +1996,7 @@ static inline void netif_wake_subqueue(struct net_device *dev, u16 queue_index)
 	if (netpoll_trap())
 		return;
 #endif
-	if (test_and_clear_bit(__QUEUE_STATE_XOFF, &txq->state))
+	if (test_and_clear_bit(__QUEUE_STATE_DRV_XOFF, &txq->state))
 		__netif_schedule(txq->qdisc);
 }
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 17d67b5..a7f8c38 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2190,7 +2190,7 @@ gso:
 			return rc;
 		}
 		txq_trans_update(txq);
-		if (unlikely(netif_tx_queue_stopped(txq) && skb->next))
+		if (unlikely(netif_xmit_stopped(txq) && skb->next))
 			return NETDEV_TX_BUSY;
 	} while (skb->next);
 
@@ -2464,7 +2464,7 @@ int dev_queue_xmit(struct sk_buff *skb)
 
 			HARD_TX_LOCK(dev, txq, cpu);
 
-			if (!netif_tx_queue_stopped(txq)) {
+			if (!netif_xmit_stopped(txq)) {
 				__this_cpu_inc(xmit_recursion);
 				rc = dev_hard_start_xmit(skb, dev, txq);
 				__this_cpu_dec(xmit_recursion);
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index adf84dd..9c71328 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -75,7 +75,7 @@ static void queue_process(struct work_struct *work)
 
 		local_irq_save(flags);
 		__netif_tx_lock(txq, smp_processor_id());
-		if (netif_tx_queue_frozen_or_stopped(txq) ||
+		if (netif_xmit_frozen_or_stopped(txq) ||
 		    ops->ndo_start_xmit(skb, dev) != NETDEV_TX_OK) {
 			skb_queue_head(&npinfo->txq, skb);
 			__netif_tx_unlock(txq);
@@ -316,7 +316,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
 		for (tries = jiffies_to_usecs(1)/USEC_PER_POLL;
 		     tries > 0; --tries) {
 			if (__netif_tx_trylock(txq)) {
-				if (!netif_tx_queue_stopped(txq)) {
+				if (!netif_xmit_stopped(txq)) {
 					status = ops->ndo_start_xmit(skb, dev);
 					if (status == NETDEV_TX_OK)
 						txq_trans_update(txq);
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index e35a6fb..5c481c5 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -3342,7 +3342,7 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
 
 	__netif_tx_lock_bh(txq);
 
-	if (unlikely(netif_tx_queue_frozen_or_stopped(txq))) {
+	if (unlikely(netif_xmit_frozen_or_stopped(txq))) {
 		ret = NETDEV_TX_BUSY;
 		pkt_dev->last_ok = 0;
 		goto unlock;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 69fca27..7c84f08 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -60,7 +60,7 @@ static inline struct sk_buff *dequeue_skb(struct Qdisc *q)
 
 		/* check the reason of requeuing without tx lock first */
 		txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));
-		if (!netif_tx_queue_frozen_or_stopped(txq)) {
+		if (!netif_xmit_frozen_or_stopped(txq)) {
 			q->gso_skb = NULL;
 			q->q.qlen--;
 		} else
@@ -121,7 +121,7 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 	spin_unlock(root_lock);
 
 	HARD_TX_LOCK(dev, txq, smp_processor_id());
-	if (!netif_tx_queue_frozen_or_stopped(txq))
+	if (!netif_xmit_frozen_or_stopped(txq))
 		ret = dev_hard_start_xmit(skb, dev, txq);
 
 	HARD_TX_UNLOCK(dev, txq);
@@ -143,7 +143,7 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 		ret = dev_requeue_skb(skb, q);
 	}
 
-	if (ret && netif_tx_queue_frozen_or_stopped(txq))
+	if (ret && netif_xmit_frozen_or_stopped(txq))
 		ret = 0;
 
 	return ret;
@@ -242,7 +242,7 @@ static void dev_watchdog(unsigned long arg)
 				 * old device drivers set dev->trans_start
 				 */
 				trans_start = txq->trans_start ? : dev->trans_start;
-				if (netif_tx_queue_stopped(txq) &&
+				if (netif_xmit_stopped(txq) &&
 				    time_after(jiffies, (trans_start +
 							 dev->watchdog_timeo))) {
 					some_queue_timedout = 1;
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index edc1950..49131d7 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -107,7 +107,8 @@ static struct sk_buff *multiq_dequeue(struct Qdisc *sch)
 		/* Check that target subqueue is available before
 		 * pulling an skb to avoid head-of-line blocking.
 		 */
-		if (!__netif_subqueue_stopped(qdisc_dev(sch), q->curband)) {
+		if (!netif_xmit_stopped(
+		    netdev_get_tx_queue(qdisc_dev(sch), q->curband))) {
 			qdisc = q->queues[q->curband];
 			skb = qdisc->dequeue(qdisc);
 			if (skb) {
@@ -138,7 +139,8 @@ static struct sk_buff *multiq_peek(struct Qdisc *sch)
 		/* Check that target subqueue is available before
 		 * pulling an skb to avoid head-of-line blocking.
 		 */
-		if (!__netif_subqueue_stopped(qdisc_dev(sch), curband)) {
+		if (!netif_xmit_stopped(
+		    netdev_get_tx_queue(qdisc_dev(sch), curband))) {
 			qdisc = q->queues[curband];
 			skb = qdisc->ops->peek(qdisc);
 			if (skb)
diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c
index a3b7120..283bfe3 100644
--- a/net/sched/sch_teql.c
+++ b/net/sched/sch_teql.c
@@ -301,7 +301,7 @@ restart:
 
 		if (slave_txq->qdisc_sleeping != q)
 			continue;
-		if (__netif_subqueue_stopped(slave, subq) ||
+		if (netif_xmit_stopped(netdev_get_tx_queue(slave, subq)) ||
 		    !netif_running(slave)) {
 			busy = 1;
 			continue;
@@ -312,7 +312,7 @@ restart:
 			if (__netif_tx_trylock(slave_txq)) {
 				unsigned int length = qdisc_pkt_len(skb);
 
-				if (!netif_tx_queue_frozen_or_stopped(slave_txq) &&
+				if (!netif_xmit_frozen_or_stopped(slave_txq) &&
 				    slave_ops->ndo_start_xmit(skb, slave) == NETDEV_TX_OK) {
 					txq_trans_update(slave_txq);
 					__netif_tx_unlock(slave_txq);
@@ -324,7 +324,7 @@ restart:
 				}
 				__netif_tx_unlock(slave_txq);
 			}
-			if (netif_queue_stopped(dev))
+			if (netif_xmit_stopped(netdev_get_tx_queue(dev, 0)))
 				busy = 1;
 			break;
 		case 1:
-- 
1.7.3.1


^ permalink raw reply related

* [RFC PATCH v2 1/9] dql: Dynamic queue limits
From: Tom Herbert @ 2011-08-08  4:43 UTC (permalink / raw)
  To: davem, netdev

Implementation of dynamic queue limits (dql).  This is a library which
allows a queue limit to be dynamically managed.  The goal of dql is to
keep the queue limit, the number of objects allowed in the queue, as
small as possible without allowing the queue to be starved.

dql would be used with a queue whose use has these properties:

1) Objects are queued up to some limit which can be expressed as a
   count of objects.
2) Periodically a completion process executes which retires consumed
   objects.
3) Starvation occurs when limit has been reached, all queued data has
   actually been consumed but completion processing has not yet run,
   so queuing new data is blocked.
4) Minimizing the amount of queued data is desirable.

A canonical example of such a queue would be a NIC HW transmit queue.

The queue limit is dynamic: it increases or decreases over time
depending on the workload.  The queue limit is recalculated each time
completion processing is done.  Increases occur when the queue is
starved and can be exponential over successive intervals.
Decreases occur when more data is being maintained in the queue than
needed to prevent starvation.  The number of extra objects, or "slack",
is measured over successive intervals, and to avoid hysteresis the
limit is only reduced by the minimum slack seen over a configurable
time period.

The dql API provides routines to manage the queue:
- dql_init is called to initialize the dql structure
- dql_reset is called to reset dynamic structures
- dql_queued is called when objects are being enqueued
- dql_avail returns availability in the queue
- dql_completed is called when objects have been consumed in the queue

Configuration consists of:
- max_limit, maximum limit
- min_limit, minimum limit
- slack_hold_time, time to measure instances of slack before reducing
  the queue limit.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/linux/dynamic_queue_limits.h |   80 ++++++++++++++++++++
 lib/Makefile                         |    2 +-
 lib/dynamic_queue_limits.c           |  132 ++++++++++++++++++++++++++++++++++
 3 files changed, 213 insertions(+), 1 deletions(-)
 create mode 100644 include/linux/dynamic_queue_limits.h
 create mode 100644 lib/dynamic_queue_limits.c

diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
new file mode 100644
index 0000000..3ffc591
--- /dev/null
+++ b/include/linux/dynamic_queue_limits.h
@@ -0,0 +1,80 @@
+/*
+ * Dynamic queue limits (dql) - Definitions
+ *
+ * Author: Tom Herbert (therbert@google.com)
+ *
+ * This header file contains the definitions for dynamic queue limits (dql).
+ * dql would be used in conjunction with a producer/consumer type queue
+ * (possibly a HW queue).  Such a queue would have these general properties:
+ *
+ *   1) Objects are queued up to some limit.
+ *   2) Periodically a completion process executes which retires consumed
+ *      objects.
+ *   3) Starvation occurs when limit has been reached, all queued data has
+ *      actually been consumed but completion processing has not yet run
+ *      so queuing new data is blocked.
+ *   4) Minimizing the amount of queued data is desirable.
+ *
+ * The goal of dql is to calculate the limit as the minimum number of objects
+ * needed to prevent starvation.
+ *
+ * The dql implementation does not implement any locking for the dql data
+ * structures; the higher layer should provide this.
+ */
+
+#ifndef _LINUX_DQL_H
+#define _LINUX_DQL_H
+
+#ifdef __KERNEL__
+
+struct dql {
+	unsigned long	limit;			/* Current limit */
+	unsigned long	prev_ovlimit;		/* Previous over limit */
+
+	unsigned long	num_queued;		/* Total ever queued */
+	unsigned long	prev_num_queued;	/* Previous queue total */
+	unsigned long	num_completed;		/* Total ever completed */
+
+	unsigned long	last_obj_cnt;		/* Count at last queuing */
+	unsigned long	prev_last_obj_cnt;	/* Previous queuing cnt */
+
+	unsigned long	lowest_slack;		/* Lowest slack found */
+	unsigned long	slack_start_time;	/* Time slacks seen */
+
+	unsigned long	max_limit;		/* Maximum limit */
+	unsigned long	min_limit;		/* Minimum limit */
+	unsigned	slack_hold_time;	/* Time to measure slack */
+};
+
+/* Set some static maximums */
+#define	DQL_MAX_OBJECT (-1UL / 16)
+#define	DQL_MAX_LIMIT ((-1UL / 2) - DQL_MAX_OBJECT)
+
+/* Record number of objects queued. */
+static inline void dql_queued(struct dql *dql, unsigned long count)
+{
+	BUG_ON(count > DQL_MAX_OBJECT);
+	BUG_ON(dql->num_queued - dql->num_completed > DQL_MAX_LIMIT);
+
+	dql->num_queued += count;
+	dql->last_obj_cnt = count;
+}
+
+/* Returns how many objects can be queued, < 0 indicates over limit.  */
+static inline long dql_avail(struct dql *dql)
+{
+	return dql->limit - (dql->num_queued - dql->num_completed);
+}
+
+/* Record number of completed objects and recalculate the limit. */
+extern void dql_completed(struct dql *dql, unsigned long count);
+
+/* Reset dql state */
+extern void dql_reset(struct dql *dql);
+
+/* Initialize dql state */
+extern int dql_init(struct dql *dql, unsigned hold_time);
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_DQL_H */
diff --git a/lib/Makefile b/lib/Makefile
index 892f4e2..c008661 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -22,7 +22,7 @@ lib-y	+= kobject.o kref.o klist.o
 obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
 	 bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \
 	 string_helpers.o gcd.o lcm.o list_sort.o uuid.o flex_array.o \
-	 bsearch.o find_last_bit.o
+	 bsearch.o find_last_bit.o dynamic_queue_limits.o
 obj-y += kstrtox.o
 obj-$(CONFIG_TEST_KSTRTOX) += test-kstrtox.o
 
diff --git a/lib/dynamic_queue_limits.c b/lib/dynamic_queue_limits.c
new file mode 100644
index 0000000..6a1f5b9
--- /dev/null
+++ b/lib/dynamic_queue_limits.c
@@ -0,0 +1,132 @@
+/*
+ * Dynamic byte queue limits.  See include/linux/dynamic_queue_limits.h
+ *
+ * Author: Tom Herbert (therbert@google.com)
+ */
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/ctype.h>
+#include <linux/kernel.h>
+#include <linux/dynamic_queue_limits.h>
+
+#define POSDIFF(A, B) ((A) > (B) ? (A) - (B) : 0)
+
+/* Records completed count and recalculates the queue limit */
+void dql_completed(struct dql *dql, unsigned long count)
+{
+	unsigned long inprogress, prev_inprogress, limit;
+	unsigned long ovlimit, all_prev_completed, completed;
+
+	/* Can't complete more than what's in queue */
+	BUG_ON(count > dql->num_queued - dql->num_completed);
+
+	completed = dql->num_completed + count;
+	limit = dql->limit;
+	ovlimit = POSDIFF(dql->num_queued - dql->num_completed, limit);
+	inprogress = dql->num_queued - completed;
+	prev_inprogress = dql->prev_num_queued - dql->num_completed;
+	all_prev_completed = POSDIFF(completed, dql->prev_num_queued);
+
+	if ((ovlimit && !inprogress) ||
+	    (dql->prev_ovlimit && all_prev_completed)) {
+		/*
+		 * Queue considered starved if:
+		 *   - The queue was over-limit in the last interval,
+		 *     and there is no more data in the queue.
+		 *  OR
+		 *   - The queue was over-limit in the previous interval and
+		 *     when enqueuing it was possible that all queued data
+		 *     had been consumed.  This covers the case when queue
+		 *     may have become starved between completion processing
+		 *     running and next time enqueue was scheduled.
+		 *
+		 *     When queue is starved increase the limit by the amount
+		 *     of bytes both sent and completed in the last interval,
+		 *     plus any previous over-limit.
+		 */
+		limit += POSDIFF(completed, dql->prev_num_queued) +
+		     dql->prev_ovlimit;
+		dql->slack_start_time = jiffies;
+		dql->lowest_slack = -1UL;
+	} else if (inprogress && prev_inprogress && !all_prev_completed) {
+		/*
+		 * Queue was not starved, check if the limit can be decreased.
+		 * A decrease is only considered if the queue has been busy in
+		 * the whole interval (the check above).
+		 *
+		 * If there is slack, the amount of excess data queued above
+		 * the amount needed to prevent starvation, the queue limit can
+		 * be decreased.  To avoid hysteresis we consider the
+		 * minimum amount of slack found over several iterations of the
+		 * completion routine.
+		 */
+		unsigned long slack, slack_last_objs;
+
+		/*
+		 * Slack is the maximum of
+		 *   - The queue limit plus previous over-limit minus twice
+		 *     the number of objects completed.  Note that two times
+		 *     number of completed bytes is basis for upper bound
+		 *     of the limit.
+		 *   - Portion of objects in the last queuing operation that
+		 *     was not part of non-zero previous over-limit.  That is
+		 *     "round down" by non-overlimit portion of the last
+		 *     queueing operation.
+		 */
+		slack = POSDIFF(limit + dql->prev_ovlimit,
+		    2 * (completed - dql->num_completed));
+		slack_last_objs = dql->prev_ovlimit ?
+		    POSDIFF(dql->prev_last_obj_cnt, dql->prev_ovlimit) : 0;
+
+		slack = max(slack, slack_last_objs);
+
+		if (slack < dql->lowest_slack)
+			dql->lowest_slack = slack;
+
+		if (time_after(jiffies,
+			       dql->slack_start_time + dql->slack_hold_time)) {
+			limit = POSDIFF(limit, dql->lowest_slack);
+			dql->slack_start_time = jiffies;
+			dql->lowest_slack = -1UL;
+		}
+	}
+
+	/* Enforce bounds on limit */
+	limit = clamp(limit, dql->min_limit, dql->max_limit);
+
+	if (limit != dql->limit) {
+		dql->limit = limit;
+		ovlimit = 0;
+	}
+
+	dql->prev_ovlimit = ovlimit;
+	dql->prev_last_obj_cnt = dql->last_obj_cnt;
+	dql->num_completed = completed;
+	dql->prev_num_queued = dql->num_queued;
+}
+EXPORT_SYMBOL(dql_completed);
+
+void dql_reset(struct dql *dql)
+{
+	/* Reset all dynamic values */
+	dql->limit = 0;
+	dql->num_queued = 0;
+	dql->num_completed = 0;
+	dql->last_obj_cnt = 0;
+	dql->prev_num_queued = 0;
+	dql->prev_last_obj_cnt = 0;
+	dql->prev_ovlimit = 0;
+	dql->lowest_slack = -1UL;
+	dql->slack_start_time = jiffies;
+}
+EXPORT_SYMBOL(dql_reset);
+
+int dql_init(struct dql *dql, unsigned hold_time)
+{
+	dql->max_limit = DQL_MAX_LIMIT;
+	dql->min_limit = 0;
+	dql->slack_hold_time = hold_time;
+	dql_reset(dql);
+	return 0;
+}
+EXPORT_SYMBOL(dql_init);
-- 
1.7.3.1


^ permalink raw reply related

* [RFC PATCH v2 0/9] bql: Byte Queue Limits
From: Tom Herbert @ 2011-08-08  4:43 UTC (permalink / raw)
  To: davem, netdev

Changes from last version:
- Simplified and generalized driver interface.  Drivers need to
  call two functions:
    netdev_tx_completed_queue: Called at end of transmit completion
      to inform stack of number of bytes and packets processed.
    netdev_tx_sent_queue: Called to inform stack when packets are
      queued.

    netdev_tx_reset_queue: optional, resets state in the stack

- Added new per queue flags that allow the stack to stop a queue
  separately from the driver doing this.  Drivers continue using the
  same functions to stop queues, but there are two functions that
  the stack calls to check if a queue has been stopped by the driver
  or the stack:

  netif_xmit_stopped, netif_xmit_frozen_or_stopped

- Added example support for bnx2x and sfc (demonstrates operation over
  multi-queue)

- Removed BQL being under CONFIG_RPS (didn't add CONFIG_BQL)

- Still needs some more testing, including showing benefits to high
  priority packets in QoS.
----

This patch series implements byte queue limits (bql) for NIC TX queues.

Byte queue limits are a mechanism to limit the size of the transmit
hardware queue on a NIC by number of bytes. The goal of these byte
limits is to reduce latency caused by excessive queuing in hardware
without sacrificing throughput.

Hardware queuing limits are typically specified in terms of a number of
hardware descriptors, each of which has a variable size. The variability
of the size of individual queued items can have a very wide range. For
instance with the e1000 NIC the size could range from 64 bytes to 4K
(with TSO enabled). This variability makes it next to impossible to
choose a single queue limit that prevents starvation and provides lowest
possible latency.

The objective of byte queue limits is to set the limit to be the
minimum needed to prevent starvation between successive transmissions to
the hardware. The latency between two transmissions can be variable in a
system. It is dependent on interrupt frequency, NAPI polling latencies,
scheduling of the queuing discipline, lock contention, etc. Therefore we
propose that byte queue limits should be dynamic and change in
accordance with the networking stack latencies a system encounters.

Patches to implement this:
Patch 1: Dynamic queue limits (dql) library.  This provides the general
queuing algorithm.
Patch 2: netdev changes that use dql to support byte queue limits.
Patch 3: Support in forcedeth driver for byte queue limits.

The effects of BQL are demonstrated in the benchmark results below.
These were made running 200 streams of netperf RR tests:

140000 rr size
BQL: 80-215K bytes in queue, 856 tps, 3.26% cpu
No BQL: 2700-2930K bytes in queue, 854 tps, 3.71% cpu

14000 rr size
BQL: 25-55K bytes in queue, 8500 tps
No BQL: 1500-1622K bytes in queue, 8523 tps, 4.53% cpu

1400 rr size
BQL: 20-38K bytes in queue, 86582 tps, 7.38% cpu
No BQL: 29-117K bytes in queue, 85738 tps, 7.67% cpu

140 rr size
BQL: 1-10K bytes in queue, 320540 tps, 34.6% cpu
No BQL: 1-13K bytes in queue, 323158 tps, 37.16% cpu

1 rr size
BQL: 0-3K bytes in queue, 338811 tps, 41.41% cpu
No BQL: 0-3K bytes in queue, 339947 tps, 42.36% cpu

The amount of queuing in the NIC is reduced by up to 90%, and I haven't
yet seen a consistent negative impact in terms of throughput or
CPU utilization.

^ permalink raw reply

* Re: [Bug 40542] overflow/panic on KVM hipervizor
From: Brad Campbell @ 2011-08-08  1:40 UTC (permalink / raw)
  To: Avi Kivity; +Cc: bugzilla-daemon, kvm, slawek, netdev
In-Reply-To: <4E3EAA84.7040708@redhat.com>

On 07/08/11 23:08, Avi Kivity wrote:
> On 08/07/2011 04:39 PM, Brad Campbell wrote:
>>
>> This looks like the bug I've been fighting with on and off.
>
> What's the bugzilla number for that?
>
> (unfortunately, no great insight except for "CLOSED DUPLICATE")
>
> hopefully someone from netdev can take a look, DNAT is seriously broken.
>
I can reproduce it at will, but it's on a live production machine. I've just ordered a second 
machine which I can use to reproduce and test against. From a bisection standpoint I'm about half 
way between 2.6.35 & 2.6.36, but until the second machine arrives I'm just unable to chase it any 
further.

Brad

^ permalink raw reply

* Re: include/linux/netlink.h: problem when included by an application
From: Michel Machado @ 2011-08-07 22:14 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev
In-Reply-To: <1312679707.2591.987.camel@deadeye>

> >    The simplest solution that I came up was replacing sa_family_t in
> > include/linux/netlink.h to 'unsigned short' as header
> > include/linux/socket.h does for struct __kernel_sockaddr_storage
> > available to applications.
> 
> Maybe we should do something like this in <linux/socket.h>:
> 
> typedef unsigned short __kernel_sa_family_t;
> #ifdef __KERNEL__
> typedef __kernel_sa_family_t sa_family_t;
> #endif
> 
> and then use __kernel_sa_family_t in <linux/netlink.h>.
> 
> Ben.

   I like this solution, it solves both struct __kernel_sockaddr_storage
in include/linux/socket.h, and struct sockaddr_nl in
include/linux/netlink.h.

[ ]'s
Michel Machado


^ permalink raw reply

* [PATCH] ipv4: use dst with ref during bcast/mcast loopback
From: Julian Anastasov @ 2011-08-07 20:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev


	Make sure skb dst has reference when moving to
another context. Currently, I don't see protocols that can
hit it when sending broadcasts/multicasts to loopback using
noref dsts, so it is just a precaution.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---

	Please, review and apply if needed...

diff -urp v3.0/linux/net/ipv4/ip_output.c linux/net/ipv4/ip_output.c
--- v3.0/linux/net/ipv4/ip_output.c	2011-07-22 09:43:32.000000000 +0300
+++ linux/net/ipv4/ip_output.c	2011-08-07 22:21:23.909347184 +0300
@@ -122,6 +122,7 @@ static int ip_dev_loopback_xmit(struct s
 	newskb->pkt_type = PACKET_LOOPBACK;
 	newskb->ip_summed = CHECKSUM_UNNECESSARY;
 	WARN_ON(!skb_dst(newskb));
+	skb_dst_force(newskb);
 	netif_rx_ni(newskb);
 	return 0;
 }

^ permalink raw reply
