Netdev List


* [PATCH 03/14] acenic: convert to SKB paged frag API.
From: Ian Campbell @ 2011-08-30  9:18 UTC (permalink / raw)
  To: netdev; +Cc: Ian Campbell, Jes Sorensen, linux-acenic
In-Reply-To: <1314695847.10283.102.camel@zakaz.uk.xensource.com>

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Jes Sorensen <jes@trained-monkey.org>
Cc: linux-acenic@sunsite.dk
Cc: netdev@vger.kernel.org
---
 drivers/net/ethernet/alteon/acenic.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/alteon/acenic.c b/drivers/net/ethernet/alteon/acenic.c
index 1d6f2db..8794cf8 100644
--- a/drivers/net/ethernet/alteon/acenic.c
+++ b/drivers/net/ethernet/alteon/acenic.c
@@ -2485,9 +2485,9 @@ restart:
 			info = ap->skb->tx_skbuff + idx;
 			desc = ap->tx_ring + idx;
 
-			mapping = pci_map_page(ap->pdev, frag->page,
-					       frag->page_offset, frag->size,
-					       PCI_DMA_TODEVICE);
+			mapping = skb_frag_dma_map(&ap->pdev->dev, frag, 0,
+						   frag->size,
+						   PCI_DMA_TODEVICE);
 
 			flagsize = (frag->size << 16);
 			if (skb->ip_summed == CHECKSUM_PARTIAL)
-- 
1.7.2.5


* [PATCH 02/14] 8139cp: convert to SKB paged frag API.
From: Ian Campbell @ 2011-08-30  9:18 UTC (permalink / raw)
  To: netdev; +Cc: Ian Campbell
In-Reply-To: <1314695847.10283.102.camel@zakaz.uk.xensource.com>

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: netdev@vger.kernel.org
---
 drivers/net/ethernet/realtek/8139cp.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c b/drivers/net/ethernet/realtek/8139cp.c
index 5d2d1b8..c77d5af 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -784,8 +784,7 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
 
 			len = this_frag->size;
 			mapping = dma_map_single(&cp->pdev->dev,
-						 ((void *) page_address(this_frag->page) +
-						  this_frag->page_offset),
+						 skb_frag_address(this_frag),
 						 len, PCI_DMA_TODEVICE);
 			eor = (entry == (CP_TX_RING_SIZE - 1)) ? RingEnd : 0;
 
-- 
1.7.2.5
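The 8139cp and 3c59x hunks rely on skb_frag_address(frag) yielding the same pointer as the open-coded page_address(frag->page) + frag->page_offset. Below is a minimal userspace model of that equivalence. It is not kernel code: struct mock_page, struct mock_frag and the mock_* helpers are hypothetical stand-ins for struct page, skb_frag_t and the real accessors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct page and skb_frag_t (userspace model only). */
struct mock_page {
	unsigned char data[4096];
};

struct mock_frag {
	struct mock_page *page;
	unsigned int page_offset;
	unsigned int size;
};

/* Models page_address(): the address of the start of the page. */
static void *mock_page_address(const struct mock_page *page)
{
	return (void *)page->data;
}

/* Models skb_frag_address(): page address plus the fragment's offset. */
static void *mock_frag_address(const struct mock_frag *frag)
{
	assert(frag->page_offset + frag->size <= sizeof(frag->page->data));
	return (unsigned char *)mock_page_address(frag->page) + frag->page_offset;
}

/* The open-coded form the patches remove from the drivers. */
static void *open_coded_address(const struct mock_frag *frag)
{
	return (void *)((unsigned char *)mock_page_address(frag->page) +
			frag->page_offset);
}
```

With the accessor in place, drivers no longer depend on how a fragment reaches its page. A later change to that representation then only has to touch the helper.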


* [PATCH 01/14] 3c59x: convert to SKB paged frag API.
From: Ian Campbell @ 2011-08-30  9:18 UTC (permalink / raw)
  To: netdev; +Cc: Ian Campbell, Steffen Klassert
In-Reply-To: <1314695847.10283.102.camel@zakaz.uk.xensource.com>

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Steffen Klassert <klassert@mathematik.tu-chemnitz.de>
Cc: netdev@vger.kernel.org
---
 drivers/net/ethernet/3com/3c59x.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/3com/3c59x.c b/drivers/net/ethernet/3com/3c59x.c
index 6e1f595..9ca45dc 100644
--- a/drivers/net/ethernet/3com/3c59x.c
+++ b/drivers/net/ethernet/3com/3c59x.c
@@ -2179,9 +2179,10 @@ boomerang_start_xmit(struct sk_buff *skb, struct net_device *dev)
 			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
 
 			vp->tx_ring[entry].frag[i+1].addr =
-					cpu_to_le32(pci_map_single(VORTEX_PCI(vp),
-											   (void*)page_address(frag->page) + frag->page_offset,
-											   frag->size, PCI_DMA_TODEVICE));
+					cpu_to_le32(pci_map_single(
+						VORTEX_PCI(vp),
+						(void *)skb_frag_address(frag),
+						frag->size, PCI_DMA_TODEVICE));
 
 			if (i == skb_shinfo(skb)->nr_frags-1)
 					vp->tx_ring[entry].frag[i+1].length = cpu_to_le32(frag->size|LAST_FRAG);
-- 
1.7.2.5


* [PATCH 10/14] intel: convert to SKB paged frag API.
From: Ian Campbell @ 2011-08-30  9:18 UTC (permalink / raw)
  To: netdev
  Cc: Ian Campbell, e1000-devel, Bruce Allan, Jesse Brandeburg,
	John Ronciak
In-Reply-To: <1314695847.10283.102.camel@zakaz.uk.xensource.com>

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
Cc: Don Skidmore <donald.c.skidmore@intel.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Cc: PJ Waskiewicz <peter.p.waskiewicz.jr@intel.com>
Cc: Alex Duyck <alexander.h.duyck@intel.com>
Cc: John Ronciak <john.ronciak@intel.com>
Cc: e1000-devel@lists.sourceforge.net
Cc: netdev@vger.kernel.org
---
 drivers/net/ethernet/intel/e1000/e1000_main.c     |   16 +++++++++-------
 drivers/net/ethernet/intel/e1000e/netdev.c        |    7 +++----
 drivers/net/ethernet/intel/igb/igb_main.c         |    5 +----
 drivers/net/ethernet/intel/igbvf/netdev.c         |    5 +----
 drivers/net/ethernet/intel/ixgb/ixgb_main.c       |    6 +++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |    3 +--
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   10 ++++------
 7 files changed, 22 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 4a32c15..27f586a 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -2911,9 +2911,10 @@ static int e1000_tx_map(struct e1000_adapter *adapter,
 
 		frag = &skb_shinfo(skb)->frags[f];
 		len = frag->size;
-		offset = frag->page_offset;
+		offset = 0;
 
 		while (len) {
+			unsigned long bufend;
 			i++;
 			if (unlikely(i == tx_ring->count))
 				i = 0;
@@ -2927,18 +2928,19 @@ static int e1000_tx_map(struct e1000_adapter *adapter,
 			/* Workaround for potential 82544 hang in PCI-X.
 			 * Avoid terminating buffers within evenly-aligned
 			 * dwords. */
+			bufend = (unsigned long)
+				page_to_phys(skb_frag_page(frag));
+			bufend += offset + size - 1;
 			if (unlikely(adapter->pcix_82544 &&
-			    !((unsigned long)(page_to_phys(frag->page) + offset
-			                      + size - 1) & 4) &&
-			    size > 4))
+				     !(bufend & 4) &&
+				     size > 4))
 				size -= 4;
 
 			buffer_info->length = size;
 			buffer_info->time_stamp = jiffies;
 			buffer_info->mapped_as_page = true;
-			buffer_info->dma = dma_map_page(&pdev->dev, frag->page,
-							offset,	size,
-							DMA_TO_DEVICE);
+			buffer_info->dma = skb_frag_dma_map(&pdev->dev, frag,
+						offset, size, DMA_TO_DEVICE);
 			if (dma_mapping_error(&pdev->dev, buffer_info->dma))
 				goto dma_error;
 			buffer_info->next_to_watch = i;
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 9742bc6..b585383 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4677,7 +4677,7 @@ static int e1000_tx_map(struct e1000_adapter *adapter,
 
 		frag = &skb_shinfo(skb)->frags[f];
 		len = frag->size;
-		offset = frag->page_offset;
+		offset = 0;
 
 		while (len) {
 			i++;
@@ -4690,9 +4690,8 @@ static int e1000_tx_map(struct e1000_adapter *adapter,
 			buffer_info->length = size;
 			buffer_info->time_stamp = jiffies;
 			buffer_info->next_to_watch = i;
-			buffer_info->dma = dma_map_page(&pdev->dev, frag->page,
-							offset, size,
-							DMA_TO_DEVICE);
+			buffer_info->dma = skb_frag_dma_map(&pdev->dev, frag,
+						offset, size, DMA_TO_DEVICE);
 			buffer_info->mapped_as_page = true;
 			if (dma_mapping_error(&pdev->dev, buffer_info->dma))
 				goto dma_error;
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 8016084..3cb1bc9 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -4174,10 +4174,7 @@ static inline int igb_tx_map_adv(struct igb_ring *tx_ring, struct sk_buff *skb,
 		buffer_info->time_stamp = jiffies;
 		buffer_info->next_to_watch = i;
 		buffer_info->mapped_as_page = true;
-		buffer_info->dma = dma_map_page(dev,
-						frag->page,
-						frag->page_offset,
-						len,
+		buffer_info->dma = skb_frag_dma_map(dev, frag, 0, len,
 						DMA_TO_DEVICE);
 		if (dma_mapping_error(dev, buffer_info->dma))
 			goto dma_error;
diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c
index a6bdb3c..b3d760b 100644
--- a/drivers/net/ethernet/intel/igbvf/netdev.c
+++ b/drivers/net/ethernet/intel/igbvf/netdev.c
@@ -2061,10 +2061,7 @@ static inline int igbvf_tx_map_adv(struct igbvf_adapter *adapter,
 		buffer_info->time_stamp = jiffies;
 		buffer_info->next_to_watch = i;
 		buffer_info->mapped_as_page = true;
-		buffer_info->dma = dma_map_page(&pdev->dev,
-						frag->page,
-						frag->page_offset,
-						len,
+		buffer_info->dma = skb_frag_dma_map(&pdev->dev, frag, 0, len,
 						DMA_TO_DEVICE);
 		if (dma_mapping_error(&pdev->dev, buffer_info->dma))
 			goto dma_error;
diff --git a/drivers/net/ethernet/intel/ixgb/ixgb_main.c b/drivers/net/ethernet/intel/ixgb/ixgb_main.c
index b8ef2c0..c8b9c90 100644
--- a/drivers/net/ethernet/intel/ixgb/ixgb_main.c
+++ b/drivers/net/ethernet/intel/ixgb/ixgb_main.c
@@ -1341,7 +1341,7 @@ ixgb_tx_map(struct ixgb_adapter *adapter, struct sk_buff *skb,
 
 		frag = &skb_shinfo(skb)->frags[f];
 		len = frag->size;
-		offset = frag->page_offset;
+		offset = 0;
 
 		while (len) {
 			i++;
@@ -1361,8 +1361,8 @@ ixgb_tx_map(struct ixgb_adapter *adapter, struct sk_buff *skb,
 			buffer_info->time_stamp = jiffies;
 			buffer_info->mapped_as_page = true;
 			buffer_info->dma =
-				dma_map_page(&pdev->dev, frag->page,
-					     offset, size, DMA_TO_DEVICE);
+				skb_frag_dma_map(&pdev->dev, frag, offset, size,
+						 DMA_TO_DEVICE);
 			if (dma_mapping_error(&pdev->dev, buffer_info->dma))
 				goto dma_error;
 			buffer_info->next_to_watch = 0;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index e8aad76..7dba3ab 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6562,8 +6562,7 @@ static void ixgbe_tx_map(struct ixgbe_ring *tx_ring,
 		offset = 0;
 		tx_flags |= IXGBE_TX_FLAGS_MAPPED_AS_PAGE;
 
-		dma = dma_map_page(dev, frag->page, frag->page_offset,
-				   size, DMA_TO_DEVICE);
+		dma = skb_frag_dma_map(dev, frag, 0, size, DMA_TO_DEVICE);
 		if (dma_mapping_error(dev, dma))
 			goto dma_error;
 
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index b1e1c2d..3bc38e1 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -2919,18 +2919,16 @@ static int ixgbevf_tx_map(struct ixgbevf_adapter *adapter,
 
 		frag = &skb_shinfo(skb)->frags[f];
 		len = min((unsigned int)frag->size, total);
-		offset = frag->page_offset;
+		offset = 0;
 
 		while (len) {
 			tx_buffer_info = &tx_ring->tx_buffer_info[i];
 			size = min(len, (unsigned int)IXGBE_MAX_DATA_PER_TXD);
 
 			tx_buffer_info->length = size;
-			tx_buffer_info->dma = dma_map_page(&adapter->pdev->dev,
-							   frag->page,
-							   offset,
-							   size,
-							   DMA_TO_DEVICE);
+			tx_buffer_info->dma =
+				skb_frag_dma_map(&adapter->pdev->dev, frag,
+						 offset, size, DMA_TO_DEVICE);
 			tx_buffer_info->mapped_as_page = true;
 			if (dma_mapping_error(&pdev->dev, tx_buffer_info->dma))
 				goto dma_error;
-- 
1.7.2.5
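The Intel hunks change "offset = frag->page_offset" to "offset = 0" because skb_frag_dma_map() adds frag->page_offset itself before calling dma_map_page(). Its offset argument is therefore relative to the start of the fragment. The userspace sketch below models this with an identity "mapping". mock_dma_addr_t, mock_map_page(), mock_frag_dma_map() and map_chunks() are hypothetical, and map_chunks() only imitates the shape of the e1000/ixgb descriptor loops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t mock_dma_addr_t;

struct mock_frag {
	mock_dma_addr_t page_phys;	/* hypothetical bus address of the page */
	unsigned int page_offset;
	unsigned int size;
};

/* Models dma_map_page() as an identity mapping: page address + offset. */
static mock_dma_addr_t mock_map_page(mock_dma_addr_t page_phys, size_t offset)
{
	return page_phys + offset;
}

/* Models skb_frag_dma_map(): offset is relative to the fragment, not the page. */
static mock_dma_addr_t mock_frag_dma_map(const struct mock_frag *frag, size_t offset)
{
	assert(offset <= frag->size);
	return mock_map_page(frag->page_phys, frag->page_offset + offset);
}

/*
 * Split one fragment into chunks of at most max_per_txd bytes, as the
 * descriptor loops do, recording each chunk's mapping. Returns the count.
 */
static size_t map_chunks(const struct mock_frag *frag, unsigned int max_per_txd,
			 mock_dma_addr_t *out, size_t out_len)
{
	unsigned int len = frag->size;
	size_t offset = 0;	/* new style: starts at 0 within the fragment */
	size_t n = 0;

	assert(max_per_txd > 0);
	while (len && n < out_len) {
		unsigned int size = len < max_per_txd ? len : max_per_txd;

		out[n++] = mock_frag_dma_map(frag, offset);
		offset += size;
		len -= size;
	}
	return n;
}
```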



* [PATCH 0/14] skb fragment API: convert network drivers (part I)
From: Ian Campbell @ 2011-08-30  9:17 UTC (permalink / raw)
  To: netdev@vger.kernel.org

The following series converts the first batch of network drivers to the
SKB paged fragment API introduced in 131ea6675c76. I expect it will take
about four similarly sized batches to convert all the drivers.

This is part of my series to enable visibility into the lifecycle of SKB
paged fragments; [0] contains more background and rationale. In short,
the completed series will allow entities that inject pages into the
networking stack to receive a notification when the stack has really
finished with those pages (including retransmissions, clones, pull-ups,
etc.), not just when the original skb is freed. This benefits the many
subsystems that wish to inject pages into the network stack without
giving up full ownership of those pages' lifecycles. It implements
something broadly along the lines of what was described in [1].
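The following is a toy userspace model of the idea. All names are hypothetical, and it is not the API this series introduces. The owner that lends a page registers a callback. Every user of the page (the original skb, a clone, a queued retransmission) holds a reference. The callback fires only when the last reference is dropped, not when the original skb is freed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: a page lent to the network stack by an outside owner. */
struct lent_page {
	int refcount;						/* users still holding the page */
	void (*release)(struct lent_page *pg, void *ctx);	/* owner callback */
	void *ctx;						/* passed back to release() */
};

/* Take an extra reference, e.g. for a clone or a pending retransmission. */
static void lent_page_get(struct lent_page *pg)
{
	assert(pg->refcount > 0);
	pg->refcount++;
}

/* Drop a reference; notify the owner only when the last user is gone. */
static void lent_page_put(struct lent_page *pg)
{
	assert(pg->refcount > 0);
	if (--pg->refcount == 0 && pg->release)
		pg->release(pg, pg->ctx);
}

/* Example owner callback: record that the stack has really finished. */
static void mark_released(struct lent_page *pg, void *ctx)
{
	(void)pg;
	*(int *)ctx = 1;
}
```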

Cheers,
Ian.

[0] http://marc.info/?l=linux-netdev&m=131072801125521&w=2
[1] http://marc.info/?l=linux-netdev&m=130925719513084&w=2


* [PATCH RFT] bnx2: don't request firmware when there's no userspace.
From: Francois Romieu @ 2011-08-30  7:34 UTC (permalink / raw)
  To: Michael Chan; +Cc: davem, netdev

The firmware is cached during the first successful call to open() and
released once the network device is unregistered. The driver uses the
cached firmware between open() and unregister_netdev().

This is similar to commit 953a12cc2889d1be92e80a2d0bab5ffef4942300, except
that here the firmware is mandatory.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Michael Chan <mchan@broadcom.com>
---

 Tester(s) needed. Behavior should differ when the driver is built into
 a monolithic kernel and the firmware files are not included in the
 kernel image.

 drivers/net/bnx2.c |   60 ++++++++++++++++++++++++++++++++--------------------
 1 files changed, 37 insertions(+), 23 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 4b2b570..fb54afd 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -3649,8 +3649,15 @@ check_mips_fw_entry(const struct firmware *fw,
 	return 0;
 }
 
+static void bnx2_release_firmware(struct bnx2 *bp)
+{
+	release_firmware(bp->mips_firmware);
+	release_firmware(bp->rv2p_firmware);
+	bp->rv2p_firmware = NULL;
+}
+
 static int __devinit
-bnx2_request_firmware(struct bnx2 *bp)
+bnx2_request_uncached_firmware(struct bnx2 *bp)
 {
 	const char *mips_fw_file, *rv2p_fw_file;
 	const struct bnx2_mips_fw_file *mips_fw;
@@ -3672,13 +3679,13 @@ bnx2_request_firmware(struct bnx2 *bp)
 	rc = request_firmware(&bp->mips_firmware, mips_fw_file, &bp->pdev->dev);
 	if (rc) {
 		pr_err("Can't load firmware file \"%s\"\n", mips_fw_file);
-		return rc;
+		goto out;
 	}
 
 	rc = request_firmware(&bp->rv2p_firmware, rv2p_fw_file, &bp->pdev->dev);
 	if (rc) {
 		pr_err("Can't load firmware file \"%s\"\n", rv2p_fw_file);
-		return rc;
+		goto err_release_mips_firmware;
 	}
 	mips_fw = (const struct bnx2_mips_fw_file *) bp->mips_firmware->data;
 	rv2p_fw = (const struct bnx2_rv2p_fw_file *) bp->rv2p_firmware->data;
@@ -3689,16 +3696,30 @@ bnx2_request_firmware(struct bnx2 *bp)
 	    check_mips_fw_entry(bp->mips_firmware, &mips_fw->tpat) ||
 	    check_mips_fw_entry(bp->mips_firmware, &mips_fw->txp)) {
 		pr_err("Firmware file \"%s\" is invalid\n", mips_fw_file);
-		return -EINVAL;
+		rc = -EINVAL;
+		goto err_release_firmware;
 	}
 	if (bp->rv2p_firmware->size < sizeof(*rv2p_fw) ||
 	    check_fw_section(bp->rv2p_firmware, &rv2p_fw->proc1.rv2p, 8, true) ||
 	    check_fw_section(bp->rv2p_firmware, &rv2p_fw->proc2.rv2p, 8, true)) {
 		pr_err("Firmware file \"%s\" is invalid\n", rv2p_fw_file);
-		return -EINVAL;
+		rc = -EINVAL;
+		goto err_release_firmware;
 	}
+out:
+	return rc;
 
-	return 0;
+err_release_firmware:
+	release_firmware(bp->rv2p_firmware);
+	bp->rv2p_firmware = NULL;
+err_release_mips_firmware:
+	release_firmware(bp->mips_firmware);
+	goto out;
+}
+
+static int bnx2_request_firmware(struct bnx2 *bp)
+{
+	return bp->rv2p_firmware ? 0 : bnx2_request_uncached_firmware(bp);
 }
 
 static u32
@@ -6266,6 +6287,10 @@ bnx2_open(struct net_device *dev)
 	struct bnx2 *bp = netdev_priv(dev);
 	int rc;
 
+	rc = bnx2_request_firmware(bp);
+	if (rc < 0)
+		goto out;
+
 	netif_carrier_off(dev);
 
 	bnx2_set_power_state(bp, PCI_D0);
@@ -6326,8 +6351,8 @@ bnx2_open(struct net_device *dev)
 		netdev_info(dev, "using MSIX\n");
 
 	netif_tx_start_all_queues(dev);
-
-	return 0;
+out:
+	return rc;
 
 open_err:
 	bnx2_napi_disable(bp);
@@ -6335,7 +6360,8 @@ open_err:
 	bnx2_free_irq(bp);
 	bnx2_free_mem(bp);
 	bnx2_del_napi(bp);
-	return rc;
+	bnx2_release_firmware(bp);
+	goto out;
 }
 
 static void
@@ -8353,10 +8379,6 @@ bnx2_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	pci_set_drvdata(pdev, dev);
 
-	rc = bnx2_request_firmware(bp);
-	if (rc)
-		goto error;
-
 	memcpy(dev->dev_addr, bp->mac_addr, 6);
 	memcpy(dev->perm_addr, bp->mac_addr, 6);
 
@@ -8387,11 +8409,6 @@ bnx2_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	return 0;
 
 error:
-	if (bp->mips_firmware)
-		release_firmware(bp->mips_firmware);
-	if (bp->rv2p_firmware)
-		release_firmware(bp->rv2p_firmware);
-
 	if (bp->regview)
 		iounmap(bp->regview);
 	pci_release_regions(pdev);
@@ -8412,11 +8429,6 @@ bnx2_remove_one(struct pci_dev *pdev)
 	del_timer_sync(&bp->timer);
 	cancel_work_sync(&bp->reset_task);
 
-	if (bp->mips_firmware)
-		release_firmware(bp->mips_firmware);
-	if (bp->rv2p_firmware)
-		release_firmware(bp->rv2p_firmware);
-
 	if (bp->regview)
 		iounmap(bp->regview);
 
@@ -8427,6 +8439,8 @@ bnx2_remove_one(struct pci_dev *pdev)
 		bp->flags &= ~BNX2_FLAG_AER_ENABLED;
 	}
 
+	bnx2_release_firmware(bp);
+
 	free_netdev(dev);
 
 	pci_release_regions(pdev);
-- 
1.7.4.4
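The caching scheme in the commit message works like this: load on the first open(), reuse the cached copy on later opens, and drop it at unregister time. The userspace sketch below models it. mock_request_firmware(), struct fw_blob and struct dev_state are hypothetical. As in bnx2_request_firmware(), a single pointer (here d->fw) serves as the "already loaded" flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical firmware handle and a loader that counts how often it runs. */
struct fw_blob {
	int generation;
};

static int load_calls;
static struct fw_blob blob_storage;

/* Stand-in for request_firmware(); always succeeds in this model. */
static int mock_request_firmware(struct fw_blob **fw)
{
	assert(fw != NULL);
	load_calls++;
	blob_storage.generation = load_calls;
	*fw = &blob_storage;
	return 0;
}

struct dev_state {
	struct fw_blob *fw;	/* NULL until the first successful open() */
};

/* Called from open(): load on first use, reuse the cached copy afterwards. */
static int dev_request_firmware(struct dev_state *d)
{
	return d->fw ? 0 : mock_request_firmware(&d->fw);
}

/* Called at unregister (or when open() fails): forget the cached copy. */
static void dev_release_firmware(struct dev_state *d)
{
	/* A real driver would hand the blob back with release_firmware() here. */
	d->fw = NULL;
}
```

The sketch clears the pointer on release. This keeps the "already loaded?" test consistent with the release path, so a later open() loads the firmware again.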


* Re: [patch 1/2] 9p: move dereference after NULL check
From: Aneesh Kumar K.V @ 2011-08-30  7:39 UTC (permalink / raw)
  To: Dan Carpenter, Eric Van Hensbergen
  Cc: David S. Miller, Venkateswararao Jujjuri, M. Mohan Kumar,
	open list:NETWORKING [GENERAL], kernel-janitors
In-Reply-To: <20110826165559.GE3775@shale.localdomain>

On Fri, 26 Aug 2011 19:55:59 +0300, Dan Carpenter <error27@gmail.com> wrote:
> We dereferenced "req->tc" and "req->rc" before checking for NULL.
> 
> Signed-off-by: Dan Carpenter <error27@gmail.com>
> 
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 3f8c046..b0bcace 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -248,10 +248,8 @@ static struct p9_req_t *p9_tag_alloc(struct p9_client *c, u16 tag, int max_size)
>  		init_waitqueue_head(req->wq);
>  		req->tc = kmalloc(sizeof(struct p9_fcall) + alloc_msize,
>  				  GFP_NOFS);
> -		req->tc->capacity = alloc_msize;
>  		req->rc = kmalloc(sizeof(struct p9_fcall) + alloc_msize,
>  				  GFP_NOFS);
> -		req->rc->capacity = alloc_msize;
>  		if ((!req->tc) || (!req->rc)) {
>  			printk(KERN_ERR "Couldn't grow tag array\n");
>  			kfree(req->tc);
> @@ -261,6 +259,8 @@ static struct p9_req_t *p9_tag_alloc(struct p9_client *c, u16 tag, int max_size)
>  			req->wq = NULL;
>  			return ERR_PTR(-ENOMEM);
>  		}
> +		req->tc->capacity = alloc_msize;
> +		req->rc->capacity = alloc_msize;
>  		req->tc->sdata = (char *) req->tc + sizeof(struct p9_fcall);
>  		req->rc->sdata = (char *) req->rc + sizeof(struct p9_fcall);
>  	}

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

-aneesh
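The ordering the patch restores is: allocate both buffers, test both for NULL, and only then write capacity and sdata. The userspace sketch below shows it, with malloc()/free() standing in for kmalloc()/kfree(). struct fcall and alloc_pair() are simplified, hypothetical stand-ins for struct p9_fcall and the allocation step in p9_tag_alloc().

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-in for struct p9_fcall. */
struct fcall {
	size_t capacity;
	char *sdata;
};

static int alloc_pair(struct fcall **tc, struct fcall **rc, size_t alloc_msize)
{
	assert(tc != NULL && rc != NULL);
	*tc = malloc(sizeof(struct fcall) + alloc_msize);
	*rc = malloc(sizeof(struct fcall) + alloc_msize);
	if (!*tc || !*rc) {
		free(*tc);	/* free(NULL) is a no-op, as kfree(NULL) is */
		free(*rc);
		*tc = *rc = NULL;
		return -1;
	}
	/* Only now is it safe to dereference either pointer. */
	(*tc)->capacity = alloc_msize;
	(*rc)->capacity = alloc_msize;
	(*tc)->sdata = (char *)*tc + sizeof(struct fcall);
	(*rc)->sdata = (char *)*rc + sizeof(struct fcall);
	return 0;
}
```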


* Re: [patch 2/2] 9p: change an int to unsigned int
From: Aneesh Kumar K.V @ 2011-08-30  7:38 UTC (permalink / raw)
  To: Dan Carpenter, Eric Van Hensbergen
  Cc: David S. Miller, Venkateswararao Jujjuri (JV), M. Mohan Kumar,
	Stephen Hemminger, open list:NETWORKING [GENERAL],
	kernel-janitors
In-Reply-To: <20110826165740.GF3775@shale.localdomain>

On Fri, 26 Aug 2011 19:57:40 +0300, Dan Carpenter <error27@gmail.com> wrote:
> The size of things should be unsigned because negative sizes are
> silly.  My concern is that the limit checks don't take negative values
> into consideration in p9_client_create()
> 	if (clnt->msize > clnt->trans_mod->maxsize)
> 		clnt->msize = clnt->trans_mod->maxsize;
> and in p9_tag_alloc()
> 	int alloc_msize = min(c->msize, max_size);
> 
> I don't know whether this is exported to user space.  Hopefully it's not
> too late to change this.

The change is also needed to make a large msize value (429496729) work.
Without the change, it causes a crash with the QEMU 9p server.

> 
> Signed-off-by: Dan Carpenter <error27@gmail.com>
> 
> diff --git a/include/net/9p/client.h b/include/net/9p/client.h
> index 55ce72c..d479d7d 100644
> --- a/include/net/9p/client.h
> +++ b/include/net/9p/client.h
> @@ -151,7 +151,7 @@ struct p9_req_t {
> 
>  struct p9_client {
>  	spinlock_t lock; /* protect client structure */
> -	int msize;
> +	unsigned int msize;
>  	unsigned char proto_version;
>  	struct p9_trans_module *trans_mod;
>  	enum p9_trans_status status;

I applied this, with an updated comment, to
git://git.kernel.org/pub/scm/linux/kernel/git/kvaneesh/v9fs.git for-upstream-next-merge

-aneesh
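Here is a small illustration of Dan's concern; the function names are hypothetical. A check of the form "if (msize > maxsize) msize = maxsize" only clamps from above, so a negative signed msize passes through unchanged. An unsigned msize can never be negative, and any oversized value gets clamped.

```c
#include <assert.h>

/* Clamp as in p9_client_create(), with the pre-patch signed type. */
static int clamp_signed(int msize, int maxsize)
{
	assert(maxsize > 0);
	if (msize > maxsize)
		msize = maxsize;
	return msize;	/* a negative msize slips through unchanged */
}

/* The same clamp with the post-patch unsigned type. */
static unsigned int clamp_unsigned(unsigned int msize, unsigned int maxsize)
{
	assert(maxsize > 0);
	if (msize > maxsize)
		msize = maxsize;
	return msize;	/* never below zero, never above maxsize */
}
```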


* [PATCH] sctp: deal with multiple COOKIE_ECHO chunks
From: Max Matveev @ 2011-08-30  7:02 UTC (permalink / raw)
  To: linux-sctp; +Cc: netdev, vladislav.yasevich

An attempt to reduce the number of IP packets emitted in response to a
single SCTP packet (2e3216cd) introduced a complication: if a packet
contains two COOKIE_ECHO chunks and nothing else, the SCTP state machine
corks the socket while processing the first COOKIE_ECHO, then loses the
association and forgets to uncork the socket. To deal with the issue, add
a new SCTP command that can be used to set the association explicitly. Use
this new command when processing the second COOKIE_ECHO chunk to restore
the context for the SCTP state machine.

Signed-off-by: Max Matveev <makc@redhat.com>
---
 include/net/sctp/command.h |    1 +
 net/sctp/sm_sideeffect.c   |    5 +++++
 net/sctp/sm_statefuns.c    |    6 ++++++
 3 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/include/net/sctp/command.h b/include/net/sctp/command.h
index 6506458..712b3be 100644
--- a/include/net/sctp/command.h
+++ b/include/net/sctp/command.h
@@ -109,6 +109,7 @@ typedef enum {
 	SCTP_CMD_SEND_MSG,	 /* Send the whole use message */
 	SCTP_CMD_SEND_NEXT_ASCONF, /* Send the next ASCONF after ACK */
 	SCTP_CMD_PURGE_ASCONF_QUEUE, /* Purge all asconf queues.*/
+	SCTP_CMD_SET_ASOC,	 /* Restore association context */
 	SCTP_CMD_LAST
 } sctp_verb_t;
 
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 167c880..76388b0 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -1689,6 +1689,11 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
 		case SCTP_CMD_PURGE_ASCONF_QUEUE:
 			sctp_asconf_queue_teardown(asoc);
 			break;
+
+		case SCTP_CMD_SET_ASOC:
+			asoc = cmd->obj.asoc;
+			break;
+
 		default:
 			pr_warn("Impossible command: %u, %p\n",
 				cmd->verb, cmd->obj.ptr);
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 49b847b..1949703 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -2047,6 +2047,12 @@ sctp_disposition_t sctp_sf_do_5_2_4_dupcook(const struct sctp_endpoint *ep,
 	sctp_add_cmd_sf(commands, SCTP_CMD_NEW_ASOC, SCTP_ASOC(new_asoc));
 	sctp_add_cmd_sf(commands, SCTP_CMD_DELETE_TCB, SCTP_NULL());
 
+	/* Restore association pointer to provide SCTP command interpreter
+	 * with a valid context in case it needs to manipulate
+	 * the queues */
+	sctp_add_cmd_sf(commands, SCTP_CMD_SET_ASOC,
+			 SCTP_ASOC((struct sctp_association *)asoc));
+
 	return retval;
 
 nomem:
-- 
1.7.4.4
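Below is a toy model of the failure mode the changelog describes. The names are hypothetical, it is not the kernel's sctp_cmd_interpreter(), and the socket cork is reduced to a flag on the association. The interpreter tracks a "current association" while it runs the command list. Once DELETE_TCB clears it, nothing is left to uncork unless a SET_ASOC-style command restores the context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical association with the output cork reduced to a flag. */
struct asoc {
	int corked;
};

enum verb { CMD_NEW_ASOC, CMD_DELETE_TCB, CMD_SET_ASOC };

struct cmd {
	enum verb verb;
	struct asoc *obj;
};

/*
 * Toy interpreter: cork 'orig' on entry, run the commands while tracking
 * the current association, and uncork whatever is in context at the end.
 * If the context was lost, 'orig' stays corked.
 */
static void interpret(struct asoc *orig, const struct cmd *cmds, size_t n)
{
	struct asoc *asoc = orig;
	size_t i;

	assert(orig != NULL);
	orig->corked = 1;

	for (i = 0; i < n; i++) {
		switch (cmds[i].verb) {
		case CMD_NEW_ASOC:
		case CMD_SET_ASOC:
			asoc = cmds[i].obj;	/* switch or restore context */
			break;
		case CMD_DELETE_TCB:
			asoc = NULL;		/* temporary association is gone */
			break;
		}
	}

	if (asoc)
		asoc->corked = 0;
}
```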



* Re: BQL crap and wireless
From: Andrew McGregor @ 2011-08-30  4:23 UTC (permalink / raw)
  To: Adrian Chadd
  Cc: Tom Herbert, Jim Gettys, Luis R. Rodriguez, Dave Taht,
	linux-wireless, Matt Smith, Kevin Hayes, Derek Smithies,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <CAJ-Vmonwur-SXddNwjPEidCMqes+PwbRWFBddfdwTp2jOMu64g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>


On 30/08/2011, at 3:42 PM, Adrian Chadd wrote:

> On 30 August 2011 11:34, Tom Herbert <therbert-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> 
>> The generalization of BQL would be to set the queue limit in terms of
>> a cost function implemented by the driver.  The cost function would
>> most likely be an estimate of time to transmit a packet.  

That's a great idea.  Best that it be in nanoseconds, we may well have some seriously fast network interfaces to deal with.

>> So C(P)
>> could represent cost of a packet, sum(C(P) for P queued) is aggregate
>> cost of queue packets, and queue limit is the maximum cost sum.  For
>> wired Ethernet, number of bytes in packet might be a reasonable
>> function (although framing cost could be included, but I'm not sure
>> that would make a material difference).  For wireless, maybe the
>> function could be more complex possibly taking multicast, previous
>> history of transmission times, or other arbitrary characteristics of
>> the packet into account...
>> 
>> I can post a new patch with this generalization if this is interesting.
> 
> As I said before, I think this is the kind of thing the rate control
> code needs to get its dirty hands into.
> 
> With 802.11 you have to care about the PHY side of things too, so your
> cost suddenly would include the PER for combinations of {remote node,
> antenna setup, TX rate, sub-frame length, aggregate length}, etc. Do
> you choose that up front and then match a cost to it, or do you
> involve the rate control code in deciding a "good enough" way of
> handling what's on the queue by making rate decisions, then implement
> random/weighted/etc drop of what's left? Do you do some weighted/etc
> drop beforehand in the face of congestion, then pass what's left to
> the rate control code, then discard the rest?

Since Minstrel already knows an estimate of the PER to each remote node (expressed in terms of success probability per shot, so there's a bit of math to do), and the stack knows about transmit times, implementing a way to ask the question isn't particularly hard.  Other rate controls could make up their own guesstimates based on whatever factors they want to use.

However, that's going to change rapidly, so I suggest that we want some backlog grooming on a regular basis (like, after each rate control iteration) that might well reevaluate and drop or mark packets that are already in the queues.

> 
> C(P) is going to be quite variable - a full frame retransmit of a 4ms
> long aggregate frame is SUM(exponential backoff, grab the air,
> preamble, header, 4ms, etc. for each pass.)
> 
> 
> Adrian

Indeed.

Andrew
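Here is one hedged way to do the "bit of math" Andrew mentions. Assume each attempt succeeds independently with probability p, and that at most max_tries attempts are made. This is a simplification, since real losses are correlated and retry chains usually step down in rate. Attempt k is made only if the previous k attempts failed. The expected number of attempts is therefore the sum of (1 - p)^k for k = 0 .. max_tries - 1. Multiplying by the per-attempt airtime gives an expected cost in nanoseconds. expected_airtime_ns() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Expected airtime (ns) for one frame, assuming each attempt takes
 * attempt_ns and succeeds independently with probability p, with at most
 * max_tries attempts. Attempt k (0-based) happens with probability (1 - p)^k.
 */
static uint64_t expected_airtime_ns(uint64_t attempt_ns, double p,
				    unsigned int max_tries)
{
	double expected_tries = 0.0;
	double reach = 1.0;	/* probability of reaching the current attempt */
	unsigned int k;

	assert(p >= 0.0 && p <= 1.0);
	for (k = 0; k < max_tries; k++) {
		expected_tries += reach;
		reach *= 1.0 - p;
	}
	return (uint64_t)(expected_tries * (double)attempt_ns + 0.5);
}
```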



* Re: BQL crap and wireless
From: Adrian Chadd @ 2011-08-30  3:42 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Jim Gettys, Luis R. Rodriguez, Dave Taht, linux-wireless,
	Andrew McGregor, Matt Smith, Kevin Hayes, Derek Smithies,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <CA+mtBx-j_+mrNpntR7CfYOromDnM-ZGzvebpUCjC8Qf14X6u9g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On 30 August 2011 11:34, Tom Herbert <therbert-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:

> The generalization of BQL would be to set the queue limit in terms of
> a cost function implemented by the driver.  The cost function would
> most likely be an estimate of time to transmit a packet.  So C(P)
> could represent cost of a packet, sum(C(P) for P queued) is aggregate
> cost of queued packets, and the queue limit is the maximum cost sum.  For
> wired Ethernet, number of bytes in packet might be a reasonable
> function (although framing cost could be included, but I'm not sure
> that would make a material difference).  For wireless, maybe the
> function could be more complex possibly taking multicast, previous
> history of transmission times, or other arbitrary characteristics of
> the packet into account...
>
> I can post a new patch with this generalization if this is interesting.

As I said before, I think this is the kind of thing the rate control
code needs to get its dirty hands into.

With 802.11 you have to care about the PHY side of things too, so your
cost suddenly would include the PER for combinations of {remote node,
antenna setup, TX rate, sub-frame length, aggregate length}, etc. Do
you choose that up front and then match a cost to it, or do you
involve the rate control code in deciding a "good enough" way of
handling what's on the queue by making rate decisions, then implement
random/weighted/etc drop of what's left? Do you do some weighted/etc
drop beforehand in the face of congestion, then pass what's left to
the rate control code, then discard the rest?

C(P) is going to be quite variable - a full frame retransmit of a 4ms
long aggregate frame is SUM(exponential backoff, grab the air,
preamble, header, 4ms, etc. for each pass.)

Adrian
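Adrian's per-pass sum can be written down directly. The sketch below makes three assumptions:

- the caller supplies the PHY timings in nanoseconds;
- the contention window grows to 2 * cw + 1 after each failed pass, up to cw_max;
- the mean backoff is cw / 2 slots, rounded down.

struct pass_timing and retransmit_cost_ns() are hypothetical. Real 802.11 timing has more terms, for example the SIFS and block-ack exchange.

```c
#include <assert.h>
#include <stdint.h>

/* Caller-supplied per-pass timing inputs, all in nanoseconds. */
struct pass_timing {
	uint64_t slot_ns;	/* one backoff slot */
	uint64_t difs_ns;	/* "grab the air": idle wait before contending */
	uint64_t preamble_ns;
	uint64_t header_ns;
	uint64_t payload_ns;	/* e.g. a 4 ms aggregate */
	unsigned int cw_min;	/* contention window on the first pass */
	unsigned int cw_max;
};

/*
 * Total airtime over 'passes' transmissions of the same frame. The
 * contention window grows to 2 * cw + 1 after each pass (exponential
 * backoff), capped at cw_max; the mean backoff is cw / 2 slots.
 */
static uint64_t retransmit_cost_ns(const struct pass_timing *t, unsigned int passes)
{
	uint64_t total = 0;
	unsigned int cw = t->cw_min;
	unsigned int i;

	assert(t->cw_min <= t->cw_max);
	for (i = 0; i < passes; i++) {
		uint64_t backoff_ns = (uint64_t)(cw / 2) * t->slot_ns;

		total += t->difs_ns + backoff_ns + t->preamble_ns +
			 t->header_ns + t->payload_ns;
		cw = 2 * cw + 1 > t->cw_max ? t->cw_max : 2 * cw + 1;
	}
	return total;
}
```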


* Re: BQL crap and wireless
From: Tom Herbert @ 2011-08-30  3:34 UTC (permalink / raw)
  To: Jim Gettys
  Cc: Luis R. Rodriguez, Dave Taht, linux-wireless, Andrew McGregor,
	Matt Smith, Kevin Hayes, Derek Smithies, netdev
In-Reply-To: <4E5C3B47.1050809@freedesktop.org>

> Computing the buffering in bytes is better than in packets; but since on
> wireless multicast/broadcast is transmitted at a radically different
> rate than other packets, I expect something based on time is really the
> long term solution; and only the driver has any idea how long a packet
> of a given flavour will likely take to transmit.

The generalization of BQL would be to set the queue limit in terms of
a cost function implemented by the driver.  The cost function would
most likely be an estimate of time to transmit a packet.  So C(P)
could represent cost of a packet, sum(C(P) for P queued) is aggregate
cost of queued packets, and the queue limit is the maximum cost sum.  For
wired Ethernet, number of bytes in packet might be a reasonable
function (although framing cost could be included, but I'm not sure
that would make a material difference).  For wireless, maybe the
function could be more complex possibly taking multicast, previous
history of transmission times, or other arbitrary characteristics of
the packet into account...

I can post a new patch with this generalization if this is interesting.

Tom
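Below is a minimal userspace sketch of the generalization Tom describes; all names are hypothetical. The driver supplies C(P), the queue tracks sum(C(P)) over packets in flight, and a packet is admitted only while that sum stays within the limit. The byte-count cost corresponds to the wired case. cost_airtime_ns() is an illustrative wireless cost in nanoseconds that charges multicast at a separate, typically much lower, rate. Refusing a packet that would cross the limit is a simplification; BQL itself does not decide this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet descriptor: only what the cost functions look at. */
struct pkt {
	uint32_t len;		/* bytes */
	bool multicast;
};

/* C(P): driver-supplied cost of one packet, in driver-defined units. */
typedef uint64_t (*cost_fn)(const struct pkt *p, const void *ctx);

/* Wired Ethernet: the cost is simply the byte count. */
static uint64_t cost_bytes(const struct pkt *p, const void *ctx)
{
	(void)ctx;
	return p->len;
}

/* Illustrative wireless cost: estimated airtime in ns at assumed rates. */
struct rates {
	uint64_t unicast_bps;
	uint64_t multicast_bps;
};

static uint64_t cost_airtime_ns(const struct pkt *p, const void *ctx)
{
	const struct rates *r = ctx;
	uint64_t bps = p->multicast ? r->multicast_bps : r->unicast_bps;

	assert(bps > 0);
	return (uint64_t)p->len * 8 * 1000000000ull / bps;
}

struct cost_queue {
	uint64_t inflight;	/* sum of C(P) over queued packets */
	uint64_t limit;		/* maximum cost sum */
	cost_fn cost;
	const void *ctx;
};

/* Admit the packet only if the aggregate cost stays within the limit. */
static bool cq_enqueue(struct cost_queue *q, const struct pkt *p)
{
	uint64_t c = q->cost(p, q->ctx);

	if (q->inflight + c > q->limit)
		return false;
	q->inflight += c;
	return true;
}

/* Called on TX completion for a previously admitted packet. */
static void cq_complete(struct cost_queue *q, const struct pkt *p)
{
	q->inflight -= q->cost(p, q->ctx);
}
```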


* Re: [PATCH net-next 2/5] qlcnic: Change debug messages in loopback path
From: Joe Perches @ 2011-08-30  3:00 UTC (permalink / raw)
  To: Sony Chacko; +Cc: David Miller, netdev, Dept_NX_Linux_NIC_Driver, Manish chopra
In-Reply-To: <1314658231-30735-2-git-send-email-sony.chacko@qlogic.com>

On Mon, 2011-08-29 at 15:50 -0700, Sony Chacko wrote:
> From: Manish chopra <Manish.Chopra@qlogic.com>
> Added more debug messages while loopback test in progress
[]
> diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c
[]
> @@ -736,13 +736,18 @@ static int qlcnic_do_lb_test(struct qlcnic_adapter *adapter)
[]
> +		if (mode != QLCNIC_ILB_MODE) {
> +			dev_warn(&adapter->pdev->dev,
> +				"WARNING: Please make sure external"
> +				"loopback connector is plugged in\n");

It's better to avoid splitting format strings.
This emits "externalloopback" instead of "external loopback".
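The lost space follows from C's rule that adjacent string literals are concatenated with no separator. A small userspace demonstration (the message text is taken from the patch; `has_intended_phrase()` is a hypothetical helper):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Adjacent string literals are joined by the compiler with nothing in
 * between, so the space at the line break is silently lost. */
static const char split_msg[] =
	"WARNING: Please make sure external"
	"loopback connector is plugged in\n";

/* One unbroken literal keeps the space (and stays greppable). */
static const char whole_msg[] =
	"WARNING: Please make sure external loopback connector is plugged in\n";

/* The two arrays differ by exactly the missing space. */
static_assert(sizeof(whole_msg) == sizeof(split_msg) + 1,
	      "split literal lost one character");

static bool has_intended_phrase(const char *msg)
{
	return strstr(msg, "external loopback") != NULL;
}
```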

> -static void dump_skb(struct sk_buff *skb)
> +static void dump_skb(struct sk_buff *skb, struct qlcnic_adapter *adapter)
>  {
>  	int i;
>  	unsigned char *data = skb->data;
>  
>  	printk(KERN_INFO "\n");
>  	for (i = 0; i < skb->len; i++) {
> -		printk(KERN_INFO "%02x ", data[i]);
> +		QLCDB(adapter, DRV, "%02x ", data[i]);

Use print_hex_dump() instead.
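In the kernel, the per-byte loop in `dump_skb()` would typically collapse to a single call such as `print_hex_dump(KERN_INFO, "", DUMP_PREFIX_OFFSET, 16, 1, skb->data, skb->len, false);`. Since kernel code cannot run standalone, here is a userspace approximation of the row format; its output is similar to, but not guaranteed identical to, the kernel helper's, and `hex_row()`/`hex_dump()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define ROW_BYTES 16

/* Format one row as "<8-hex-digit offset>: xx xx ..." and return how many
 * bytes of buf it consumed. */
static size_t hex_row(char *out, size_t outsz, const unsigned char *buf,
		      size_t len, size_t offset)
{
	size_t n = len - offset < ROW_BYTES ? len - offset : ROW_BYTES;
	size_t pos;
	size_t i;

	/* room for "xxxxxxxx:" + 16 * " xx" + NUL */
	assert(offset < len && outsz >= 9 + 3 * ROW_BYTES + 1);
	pos = (size_t)snprintf(out, outsz, "%08zx:", offset);
	for (i = 0; i < n; i++)
		pos += (size_t)snprintf(out + pos, outsz - pos, " %02x",
					buf[offset + i]);
	return n;
}

/* Dump all of buf into out, one complete line per row, rather than one
 * print call per byte as in the quoted dump_skb(). */
static void hex_dump(char *out, size_t outsz, const unsigned char *buf,
		     size_t len)
{
	size_t off = 0;

	out[0] = '\0';
	while (off < len) {
		char row[9 + 3 * ROW_BYTES + 1];
		size_t used = strlen(out);

		off += hex_row(row, sizeof(row), buf, len, off);
		snprintf(out + used, outsz - used, "%s\n", row);
	}
}
```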

^ permalink raw reply

* Re: linux-next: Tree for Aug 29 (bnx2x)
From: David Miller @ 2011-08-30  2:57 UTC (permalink / raw)
  To: rdunlap; +Cc: dmitry, sfr, netdev, linux-next, linux-kernel, eilong
In-Reply-To: <20110829155618.01a73cce.rdunlap@xenotime.net>

From: Randy Dunlap <rdunlap@xenotime.net>
Date: Mon, 29 Aug 2011 15:56:18 -0700

> On Tue, 30 Aug 2011 00:35:44 +0300 Dmitry Kravkov wrote:
> 
>> On Mon, 2011-08-29 at 13:28 -0700, Randy Dunlap wrote:
>> > On Mon, 29 Aug 2011 16:19:07 +1000 Stephen Rothwell wrote:
>> > 
>> > > Hi all,
>> > 
>> > (on i386 or x86_64)
>> > 
>> > drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:10148: error: 'bnx2x_fcoe_get_wwn' undeclared here (not in a function)
>> > 
>> > 
>> > Full randconfig file is attached.
>> > 
>> > ---
>> > ~Randy
>> > *** Remember to use Documentation/SubmitChecklist when testing your code ***
>> This should sync the #define conditions between the definition and the declaration.
>> ---
>> 
>> Reported-by: Randy Dunlap <rdunlap@xenotime.net>
>> Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
> 
> Acked-by: Randy Dunlap <rdunlap@xenotime.net>

Applied, thanks everyone.

^ permalink raw reply

* Re: [PATCH -next] net: fix Makefile typos & build errors
From: David Miller @ 2011-08-30  2:55 UTC (permalink / raw)
  To: sfr; +Cc: rdunlap, netdev, linux-next, linux-kernel
In-Reply-To: <20110830115925.45984bb8eeb73401dd5d289f@canb.auug.org.au>

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Tue, 30 Aug 2011 11:59:25 +1000

> It may be my old(er) eyes, but isn't that CONFIG_SLIP line repeated?

My old eyes can confirm this, so I'll fix it up, thanks! :-)

^ permalink raw reply

* Re: [PATCH net-next 0/5] qlcnic: Bug fixes
From: David Miller @ 2011-08-30  2:53 UTC (permalink / raw)
  To: sony.chacko; +Cc: netdev, Dept_NX_Linux_NIC_Driver
In-Reply-To: <1314658231-30735-6-git-send-email-sony.chacko@qlogic.com>

From: Sony Chacko <sony.chacko@qlogic.com>
Date: Mon, 29 Aug 2011 15:50:31 -0700

> Please apply to net-next.

Applied, thanks.

I would like to suggest that you move away from driver-internal
debug logging macros and either use the generic kernel facilities
that already exist or extend them so that they fit your needs and
others can enjoy the improvements too.

Thanks.

^ permalink raw reply

* Re: [PATCH 19/24] sunrpc: Remove unnecessary OOM logging messages
From: David Miller @ 2011-08-30  2:35 UTC (permalink / raw)
  To: Trond.Myklebust
  Cc: bharrosh, joe, bfields, neilb, linux-nfs, netdev, linux-kernel
In-Reply-To: <2E1EB2CF9ED1CB4AA966F0EB76EAB4430AEB4FAF-hX7t0kiaRRrlMGe9HJ1VYQK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>

From: "Myklebust, Trond" <Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>
Date: Mon, 29 Aug 2011 16:25:08 -0700

> I can see that slub.c has the slab_out_of_memory() function that
> (although ratelimited) warns you if the allocation failed. However I
> can't find any equivalent for slab.c or slob.c.

See the page allocator.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: BQL crap and wireless
From: Jim Gettys @ 2011-08-30  2:12 UTC (permalink / raw)
  To: Andrew McGregor
  Cc: Luis R. Rodriguez, Dave Taht, Tom Herbert, linux-wireless,
	Matt Smith, Kevin Hayes, Derek Smithies, netdev
In-Reply-To: <903AA8A8-9ACD-44FB-9BA8-50137359EC2B@gmail.com>

On 08/29/2011 09:59 PM, Andrew McGregor wrote:
> On 30/08/2011, at 1:22 PM, Jim Gettys wrote:
>
>> The gotcha is we don't have an AQM algorithm known to work in
>> the face of the highly dynamic bandwidth variation that is wireless
> It's worse than highly dynamic... the bandwidth may be completely undefined moment to moment, as it is dependent on both the wireless environment, which varies on timescales that can be about equal to a packet transmit time, and on the traffic mix.  There's about 30 ms of correlation time at best.
Yup.  It makes ethernet look trivial. If we can handle buffers there,
everywhere else is easy by comparison.
>>  This was/is
>> the great surprise to me as I had always thought of AQM as a property of
>> internet routers, not hosts.
> There's no distinction in the forwarding plane, every router is a host, every host is a router.
Exactly; but I hadn't thought this through to hosts, nor, I think had
many other people.  Naive me, having been scarred by '90s congestion,
was aware of RED, and roughly how it worked, but always thought of AQM
as something you did in routers.  Realising that in principle I needed
it turned on in my laptop was not something I expected.
                        - Jim

^ permalink raw reply

* Re: [PATCH -next] net: fix Makefile typos & build errors
From: Stephen Rothwell @ 2011-08-30  1:59 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: netdev, linux-next, LKML, davem
In-Reply-To: <20110829114940.4d29febc.rdunlap@xenotime.net>


Hi all,

On Mon, 29 Aug 2011 11:49:40 -0700 Randy Dunlap <rdunlap@xenotime.net> wrote:
>
> @@ -44,10 +44,10 @@ obj-$(CONFIG_PPP_SYNC_TTY) += ppp/
>  obj-$(CONFIG_PPPOE) += ppp/
>  obj-$(CONFIG_PPPOL2TP) += ppp/
>  obj-$(CONFIG_PPTP) += ppp/
> -onj-$(CONFIG_SLIP) += slip/
> +obj-$(CONFIG_SLIP) += slip/
>  obj-$(CONFIG_SLHC) += slip/
>  obj-$(CONFIG_NET_SB1000) += sb1000.o
> -onj-$(CONFIG_SLIP) += slip/
> +obj-$(CONFIG_SLIP) += slip/

It may be my old(er) eyes, but isn't that CONFIG_SLIP line repeated?

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/


^ permalink raw reply

* Re: BQL crap and wireless
From: Andrew McGregor @ 2011-08-30  1:59 UTC (permalink / raw)
  To: Jim Gettys
  Cc: Luis R. Rodriguez, Dave Taht, Tom Herbert, linux-wireless,
	Matt Smith, Kevin Hayes, Derek Smithies,
	netdev
In-Reply-To: <4E5C3B47.1050809@freedesktop.org>

On 30/08/2011, at 1:22 PM, Jim Gettys wrote:

> The gotcha is we don't have an AQM algorithm known to work in
> the face of the highly dynamic bandwidth variation that is wireless

It's worse than highly dynamic... the bandwidth may be completely undefined moment to moment, as it is dependent on both the wireless environment, which varies on timescales that can be about equal to a packet transmit time, and on the traffic mix.  There's about 30 ms of correlation time at best.

>  This was/is
> the great surprise to me as I had always thought of AQM as a property of
> internet routers, not hosts.

There's no distinction in the forwarding plane, every router is a host, every host is a router.

Andrew

^ permalink raw reply

* Re: BQL crap and wireless
From: Adrian Chadd @ 2011-08-30  1:48 UTC (permalink / raw)
  To: Jim Gettys
  Cc: Luis R. Rodriguez, Dave Taht, Tom Herbert, linux-wireless,
	Andrew McGregor, Matt Smith, Kevin Hayes, Derek Smithies, netdev
In-Reply-To: <CAJ-Vmom_EwR31Z6r4VpSiKTwE+YXwmRwtwurzr5VUX6nnC67fg@mail.gmail.com>

.. Whilst knee-deep in it all, I should also mention things like
"transmission opportunity/beacon transmit time being enforced by
hardware", i.e. where the hardware stops a packet TX from occurring
because it would exceed the programmed TXOP window, or because it
would interfere with an upcoming beacon TX from the hostap.
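The kind of check Adrian describes the hardware enforcing can be modelled in software roughly as below. This is a hypothetical sketch: the structure, field names and the beacon guard interval are assumptions, not Atheros hardware semantics. It shows that a frame can be held back for reasons a byte- or packet-count queue limit does not account for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scheduling state, all times in microseconds. */
struct tx_window {
	uint64_t now_us;          /* current time */
	uint64_t txop_end_us;     /* end of the granted TXOP */
	uint64_t next_beacon_us;  /* next scheduled beacon transmission */
	uint32_t beacon_guard_us; /* assumed guard interval before a beacon */
};

/* Would a frame of the given airtime fit without overrunning the TXOP
 * or running into the guard interval before the next beacon? */
static bool tx_fits(const struct tx_window *w, uint32_t airtime_us)
{
	uint64_t end = w->now_us + airtime_us;

	assert(w->next_beacon_us >= w->beacon_guard_us);
	if (end > w->txop_end_us)
		return false;   /* would exceed the TXOP */
	if (end > w->next_beacon_us - w->beacon_guard_us)
		return false;   /* would interfere with the beacon */
	return true;
}
```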

Another 2c,

Adrian

^ permalink raw reply

* Re: BQL crap and wireless
From: Adrian Chadd @ 2011-08-30  1:44 UTC (permalink / raw)
  To: Jim Gettys
  Cc: Luis R. Rodriguez, Dave Taht, Tom Herbert, linux-wireless,
	Andrew McGregor, Matt Smith, Kevin Hayes, Derek Smithies, netdev
In-Reply-To: <4E5C3B47.1050809@freedesktop.org>

On 30 August 2011 09:22, Jim Gettys <jg@freedesktop.org> wrote:

Note: I'm knee-deep in the aggregation TX/RX path at the moment;
I'm porting the Atheros 802.11n TX aggregation code to FreeBSD.

> Computing the buffering in bytes is better than in packets; but since on
> wireless multicast/broadcast is transmitted at a radically different
> rate than other packets, I expect something based on time is really the
> long term solution; and only the driver has any idea how long a packet
> of a given flavour will likely take to transmit.

And the driver (hopefully!) can find out how long the packet -did-
take to transmit.

There are a bunch of different reasons why the packet isn't being
transmitted, or why it can take so long. If (say) an aggregate has 10
hardware (long, short) retries at a high MCS rate, and then 10
software retries, that's up to 100 attempts at transmitting the
sub-frames in some way. It may also involve 10 attempts at an RTS
exchange. But it may also be 10 attempts at transmitting the -whole-
frame. In the case of a long aggregate (say the upper bound of 4ms,
easily achievable when lower MCS rates are selected), this can take a
long time.
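These numbers can be worked through directly: 10 hardware retries for each of 10 software retries gives 10 × 10 = 100 attempts, and if each attempt resends a 4 ms aggregate, that is 0.4 s of airtime before any per-attempt overhead. A small helper for the arithmetic (the function names are hypothetical, and the overhead figure is an assumed input rather than a measured one):

```c
#include <assert.h>
#include <limits.h>

/* Worst case: every software retry runs the full hardware retry series. */
static unsigned int worst_case_attempts(unsigned int hw_retries,
					unsigned int sw_retries)
{
	assert(hw_retries == 0 || sw_retries <= UINT_MAX / hw_retries);
	return hw_retries * sw_retries;
}

/* Worst-case airtime (microseconds) if every attempt resends the whole
 * aggregate. overhead_us is an assumed per-attempt lump covering backoff,
 * channel access, preamble and header. */
static unsigned long long worst_case_airtime_us(unsigned int attempts,
						unsigned long aggregate_us,
						unsigned long overhead_us)
{
	return (unsigned long long)attempts * (aggregate_us + overhead_us);
}
```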

I'm occasionally seeing this in my testing, where the block-ack isn't
seen by the sender. The whole aggregate frame is thus retransmitted in
its entirety. This causes occasional bumps in the testing latency. The
obvious solution is to not form such large aggregates at lower MCS
rates, but even a single such event has an impact on latency.
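A simplified sketch of why a missed block-ack is so costly: when the BlockAck arrives, only subframes whose bits are clear need resending, but when it is never seen, every subframe is treated as failed, as in the behaviour described above. This assumes bit i of the bitmap maps to subframe i of the aggregate (real BlockAck bitmaps are indexed from a starting sequence number) and ignores BlockAckReq recovery; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return a bitmap of subframes to retransmit (bit i set = resend subframe i).
 * n_subframes is limited to 64 to match a 64-bit BlockAck bitmap. */
static uint64_t subframes_to_retry(unsigned int n_subframes,
				   bool block_ack_received,
				   uint64_t ba_bitmap)
{
	uint64_t all;

	assert(n_subframes >= 1 && n_subframes <= 64);
	all = (n_subframes == 64) ? UINT64_MAX
				  : ((UINT64_C(1) << n_subframes) - 1);

	if (!block_ack_received)
		return all;            /* no BA seen: every subframe counts as lost */
	return all & ~ba_bitmap;       /* resend only the unacknowledged ones */
}
```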

I'm not at the point yet where I can start tinkering with rate control
and queue management in this way but the idea of asking the rate
control code to manage per-node and overall airtime has crossed my
mind. I.e., the rate control code can see how successful transmissions
are to a given node (at given streams, rates, antenna configurations,
etc) and then enforce aggregate size and retransmission limits there.
Since a decision for any given node will affect the latency on all
subsequent nodes, it makes sense for the rate control code to keep a
global idea of the airtime involved as well as the (current) per-node
logic.
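One way to read the idea of rate control enforcing aggregate limits is to derive the aggregate size from an airtime budget and the currently selected rate. A hypothetical sketch (the function, its parameters and the 4 ms budget in the example are assumptions, not FreeBSD or ath9k code): at 6.5 Mbit/s only two 1500-byte subframes fit in 4 ms, while at 130 Mbit/s the hardware cap becomes the limit.

```c
#include <assert.h>

/* Largest number of equal-sized subframes whose payload fits in an airtime
 * budget at a given PHY rate, capped by what the hardware allows. Ignores
 * delimiters, padding, preamble and the BlockAck exchange. */
static unsigned int max_subframes(unsigned long budget_us,
				  unsigned long rate_kbps,
				  unsigned int subframe_bytes,
				  unsigned int hw_max_subframes)
{
	unsigned long long bits, n;

	assert(rate_kbps > 0 && subframe_bytes > 0 && hw_max_subframes > 0);
	/* kbit/s is bits per millisecond, so rate * us / 1000 = bits */
	bits = (unsigned long long)rate_kbps * budget_us / 1000u;
	n = bits / (8ull * subframe_bytes);
	if (n > hw_max_subframes)
		n = hw_max_subframes;
	if (n == 0)
		n = 1;  /* always allow at least a single frame */
	return (unsigned int)n;
}
```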

2c, which'll be more when I work the 11n TX A-MPDU kinks out in the
FreeBSD driver,

Adrian

^ permalink raw reply

* Re: BQL crap and wireless
From: Jim Gettys @ 2011-08-30  1:22 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Dave Taht, Tom Herbert, linux-wireless, Andrew McGregor,
	Matt Smith, Kevin Hayes, Derek Smithies,
	netdev
In-Reply-To: <CAA93jw7c+Nxc6ZbWZWsQ+F78AoPWU=quSRaOUpT0yRcwJOXsGQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On 08/29/2011 08:24 PM, Dave Taht wrote:
> On Mon, Aug 29, 2011 at 2:02 PM, Luis R. Rodriguez <mcgrof-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> On Fri, Aug 26, 2011 at 4:27 PM, Luis R. Rodriguez <mcgrof-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Let me elaborate on 802.11 and bufferbloat as so far I see only crap
>> documentation on this and also random crap adhoc patches.
> I agree that the research into bufferbloat has been an evolving topic, and
> the existing documentation and solutions throughout the web are inaccurate
>  or just plain wrong in many respects. While I've been accumulating better
> and more interesting results as research continues, we're not there yet...
>
>> Given that I
>> see effort on netdev to try to help with latency issues, it's important
>> for netdev developers to be aware of what issues we do face today and
>> what stuff is being mucked with.
> Hear, Hear!
>
>> As far as I see it I break down the issues into two categories:
>>
>>  * 1. High latencies on ping
>>  * 2. Constant small drops in throughput
> I'll take on 2, in a separate email.
>
>>  1. High latencies on ping
>> ===================
> For starters, no, "high - and wildly varying - latencies on all sorts
> of packets".
>
> Ping is merely a diagnostic tool in this case.
>
> If you would like several gb of packet captures of all sorts of streams
> from various places and circumstances, ask. JG published a long
> series about 7 months back, more are coming.
>
> Regrettably most of the most recent traces come from irreproducible
> circumstances, a flaw we are trying to fix after 'CeroWrt' is finished.
>
>> It seems the bufferbloat folks are blaming the high latencies on our
>> obsession on modern hardware to create huge queues and also with
>> software retries. They assert that reducing the queue length
>> (ATH_MAX_QDEPTH on ath9k) and software retries (ATH_MAX_SW_RETRIES on
>> ath9k) helps with latencies. They have at least empirically tested
>> this with ath9k with
>> a simple patch:

The retries in wireless interact here only because they have encouraged
buffering for the retries.  This is not unique to 802.11; it is also
present in 3G networks (there, they fragment packets and put in lots of
buffering hoping to get the packet fragment transmitted at some future
time; they really hate dropping a packet if only a piece got damaged).

>> https://www.bufferbloat.net/attachments/43/580-ath9k_lowlatency.patch
>>
>> The obvious issue with this approach is it assumes STA mode of
>> operation, with an AP you do not want to reduce the queue size like
>> that. In fact because of the dynamic nature of 802.11 and the
> If there is any one assumption about the bufferbloat issue that people
> keep assuming we have, it's this one.
>
> In article after article, in blog post after blog post, people keep
> 'fixing' bufferbloat by setting their queues to very low values,
> and almost miraculously start seeing their  QoS start working
> (which it does), and then they gleefully publish their results
>  as recommendations, and then someone from the bufferbloat
> effort has to go and comment on that piece, whenever we
> notice, to straighten them out.
>
> In no presentation, no documentation, anywhere I know of,
> have we expressed  that queuing as it works today
> is the right thing.
>
> More recently, JG got fed up and wrote these...
>
> http://gettys.wordpress.com/2011/07/06/rant-warning-there-is-no-single-right-answer-for-buffering-ever/
>
> http://gettys.wordpress.com/2011/07/09/rant-warning-there-is-no-single-right-answer-for-buffering-ever-part-2/

Yes, I got really frustrated....

>
> At no time since the inception of the bufferbloat
> concept have we had a fixed buffer size in any layer of the
> stack as even a potential solution.

Right now, we typically have 2 (large) buffers: the transmit queue and
the driver rings.  Some hardware/software hides buffers in additional
places (e.g. on the OLPC XO-1, there are 4 packets in the wireless
module and 1 hidden in the driver itself).  YMWV.
>
> And you just applied that preconception to us again.
>
> My take on matters is that *unmanaged* buffer sizes > 1 is a
> problem. Others set the number higher.
>
> Of late, given what tools we have, we HAVE been trying to establish
> what *good baseline* queue sizes (txqueues, driver queues, etc)
> actually are for wireless under ANY circumstance that was
> duplicate-able.
>
> For the drivers JG was using last year, that answer was: 0.
>
> Actually, less than 0  would have been good, but that
> would have involved having tachyon emitters in the
> architecture.

Zero is what I set the transmit queue to in my *experiments* ***only***
because I knew by that point the drivers underneath the transmit queue
had another 250 or so packets of buffering on the hardware I (and most
of you) have; I went and looked at quite a few Linux drivers, and
confirmed similar ring buffer sizes on Mac and Windows both empirically
and when possible from driver control panel information.  At the
bandwidth delay product of my experiments, 250 packets is way more than
TCP will ever need.   See:
http://gettys.wordpress.com/2010/11/29/home-router-puzzle-piece-one-fun-with-your-switch/

Most current ethernet and wireless drivers have that much in the
transmit rings today, on all operating systems that I've played with.
The hardware will typically support up to 4096 packet rings, but the
defaults in the drivers seem to be typically in the 200-300 packet range
(sometimes per queue).
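The bandwidth-delay product arithmetic behind "250 packets is way more than TCP will ever need" can be sketched as follows. The link rates and RTT in the example are illustrative assumptions, not the parameters of the experiments above: at 20 Mbit/s and a 50 ms RTT the BDP is about 84 full-size packets, while a full 250-packet ring of 1500-byte packets adds 150 ms of queueing delay at that rate and 3 s at 1 Mbit/s.

```c
#include <assert.h>

/* Bandwidth-delay product in packets, rounded up.
 * kbit/s is bits per millisecond, so rate_kbps * rtt_ms = bits. */
static unsigned long bdp_packets(unsigned long rate_kbps, unsigned long rtt_ms,
				 unsigned long pkt_bytes)
{
	unsigned long long bytes = (unsigned long long)rate_kbps * rtt_ms / 8u;

	assert(pkt_bytes > 0);
	return (unsigned long)((bytes + pkt_bytes - 1) / pkt_bytes);
}

/* Milliseconds needed to drain a full ring of n_pkts at the bottleneck rate. */
static unsigned long drain_ms(unsigned long n_pkts, unsigned long pkt_bytes,
			      unsigned long rate_kbps)
{
	unsigned long long bits = (unsigned long long)n_pkts * pkt_bytes * 8u;

	assert(rate_kbps > 0);
	return (unsigned long)(bits / rate_kbps);  /* bits / (bits per ms) */
}
```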

Remember that any long-lived TCP session (an "elephant" flow) will fill
any size buffer just before the bottleneck link in a path, given time.
It will fill the buffer at the rate of one packet per ACK; in the traces I
took over cable modems you can watch the delay go up and up cleanly,
and up (in my case, to 1.2 seconds, when they filled after on the order
of 10 seconds).  The same thing happens on 802.11 wireless, but it's
noisier in my traces as I don't have a handy Faraday cage ;-).  An
additional problem, which was a huge surprise to everyone who studied the
traces, is that congestion avoidance is getting terminally confused.  By
delaying packet drop (or ECN marking), TCP never slows down; it actually
continues to speed up (since current TCP algorithms typically do not
take notice of the RTT). The delay is so long that TCP's servo system is
no longer stable and it oscillates with a constant period.  I have no
clue if this is at all related to the other periodic behaviour people
have noticed.  If you think about it, the fact that the delay is several
orders of magnitude larger than the actual delay of the path makes it
less surprising than it might be.

Indeed there is no simple single right answer for buffering; it needs to
be dynamic, and ultimately we need to have AQM even in hosts to control
buffering (think about the case of two different long-lived TCP sessions
over vastly different bandwidth/delay paths).  The gotcha is we don't
have an AQM algorithm known to work in the face of the highly dynamic
bandwidth variation that is wireless; classic RED does not have the
output bandwidth as a parameter in its algorithm.  This was/is the great
surprise to me as I had always thought of AQM as a property of internet
routers, not hosts.
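To make the point about RED concrete, here is a simplified userspace sketch of classic RED's drop-probability calculation (parameter values are examples; the full algorithm also scales the probability by the number of packets since the last drop). Its inputs are only the queue length and fixed thresholds; nothing in it depends on the output link rate, which is what swings by orders of magnitude on 802.11.

```c
#include <assert.h>

/* Classic RED state; thresholds are in packets. */
struct red {
	double w_q;     /* EWMA weight for the average queue length */
	double min_th;  /* below this average, never drop */
	double max_th;  /* at or above this average, always drop */
	double max_p;   /* drop probability just below max_th */
	double avg;     /* current average queue length */
};

/* Update the average from the instantaneous queue length. */
static void red_update(struct red *r, double qlen)
{
	r->avg = (1.0 - r->w_q) * r->avg + r->w_q * qlen;
}

/* Base drop/mark probability. Note that no argument is the link bandwidth. */
static double red_prob(const struct red *r)
{
	assert(r->max_th > r->min_th);
	if (r->avg < r->min_th)
		return 0.0;
	if (r->avg >= r->max_th)
		return 1.0;
	return r->max_p * (r->avg - r->min_th) / (r->max_th - r->min_th);
}
```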

The buffering in the transmit queue is completely divorced from driver
buffering, when the two need to be treated together in some fashion.
What the "right" way to do that is, I don't know, though Andrew's
interview gave me some hope.  And it needs to be dynamic, over (in the
802.11 case) at least 3 orders of magnitude.

This is a non-trivial, hard problem we have on our hands.

Computing the buffering in bytes is better than in packets; but since on
wireless multicast/broadcast is transmitted at a radically different
rate than other packets, I expect something based on time is really the
long term solution; and only the driver has any idea how long a packet
of a given flavour will likely take to transmit.
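A time-based limit needs a per-packet airtime estimate from the driver. A hypothetical sketch (the struct, rates and fixed overhead are assumptions; a real driver would take them from its rate-control state): with 54 Mbit/s unicast and a 1 Mbit/s basic rate for group frames, a 1500-byte multicast frame costs about 44 times the airtime of the same frame sent unicast.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-frame attributes a wireless driver might know. */
struct wpkt {
	size_t len;       /* payload bytes */
	bool multicast;   /* multicast/broadcast frames go out at a basic rate */
};

struct link_rates {
	unsigned long unicast_kbps;    /* assumed current unicast rate */
	unsigned long multicast_kbps;  /* assumed basic rate for group frames */
	unsigned long overhead_us;     /* assumed fixed per-frame access cost */
};

/* Estimated airtime in microseconds: fixed overhead + bits / rate. */
static unsigned long airtime_us(const struct link_rates *lr,
				const struct wpkt *p)
{
	unsigned long rate = p->multicast ? lr->multicast_kbps : lr->unicast_kbps;
	unsigned long long bits = (unsigned long long)p->len * 8u;

	assert(rate > 0);
	/* bits / (kbit/s) = ms, so scale by 1000 for us and round up */
	return lr->overhead_us + (unsigned long)((bits * 1000u + rate - 1) / rate);
}
```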
                    - Jim


^ permalink raw reply

* Re: [PATCH 18/24] sctp: Remove unnecessary OOM logging messages
From: Joe Perches @ 2011-08-30  1:21 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Vlad Yasevich, Sridhar Samudrala, David S. Miller, linux-sctp,
	netdev, linux-kernel
In-Reply-To: <1314654209.2563.7.camel@edumazet-laptop>

On Mon, 2011-08-29 at 23:43 +0200, Eric Dumazet wrote:
> On Monday, 29 August 2011 at 14:17 -0700, Joe Perches wrote:
> > Removing unnecessary messages saves code and text.
> > Site specific OOM messages are duplications of a generic MM
> > out of memory message and aren't really useful, so just
> > delete them.
> > Signed-off-by: Joe Perches <joe@perches.com>
> > ---
> >  net/sctp/protocol.c |    3 ---
> >  1 files changed, 0 insertions(+), 3 deletions(-)
> > diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
[]
> > @@ -1326,7 +1326,6 @@ SCTP_STATIC __init int sctp_init(void)
> >  			__get_free_pages(GFP_ATOMIC|__GFP_NOWARN, order);
> >  	} while (!sctp_assoc_hashtable && --order > 0);
> >  	if (!sctp_assoc_hashtable) {
> > -		pr_err("Failed association hash alloc\n");
> >  		status = -ENOMEM;
> >  		goto err_ahash_alloc;
> >  	}
[]
> > @@ -1359,7 +1357,6 @@ SCTP_STATIC __init int sctp_init(void)
> >  			__get_free_pages(GFP_ATOMIC|__GFP_NOWARN, order);
> >  	} while (!sctp_port_hashtable && --order > 0);
> >  	if (!sctp_port_hashtable) {
> > -		pr_err("Failed bind hash alloc\n");
> >  		status = -ENOMEM;
> >  		goto err_bhash_alloc;
> >  	}
> It would be nice if you could avoid sending all these patches that you
> don't even read.

Didn't read is not the same thing as didn't notice.

> As I already told you in the past, __GFP_NOWARN allocations don't print
> generic OOM messages.

I didn't notice those had GFP_NOWARN.

> Just because I told Wang Shaoyan not to add a useless "pr_err("Out of
> memory\n");" in the last gianfar patch doesn't mean you have to remove
> all messages with one hundred or more patches.

> If I remember correctly, you even disagreed at that time.

No, what I said was that it'd be better to get agreement
to delete them before deleting them.

https://lkml.org/lkml/2011/8/9/379

So I submitted an RFC and cc'd you.
You did not reply.

https://lkml.org/lkml/2011/8/25/580

> Furthermore, a failed vmalloc() is not guaranteed to emit an OOM
> message, is it?

Doesn't seem to be, perhaps it should be
when __GFP_NOWARN is not set...
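Eric's objection can be illustrated with a userspace sketch of the allocation pattern in the quoted sctp_init() hunks. Everything here is hypothetical (`quiet_alloc_order()` only simulates `__get_free_pages()` with `__GFP_NOWARN`, and the 64 KiB limit is arbitrary); it shows that when the allocator is asked not to warn, the caller's own message is the only trace of the failure.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */

/* Simulated allocator limit; stands in for memory being unavailable. */
static size_t simulated_limit_bytes = 64 * 1024;

/* Hypothetical stand-in for __get_free_pages(GFP_ATOMIC | __GFP_NOWARN, order):
 * returns NULL above the simulated limit and, like __GFP_NOWARN, prints
 * nothing when it fails. */
static void *quiet_alloc_order(int order)
{
	size_t bytes = (size_t)SKETCH_PAGE_SIZE << order;

	if (bytes > simulated_limit_bytes)
		return NULL;
	return malloc(bytes);
}

/* Same loop shape as the quoted hunks: retry at smaller orders, stopping
 * before order 0 is tried, as the original loop does. */
static void *alloc_hash_table(int order, int *order_out)
{
	void *table;

	assert(order >= 0 && order < 20);
	do {
		table = quiet_alloc_order(order);
	} while (!table && --order > 0);

	if (!table) {
		/* The allocator stayed silent, so the caller must report it. */
		fprintf(stderr, "Failed hash table alloc\n");
		return NULL;
	}
	*order_out = order;
	return table;
}
```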

^ permalink raw reply
