Netdev List
* Re: [PATCH 36/51] DMA-API: usb: use dma_set_coherent_mask()
From: Nicolas Ferre @ 2013-09-23 12:30 UTC (permalink / raw)
  To: Russell King, alsa-devel, b43-dev, devel, devicetree, dri-devel,
	e1000-devel, linux-arm-kernel, linux-crypto, linux-doc,
	linux-fbdev, linux-ide, linux-media, linux-mmc, linux-nvme,
	linux-omap, linuxppc-dev, linux-samsung-soc, linux-scsi,
	linux-tegra, linux-usb, linux-wireless, netdev,
	Solarflare linux maintainers, uclinux-dist-devel
  Cc: Kukjin Kim, Stephen Warren, Alexander Shishkin,
	Greg Kroah-Hartman, Felipe Balbi, Alan Stern
In-Reply-To: <E1VMmHX-0007jq-Cj@rmk-PC.arm.linux.org.uk>

On 20/09/2013 00:01, Russell King wrote:
> The correct way for a driver to specify the coherent DMA mask is
> not to directly access the field in the struct device, but to use
> dma_set_coherent_mask().  Only arch and bus code should access this
> member directly.
>
> Convert all direct write accesses to using the correct API.
>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> ---
>   drivers/usb/chipidea/ci_hdrc_imx.c |    5 +++--
>   drivers/usb/dwc3/dwc3-exynos.c     |    5 +++--
>   drivers/usb/gadget/lpc32xx_udc.c   |    4 +++-
>   drivers/usb/host/ehci-atmel.c      |    5 +++--

For Atmel driver:

Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>

[..]

> diff --git a/drivers/usb/host/ehci-atmel.c b/drivers/usb/host/ehci-atmel.c
> index 3b645ff..5831a88 100644
> --- a/drivers/usb/host/ehci-atmel.c
> +++ b/drivers/usb/host/ehci-atmel.c
> @@ -92,8 +92,9 @@ static int ehci_atmel_drv_probe(struct platform_device *pdev)
>   	 */
>   	if (!pdev->dev.dma_mask)
>   		pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;
> -	if (!pdev->dev.coherent_dma_mask)
> -		pdev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
> +	retval = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32));
> +	if (retval)
> +		goto fail_create_hcd;
>
>   	hcd = usb_create_hcd(driver, &pdev->dev, dev_name(&pdev->dev));
>   	if (!hcd) {

[..]

Thanks,
-- 
Nicolas Ferre

^ permalink raw reply
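
The point of the patch above (go through a setter that can fail, rather
than writing the field) can be sketched in plain userspace C. Everything
here is an assumption made for illustration: struct sketch_device, its
bus_limit field, sketch_set_coherent_mask() and the validation rule are
hypothetical stand-ins, not the kernel's struct device or
dma_set_coherent_mask(). Only the DMA_BIT_MASK() arithmetic mirrors the
kernel macro.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Same arithmetic as the kernel's DMA_BIT_MASK(): the low n bits set. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical stand-in for struct device; not the kernel structure. */
struct sketch_device {
	uint64_t coherent_dma_mask;	/* mask currently in effect */
	uint64_t bus_limit;		/* widest mask this pretend bus accepts */
};

/*
 * Hypothetical setter: the request is validated before it is stored,
 * so the caller gets an error code instead of a silently accepted value.
 * The validation rule itself is invented for this sketch.
 */
static int sketch_set_coherent_mask(struct sketch_device *dev, uint64_t mask)
{
	assert(dev != NULL);
	if (mask == 0 || (mask & ~dev->bus_limit))
		return -EIO;
	dev->coherent_dma_mask = mask;
	return 0;
}
```

A probe routine would check the return value and bail out on failure,
which is the shape the ehci-atmel hunk takes.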

* [PATCH] ipvs: improved SH fallback strategy
From: Alexander Frolkin @ 2013-09-23 11:51 UTC (permalink / raw)
  To: Julian Anastasov, lvs-devel
  Cc: Wensong Zhang, Simon Horman, netdev, linux-kernel

Improve the SH fallback realserver selection strategy.

With sh and sh-fallback, if a realserver is down, this attempts to
distribute the traffic that would have gone to that server evenly
among the remaining servers.

Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
---
diff --git a/net/netfilter/ipvs/ip_vs_sh.c b/net/netfilter/ipvs/ip_vs_sh.c
index f16c027..1676354 100644
--- a/net/netfilter/ipvs/ip_vs_sh.c
+++ b/net/netfilter/ipvs/ip_vs_sh.c
@@ -120,22 +120,35 @@ static inline struct ip_vs_dest *
 ip_vs_sh_get_fallback(struct ip_vs_service *svc, struct ip_vs_sh_state *s,
 		      const union nf_inet_addr *addr, __be16 port)
 {
-	unsigned int offset;
-	unsigned int hash;
+	unsigned int offset, roffset;
+	unsigned int hash, ihash;
 	struct ip_vs_dest *dest;
 
-	for (offset = 0; offset < IP_VS_SH_TAB_SIZE; offset++) {
-		hash = ip_vs_sh_hashkey(svc->af, addr, port, offset);
-		dest = rcu_dereference(s->buckets[hash].dest);
-		if (!dest)
-			break;
-		if (is_unavailable(dest))
-			IP_VS_DBG_BUF(6, "SH: selected unavailable server "
-				      "%s:%d (offset %d)",
+	ihash = ip_vs_sh_hashkey(svc->af, addr, port, 0);
+	dest = rcu_dereference(s->buckets[ihash].dest);
+
+	if (!dest)
+		return NULL;
+
+	if (is_unavailable(dest)) {
+		IP_VS_DBG_BUF(6, "SH: selected unavailable server "
+		      "%s:%d, reselecting",
+		      IP_VS_DBG_ADDR(svc->af, &dest->addr),
+		      ntohs(dest->port));
+		for (offset = 0; offset < IP_VS_SH_TAB_SIZE; offset++) {
+			roffset = (offset + ihash) % IP_VS_SH_TAB_SIZE;
+			hash = ip_vs_sh_hashkey(svc->af, addr, port, roffset);
+			dest = rcu_dereference(s->buckets[hash].dest);
+			if (is_unavailable(dest))
+				IP_VS_DBG_BUF(6, "SH: selected unavailable "
+				      "server %s:%d (offset %d), reselecting",
 				      IP_VS_DBG_ADDR(svc->af, &dest->addr),
-				      ntohs(dest->port), offset);
-		else
-			return dest;
+				      ntohs(dest->port), roffset);
+			else
+				return dest;
+		}
+	} else {
+		return dest;
 	}
 
 	return NULL;


^ permalink raw reply related
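
The reselection idea in the patch above can be tried out in userspace
with a toy table. This is only a sketch under stated assumptions:
sketch_hashkey() is a deliberately trivial hash, TAB_SIZE is tiny, and
struct sketch_dest has just an availability flag. None of these are the
ipvs definitions. The sketch also stops probing on an empty bucket,
which is a choice made here and not something taken from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAB_SIZE 8	/* illustrative table size, not IP_VS_SH_TAB_SIZE */

/* Hypothetical destination: only tracks whether it can take traffic. */
struct sketch_dest {
	int available;
};

/* Trivial stand-in hash that mixes a key with a probe offset. */
static unsigned int sketch_hashkey(uint32_t key, unsigned int offset)
{
	return (key + offset) % TAB_SIZE;
}

/*
 * Look up the bucket for the key; if its destination is unavailable,
 * probe further buckets, starting the offsets at the initial hash so
 * that different keys fall back to different servers.
 */
static struct sketch_dest *sketch_get_fallback(struct sketch_dest **buckets,
					       uint32_t key)
{
	unsigned int ihash = sketch_hashkey(key, 0);
	unsigned int offset, roffset;
	struct sketch_dest *dest;

	assert(buckets != NULL);
	dest = buckets[ihash];
	if (!dest)
		return NULL;
	if (dest->available)
		return dest;

	for (offset = 0; offset < TAB_SIZE; offset++) {
		roffset = (offset + ihash) % TAB_SIZE;
		dest = buckets[sketch_hashkey(key, roffset)];
		if (!dest)
			break;	/* empty bucket: give up (sketch choice) */
		if (dest->available)
			return dest;
	}
	return NULL;
}
```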

* Re: [PATCH 43/51] DMA-API: dma: edma.c: no need to explicitly initialize DMA masks
From: Russell King - ARM Linux @ 2013-09-23 11:37 UTC (permalink / raw)
  To: Vinod Koul
  Cc: alsa-devel, linux-doc, linux-mmc, linux-fbdev, linux-nvme,
	linux-ide, devel, linux-samsung-soc, linux-scsi, e1000-devel,
	b43-dev, linux-media, devicetree, dri-devel, linux-tegra,
	Dan Williams, linux-omap, linux-arm-kernel,
	Solarflare linux maintainers, netdev, linux-usb, linux-wireless,
	linux-crypto, uclinux-dist-devel, linuxppc-dev
In-Reply-To: <20130923102533.GI17188@intel.com>

On Mon, Sep 23, 2013 at 03:55:33PM +0530, Vinod Koul wrote:
> On Fri, Sep 20, 2013 at 12:15:39AM +0100, Russell King wrote:
> > platform_device_register_full() can set up the DMA mask provided the
> > appropriate member is set in struct platform_device_info.  So let's
> > make that be the case.  This avoids a direct reference to the DMA
> > masks by this driver.
> > 
> > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> Acked-by: Vinod Koul <vinod.koul@intel.com>
> 
> This also raises a question: should we force drivers to use the
> dma_set_mask_and_coherent() API, or may they keep the flexibility below too?

There are two issues here:
1. dma_set_mask_and_coherent() will only work if dev->dma_mask points at
   some storage for the mask.  This needs to have .dma_mask in the
   platform_device_info initialised.

2. Yes, this driver should also be calling the appropriate DMA mask setting
   functions in addition to having the mask initialized at device creation
   time.

Here's a replacement patch, though maybe it would be better to roll all
the additions of dma_set_mask_and_coherent() in drivers/dma into one
patch?  In other words, combine the addition of this with these two
patches:

	dma: pl330: add dma_set_mask_and_coherent() call
	dma: pl08x: add dma_set_mask_and_coherent() call

8<=====
From: Russell King <rmk+kernel@arm.linux.org.uk>
Subject: [PATCH] DMA-API: dma: edma.c: no need to explicitly initialize DMA
 masks

platform_device_register_full() can set up the DMA mask provided the
appropriate member is set in struct platform_device_info.  So let's
make that be the case.  This avoids a direct reference to the DMA
masks by this driver.

While here, add the dma_set_mask_and_coherent() call which the DMA API
requires DMA-using drivers to call.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
 drivers/dma/edma.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
index ff50ff4..fd5e48c 100644
--- a/drivers/dma/edma.c
+++ b/drivers/dma/edma.c
@@ -631,6 +631,10 @@ static int edma_probe(struct platform_device *pdev)
 	struct edma_cc *ecc;
 	int ret;
 
+	ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
+	if (ret)
+		return ret;
+
 	ecc = devm_kzalloc(&pdev->dev, sizeof(*ecc), GFP_KERNEL);
 	if (!ecc) {
 		dev_err(&pdev->dev, "Can't allocate controller\n");
@@ -702,11 +706,13 @@ static struct platform_device *pdev0, *pdev1;
 static const struct platform_device_info edma_dev_info0 = {
 	.name = "edma-dma-engine",
 	.id = 0,
+	.dma_mask = DMA_BIT_MASK(32),
 };
 
 static const struct platform_device_info edma_dev_info1 = {
 	.name = "edma-dma-engine",
 	.id = 1,
+	.dma_mask = DMA_BIT_MASK(32),
 };
 
 static int edma_init(void)
@@ -720,8 +726,6 @@ static int edma_init(void)
 			ret = PTR_ERR(pdev0);
 			goto out;
 		}
-		pdev0->dev.dma_mask = &pdev0->dev.coherent_dma_mask;
-		pdev0->dev.coherent_dma_mask = DMA_BIT_MASK(32);
 	}
 
 	if (EDMA_CTLRS == 2) {
@@ -731,8 +735,6 @@ static int edma_init(void)
 			platform_device_unregister(pdev0);
 			ret = PTR_ERR(pdev1);
 		}
-		pdev1->dev.dma_mask = &pdev1->dev.coherent_dma_mask;
-		pdev1->dev.coherent_dma_mask = DMA_BIT_MASK(32);
 	}
 
 out:
-- 
1.7.4.4

^ permalink raw reply related
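
Issue 1 in the reply above (the combined setter only works when the
streaming mask pointer has storage behind it) can be modelled in a few
lines of userspace C. This is a sketch, not kernel code: struct
sketch_dev and sketch_set_mask_and_coherent() are hypothetical names,
and the only behaviour modelled is "fail when there is nowhere to store
the streaming mask".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Same arithmetic as the kernel's DMA_BIT_MASK(): the low n bits set. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical device: the streaming mask is a pointer and may be NULL. */
struct sketch_dev {
	uint64_t *dma_mask;
	uint64_t coherent_dma_mask;
};

/* Set both masks, or fail if the streaming mask has no storage. */
static int sketch_set_mask_and_coherent(struct sketch_dev *dev, uint64_t mask)
{
	assert(dev != NULL);
	if (!dev->dma_mask)
		return -EIO;
	*dev->dma_mask = mask;
	dev->coherent_dma_mask = mask;
	return 0;
}
```

In the sketch, pointing dma_mask at real storage before calling the
setter plays the role that initialising .dma_mask in
platform_device_info plays in the patch.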

* iSCSI support in Linux
From: Rayagond K @ 2013-09-23 10:54 UTC (permalink / raw)
  To: netdev

Hi All,

I am looking into iSCSI support in Linux. From searching the internet
I learned that the iSCSI standard has been implemented in Linux since
kernel version 2.6.20. One thing I did not understand clearly: are there
any NIC offloading features related to iSCSI? If so, is there any
support in Linux for such offloading features? Is there an example
NIC driver in LXR with an iSCSI implementation?


Thanks
Rayagond.

^ permalink raw reply

* Re: [alsa-devel] [PATCH 24/51] DMA-API: dma: pl330: add dma_set_mask_and_coherent() call
From: Vinod Koul @ 2013-09-23 10:43 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: alsa-devel, linux-doc, linux-wireless, linux-fbdev, dri-devel,
	linux-ide, devel, linux-samsung-soc, linux-scsi, e1000-devel,
	b43-dev, linux-media, devicetree, linux-nvme, linux-tegra,
	Dan Williams, linux-omap, linux-arm-kernel,
	Solarflare linux maintainers, netdev, linux-usb, linux-mmc,
	linux-crypto, uclinux-dist-devel, linuxppc-dev
In-Reply-To: <20130921200000.GS25647@n2100.arm.linux.org.uk>

On Sat, Sep 21, 2013 at 09:00:00PM +0100, Russell King - ARM Linux wrote:
> On Fri, Sep 20, 2013 at 07:26:27PM +0200, Heiko Stübner wrote:
> > Am Donnerstag, 19. September 2013, 23:49:01 schrieb Russell King:
> > > The DMA API requires drivers to call the appropriate dma_set_mask()
> > > functions before doing any DMA mapping.  Add this required call to
> > > the AMBA PL08x driver.
> > 			^--- copy and paste error - should of course be PL330
> 
> Fixed, thanks.
with fixed changelog...

Acked-by: Vinod Koul <vinod.koul@intel.com>

~Vinod

-- 

^ permalink raw reply

* Re: [alsa-devel] [PATCH 43/51] DMA-API: dma: edma.c: no need to explicitly initialize DMA masks
From: Vinod Koul @ 2013-09-23 10:25 UTC (permalink / raw)
  To: Russell King
  Cc: alsa-devel, linux-doc, linux-mmc, linux-fbdev, linux-nvme,
	linux-ide, devel, linux-samsung-soc, linux-scsi, e1000-devel,
	b43-dev, linux-media, devicetree, dri-devel, linux-tegra,
	Dan Williams, linux-omap, linux-arm-kernel,
	Solarflare linux maintainers, netdev, linux-usb, linux-wireless,
	linux-crypto, uclinux-dist-devel, linuxppc-dev
In-Reply-To: <E1VMnRj-0007sg-1Z@rmk-PC.arm.linux.org.uk>

On Fri, Sep 20, 2013 at 12:15:39AM +0100, Russell King wrote:
> platform_device_register_full() can set up the DMA mask provided the
> appropriate member is set in struct platform_device_info.  So let's
> make that be the case.  This avoids a direct reference to the DMA
> masks by this driver.
> 
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Vinod Koul <vinod.koul@intel.com>

This also raises a question: should we force drivers to use the
dma_set_mask_and_coherent() API, or may they keep the flexibility below too?

~Vinod

> ---
>  drivers/dma/edma.c |    6 ++----
>  1 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
> index ff50ff4..7f9fe30 100644
> --- a/drivers/dma/edma.c
> +++ b/drivers/dma/edma.c
> @@ -702,11 +702,13 @@ static struct platform_device *pdev0, *pdev1;
>  static const struct platform_device_info edma_dev_info0 = {
>  	.name = "edma-dma-engine",
>  	.id = 0,
> +	.dma_mask = DMA_BIT_MASK(32),
>  };
>  
>  static const struct platform_device_info edma_dev_info1 = {
>  	.name = "edma-dma-engine",
>  	.id = 1,
> +	.dma_mask = DMA_BIT_MASK(32),
>  };


>  
>  static int edma_init(void)
> @@ -720,8 +722,6 @@ static int edma_init(void)
>  			ret = PTR_ERR(pdev0);
>  			goto out;
>  		}
> -		pdev0->dev.dma_mask = &pdev0->dev.coherent_dma_mask;
> -		pdev0->dev.coherent_dma_mask = DMA_BIT_MASK(32);
>  	}
>  
>  	if (EDMA_CTLRS == 2) {
> @@ -731,8 +731,6 @@ static int edma_init(void)
>  			platform_device_unregister(pdev0);
>  			ret = PTR_ERR(pdev1);
>  		}
> -		pdev1->dev.dma_mask = &pdev1->dev.coherent_dma_mask;
> -		pdev1->dev.coherent_dma_mask = DMA_BIT_MASK(32);
>  	}
>  
>  out:
> -- 
> 1.7.4.4
> 

-- 

^ permalink raw reply

* Re: [alsa-devel] [PATCH 23/51] DMA-API: dma: pl08x: add dma_set_mask_and_coherent() call
From: Vinod Koul @ 2013-09-23 10:12 UTC (permalink / raw)
  To: Russell King
  Cc: alsa-devel-K7yf7f+aM1XWsZ/bQMPhNw,
	b43-dev-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	e1000-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA,
	linux-doc-u79uwXL29TY76Z2rM5mHXA,
	linux-fbdev-u79uwXL29TY76Z2rM5mHXA,
	linux-ide-u79uwXL29TY76Z2rM5mHXA,
	linux-media-u79uwXL29TY76Z2rM5mHXA,
	linux-mmc-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-omap-u79uwXL29TY76Z2rM5mHXA,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-samsung-soc-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-usb-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Solarflare linux maintainers,
	uclinux-dist-devel-ZG0+EudsQA8dtHy/vicBwGD2FQJk+8+b, Dan Williams
In-Reply-To: <E1VMm4v-0007hz-RC-eh5Bv4kxaXIANfyc6IWni62ZND6+EDdj@public.gmane.org>

On Thu, Sep 19, 2013 at 10:48:01PM +0100, Russell King wrote:
> The DMA API requires drivers to call the appropriate dma_set_mask()
> functions before doing any DMA mapping.  Add this required call to
> the AMBA PL08x driver.
> 
> Signed-off-by: Russell King <rmk+kernel-lFZ/pmaqli7XmaaqVzeoHQ@public.gmane.org>
Acked-by: Vinod Koul <vinod.koul-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

~Vinod
> ---
>  drivers/dma/amba-pl08x.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/dma/amba-pl08x.c b/drivers/dma/amba-pl08x.c
> index fce46c5..e51a983 100644
> --- a/drivers/dma/amba-pl08x.c
> +++ b/drivers/dma/amba-pl08x.c
> @@ -2055,6 +2055,11 @@ static int pl08x_probe(struct amba_device *adev, const struct amba_id *id)
>  	if (ret)
>  		return ret;
>  
> +	/* Ensure that we can do DMA */
> +	ret = dma_set_mask_and_coherent(&adev->dev, DMA_BIT_MASK(32));
> +	if (ret)
> +		goto out_no_pl08x;
> +
>  	/* Create the driver state holder */
>  	pl08x = kzalloc(sizeof(*pl08x), GFP_KERNEL);
>  	if (!pl08x) {
> -- 
> 1.7.4.4
> 

^ permalink raw reply

* [PATCH net-next] xfrm: Simplify SA looking up when using wildcard source address
From: Fan Du @ 2013-09-23  9:18 UTC (permalink / raw)
  To: steffen.klassert; +Cc: davem, netdev

I'm not quite sure I have this "wildcard source address" right.
IMHO, if a host needs to protect all traffic for a given remote host,
then the source address is the wildcard address, i.e. all zeros.
(Please correct me if I'm bloody wrong...)

Here is the argument, if the above statement holds true:
__xfrm4/6_state_addr_check is a four-step check, but all we need to do
is check whether the destination address matches. Passing saddr from
the flow is the worst option, as the check needs to reach the fourth step.

So, simplify this process by checking only the destination address when
using the wildcard source address to look up SAs.

Signed-off-by: Fan Du <fan.du@windriver.com>
---
 include/net/xfrm.h    |   31 +++++++++++++++++++++++++++++++
 net/xfrm/xfrm_state.c |    2 +-
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index e253bf0..fdb9343 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1282,6 +1282,37 @@ xfrm_state_addr_check(const struct xfrm_state *x,
 }
 
 static __inline__ int
+__xfrm4_state_daddr_check(const struct xfrm_state *x,
+                                const xfrm_address_t *daddr)
+{
+        return ((daddr->a4 == x->id.daddr.a4) ? 1 : 0);
+}
+
+static __inline__ int
+__xfrm6_state_daddr_check(const struct xfrm_state *x,
+                         const xfrm_address_t *daddr)
+{
+        if (ipv6_addr_equal((struct in6_addr *)daddr, (struct in6_addr *)&x->id.daddr))
+                return 1;
+        else 
+                return 0;
+}
+
+static __inline__ int
+xfrm_state_daddr_check(const struct xfrm_state *x,
+                      const xfrm_address_t *daddr,
+                      unsigned short family)
+{
+        switch (family) {
+        case AF_INET:
+                return __xfrm4_state_daddr_check(x, daddr);
+        case AF_INET6:
+                return __xfrm6_state_daddr_check(x, daddr);
+        }    
+        return 0;
+}
+
+static __inline__ int
 xfrm_state_addr_flow_check(const struct xfrm_state *x, const struct flowi *fl,
 			   unsigned short family)
 {
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index e1373d5..87c99da 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -824,7 +824,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 		    x->props.reqid == tmpl->reqid &&
 		    (mark & x->mark.m) == x->mark.v &&
 		    !(x->props.flags & XFRM_STATE_WILDRECV) &&
-		    xfrm_state_addr_check(x, daddr, saddr, encap_family) &&
+		    xfrm_state_daddr_check(x, daddr, encap_family) &&
 		    tmpl->mode == x->props.mode &&
 		    tmpl->id.proto == x->id.proto &&
 		    (tmpl->id.spi == x->id.spi || !tmpl->id.spi))
-- 
1.7.9.5

^ permalink raw reply related
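
The destination-only comparison proposed above is easy to exercise
outside the kernel. The following is a sketch with hypothetical names
(sketch_addr_t, sketch_daddr_match()); it assumes a union laid out in the
spirit of xfrm_address_t and uses memcmp() where the patch uses
ipv6_addr_equal().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical address union: one IPv4 word or four IPv6 words. */
typedef union {
	uint32_t a4;
	uint32_t a6[4];
} sketch_addr_t;

/* Return 1 if the destination matches for the given family, else 0. */
static int sketch_daddr_match(const sketch_addr_t *state_daddr,
			      const sketch_addr_t *daddr,
			      unsigned short family)
{
	assert(state_daddr != NULL && daddr != NULL);
	switch (family) {
	case AF_INET:
		return daddr->a4 == state_daddr->a4;
	case AF_INET6:
		return memcmp(daddr->a6, state_daddr->a6,
			      sizeof(daddr->a6)) == 0;
	}
	return 0;	/* unknown family never matches */
}
```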

* [PATCH net-next] xfrm: Force SA to be lookup again if SA in acquire state
From: Fan Du @ 2013-09-23  9:18 UTC (permalink / raw)
  To: steffen.klassert; +Cc: davem, netdev

If an SA is in the process of being acquired, that SA is more
promising and precise than the fallback option, i.e. using the wildcard
source address to search for a less suitable SA.

So bail out here, and try again.

Signed-off-by: Fan Du <fan.du@windriver.com>
---
 net/xfrm/xfrm_state.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index b9c3f9e..e1373d5 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -815,7 +815,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 			xfrm_state_look_at(pol, x, fl, encap_family,
 					   &best, &acquire_in_progress, &error);
 	}
-	if (best)
+	if (best || acquire_in_progress)
 		goto found;
 
 	h_wildcard = xfrm_dst_hash(net, daddr, &saddr_wildcard, tmpl->reqid, encap_family);
-- 
1.7.9.5

^ permalink raw reply related

* RE: [Xen-devel] [PATCH net-next v2 2/2] xen-netback: handle frontends that fail to transition through Closing
From: Paul Durrant @ 2013-09-23  8:59 UTC (permalink / raw)
  To: Bastian Blank
  Cc: netdev@vger.kernel.org, xen-devel@lists.xen.org, Wei Liu,
	David Vrabel, Ian Campbell
In-Reply-To: <20130920161431.GA19095@mail.waldi.eu.org>

> -----Original Message-----
> From: Bastian Blank [mailto:bastian@waldi.eu.org]
> Sent: 20 September 2013 17:15
> To: Paul Durrant
> Cc: netdev@vger.kernel.org; xen-devel@lists.xen.org; Wei Liu; David Vrabel;
> Ian Campbell
> Subject: Re: [Xen-devel] [PATCH net-next v2 2/2] xen-netback: handle
> frontends that fail to transition through Closing
> 
> On Fri, Sep 20, 2013 at 02:57:40PM +0100, Paul Durrant wrote:
> > Some old Windows frontends fail to transition through the xenbus Closing
> > state and move directly from Connected to Closed. Handle this case properly.
> 
> What happens in this case? Are there other state changes that will do
> unwanted things?
> 

Hmm, now you mention it I suspect a transition directly to an unknown state may also not do the right thing. Perhaps it would be better to go straight to a more robust state model as suggested by David Vrabel.

  Paul

^ permalink raw reply

* RE: [Xen-devel] [PATCH net-next v2 1/2] xen-netback: add a vif-is-connected flag
From: Paul Durrant @ 2013-09-23  8:51 UTC (permalink / raw)
  To: annie li
  Cc: netdev@vger.kernel.org, xen-devel@lists.xen.org, Wei Liu,
	David Vrabel, Ian Campbell
In-Reply-To: <523E5C63.9080804@oracle.com>

> -----Original Message-----
> From: annie li [mailto:annie.li@oracle.com]
> Sent: 22 September 2013 03:57
> To: Paul Durrant
> Cc: netdev@vger.kernel.org; xen-devel@lists.xen.org; Wei Liu; David Vrabel;
> Ian Campbell
> Subject: Re: [Xen-devel] [PATCH net-next v2 1/2] xen-netback: add a vif-is-
> connected flag
> 
> 
> On 2013-9-20 21:57, Paul Durrant wrote:
> > Having applied my patch to separate vif disconnect and free, I ran into a
> > BUG when testing resume from S3 with a Windows frontend because the vif task
> > pointer was not cleared by xenvif_disconnect() and so a double call to this
> > function tries to stop the thread twice.
> Or would it be better to implement more in the Windows netfront? For example,
> when the Windows VM hibernates, disconnect the vif as required by
> netback: connect -> closing -> closed.
> 

S3 != hibernation; that is S4. The backend does not go away when the VM goes into S3, as the domain remains intact. We do go through the correct closing->closed transition on the way down but, because of the way the D3->D0 code in the frontend needs to be generalized, we attempt a second closing->closed transition on the way back up. In the S4 case this is OK because we have a fresh backend, but in the S3 case we don't and therefore hit the double-disconnect issue. The fact that the backend BUGs in this case clearly shows a vulnerability in the backend, and thus that is where the fix needs to be made; the frontend is doing nothing wrong.

  Paul

^ permalink raw reply
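
The double-disconnect problem described above comes down to making the
disconnect path idempotent. A minimal userspace sketch follows, with
hypothetical names throughout (struct sketch_vif, sketch_disconnect());
it guards on the task pointer, whereas the patch under discussion adds a
separate connected flag, so treat this as an illustration of the idea
and not of the actual change.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vif: 'task' stands in for the kernel thread pointer. */
struct sketch_vif {
	void *task;
	int stop_calls;	/* counts how often the thread was "stopped" */
};

/* Safe to call any number of times: only the first call stops the task. */
static void sketch_disconnect(struct sketch_vif *vif)
{
	assert(vif != NULL);
	if (!vif->task)
		return;		/* already disconnected */
	vif->stop_calls++;	/* stands in for stopping the thread */
	vif->task = NULL;	/* clear so a second call is a no-op */
}
```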

* Device tree node for Freescale Gianfar PTP reference clock source selection
From: Aida Mynzhasova @ 2013-09-23  7:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: netdev, devicetree, Richard Cochran, Claudiu Manoil

Hi,

Currently, the Freescale Gianfar PTP reference clock source is determined
by a hard-coded value in the gianfar_ptp driver. I don't think that
recompiling the entire module (or even worse, the kernel) is a good
idea when we want to change from one clock source to another. So, I want to
add a new device tree binding, which can be used as:

	ptp_clock@24E00 {
		compatible = "fsl,etsec-ptp";
		reg = <0x24E00 0xB0>;
		interrupts = <12 0x8 13 0x8>;
		interrupt-parent = < &ipic >;
		fsl,cksel = <0>; /* <-- New entry */
		fsl,tclk-period = <10>;
		fsl,tmr-prsc    = <100>;
		fsl,tmr-add     = <0x999999A4>;
		fsl,tmr-fiper1  = <0x3B9AC9F6>;
		fsl,tmr-fiper2  = <0x00018696>;
		fsl,max-adj     = <659999998>;
	};

fsl,cksel acceptable values:

<0> for external clock;
<1> for eTSEC system clock;
<2> for eTSEC1 transmit clock;
<3> for RTC clock input.

I am new to this mailing list, and as far as I know, I have to discuss
all updates to device tree files here before sending a patch that uses
new attributes.

Also, should I define new bindings in some special way? I want to add
a description of the cksel attribute in
/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt. Is that enough or
not?

Thanks!

-- 
Regards,
Aida

^ permalink raw reply
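
On the driver side, the four proposed fsl,cksel values would need to be
range-checked before use. A small sketch of that check, assuming only
the value table from the message above; sketch_cksel_name() and the
table name are hypothetical, and reading the property from the device
tree is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Descriptions taken from the proposed fsl,cksel value list. */
static const char *const sketch_cksel_names[] = {
	"external clock",		/* <0> */
	"eTSEC system clock",		/* <1> */
	"eTSEC1 transmit clock",	/* <2> */
	"RTC clock input",		/* <3> */
};

#define SKETCH_CKSEL_COUNT \
	(sizeof(sketch_cksel_names) / sizeof(sketch_cksel_names[0]))

/* Return a description for a valid selector, or NULL if out of range. */
static const char *sketch_cksel_name(unsigned int cksel)
{
	if (cksel >= SKETCH_CKSEL_COUNT)
		return NULL;
	assert(sketch_cksel_names[cksel] != NULL);
	return sketch_cksel_names[cksel];
}
```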

* Re: [PATCH V3 4/6] vhost_net: determine whether or not to use zerocopy at one time
From: Michael S. Tsirkin @ 2013-09-23  7:16 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <5227F274.9040506@redhat.com>

On Thu, Sep 05, 2013 at 10:54:44AM +0800, Jason Wang wrote:
> On 09/04/2013 07:59 PM, Michael S. Tsirkin wrote:
> > On Mon, Sep 02, 2013 at 04:40:59PM +0800, Jason Wang wrote:
> >> Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if
> >> upend_idx != done_idx we still set zcopy_used to true and rollback this choice
> >> later. This could be avoided by determining zerocopy once by checking all
> >> conditions at one time before.
> >>
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >> ---
> >>  drivers/vhost/net.c |   47 ++++++++++++++++++++---------------------------
> >>  1 files changed, 20 insertions(+), 27 deletions(-)
> >>
> >> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> >> index 8a6dd0d..3f89dea 100644
> >> --- a/drivers/vhost/net.c
> >> +++ b/drivers/vhost/net.c
> >> @@ -404,43 +404,36 @@ static void handle_tx(struct vhost_net *net)
> >>  			       iov_length(nvq->hdr, s), hdr_size);
> >>  			break;
> >>  		}
> >> -		zcopy_used = zcopy && (len >= VHOST_GOODCOPY_LEN ||
> >> -				       nvq->upend_idx != nvq->done_idx);
> >> +
> >> +		zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
> >> +				   && (nvq->upend_idx + 1) % UIO_MAXIOV !=
> >> +				      nvq->done_idx
> > Thinking about this, this looks strange.
> > The original idea was that once we start doing zcopy, we keep
> > using the heads ring even for short packets until no zcopy is outstanding.
> 
> What's the reason to keep using the heads ring?

To keep completions in order.

> >
> > What's the logic behind (nvq->upend_idx + 1) % UIO_MAXIOV != nvq->done_idx
> > here?
> 
> Because we initialize both upend_idx and done_idx to zero, upend_idx
> != done_idx cannot be used to check whether or not the heads ring
> is full.

But what does ring full have to do with zerocopy use?

> >> +				   && vhost_net_tx_select_zcopy(net);
> >>  
> >>  		/* use msg_control to pass vhost zerocopy ubuf info to skb */
> >>  		if (zcopy_used) {
> >> +			struct ubuf_info *ubuf;
> >> +			ubuf = nvq->ubuf_info + nvq->upend_idx;
> >> +
> >>  			vq->heads[nvq->upend_idx].id = head;
> >> -			if (!vhost_net_tx_select_zcopy(net) ||
> >> -			    len < VHOST_GOODCOPY_LEN) {
> >> -				/* copy don't need to wait for DMA done */
> >> -				vq->heads[nvq->upend_idx].len =
> >> -							VHOST_DMA_DONE_LEN;
> >> -				msg.msg_control = NULL;
> >> -				msg.msg_controllen = 0;
> >> -				ubufs = NULL;
> >> -			} else {
> >> -				struct ubuf_info *ubuf;
> >> -				ubuf = nvq->ubuf_info + nvq->upend_idx;
> >> -
> >> -				vq->heads[nvq->upend_idx].len =
> >> -					VHOST_DMA_IN_PROGRESS;
> >> -				ubuf->callback = vhost_zerocopy_callback;
> >> -				ubuf->ctx = nvq->ubufs;
> >> -				ubuf->desc = nvq->upend_idx;
> >> -				msg.msg_control = ubuf;
> >> -				msg.msg_controllen = sizeof(ubuf);
> >> -				ubufs = nvq->ubufs;
> >> -				kref_get(&ubufs->kref);
> >> -			}
> >> +			vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
> >> +			ubuf->callback = vhost_zerocopy_callback;
> >> +			ubuf->ctx = nvq->ubufs;
> >> +			ubuf->desc = nvq->upend_idx;
> >> +			msg.msg_control = ubuf;
> >> +			msg.msg_controllen = sizeof(ubuf);
> >> +			ubufs = nvq->ubufs;
> >> +			kref_get(&ubufs->kref);
> >>  			nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;
> >> -		} else
> >> +		} else {
> >>  			msg.msg_control = NULL;
> >> +			ubufs = NULL;
> >> +		}
> >>  		/* TODO: Check specific error and bomb out unless ENOBUFS? */
> >>  		err = sock->ops->sendmsg(NULL, sock, &msg, len);
> >>  		if (unlikely(err < 0)) {
> >>  			if (zcopy_used) {
> >> -				if (ubufs)
> >> -					vhost_net_ubuf_put(ubufs);
> >> +				vhost_net_ubuf_put(ubufs);
> >>  				nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
> >>  					% UIO_MAXIOV;
> >>  			}
> >> -- 
> >> 1.7.1

^ permalink raw reply
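
The two index comparisons being debated above answer different
questions, which a small userspace model makes visible. This is a sketch
with hypothetical helper names; RING_SIZE is an assumed power-of-two
size standing in for UIO_MAXIOV, and the step-back helper relies on that
power-of-two property to wrap correctly from index 0.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 1024u	/* assumed power of two, standing in for UIO_MAXIOV */

/* upend == done: nothing is outstanding. */
static int sketch_ring_empty(unsigned int upend, unsigned int done)
{
	assert(upend < RING_SIZE && done < RING_SIZE);
	return upend == done;
}

/* One slot is kept free so that "full" differs from "empty". */
static int sketch_ring_full(unsigned int upend, unsigned int done)
{
	assert(upend < RING_SIZE && done < RING_SIZE);
	return (upend + 1) % RING_SIZE == done;
}

/* Step an index back by one, as the error path in the diff does. */
static unsigned int sketch_ring_prev(unsigned int idx)
{
	assert(idx < RING_SIZE);
	return (idx - 1) % RING_SIZE;	/* unsigned wrap, then reduce */
}
```

With both indices initialised to zero the ring is empty and not full,
which is why upend_idx != done_idx tells you about outstanding entries
but cannot detect a full ring.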

* [PATCH net 0/6] bnx2x: Bug fixes patch series
From: Yuval Mintz @ 2013-09-23  7:12 UTC (permalink / raw)
  To: davem, netdev; +Cc: ariele, eilong

Hi Dave,

This patch series contains various bug fixes, half of which are SR-IOV related
(some fixing issues in the recently added VF RSS support), while the others fix
a wide assortment of issues in the driver.

Please consider applying these patches to `net'.

Thanks,
Yuval Mintz

^ permalink raw reply

* [PATCH net 1/6] bnx2x: Prevent mistaken hangup between driver & FW
From: Yuval Mintz @ 2013-09-23  7:12 UTC (permalink / raw)
  To: davem, netdev; +Cc: ariele, eilong, Yuval Mintz
In-Reply-To: <1379920375-10303-1-git-send-email-yuvalmin@broadcom.com>

From: Eilon Greenstein <eilong@broadcom.com>

When the system CPU is stressed, it's possible that the driver will not be able
to pulse the FW every second, which will cause the log to be filled with
error messages.

Increasing the threshold to 5 seconds seems to be enough to eliminate the
issue.

Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index a6704b5..f403c6b 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -5447,26 +5447,24 @@ static void bnx2x_timer(unsigned long data)
 	if (IS_PF(bp) &&
 	    !BP_NOMCP(bp)) {
 		int mb_idx = BP_FW_MB_IDX(bp);
-		u32 drv_pulse;
-		u32 mcp_pulse;
+		u16 drv_pulse;
+		u16 mcp_pulse;
 
 		++bp->fw_drv_pulse_wr_seq;
 		bp->fw_drv_pulse_wr_seq &= DRV_PULSE_SEQ_MASK;
-		/* TBD - add SYSTEM_TIME */
 		drv_pulse = bp->fw_drv_pulse_wr_seq;
 		bnx2x_drv_pulse(bp);
 
 		mcp_pulse = (SHMEM_RD(bp, func_mb[mb_idx].mcp_pulse_mb) &
 			     MCP_PULSE_SEQ_MASK);
 		/* The delta between driver pulse and mcp response
-		 * should be 1 (before mcp response) or 0 (after mcp response)
+		 * should not get too big. If the MFW is more than 5 pulses
+		 * behind, we should worry about it enough to generate an error
+		 * log.
 		 */
-		if ((drv_pulse != mcp_pulse) &&
-		    (drv_pulse != ((mcp_pulse + 1) & MCP_PULSE_SEQ_MASK))) {
-			/* someone lost a heartbeat... */
-			BNX2X_ERR("drv_pulse (0x%x) != mcp_pulse (0x%x)\n",
+		if (((drv_pulse - mcp_pulse) & MCP_PULSE_SEQ_MASK) > 5)
+			BNX2X_ERR("MFW seems hanged: drv_pulse (0x%x) != mcp_pulse (0x%x)\n",
 				  drv_pulse, mcp_pulse);
-		}
 	}
 
 	if (bp->state == BNX2X_STATE_OPEN)
-- 
1.8.1.4

^ permalink raw reply related
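
The masked subtraction in the new check handles sequence wraparound, and
that is easy to verify in isolation. The sketch below uses hypothetical
names and assumes a 15-bit sequence mask (SKETCH_PULSE_SEQ_MASK); the
real mask values live in the driver headers and are not shown in this
thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PULSE_SEQ_MASK	0x7fffu	/* assumed sequence width */
#define SKETCH_PULSE_THRESHOLD	5u	/* pulses the MFW may lag behind */

/*
 * Return 1 if the management firmware is more than the threshold
 * behind the driver, counting modulo the sequence width.
 */
static int sketch_mfw_looks_hung(uint16_t drv_pulse, uint16_t mcp_pulse)
{
	assert((drv_pulse & ~SKETCH_PULSE_SEQ_MASK) == 0);
	assert((mcp_pulse & ~SKETCH_PULSE_SEQ_MASK) == 0);
	return ((drv_pulse - mcp_pulse) & SKETCH_PULSE_SEQ_MASK) >
	       SKETCH_PULSE_THRESHOLD;
}
```

A driver sequence of 2 against a firmware sequence of 0x7fff is a lag of
3 once the wrap is taken into account, so it does not trip the check.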

* [PATCH net 4/6] bnx2x: prevent masking error from cnic
From: Yuval Mintz @ 2013-09-23  7:12 UTC (permalink / raw)
  To: davem, netdev; +Cc: ariele, eilong, Yuval Mintz
In-Reply-To: <1379920375-10303-1-git-send-email-yuvalmin@broadcom.com>

During error flows while loading cnic, the return value was incorrectly replaced
by that of bnx2x_set_real_num_queues(). If that function were to finish
successfully, then the cnic would have mistakenly thought the load ended
successfully, causing issues (& panics) later on.

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 61726af..e66beff 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -2481,8 +2481,7 @@ load_error_cnic2:
 load_error_cnic1:
 	bnx2x_napi_disable_cnic(bp);
 	/* Update the number of queues without the cnic queues */
-	rc = bnx2x_set_real_num_queues(bp, 0);
-	if (rc)
+	if (bnx2x_set_real_num_queues(bp, 0))
 		BNX2X_ERR("Unable to set real_num_queues not including cnic\n");
 load_error_cnic0:
 	BNX2X_ERR("CNIC-related load failed\n");
-- 
1.8.1.4

^ permalink raw reply related
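
The bug class fixed above (a cleanup call overwriting the error that
triggered the cleanup) can be shown side by side in a few lines. All
names here are hypothetical; sketch_cleanup() stands in for a cleanup
step that happens to succeed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical cleanup step that succeeds (returns 0). */
static int sketch_cleanup(void)
{
	return 0;
}

/* Buggy shape: the cleanup result replaces the original error. */
static int sketch_load_masked(int load_err)
{
	int rc = load_err;

	assert(load_err <= 0);	/* kernel-style: 0 or negative errno */
	if (rc) {
		rc = sketch_cleanup();	/* original error is lost here */
		return rc;
	}
	return 0;
}

/* Fixed shape: cleanup is checked on its own, rc is left untouched. */
static int sketch_load_preserved(int load_err)
{
	int rc = load_err;

	assert(load_err <= 0);
	if (rc) {
		if (sketch_cleanup()) {
			/* a real driver would log the cleanup failure */
		}
		return rc;
	}
	return 0;
}
```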

* [PATCH net 6/6] bnx2x: handle known but unsupported VF messages
From: Yuval Mintz @ 2013-09-23  7:12 UTC (permalink / raw)
  To: davem, netdev; +Cc: ariele, eilong, Yuval Mintz
In-Reply-To: <1379920375-10303-1-git-send-email-yuvalmin@broadcom.com>

From: Ariel Elior <ariele@broadcom.com>

Commit b9871bcf "bnx2x: VF RSS support - PF side" deprecated one of the
previously existing messages. If an old VF driver sends this message to the
PF, the PF does not reply and leaves the mailbox in an unstable state
(causing a timeout on the VF side).

Wait until the firmware ack is written before unlocking the channel.
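
The control flow this patch moves to can be sketched roughly as below. It is a simplified illustration with hypothetical names: the real function dispatches on CHANNEL_TLV_* types and sends mailbox responses instead of returning a value, and its switch has no default case.

```c
/* Hypothetical message and reply types for illustration only. */
enum msg_type { MSG_ACQUIRE, MSG_INIT, MSG_LEGACY /* known but unsupported */ };
enum reply { REPLY_HANDLED, REPLY_NOT_SUPPORTED };

static enum reply handle_request(int known, enum msg_type type)
{
	if (known) {
		switch (type) {
		case MSG_ACQUIRE:
		case MSG_INIT:
			/* the handler sends its own response, so return */
			return REPLY_HANDLED;
		default:
			/* recognised message without a handler: fall out */
			break;
		}
	}

	/* Shared tail: unknown and unhandled requests both get an answer. */
	return REPLY_NOT_SUPPORTED;
}
```

Returning from each handled case, instead of breaking out of the switch, is what lets a single tail serve both the unknown and the known-but-unsupported requests.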

Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c | 50 ++++++++++++------------
 1 file changed, 24 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
index 6cfb887..da16953 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
@@ -1765,28 +1765,28 @@ static void bnx2x_vf_mbx_request(struct bnx2x *bp, struct bnx2x_virtf *vf,
 		switch (mbx->first_tlv.tl.type) {
 		case CHANNEL_TLV_ACQUIRE:
 			bnx2x_vf_mbx_acquire(bp, vf, mbx);
-			break;
+			return;
 		case CHANNEL_TLV_INIT:
 			bnx2x_vf_mbx_init_vf(bp, vf, mbx);
-			break;
+			return;
 		case CHANNEL_TLV_SETUP_Q:
 			bnx2x_vf_mbx_setup_q(bp, vf, mbx);
-			break;
+			return;
 		case CHANNEL_TLV_SET_Q_FILTERS:
 			bnx2x_vf_mbx_set_q_filters(bp, vf, mbx);
-			break;
+			return;
 		case CHANNEL_TLV_TEARDOWN_Q:
 			bnx2x_vf_mbx_teardown_q(bp, vf, mbx);
-			break;
+			return;
 		case CHANNEL_TLV_CLOSE:
 			bnx2x_vf_mbx_close_vf(bp, vf, mbx);
-			break;
+			return;
 		case CHANNEL_TLV_RELEASE:
 			bnx2x_vf_mbx_release_vf(bp, vf, mbx);
-			break;
+			return;
 		case CHANNEL_TLV_UPDATE_RSS:
 			bnx2x_vf_mbx_update_rss(bp, vf, mbx);
-			break;
+			return;
 		}
 
 	} else {
@@ -1802,26 +1802,24 @@ static void bnx2x_vf_mbx_request(struct bnx2x *bp, struct bnx2x_virtf *vf,
 		for (i = 0; i < 20; i++)
 			DP_CONT(BNX2X_MSG_IOV, "%x ",
 				mbx->msg->req.tlv_buf_size.tlv_buffer[i]);
+	}
 
-		/* test whether we can respond to the VF (do we have an address
-		 * for it?)
-		 */
-		if (vf->state == VF_ACQUIRED || vf->state == VF_ENABLED) {
-			/* mbx_resp uses the op_rc of the VF */
-			vf->op_rc = PFVF_STATUS_NOT_SUPPORTED;
+	/* can we respond to VF (do we have an address for it?) */
+	if (vf->state == VF_ACQUIRED || vf->state == VF_ENABLED) {
+		/* mbx_resp uses the op_rc of the VF */
+		vf->op_rc = PFVF_STATUS_NOT_SUPPORTED;
 
-			/* notify the VF that we do not support this request */
-			bnx2x_vf_mbx_resp(bp, vf);
-		} else {
-			/* can't send a response since this VF is unknown to us
-			 * just ack the FW to release the mailbox and unlock
-			 * the channel.
-			 */
-			storm_memset_vf_mbx_ack(bp, vf->abs_vfid);
-			mmiowb();
-			bnx2x_unlock_vf_pf_channel(bp, vf,
-						   mbx->first_tlv.tl.type);
-		}
+		/* notify the VF that we do not support this request */
+		bnx2x_vf_mbx_resp(bp, vf);
+	} else {
+		/* can't send a response since this VF is unknown to us
+		 * just ack the FW to release the mailbox and unlock
+		 * the channel.
+		 */
+		storm_memset_vf_mbx_ack(bp, vf->abs_vfid);
+		/* Firmware ack should be written before unlocking channel */
+		mmiowb();
+		bnx2x_unlock_vf_pf_channel(bp, vf, mbx->first_tlv.tl.type);
 	}
 }
 
-- 
1.8.1.227.g44fe835


* [PATCH net 5/6] bnx2x: prevent masked MCP parities from appearing
From: Yuval Mintz @ 2013-09-23  7:12 UTC (permalink / raw)
  To: davem, netdev; +Cc: ariele, eilong, Yuval Mintz
In-Reply-To: <1379920375-10303-1-git-send-email-yuvalmin@broadcom.com>

During flows which mask block attentions (e.g., register dump), all parities
are masked. However, unlike other blocks, the MCP's attention is not masked
inside the block; only its indication to the driver is masked. If another
attention (e.g., link change) occurs while there is an MCP parity, the driver
ignores the fact that the parity is masked and erroneously reports a parity.

This patch forces the driver to read the MCP masking while checking for
parities.
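
The masking expression used in the hunk below can be tried in isolation with this small sketch. The bit layout is an assumption made for illustration; the real MISC_AEU_ENABLE_MCP_PRTY_BITS value comes from the driver headers, and mask_mcp_parity() is a hypothetical helper.

```c
#include <stdint.h>

/* Assumed bit layout, for illustration only. */
#define MCP_PRTY_BITS 0xF0000000u

/* Hypothetical helper mirroring the patch's expression. */
static uint32_t mask_mcp_parity(uint32_t sig, uint32_t enable_reg)
{
	/* Keep every non-MCP bit as is; keep an MCP parity bit only
	 * if the matching enable bit is set.
	 */
	return sig & ((enable_reg & MCP_PRTY_BITS) | ~MCP_PRTY_BITS);
}
```

OR-ing in the complement of the MCP bits is what leaves all other attention bits in the signal word untouched.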

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index f403c6b..82b658d 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -4703,6 +4703,14 @@ bool bnx2x_chk_parity_attn(struct bnx2x *bp, bool *global, bool print)
 	attn.sig[3] = REG_RD(bp,
 		MISC_REG_AEU_AFTER_INVERT_4_FUNC_0 +
 			     port*4);
+	/* Since MCP attentions can't be disabled inside the block, we need to
+	 * read AEU registers to see whether they're currently disabled
+	 */
+	attn.sig[3] &= ((REG_RD(bp,
+				!port ? MISC_REG_AEU_ENABLE4_FUNC_0_OUT_0
+				      : MISC_REG_AEU_ENABLE4_FUNC_1_OUT_0) &
+			 MISC_AEU_ENABLE_MCP_PRTY_BITS) |
+			~MISC_AEU_ENABLE_MCP_PRTY_BITS);
 
 	if (!CHIP_IS_E1x(bp))
 		attn.sig[4] = REG_RD(bp,
-- 
1.8.1.4


* [PATCH net 2/6] bnx2x: Fix support for VFs on some PFs
From: Yuval Mintz @ 2013-09-23  7:12 UTC (permalink / raw)
  To: davem, netdev; +Cc: ariele, eilong, Yuval Mintz
In-Reply-To: <1379920375-10303-1-git-send-email-yuvalmin@broadcom.com>

From: Ariel Elior <ariele@broadcom.com>

Due to incorrect usage of PF macros when reading information relating to
interrupts, some PFs were erroneously unable to support VFs.

Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
index 2604b62..d9370d4 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
@@ -1819,7 +1819,7 @@ bnx2x_get_vf_igu_cam_info(struct bnx2x *bp)
 		fid = GET_FIELD((val), IGU_REG_MAPPING_MEMORY_FID);
 		if (fid & IGU_FID_ENCODE_IS_PF)
 			current_pf = fid & IGU_FID_PF_NUM_MASK;
-		else if (current_pf == BP_ABS_FUNC(bp))
+		else if (current_pf == BP_FUNC(bp))
 			bnx2x_vf_set_igu_info(bp, sb_id,
 					      (fid & IGU_FID_VF_NUM_MASK));
 		DP(BNX2X_MSG_IOV, "%s[%d], igu_sb_id=%d, msix=%d\n",
-- 
1.8.1.4


* [PATCH net 3/6] bnx2x: add missing VF resource allocation during init
From: Yuval Mintz @ 2013-09-23  7:12 UTC (permalink / raw)
  To: davem, netdev; +Cc: ariele, eilong, Yuval Mintz
In-Reply-To: <1379920375-10303-1-git-send-email-yuvalmin@broadcom.com>

From: Ariel Elior <ariele@broadcom.com>

bnx2x_iov_static_resc() should be called after the IGU has been read for
information on the number of available VFs, so that resources are set correctly.

Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
index d9370d4..9ad012b 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
@@ -3180,6 +3180,7 @@ int bnx2x_enable_sriov(struct bnx2x *bp)
 		/* set local queue arrays */
 		vf->vfqs = &bp->vfdb->vfqs[qcount];
 		qcount += vf_sb_count(vf);
+		bnx2x_iov_static_resc(bp, vf);
 	}
 
 	/* prepare msix vectors in VF configuration space */
@@ -3187,6 +3188,8 @@ int bnx2x_enable_sriov(struct bnx2x *bp)
 		bnx2x_pretend_func(bp, HW_VF_HANDLE(bp, vf_idx));
 		REG_WR(bp, PCICFG_OFFSET + GRC_CONFIG_REG_VF_MSIX_CONTROL,
 		       num_vf_queues);
+		DP(BNX2X_MSG_IOV, "set msix vec num in VF %d cfg space to %d\n",
+		   vf_idx, num_vf_queues);
 	}
 	bnx2x_pretend_func(bp, BP_ABS_FUNC(bp));
 
-- 
1.8.1.4


* Re: Bug - regression - Via velocity interface coming up freezes kernel
From: Dirk Kraft @ 2013-09-23  7:05 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, Julia Lawall
In-Reply-To: <CAFES+iKYWTn-DZRCnWer56rWeGh0t86GAx66OgZ7Jnvm-9fo9w@mail.gmail.com>

On Mon, Sep 23, 2013 at 8:29 AM, Dirk Kraft <dirk.kraft@gmail.com> wrote:
[...]
> By applying the below patch to 3.11-rc1 the problem is gone.

Uups, I meant 3.12-rc1. Sorry.


* [PATCH net-next] ipv6: Not need to set fl6.flowi6_flags as zero
From: roy.qing.li @ 2013-09-23  6:55 UTC (permalink / raw)
  To: netdev

From: Li RongQing <roy.qing.li@gmail.com>

Setting fl6.flowi6_flags to zero after memset() is redundant; remove it.
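
A minimal userspace sketch of why the assignment is redundant. It uses a hypothetical cut-down structure; the kernel's struct flowi6 has more fields.

```c
#include <string.h>

/* Hypothetical cut-down flow key, for illustration only. */
struct flow_key {
	int oif;
	unsigned int mark;
	unsigned char flags;
};

static void init_flow_key(struct flow_key *fl, int oif, unsigned int mark)
{
	memset(fl, 0, sizeof(*fl));	/* every member, flags included, is now 0 */
	fl->oif = oif;
	fl->mark = mark;
	/* no "fl->flags = 0;" needed after the memset() above */
}
```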

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
---
 net/ipv6/route.c |    3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c979dd9..c6b2e1c 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1137,7 +1137,6 @@ void ip6_update_pmtu(struct sk_buff *skb, struct net *net, __be32 mtu,
 	memset(&fl6, 0, sizeof(fl6));
 	fl6.flowi6_oif = oif;
 	fl6.flowi6_mark = mark;
-	fl6.flowi6_flags = 0;
 	fl6.daddr = iph->daddr;
 	fl6.saddr = iph->saddr;
 	fl6.flowlabel = ip6_flowinfo(iph);
@@ -1236,7 +1235,6 @@ void ip6_redirect(struct sk_buff *skb, struct net *net, int oif, u32 mark)
 	memset(&fl6, 0, sizeof(fl6));
 	fl6.flowi6_oif = oif;
 	fl6.flowi6_mark = mark;
-	fl6.flowi6_flags = 0;
 	fl6.daddr = iph->daddr;
 	fl6.saddr = iph->saddr;
 	fl6.flowlabel = ip6_flowinfo(iph);
@@ -1258,7 +1256,6 @@ void ip6_redirect_no_header(struct sk_buff *skb, struct net *net, int oif,
 	memset(&fl6, 0, sizeof(fl6));
 	fl6.flowi6_oif = oif;
 	fl6.flowi6_mark = mark;
-	fl6.flowi6_flags = 0;
 	fl6.daddr = msg->dest;
 	fl6.saddr = iph->daddr;
 
-- 
1.7.10.4


* [PATCH net-next] net ipv4: Convert ipv4.ip_local_port_range to be per netns
From: Eric W. Biederman @ 2013-09-23  6:27 UTC (permalink / raw)
  To: David Miller; +Cc: netdev


- Move sysctl_local_ports from a global variable into struct netns_ipv4.
- Modify inet_get_local_port_range to take a struct net.
- Manually expand inet_get_local_port_range into ipv4_local_port_range
  because the struct net is not known there.
- Move the initialization of sysctl_local_ports into
  sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c
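
Since the sysctl handler only receives a pointer to the range array, the patch recovers the enclosing structure with container_of(). A rough userspace sketch of that step follows; the macro is a simplified stand-in for the kernel's container_of(), the structure uses a plain int in place of seqlock_t, and ports_from_table_data() is a hypothetical helper.

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel macro (no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical mirror of struct local_ports; the lock is a plain int here. */
struct local_ports {
	int lock;
	int range[2];
};

/* Map a pointer to the range member back to the enclosing structure,
 * the way the handler does with table->data.
 */
static struct local_ports *ports_from_table_data(void *data)
{
	return container_of(data, struct local_ports, range);
}
```

This works only because table->data is set to the address of the range member of a struct local_ports; pointing it anywhere else would make the computed address meaningless.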

Originally-by: Samya <samya@twitter.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 drivers/infiniband/core/cma.c   |    2 +-
 drivers/net/vxlan.c             |    2 +-
 include/net/ip.h                |    7 +----
 include/net/netns/ipv4.h        |    6 +++++
 net/ipv4/inet_connection_sock.c |   20 +++++---------
 net/ipv4/inet_hashtables.c      |    2 +-
 net/ipv4/ping.c                 |    4 +--
 net/ipv4/sysctl_net_ipv4.c      |   57 ++++++++++++++++++++++++++-------------
 net/ipv4/udp.c                  |    2 +-
 net/sctp/socket.c               |    2 +-
 security/selinux/hooks.c        |    3 ++-
 11 files changed, 61 insertions(+), 46 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 7c0f953..9627545 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -2302,7 +2302,7 @@ static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
 	int low, high, remaining;
 	unsigned int rover;
 
-	inet_get_local_port_range(&low, &high);
+	inet_get_local_port_range(&init_net, &low, &high);
 	remaining = (high - low) + 1;
 	rover = net_random() % remaining + low;
 retry:
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 767f7af..a105376 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1501,7 +1501,7 @@ static void vxlan_setup(struct net_device *dev)
 	vxlan->age_timer.function = vxlan_cleanup;
 	vxlan->age_timer.data = (unsigned long) vxlan;
 
-	inet_get_local_port_range(&low, &high);
+	inet_get_local_port_range(dev_net(dev), &low, &high);
 	vxlan->port_min = low;
 	vxlan->port_max = high;
 	vxlan->dst_port = htons(vxlan_port);
diff --git a/include/net/ip.h b/include/net/ip.h
index a68f838..5e46435 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -195,12 +195,7 @@ static inline u64 snmp_fold_field64(void __percpu *mib[], int offt, size_t syncp
 #endif
 extern int snmp_mib_init(void __percpu *ptr[2], size_t mibsize, size_t align);
 extern void snmp_mib_free(void __percpu *ptr[2]);
-
-extern struct local_ports {
-	seqlock_t	lock;
-	int		range[2];
-} sysctl_local_ports;
-extern void inet_get_local_port_range(int *low, int *high);
+extern void inet_get_local_port_range(struct net *net, int *low, int *high);
 
 extern unsigned long *sysctl_local_reserved_ports;
 static inline int inet_is_reserved_local_port(int port)
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 2ba9de8..d685e50 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -15,6 +15,10 @@ struct fib_rules_ops;
 struct hlist_head;
 struct fib_table;
 struct sock;
+struct local_ports {
+	seqlock_t	lock;
+	int		range[2];
+};
 
 struct netns_ipv4 {
 #ifdef CONFIG_SYSCTL
@@ -62,6 +66,8 @@ struct netns_ipv4 {
 	int sysctl_icmp_ratemask;
 	int sysctl_icmp_errors_use_inbound_ifaddr;
 
+	struct local_ports sysctl_local_ports;
+
 	int sysctl_tcp_ecn;
 
 	kgid_t sysctl_ping_group_range[2];
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 6acb541..7ac7aa1 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -29,27 +29,19 @@ const char inet_csk_timer_bug_msg[] = "inet_csk BUG: unknown timer value\n";
 EXPORT_SYMBOL(inet_csk_timer_bug_msg);
 #endif
 
-/*
- * This struct holds the first and last local port number.
- */
-struct local_ports sysctl_local_ports __read_mostly = {
-	.lock = __SEQLOCK_UNLOCKED(sysctl_local_ports.lock),
-	.range = { 32768, 61000 },
-};
-
 unsigned long *sysctl_local_reserved_ports;
 EXPORT_SYMBOL(sysctl_local_reserved_ports);
 
-void inet_get_local_port_range(int *low, int *high)
+void inet_get_local_port_range(struct net *net, int *low, int *high)
 {
 	unsigned int seq;
 
 	do {
-		seq = read_seqbegin(&sysctl_local_ports.lock);
+		seq = read_seqbegin(&net->ipv4.sysctl_local_ports.lock);
 
-		*low = sysctl_local_ports.range[0];
-		*high = sysctl_local_ports.range[1];
-	} while (read_seqretry(&sysctl_local_ports.lock, seq));
+		*low = net->ipv4.sysctl_local_ports.range[0];
+		*high = net->ipv4.sysctl_local_ports.range[1];
+	} while (read_seqretry(&net->ipv4.sysctl_local_ports.lock, seq));
 }
 EXPORT_SYMBOL(inet_get_local_port_range);
 
@@ -116,7 +108,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 		int remaining, rover, low, high;
 
 again:
-		inet_get_local_port_range(&low, &high);
+		inet_get_local_port_range(net, &low, &high);
 		remaining = (high - low) + 1;
 		smallest_rover = rover = net_random() % remaining + low;
 
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 7bd8983..2779037 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -494,7 +494,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 		u32 offset = hint + port_offset;
 		struct inet_timewait_sock *tw = NULL;
 
-		inet_get_local_port_range(&low, &high);
+		inet_get_local_port_range(net, &low, &high);
 		remaining = (high - low) + 1;
 
 		local_bh_disable();
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 746427c..d71ecc4 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -237,11 +237,11 @@ static void inet_get_ping_group_range_net(struct net *net, kgid_t *low,
 	unsigned int seq;
 
 	do {
-		seq = read_seqbegin(&sysctl_local_ports.lock);
+		seq = read_seqbegin(&net->ipv4.sysctl_local_ports.lock);
 
 		*low = data[0];
 		*high = data[1];
-	} while (read_seqretry(&sysctl_local_ports.lock, seq));
+	} while (read_seqretry(&net->ipv4.sysctl_local_ports.lock, seq));
 }
 
 
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 610e324..b91f963 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -42,12 +42,12 @@ static int ip_ping_group_range_min[] = { 0, 0 };
 static int ip_ping_group_range_max[] = { GID_T_MAX, GID_T_MAX };
 
 /* Update system visible IP port range */
-static void set_local_port_range(int range[2])
+static void set_local_port_range(struct local_ports *ports, int range[2])
 {
-	write_seqlock(&sysctl_local_ports.lock);
-	sysctl_local_ports.range[0] = range[0];
-	sysctl_local_ports.range[1] = range[1];
-	write_sequnlock(&sysctl_local_ports.lock);
+	write_seqlock(&ports->lock);
+	ports->range[0] = range[0];
+	ports->range[1] = range[1];
+	write_sequnlock(&ports->lock);
 }
 
 /* Validate changes from /proc interface. */
@@ -55,6 +55,9 @@ static int ipv4_local_port_range(struct ctl_table *table, int write,
 				 void __user *buffer,
 				 size_t *lenp, loff_t *ppos)
 {
+	struct local_ports *ports =
+		container_of(table->data, struct local_ports, range);
+	unsigned int seq;
 	int ret;
 	int range[2];
 	struct ctl_table tmp = {
@@ -65,14 +68,19 @@ static int ipv4_local_port_range(struct ctl_table *table, int write,
 		.extra2 = &ip_local_port_range_max,
 	};
 
-	inet_get_local_port_range(range, range + 1);
+	do {
+		seq = read_seqbegin(&ports->lock);
+		range[0] = ports->range[0];
+		range[1] = ports->range[1];
+	} while (read_seqretry(&ports->lock, seq));
+
 	ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
 
 	if (write && ret == 0) {
 		if (range[1] < range[0])
 			ret = -EINVAL;
 		else
-			set_local_port_range(range);
+			set_local_port_range(ports, range);
 	}
 
 	return ret;
@@ -82,23 +90,27 @@ static int ipv4_local_port_range(struct ctl_table *table, int write,
 static void inet_get_ping_group_range_table(struct ctl_table *table, kgid_t *low, kgid_t *high)
 {
 	kgid_t *data = table->data;
+	struct netns_ipv4 *ipv4 =
+		container_of(table->data, struct netns_ipv4, sysctl_ping_group_range);
 	unsigned int seq;
 	do {
-		seq = read_seqbegin(&sysctl_local_ports.lock);
+		seq = read_seqbegin(&ipv4->sysctl_local_ports.lock);
 
 		*low = data[0];
 		*high = data[1];
-	} while (read_seqretry(&sysctl_local_ports.lock, seq));
+	} while (read_seqretry(&ipv4->sysctl_local_ports.lock, seq));
 }
 
 /* Update system visible IP port range */
 static void set_ping_group_range(struct ctl_table *table, kgid_t low, kgid_t high)
 {
 	kgid_t *data = table->data;
-	write_seqlock(&sysctl_local_ports.lock);
+	struct netns_ipv4 *ipv4 =
+		container_of(table->data, struct netns_ipv4, sysctl_ping_group_range);
+	write_seqlock(&ipv4->sysctl_local_ports.lock);
 	data[0] = low;
 	data[1] = high;
-	write_sequnlock(&sysctl_local_ports.lock);
+	write_sequnlock(&ipv4->sysctl_local_ports.lock);
 }
 
 /* Validate changes from /proc interface. */
@@ -474,13 +486,6 @@ static struct ctl_table ipv4_table[] = {
 		.proc_handler	= proc_dointvec
 	},
 	{
-		.procname	= "ip_local_port_range",
-		.data		= &sysctl_local_ports.range,
-		.maxlen		= sizeof(sysctl_local_ports.range),
-		.mode		= 0644,
-		.proc_handler	= ipv4_local_port_range,
-	},
-	{
 		.procname	= "ip_local_reserved_ports",
 		.data		= NULL, /* initialized in sysctl_ipv4_init */
 		.maxlen		= 65536,
@@ -837,6 +842,13 @@ static struct ctl_table ipv4_net_table[] = {
 		.proc_handler	= proc_dointvec
 	},
 	{
+		.procname	= "ip_local_port_range",
+		.maxlen		= sizeof(init_net.ipv4.sysctl_local_ports.range),
+		.data		= &init_net.ipv4.sysctl_local_ports.range,
+		.mode		= 0644,
+		.proc_handler	= ipv4_local_port_range,
+	},
+	{
 		.procname	= "tcp_mem",
 		.maxlen		= sizeof(init_net.ipv4.sysctl_tcp_mem),
 		.mode		= 0644,
@@ -871,6 +883,8 @@ static __net_init int ipv4_sysctl_init_net(struct net *net)
 			&net->ipv4.sysctl_ping_group_range;
 		table[7].data =
 			&net->ipv4.sysctl_tcp_ecn;
+		table[8].data =
+			&net->ipv4.sysctl_local_ports.range;
 
 		/* Don't export sysctls to unprivileged users */
 		if (net->user_ns != &init_user_ns)
@@ -884,6 +898,13 @@ static __net_init int ipv4_sysctl_init_net(struct net *net)
 	net->ipv4.sysctl_ping_group_range[0] = make_kgid(&init_user_ns, 1);
 	net->ipv4.sysctl_ping_group_range[1] = make_kgid(&init_user_ns, 0);
 
+	/*
+	 * Set defaults for local port range
+	 */
+	seqlock_init(&net->ipv4.sysctl_local_ports.lock);
+	net->ipv4.sysctl_local_ports.range[0] = 32768;
+	net->ipv4.sysctl_local_ports.range[1] = 61000;
+
 	tcp_init_mem(net);
 
 	net->ipv4.ipv4_hdr = register_net_sysctl(net, "net/ipv4", table);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 766e6ba..d0c3529 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -219,7 +219,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 		unsigned short first, last;
 		DECLARE_BITMAP(bitmap, PORTS_PER_CHAIN);
 
-		inet_get_local_port_range(&low, &high);
+		inet_get_local_port_range(net, &low, &high);
 		remaining = (high - low) + 1;
 
 		rand = net_random();
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index c6670d2..09f46fb 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -5893,7 +5893,7 @@ static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
 		int low, high, remaining, index;
 		unsigned int rover;
 
-		inet_get_local_port_range(&low, &high);
+		inet_get_local_port_range(sock_net(sk), &low, &high);
 		remaining = (high - low) + 1;
 		rover = net_random() % remaining + low;
 
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index c956390..558d0d9 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3898,6 +3898,7 @@ static int selinux_socket_post_create(struct socket *sock, int family,
 static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, int addrlen)
 {
 	struct sock *sk = sock->sk;
+	struct net *net = sock_net(sk);
 	u16 family;
 	int err;
 
@@ -3934,7 +3935,7 @@ static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, in
 		if (snum) {
 			int low, high;
 
-			inet_get_local_port_range(&low, &high);
+			inet_get_local_port_range(net, &low, &high);
 
 			if (snum < max(PROT_SOCK, low) || snum > high) {
 				err = sel_netport_sid(sk->sk_protocol,
-- 
1.7.10.4


* Re: Bug - regression - Via velocity interface coming up freezes kernel
From: Dirk Kraft @ 2013-09-23  6:29 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, Julia Lawall
In-Reply-To: <20130922221109.GA14246@electric-eye.fr.zoreil.com>

Hi

On Mon, Sep 23, 2013 at 12:11 AM, Francois Romieu <romieu@fr.zoreil.com> wrote:
[...]
> You can try this one as a wild guess before I have more time to analyze.

By applying the below patch to 3.11-rc1 the problem is gone.

I was only able to do a short test. Could not say if this has any side effects.

Thanks,
Dirk

>
> diff --git a/drivers/net/ethernet/via/via-velocity.c b/drivers/net/ethernet/via/via-velocity.c
> index d022bf9..64c42be 100644
> --- a/drivers/net/ethernet/via/via-velocity.c
> +++ b/drivers/net/ethernet/via/via-velocity.c
> @@ -2172,16 +2172,13 @@ static int velocity_poll(struct napi_struct *napi, int budget)
>         unsigned int rx_done;
>         unsigned long flags;
>
> -       spin_lock_irqsave(&vptr->lock, flags);
>         /*
>          * Do rx and tx twice for performance (taken from the VIA
>          * out-of-tree driver).
>          */
> -       rx_done = velocity_rx_srv(vptr, budget / 2);
> -       velocity_tx_srv(vptr);
> -       rx_done += velocity_rx_srv(vptr, budget - rx_done);
> +       rx_done = velocity_rx_srv(vptr, budget);
> +       spin_lock_irqsave(&vptr->lock, flags);
>         velocity_tx_srv(vptr);
> -
>         /* If budget not fully consumed, exit the polling mode */
>         if (rx_done < budget) {
>                 napi_complete(napi);


* Re: [Xen-devel] [PATCH net-next] xen-netfront: convert to GRO API and advertise this feature
From: annie li @ 2013-09-23  6:22 UTC (permalink / raw)
  To: Jason Wang
  Cc: Anirban Chakraborty, Wei Liu, <netdev@vger.kernel.org>,
	Ian Campbell, <xen-devel@lists.xen.org>
In-Reply-To: <523FCB4D.30801@redhat.com>


On 2013-9-23 13:02, Jason Wang wrote:
> On 09/23/2013 07:04 AM, Anirban Chakraborty wrote:
>> On Sep 22, 2013, at 5:09 AM, Wei Liu <wei.liu2@citrix.com> wrote:
>>
>>> On Sun, Sep 22, 2013 at 02:29:15PM +0800, Jason Wang wrote:
>>>> On 09/22/2013 12:05 AM, Wei Liu wrote:
>>>>> Anirban was seeing netfront received MTU size packets, which downgraded
>>>>> throughput. The following patch makes netfront use GRO API which
>>>>> improves throughput for that case.
>>>>>
>>>>> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>>>>> Signed-off-by: Anirban Chakraborty <abchak@juniper.net>
>>>>> Cc: Ian Campbell <ian.campbell@citrix.com>
>>>> Maybe a dumb question: doesn't Xen depends on the driver of host card to
>>>> do GRO and pass it to netfront? What the case that netfront can receive
>>> The would be the ideal situation. Netback pushes large packets to
>>> netfront and netfront sees large packets.
>>>
>>>> a MTU size packet, for a card that does not support GRO in host? Doing
>>> However Anirban saw the case when backend interface receives large
>>> packets but netfront sees MTU size packets, so my thought is there is
>>> certain configuration that leads to this issue. As we cannot tell
>>> users what to enable and what not to enable so I would like to solve
>>> this within our driver.
>>>
>>>> GRO twice may introduce extra overheads.
>>>>
>>> AIUI if the packet that frontend sees is large already then the GRO path
>>> is quite short which will not introduce heavy penalty, while on the
>>> other hand if packet is segmented doing GRO improves throughput.
>>>
>> Thanks Wei, for explaining and submitting the patch. I would like add following to what you have already mentioned.
>> In my configuration, I was seeing netback was pushing large packets to the guest (Centos 6.4) but the netfront was receiving MTU sized packets. With this patch on, I do see large packets received on the guest interface. As a result there was substantial throughput improvement in the guest side (2.8 Gbps to 3.8 Gbps). Also, note that the host NIC driver was enabled for GRO already.
>>
>> -Anirban
> In this case, even if you still want to do GRO. It's better to find the
> root cause of why the GSO packet were segmented

Totally agree; we need to find out why large packets are segmented
only when the guests are on different hosts.

> (maybe GSO were not
> enabled for netback?), since it introduces extra overheads.

According to Anirban's feedback, large packets can be seen on the vif
interface, and even on guests running on the same host.

Thanks
Annie


