Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 3/3] vhost: fix the memory leak which will happen when memory_access_ok fails
From: Michael S. Tsirkin @ 2010-05-27 10:49 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: kvm, virtualization, netdev
In-Reply-To: <20100527190356.cbf2aac7.yoshikawa.takuya@oss.ntt.co.jp>

On Thu, May 27, 2010 at 07:03:56PM +0900, Takuya Yoshikawa wrote:
> We need to free newmem when vhost_set_memory() fails to complete.
> 
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> ---

Thanks, applied.

>  drivers/vhost/vhost.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 9633a3c..1241a22 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -337,8 +337,10 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
>  		return -EFAULT;
>  	}
>  
> -	if (!memory_access_ok(d, newmem, vhost_has_feature(d, VHOST_F_LOG_ALL)))
> +	if (!memory_access_ok(d, newmem, vhost_has_feature(d, VHOST_F_LOG_ALL))) {
> +		kfree(newmem);
>  		return -EFAULT;
> +	}
>  	oldmem = d->memory;
>  	rcu_assign_pointer(d->memory, newmem);
>  	synchronize_rcu();
> -- 
> 1.7.0.4

^ permalink raw reply

* Re: [PATCH 2/3] vhost-net: fix to check the return value of copy_to/from_user() correctly
From: Michael S. Tsirkin @ 2010-05-27 10:48 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: kvm, virtualization, netdev
In-Reply-To: <20100527190158.4587a2db.yoshikawa.takuya@oss.ntt.co.jp>

On Thu, May 27, 2010 at 07:01:58PM +0900, Takuya Yoshikawa wrote:
> copy_to/from_user() returns the number of bytes that could not be copied.
> 
> So we need to check if it is not zero, and in that case, we should return
> the error number -EFAULT rather than directly return the return value from
> copy_to/from_user().
> 
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>

Thanks, applied.

> ---
>  drivers/vhost/net.c |   14 +++++++-------
>  1 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index aa88911..0f41c91 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -593,17 +593,17 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
>  	int r;
>  	switch (ioctl) {
>  	case VHOST_NET_SET_BACKEND:
> -		r = copy_from_user(&backend, argp, sizeof backend);
> -		if (r < 0)
> -			return r;
> +		if (copy_from_user(&backend, argp, sizeof backend))
> +			return -EFAULT;
>  		return vhost_net_set_backend(n, backend.index, backend.fd);
>  	case VHOST_GET_FEATURES:
>  		features = VHOST_FEATURES;
> -		return copy_to_user(featurep, &features, sizeof features);
> +		if (copy_to_user(featurep, &features, sizeof features))
> +			return -EFAULT;
> +		return 0;
>  	case VHOST_SET_FEATURES:
> -		r = copy_from_user(&features, featurep, sizeof features);
> -		if (r < 0)
> -			return r;
> +		if (copy_from_user(&features, featurep, sizeof features))
> +			return -EFAULT;
>  		if (features & ~VHOST_FEATURES)
>  			return -EOPNOTSUPP;
>  		return vhost_net_set_features(n, features);
> -- 
> 1.7.0.4

^ permalink raw reply

* Re: [PATCH 1/3] vhost: fix to check the return value of copy_to/from_user() correctly
From: Michael S. Tsirkin @ 2010-05-27 10:48 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: kvm, virtualization, netdev
In-Reply-To: <20100527185803.0c0213a1.yoshikawa.takuya@oss.ntt.co.jp>

On Thu, May 27, 2010 at 06:58:03PM +0900, Takuya Yoshikawa wrote:
> copy_to/from_user() returns the number of bytes that could not be copied.
> 
> So we need to check if it is not zero, and in that case, we should return
> the error number -EFAULT rather than directly return the return value from
> copy_to/from_user().
> 
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>

Thanks, applied.

> ---
>  drivers/vhost/vhost.c |   51 ++++++++++++++++++++++++++----------------------
>  1 files changed, 28 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 5c9c657..9633a3c 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -320,10 +320,8 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
>  {
>  	struct vhost_memory mem, *newmem, *oldmem;
>  	unsigned long size = offsetof(struct vhost_memory, regions);
> -	long r;
> -	r = copy_from_user(&mem, m, size);
> -	if (r)
> -		return r;
> +	if (copy_from_user(&mem, m, size))
> +		return -EFAULT;
>  	if (mem.padding)
>  		return -EOPNOTSUPP;
>  	if (mem.nregions > VHOST_MEMORY_MAX_NREGIONS)
> @@ -333,11 +331,10 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
>  		return -ENOMEM;
>  
>  	memcpy(newmem, &mem, size);
> -	r = copy_from_user(newmem->regions, m->regions,
> -			   mem.nregions * sizeof *m->regions);
> -	if (r) {
> +	if (copy_from_user(newmem->regions, m->regions,
> +			   mem.nregions * sizeof *m->regions)) {
>  		kfree(newmem);
> -		return r;
> +		return -EFAULT;
>  	}
>  
>  	if (!memory_access_ok(d, newmem, vhost_has_feature(d, VHOST_F_LOG_ALL)))
> @@ -389,9 +386,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
>  			r = -EBUSY;
>  			break;
>  		}
> -		r = copy_from_user(&s, argp, sizeof s);
> -		if (r < 0)
> +		if (copy_from_user(&s, argp, sizeof s)) {
> +			r = -EFAULT;
>  			break;
> +		}
>  		if (!s.num || s.num > 0xffff || (s.num & (s.num - 1))) {
>  			r = -EINVAL;
>  			break;
> @@ -405,9 +403,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
>  			r = -EBUSY;
>  			break;
>  		}
> -		r = copy_from_user(&s, argp, sizeof s);
> -		if (r < 0)
> +		if (copy_from_user(&s, argp, sizeof s)) {
> +			r = -EFAULT;
>  			break;
> +		}
>  		if (s.num > 0xffff) {
>  			r = -EINVAL;
>  			break;
> @@ -419,12 +418,14 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
>  	case VHOST_GET_VRING_BASE:
>  		s.index = idx;
>  		s.num = vq->last_avail_idx;
> -		r = copy_to_user(argp, &s, sizeof s);
> +		if (copy_to_user(argp, &s, sizeof s))
> +			r = -EFAULT;
>  		break;
>  	case VHOST_SET_VRING_ADDR:
> -		r = copy_from_user(&a, argp, sizeof a);
> -		if (r < 0)
> +		if (copy_from_user(&a, argp, sizeof a)) {
> +			r = -EFAULT;
>  			break;
> +		}
>  		if (a.flags & ~(0x1 << VHOST_VRING_F_LOG)) {
>  			r = -EOPNOTSUPP;
>  			break;
> @@ -477,9 +478,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
>  		vq->used = (void __user *)(unsigned long)a.used_user_addr;
>  		break;
>  	case VHOST_SET_VRING_KICK:
> -		r = copy_from_user(&f, argp, sizeof f);
> -		if (r < 0)
> +		if (copy_from_user(&f, argp, sizeof f)) {
> +			r = -EFAULT;
>  			break;
> +		}
>  		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
>  		if (IS_ERR(eventfp)) {
>  			r = PTR_ERR(eventfp);
> @@ -492,9 +494,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
>  			filep = eventfp;
>  		break;
>  	case VHOST_SET_VRING_CALL:
> -		r = copy_from_user(&f, argp, sizeof f);
> -		if (r < 0)
> +		if (copy_from_user(&f, argp, sizeof f)) {
> +			r = -EFAULT;
>  			break;
> +		}
>  		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
>  		if (IS_ERR(eventfp)) {
>  			r = PTR_ERR(eventfp);
> @@ -510,9 +513,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
>  			filep = eventfp;
>  		break;
>  	case VHOST_SET_VRING_ERR:
> -		r = copy_from_user(&f, argp, sizeof f);
> -		if (r < 0)
> +		if (copy_from_user(&f, argp, sizeof f)) {
> +			r = -EFAULT;
>  			break;
> +		}
>  		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
>  		if (IS_ERR(eventfp)) {
>  			r = PTR_ERR(eventfp);
> @@ -575,9 +579,10 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, unsigned long arg)
>  		r = vhost_set_memory(d, argp);
>  		break;
>  	case VHOST_SET_LOG_BASE:
> -		r = copy_from_user(&p, argp, sizeof p);
> -		if (r < 0)
> +		if (copy_from_user(&p, argp, sizeof p)) {
> +			r = -EFAULT;
>  			break;
> +		}
>  		if ((u64)(unsigned long)p != p) {
>  			r = -EFAULT;
>  			break;
> -- 
> 1.7.0.4

^ permalink raw reply

* Re: ipv6: Add GSO support on forwarding path
From: Herbert Xu @ 2010-05-27 10:24 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: David S. Miller, netdev
In-Reply-To: <20100526221601.GA3369@gondor.apana.org.au>

On Thu, May 27, 2010 at 08:16:01AM +1000, Herbert Xu wrote:
> 
> OK, it looks like my patch doesn't work because the transport
> header isn't set on the forwarding path.  Here's an updated patch:
> 
> ipv6: Add GSO support on forwarding path

Actually, this patch is still not quite right for the GSO case.
Right now it's only adding the IP header size, not the full header
size.

I need to think a bit more about this.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: ipv6: Add GSO support on forwarding path
From: Ralf Baechle @ 2010-05-27 10:14 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev
In-Reply-To: <20100527092246.GB15298@linux-mips.org>

On Thu, May 27, 2010 at 10:22:46AM +0100, Ralf Baechle wrote:

> > > I tested this on top of a 2.6.34 release kernel and asked the user
> > > experiencing the problem to re-test and the problem still persists.
> > 
> > OK, it looks like my patch doesn't work because the transport
> > header isn't set on the forwarding path.  Here's an updated patch:
> 
> Still no improvment.  I've moved all the collected tcpdumps to
> ftp://www.linux-ax25.org/pub/traces/ including a README file describing
> contents.

Uh..  Of course I should have upgraded the host not the client.  And
after having done that things are working fine now.  Thanks!

  Ralf

^ permalink raw reply

* [PATCH 3/3] vhost: fix the memory leak which will happen when memory_access_ok fails
From: Takuya Yoshikawa @ 2010-05-27 10:03 UTC (permalink / raw)
  To: mst; +Cc: kvm, virtualization, netdev
In-Reply-To: <20100527185803.0c0213a1.yoshikawa.takuya@oss.ntt.co.jp>

We need to free newmem when vhost_set_memory() fails to complete.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
 drivers/vhost/vhost.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 9633a3c..1241a22 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -337,8 +337,10 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 		return -EFAULT;
 	}
 
-	if (!memory_access_ok(d, newmem, vhost_has_feature(d, VHOST_F_LOG_ALL)))
+	if (!memory_access_ok(d, newmem, vhost_has_feature(d, VHOST_F_LOG_ALL))) {
+		kfree(newmem);
 		return -EFAULT;
+	}
 	oldmem = d->memory;
 	rcu_assign_pointer(d->memory, newmem);
 	synchronize_rcu();
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH 2/3] vhost-net: fix to check the return value of copy_to/from_user() correctly
From: Takuya Yoshikawa @ 2010-05-27 10:01 UTC (permalink / raw)
  To: mst; +Cc: kvm, virtualization, netdev
In-Reply-To: <20100527185803.0c0213a1.yoshikawa.takuya@oss.ntt.co.jp>

copy_to/from_user() returns the number of bytes that could not be copied.

So we need to check if it is not zero, and in that case, we should return
the error number -EFAULT rather than directly return the return value from
copy_to/from_user().

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
 drivers/vhost/net.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index aa88911..0f41c91 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -593,17 +593,17 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
 	int r;
 	switch (ioctl) {
 	case VHOST_NET_SET_BACKEND:
-		r = copy_from_user(&backend, argp, sizeof backend);
-		if (r < 0)
-			return r;
+		if (copy_from_user(&backend, argp, sizeof backend))
+			return -EFAULT;
 		return vhost_net_set_backend(n, backend.index, backend.fd);
 	case VHOST_GET_FEATURES:
 		features = VHOST_FEATURES;
-		return copy_to_user(featurep, &features, sizeof features);
+		if (copy_to_user(featurep, &features, sizeof features))
+			return -EFAULT;
+		return 0;
 	case VHOST_SET_FEATURES:
-		r = copy_from_user(&features, featurep, sizeof features);
-		if (r < 0)
-			return r;
+		if (copy_from_user(&features, featurep, sizeof features))
+			return -EFAULT;
 		if (features & ~VHOST_FEATURES)
 			return -EOPNOTSUPP;
 		return vhost_net_set_features(n, features);
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH 1/3] vhost: fix to check the return value of copy_to/from_user() correctly
From: Takuya Yoshikawa @ 2010-05-27  9:58 UTC (permalink / raw)
  To: mst; +Cc: kvm, virtualization, netdev

copy_to/from_user() returns the number of bytes that could not be copied.

So we need to check if it is not zero, and in that case, we should return
the error number -EFAULT rather than directly return the return value from
copy_to/from_user().

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
 drivers/vhost/vhost.c |   51 ++++++++++++++++++++++++++----------------------
 1 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 5c9c657..9633a3c 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -320,10 +320,8 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 {
 	struct vhost_memory mem, *newmem, *oldmem;
 	unsigned long size = offsetof(struct vhost_memory, regions);
-	long r;
-	r = copy_from_user(&mem, m, size);
-	if (r)
-		return r;
+	if (copy_from_user(&mem, m, size))
+		return -EFAULT;
 	if (mem.padding)
 		return -EOPNOTSUPP;
 	if (mem.nregions > VHOST_MEMORY_MAX_NREGIONS)
@@ -333,11 +331,10 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 		return -ENOMEM;
 
 	memcpy(newmem, &mem, size);
-	r = copy_from_user(newmem->regions, m->regions,
-			   mem.nregions * sizeof *m->regions);
-	if (r) {
+	if (copy_from_user(newmem->regions, m->regions,
+			   mem.nregions * sizeof *m->regions)) {
 		kfree(newmem);
-		return r;
+		return -EFAULT;
 	}
 
 	if (!memory_access_ok(d, newmem, vhost_has_feature(d, VHOST_F_LOG_ALL)))
@@ -389,9 +386,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
 			r = -EBUSY;
 			break;
 		}
-		r = copy_from_user(&s, argp, sizeof s);
-		if (r < 0)
+		if (copy_from_user(&s, argp, sizeof s)) {
+			r = -EFAULT;
 			break;
+		}
 		if (!s.num || s.num > 0xffff || (s.num & (s.num - 1))) {
 			r = -EINVAL;
 			break;
@@ -405,9 +403,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
 			r = -EBUSY;
 			break;
 		}
-		r = copy_from_user(&s, argp, sizeof s);
-		if (r < 0)
+		if (copy_from_user(&s, argp, sizeof s)) {
+			r = -EFAULT;
 			break;
+		}
 		if (s.num > 0xffff) {
 			r = -EINVAL;
 			break;
@@ -419,12 +418,14 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
 	case VHOST_GET_VRING_BASE:
 		s.index = idx;
 		s.num = vq->last_avail_idx;
-		r = copy_to_user(argp, &s, sizeof s);
+		if (copy_to_user(argp, &s, sizeof s))
+			r = -EFAULT;
 		break;
 	case VHOST_SET_VRING_ADDR:
-		r = copy_from_user(&a, argp, sizeof a);
-		if (r < 0)
+		if (copy_from_user(&a, argp, sizeof a)) {
+			r = -EFAULT;
 			break;
+		}
 		if (a.flags & ~(0x1 << VHOST_VRING_F_LOG)) {
 			r = -EOPNOTSUPP;
 			break;
@@ -477,9 +478,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
 		vq->used = (void __user *)(unsigned long)a.used_user_addr;
 		break;
 	case VHOST_SET_VRING_KICK:
-		r = copy_from_user(&f, argp, sizeof f);
-		if (r < 0)
+		if (copy_from_user(&f, argp, sizeof f)) {
+			r = -EFAULT;
 			break;
+		}
 		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
 		if (IS_ERR(eventfp)) {
 			r = PTR_ERR(eventfp);
@@ -492,9 +494,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
 			filep = eventfp;
 		break;
 	case VHOST_SET_VRING_CALL:
-		r = copy_from_user(&f, argp, sizeof f);
-		if (r < 0)
+		if (copy_from_user(&f, argp, sizeof f)) {
+			r = -EFAULT;
 			break;
+		}
 		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
 		if (IS_ERR(eventfp)) {
 			r = PTR_ERR(eventfp);
@@ -510,9 +513,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
 			filep = eventfp;
 		break;
 	case VHOST_SET_VRING_ERR:
-		r = copy_from_user(&f, argp, sizeof f);
-		if (r < 0)
+		if (copy_from_user(&f, argp, sizeof f)) {
+			r = -EFAULT;
 			break;
+		}
 		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
 		if (IS_ERR(eventfp)) {
 			r = PTR_ERR(eventfp);
@@ -575,9 +579,10 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, unsigned long arg)
 		r = vhost_set_memory(d, argp);
 		break;
 	case VHOST_SET_LOG_BASE:
-		r = copy_from_user(&p, argp, sizeof p);
-		if (r < 0)
+		if (copy_from_user(&p, argp, sizeof p)) {
+			r = -EFAULT;
 			break;
+		}
 		if ((u64)(unsigned long)p != p) {
 			r = -EFAULT;
 			break;
-- 
1.7.0.4


^ permalink raw reply related

* Re: UDP Fragmentation and DF bit..
From: Andi Kleen @ 2010-05-27  9:43 UTC (permalink / raw)
  To: Vincent, Pradeep; +Cc: netdev@vger.kernel.org
In-Reply-To: <C8231CDB.16FC5%pradeepv@amazon.com>

"Vincent, Pradeep" <pradeepv@amazon.com> writes:

> OMan 7 ip¹ declares that ³The  system-wide  default  is  controlled  by  the
> ip_no_pmtu_disc  sysctl  for SOCK_STREAM  sockets,  and  disabled  on all
> others.² which led me to think ODF¹ bit will not be set for UDP packets.
> But..

One should add ip.7 is not really a spec, just documentation
how things were quite a few years ago. It unfortunately
does often not get updated when things change.

>
> In a network environment where MTU-big and MTU-small co-exist (and have
> router¹s fragmentation turned off in favor of PMTU discovery), UDP packets
> that are > MTU-small and < MTU-big find the PMTU effectively but UDP
> packets

I don't understand your mtu-small/big concept. PMTU is per IP and per
flow. So there's only always a single PMTU, not small and big.

Or do you refer to a single IP NAT situation where a single IP
shares different MTUs?

> Is there a reason ODF¹ bit cannot be set on fragmented packets on UDP
> transmission ? I couldn¹t find anything in RFC for IP protocol that
> prohibited DF bit on fragmented packets. Did I miss
> something here ?

> Would it be reasonable if PMTU discovery is performed (DF bit set +
> appropriate icmp logic) even for locally fragmented packets ? I think
> this

DF=1 on fragments would mean the application has to do pmtu discovery
even with fragments for the case when the kernel does not know 
the path mtu yet. But if the app supports pmtu discovery it's better
to not use fragments in the first place.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply

* ipsec and gre on the same interface
From: bengan @ 2010-05-27  9:28 UTC (permalink / raw)
  To: netdev

Hi,

I've got a set up with two boxes (routers) with ipsec and gre (quagga for the 
routing) that works but that includes 2 different interfaces in each box. But 
I would like to do this in a box with only one interface for ipsec and gre. Is 
that possible? In a vmware setup I have eth0 and eth1 as my two interfaces and 
established an endpoint for the gre tunnel in eth1 I can get ipsec and gre 
running and forwarding packets. But when I want to do the same thing with only 
one interface it doesn't work.

Am I doing something really stupid or should this work? If I'm doing something 
really stupid could you explain what?

The setup

Working setup

       ****          ****                                                 
       *  *----------*  *                                                 
       ****          ****                                                 
              ipsec                                                            
           <-------->                                                     

              gre 
       <----------------> 

The setup I would like to get working

       ****          ****                                                 
       *  *----------*  *                                                 
       ****          ****                                                 
              ipsec                                                            
           <-------->                                                     

              gre 
           <--------> 

regards,

/bengan

^ permalink raw reply

* [PATCH net-2.6] be2net: Patch removes redundant while statement in loop.
From: Sarveshwar Bandi @ 2010-05-27  9:32 UTC (permalink / raw)
  To: netdev; +Cc: davem

Signed-off-by: Sarveshwar Bandi <sarveshwarb@serverengines.com>
---
 drivers/net/benet/be_main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index aa065c7..54b1427 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -1861,7 +1861,7 @@ static int be_setup(struct be_adapter *a
 				goto if_destroy;
 			}
 			vf++;
-		} while (vf < num_vfs);
+		}
 	} else if (!be_physfn(adapter)) {
 		status = be_cmd_mac_addr_query(adapter, mac,
 			MAC_ADDRESS_TYPE_NETWORK, false, adapter->if_handle);
-- 
1.4.0


^ permalink raw reply related

* Re: ipv6: Add GSO support on forwarding path
From: Ralf Baechle @ 2010-05-27  9:22 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev
In-Reply-To: <20100526221601.GA3369@gondor.apana.org.au>

On Thu, May 27, 2010 at 08:16:01AM +1000, Herbert Xu wrote:

> > I tested this on top of a 2.6.34 release kernel and asked the user
> > experiencing the problem to re-test and the problem still persists.
> 
> OK, it looks like my patch doesn't work because the transport
> header isn't set on the forwarding path.  Here's an updated patch:

Still no improvment.  I've moved all the collected tcpdumps to
ftp://www.linux-ax25.org/pub/traces/ including a README file describing
contents.

  Ralf

^ permalink raw reply

* Re: [PATCH 1/3] cgroups: Add an API to attach a task to current task's cgroup
From: Michael S. Tsirkin @ 2010-05-27  9:16 UTC (permalink / raw)
  To: Paul Menage
  Cc: Sridhar Samudrala, netdev, kvm@vger.kernel.org, lkml, containers,
	lizf, Andrew Morton, Ben Blum, KAMEZAWA Hiroyuki
In-Reply-To: <AANLkTilawCLv5_XlStsFdOdLh-dmlIR3aUTyrVS-JF_s@mail.gmail.com>

On Tue, May 25, 2010 at 11:34:12AM -0700, Paul Menage wrote:
> On Tue, May 25, 2010 at 9:53 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Thu, May 20, 2010 at 03:22:15PM -0700, Paul Menage wrote:
> >> On Tue, May 18, 2010 at 5:04 PM, Sridhar Samudrala
> >> <samudrala.sridhar@gmail.com> wrote:
> >> > Add a new kernel API to attach a task to current task's cgroup
> >> > in all the active hierarchies.
> >> >
> >> > Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
> >>
> >> Reviewed-by: Paul Menage <menage@google.com>
> >>
> >> It would be more efficient to just attach directly to current->cgroups
> >> rather than potentially creating/destroying one css_set for each
> >> hierarchy until we've completely converged on current->cgroups - but
> >> that would require a bunch of refactoring of the guts of
> >> cgroup_attach_task() to ensure that the right can_attach()/attach()
> >> callbacks are made. That doesn't really seem worthwhile right now for
> >> the initial use, that I imagine isn't going to be
> >> performance-sensitive.
> >>
> >> Paul
> >
> > Is this patch suitable for 2.6.35?
> 
> Should be OK, yes.
> 
> Paul

So I'll add your Acked-by tag then, and merge through the vhost tree
together with the patch that uses this.

-- 
MST

^ permalink raw reply

* Re: [PATCH 2/3] workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup
From: Michael S. Tsirkin @ 2010-05-27  9:14 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: netdev, lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Tejun Heo, Jiri Kosina, Thomas Gleixner, Oleg Nesterov,
	Ingo Molnar, Andi Kleen
In-Reply-To: <1274227491.2370.110.camel@w-sridhar.beaverton.ibm.com>

On Tue, May 18, 2010 at 05:04:51PM -0700, Sridhar Samudrala wrote:
> Add a new kernel API to create a singlethread workqueue and attach it's
> task to current task's cgroup and cpumask.
> 
> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>

Could someone familiar with workqueue code please comment on whether
this patch is suitable for 2.6.35?

It is needed to fix the case where vhost user might cause a kernel
thread to consume more CPU than allowed by the cgroup.
Should I merge it through the vhost tree?
Ack for this?

Thanks!

> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> index 9466e86..6d6f301 100644
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -211,6 +211,7 @@ __create_workqueue_key(const char *name, int singlethread,
>  #define create_freezeable_workqueue(name) __create_workqueue((name), 1, 1, 0)
>  #define create_singlethread_workqueue(name) __create_workqueue((name), 1, 0, 0)
>  
> +extern struct workqueue_struct *create_singlethread_workqueue_in_current_cg(char *name); 
>  extern void destroy_workqueue(struct workqueue_struct *wq);
>  
>  extern int queue_work(struct workqueue_struct *wq, struct work_struct *work);
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 5bfb213..6ba226e 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -35,6 +35,7 @@
>  #include <linux/lockdep.h>
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/workqueue.h>
> +#include <linux/cgroup.h>
>  
>  /*
>   * The per-CPU workqueue (if single thread, we always use the first
> @@ -193,6 +194,45 @@ static const struct cpumask *cpu_singlethread_map __read_mostly;
>   */
>  static cpumask_var_t cpu_populated_map __read_mostly;
>  
> +static struct task_struct *get_singlethread_wq_task(struct workqueue_struct *wq)
> +{
> +	return (per_cpu_ptr(wq->cpu_wq, singlethread_cpu))->thread;
> +}
> +
> +/* Create a singlethread workqueue and attach it's task to the current task's
> + * cgroup and set it's cpumask to the current task's cpumask.
> + */
> +struct workqueue_struct *create_singlethread_workqueue_in_current_cg(char *name)
> +{
> +	struct workqueue_struct *wq;
> +	struct task_struct *task;
> +	cpumask_var_t mask;
> +
> +	wq = create_singlethread_workqueue(name);
> +	if (!wq)
> +		goto out;
> +
> +	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
> +		goto err;
> +			
> +	if (sched_getaffinity(current->pid, mask))
> +		goto err;
> +
> +	task = get_singlethread_wq_task(wq);
> +	if (sched_setaffinity(task->pid, mask))
> +		goto err;
> +
> +	if (cgroup_attach_task_current_cg(task))
> +		goto err;
> +out:	
> +	return wq;
> +err:
> +	destroy_workqueue(wq);
> +	wq = NULL;
> +	goto out;
> +}
> +EXPORT_SYMBOL_GPL(create_singlethread_workqueue_in_current_cg);
> +
>  /* If it's single threaded, it isn't in the list of workqueues. */
>  static inline int is_wq_single_threaded(struct workqueue_struct *wq)
>  {
> 	

^ permalink raw reply

* [patch 3/3] rds/iw_rdma: remove unneeded kfree()
From: Dan Carpenter @ 2010-05-27  9:11 UTC (permalink / raw)
  To: Andy Grover; +Cc: David S. Miller, rds-devel, netdev, kernel-janitors

The "dma_pages" variable is always NULL at this point so the kfree()
isn't needed.

Signed-off-by: Dan Carpenter <error27@gmail.com>

diff --git a/net/rds/iw_rdma.c b/net/rds/iw_rdma.c
index 48d5073..742ab6a 100644
--- a/net/rds/iw_rdma.c
+++ b/net/rds/iw_rdma.c
@@ -334,7 +334,6 @@ static u64 *rds_iw_map_scatterlist(struct rds_iw_device *rds_iwdev,
 out_unmap:
 	ib_dma_unmap_sg(rds_iwdev->dev, sg->list, sg->len, DMA_BIDIRECTIONAL);
 	sg->dma_len = 0;
-	kfree(dma_pages);
 	return ERR_PTR(ret);
 }

^ permalink raw reply related

* [patch 2/3] rds/iw_rdma: spin_lock_irq() is not nestable
From: Dan Carpenter @ 2010-05-27  9:10 UTC (permalink / raw)
  To: Andy Grover; +Cc: David S. Miller, rds-devel, netdev, kernel-janitors

The IRQs are already disabled here so we don't need to disable them
again.

Signed-off-by: Dan Carpenter <error27@gmail.com>

diff --git a/net/rds/iw_rdma.c b/net/rds/iw_rdma.c
index 7365fa2..48d5073 100644
--- a/net/rds/iw_rdma.c
+++ b/net/rds/iw_rdma.c
@@ -213,9 +213,9 @@ void rds_iw_add_conn(struct rds_iw_device *rds_iwdev, struct rds_connection *con
 	BUG_ON(list_empty(&ic->iw_node));
 	list_del(&ic->iw_node);
 
-	spin_lock_irq(&rds_iwdev->spinlock);
+	spin_lock(&rds_iwdev->spinlock);
 	list_add_tail(&ic->iw_node, &rds_iwdev->conn_list);
-	spin_unlock_irq(&rds_iwdev->spinlock);
+	spin_unlock(&rds_iwdev->spinlock);
 	spin_unlock_irq(&iw_nodev_conns_lock);
 
 	ic->rds_iwdev = rds_iwdev;

^ permalink raw reply related

* [patch 1/3] rds/iw_rdma: using too much stack space
From: Dan Carpenter @ 2010-05-27  9:08 UTC (permalink / raw)
  To: Andy Grover; +Cc: David S. Miller, rds-devel, netdev, kernel-janitors

struct rds_sock is pretty big, it uses 944 bytes.  The inclusion of 
struct sock is what does it, struct sock uses 700 bytes.

Signed-off-by: Dan Carpenter <error27@gmail.com>

diff --git a/net/rds/iw_rdma.c b/net/rds/iw_rdma.c
index 13dc186..742ab6a 100644
--- a/net/rds/iw_rdma.c
+++ b/net/rds/iw_rdma.c
@@ -178,23 +178,29 @@ int rds_iw_update_cm_id(struct rds_iw_device *rds_iwdev, struct rdma_cm_id *cm_i
 {
 	struct sockaddr_in *src_addr, *dst_addr;
 	struct rds_iw_device *rds_iwdev_old;
-	struct rds_sock rs;
+	struct rds_sock *rs;
 	struct rdma_cm_id *pcm_id;
-	int rc;
+	int ret;
+
+	rs = kmalloc(sizeof(*rs), GFP_KERNEL);
+	if (!rs)
+		return -ENOMEM;
 
 	src_addr = (struct sockaddr_in *)&cm_id->route.addr.src_addr;
 	dst_addr = (struct sockaddr_in *)&cm_id->route.addr.dst_addr;
 
-	rs.rs_bound_addr = src_addr->sin_addr.s_addr;
-	rs.rs_bound_port = src_addr->sin_port;
-	rs.rs_conn_addr = dst_addr->sin_addr.s_addr;
-	rs.rs_conn_port = dst_addr->sin_port;
+	rs->rs_bound_addr = src_addr->sin_addr.s_addr;
+	rs->rs_bound_port = src_addr->sin_port;
+	rs->rs_conn_addr = dst_addr->sin_addr.s_addr;
+	rs->rs_conn_port = dst_addr->sin_port;
 
-	rc = rds_iw_get_device(&rs, &rds_iwdev_old, &pcm_id);
-	if (rc)
+	ret = rds_iw_get_device(rs, &rds_iwdev_old, &pcm_id);
+	if (ret)
 		rds_iw_remove_cm_id(rds_iwdev, cm_id);
 
-	return rds_iw_add_cm_id(rds_iwdev, cm_id);
+	ret = rds_iw_add_cm_id(rds_iwdev, cm_id);
+	kfree(rs);
+	return ret;
 }
 
 void rds_iw_add_conn(struct rds_iw_device *rds_iwdev, struct rds_connection *conn)


^ permalink raw reply related

* Re: issue about virtio-net driver to suppoprt vhost mergeable buffer with zero-copy to support PS mode
From: Michael S. Tsirkin @ 2010-05-27  8:20 UTC (permalink / raw)
  To: Xin, Xiaohui
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, Herbert Xu
In-Reply-To: <F2E9EB7348B8264F86B6AB8151CE2D790B7958E740@shsmsx502.ccr.corp.intel.com>

On Thu, May 27, 2010 at 09:21:02AM +0800, Xin, Xiaohui wrote:
> Michael,
> I'm now looking into the vhost mergeable buffer, and I tried to use it to support PS mode with zero-copy. And I found an issue there that I have to modify the guest virito-net driver.
> 
> When guest virtio-net driver submits mergeable buffers, it submits multiple pages outside. In zero-copy case, vhost cannot know which page is used to put header, and which page is used to put payload. Then vhost can only reserves 12 bytes for each page. That means, the page_offset of the payload DMAed into the guest buffer is always 12 bytes. But guest virtio-net driver always use offset 0 to put the data (See receive_mergeable()). That's where the zero-copy use mergeable buffer must modify.
> 
> Have I missed something here? And how do you think about it?
> 
> Thanks 
> Xiaohui 

Maybe you can teach the hardware skip the first 12 bytes: qemu will
call an ioctl telling hardware what the virtio header size is.
This is how we plan to do it for tap.

Alternatively, buffers can be used in any order.
So we can have hardware use N buffers for the packet, and then
have vhost put the header in buffer N+1.

-- 
MST

^ permalink raw reply

* Re: [PATCH] tcp: Socket option to set congestion window
From: Andi Kleen @ 2010-05-27  8:00 UTC (permalink / raw)
  To: David Miller; +Cc: andi, therbert, shemminger, netdev, ycheng
In-Reply-To: <20100526.151014.70204145.davem@davemloft.net>

On Wed, May 26, 2010 at 03:10:14PM -0700, David Miller wrote:
> From: Andi Kleen <andi@firstfloor.org>
> Date: Wed, 26 May 2010 23:27:45 +0200
> 
> > As I understand the idea was that the application knows
> > what flows belong to a single peer and wants to have
> > a single cwnd for all of those. Perhaps there would
> > be a way to generalize that to tell it to the kernel.
> > 
> > e.g. have a "peer id"  that is known by applications
> > and the kernel could manage cwnds shared between connections
> > associated with the same peer id?
> > 
> > Just an idea, I admit I haven't thought very deeply
> > about this. Feel free to poke holes into it.
> 
> Yes, a CWND "domain" that can include multiple sockets is
> something that might gain some traction.
> 
> The "domain" could just simply be the tuple {process,peer-IP}

If process is in there this wouldn't work for a multi process
server?

Perhaps having it associated with a FD so that it could
be passed around with unix sockets if needed (just would
need to make sure the AF_UNIX gc can handle such cycles)

peer_id = open_peer_id();   
/* peer id is like a fd */

socket = socket( ... ); 
set_peer_id(socket, peer_id); 


...

close(peer_id);

-andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply

* Re: [PATCH] tcp: Socket option to set congestion window
From: Andi Kleen @ 2010-05-27  7:57 UTC (permalink / raw)
  To: Rick Jones; +Cc: andi, David Miller, therbert, shemminger, netdev, ycheng
In-Reply-To: <4BFDA0B6.8030701@hp.com>

> Then all the app does is say "I'am in peer id foo" right?  Is that really 
> that much different from making the setsockopt() call for a different cwnd 
> value? Particularly if say the limit were not a global sysctl, but based on 
> the existing per-route value (perhaps expanded to have a min, max and 
> default?)

The worst case with peer id would be app using an own peer id
for each connection. So each connection would have an own cwnd,
just like today. So the worst case is the same as today.

If it shares connections between peer ids the real effective cwnd
of all those connections would be also never be "worse" (that is
larger) than it could be on single connection. 

So this limits the cwnds effectively with peer ids, although it also 
gives a nice way to reuse an already existing cwnd for a new
connection (this does not make things worse because in theory
the app could have reused the same connection too) 

So overall peer ids don't allow to enlarge cwnds over today.

If the cwnd is fully application controlled all these limits
are not there and a bittorrent client could just always set 
it to 1 million.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply

* Re: [PATCH] tcp: Socket option to set congestion window
From: Hagen Paul Pfeifer @ 2010-05-27  7:46 UTC (permalink / raw)
  To: David Miller; +Cc: andi, therbert, shemminger, netdev, ycheng
In-Reply-To: <20100527.002851.02281005.davem@davemloft.net>

* David Miller | 2010-05-27 00:28:51 [-0700]:

>Superior or not, it's simply never going to happen.  We are far beyond
>being able to get to where we were before NAT'ing and shaping devices
>started to get inserted everywhere on the network.
>
>And I also don't see any of this stuff as fundamentally proprietary.

We will see! If no real interaction between peers is required
ISP/Carrier/InternetExchanges will start to put their proprietary components
into the network. Because they have niffty features, the product developed
phase is shorten (no borring standardizations necessary) and so on. This is no
new insight.

>People want deep packet inspection, people want to control their user's
>traffic.  And people, most importantly, are willing to pay for this.
>
>Therefore, these elements will always be in the network.
>
>Better to co-exist with them and use them to our advantage instead of
>fantasizing about a utopia where they don't exist.

Sure, we have no alternative.

HGN


-- 
Hagen Paul Pfeifer <hagen@jauu.net>  ||  http://jauu.net/
Telephone: +49 174 5455209           ||  Key Id: 0x98350C22
Key Fingerprint: 490F 557B 6C48 6D7E 5706 2EA2 4A22 8D45 9835 0C22

^ permalink raw reply

* Re: [PATCH 2/3] cxgb4i_v3: main driver files
From: Mike Christie @ 2010-05-27  7:40 UTC (permalink / raw)
  To: open-iscsi
  Cc: Rakesh Ranjan, NETDEVML, SCSIDEVML, LKML, Karen Xie, David Miller,
	James Bottomley, Anish Bhatt, Rakesh Ranjan
In-Reply-To: <4BFE2173.4050306@cs.wisc.edu>

On 05/27/2010 02:38 AM, Mike Christie wrote:
> On 05/15/2010 12:24 PM, Rakesh Ranjan wrote:
>> From: Rakesh Ranjan<rranjan@chelsio.com>
>>
>>
>> Signed-off-by: Rakesh Ranjan<rakesh@chelsio.com>
>> ---
>> drivers/scsi/cxgb4i/cxgb4i.h | 101 ++
>> drivers/scsi/cxgb4i/cxgb4i_ddp.c | 678 +++++++++++++
>> drivers/scsi/cxgb4i/cxgb4i_ddp.h | 118 +++
>> drivers/scsi/cxgb4i/cxgb4i_offload.c | 1846
>> ++++++++++++++++++++++++++++++++++
>> drivers/scsi/cxgb4i/cxgb4i_offload.h | 91 ++
>> drivers/scsi/cxgb4i/cxgb4i_snic.c | 260 +++++
>> 6 files changed, 3094 insertions(+), 0 deletions(-)
>> create mode 100644 drivers/scsi/cxgb4i/cxgb4i.h
>> create mode 100644 drivers/scsi/cxgb4i/cxgb4i_ddp.c
>> create mode 100644 drivers/scsi/cxgb4i/cxgb4i_ddp.h
>> create mode 100644 drivers/scsi/cxgb4i/cxgb4i_offload.c
>> create mode 100644 drivers/scsi/cxgb4i/cxgb4i_offload.h
>> create mode 100644 drivers/scsi/cxgb4i/cxgb4i_snic.c
>
>
>
> Got some whitespace errors when applying.
>
> warning: squelched 1 whitespace error
> warning: 6 lines add whitespace errors.
>
>

Oh yeah, run sparse on the driver too. There are some warnings about 
some functions should be static.

^ permalink raw reply

* Re: [PATCH 2/3] cxgb4i_v3: main driver files
From: Mike Christie @ 2010-05-27  7:38 UTC (permalink / raw)
  To: open-iscsi-/JYPxA39Uh5TLH3MbocFFw
  Cc: Rakesh Ranjan, NETDEVML, SCSIDEVML, LKML, Karen Xie, David Miller,
	James Bottomley, Anish Bhatt, Rakesh Ranjan
In-Reply-To: <1273944249-311-3-git-send-email-rakesh-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>

On 05/15/2010 12:24 PM, Rakesh Ranjan wrote:
> From: Rakesh Ranjan<rranjan-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
>
>
> Signed-off-by: Rakesh Ranjan<rakesh-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
> ---
>   drivers/scsi/cxgb4i/cxgb4i.h         |  101 ++
>   drivers/scsi/cxgb4i/cxgb4i_ddp.c     |  678 +++++++++++++
>   drivers/scsi/cxgb4i/cxgb4i_ddp.h     |  118 +++
>   drivers/scsi/cxgb4i/cxgb4i_offload.c | 1846 ++++++++++++++++++++++++++++++++++
>   drivers/scsi/cxgb4i/cxgb4i_offload.h |   91 ++
>   drivers/scsi/cxgb4i/cxgb4i_snic.c    |  260 +++++
>   6 files changed, 3094 insertions(+), 0 deletions(-)
>   create mode 100644 drivers/scsi/cxgb4i/cxgb4i.h
>   create mode 100644 drivers/scsi/cxgb4i/cxgb4i_ddp.c
>   create mode 100644 drivers/scsi/cxgb4i/cxgb4i_ddp.h
>   create mode 100644 drivers/scsi/cxgb4i/cxgb4i_offload.c
>   create mode 100644 drivers/scsi/cxgb4i/cxgb4i_offload.h
>   create mode 100644 drivers/scsi/cxgb4i/cxgb4i_snic.c



Got some whitespace errors when applying.

warning: squelched 1 whitespace error
warning: 6 lines add whitespace errors.



> +#define	CXGB4I_SCSI_HOST_QDEPTH	1024
> +#define	CXGB4I_MAX_TARGET	CXGB4I_MAX_CONN
> +#define	CXGB4I_MAX_LUN		512

Is the max lun right? It seems kinda small for modern drivers.



> +
> +static inline void cxgb4i_ddp_ulp_mem_io_set_hdr(struct ulp_mem_io *req,


If you build cxgb3i and cxgb4i in the kernel at the same time, will you 
get problems if each driver has structs that are named the same?


> +
> +static inline int cxgb4i_ddp_find_unused_entries(struct cxgb4i_ddp_info *ddp,
> +					unsigned int start, unsigned int max,
> +					unsigned int count,
> +					struct cxgbi_gather_list *gl)
> +{
> +	unsigned int i, j, k;
> +
> +	/*  not enough entries */
> +	if ((max - start)<  count)
> +		return -EBUSY;
> +
> +	max -= count;
> +	spin_lock(&ddp->map_lock);
> +	for (i = start; i<  max;) {
> +		for (j = 0, k = i; j<  count; j++, k++) {
> +			if (ddp->gl_map[k])
> +				break;
> +		}
> +		if (j == count) {
> +			for (j = 0, k = i; j<  count; j++, k++)
> +				ddp->gl_map[k] = gl;
> +			spin_unlock(&ddp->map_lock);
> +			return i;
> +		}


Is there a more efficient bitmap or some sort of common map operation 
for this (I thought we found something when doing cxgb3i but forgot to 
add it or were testing a patch)?



> +		i += j + 1;
> +	}
> +	spin_unlock(&ddp->map_lock);
> +	return -EBUSY;
> +}
> +
> +static inline void cxgb4i_ddp_unmark_entries(struct cxgb4i_ddp_info *ddp,
> +							int start, int count)
> +{
> +	spin_lock(&ddp->map_lock);
> +	memset(&ddp->gl_map[start], 0,
> +			count * sizeof(struct cxgbi_gather_list *));

extra tab.



> +static void __cxgb4i_ddp_init(struct cxgb4i_snic *snic)
> +{
> +	struct cxgb4i_ddp_info *ddp = snic->ddp;
> +	unsigned int ppmax, bits, tagmask, pgsz_factor[4];
> +	int i;
> +
> +	if (ddp) {
> +		kref_get(&ddp->refcnt);
> +		cxgbi_log_warn("snic 0x%p, ddp 0x%p already set up\n",
> +				snic, snic->ddp);
> +		return;
> +	}
> +
> +	sw_tag_idx_bits = (__ilog2_u32(ISCSI_ITT_MASK)) + 1;
> +	sw_tag_age_bits = (__ilog2_u32(ISCSI_AGE_MASK)) + 1;
> +	snic->cdev.tag_format.sw_bits = sw_tag_idx_bits + sw_tag_age_bits;
> +
> +	cxgbi_log_info("tag itt 0x%x, %u bits, age 0x%x, %u bits\n",
> +			ISCSI_ITT_MASK, sw_tag_idx_bits,
> +			ISCSI_AGE_MASK, sw_tag_age_bits);
> +
> +	ppmax = (snic->lldi.vr->iscsi.size>>  PPOD_SIZE_SHIFT);
> +	bits = __ilog2_u32(ppmax) + 1;
> +	if (bits>  PPOD_IDX_MAX_SIZE)
> +		bits = PPOD_IDX_MAX_SIZE;
> +	ppmax = (1<<  (bits - 1)) - 1;
> +
> +	ddp = cxgbi_alloc_big_mem(sizeof(struct cxgb4i_ddp_info) +
> +			ppmax * (sizeof(struct cxgbi_gather_list *) +
> +				sizeof(struct sk_buff *)),
> +				GFP_KERNEL);
> +	if (!ddp) {
> +		cxgbi_log_warn("snic 0x%p unable to alloc ddp 0x%d, "
> +			       "ddp disabled\n", snic, ppmax);
> +		return;
> +	}
> +
> +	ddp->gl_map = (struct cxgbi_gather_list **)(ddp + 1);
> +	spin_lock_init(&ddp->map_lock);
> +	kref_init(&ddp->refcnt);
> +
> +	ddp->snic = snic;
> +	ddp->pdev = snic->lldi.pdev;
> +	ddp->max_txsz = min_t(unsigned int,
> +				snic->lldi.iscsi_iolen,
> +				ULP2_MAX_PKT_SIZE);
> +	ddp->max_rxsz = min_t(unsigned int,
> +				snic->lldi.iscsi_iolen,
> +				ULP2_MAX_PKT_SIZE);
> +	ddp->llimit = snic->lldi.vr->iscsi.start;
> +	ddp->ulimit = ddp->llimit + snic->lldi.vr->iscsi.size;
> +	ddp->nppods = ppmax;
> +	ddp->idx_last = ppmax;
> +	ddp->idx_bits = bits;
> +	ddp->idx_mask = (1<<  bits) - 1;
> +	ddp->rsvd_tag_mask = (1<<  (bits + PPOD_IDX_SHIFT)) - 1;
> +
> +	tagmask = ddp->idx_mask<<  PPOD_IDX_SHIFT;
> +	for (i = 0; i<  DDP_PGIDX_MAX; i++)
> +		pgsz_factor[i] = ddp_page_order[i];
> +
> +	cxgb4_iscsi_init(snic->lldi.ports[0], tagmask, pgsz_factor);
> +	snic->ddp = ddp;
> +
> +	snic->cdev.tag_format.rsvd_bits = ddp->idx_bits;
> +	snic->cdev.tag_format.rsvd_shift = PPOD_IDX_SHIFT;
> +	snic->cdev.tag_format.rsvd_mask =
> +		((1<<  snic->cdev.tag_format.rsvd_bits) - 1);
> +
> +	cxgbi_log_info("tag format: sw %u, rsvd %u,%u, mask 0x%x.\n",
> +			snic->cdev.tag_format.sw_bits,
> +			snic->cdev.tag_format.rsvd_bits,
> +			snic->cdev.tag_format.rsvd_shift,
> +			snic->cdev.tag_format.rsvd_mask);
> +
> +	snic->tx_max_size = min_t(unsigned int, ULP2_MAX_PDU_PAYLOAD,
> +				ddp->max_txsz - ISCSI_PDU_NONPAYLOAD_LEN);
> +	snic->rx_max_size = min_t(unsigned int, ULP2_MAX_PDU_PAYLOAD,
> +				ddp->max_rxsz - ISCSI_PDU_NONPAYLOAD_LEN);
> +
> +	cxgbi_log_info("max payload size: %u/%u, %u/%u.\n",
> +			snic->tx_max_size, ddp->max_txsz,
> +			snic->rx_max_size, ddp->max_rxsz);
> +
> +	cxgbi_log_info("snic 0x%p, nppods %u, bits %u, mask 0x%x,0x%x "
> +			"pkt %u/%u, %u/%u\n",
> +			snic, ppmax, ddp->idx_bits, ddp->idx_mask,
> +			ddp->rsvd_tag_mask, ddp->max_txsz,
> +			snic->lldi.iscsi_iolen,
> +			ddp->max_rxsz, snic->lldi.iscsi_iolen);
> +
> +	return;


Don't need "return".





> +static void cxgbi_conn_pdu_ready(struct cxgbi_sock *csk)
> +{


Were you going to put this in the libcxgbi but later decide it was a 
little too different? If you are leaving it here could you add a cxgb4i 
prefix so the naming is consistent and avoid confusion with your lib 
functions?



cxgb4i_find_best_mtu looks like it could go in your lib. Looks like 
find_best_mtu from cxgb3i_offload.c. Same with select_mss and compute_wscale


> +	struct sk_buff *skb;
> +	unsigned int read = 0;
> +	struct iscsi_conn *conn = csk->user_data;
> +	int err = 0;
> +
> +	cxgbi_rx_debug("csk 0x%p.\n", csk);
> +
> +	read_lock(&csk->callback_lock);
> +	if (unlikely(!conn || conn->suspend_rx)) {
> +		cxgbi_rx_debug("conn 0x%p, id %d, suspend_rx %lu!\n",
> +				conn, conn ? conn->id : 0xFF,
> +				conn ? conn->suspend_rx : 0xFF);
> +		read_unlock(&csk->callback_lock);
> +		return;
> +	}
> +	skb = skb_peek(&csk->receive_queue);
> +	while (!err&&  skb) {
> +		__skb_unlink(skb,&csk->receive_queue);
> +		read += cxgb4i_skb_rx_pdulen(skb);
> +		cxgbi_rx_debug("conn 0x%p, csk 0x%p, rx skb 0x%p, pdulen %u\n",
> +				conn, csk, skb, cxgb4i_skb_rx_pdulen(skb));
> +		if (cxgb4i_skb_flags(skb)&  CXGB4I_SKCB_FLAG_HDR_RCVD)
> +			err = cxgbi_conn_read_bhs_pdu_skb(conn, skb);
> +		else if (cxgb4i_skb_flags(skb) == CXGB4I_SKCB_FLAG_DATA_RCVD)
> +			err = cxgbi_conn_read_data_pdu_skb(conn, skb);
> +		__kfree_skb(skb);
> +		skb = skb_peek(&csk->receive_queue);
> +	}
> +	read_unlock(&csk->callback_lock);
> +	csk->copied_seq += read;
> +	cxgb4i_sock_rx_credits(csk, read);
> +	conn->rxdata_octets += read;
> +
> +	if (err) {
> +		cxgbi_log_info("conn 0x%p rx failed err %d.\n", conn, err);
> +		iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
> +	}
> +}





> +
> +static inline void cxgb4i_sock_free_wr_skb(struct sk_buff *skb)
> +{
> +	kfree_skb(skb);
> +}


I think adding wrappers around skb functions in a net driver is not so 
useful.


> +
> +static inline struct sk_buff *cxgb4i_sock_dequeue_wr(struct cxgbi_sock *csk)
> +{
> +	struct sk_buff *skb = csk->wr_pending_head;
> +
> +	if (likely(skb)) {
> +		csk->wr_pending_head = cxgb4i_skb_tx_wr_next(skb);
> +		cxgb4i_skb_tx_wr_next(skb) = NULL;
> +	}
> +	return skb;
> +}
> +
> +static void cxgb4i_sock_purge_wr_queue(struct cxgbi_sock *csk)
> +{
> +	struct sk_buff *skb;
> +
> +	while ((skb = cxgb4i_sock_dequeue_wr(csk)) != NULL)
> +		cxgb4i_sock_free_wr_skb(skb);
> +}
> +
> +/*

I think this is supposed to be

/**


> +static int cxgb4i_sock_push_tx_frames(struct cxgbi_sock *csk,
> +						int req_completion)
> +{
> +	int total_size = 0;
> +	struct sk_buff *skb;
> +	struct cxgb4i_snic *snic;
> +
> +	if (unlikely(csk->state == CXGBI_CSK_ST_CONNECTING ||
> +				csk->state == CXGBI_CSK_ST_CLOSE_WAIT_1 ||
> +				csk->state>= CXGBI_CSK_ST_ABORTING)) {
> +		cxgbi_tx_debug("csk 0x%p, in closing state %u.\n",
> +				csk, csk->state);
> +		return 0;
> +	}
> +
> +	snic = cxgb4i_get_snic(csk->cdev);
> +
> +	while (csk->wr_cred
> +			&&  (skb = skb_peek(&csk->write_queue)) != NULL) {


The && should be on the right

while (csk->wr_cred &&

> +
> +static int cxgb4i_cpl_act_open_rpl(struct cxgb4i_snic *snic,
> +						struct sk_buff *skb)
> +{
> +	struct cxgbi_sock *csk;
> +	struct cpl_act_open_rpl *rpl = cplhdr(skb);
> +	unsigned int atid =
> +		GET_TID_TID(GET_AOPEN_ATID(be32_to_cpu(rpl->atid_status)));
> +	struct tid_info *t = snic->lldi.tids;
> +	unsigned int status = GET_AOPEN_STATUS(be32_to_cpu(rpl->atid_status));
> +
> +	csk = lookup_atid(t, atid);
> +
> +	if (unlikely(!csk)) {
> +		cxgbi_log_error("can't find connection for tid %u\n", atid);
> +		return CPL_RET_UNKNOWN_TID;
> +	}
> +
> +	cxgbi_sock_hold(csk);
> +	spin_lock_bh(&csk->lock);
> +
> +	cxgbi_conn_debug("rcv, status 0x%x, csk 0x%p, csk->state %u, "
> +			"csk->flag 0x%lx, csk->atid %u.\n",
> +			status, csk, csk->state, csk->flags, csk->hwtid);
> +
> +	if (status&  act_open_has_tid(status))
> +		cxgb4_remove_tid(snic->lldi.tids, csk->port_id, GET_TID(rpl));
> +
> +	if (status == CPL_ERR_CONN_EXIST&&
> +			csk->retry_timer.function !=
> +			cxgb4i_sock_act_open_retry_timer) {


I do not mean to nit pick on silly coding style stuff. I think it is 
easier to read lines like this:

if (status == CPL_ERR_CONN_EXIST&&
     csk->retry_timer.function != cxgb4i_sock_act_open_retry_timer) {


> +		csk->retry_timer.function = cxgb4i_sock_act_open_retry_timer;
> +		if (!mod_timer(&csk->retry_timer, jiffies + HZ / 2))
> +			cxgbi_sock_hold(csk);


There is no del_timer/del_timer_sync + cxgbi_sock_put for this timer. If 
something cleans this csk up, what makes sure the timer gets stopped ok.

> +
> +static int cxgb4i_alloc_cpl_skbs(struct cxgbi_sock *csk)
> +{
> +	csk->cpl_close = alloc_skb(sizeof(struct cpl_close_con_req),
> +					GFP_KERNEL);
> +	if (!csk->cpl_close)
> +		return -ENOMEM;
> +	skb_put(csk->cpl_close, sizeof(struct cpl_close_con_req));
> +
> +	csk->cpl_abort_req = alloc_skb(sizeof(struct cpl_abort_req),
> +					GFP_KERNEL);
> +	if (!csk->cpl_abort_req)
> +		goto free_cpl_skbs;
> +	skb_put(csk->cpl_abort_req, sizeof(struct cpl_abort_req));
> +
> +	csk->cpl_abort_rpl = alloc_skb(sizeof(struct cpl_abort_rpl),
> +					GFP_KERNEL);


These should be GFP_NOIO in case we call them to relogin on a disk that 
has data that would have been needed to be written out to free up mem.

> +
> +struct cxgbi_sock *cxgb4i_sock_create(struct cxgb4i_snic *snic)
> +{
> +	struct cxgbi_sock *csk = NULL;
> +
> +	csk = kzalloc(sizeof(*csk), GFP_KERNEL);

Same as above.

> +
> +static int is_cxgb4_dev(struct net_device *dev, struct cxgb4i_snic *snic)
> +{
> +	struct net_device *ndev = dev;
> +	int i;
> +
> +	if (dev->priv_flags&  IFF_802_1Q_VLAN)
> +		ndev = vlan_dev_real_dev(dev);
> +
> +	for (i = 0; i<  snic->lldi.nports; i++) {
> +		if (ndev == snic->lldi.ports[i])
> +			return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +static struct net_device *cxgb4i_find_egress_dev(struct net_device *root_dev,
> +						struct cxgb4i_snic *snic)
> +{
> +	while (root_dev) {
> +		if (root_dev->priv_flags&  IFF_802_1Q_VLAN)
> +			root_dev = vlan_dev_real_dev(root_dev);
> +		else if (is_cxgb4_dev(root_dev, snic))
> +			return root_dev;
> +		else
> +			return NULL;
> +	}
> +
> +	return NULL;
> +}
> +
> +static struct rtable *find_route(struct net_device *dev,
> +				__be32 saddr, __be32 daddr,
> +				__be16 sport, __be16 dport,
> +				u8 tos)
> +{
> +	struct rtable *rt;
> +	struct flowi fl = {
> +		.oif = dev ? dev->ifindex : 0,
> +		.nl_u = {
> +			.ip4_u = {
> +				.daddr = daddr,
> +				.saddr = saddr,
> +				.tos = tos }
> +			},
> +		.proto = IPPROTO_TCP,
> +		.uli_u = {
> +			.ports = {
> +				.sport = sport,
> +				.dport = dport }
> +			}
> +	};
> +
> +	if (ip_route_output_flow(dev ? dev_net(dev) :&init_net,
> +					&rt,&fl, NULL, 0))
> +		return NULL;
> +
> +	return rt;
> +}
> +


Those functions above look like the cxgb3i ones. Could they be in your lib?



> +static int cxgb4i_init_act_open(struct cxgbi_sock *csk,
> +					struct net_device *dev)
> +{
> +	struct dst_entry *dst = csk->dst;
> +	struct sk_buff *skb;
> +	struct port_info *pi = netdev_priv(dev);
> +
> +	cxgbi_conn_debug("csk 0x%p, state %u, flags 0x%lx\n",
> +			csk, csk->state, csk->flags);
> +
> +	csk->atid = cxgb4_alloc_atid(cxgb4i_get_snic(csk->cdev)->lldi.tids,
> +					csk);
> +	if (csk->atid == -1) {
> +		cxgbi_log_error("cannot alloc atid\n");
> +		goto out_err;
> +	}
> +
> +	csk->l2t = cxgb4_l2t_get(cxgb4i_get_snic(csk->cdev)->lldi.l2t,
> +				csk->dst->neighbour, dev, 0);
> +	if (!csk->l2t) {
> +		cxgbi_log_error("cannot alloc l2t\n");
> +		goto free_atid;
> +	}
> +
> +	skb = alloc_skb(sizeof(struct cpl_act_open_req), GFP_KERNEL);
> +	if (!skb)


Should be NOIO too.

> +		goto free_l2t;
> +
> +	skb->sk = (struct sock *)csk;
> +	t4_set_arp_err_handler(skb, csk, cxgb4i_act_open_req_arp_failure);
> +
> +	cxgbi_sock_hold(csk);
> +
> +	csk->wr_max_cred = csk->wr_cred =
> +		cxgb4i_get_snic(csk->cdev)->lldi.wr_cred;
> +	csk->port_id = pi->port_id;
> +	csk->rss_qid = cxgb4i_get_snic(csk->cdev)->lldi.rxq_ids[csk->port_id];
> +	csk->tx_chan = pi->tx_chan;
> +	csk->smac_idx = csk->tx_chan<<  1;
> +	csk->wr_una_cred = 0;
> +	csk->mss_idx = cxgb4i_select_mss(csk, dst_mtu(dst));
> +	csk->err = 0;
> +
> +	cxgb4i_sock_reset_wr_list(csk);
> +
> +	cxgb4i_sock_make_act_open_req(csk, skb,
> +					((csk->rss_qid<<  14) |
> +					 (csk->atid)), csk->l2t);
> +	cxgb4_l2t_send(cxgb4i_get_snic(csk->cdev)->lldi.ports[csk->port_id],
> +					skb, csk->l2t);
> +	return 0;
> +
> +free_l2t:
> +	cxgb4_l2t_release(csk->l2t);
> +
> +free_atid:
> +	cxgb4_free_atid(cxgb4i_get_snic(csk->cdev)->lldi.tids, csk->atid);
> +
> +out_err:
> +
> +	return -EINVAL;;
> +}
> +
> +static struct net_device *cxgb4i_find_dev(struct net_device *dev,
> +							__be32 ipaddr)
> +{
> +	struct flowi fl;
> +	struct rtable *rt;
> +	int err;
> +
> +	memset(&fl, 0, sizeof(fl));
> +	fl.nl_u.ip4_u.daddr = ipaddr;
> +
> +	err = ip_route_output_key(dev ? dev_net(dev) :&init_net,&rt,&fl);
> +	if (!err)
> +		return (&rt->u.dst)->dev;
> +
> +	return NULL;
> +}
> +


Looks like cxgb3i one.


> +int cxgb4i_sock_connect(struct net_device *dev, struct cxgbi_sock *csk,
> +						struct sockaddr_in *sin)
> +{
> +	struct rtable *rt;
> +	__be32 sipv4 = 0;
> +	struct net_device *dstdev;
> +	struct cxgbi_hba *chba = NULL;
> +	int err;
> +
> +	cxgbi_conn_debug("csk 0x%p, dev 0x%p\n", csk, dev);
> +
> +	if (sin->sin_family != AF_INET)
> +		return -EAFNOSUPPORT;
> +
> +	csk->daddr.sin_port = sin->sin_port;
> +	csk->daddr.sin_addr.s_addr = sin->sin_addr.s_addr;
> +
> +	dstdev = cxgb4i_find_dev(dev, sin->sin_addr.s_addr);
> +	if (!dstdev || !is_cxgb4_dev(dstdev, cxgb4i_get_snic(csk->cdev)))
> +		return -ENETUNREACH;
> +
> +	if (dstdev->priv_flags&  IFF_802_1Q_VLAN)
> +		dev = dstdev;
> +
> +	rt = find_route(dev, csk->saddr.sin_addr.s_addr,
> +			csk->daddr.sin_addr.s_addr,
> +			csk->saddr.sin_port,
> +			csk->daddr.sin_port,
> +			0);
> +	if (rt == NULL) {
> +		cxgbi_conn_debug("no route to %pI4, port %u, dev %s, "
> +					"snic 0x%p\n",
> +					&csk->daddr.sin_addr.s_addr,
> +					ntohs(csk->daddr.sin_port),
> +					dev ? dev->name : "any",
> +					csk->snic);
> +		return -ENETUNREACH;
> +	}
> +
> +	if (rt->rt_flags&  (RTCF_MULTICAST | RTCF_BROADCAST)) {
> +		cxgbi_conn_debug("multi-cast route to %pI4, port %u, "
> +					"dev %s, snic 0x%p\n",
> +					&csk->daddr.sin_addr.s_addr,
> +					ntohs(csk->daddr.sin_port),
> +					dev ? dev->name : "any",
> +					csk->snic);
> +		ip_rt_put(rt);
> +		return -ENETUNREACH;
> +	}
> +
> +	if (!csk->saddr.sin_addr.s_addr)
> +		csk->saddr.sin_addr.s_addr = rt->rt_src;
> +
> +	csk->dst =&rt->u.dst;
> +
> +	dev = cxgb4i_find_egress_dev(csk->dst->dev,
> +					cxgb4i_get_snic(csk->cdev));
> +	if (dev == NULL) {
> +		cxgbi_conn_debug("csk: 0x%p, egress dev NULL\n", csk);
> +		return -ENETUNREACH;
> +	}
> +
> +	err = cxgbi_sock_get_port(csk);
> +	if (err)
> +		return err;
> +
> +	cxgbi_conn_debug("csk: 0x%p get port: %u\n",
> +			csk, ntohs(csk->saddr.sin_port));
> +
> +	chba = cxgb4i_hba_find_by_netdev(csk->dst->dev);
> +
> +	sipv4 = cxgb4i_get_iscsi_ipv4(chba);
> +	if (!sipv4) {
> +		cxgbi_conn_debug("csk: 0x%p, iscsi is not configured\n", csk);
> +		sipv4 = csk->saddr.sin_addr.s_addr;
> +		cxgb4i_set_iscsi_ipv4(chba, sipv4);
> +	} else
> +		csk->saddr.sin_addr.s_addr = sipv4;
> +
> +	cxgbi_conn_debug("csk: 0x%p, %pI4:[%u], %pI4:[%u] SYN_SENT\n",
> +				csk,
> +				&csk->saddr.sin_addr.s_addr,
> +				ntohs(csk->saddr.sin_port),
> +				&csk->daddr.sin_addr.s_addr,
> +				ntohs(csk->daddr.sin_port));
> +
> +	cxgbi_sock_set_state(csk, CXGBI_CSK_ST_CONNECTING);
> +
> +	if (!cxgb4i_init_act_open(csk, dev))
> +		return 0;
> +
> +	err = -ENOTSUPP;
> +
> +	cxgbi_conn_debug("csk 0x%p ->  closed\n", csk);
> +	cxgbi_sock_set_state(csk, CXGBI_CSK_ST_CLOSED);
> +	ip_rt_put(rt);
> +	cxgbi_sock_put_port(csk);
> +
> +	return err;
> +}
> +
> +void cxgb4i_sock_rx_credits(struct cxgbi_sock *csk, int copied)
> +{
> +	int must_send;
> +	u32 credits;
> +
> +	if (csk->state != CXGBI_CSK_ST_ESTABLISHED)
> +		return;
> +
> +	credits = csk->copied_seq - csk->rcv_wup;
> +	if (unlikely(!credits))
> +		return;
> +
> +	if (unlikely(cxgb4i_rx_credit_thres == 0))
> +		return;
> +
> +	must_send = credits + 16384>= cxgb4i_rcv_win;
> +
> +	if (must_send || credits>= cxgb4i_rx_credit_thres)
> +		csk->rcv_wup += cxgb4i_csk_send_rx_credits(csk, credits);
> +}
> +
> +int cxgb4i_sock_send_pdus(struct cxgbi_sock *csk, struct sk_buff *skb)
> +{
> +	struct sk_buff *next;
> +	int err, copied = 0;
> +
> +	spin_lock_bh(&csk->lock);
> +
> +	if (csk->state != CXGBI_CSK_ST_ESTABLISHED) {
> +		cxgbi_tx_debug("csk 0x%p, not in est. state %u.\n",
> +			      csk, csk->state);
> +		err = -EAGAIN;
> +		goto out_err;
> +	}
> +
> +	if (csk->err) {
> +		cxgbi_tx_debug("csk 0x%p, err %d.\n", csk, csk->err);
> +		err = -EPIPE;
> +		goto out_err;
> +	}
> +
> +	if (csk->write_seq - csk->snd_una>= cxgb4i_snd_win) {
> +		cxgbi_tx_debug("csk 0x%p, snd %u - %u>  %u.\n",
> +				csk, csk->write_seq, csk->snd_una,
> +				cxgb4i_snd_win);
> +		err = -ENOBUFS;
> +		goto out_err;
> +	}
> +
> +	while (skb) {
> +		int frags = skb_shinfo(skb)->nr_frags +
> +				(skb->len != skb->data_len);
> +
> +		if (unlikely(skb_headroom(skb)<  CXGB4I_TX_HEADER_LEN)) {
> +			cxgbi_tx_debug("csk 0x%p, skb head.\n", csk);
> +			err = -EINVAL;
> +			goto out_err;
> +		}
> +
> +		if (frags>= SKB_WR_LIST_SIZE) {
> +			cxgbi_log_error("csk 0x%p, tx frags %d, len %u,%u.\n",
> +					 csk, skb_shinfo(skb)->nr_frags,
> +					 skb->len, skb->data_len);
> +			err = -EINVAL;
> +			goto out_err;
> +		}
> +
> +		next = skb->next;
> +		skb->next = NULL;
> +		cxgb4i_sock_skb_entail(csk, skb,
> +				CXGB4I_SKCB_FLAG_NO_APPEND |
> +				CXGB4I_SKCB_FLAG_NEED_HDR);
> +		copied += skb->len;
> +		csk->write_seq += skb->len + ulp_extra_len(skb);
> +		skb = next;
> +	}
> +done:
> +	if (likely(skb_queue_len(&csk->write_queue)))
> +		cxgb4i_sock_push_tx_frames(csk, 1);
> +	spin_unlock_bh(&csk->lock);
> +	return copied;
> +
> +out_err:
> +	if (copied == 0&&  err == -EPIPE)
> +		copied = csk->err ? csk->err : -EPIPE;
> +	else
> +		copied = err;
> +	goto done;
> +}

Looks similar to cxgb3i one.


> +
> +static void cxgbi_sock_conn_closing(struct cxgbi_sock *csk)
> +{

Was this going to the lib? Looks like the cxgb3i one. If not then rename 
it to avoid confusion.


> +
> +struct cxgbi_hba *cxgb4i_hba_find_by_netdev(struct net_device *dev)
> +{
> +	int i;
> +	struct cxgb4i_snic *snic = NULL;;
> +
> +	if (dev->priv_flags&  IFF_802_1Q_VLAN)
> +		dev = vlan_dev_real_dev(dev);
> +
> +	mutex_lock(&snic_rwlock);
> +	list_for_each_entry(snic,&snic_list, list_head) {
> +		for (i = 0; i<  snic->hba_cnt; i++) {
> +			if (snic->hba[i]->ndev == dev) {
> +				mutex_unlock(&snic_rwlock);
> +				return snic->hba[i];
> +			}
> +		}
> +	}
> +	mutex_unlock(&snic_rwlock);
> +	return NULL;


Looks like cxgb3i_hba_find_by_netdev.

> +}
> +
> +struct cxgb4i_snic *cxgb4i_find_snic(struct net_device *dev, __be32 ipaddr)
> +{
> +	struct flowi fl;
> +	struct rtable *rt;
> +	struct net_device *sdev = NULL;
> +	struct cxgb4i_snic *snic = NULL, *tmp;
> +	int err, i;
> +
> +	memset(&fl, 0, sizeof(fl));
> +	fl.nl_u.ip4_u.daddr = ipaddr;
> +
> +	err = ip_route_output_key(dev ? dev_net(dev) :&init_net,&rt,&fl);
> +	if (err)
> +		goto out;
> +
> +	sdev = (&rt->u.dst)->dev;
> +	mutex_lock(&snic_rwlock);
> +	list_for_each_entry_safe(snic, tmp,&snic_list, list_head) {
> +		if (snic) {
> +			for (i = 0; i<  snic->lldi.nports; i++) {
> +				if (sdev == snic->lldi.ports[i]) {
> +					mutex_unlock(&snic_rwlock);
> +					return snic;
> +				}
> +			}
> +		}
> +	}
> +	mutex_unlock(&snic_rwlock);
> +
> +out:
> +	snic = NULL;
> +	return snic;


you can just do return NULL


> +}
> +
> +void cxgb4i_snic_add(struct list_head *list_head)
> +{
> +	mutex_lock(&snic_rwlock);
> +	list_add_tail(list_head,&snic_list);
> +	mutex_unlock(&snic_rwlock);
> +}
> +
> +struct cxgb4i_snic *cxgb4i_snic_init(const struct cxgb4_lld_info *linfo)
> +{
> +	struct cxgb4i_snic *snic;
> +	int i;
> +
> +	snic = kzalloc(sizeof(*snic), GFP_KERNEL);
> +	if (snic) {
> +

extra newline

> +		spin_lock_init(&snic->lock);
> +		snic->lldi = *linfo;
> +		snic->hba_cnt = snic->lldi.nports;
> +		snic->cdev.dd_data = snic;
> +		snic->cdev.pdev = snic->lldi.pdev;
> +		snic->cdev.skb_tx_headroom = SKB_MAX_HEAD(CXGB4I_TX_HEADER_LEN);
> +
> +		cxgb4i_iscsi_init();
> +		cxgbi_pdu_init(&snic->cdev);
> +		cxgb4i_ddp_init(snic);
> +		cxgb4i_ofld_init(snic);
> +
> +		for (i = 0; i<  snic->hba_cnt; i++) {
> +			snic->hba[i] = cxgb4i_hba_add(snic,
> +						snic->lldi.ports[i]);
> +			if (!snic->hba[i]) {
> +				kfree(snic);
> +				snic = ERR_PTR(-ENOMEM);
> +				goto out;
> +			}
> +		}
> +		cxgb4i_snic_add(&snic->list_head);
> +	} else
> +out :
> +	snic = ERR_PTR(-ENOMEM);
> +
> +	return snic;


I think xgb4i_uld_add is not checking for PTR_ERR/IS_ERR.


> +}
> +
> +void cxgb4i_snic_cleanup(void)
> +{
> +	struct cxgb4i_snic *snic, *tmp;
> +	int i;
> +
> +	mutex_lock(&snic_rwlock);
> +	list_for_each_entry_safe(snic, tmp,&snic_list, list_head) {
> +		list_del(&snic->list_head);
> +
> +		for (i = 0; i<  snic->hba_cnt; i++) {
> +			if (snic->hba[i]) {
> +				cxgb4i_hba_remove(snic->hba[i]);
> +				snic->hba[i] = NULL;
> +			}
> +		}
> +		cxgb4i_ofld_cleanup(snic);
> +		cxgb4i_ddp_cleanup(snic);
> +		cxgbi_pdu_cleanup(&snic->cdev);
> +		cxgbi_log_info("snic 0x%p, %u scsi hosts removed.\n",
> +				snic, snic->hba_cnt);
> +
> +		kfree(snic);
> +	}
> +	mutex_unlock(&snic_rwlock);
> +	cxgb4i_iscsi_cleanup();
> +}
> +
> +static void *cxgb4i_uld_add(const struct cxgb4_lld_info *linfo)
> +{
> +	struct cxgb4i_snic *snic;
> +
> +	cxgbi_log_info("%s", version);
> +
> +	snic = cxgb4i_snic_init(linfo);

you can just do

return cxgb4i_snic_init(linfo);

and then delete everything below.



> +	if (!snic)
> +		goto out;
> +out:
> +	return snic;
> +}
> +
> +static int cxgb4i_uld_rx_handler(void *handle, const __be64 *rsp,
> +				const struct pkt_gl *pgl)
> +{
> +	struct cxgb4i_snic *snic = handle;
> +	struct sk_buff *skb;
> +	const struct cpl_act_establish *rpl;
> +	unsigned int opcode;
> +
> +	if (pgl == NULL) {
> +		unsigned int len = 64 - sizeof(struct rsp_ctrl) - 8;
> +
> +		skb = alloc_skb(256, GFP_ATOMIC);
> +		if (!skb)
> +			goto nomem;
> +		__skb_put(skb, len);
> +		skb_copy_to_linear_data(skb,&rsp[1], len);
> +
> +	} else if (pgl == CXGB4_MSG_AN) {
> +

don't need extra {} and newlines.

> +		return 0;
> +
> +	} else {
> +

extra newline

> +		skb = cxgb4_pktgl_to_skb(pgl, 256, 256);
> +		if (unlikely(!skb))
> +			goto nomem;
> +	}
> +
> +	rpl = cplhdr(skb);
> +	opcode = rpl->ot.opcode;
> +
> +	cxgbi_api_debug("snic %p, opcode 0x%x, skb %p\n",
> +			 snic, opcode, skb);
> +
> +	BUG_ON(!snic->handlers[opcode]);
> +
> +	if (snic->handlers[opcode]) {

extra brackets

> +		snic->handlers[opcode](snic, skb);
> +	} else
> +		cxgbi_log_error("No handler for opcode 0x%x\n",
> +				opcode);
> +
> +	return 0;
> +
> +nomem:
> +	cxgbi_api_debug("OOM bailing out\n");
> +	return 1;
> +}
> +
> +static int cxgb4i_uld_state_change(void *handle, enum cxgb4_state state)
> +{
> +	return 0;
> +}
> +
> +static int __init cxgb4i_init_module(void)
> +{
> +	cxgb4_register_uld(CXGB4_ULD_ISCSI,&cxgb4i_uld_info);
> +

extra newline

> +	return 0;
> +}
> +
> +static void __exit cxgb4i_exit_module(void)
> +{
> +

extra newline


> +	cxgb4_unregister_uld(CXGB4_ULD_ISCSI);
> +	cxgb4i_snic_cleanup();
> +}
> +
> +module_init(cxgb4i_init_module);
> +module_exit(cxgb4i_exit_module);
> +

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.

^ permalink raw reply

* Re: [PATCH] net: fix lock_sock_bh/unlock_sock_bh
From: David Miller @ 2010-05-27  7:29 UTC (permalink / raw)
  To: anton; +Cc: eric.dumazet, netdev
In-Reply-To: <20100527060923.GD28295@kryten>

From: Anton Blanchard <anton@samba.org>
Date: Thu, 27 May 2010 16:09:23 +1000

> 
>> > [PATCH v2] net: fix lock_sock_bh/unlock_sock_bh
>> > 
>> > This new sock lock primitive was introduced to speedup some user context
>> > socket manipulation. But it is unsafe to protect two threads, one using
>> > regular lock_sock/release_sock, one using lock_sock_bh/unlock_sock_bh
>> > 
>> > This patch changes lock_sock_bh to be careful against 'owned' state.
>> > If owned is found to be set, we must take the slow path.
>> > lock_sock_bh() now returns a boolean to say if the slow path was taken,
>> > and this boolean is used at unlock_sock_bh time to call the appropriate
>> > unlock function.
>> > 
>> > After this change, BH are either disabled or enabled during the
>> > lock_sock_bh/unlock_sock_bh protected section. This might be misleading,
>> > so we rename these functions to lock_sock_fast()/unlock_sock_fast().
>> > 
>> > Reported-by: Anton Blanchard <anton@samba.org>
>> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>> 
>> Looks good, I'll wait for positive testing from Anton before applying
>> this.
> 
> Thanks guys, this fixed it.
> 
> Tested-by: Anton Blanchard <anton@samba.org>

Thanks for testing Anton, I'll apply this now.

^ permalink raw reply

* Re: [PATCH] tcp: Socket option to set congestion window
From: David Miller @ 2010-05-27  7:28 UTC (permalink / raw)
  To: hagen; +Cc: andi, therbert, shemminger, netdev, ycheng
In-Reply-To: <20100527070827.GB2728@nuttenaction>

From: Hagen Paul Pfeifer <hagen@jauu.net>
Date: Thu, 27 May 2010 09:08:27 +0200

> And by the way, the IETF (and our) paradigm is still to shift functionality to
> end hosts - not into network core. "The Rise of the stupid network" [1] is
> still a paradigm that is superior to the alternative where vendors put their
> proprietary algorithms into the network and change the behavior in a
> uncontrollable fashion.

Superior or not, it's simply never going to happen.  We are far beyond
being able to get to where we were before NAT'ing and shaping devices
started to get inserted everywhere on the network.

And I also don't see any of this stuff as fundamentally proprietary.

People want deep packet inspection, people want to control their user's
traffic.  And people, most importantly, are willing to pay for this.

Therefore, these elements will always be in the network.

Better to co-exist with them and use them to our advantage instead of
fantasizing about a utopia where they don't exist.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox