Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 1/2] net: ll_temac: fix interrupt bug when interrupt 0 is used
From: John Williams @ 2010-05-27  4:12 UTC (permalink / raw)
  To: John Linn
  Cc: netdev, linuxppc-dev, grant.likely, jwboyer, michal.simek,
	Brian Hill
In-Reply-To: <6f264a7a-e5da-494a-a24d-1578ca422807@VA3EHSMHS016.ehs.local>

On Thu, May 27, 2010 at 3:29 AM, John Linn <john.linn@xilinx.com> wrote:
> The code is not checking the interrupt for DMA correctly so that an
> interrupt number of 0 will cause a false error.
>
> Signed-off-by: Brian Hill <brian.hill@xilinx.com>
> Signed-off-by: John Linn <john.linn@xilinx.com>
> ---
>  drivers/net/ll_temac_main.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/ll_temac_main.c b/drivers/net/ll_temac_main.c
> index fa7620e..0615737 100644
> --- a/drivers/net/ll_temac_main.c
> +++ b/drivers/net/ll_temac_main.c
> @@ -950,7 +950,7 @@ temac_of_probe(struct of_device *op, const struct of_device_id *match)
>
>        lp->rx_irq = irq_of_parse_and_map(np, 0);
>        lp->tx_irq = irq_of_parse_and_map(np, 1);
> -       if (!lp->rx_irq || !lp->tx_irq) {
> +       if ((lp->rx_irq == NO_IRQ) || (lp->tx_irq == NO_IRQ)) {

Personally I think this is the right thing to do.  But, I thought the
IRQ 0 == NO_IRQ (AKA "all-the-world's-an-x86-and-if-not-it-should-be")
holy war was already fought and won (or lost, depending on your
perspective)?

I seem to recall giving reluctant assent to a patch from Grant a few
months ago that touched MicroBlaze thus?

John

^ permalink raw reply

* Re: Warning in net/ipv4/af_inet.c:154
From: Eric Dumazet @ 2010-05-27  4:18 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: David Miller, netdev
In-Reply-To: <20100527035617.GB28295@kryten>

Le jeudi 27 mai 2010 à 13:56 +1000, Anton Blanchard a écrit :
> Hi Eric,
> 
> > You are 100% right David, maybe we should add a test when changing
> > sk_forward_alloc to test if socket is locked (lockdep only test), but
> > that's for 2.6.36 :)
> 
> Thanks for the patch, unfortunately I can still hit the WARN_ON. I'm somewhat
> confused by the two stage locking in the socket lock (ie sk_lock.slock and
> sk_lock.owned).
> 
> What state should the socket lock be in for serialising updates of
> sk_forward_alloc? In some cases we appear to update it with sk_lock.slock =
> unlocked, sk_lock.owned = 1:
> 
> NIP [c0000000005b4ad0] .sock_queue_rcv_skb
> LR [c0000000005b4acc] .sock_queue_rcv_skb
> Call Trace:
> [c0000000005f9fcc] .ip_queue_rcv_skb
> [c00000000061d604] .__udp_queue_rcv_skb
> [c0000000005b1a38] .release_sock
> [c0000000006205f0] .udp_sendmsg
> [c0000000006290d4] .inet_sendmsg
> [c0000000005abfb4] .sock_sendmsg
> [c0000000005ae9dc] .SyS_sendto
> [c0000000005ab6c0] .SyS_send
> 
> And other times we update it with sk_lock.slock = locked, sk_lock.owned = 0:
> 
> NIP [c0000000005b2b6c] .sock_rfree
> LR [c0000000005b2b68] .sock_rfree
> Call Trace:
> [c0000000005bca10] .skb_free_datagram_locked
> [c00000000061fe88] .udp_recvmsg
> [c0000000006285e8] .inet_recvmsg
> [c0000000005abe0c] .sock_recvmsg
> [c0000000005ae358] .SyS_recvfrom
> 
> I see we sometimes take sk_lock.slock then check the owned field, but we
> aren't doing that all the time.
> 

Old rule was :

A Process context was using 
lock + test_and_set_or_sleep + unlock (sk_lock.slock = unlocked,
sk_lock.owned = 1)

softirq handler was using :
(sk_lock.slock = locked, sk_lock.owned =0)

I added a shortcut, but it seems wrong as is

Process context :

lock only (sk_lock.slock = locked, sk_lock.owned = ???)

So I should add a test on owned. If set (by another thread), we should take the slow path.

Thanks !



^ permalink raw reply

* Re: Warning in net/ipv4/af_inet.c:154
From: David Miller @ 2010-05-27  4:21 UTC (permalink / raw)
  To: eric.dumazet; +Cc: anton, netdev
In-Reply-To: <1274933926.2542.26.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 27 May 2010 06:18:46 +0200

> Process context :
> 
> lock only (sk_lock.slock = locked, sk_lock.owned = ???)
> 
> So I should add a test on owned. If set (by another thread), we should take the slow path.

Indeed, what you're doing now is broken.

If owned is non-zero when you take the spinlock you have to queue your
work, just like how we toss packets into the socket backlog, which is
processed by release_sock(), when this happens.

^ permalink raw reply

* Re: Warning in net/ipv4/af_inet.c:154
From: Eric Dumazet @ 2010-05-27  4:21 UTC (permalink / raw)
  To: David Miller; +Cc: anton, netdev
In-Reply-To: <20100526.210600.242135655.davem@davemloft.net>

Le mercredi 26 mai 2010 à 21:06 -0700, David Miller a écrit :
> From: Anton Blanchard <anton@samba.org>
> Date: Thu, 27 May 2010 13:56:17 +1000
> 
> > I'm somewhat confused by the two stage locking in the socket lock
> > (ie sk_lock.slock and sk_lock.owned).
> > 
> > What state should the socket lock be in for serialising updates of
> > sk_forward_alloc? In some cases we appear to update it with sk_lock.slock =
> > unlocked, sk_lock.owned = 1:
> 
> If sh_lock.owned=1 the user has grabbed exclusive sleepable lock on the
> socket, it does this via something approximating:
> 
> retry:
> 	spin_lock(&sk_lock.slock);
> 	was_locked = sk_lock.owned;
> 	if (!was_locked)
> 		sk_lock.owned = 1;
> 	spin_unlock(&sk_lock.slock);
> 	if (was_locked) {
> 		sleep_on(condition(sk_lock.owned));
> 		goto retry;
> 	}
> 
> This allows user context code to sleep with exclusive access to the
> socket.
> 
> Code that cannot sleep takes the spinlock, and queues the work if the
> owner field if non-zero.  Else, it keeps the spinlock held and does
> the work.
> 
> In either case, socket modifications are thus done completely protected
> from other contexts.
> 
> 

Yes, but not on the case one user context uses the 'slow locking', and
another uses the 'fast locking'.

I am working on a patch now.





^ permalink raw reply

* Re: [PATCH 3/3] net: deliver skbs on inactive slaves to exact matches
From: John Fastabend @ 2010-05-27  4:52 UTC (permalink / raw)
  To: andy@greyhouse.net, fubar@us.ibm.com, nhorman@tuxdriver.com,
	"bonding-devel@lists.sourceforge.net" <bonding-dev
In-Reply-To: <20100513073117.3528.33132.stgit@jf-dev2-dcblab>

Fastabend, John R wrote:
> Currently, the accelerated receive path for VLAN's will
> drop packets if the real device is an inactive slave and
> is not one of the special pkts tested for in
> skb_bond_should_drop().  This behavior is different then
> the non-accelerated path and for pkts over a bonded vlan.
> 
> For example,
> 
> vlanx -> bond0 -> ethx
> 
> will be dropped in the vlan path and not delivered to any
> packet handlers at all.  However,
> 
> bond0 -> vlanx -> ethx
> 
> and
> 
> bond0 -> ethx
> 
> will be delivered to handlers that match the exact dev,
> because the VLAN path checks the real_dev which is not a
> slave and netif_recv_skb() doesn't drop frames but only
> delivers them to exact matches.
> 
> This patch adds a sk_buff flag which is used for tagging
> skbs that would previously been dropped and allows the
> skb to continue to skb_netif_recv().  Here we add
> logic to check for the bond_should_drop flag and if it
> is set only deliver to handlers that match exactly.  This
> makes both paths above consistent and gives pkt handlers
> a way to identify skbs that come from inactive slaves.
> Without this patch in some configurations skbs will be
> delivered to handlers with exact matches and in others
> be dropped out right in the vlan path.
> 
> I have tested the following 4 configurations in failover modes
> and load balancing modes.
> 
> # bond0 -> ethx
> 
> # vlanx -> bond0 -> ethx
> 
> # bond0 -> vlanx -> ethx
> 
> # bond0 -> ethx
>             |
>   vlanx -> --
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
> 

Jay,

I know you were looking over this series at one point.  Did you have any 
comments?  Sorry for the ping just wanted to keep this on my radar.

Thanks,
John

^ permalink raw reply

* [PATCH] net: fix lock_sock_bh/unlock_sock_bh
From: Eric Dumazet @ 2010-05-27  5:06 UTC (permalink / raw)
  To: David Miller; +Cc: anton, netdev
In-Reply-To: <20100526.212105.212688828.davem@davemloft.net>

Le mercredi 26 mai 2010 à 21:21 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 27 May 2010 06:18:46 +0200
> 
> > Process context :
> > 
> > lock only (sk_lock.slock = locked, sk_lock.owned = ???)
> > 
> > So I should add a test on owned. If set (by another thread), we should take the slow path.
> 
> Indeed, what you're doing now is broken.
> 
> If owned is non-zero when you take the spinlock you have to queue your
> work, just like how we toss packets into the socket backlog, which is
> processed by release_sock(), when this happens.

Here is the patch I am going to test today.

Anton, could you please test it too, just for sure ?

Thanks !


[PATCH] net: fix lock_sock_bh/unlock_sock_bh

This new sock lock primitive was introduced to speedup some user context
socket manipulation. But it is unsafe to protect two threads, one using
regular lock_sock/release_sock, one using lock_sock_bh/unlock_sock_bh

This patch changes lock_sock_bh to be careful against 'owned' state.
If owned is found to be set, we must take the slow path.
lock_sock_bh() now returns a boolean to say if the slow path was taken,
and this boolean is used at unlock_sock_bh time to call the appropriate
unlock function.

After this change, BH are either disabled or enabled during the
lock_sock_bh/unlock_sock_bh protected section. This might be misleading,
so we rename these functions to lock_sock_fast()/unlock_sock_fast().

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/sock.h  |   20 ++++++++++++++------
 net/core/datagram.c |    6 ++++--
 net/core/sock.c     |   32 ++++++++++++++++++++++++++++++++
 net/ipv4/udp.c      |   14 ++++++++------
 net/ipv6/udp.c      |    5 +++--
 5 files changed, 61 insertions(+), 16 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index d2a71b0..ca241ea 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1026,15 +1026,23 @@ extern void release_sock(struct sock *sk);
 				SINGLE_DEPTH_NESTING)
 #define bh_unlock_sock(__sk)	spin_unlock(&((__sk)->sk_lock.slock))
 
-static inline void lock_sock_bh(struct sock *sk)
+extern bool lock_sock_fast(struct sock *sk);
+/**
+ * unlock_sock_fast - complement of lock_sock_fast
+ * @sk: socket
+ * @slow: slow mode
+ *
+ * fast unlock socket for user context.
+ * If slow mode is on, we call regular release_sock()
+ */
+static inline void unlock_sock_fast(struct sock *sk, bool slow)
 {
-	spin_lock_bh(&sk->sk_lock.slock);
+	if (slow)
+		release_sock(sk);
+	else
+		spin_unlock_bh(&sk->sk_lock.slock);
 }
 
-static inline void unlock_sock_bh(struct sock *sk)
-{
-	spin_unlock_bh(&sk->sk_lock.slock);
-}
 
 extern struct sock		*sk_alloc(struct net *net, int family,
 					  gfp_t priority,
diff --git a/net/core/datagram.c b/net/core/datagram.c
index e009753..f5b6f43 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -229,15 +229,17 @@ EXPORT_SYMBOL(skb_free_datagram);
 
 void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
 {
+	bool slow;
+
 	if (likely(atomic_read(&skb->users) == 1))
 		smp_rmb();
 	else if (likely(!atomic_dec_and_test(&skb->users)))
 		return;
 
-	lock_sock_bh(sk);
+	slow = lock_sock_fast(sk);
 	skb_orphan(skb);
 	sk_mem_reclaim_partial(sk);
-	unlock_sock_bh(sk);
+	unlock_sock_fast(sk, slow);
 
 	/* skb is now orphaned, can be freed outside of locked section */
 	__kfree_skb(skb);
diff --git a/net/core/sock.c b/net/core/sock.c
index 37fe9b6..7ab6398 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2007,6 +2007,38 @@ void release_sock(struct sock *sk)
 }
 EXPORT_SYMBOL(release_sock);
 
+/**
+ * lock_sock_fast - fast version of lock_sock
+ * @sk: socket
+ *
+ * This version should be used for very small section, where process wont block
+ * return false if fast path is taken
+ *   sk_lock.slock locked, owned = 0, BH disabled
+ * return true if slow path is taken
+ *   sk_lock.slock unlocked, owned = 1, BH enabled
+ */
+bool lock_sock_fast(struct sock *sk)
+{
+	might_sleep();
+	spin_lock_bh(&sk->sk_lock.slock);
+
+	if (!sk->sk_lock.owned)
+		/*
+		 * Note : We must disable BH or risk a deadlock
+		 */
+		return false;
+
+	__lock_sock(sk);
+	sk->sk_lock.owned = 1;
+	spin_unlock(&sk->sk_lock.slock);
+	/*
+	 * The sk_lock has mutex_lock() semantics here:
+	 */
+	mutex_acquire(&sk->sk_lock.dep_map, 0, 0, _RET_IP_);
+	return true;
+}
+EXPORT_SYMBOL(lock_sock_fast);
+
 int sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp)
 {
 	struct timeval tv;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9de6a69..b9d0d40 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1063,10 +1063,11 @@ static unsigned int first_packet_length(struct sock *sk)
 	spin_unlock_bh(&rcvq->lock);
 
 	if (!skb_queue_empty(&list_kill)) {
-		lock_sock_bh(sk);
+		bool slow = lock_sock_fast(sk);
+
 		__skb_queue_purge(&list_kill);
 		sk_mem_reclaim_partial(sk);
-		unlock_sock_bh(sk);
+		unlock_sock_fast(sk, slow);
 	}
 	return res;
 }
@@ -1123,6 +1124,7 @@ int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	int peeked;
 	int err;
 	int is_udplite = IS_UDPLITE(sk);
+	bool slow;
 
 	/*
 	 *	Check any passed addresses
@@ -1197,10 +1199,10 @@ out:
 	return err;
 
 csum_copy_err:
-	lock_sock_bh(sk);
+	slow = lock_sock_fast(sk);
 	if (!skb_kill_datagram(sk, skb, flags))
 		UDP_INC_STATS_USER(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
-	unlock_sock_bh(sk);
+	unlock_sock_fast(sk, slow);
 
 	if (noblock)
 		return -EAGAIN;
@@ -1625,9 +1627,9 @@ int udp_rcv(struct sk_buff *skb)
 
 void udp_destroy_sock(struct sock *sk)
 {
-	lock_sock_bh(sk);
+	bool slow = lock_sock_fast(sk);
 	udp_flush_pending_frames(sk);
-	unlock_sock_bh(sk);
+	unlock_sock_fast(sk, slow);
 }
 
 /*
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 3d7a2c0..87be586 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -328,6 +328,7 @@ int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 	int err;
 	int is_udplite = IS_UDPLITE(sk);
 	int is_udp4;
+	bool slow;
 
 	if (addr_len)
 		*addr_len=sizeof(struct sockaddr_in6);
@@ -424,7 +425,7 @@ out:
 	return err;
 
 csum_copy_err:
-	lock_sock_bh(sk);
+	slow = lock_sock_fast(sk);
 	if (!skb_kill_datagram(sk, skb, flags)) {
 		if (is_udp4)
 			UDP_INC_STATS_USER(sock_net(sk),
@@ -433,7 +434,7 @@ csum_copy_err:
 			UDP6_INC_STATS_USER(sock_net(sk),
 					UDP_MIB_INERRORS, is_udplite);
 	}
-	unlock_sock_bh(sk);
+	unlock_sock_fast(sk, slow);
 
 	if (flags & MSG_DONTWAIT)
 		return -EAGAIN;



^ permalink raw reply related

* Re: [PATCH] net: fix lock_sock_bh/unlock_sock_bh
From: Eric Dumazet @ 2010-05-27  5:20 UTC (permalink / raw)
  To: David Miller; +Cc: anton, netdev
In-Reply-To: <1274936763.2542.42.camel@edumazet-laptop>

Le jeudi 27 mai 2010 à 07:06 +0200, Eric Dumazet a écrit :
> Le mercredi 26 mai 2010 à 21:21 -0700, David Miller a écrit :
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Thu, 27 May 2010 06:18:46 +0200
> > 
> > > Process context :
> > > 
> > > lock only (sk_lock.slock = locked, sk_lock.owned = ???)
> > > 
> > > So I should add a test on owned. If set (by another thread), we should take the slow path.
> > 
> > Indeed, what you're doing now is broken.
> > 
> > If owned is non-zero when you take the spinlock you have to queue your
> > work, just like how we toss packets into the socket backlog, which is
> > processed by release_sock(), when this happens.
> 
> Here is the patch I am going to test today.
> 
> Anton, could you please test it too, just for sure ?
> 
> Thanks !
> 
> 
> [PATCH] net: fix lock_sock_bh/unlock_sock_bh
> 
> This new sock lock primitive was introduced to speedup some user context
> socket manipulation. But it is unsafe to protect two threads, one using
> regular lock_sock/release_sock, one using lock_sock_bh/unlock_sock_bh
> 
> This patch changes lock_sock_bh to be careful against 'owned' state.
> If owned is found to be set, we must take the slow path.
> lock_sock_bh() now returns a boolean to say if the slow path was taken,
> and this boolean is used at unlock_sock_bh time to call the appropriate
> unlock function.
> 
> After this change, BH are either disabled or enabled during the
> lock_sock_bh/unlock_sock_bh protected section. This might be misleading,
> so we rename these functions to lock_sock_fast()/unlock_sock_fast().
> 
> Reported-by: Anton Blanchard <anton@samba.org>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  include/net/sock.h  |   20 ++++++++++++++------
>  net/core/datagram.c |    6 ++++--
>  net/core/sock.c     |   32 ++++++++++++++++++++++++++++++++
>  net/ipv4/udp.c      |   14 ++++++++------
>  net/ipv6/udp.c      |    5 +++--
>  5 files changed, 61 insertions(+), 16 deletions(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index d2a71b0..ca241ea 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1026,15 +1026,23 @@ extern void release_sock(struct sock *sk);
>  				SINGLE_DEPTH_NESTING)
>  #define bh_unlock_sock(__sk)	spin_unlock(&((__sk)->sk_lock.slock))
>  
> -static inline void lock_sock_bh(struct sock *sk)
> +extern bool lock_sock_fast(struct sock *sk);
> +/**
> + * unlock_sock_fast - complement of lock_sock_fast
> + * @sk: socket
> + * @slow: slow mode
> + *
> + * fast unlock socket for user context.
> + * If slow mode is on, we call regular release_sock()
> + */
> +static inline void unlock_sock_fast(struct sock *sk, bool slow)
>  {
> -	spin_lock_bh(&sk->sk_lock.slock);
> +	if (slow)
> +		release_sock(sk);
> +	else
> +		spin_unlock_bh(&sk->sk_lock.slock);
>  }
>  
> -static inline void unlock_sock_bh(struct sock *sk)
> -{
> -	spin_unlock_bh(&sk->sk_lock.slock);
> -}
>  
>  extern struct sock		*sk_alloc(struct net *net, int family,
>  					  gfp_t priority,
> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index e009753..f5b6f43 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -229,15 +229,17 @@ EXPORT_SYMBOL(skb_free_datagram);
>  
>  void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
>  {
> +	bool slow;
> +
>  	if (likely(atomic_read(&skb->users) == 1))
>  		smp_rmb();
>  	else if (likely(!atomic_dec_and_test(&skb->users)))
>  		return;
>  
> -	lock_sock_bh(sk);
> +	slow = lock_sock_fast(sk);
>  	skb_orphan(skb);
>  	sk_mem_reclaim_partial(sk);
> -	unlock_sock_bh(sk);
> +	unlock_sock_fast(sk, slow);
>  
>  	/* skb is now orphaned, can be freed outside of locked section */
>  	__kfree_skb(skb);
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 37fe9b6..7ab6398 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2007,6 +2007,38 @@ void release_sock(struct sock *sk)
>  }
>  EXPORT_SYMBOL(release_sock);
>  
> +/**
> + * lock_sock_fast - fast version of lock_sock
> + * @sk: socket
> + *
> + * This version should be used for very small section, where process wont block
> + * return false if fast path is taken
> + *   sk_lock.slock locked, owned = 0, BH disabled
> + * return true if slow path is taken
> + *   sk_lock.slock unlocked, owned = 1, BH enabled
> + */
> +bool lock_sock_fast(struct sock *sk)
> +{
> +	might_sleep();
> +	spin_lock_bh(&sk->sk_lock.slock);
> +
> +	if (!sk->sk_lock.owned)
> +		/*
> +		 * Note : We must disable BH or risk a deadlock
> +		 */
> +		return false;
> +
> +	__lock_sock(sk);
> +	sk->sk_lock.owned = 1;
> +	spin_unlock(&sk->sk_lock.slock);
> +	/*
> +	 * The sk_lock has mutex_lock() semantics here:
> +	 */
> +	mutex_acquire(&sk->sk_lock.dep_map, 0, 0, _RET_IP_);

Oops, I missed a local_bh_enable(); here

> +	return true;
> +}
> +EXPORT_SYMBOL(lock_sock_fast);
> +
>  int sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp)
>  {
>  	struct timeval tv;

Here is v2 of patch with local_bh_enable(); added in lock_sock_fast()

[PATCH v2] net: fix lock_sock_bh/unlock_sock_bh

This new sock lock primitive was introduced to speedup some user context
socket manipulation. But it is unsafe to protect two threads, one using
regular lock_sock/release_sock, one using lock_sock_bh/unlock_sock_bh

This patch changes lock_sock_bh to be careful against 'owned' state.
If owned is found to be set, we must take the slow path.
lock_sock_bh() now returns a boolean to say if the slow path was taken,
and this boolean is used at unlock_sock_bh time to call the appropriate
unlock function.

After this change, BH are either disabled or enabled during the
lock_sock_bh/unlock_sock_bh protected section. This might be misleading,
so we rename these functions to lock_sock_fast()/unlock_sock_fast().

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/sock.h  |   20 ++++++++++++++------
 net/core/datagram.c |    6 ++++--
 net/core/sock.c     |   33 +++++++++++++++++++++++++++++++++
 net/ipv4/udp.c      |   14 ++++++++------
 net/ipv6/udp.c      |    5 +++--
 5 files changed, 62 insertions(+), 16 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index d2a71b0..ca241ea 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1026,15 +1026,23 @@ extern void release_sock(struct sock *sk);
 				SINGLE_DEPTH_NESTING)
 #define bh_unlock_sock(__sk)	spin_unlock(&((__sk)->sk_lock.slock))
 
-static inline void lock_sock_bh(struct sock *sk)
+extern bool lock_sock_fast(struct sock *sk);
+/**
+ * unlock_sock_fast - complement of lock_sock_fast
+ * @sk: socket
+ * @slow: slow mode
+ *
+ * fast unlock socket for user context.
+ * If slow mode is on, we call regular release_sock()
+ */
+static inline void unlock_sock_fast(struct sock *sk, bool slow)
 {
-	spin_lock_bh(&sk->sk_lock.slock);
+	if (slow)
+		release_sock(sk);
+	else
+		spin_unlock_bh(&sk->sk_lock.slock);
 }
 
-static inline void unlock_sock_bh(struct sock *sk)
-{
-	spin_unlock_bh(&sk->sk_lock.slock);
-}
 
 extern struct sock		*sk_alloc(struct net *net, int family,
 					  gfp_t priority,
diff --git a/net/core/datagram.c b/net/core/datagram.c
index e009753..f5b6f43 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -229,15 +229,17 @@ EXPORT_SYMBOL(skb_free_datagram);
 
 void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
 {
+	bool slow;
+
 	if (likely(atomic_read(&skb->users) == 1))
 		smp_rmb();
 	else if (likely(!atomic_dec_and_test(&skb->users)))
 		return;
 
-	lock_sock_bh(sk);
+	slow = lock_sock_fast(sk);
 	skb_orphan(skb);
 	sk_mem_reclaim_partial(sk);
-	unlock_sock_bh(sk);
+	unlock_sock_fast(sk, slow);
 
 	/* skb is now orphaned, can be freed outside of locked section */
 	__kfree_skb(skb);
diff --git a/net/core/sock.c b/net/core/sock.c
index 37fe9b6..2cf7f9f 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2007,6 +2007,39 @@ void release_sock(struct sock *sk)
 }
 EXPORT_SYMBOL(release_sock);
 
+/**
+ * lock_sock_fast - fast version of lock_sock
+ * @sk: socket
+ *
+ * This version should be used for very small section, where process wont block
+ * return false if fast path is taken
+ *   sk_lock.slock locked, owned = 0, BH disabled
+ * return true if slow path is taken
+ *   sk_lock.slock unlocked, owned = 1, BH enabled
+ */
+bool lock_sock_fast(struct sock *sk)
+{
+	might_sleep();
+	spin_lock_bh(&sk->sk_lock.slock);
+
+	if (!sk->sk_lock.owned)
+		/*
+		 * Note : We must disable BH
+		 */
+		return false;
+
+	__lock_sock(sk);
+	sk->sk_lock.owned = 1;
+	spin_unlock(&sk->sk_lock.slock);
+	/*
+	 * The sk_lock has mutex_lock() semantics here:
+	 */
+	mutex_acquire(&sk->sk_lock.dep_map, 0, 0, _RET_IP_);
+	local_bh_enable();
+	return true;
+}
+EXPORT_SYMBOL(lock_sock_fast);
+
 int sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp)
 {
 	struct timeval tv;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9de6a69..b9d0d40 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1063,10 +1063,11 @@ static unsigned int first_packet_length(struct sock *sk)
 	spin_unlock_bh(&rcvq->lock);
 
 	if (!skb_queue_empty(&list_kill)) {
-		lock_sock_bh(sk);
+		bool slow = lock_sock_fast(sk);
+
 		__skb_queue_purge(&list_kill);
 		sk_mem_reclaim_partial(sk);
-		unlock_sock_bh(sk);
+		unlock_sock_fast(sk, slow);
 	}
 	return res;
 }
@@ -1123,6 +1124,7 @@ int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	int peeked;
 	int err;
 	int is_udplite = IS_UDPLITE(sk);
+	bool slow;
 
 	/*
 	 *	Check any passed addresses
@@ -1197,10 +1199,10 @@ out:
 	return err;
 
 csum_copy_err:
-	lock_sock_bh(sk);
+	slow = lock_sock_fast(sk);
 	if (!skb_kill_datagram(sk, skb, flags))
 		UDP_INC_STATS_USER(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
-	unlock_sock_bh(sk);
+	unlock_sock_fast(sk, slow);
 
 	if (noblock)
 		return -EAGAIN;
@@ -1625,9 +1627,9 @@ int udp_rcv(struct sk_buff *skb)
 
 void udp_destroy_sock(struct sock *sk)
 {
-	lock_sock_bh(sk);
+	bool slow = lock_sock_fast(sk);
 	udp_flush_pending_frames(sk);
-	unlock_sock_bh(sk);
+	unlock_sock_fast(sk, slow);
 }
 
 /*
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 3d7a2c0..87be586 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -328,6 +328,7 @@ int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 	int err;
 	int is_udplite = IS_UDPLITE(sk);
 	int is_udp4;
+	bool slow;
 
 	if (addr_len)
 		*addr_len=sizeof(struct sockaddr_in6);
@@ -424,7 +425,7 @@ out:
 	return err;
 
 csum_copy_err:
-	lock_sock_bh(sk);
+	slow = lock_sock_fast(sk);
 	if (!skb_kill_datagram(sk, skb, flags)) {
 		if (is_udp4)
 			UDP_INC_STATS_USER(sock_net(sk),
@@ -433,7 +434,7 @@ csum_copy_err:
 			UDP6_INC_STATS_USER(sock_net(sk),
 					UDP_MIB_INERRORS, is_udplite);
 	}
-	unlock_sock_bh(sk);
+	unlock_sock_fast(sk, slow);
 
 	if (flags & MSG_DONTWAIT)
 		return -EAGAIN;



^ permalink raw reply related

* Re: [PATCH] net: fix lock_sock_bh/unlock_sock_bh
From: David Miller @ 2010-05-27  5:23 UTC (permalink / raw)
  To: eric.dumazet; +Cc: anton, netdev
In-Reply-To: <1274937618.2542.46.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 27 May 2010 07:20:18 +0200

> [PATCH v2] net: fix lock_sock_bh/unlock_sock_bh
> 
> This new sock lock primitive was introduced to speedup some user context
> socket manipulation. But it is unsafe to protect two threads, one using
> regular lock_sock/release_sock, one using lock_sock_bh/unlock_sock_bh
> 
> This patch changes lock_sock_bh to be careful against 'owned' state.
> If owned is found to be set, we must take the slow path.
> lock_sock_bh() now returns a boolean to say if the slow path was taken,
> and this boolean is used at unlock_sock_bh time to call the appropriate
> unlock function.
> 
> After this change, BH are either disabled or enabled during the
> lock_sock_bh/unlock_sock_bh protected section. This might be misleading,
> so we rename these functions to lock_sock_fast()/unlock_sock_fast().
> 
> Reported-by: Anton Blanchard <anton@samba.org>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Looks good, I'll wait for positive testing from Anton before applying
this.

^ permalink raw reply

* [PATCH] net: add rtnl assertion to linkwatch_run_queue
From: Denis Kirjanov @ 2010-05-27  5:51 UTC (permalink / raw)
  To: davem; +Cc: eric.dumazet, shemminger, herbert, kaber, netdev

Add RTNL assertion to linkwatch_run_queue since function 
must be called with semaphore held

Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
---
net/core/link_watch.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index bdbce2f..e6d06c0 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -220,6 +220,7 @@ void linkwatch_forget_dev(struct net_device *dev)
 /* Must be called with the rtnl semaphore held */
 void linkwatch_run_queue(void)
 {
+	ASSERT_RTNL();
 	__linkwatch_run_queue(0);
 }
 

^ permalink raw reply related

* [PATCH] mlx4_en: use net_device dev_id to indicate port number
From: Eli Cohen @ 2010-05-27  5:56 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	rdreier-FYB4Gu1CFyUAvxtiuMwx3w, yevgenyp-VPRAkNaXOzVS1MOuV/RT9w,
	shemminger-ZtmgI6mnKB3QT0dZR+AlfA

Today, there are no means to know which port of a hardware device a netdev
interface uses. struct net_device conatins a field, dev_id, that can be used
for that. Use this field to save the port number in ConnectX that is being used
by the net device; port numbers are zero based.

Signed-off-by: Eli Cohen <eli-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
---
 drivers/net/mlx4/en_netdev.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/mlx4/en_netdev.c b/drivers/net/mlx4/en_netdev.c
index 6c2b15b..ab1d480 100644
--- a/drivers/net/mlx4/en_netdev.c
+++ b/drivers/net/mlx4/en_netdev.c
@@ -978,6 +978,7 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	}
 
 	SET_NETDEV_DEV(dev, &mdev->dev->pdev->dev);
+	dev->dev_id =  port - 1;
 
 	/*
 	 * Initialize driver private data
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH 3/3] cxgb4i_v3: iscsi and libcxgbi library for handling common part
From: Mike Christie @ 2010-05-27  5:59 UTC (permalink / raw)
  To: open-iscsi
  Cc: Rakesh Ranjan, NETDEVML, SCSIDEVML, LKML, Karen Xie, David Miller,
	James Bottomley, Anish Bhatt, Rakesh Ranjan
In-Reply-To: <1273944249-311-4-git-send-email-rakesh@chelsio.com>

On 05/15/2010 12:24 PM, Rakesh Ranjan wrote:
> From: Rakesh Ranjan<rranjan@chelsio.com>
>
>
> Signed-off-by: Rakesh Ranjan<rakesh@chelsio.com>
> ---
>   drivers/scsi/cxgb4i/cxgb4i_iscsi.c |  617 ++++++++++++++++++++++++++++++++++++
>   drivers/scsi/cxgb4i/libcxgbi.c     |  589 ++++++++++++++++++++++++++++++++++
>   drivers/scsi/cxgb4i/libcxgbi.h     |  430 +++++++++++++++++++++++++

I think the patch had some whitespace/newline issues. When I did git-am 
I got:

warning: squelched 1 whitespace error
warning: 6 lines add whitespace errors.

I think James can just fix up when he merges with git, but in the future 
you might want to try a git-am on your patch before you send (git-am 
--whitespace=fix will fix it up for you).



> +
> +int cxgbi_pdu_init(struct cxgbi_device *cdev)
> +{
> +	if (cdev->skb_tx_headroom>  (512 * MAX_SKB_FRAGS))
> +		cdev->skb_extra_headroom = cdev->skb_tx_headroom;
> +	cdev->pad_page = alloc_page(GFP_KERNEL);
> +	if (cdev->pad_page)
> +		return -ENOMEM;
> +	memset(page_address(cdev->pad_page), 0, PAGE_SIZE);
> +	return 0;
> +
> +	/*
> +	if (SKB_TX_HEADROOM>  (512 * MAX_SKB_FRAGS))
> +		skb_extra_headroom = SKB_TX_HEADROOM;
> +	pad_page = alloc_page(GFP_KERNEL);
> +	if (!pad_page)
> +		return -ENOMEM;
> +	memset(page_address(pad_page), 0, PAGE_SIZE);
> +	return 0;*/

Clean this up.

> +
> +void cxgbi_release_itt(struct iscsi_task *task, itt_t hdr_itt)
> +{
> +	struct scsi_cmnd *sc = task->sc;
> +	struct iscsi_tcp_conn *tcp_conn = task->conn->dd_data;
> +	struct cxgbi_conn *cconn = tcp_conn->dd_data;
> +	struct cxgbi_device *cdev = cconn->chba->cdev;
> +	struct cxgbi_tag_format *tformat =&cdev->tag_format;
> +	u32 tag = ntohl((__force u32)hdr_itt);
> +
> +	cxgbi_tag_debug("release tag 0x%x.\n", tag);
> +
> +	if (sc&&
> +		(scsi_bidi_cmnd(sc) ||
> +		 sc->sc_data_direction == DMA_FROM_DEVICE)&&
> +			cxgbi_is_ddp_tag(tformat, tag))

The formatting is a little weird. I think you want each line to start in 
the same column instead of each getting tabbed over.


> +
> +
> +int cxgbi_reserve_itt(struct iscsi_task *task, itt_t *hdr_itt)
> +{
> +	struct scsi_cmnd *sc = task->sc;
> +	struct iscsi_conn *conn = task->conn;
> +	struct iscsi_session *sess = conn->session;
> +	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
> +	struct cxgbi_conn *cconn = tcp_conn->dd_data;
> +	struct cxgbi_device *cdev = cconn->chba->cdev;
> +	struct cxgbi_tag_format *tformat =&cdev->tag_format;
> +	u32 sw_tag = (sess->age<<  cconn->task_idx_bits) | task->itt;
> +	u32 tag;
> +	int err = -EINVAL;
> +
> +	if (sc&&
> +		(scsi_bidi_cmnd(sc) ||
> +		 sc->sc_data_direction == DMA_FROM_DEVICE)&&
> +			cxgbi_sw_tag_usable(tformat, sw_tag)) {


Same tabbing.


> +	volatile unsigned int state;

I did not get why this needed to be volatile (I see it is like this in 
cxgb3i and I did not see why in there too).



> +
> +static inline void cxgbi_sock_hold(struct cxgbi_sock *csk)
> +{
> +	atomic_inc(&csk->refcnt);


We want people to use krefs instead of their own refcounting now.

> +
> +static inline void *cplhdr(struct sk_buff *skb)
> +{
> +	return skb->data;
> +}


Seems kinda useless.

^ permalink raw reply

* Re: [PATCH 3/3] cxgb4i_v3: iscsi and libcxgbi library for handling common part
From: Rakesh Ranjan @ 2010-05-27  6:05 UTC (permalink / raw)
  To: Mike Christie
  Cc: open-iscsi-/JYPxA39Uh5TLH3MbocFFw, Rakesh Ranjan, NETDEVML,
	SCSIDEVML, LKML, Karen Xie, David Miller, James Bottomley,
	Anish Bhatt
In-Reply-To: <4BFE0A3E.9020804-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 05/27/2010 11:29 AM, Mike Christie wrote:
> On 05/15/2010 12:24 PM, Rakesh Ranjan wrote:
>> From: Rakesh Ranjan<rranjan-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
>>
>>
>> Signed-off-by: Rakesh Ranjan<rakesh-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
>> ---
>>   drivers/scsi/cxgb4i/cxgb4i_iscsi.c |  617
>> ++++++++++++++++++++++++++++++++++++
>>   drivers/scsi/cxgb4i/libcxgbi.c     |  589
>> ++++++++++++++++++++++++++++++++++
>>   drivers/scsi/cxgb4i/libcxgbi.h     |  430 +++++++++++++++++++++++++
> 
> I think the patch had some whitespace/newline issues. When I did git-am
> I got:
> 
> warning: squelched 1 whitespace error
> warning: 6 lines add whitespace errors.
> 
> I think James can just fix up when he merges with git, but in the future
> you might want to try a git-am on your patch before you send (git-am
> --whitespace=fix will fix it up for you).
> 
> 
> 
>> +
>> +int cxgbi_pdu_init(struct cxgbi_device *cdev)
>> +{
>> +    if (cdev->skb_tx_headroom>  (512 * MAX_SKB_FRAGS))
>> +        cdev->skb_extra_headroom = cdev->skb_tx_headroom;
>> +    cdev->pad_page = alloc_page(GFP_KERNEL);
>> +    if (cdev->pad_page)
>> +        return -ENOMEM;
>> +    memset(page_address(cdev->pad_page), 0, PAGE_SIZE);
>> +    return 0;
>> +
>> +    /*
>> +    if (SKB_TX_HEADROOM>  (512 * MAX_SKB_FRAGS))
>> +        skb_extra_headroom = SKB_TX_HEADROOM;
>> +    pad_page = alloc_page(GFP_KERNEL);
>> +    if (!pad_page)
>> +        return -ENOMEM;
>> +    memset(page_address(pad_page), 0, PAGE_SIZE);
>> +    return 0;*/
> 
> Clean this up.
> 
>> +
>> +void cxgbi_release_itt(struct iscsi_task *task, itt_t hdr_itt)
>> +{
>> +    struct scsi_cmnd *sc = task->sc;
>> +    struct iscsi_tcp_conn *tcp_conn = task->conn->dd_data;
>> +    struct cxgbi_conn *cconn = tcp_conn->dd_data;
>> +    struct cxgbi_device *cdev = cconn->chba->cdev;
>> +    struct cxgbi_tag_format *tformat =&cdev->tag_format;
>> +    u32 tag = ntohl((__force u32)hdr_itt);
>> +
>> +    cxgbi_tag_debug("release tag 0x%x.\n", tag);
>> +
>> +    if (sc&&
>> +        (scsi_bidi_cmnd(sc) ||
>> +         sc->sc_data_direction == DMA_FROM_DEVICE)&&
>> +            cxgbi_is_ddp_tag(tformat, tag))
> 
> The formatting is a little weird. I think you want each line to start in
> the same column instead of each getting tabbed over.
> 
> 
>> +
>> +
>> +int cxgbi_reserve_itt(struct iscsi_task *task, itt_t *hdr_itt)
>> +{
>> +    struct scsi_cmnd *sc = task->sc;
>> +    struct iscsi_conn *conn = task->conn;
>> +    struct iscsi_session *sess = conn->session;
>> +    struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
>> +    struct cxgbi_conn *cconn = tcp_conn->dd_data;
>> +    struct cxgbi_device *cdev = cconn->chba->cdev;
>> +    struct cxgbi_tag_format *tformat =&cdev->tag_format;
>> +    u32 sw_tag = (sess->age<<  cconn->task_idx_bits) | task->itt;
>> +    u32 tag;
>> +    int err = -EINVAL;
>> +
>> +    if (sc&&
>> +        (scsi_bidi_cmnd(sc) ||
>> +         sc->sc_data_direction == DMA_FROM_DEVICE)&&
>> +            cxgbi_sw_tag_usable(tformat, sw_tag)) {
> 
> 
> Same tabbing.
> 
> 
>> +    volatile unsigned int state;
> 
> I did not get why this needed to be volatile (I see it is like this in
> cxgb3i and I did not see why in there too).
> 
> 
> 
>> +
>> +static inline void cxgbi_sock_hold(struct cxgbi_sock *csk)
>> +{
>> +    atomic_inc(&csk->refcnt);
> 
> 
> We want people to use krefs instead of their own refcounting now.
> 
>> +
>> +static inline void *cplhdr(struct sk_buff *skb)
>> +{
>> +    return skb->data;
>> +}
> 
> 
> Seems kinda useless.

Hi Mike,

Thanks for the review, will send you the updated patch.

Regards
Rakesh Ranjan

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iQIcBAEBCgAGBQJL/guKAAoJEBqoHbxtDU4JFvQP/2uIWFmZo4nqcJ7xmEiBjnaq
BeUfjbRGaiAImEM/6OuMIB/Fm9p9NGtW0GsZdqKYSdZBmgO1kWNT1onCZgVF4eW1
Muikjo+I2wXGsWIt4ZqbZs722COC1oQHXpyulfh/6ONzeWJ3R8vos/TnCw256Wxj
HL42jIO882JCImzmP/wmdPmDlTUI/Z0UEYbu0djy20yzESo2NDHq1PYYopyXIboH
joJ/sZvK0jRYrRsDdq84XJ8CUv1yDE8m3NiMV8cV9JN1EwP/gzBrq0DXfFgpdEDU
+4/pfzzxJ5y/ekdZaSZdx9ke/n/vmamJCaQONaL2WfQaznogxqZypaoM/NRzfMlH
40KY5lqajGRnm0aEBffn8uqBzEbopYpb1m+3mzt/vDtvdBRuSJqmcPbUandTNHLe
cZQnn689GtWqJfoyVF/dfyubtAZFLu8VVZkFxx1e3UDEqVDUuwVEHXoXSt6K/SkB
UnZI7hMUYSof+WI+UEV8DR4KMsRbE1cnvXGnsXTZOMhMsU3KoMielaGTgepAbxT7
bLfaARiMwvU+vle5546uK8IuxjGH81nbUEy6R86OeS6Osv6PZFiGYP9yxjzkp4K7
NWOTnVV9qTaPPTtN9SORINPXUXuI90JgIDbh9qnfDDKl2oj5HmwBpa7S3KmMxxlI
eoAqa/9LLjrMVi5Wicv3
=TOBq
-----END PGP SIGNATURE-----

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.

^ permalink raw reply

* Re: [PATCH] net: fix lock_sock_bh/unlock_sock_bh
From: Anton Blanchard @ 2010-05-27  6:09 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev
In-Reply-To: <20100526.222332.233688655.davem@davemloft.net>


> > [PATCH v2] net: fix lock_sock_bh/unlock_sock_bh
> > 
> > This new sock lock primitive was introduced to speedup some user context
> > socket manipulation. But it is unsafe to protect two threads, one using
> > regular lock_sock/release_sock, one using lock_sock_bh/unlock_sock_bh
> > 
> > This patch changes lock_sock_bh to be careful against 'owned' state.
> > If owned is found to be set, we must take the slow path.
> > lock_sock_bh() now returns a boolean to say if the slow path was taken,
> > and this boolean is used at unlock_sock_bh time to call the appropriate
> > unlock function.
> > 
> > After this change, BH are either disabled or enabled during the
> > lock_sock_bh/unlock_sock_bh protected section. This might be misleading,
> > so we rename these functions to lock_sock_fast()/unlock_sock_fast().
> > 
> > Reported-by: Anton Blanchard <anton@samba.org>
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Looks good, I'll wait for positive testing from Anton before applying
> this.

Thanks guys, this fixed it.

Tested-by: Anton Blanchard <anton@samba.org>

Anton

^ permalink raw reply

* Re: [patch 1/2] be2net: add unlock on error path
From: Sarveshwar Bandi @ 2010-05-27  6:31 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Sathya Perla, Subbu Seetharaman, Ajit Khaparde, David S. Miller,
	netdev, kernel-janitors
In-Reply-To: <20100526144634.GL22515@bicker>

Thanks

Acked-by: Sarveshwar Bandi <sarveshwarb@serverengines.com> 
On 26/05/10 16:46 +0200, Dan Carpenter wrote:
> The unlock accidentally got removed from the error path in dd131e76e5:
> "be2net: Bug fix to avoid disabling bottom half during firmware upgrade."
>
> Signed-off-by: Dan Carpenter <error27@gmail.com>
> 
> diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
> index c911bfb..18d5789 100644
> --- a/drivers/net/benet/be_cmds.c
> +++ b/drivers/net/benet/be_cmds.c
> @@ -1429,7 +1429,7 @@ int be_cmd_write_flashrom(struct be_adapter *adapter, struct be_dma_mem *cmd,
>  	wrb = wrb_from_mccq(adapter);
>  	if (!wrb) {
>  		status = -EBUSY;
> -		goto err;
> +		goto err_unlock;
>  	}
>  	req = cmd->va;
>  	sge = nonembedded_sgl(wrb);
> @@ -1457,7 +1457,10 @@ int be_cmd_write_flashrom(struct be_adapter *adapter, struct be_dma_mem *cmd,
>  	else
>  		status = adapter->flash_status;
>  
> -err:
> +	return status;
> +
> +err_unlock:
> +	spin_unlock_bh(&adapter->mcc_lock);
>  	return status;
>  }
>  
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [patch 2/2] be2net: remove superfluous externs
From: Sarveshwar Bandi @ 2010-05-27  6:32 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Sathya Perla, Subbu Seetharaman, Ajit Khaparde, David S. Miller,
	netdev, kernel-janitors
In-Reply-To: <20100526144739.GM22515@bicker>

Thanks.

Acked-by: Sarveshwar Bandi <sarveshwarb@serverengines.com>
On 26/05/10 16:47 +0200, Dan Carpenter wrote:
> This fixes some sparse warnings:
> drivers/net/benet/be_cmds.c:1503:12: warning: function
> 	'be_cmd_enable_magic_wol' with external linkage has definition
> drivers/net/benet/be_cmds.c:1668:12: warning: function
> 	'be_cmd_get_seeprom_data' with external linkage has definition
> 
> Signed-off-by: Dan Carpenter <error27@gmail.com>
> 
> diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
> index c911bfb..f9c0d4d 100644
> --- a/drivers/net/benet/be_cmds.c
> +++ b/drivers/net/benet/be_cmds.c
> @@ -1497,7 +1500,7 @@ err:
>  	return status;
>  }
>  
> -extern int be_cmd_enable_magic_wol(struct be_adapter *adapter, u8 *mac,
> +int be_cmd_enable_magic_wol(struct be_adapter *adapter, u8 *mac,
>  				struct be_dma_mem *nonemb_cmd)
>  {
>  	struct be_mcc_wrb *wrb;
> @@ -1662,7 +1665,7 @@ err:
>  	return status;
>  }
>  
> -extern int be_cmd_get_seeprom_data(struct be_adapter *adapter,
> +int be_cmd_get_seeprom_data(struct be_adapter *adapter,
>  				struct be_dma_mem *nonemb_cmd)
>  {
>  	struct be_mcc_wrb *wrb;
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC] IFLA_PORT_* iproute2 cmd line
From: Arnd Bergmann @ 2010-05-27  7:00 UTC (permalink / raw)
  To: Scott Feldman
  Cc: netdev, Chris Wright, Stephen Hemminger, Stefan Berger,
	Dirk Herrendoerfer, Vivek Kashyap
In-Reply-To: <C8228316.35508%scofeldm@cisco.com>

On Wednesday 26 May 2010, Scott Feldman wrote:
> On 5/26/10 5:38 AM, "Arnd Bergmann" <arnd@arndb.de> wrote:
> 
> How does this strike you?
> 
>   Usage: ip link add link DEV [ name ] NAME
>                      [ txqueuelen PACKETS ]
>                      [ address LLADDR ]
>                      [ broadcast LLADDR ]
>                      [ mtu MTU ]
>                      type TYPE [ ARGS ]
>          ip link delete DEV type TYPE [ ARGS ]
> 
>          ip link set DEVICE [ { up | down } ]
>                             [ arp { on | off } ]
>                             [ dynamic { on | off } ]
>                             [ multicast { on | off } ]
>                             [ allmulticast { on | off } ]
>                             [ promisc { on | off } ]
>                             [ trailers { on | off } ]
>                             [ txqueuelen PACKETS ]
>                             [ name NEWNAME ]
>                             [ address LLADDR ]
>                             [ broadcast LLADDR ]
>                             [ mtu MTU ]
>                             [ netns PID ]
>                             [ alias NAME ]
> +                           [ virtualport MODE {PROFILE|VSI} ]
>                             [ vf NUM [ mac LLADDR ]
>                                      [ vlan VLANID [ qos VLAN-QOS ] ]
> -                                    [ rate TXRATE ] ]
> +                                    [ rate TXRATE ]
> +                                    [ virtualport MODE {PROFILE|VSI} ] ]
>          ip link show [ DEVICE ]
> 
>   TYPE := { vlan | veth | vcan | dummy | ifb | macvlan | can }
> +
> + MODE := { associate | preassociate | preassociaterr | disassociate }
> +
> + PROFILE := port-profile PORT-PROFILE
> +            [ instance-uuid INSTANCE-UUID ]
> +            [ host-uuid HOST-UUID ]
> + 
> + VSI := vsi managerid MGR typeid VTID typeidversion VER
> +        [ instance-uuid INSTANCE-UUID ]

Right, exactly what I was thinking of. I would probably use
some shorter keywords (virtual-port -> port, port-profile -> profile,
instance-uuid -> instance, host-uuid -> host), but I really
don't have a strong opionion on that, so I'f fine with
whatever you and others come up with in that regard.

> > The more interesting question is how to do this when we
> > talk to lldpad. One idea was to use the same protocol
> > but to direct the message to a specific pid (that of lldpad).
> > That would require adding an option like '-p PID' to ip
> > that lets us change who we talk to.
> 
> Let me get the user-to-kernel part working to establish the cmd line and you
> can follow up with alternative addressing schemes.

ok.

	Arnd

^ permalink raw reply

* Re: [PATCH] tcp: Socket option to set congestion window
From: Hagen Paul Pfeifer @ 2010-05-27  7:08 UTC (permalink / raw)
  To: David Miller; +Cc: andi, therbert, shemminger, netdev, ycheng
In-Reply-To: <20100526.200443.232751390.davem@davemloft.net>

* David Miller | 2010-05-26 20:04:43 [-0700]:

>You're asking about a network level issue in terms of what can be done
>on a local end-node.

No, I *write* about network level issues, this is the important item in my
mind.  It is about network stability and network fairness. The lion share of
TCP algorithm are drafted to guarantee _network fairness and network stability_.

And by the way, the IETF (and our) paradigm is still to shift functionality to
end hosts - not into network core. "The Rise of the stupid network" [1] is
still a paradigm that is superior to the alternative where vendors put their
proprietary algorithms into the network and change the behavior in a
uncontrollable fashion.

>All an end-node can do is abide by congestion control rules and respond
>to packet drops, as has been going on for decades.

Right, and this will be reality for the next decades (at least for TCP;
maybe backed by ECN).

>People have basically (especially in Europe) given up on crazy crap
>like RSVP and other forms of bandwidth limiting and reservation.  They
>just oversubscribe their links, and increase their capacity as traffic
>increases dictate.  It just isn't all that manageable to put people's
>traffic into classes and control what they do on a large scale.
>
>I'm also skeptical about those who say the fight belongs squarely at
>the end nodes.  If you want to control the network traffic of the
>meeting point of your dumbbell, you'll need a machine there doing RED
>or traffic limiting.  End-host schemes simply aren't going to work
>because I can just add more end-hosts to reintroduce the problem.

I am not happy with this statement. This differs from the previous paragraph
where you complain about intelligent network components. Davem until these
days the routers do exactly this, they do RED/WRED whatever and signal to the
producer to reduce their bandwidth.

And this is the most important aspect in this email: core network components
rely on end hosts to behave in a fair manner. Disable Slow Start/Congestion
Avoidance and the network will instantly collapse (mmh, net-next? ;-)

The mechanism as proposed in the patch is not fair. There are a lot of
publications available that analyse the impact CWND in great detail as well as
several RFC that talk about the CWND.

>The dumbbell situation is independant of the end-node issues, that's
>all I'm really saying.

Davem, I know that you are a good guy and worries about fairness aspects
really well. I wrote this email to popularize fairness and network stability
aspects to the broad audience.

Hagen


[1] http://isen.com/stupid.html


>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Die Zensur ist das lebendige Gestaendnis der Grossen, dass sie 
nur verdummte Sklaven treten, aber keine freien Voelker regieren koennen.
- Johann Nepomuk Nestroy


^ permalink raw reply

* Re: [PATCH] tcp: Socket option to set congestion window
From: David Miller @ 2010-05-27  7:28 UTC (permalink / raw)
  To: hagen; +Cc: andi, therbert, shemminger, netdev, ycheng
In-Reply-To: <20100527070827.GB2728@nuttenaction>

From: Hagen Paul Pfeifer <hagen@jauu.net>
Date: Thu, 27 May 2010 09:08:27 +0200

> And by the way, the IETF (and our) paradigm is still to shift functionality to
> end hosts - not into network core. "The Rise of the stupid network" [1] is
> still a paradigm that is superior to the alternative where vendors put their
> proprietary algorithms into the network and change the behavior in a
> uncontrollable fashion.

Superior or not, it's simply never going to happen.  We are far beyond
being able to get to where we were before NAT'ing and shaping devices
started to get inserted everywhere on the network.

And I also don't see any of this stuff as fundamentally proprietary.

People want deep packet inspection, people want to control their user's
traffic.  And people, most importantly, are willing to pay for this.

Therefore, these elements will always be in the network.

Better to co-exist with them and use them to our advantage instead of
fantasizing about a utopia where they don't exist.

^ permalink raw reply

* Re: [PATCH] net: fix lock_sock_bh/unlock_sock_bh
From: David Miller @ 2010-05-27  7:29 UTC (permalink / raw)
  To: anton; +Cc: eric.dumazet, netdev
In-Reply-To: <20100527060923.GD28295@kryten>

From: Anton Blanchard <anton@samba.org>
Date: Thu, 27 May 2010 16:09:23 +1000

> 
>> > [PATCH v2] net: fix lock_sock_bh/unlock_sock_bh
>> > 
>> > This new sock lock primitive was introduced to speedup some user context
>> > socket manipulation. But it is unsafe to protect two threads, one using
>> > regular lock_sock/release_sock, one using lock_sock_bh/unlock_sock_bh
>> > 
>> > This patch changes lock_sock_bh to be careful against 'owned' state.
>> > If owned is found to be set, we must take the slow path.
>> > lock_sock_bh() now returns a boolean to say if the slow path was taken,
>> > and this boolean is used at unlock_sock_bh time to call the appropriate
>> > unlock function.
>> > 
>> > After this change, BH are either disabled or enabled during the
>> > lock_sock_bh/unlock_sock_bh protected section. This might be misleading,
>> > so we rename these functions to lock_sock_fast()/unlock_sock_fast().
>> > 
>> > Reported-by: Anton Blanchard <anton@samba.org>
>> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>> 
>> Looks good, I'll wait for positive testing from Anton before applying
>> this.
> 
> Thanks guys, this fixed it.
> 
> Tested-by: Anton Blanchard <anton@samba.org>

Thanks for testing Anton, I'll apply this now.

^ permalink raw reply

* Re: [PATCH 2/3] cxgb4i_v3: main driver files
From: Mike Christie @ 2010-05-27  7:38 UTC (permalink / raw)
  To: open-iscsi-/JYPxA39Uh5TLH3MbocFFw
  Cc: Rakesh Ranjan, NETDEVML, SCSIDEVML, LKML, Karen Xie, David Miller,
	James Bottomley, Anish Bhatt, Rakesh Ranjan
In-Reply-To: <1273944249-311-3-git-send-email-rakesh-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>

On 05/15/2010 12:24 PM, Rakesh Ranjan wrote:
> From: Rakesh Ranjan<rranjan-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
>
>
> Signed-off-by: Rakesh Ranjan<rakesh-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
> ---
>   drivers/scsi/cxgb4i/cxgb4i.h         |  101 ++
>   drivers/scsi/cxgb4i/cxgb4i_ddp.c     |  678 +++++++++++++
>   drivers/scsi/cxgb4i/cxgb4i_ddp.h     |  118 +++
>   drivers/scsi/cxgb4i/cxgb4i_offload.c | 1846 ++++++++++++++++++++++++++++++++++
>   drivers/scsi/cxgb4i/cxgb4i_offload.h |   91 ++
>   drivers/scsi/cxgb4i/cxgb4i_snic.c    |  260 +++++
>   6 files changed, 3094 insertions(+), 0 deletions(-)
>   create mode 100644 drivers/scsi/cxgb4i/cxgb4i.h
>   create mode 100644 drivers/scsi/cxgb4i/cxgb4i_ddp.c
>   create mode 100644 drivers/scsi/cxgb4i/cxgb4i_ddp.h
>   create mode 100644 drivers/scsi/cxgb4i/cxgb4i_offload.c
>   create mode 100644 drivers/scsi/cxgb4i/cxgb4i_offload.h
>   create mode 100644 drivers/scsi/cxgb4i/cxgb4i_snic.c



Got some whitespace errors when applying.

warning: squelched 1 whitespace error
warning: 6 lines add whitespace errors.



> +#define	CXGB4I_SCSI_HOST_QDEPTH	1024
> +#define	CXGB4I_MAX_TARGET	CXGB4I_MAX_CONN
> +#define	CXGB4I_MAX_LUN		512

Is the max lun right? It seems kinda small for modern drivers.



> +
> +static inline void cxgb4i_ddp_ulp_mem_io_set_hdr(struct ulp_mem_io *req,


If you build cxgb3i and cxgb4i in the kernel at the same time, will you 
get problems if each driver has structs that are named the same?


> +
> +static inline int cxgb4i_ddp_find_unused_entries(struct cxgb4i_ddp_info *ddp,
> +					unsigned int start, unsigned int max,
> +					unsigned int count,
> +					struct cxgbi_gather_list *gl)
> +{
> +	unsigned int i, j, k;
> +
> +	/*  not enough entries */
> +	if ((max - start)<  count)
> +		return -EBUSY;
> +
> +	max -= count;
> +	spin_lock(&ddp->map_lock);
> +	for (i = start; i<  max;) {
> +		for (j = 0, k = i; j<  count; j++, k++) {
> +			if (ddp->gl_map[k])
> +				break;
> +		}
> +		if (j == count) {
> +			for (j = 0, k = i; j<  count; j++, k++)
> +				ddp->gl_map[k] = gl;
> +			spin_unlock(&ddp->map_lock);
> +			return i;
> +		}


Is there a more efficient bitmap or some sort of common map operation 
for this (I thought we found something when doing cxgb3i but forgot to 
add it or were testing a patch)?



> +		i += j + 1;
> +	}
> +	spin_unlock(&ddp->map_lock);
> +	return -EBUSY;
> +}
> +
> +static inline void cxgb4i_ddp_unmark_entries(struct cxgb4i_ddp_info *ddp,
> +							int start, int count)
> +{
> +	spin_lock(&ddp->map_lock);
> +	memset(&ddp->gl_map[start], 0,
> +			count * sizeof(struct cxgbi_gather_list *));

extra tab.



> +static void __cxgb4i_ddp_init(struct cxgb4i_snic *snic)
> +{
> +	struct cxgb4i_ddp_info *ddp = snic->ddp;
> +	unsigned int ppmax, bits, tagmask, pgsz_factor[4];
> +	int i;
> +
> +	if (ddp) {
> +		kref_get(&ddp->refcnt);
> +		cxgbi_log_warn("snic 0x%p, ddp 0x%p already set up\n",
> +				snic, snic->ddp);
> +		return;
> +	}
> +
> +	sw_tag_idx_bits = (__ilog2_u32(ISCSI_ITT_MASK)) + 1;
> +	sw_tag_age_bits = (__ilog2_u32(ISCSI_AGE_MASK)) + 1;
> +	snic->cdev.tag_format.sw_bits = sw_tag_idx_bits + sw_tag_age_bits;
> +
> +	cxgbi_log_info("tag itt 0x%x, %u bits, age 0x%x, %u bits\n",
> +			ISCSI_ITT_MASK, sw_tag_idx_bits,
> +			ISCSI_AGE_MASK, sw_tag_age_bits);
> +
> +	ppmax = (snic->lldi.vr->iscsi.size>>  PPOD_SIZE_SHIFT);
> +	bits = __ilog2_u32(ppmax) + 1;
> +	if (bits>  PPOD_IDX_MAX_SIZE)
> +		bits = PPOD_IDX_MAX_SIZE;
> +	ppmax = (1<<  (bits - 1)) - 1;
> +
> +	ddp = cxgbi_alloc_big_mem(sizeof(struct cxgb4i_ddp_info) +
> +			ppmax * (sizeof(struct cxgbi_gather_list *) +
> +				sizeof(struct sk_buff *)),
> +				GFP_KERNEL);
> +	if (!ddp) {
> +		cxgbi_log_warn("snic 0x%p unable to alloc ddp 0x%d, "
> +			       "ddp disabled\n", snic, ppmax);
> +		return;
> +	}
> +
> +	ddp->gl_map = (struct cxgbi_gather_list **)(ddp + 1);
> +	spin_lock_init(&ddp->map_lock);
> +	kref_init(&ddp->refcnt);
> +
> +	ddp->snic = snic;
> +	ddp->pdev = snic->lldi.pdev;
> +	ddp->max_txsz = min_t(unsigned int,
> +				snic->lldi.iscsi_iolen,
> +				ULP2_MAX_PKT_SIZE);
> +	ddp->max_rxsz = min_t(unsigned int,
> +				snic->lldi.iscsi_iolen,
> +				ULP2_MAX_PKT_SIZE);
> +	ddp->llimit = snic->lldi.vr->iscsi.start;
> +	ddp->ulimit = ddp->llimit + snic->lldi.vr->iscsi.size;
> +	ddp->nppods = ppmax;
> +	ddp->idx_last = ppmax;
> +	ddp->idx_bits = bits;
> +	ddp->idx_mask = (1<<  bits) - 1;
> +	ddp->rsvd_tag_mask = (1<<  (bits + PPOD_IDX_SHIFT)) - 1;
> +
> +	tagmask = ddp->idx_mask<<  PPOD_IDX_SHIFT;
> +	for (i = 0; i<  DDP_PGIDX_MAX; i++)
> +		pgsz_factor[i] = ddp_page_order[i];
> +
> +	cxgb4_iscsi_init(snic->lldi.ports[0], tagmask, pgsz_factor);
> +	snic->ddp = ddp;
> +
> +	snic->cdev.tag_format.rsvd_bits = ddp->idx_bits;
> +	snic->cdev.tag_format.rsvd_shift = PPOD_IDX_SHIFT;
> +	snic->cdev.tag_format.rsvd_mask =
> +		((1<<  snic->cdev.tag_format.rsvd_bits) - 1);
> +
> +	cxgbi_log_info("tag format: sw %u, rsvd %u,%u, mask 0x%x.\n",
> +			snic->cdev.tag_format.sw_bits,
> +			snic->cdev.tag_format.rsvd_bits,
> +			snic->cdev.tag_format.rsvd_shift,
> +			snic->cdev.tag_format.rsvd_mask);
> +
> +	snic->tx_max_size = min_t(unsigned int, ULP2_MAX_PDU_PAYLOAD,
> +				ddp->max_txsz - ISCSI_PDU_NONPAYLOAD_LEN);
> +	snic->rx_max_size = min_t(unsigned int, ULP2_MAX_PDU_PAYLOAD,
> +				ddp->max_rxsz - ISCSI_PDU_NONPAYLOAD_LEN);
> +
> +	cxgbi_log_info("max payload size: %u/%u, %u/%u.\n",
> +			snic->tx_max_size, ddp->max_txsz,
> +			snic->rx_max_size, ddp->max_rxsz);
> +
> +	cxgbi_log_info("snic 0x%p, nppods %u, bits %u, mask 0x%x,0x%x "
> +			"pkt %u/%u, %u/%u\n",
> +			snic, ppmax, ddp->idx_bits, ddp->idx_mask,
> +			ddp->rsvd_tag_mask, ddp->max_txsz,
> +			snic->lldi.iscsi_iolen,
> +			ddp->max_rxsz, snic->lldi.iscsi_iolen);
> +
> +	return;


Don't need "return".





> +static void cxgbi_conn_pdu_ready(struct cxgbi_sock *csk)
> +{


Were you going to put this in the libcxgbi but later decide it was a 
little too different? If you are leaving it here could you add a cxgb4i 
prefix so the naming is consistent and avoid confusion with your lib 
functions?



cxgb4i_find_best_mtu looks like it could go in your lib. Looks like 
find_best_mtu from cxgb3i_offload.c. Same with select_mss and compute_wscale


> +	struct sk_buff *skb;
> +	unsigned int read = 0;
> +	struct iscsi_conn *conn = csk->user_data;
> +	int err = 0;
> +
> +	cxgbi_rx_debug("csk 0x%p.\n", csk);
> +
> +	read_lock(&csk->callback_lock);
> +	if (unlikely(!conn || conn->suspend_rx)) {
> +		cxgbi_rx_debug("conn 0x%p, id %d, suspend_rx %lu!\n",
> +				conn, conn ? conn->id : 0xFF,
> +				conn ? conn->suspend_rx : 0xFF);
> +		read_unlock(&csk->callback_lock);
> +		return;
> +	}
> +	skb = skb_peek(&csk->receive_queue);
> +	while (!err&&  skb) {
> +		__skb_unlink(skb,&csk->receive_queue);
> +		read += cxgb4i_skb_rx_pdulen(skb);
> +		cxgbi_rx_debug("conn 0x%p, csk 0x%p, rx skb 0x%p, pdulen %u\n",
> +				conn, csk, skb, cxgb4i_skb_rx_pdulen(skb));
> +		if (cxgb4i_skb_flags(skb)&  CXGB4I_SKCB_FLAG_HDR_RCVD)
> +			err = cxgbi_conn_read_bhs_pdu_skb(conn, skb);
> +		else if (cxgb4i_skb_flags(skb) == CXGB4I_SKCB_FLAG_DATA_RCVD)
> +			err = cxgbi_conn_read_data_pdu_skb(conn, skb);
> +		__kfree_skb(skb);
> +		skb = skb_peek(&csk->receive_queue);
> +	}
> +	read_unlock(&csk->callback_lock);
> +	csk->copied_seq += read;
> +	cxgb4i_sock_rx_credits(csk, read);
> +	conn->rxdata_octets += read;
> +
> +	if (err) {
> +		cxgbi_log_info("conn 0x%p rx failed err %d.\n", conn, err);
> +		iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
> +	}
> +}





> +
> +static inline void cxgb4i_sock_free_wr_skb(struct sk_buff *skb)
> +{
> +	kfree_skb(skb);
> +}


I think adding wrappers around skb functions in a net driver is not so 
useful.


> +
> +static inline struct sk_buff *cxgb4i_sock_dequeue_wr(struct cxgbi_sock *csk)
> +{
> +	struct sk_buff *skb = csk->wr_pending_head;
> +
> +	if (likely(skb)) {
> +		csk->wr_pending_head = cxgb4i_skb_tx_wr_next(skb);
> +		cxgb4i_skb_tx_wr_next(skb) = NULL;
> +	}
> +	return skb;
> +}
> +
> +static void cxgb4i_sock_purge_wr_queue(struct cxgbi_sock *csk)
> +{
> +	struct sk_buff *skb;
> +
> +	while ((skb = cxgb4i_sock_dequeue_wr(csk)) != NULL)
> +		cxgb4i_sock_free_wr_skb(skb);
> +}
> +
> +/*

I think this is supposed to be

/**


> +static int cxgb4i_sock_push_tx_frames(struct cxgbi_sock *csk,
> +						int req_completion)
> +{
> +	int total_size = 0;
> +	struct sk_buff *skb;
> +	struct cxgb4i_snic *snic;
> +
> +	if (unlikely(csk->state == CXGBI_CSK_ST_CONNECTING ||
> +				csk->state == CXGBI_CSK_ST_CLOSE_WAIT_1 ||
> +				csk->state>= CXGBI_CSK_ST_ABORTING)) {
> +		cxgbi_tx_debug("csk 0x%p, in closing state %u.\n",
> +				csk, csk->state);
> +		return 0;
> +	}
> +
> +	snic = cxgb4i_get_snic(csk->cdev);
> +
> +	while (csk->wr_cred
> +			&&  (skb = skb_peek(&csk->write_queue)) != NULL) {


The && should be on the right

while (csk->wr_cred &&

> +
> +static int cxgb4i_cpl_act_open_rpl(struct cxgb4i_snic *snic,
> +						struct sk_buff *skb)
> +{
> +	struct cxgbi_sock *csk;
> +	struct cpl_act_open_rpl *rpl = cplhdr(skb);
> +	unsigned int atid =
> +		GET_TID_TID(GET_AOPEN_ATID(be32_to_cpu(rpl->atid_status)));
> +	struct tid_info *t = snic->lldi.tids;
> +	unsigned int status = GET_AOPEN_STATUS(be32_to_cpu(rpl->atid_status));
> +
> +	csk = lookup_atid(t, atid);
> +
> +	if (unlikely(!csk)) {
> +		cxgbi_log_error("can't find connection for tid %u\n", atid);
> +		return CPL_RET_UNKNOWN_TID;
> +	}
> +
> +	cxgbi_sock_hold(csk);
> +	spin_lock_bh(&csk->lock);
> +
> +	cxgbi_conn_debug("rcv, status 0x%x, csk 0x%p, csk->state %u, "
> +			"csk->flag 0x%lx, csk->atid %u.\n",
> +			status, csk, csk->state, csk->flags, csk->hwtid);
> +
> +	if (status&  act_open_has_tid(status))
> +		cxgb4_remove_tid(snic->lldi.tids, csk->port_id, GET_TID(rpl));
> +
> +	if (status == CPL_ERR_CONN_EXIST&&
> +			csk->retry_timer.function !=
> +			cxgb4i_sock_act_open_retry_timer) {


I do not mean to nit pick on silly coding style stuff. I think it is 
easier to read lines like this:

if (status == CPL_ERR_CONN_EXIST&&
     csk->retry_timer.function != cxgb4i_sock_act_open_retry_timer) {


> +		csk->retry_timer.function = cxgb4i_sock_act_open_retry_timer;
> +		if (!mod_timer(&csk->retry_timer, jiffies + HZ / 2))
> +			cxgbi_sock_hold(csk);


There is no del_timer/del_timer_sync + cxgbi_sock_put for this timer. If 
something cleans this csk up, what makes sure the timer gets stopped ok.

> +
> +static int cxgb4i_alloc_cpl_skbs(struct cxgbi_sock *csk)
> +{
> +	csk->cpl_close = alloc_skb(sizeof(struct cpl_close_con_req),
> +					GFP_KERNEL);
> +	if (!csk->cpl_close)
> +		return -ENOMEM;
> +	skb_put(csk->cpl_close, sizeof(struct cpl_close_con_req));
> +
> +	csk->cpl_abort_req = alloc_skb(sizeof(struct cpl_abort_req),
> +					GFP_KERNEL);
> +	if (!csk->cpl_abort_req)
> +		goto free_cpl_skbs;
> +	skb_put(csk->cpl_abort_req, sizeof(struct cpl_abort_req));
> +
> +	csk->cpl_abort_rpl = alloc_skb(sizeof(struct cpl_abort_rpl),
> +					GFP_KERNEL);


These should be GFP_NOIO in case we call them to relogin on a disk that 
has data that would have been needed to be written out to free up mem.

> +
> +struct cxgbi_sock *cxgb4i_sock_create(struct cxgb4i_snic *snic)
> +{
> +	struct cxgbi_sock *csk = NULL;
> +
> +	csk = kzalloc(sizeof(*csk), GFP_KERNEL);

Same as above.

> +
> +static int is_cxgb4_dev(struct net_device *dev, struct cxgb4i_snic *snic)
> +{
> +	struct net_device *ndev = dev;
> +	int i;
> +
> +	if (dev->priv_flags&  IFF_802_1Q_VLAN)
> +		ndev = vlan_dev_real_dev(dev);
> +
> +	for (i = 0; i<  snic->lldi.nports; i++) {
> +		if (ndev == snic->lldi.ports[i])
> +			return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +static struct net_device *cxgb4i_find_egress_dev(struct net_device *root_dev,
> +						struct cxgb4i_snic *snic)
> +{
> +	while (root_dev) {
> +		if (root_dev->priv_flags&  IFF_802_1Q_VLAN)
> +			root_dev = vlan_dev_real_dev(root_dev);
> +		else if (is_cxgb4_dev(root_dev, snic))
> +			return root_dev;
> +		else
> +			return NULL;
> +	}
> +
> +	return NULL;
> +}
> +
> +static struct rtable *find_route(struct net_device *dev,
> +				__be32 saddr, __be32 daddr,
> +				__be16 sport, __be16 dport,
> +				u8 tos)
> +{
> +	struct rtable *rt;
> +	struct flowi fl = {
> +		.oif = dev ? dev->ifindex : 0,
> +		.nl_u = {
> +			.ip4_u = {
> +				.daddr = daddr,
> +				.saddr = saddr,
> +				.tos = tos }
> +			},
> +		.proto = IPPROTO_TCP,
> +		.uli_u = {
> +			.ports = {
> +				.sport = sport,
> +				.dport = dport }
> +			}
> +	};
> +
> +	if (ip_route_output_flow(dev ? dev_net(dev) :&init_net,
> +					&rt,&fl, NULL, 0))
> +		return NULL;
> +
> +	return rt;
> +}
> +


Those functions above look like the cxgb3i ones. Could they be in your lib?



> +static int cxgb4i_init_act_open(struct cxgbi_sock *csk,
> +					struct net_device *dev)
> +{
> +	struct dst_entry *dst = csk->dst;
> +	struct sk_buff *skb;
> +	struct port_info *pi = netdev_priv(dev);
> +
> +	cxgbi_conn_debug("csk 0x%p, state %u, flags 0x%lx\n",
> +			csk, csk->state, csk->flags);
> +
> +	csk->atid = cxgb4_alloc_atid(cxgb4i_get_snic(csk->cdev)->lldi.tids,
> +					csk);
> +	if (csk->atid == -1) {
> +		cxgbi_log_error("cannot alloc atid\n");
> +		goto out_err;
> +	}
> +
> +	csk->l2t = cxgb4_l2t_get(cxgb4i_get_snic(csk->cdev)->lldi.l2t,
> +				csk->dst->neighbour, dev, 0);
> +	if (!csk->l2t) {
> +		cxgbi_log_error("cannot alloc l2t\n");
> +		goto free_atid;
> +	}
> +
> +	skb = alloc_skb(sizeof(struct cpl_act_open_req), GFP_KERNEL);
> +	if (!skb)


Should be NOIO too.

> +		goto free_l2t;
> +
> +	skb->sk = (struct sock *)csk;
> +	t4_set_arp_err_handler(skb, csk, cxgb4i_act_open_req_arp_failure);
> +
> +	cxgbi_sock_hold(csk);
> +
> +	csk->wr_max_cred = csk->wr_cred =
> +		cxgb4i_get_snic(csk->cdev)->lldi.wr_cred;
> +	csk->port_id = pi->port_id;
> +	csk->rss_qid = cxgb4i_get_snic(csk->cdev)->lldi.rxq_ids[csk->port_id];
> +	csk->tx_chan = pi->tx_chan;
> +	csk->smac_idx = csk->tx_chan<<  1;
> +	csk->wr_una_cred = 0;
> +	csk->mss_idx = cxgb4i_select_mss(csk, dst_mtu(dst));
> +	csk->err = 0;
> +
> +	cxgb4i_sock_reset_wr_list(csk);
> +
> +	cxgb4i_sock_make_act_open_req(csk, skb,
> +					((csk->rss_qid<<  14) |
> +					 (csk->atid)), csk->l2t);
> +	cxgb4_l2t_send(cxgb4i_get_snic(csk->cdev)->lldi.ports[csk->port_id],
> +					skb, csk->l2t);
> +	return 0;
> +
> +free_l2t:
> +	cxgb4_l2t_release(csk->l2t);
> +
> +free_atid:
> +	cxgb4_free_atid(cxgb4i_get_snic(csk->cdev)->lldi.tids, csk->atid);
> +
> +out_err:
> +
> +	return -EINVAL;;
> +}
> +
> +static struct net_device *cxgb4i_find_dev(struct net_device *dev,
> +							__be32 ipaddr)
> +{
> +	struct flowi fl;
> +	struct rtable *rt;
> +	int err;
> +
> +	memset(&fl, 0, sizeof(fl));
> +	fl.nl_u.ip4_u.daddr = ipaddr;
> +
> +	err = ip_route_output_key(dev ? dev_net(dev) :&init_net,&rt,&fl);
> +	if (!err)
> +		return (&rt->u.dst)->dev;
> +
> +	return NULL;
> +}
> +


Looks like cxgb3i one.


> +int cxgb4i_sock_connect(struct net_device *dev, struct cxgbi_sock *csk,
> +						struct sockaddr_in *sin)
> +{
> +	struct rtable *rt;
> +	__be32 sipv4 = 0;
> +	struct net_device *dstdev;
> +	struct cxgbi_hba *chba = NULL;
> +	int err;
> +
> +	cxgbi_conn_debug("csk 0x%p, dev 0x%p\n", csk, dev);
> +
> +	if (sin->sin_family != AF_INET)
> +		return -EAFNOSUPPORT;
> +
> +	csk->daddr.sin_port = sin->sin_port;
> +	csk->daddr.sin_addr.s_addr = sin->sin_addr.s_addr;
> +
> +	dstdev = cxgb4i_find_dev(dev, sin->sin_addr.s_addr);
> +	if (!dstdev || !is_cxgb4_dev(dstdev, cxgb4i_get_snic(csk->cdev)))
> +		return -ENETUNREACH;
> +
> +	if (dstdev->priv_flags&  IFF_802_1Q_VLAN)
> +		dev = dstdev;
> +
> +	rt = find_route(dev, csk->saddr.sin_addr.s_addr,
> +			csk->daddr.sin_addr.s_addr,
> +			csk->saddr.sin_port,
> +			csk->daddr.sin_port,
> +			0);
> +	if (rt == NULL) {
> +		cxgbi_conn_debug("no route to %pI4, port %u, dev %s, "
> +					"snic 0x%p\n",
> +					&csk->daddr.sin_addr.s_addr,
> +					ntohs(csk->daddr.sin_port),
> +					dev ? dev->name : "any",
> +					csk->snic);
> +		return -ENETUNREACH;
> +	}
> +
> +	if (rt->rt_flags&  (RTCF_MULTICAST | RTCF_BROADCAST)) {
> +		cxgbi_conn_debug("multi-cast route to %pI4, port %u, "
> +					"dev %s, snic 0x%p\n",
> +					&csk->daddr.sin_addr.s_addr,
> +					ntohs(csk->daddr.sin_port),
> +					dev ? dev->name : "any",
> +					csk->snic);
> +		ip_rt_put(rt);
> +		return -ENETUNREACH;
> +	}
> +
> +	if (!csk->saddr.sin_addr.s_addr)
> +		csk->saddr.sin_addr.s_addr = rt->rt_src;
> +
> +	csk->dst =&rt->u.dst;
> +
> +	dev = cxgb4i_find_egress_dev(csk->dst->dev,
> +					cxgb4i_get_snic(csk->cdev));
> +	if (dev == NULL) {
> +		cxgbi_conn_debug("csk: 0x%p, egress dev NULL\n", csk);
> +		return -ENETUNREACH;
> +	}
> +
> +	err = cxgbi_sock_get_port(csk);
> +	if (err)
> +		return err;
> +
> +	cxgbi_conn_debug("csk: 0x%p get port: %u\n",
> +			csk, ntohs(csk->saddr.sin_port));
> +
> +	chba = cxgb4i_hba_find_by_netdev(csk->dst->dev);
> +
> +	sipv4 = cxgb4i_get_iscsi_ipv4(chba);
> +	if (!sipv4) {
> +		cxgbi_conn_debug("csk: 0x%p, iscsi is not configured\n", csk);
> +		sipv4 = csk->saddr.sin_addr.s_addr;
> +		cxgb4i_set_iscsi_ipv4(chba, sipv4);
> +	} else
> +		csk->saddr.sin_addr.s_addr = sipv4;
> +
> +	cxgbi_conn_debug("csk: 0x%p, %pI4:[%u], %pI4:[%u] SYN_SENT\n",
> +				csk,
> +				&csk->saddr.sin_addr.s_addr,
> +				ntohs(csk->saddr.sin_port),
> +				&csk->daddr.sin_addr.s_addr,
> +				ntohs(csk->daddr.sin_port));
> +
> +	cxgbi_sock_set_state(csk, CXGBI_CSK_ST_CONNECTING);
> +
> +	if (!cxgb4i_init_act_open(csk, dev))
> +		return 0;
> +
> +	err = -ENOTSUPP;
> +
> +	cxgbi_conn_debug("csk 0x%p ->  closed\n", csk);
> +	cxgbi_sock_set_state(csk, CXGBI_CSK_ST_CLOSED);
> +	ip_rt_put(rt);
> +	cxgbi_sock_put_port(csk);
> +
> +	return err;
> +}
> +
> +void cxgb4i_sock_rx_credits(struct cxgbi_sock *csk, int copied)
> +{
> +	int must_send;
> +	u32 credits;
> +
> +	if (csk->state != CXGBI_CSK_ST_ESTABLISHED)
> +		return;
> +
> +	credits = csk->copied_seq - csk->rcv_wup;
> +	if (unlikely(!credits))
> +		return;
> +
> +	if (unlikely(cxgb4i_rx_credit_thres == 0))
> +		return;
> +
> +	must_send = credits + 16384>= cxgb4i_rcv_win;
> +
> +	if (must_send || credits>= cxgb4i_rx_credit_thres)
> +		csk->rcv_wup += cxgb4i_csk_send_rx_credits(csk, credits);
> +}
> +
> +int cxgb4i_sock_send_pdus(struct cxgbi_sock *csk, struct sk_buff *skb)
> +{
> +	struct sk_buff *next;
> +	int err, copied = 0;
> +
> +	spin_lock_bh(&csk->lock);
> +
> +	if (csk->state != CXGBI_CSK_ST_ESTABLISHED) {
> +		cxgbi_tx_debug("csk 0x%p, not in est. state %u.\n",
> +			      csk, csk->state);
> +		err = -EAGAIN;
> +		goto out_err;
> +	}
> +
> +	if (csk->err) {
> +		cxgbi_tx_debug("csk 0x%p, err %d.\n", csk, csk->err);
> +		err = -EPIPE;
> +		goto out_err;
> +	}
> +
> +	if (csk->write_seq - csk->snd_una>= cxgb4i_snd_win) {
> +		cxgbi_tx_debug("csk 0x%p, snd %u - %u>  %u.\n",
> +				csk, csk->write_seq, csk->snd_una,
> +				cxgb4i_snd_win);
> +		err = -ENOBUFS;
> +		goto out_err;
> +	}
> +
> +	while (skb) {
> +		int frags = skb_shinfo(skb)->nr_frags +
> +				(skb->len != skb->data_len);
> +
> +		if (unlikely(skb_headroom(skb)<  CXGB4I_TX_HEADER_LEN)) {
> +			cxgbi_tx_debug("csk 0x%p, skb head.\n", csk);
> +			err = -EINVAL;
> +			goto out_err;
> +		}
> +
> +		if (frags>= SKB_WR_LIST_SIZE) {
> +			cxgbi_log_error("csk 0x%p, tx frags %d, len %u,%u.\n",
> +					 csk, skb_shinfo(skb)->nr_frags,
> +					 skb->len, skb->data_len);
> +			err = -EINVAL;
> +			goto out_err;
> +		}
> +
> +		next = skb->next;
> +		skb->next = NULL;
> +		cxgb4i_sock_skb_entail(csk, skb,
> +				CXGB4I_SKCB_FLAG_NO_APPEND |
> +				CXGB4I_SKCB_FLAG_NEED_HDR);
> +		copied += skb->len;
> +		csk->write_seq += skb->len + ulp_extra_len(skb);
> +		skb = next;
> +	}
> +done:
> +	if (likely(skb_queue_len(&csk->write_queue)))
> +		cxgb4i_sock_push_tx_frames(csk, 1);
> +	spin_unlock_bh(&csk->lock);
> +	return copied;
> +
> +out_err:
> +	if (copied == 0&&  err == -EPIPE)
> +		copied = csk->err ? csk->err : -EPIPE;
> +	else
> +		copied = err;
> +	goto done;
> +}

Looks similar to cxgb3i one.


> +
> +static void cxgbi_sock_conn_closing(struct cxgbi_sock *csk)
> +{

Was this going to the lib? Looks like the cxgb3i one. If not then rename 
it to avoid confusion.


> +
> +struct cxgbi_hba *cxgb4i_hba_find_by_netdev(struct net_device *dev)
> +{
> +	int i;
> +	struct cxgb4i_snic *snic = NULL;;
> +
> +	if (dev->priv_flags&  IFF_802_1Q_VLAN)
> +		dev = vlan_dev_real_dev(dev);
> +
> +	mutex_lock(&snic_rwlock);
> +	list_for_each_entry(snic,&snic_list, list_head) {
> +		for (i = 0; i<  snic->hba_cnt; i++) {
> +			if (snic->hba[i]->ndev == dev) {
> +				mutex_unlock(&snic_rwlock);
> +				return snic->hba[i];
> +			}
> +		}
> +	}
> +	mutex_unlock(&snic_rwlock);
> +	return NULL;


Looks like cxgb3i_hba_find_by_netdev.

> +}
> +
> +struct cxgb4i_snic *cxgb4i_find_snic(struct net_device *dev, __be32 ipaddr)
> +{
> +	struct flowi fl;
> +	struct rtable *rt;
> +	struct net_device *sdev = NULL;
> +	struct cxgb4i_snic *snic = NULL, *tmp;
> +	int err, i;
> +
> +	memset(&fl, 0, sizeof(fl));
> +	fl.nl_u.ip4_u.daddr = ipaddr;
> +
> +	err = ip_route_output_key(dev ? dev_net(dev) :&init_net,&rt,&fl);
> +	if (err)
> +		goto out;
> +
> +	sdev = (&rt->u.dst)->dev;
> +	mutex_lock(&snic_rwlock);
> +	list_for_each_entry_safe(snic, tmp,&snic_list, list_head) {
> +		if (snic) {
> +			for (i = 0; i<  snic->lldi.nports; i++) {
> +				if (sdev == snic->lldi.ports[i]) {
> +					mutex_unlock(&snic_rwlock);
> +					return snic;
> +				}
> +			}
> +		}
> +	}
> +	mutex_unlock(&snic_rwlock);
> +
> +out:
> +	snic = NULL;
> +	return snic;


you can just do return NULL


> +}
> +
> +void cxgb4i_snic_add(struct list_head *list_head)
> +{
> +	mutex_lock(&snic_rwlock);
> +	list_add_tail(list_head,&snic_list);
> +	mutex_unlock(&snic_rwlock);
> +}
> +
> +struct cxgb4i_snic *cxgb4i_snic_init(const struct cxgb4_lld_info *linfo)
> +{
> +	struct cxgb4i_snic *snic;
> +	int i;
> +
> +	snic = kzalloc(sizeof(*snic), GFP_KERNEL);
> +	if (snic) {
> +

extra newline

> +		spin_lock_init(&snic->lock);
> +		snic->lldi = *linfo;
> +		snic->hba_cnt = snic->lldi.nports;
> +		snic->cdev.dd_data = snic;
> +		snic->cdev.pdev = snic->lldi.pdev;
> +		snic->cdev.skb_tx_headroom = SKB_MAX_HEAD(CXGB4I_TX_HEADER_LEN);
> +
> +		cxgb4i_iscsi_init();
> +		cxgbi_pdu_init(&snic->cdev);
> +		cxgb4i_ddp_init(snic);
> +		cxgb4i_ofld_init(snic);
> +
> +		for (i = 0; i<  snic->hba_cnt; i++) {
> +			snic->hba[i] = cxgb4i_hba_add(snic,
> +						snic->lldi.ports[i]);
> +			if (!snic->hba[i]) {
> +				kfree(snic);
> +				snic = ERR_PTR(-ENOMEM);
> +				goto out;
> +			}
> +		}
> +		cxgb4i_snic_add(&snic->list_head);
> +	} else
> +out :
> +	snic = ERR_PTR(-ENOMEM);
> +
> +	return snic;


I think xgb4i_uld_add is not checking for PTR_ERR/IS_ERR.


> +}
> +
> +void cxgb4i_snic_cleanup(void)
> +{
> +	struct cxgb4i_snic *snic, *tmp;
> +	int i;
> +
> +	mutex_lock(&snic_rwlock);
> +	list_for_each_entry_safe(snic, tmp,&snic_list, list_head) {
> +		list_del(&snic->list_head);
> +
> +		for (i = 0; i<  snic->hba_cnt; i++) {
> +			if (snic->hba[i]) {
> +				cxgb4i_hba_remove(snic->hba[i]);
> +				snic->hba[i] = NULL;
> +			}
> +		}
> +		cxgb4i_ofld_cleanup(snic);
> +		cxgb4i_ddp_cleanup(snic);
> +		cxgbi_pdu_cleanup(&snic->cdev);
> +		cxgbi_log_info("snic 0x%p, %u scsi hosts removed.\n",
> +				snic, snic->hba_cnt);
> +
> +		kfree(snic);
> +	}
> +	mutex_unlock(&snic_rwlock);
> +	cxgb4i_iscsi_cleanup();
> +}
> +
> +static void *cxgb4i_uld_add(const struct cxgb4_lld_info *linfo)
> +{
> +	struct cxgb4i_snic *snic;
> +
> +	cxgbi_log_info("%s", version);
> +
> +	snic = cxgb4i_snic_init(linfo);

you can just do

return cxgb4i_snic_init(linfo);

and then delete everything below.



> +	if (!snic)
> +		goto out;
> +out:
> +	return snic;
> +}
> +
> +static int cxgb4i_uld_rx_handler(void *handle, const __be64 *rsp,
> +				const struct pkt_gl *pgl)
> +{
> +	struct cxgb4i_snic *snic = handle;
> +	struct sk_buff *skb;
> +	const struct cpl_act_establish *rpl;
> +	unsigned int opcode;
> +
> +	if (pgl == NULL) {
> +		unsigned int len = 64 - sizeof(struct rsp_ctrl) - 8;
> +
> +		skb = alloc_skb(256, GFP_ATOMIC);
> +		if (!skb)
> +			goto nomem;
> +		__skb_put(skb, len);
> +		skb_copy_to_linear_data(skb,&rsp[1], len);
> +
> +	} else if (pgl == CXGB4_MSG_AN) {
> +

don't need extra {} and newlines.

> +		return 0;
> +
> +	} else {
> +

extra newline

> +		skb = cxgb4_pktgl_to_skb(pgl, 256, 256);
> +		if (unlikely(!skb))
> +			goto nomem;
> +	}
> +
> +	rpl = cplhdr(skb);
> +	opcode = rpl->ot.opcode;
> +
> +	cxgbi_api_debug("snic %p, opcode 0x%x, skb %p\n",
> +			 snic, opcode, skb);
> +
> +	BUG_ON(!snic->handlers[opcode]);
> +
> +	if (snic->handlers[opcode]) {

extra brackets

> +		snic->handlers[opcode](snic, skb);
> +	} else
> +		cxgbi_log_error("No handler for opcode 0x%x\n",
> +				opcode);
> +
> +	return 0;
> +
> +nomem:
> +	cxgbi_api_debug("OOM bailing out\n");
> +	return 1;
> +}
> +
> +static int cxgb4i_uld_state_change(void *handle, enum cxgb4_state state)
> +{
> +	return 0;
> +}
> +
> +static int __init cxgb4i_init_module(void)
> +{
> +	cxgb4_register_uld(CXGB4_ULD_ISCSI,&cxgb4i_uld_info);
> +

extra newline

> +	return 0;
> +}
> +
> +static void __exit cxgb4i_exit_module(void)
> +{
> +

extra newline


> +	cxgb4_unregister_uld(CXGB4_ULD_ISCSI);
> +	cxgb4i_snic_cleanup();
> +}
> +
> +module_init(cxgb4i_init_module);
> +module_exit(cxgb4i_exit_module);
> +

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.

^ permalink raw reply

* Re: [PATCH 2/3] cxgb4i_v3: main driver files
From: Mike Christie @ 2010-05-27  7:40 UTC (permalink / raw)
  To: open-iscsi
  Cc: Rakesh Ranjan, NETDEVML, SCSIDEVML, LKML, Karen Xie, David Miller,
	James Bottomley, Anish Bhatt, Rakesh Ranjan
In-Reply-To: <4BFE2173.4050306@cs.wisc.edu>

On 05/27/2010 02:38 AM, Mike Christie wrote:
> On 05/15/2010 12:24 PM, Rakesh Ranjan wrote:
>> From: Rakesh Ranjan<rranjan@chelsio.com>
>>
>>
>> Signed-off-by: Rakesh Ranjan<rakesh@chelsio.com>
>> ---
>> drivers/scsi/cxgb4i/cxgb4i.h | 101 ++
>> drivers/scsi/cxgb4i/cxgb4i_ddp.c | 678 +++++++++++++
>> drivers/scsi/cxgb4i/cxgb4i_ddp.h | 118 +++
>> drivers/scsi/cxgb4i/cxgb4i_offload.c | 1846
>> ++++++++++++++++++++++++++++++++++
>> drivers/scsi/cxgb4i/cxgb4i_offload.h | 91 ++
>> drivers/scsi/cxgb4i/cxgb4i_snic.c | 260 +++++
>> 6 files changed, 3094 insertions(+), 0 deletions(-)
>> create mode 100644 drivers/scsi/cxgb4i/cxgb4i.h
>> create mode 100644 drivers/scsi/cxgb4i/cxgb4i_ddp.c
>> create mode 100644 drivers/scsi/cxgb4i/cxgb4i_ddp.h
>> create mode 100644 drivers/scsi/cxgb4i/cxgb4i_offload.c
>> create mode 100644 drivers/scsi/cxgb4i/cxgb4i_offload.h
>> create mode 100644 drivers/scsi/cxgb4i/cxgb4i_snic.c
>
>
>
> Got some whitespace errors when applying.
>
> warning: squelched 1 whitespace error
> warning: 6 lines add whitespace errors.
>
>

Oh yeah, run sparse on the driver too. There are some warnings about 
some functions should be static.

^ permalink raw reply

* Re: [PATCH] tcp: Socket option to set congestion window
From: Hagen Paul Pfeifer @ 2010-05-27  7:46 UTC (permalink / raw)
  To: David Miller; +Cc: andi, therbert, shemminger, netdev, ycheng
In-Reply-To: <20100527.002851.02281005.davem@davemloft.net>

* David Miller | 2010-05-27 00:28:51 [-0700]:

>Superior or not, it's simply never going to happen.  We are far beyond
>being able to get to where we were before NAT'ing and shaping devices
>started to get inserted everywhere on the network.
>
>And I also don't see any of this stuff as fundamentally proprietary.

We will see! If no real interaction between peers is required
ISP/Carrier/InternetExchanges will start to put their proprietary components
into the network. Because they have niffty features, the product developed
phase is shorten (no borring standardizations necessary) and so on. This is no
new insight.

>People want deep packet inspection, people want to control their user's
>traffic.  And people, most importantly, are willing to pay for this.
>
>Therefore, these elements will always be in the network.
>
>Better to co-exist with them and use them to our advantage instead of
>fantasizing about a utopia where they don't exist.

Sure, we have no alternative.

HGN


-- 
Hagen Paul Pfeifer <hagen@jauu.net>  ||  http://jauu.net/
Telephone: +49 174 5455209           ||  Key Id: 0x98350C22
Key Fingerprint: 490F 557B 6C48 6D7E 5706 2EA2 4A22 8D45 9835 0C22

^ permalink raw reply

* Re: [PATCH] tcp: Socket option to set congestion window
From: Andi Kleen @ 2010-05-27  7:57 UTC (permalink / raw)
  To: Rick Jones; +Cc: andi, David Miller, therbert, shemminger, netdev, ycheng
In-Reply-To: <4BFDA0B6.8030701@hp.com>

> Then all the app does is say "I'am in peer id foo" right?  Is that really 
> that much different from making the setsockopt() call for a different cwnd 
> value? Particularly if say the limit were not a global sysctl, but based on 
> the existing per-route value (perhaps expanded to have a min, max and 
> default?)

The worst case with peer id would be app using an own peer id
for each connection. So each connection would have an own cwnd,
just like today. So the worst case is the same as today.

If it shares connections between peer ids the real effective cwnd
of all those connections would be also never be "worse" (that is
larger) than it could be on single connection. 

So this limits the cwnds effectively with peer ids, although it also 
gives a nice way to reuse an already existing cwnd for a new
connection (this does not make things worse because in theory
the app could have reused the same connection too) 

So overall peer ids don't allow to enlarge cwnds over today.

If the cwnd is fully application controlled all these limits
are not there and a bittorrent client could just always set 
it to 1 million.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply

* Re: [PATCH] tcp: Socket option to set congestion window
From: Andi Kleen @ 2010-05-27  8:00 UTC (permalink / raw)
  To: David Miller; +Cc: andi, therbert, shemminger, netdev, ycheng
In-Reply-To: <20100526.151014.70204145.davem@davemloft.net>

On Wed, May 26, 2010 at 03:10:14PM -0700, David Miller wrote:
> From: Andi Kleen <andi@firstfloor.org>
> Date: Wed, 26 May 2010 23:27:45 +0200
> 
> > As I understand the idea was that the application knows
> > what flows belong to a single peer and wants to have
> > a single cwnd for all of those. Perhaps there would
> > be a way to generalize that to tell it to the kernel.
> > 
> > e.g. have a "peer id"  that is known by applications
> > and the kernel could manage cwnds shared between connections
> > associated with the same peer id?
> > 
> > Just an idea, I admit I haven't thought very deeply
> > about this. Feel free to poke holes into it.
> 
> Yes, a CWND "domain" that can include multiple sockets is
> something that might gain some traction.
> 
> The "domain" could just simply be the tuple {process,peer-IP}

If process is in there this wouldn't work for a multi process
server?

Perhaps having it associated with a FD so that it could
be passed around with unix sockets if needed (just would
need to make sure the AF_UNIX gc can handle such cycles)

peer_id = open_peer_id();   
/* peer id is like a fd */

socket = socket( ... ); 
set_peer_id(socket, peer_id); 


...

close(peer_id);

-andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply

* Re: issue about virtio-net driver to suppoprt vhost mergeable buffer with zero-copy to support PS mode
From: Michael S. Tsirkin @ 2010-05-27  8:20 UTC (permalink / raw)
  To: Xin, Xiaohui
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, Herbert Xu
In-Reply-To: <F2E9EB7348B8264F86B6AB8151CE2D790B7958E740@shsmsx502.ccr.corp.intel.com>

On Thu, May 27, 2010 at 09:21:02AM +0800, Xin, Xiaohui wrote:
> Michael,
> I'm now looking into the vhost mergeable buffer, and I tried to use it to support PS mode with zero-copy. And I found an issue there that I have to modify the guest virito-net driver.
> 
> When guest virtio-net driver submits mergeable buffers, it submits multiple pages outside. In zero-copy case, vhost cannot know which page is used to put header, and which page is used to put payload. Then vhost can only reserves 12 bytes for each page. That means, the page_offset of the payload DMAed into the guest buffer is always 12 bytes. But guest virtio-net driver always use offset 0 to put the data (See receive_mergeable()). That's where the zero-copy use mergeable buffer must modify.
> 
> Have I missed something here? And how do you think about it?
> 
> Thanks 
> Xiaohui 

Maybe you can teach the hardware skip the first 12 bytes: qemu will
call an ioctl telling hardware what the virtio header size is.
This is how we plan to do it for tap.

Alternatively, buffers can be used in any order.
So we can have hardware use N buffers for the packet, and then
have vhost put the header in buffer N+1.

-- 
MST

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox