Netdev List
* [PATCH 1/1] myri10ge: Add support for PCI device id 9
From: Brice Goglin @ 2007-09-13 22:40 UTC
  To: Jeff Garzik; +Cc: netdev
In-Reply-To: <46E9BC1A.5020105@myri.com>

Add support for new Myri-10G boards with PCI device id 9.

Signed-off-by: Brice Goglin <brice@myri.com>
---
 drivers/net/myri10ge/myri10ge.c |    3 +++
 1 file changed, 3 insertions(+)

Index: linux-rc/drivers/net/myri10ge/myri10ge.c
===================================================================
--- linux-rc.orig/drivers/net/myri10ge/myri10ge.c	2007-09-11 20:27:17.000000000 +0200
+++ linux-rc/drivers/net/myri10ge/myri10ge.c	2007-09-14 00:36:36.000000000 +0200
@@ -3094,9 +3094,12 @@
 }
 
 #define PCI_DEVICE_ID_MYRICOM_MYRI10GE_Z8E 	0x0008
+#define PCI_DEVICE_ID_MYRICOM_MYRI10GE_Z8E_9	0x0009
 
 static struct pci_device_id myri10ge_pci_tbl[] = {
 	{PCI_DEVICE(PCI_VENDOR_ID_MYRICOM, PCI_DEVICE_ID_MYRICOM_MYRI10GE_Z8E)},
+	{PCI_DEVICE
+	 (PCI_VENDOR_ID_MYRICOM, PCI_DEVICE_ID_MYRICOM_MYRI10GE_Z8E_9)},
 	{0},
 };
 




* [PATCH 0/1] myri10ge update for 2.6.23
From: Brice Goglin @ 2007-09-13 22:39 UTC
  To: Jeff Garzik; +Cc: netdev

Hi Jeff,

The following patch adds support for a new PCI device id. Please apply
for 2.6.23.

Thanks,
Brice



* [ofa-general] Re: InfiniBand/RDMA merge plans for 2.6.24
From: Shirley Ma @ 2007-09-13 22:16 UTC
  To: Roland Dreier; +Cc: netdev, linux-kernel, general, netdev-owner
In-Reply-To: <aday7fal1qc.fsf@cisco.com>


netdev-owner@vger.kernel.org wrote on 09/13/2007 02:00:43 PM:

>  >         Since ehca can support 4K MTU, we would like to see a patch in
>  > IPoIB to allow link MTU to be up to 4K instead of current 2K for 2.6.24
>  > kernel. The idea is IPoIB link MTU will pick up a return value from SM's
>  > default broadcast MTU. This patch should be a small patch, I hope you are
>  > OK with this.
>
> It's actually not small, since it turns the skb allocation into a
> 4100-byte buffer, which ends up being more than 1 page usually, which
> means it fails if memory is fragmented.
>
> Anyway given the backlog anything substantial that hasn't been posted
> already is almost surely going to have to wait until 2.6.25.

The patch only needs to pick up the broadcast MTU size instead of the
currently hard-coded 2K. The skb allocation shouldn't differ from
Ethernet jumbo frames or from IPoIB-CM, which has a 64K MTU. I don't
understand why it's different. Could you please explain this?

Thanks
Shirley



* Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
From: Roland Dreier @ 2007-09-13 21:12 UTC
  To: Jeff Garzik; +Cc: Steve Wise, general, linux-kernel, netdev
In-Reply-To: <46E987E0.2010605@garzik.org>

 > Well, if it involves /sharing/ port space with the native stack,
 > i.e. where port 1234 is IB but 1235 is Linux, pretty much all the
 > networking devs have NAK'd that approach AFAICS.

Just to be clear, InfiniBand has no problem; the issue is port
collisions involving iWARP connections.

 - R.


* Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
From: Roland Dreier @ 2007-09-13 21:11 UTC
  To: Steve Wise; +Cc: netdev, linux-kernel, general
In-Reply-To: <46E97BB0.9030106@opengridcomputing.com>

 > I was about to post v2 of my patch to avoid port space collisions with
 > the native stack.  Can we get that into 2.6.24?  It is high priority
 > IMO. I've tried to solicit review on it, but I think folks are
 > reluctant... ;-)

I would like to get this in, but I'm still at least a little
reluctant, since we would be committing to a user interface that seems
a little awkward at best, so I'd like to try and find something
better.  Just to summarize my understanding:

 - your patch requires the administrator to configure an ethX:iwY
   alias address to use iwarp.  (By the way is there anything other
   than "don't do that" that avoids assigning the same address to the
   iwarp alias and a non-iwarp interface?)

 - it would be nicer to create the alias automatically, but an alias
   without an address doesn't make sense.  Creating a whole separate
   net device causes problems because the iwarp stuff still needs to
   use the main net device to do ARP etc.

 - so I'm out of better ideas but I still want to push back a little
   before we commit to something ugly.

I've been meaning to track down the bnx2 iscsi offload patch to look
and see if this issue is addressed, since the same problem seems to
exist: it seems an iscsi connection and a main stack tcp connection
might share the same 4-tuple unless something is done to avoid that
happening.

Also, I think it behooves us to get some agreement on this approach
with NetEffect and Kanoj (NetXen?) at least, since their iwarp drivers
seem to be imminent.

 - R.


* [PATCH for 2.6.24] SCTP: Move sysctl_sctp_[rw]mem definitions to protocol.c
From: Vlad Yasevich @ 2007-09-13 21:03 UTC
  To: netdev; +Cc: lksctp-developers@lists.sourceforge.net, David Miller

The sctp_[rw]mem definitions should really be in protocol.c
since that is where they are initialized.  This also allows
one to build a kernel without sysctl support.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
---
 net/sctp/protocol.c |    6 +++---
 net/sctp/sysctl.c   |   11 +++--------
 2 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 193835d..c49eb99 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -84,9 +84,9 @@ static struct sctp_af *sctp_af_v6_specific;
 struct kmem_cache *sctp_chunk_cachep __read_mostly;
 struct kmem_cache *sctp_bucket_cachep __read_mostly;
 
-extern int sysctl_sctp_mem[3];
-extern int sysctl_sctp_rmem[3];
-extern int sysctl_sctp_wmem[3];
+int sysctl_sctp_mem[3];
+int sysctl_sctp_rmem[3];
+int sysctl_sctp_wmem[3];
 
 /* Return the address of the control sock. */
 struct sock *sctp_get_ctl_sock(void)
diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
index ba75ef4..39b10ee 100644
--- a/net/sctp/sysctl.c
+++ b/net/sctp/sysctl.c
@@ -52,14 +52,9 @@ static int int_max = INT_MAX;
 static long sack_timer_min = 1;
 static long sack_timer_max = 500;
 
-int sysctl_sctp_mem[3];
-int sysctl_sctp_rmem[3];
-int sysctl_sctp_wmem[3];
-
-/*
- * per assoc memory limitationf for sends
- */
-int sysctl_sctp_wmem[3];
+extern int sysctl_sctp_mem[3];
+extern int sysctl_sctp_rmem[3];
+extern int sysctl_sctp_wmem[3];
 
 static ctl_table sctp_table[] = {
 	{
-- 
1.5.2.4



* Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
From: Roland Dreier @ 2007-09-13 21:02 UTC
  To: Sean Hefty; +Cc: general, linux-kernel, netdev
In-Reply-To: <000401c7f632$c993e8e0$65cc180a@amr.corp.intel.com>

 > > - My user_mad P_Key index support patch.  I'll test the ioctl to
 > >   change to the new mode and merge this I guess, since Hal and Sean
 > >   have tested this out.
 > 
 > I can give this patch a reviewed-by: too, and I will also try to review a couple
 > of the pending ipoib patches.

Thanks!

 > > - Sean's QoS changes.  These look fine at first glance, and I just
 > >   plan to understand the backwards compatibility story (ie how this
 > >   works with an old SM) and merge.  Anyone who objects let me know.
 > 
 > The new QoS fields fall into fields that are currently reserved, which should be
 > ignored by an older SM.  I've only tested this against openSM however.

That seems OK -- I'm OK with breaking things if an SM is clearly buggy
(and not ignoring fields that are defined to be ignored in the spec
would certainly be a clear bug to me).

 > This patch was generated in response to an Intel MPI issue.  We've seen MPI take
 > several minutes to respond to a connection request during the middle of large
 > application runs.  When this happens, the active side times out the connection.
 > In OFED, we added module parameters to adjust the rdma_cm connection timeout on
 > the active side, but I believe that sending an MRA from the passive side is a
 > better solution.

OK -- just to make sure I'm understanding what you're saying: have you
confirmed that your proposed patches actually fix the issue?

 - R.


* Re: InfiniBand/RDMA merge plans for 2.6.24
From: Roland Dreier @ 2007-09-13 21:00 UTC
  To: Shirley Ma; +Cc: general, linux-kernel, netdev, netdev-owner
In-Reply-To: <OF79F6C618.02039854-ON87257355.00646A35-88257355.0064AB34@us.ibm.com>

 >         Since ehca can support 4K MTU, we would like to see a patch in 
 > IPoIB to allow link MTU to be up to 4K instead of current 2K for 2.6.24 
 > kernel. The idea is IPoIB link MTU will pick up a return value from SM's 
 > default broadcast MTU. This patch should be a small patch, I hope you are 
 > OK with this.

It's actually not small, since it turns the skb allocation into a
4100-byte buffer, which ends up being more than 1 page usually, which
means it fails if memory is fragmented.

Anyway given the backlog anything substantial that hasn't been posted
already is almost surely going to have to wait until 2.6.25.


* Re: [Lksctp-developers] [RFC v3 PATCH 2/21] SCTP: Convert bind_addr_list locking to RCU
From: Vlad Yasevich @ 2007-09-13 20:14 UTC
  To: Sridhar Samudrala; +Cc: paulmck, netdev, lksctp-developers
In-Reply-To: <1189713386.2748.28.camel@w-sridhar2.beaverton.ibm.com>

Sridhar Samudrala wrote:
> On Thu, 2007-09-13 at 15:33 -0400, Vlad Yasevich wrote:
>> Hi Sridhar
>>
>> Sridhar Samudrala wrote:
>>> On Wed, 2007-09-12 at 15:33 -0700, Paul E. McKenney wrote:
>>>> On Wed, Sep 12, 2007 at 05:03:42PM -0400, Vlad Yasevich wrote:
>>>>> [... and here is the updated version as promised ...]
>>>>>
>>>>> Since the sctp_sockaddr_entry is now RCU enabled as part of
>>>>> the patch to synchronize sctp_localaddr_list, it makes sense to
>>>>> change all handling of these entries to RCU.  This includes the
>>>>> sctp_bind_addrs structure and its list of bound addresses.
>>>>>
>>>>> This list is currently protected by an external rw_lock and that
>>>>> looks like overkill.  There are only 2 writers to the list:
>>>>> bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
>>>>> These are already serialized via the socket lock, so they will
>>>>> not step on each other.  These are also relatively rare, so we
>>>>> should be good with RCU.
>>>>>
>>>>> The readers are varied and they are easily converted to RCU.
>>>> Looks good from an RCU viewpoint -- I must defer to others on
>>>> the networking aspects.
>>>>
>>>> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>>> looks good to me too. some minor typos and some comments on
>>> RCU usage comments inline.
>>>
>>> Also, I guess we can remove the sctp_[read/write]_[un]lock macros
>>> from sctp.h now that you removed all the users of rwlocks
>>> in SCTP
>>>
>> Looks like some of the hashing calls still use sctp_write_[un]lock
>> macros, but use normal read_lock() for the read side.
>>
>> I'll clean that up after these patches are accepted.
> 
> OK. You may also consider looking into the generic inet_hashtable
> infrastructure and see if we can use it for SCTP.
> 
> 

I've had a patch set brewing for a while.  I had everything done except the
association hash.  Have been trying to figure out how to plug that one in...

If you want to take a look, I can send you what I have so far. :)

-vlad


* Re: [v3 PATCH 2/2] SCTP: Convert bind_addr_list locking to RCU
From: Sridhar Samudrala @ 2007-09-13 20:00 UTC
  To: Vlad Yasevich; +Cc: netdev, lksctp
In-Reply-To: <1189712077487-git-send-email-vladislav.yasevich@hp.com>

On Thu, 2007-09-13 at 15:34 -0400, Vlad Yasevich wrote:
> Since the sctp_sockaddr_entry is now RCU enabled as part of
> the patch to synchronize sctp_localaddr_list, it makes sense to
> change all handling of these entries to RCU.  This includes the
> sctp_bind_addrs structure and its list of bound addresses.
> 
> This list is currently protected by an external rw_lock and that
> looks like overkill.  There are only 2 writers to the list:
> bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
> These are already serialized via the socket lock, so they will
> not step on each other.  These are also relatively rare, so we
> should be good with RCU.
> 
> The readers are varied and they are easily converted to RCU.
> 
> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Acked-by: Sridhar Samudrala <sri@us.ibm.com>

Thanks
Sridhar
> ---
>  include/net/sctp/structs.h |    7 +--
>  net/sctp/associola.c       |   14 +-----
>  net/sctp/bind_addr.c       |   68 ++++++++++++++++++++----------
>  net/sctp/endpointola.c     |   27 +++---------
>  net/sctp/ipv6.c            |   12 ++---
>  net/sctp/protocol.c        |   25 ++++-------
>  net/sctp/sm_make_chunk.c   |   18 +++-----
>  net/sctp/socket.c          |   98 ++++++++++++-------------------------------
>  8 files changed, 106 insertions(+), 163 deletions(-)
> 
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index a89e361..c2fe2dc 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -1155,7 +1155,9 @@ int sctp_bind_addr_copy(struct sctp_bind_addr *dest,
>  			int flags);
>  int sctp_add_bind_addr(struct sctp_bind_addr *, union sctp_addr *,
>  		       __u8 use_as_src, gfp_t gfp);
> -int sctp_del_bind_addr(struct sctp_bind_addr *, union sctp_addr *);
> +int sctp_del_bind_addr(struct sctp_bind_addr *, union sctp_addr *,
> +			void (*rcu_call)(struct rcu_head *,
> +					  void (*func)(struct rcu_head *)));
>  int sctp_bind_addr_match(struct sctp_bind_addr *, const union sctp_addr *,
>  			 struct sctp_sock *);
>  union sctp_addr *sctp_find_unmatch_addr(struct sctp_bind_addr	*bp,
> @@ -1226,9 +1228,6 @@ struct sctp_ep_common {
>  	 * bind_addr.address_list is our set of local IP addresses.
>  	 */
>  	struct sctp_bind_addr bind_addr;
> -
> -	/* Protection during address list comparisons. */
> -	rwlock_t   addr_lock;
>  };
> 
> 
> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> index 2ad1caf..9bad8ba 100644
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -99,7 +99,6 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a
> 
>  	/* Initialize the bind addr area.  */
>  	sctp_bind_addr_init(&asoc->base.bind_addr, ep->base.bind_addr.port);
> -	rwlock_init(&asoc->base.addr_lock);
> 
>  	asoc->state = SCTP_STATE_CLOSED;
> 
> @@ -937,8 +936,6 @@ struct sctp_transport *sctp_assoc_is_match(struct sctp_association *asoc,
>  {
>  	struct sctp_transport *transport;
> 
> -	sctp_read_lock(&asoc->base.addr_lock);
> -
>  	if ((htons(asoc->base.bind_addr.port) == laddr->v4.sin_port) &&
>  	    (htons(asoc->peer.port) == paddr->v4.sin_port)) {
>  		transport = sctp_assoc_lookup_paddr(asoc, paddr);
> @@ -952,7 +949,6 @@ struct sctp_transport *sctp_assoc_is_match(struct sctp_association *asoc,
>  	transport = NULL;
> 
>  out:
> -	sctp_read_unlock(&asoc->base.addr_lock);
>  	return transport;
>  }
> 
> @@ -1376,19 +1372,13 @@ int sctp_assoc_set_bind_addr_from_cookie(struct sctp_association *asoc,
>  int sctp_assoc_lookup_laddr(struct sctp_association *asoc,
>  			    const union sctp_addr *laddr)
>  {
> -	int found;
> +	int found = 0;
> 
> -	sctp_read_lock(&asoc->base.addr_lock);
>  	if ((asoc->base.bind_addr.port == ntohs(laddr->v4.sin_port)) &&
>  	    sctp_bind_addr_match(&asoc->base.bind_addr, laddr,
> -				 sctp_sk(asoc->base.sk))) {
> +				 sctp_sk(asoc->base.sk)))
>  		found = 1;
> -		goto out;
> -	}
> 
> -	found = 0;
> -out:
> -	sctp_read_unlock(&asoc->base.addr_lock);
>  	return found;
>  }
> 
> diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
> index 7fc369f..d35cbf5 100644
> --- a/net/sctp/bind_addr.c
> +++ b/net/sctp/bind_addr.c
> @@ -167,7 +167,11 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
> 
>  	INIT_LIST_HEAD(&addr->list);
>  	INIT_RCU_HEAD(&addr->rcu);
> -	list_add_tail(&addr->list, &bp->address_list);
> +
> +	/* We always hold a socket lock when calling this function,
> +	 * and that acts as a writer synchronizing lock.
> +	 */
> +	list_add_tail_rcu(&addr->list, &bp->address_list);
>  	SCTP_DBG_OBJCNT_INC(addr);
> 
>  	return 0;
> @@ -176,23 +180,35 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
>  /* Delete an address from the bind address list in the SCTP_bind_addr
>   * structure.
>   */
> -int sctp_del_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *del_addr)
> +int sctp_del_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *del_addr,
> +			void (*rcu_call)(struct rcu_head *head,
> +					 void (*func)(struct rcu_head *head)))
>  {
> -	struct list_head *pos, *temp;
> -	struct sctp_sockaddr_entry *addr;
> +	struct sctp_sockaddr_entry *addr, *temp;
> 
> -	list_for_each_safe(pos, temp, &bp->address_list) {
> -		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> +	/* We hold the socket lock when calling this function,
> +	 * and that acts as a writer synchronizing lock.
> +	 */
> +	list_for_each_entry_safe(addr, temp, &bp->address_list, list) {
>  		if (sctp_cmp_addr_exact(&addr->a, del_addr)) {
>  			/* Found the exact match. */
> -			list_del(pos);
> -			kfree(addr);
> -			SCTP_DBG_OBJCNT_DEC(addr);
> -
> -			return 0;
> +			addr->valid = 0;
> +			list_del_rcu(&addr->list);
> +			break;
>  		}
>  	}
> 
> +	/* Call the rcu callback provided in the args.  This function is
> +	 * called by both BH packet processing and user side socket option
> +	 * processing, but it works on different lists in those 2 contexts.
> +	 * Each context provides it's own callback, whether call_rcu_bh()
> +	 * or call_rcu(), to make sure that we wait for an appropriate time.
> +	 */
> +	if (addr && !addr->valid) {
> +		rcu_call(&addr->rcu, sctp_local_addr_free);
> +		SCTP_DBG_OBJCNT_DEC(addr);
> +	}
> +
>  	return -EINVAL;
>  }
> 
> @@ -302,15 +318,20 @@ int sctp_bind_addr_match(struct sctp_bind_addr *bp,
>  			 struct sctp_sock *opt)
>  {
>  	struct sctp_sockaddr_entry *laddr;
> -	struct list_head *pos;
> -
> -	list_for_each(pos, &bp->address_list) {
> -		laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
> -		if (opt->pf->cmp_addr(&laddr->a, addr, opt))
> -			return 1;
> +	int match = 0;
> +
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(laddr, &bp->address_list, list) {
> +		if (!laddr->valid)
> +			continue;
> +		if (opt->pf->cmp_addr(&laddr->a, addr, opt)) {
> +			match = 1;
> +			break;
> +		}
>  	}
> +	rcu_read_unlock();
> 
> -	return 0;
> +	return match;
>  }
> 
>  /* Find the first address in the bind address list that is not present in
> @@ -325,18 +346,19 @@ union sctp_addr *sctp_find_unmatch_addr(struct sctp_bind_addr	*bp,
>  	union sctp_addr			*addr;
>  	void 				*addr_buf;
>  	struct sctp_af			*af;
> -	struct list_head		*pos;
>  	int				i;
> 
> -	list_for_each(pos, &bp->address_list) {
> -		laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
> -
> +	/* This is only called sctp_send_asconf_del_ip() and we hold
> +	 * the socket lock in that code patch, so that address list
> +	 * can't change.
> +	 */
> +	list_for_each_entry(laddr, &bp->address_list, list) {
>  		addr_buf = (union sctp_addr *)addrs;
>  		for (i = 0; i < addrcnt; i++) {
>  			addr = (union sctp_addr *)addr_buf;
>  			af = sctp_get_af_specific(addr->v4.sin_family);
>  			if (!af)
> -				return NULL;
> +				break;
> 
>  			if (opt->pf->cmp_addr(&laddr->a, addr, opt))
>  				break;
> diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
> index 1404a9e..8f485a0 100644
> --- a/net/sctp/endpointola.c
> +++ b/net/sctp/endpointola.c
> @@ -92,7 +92,6 @@ static struct sctp_endpoint *sctp_endpoint_init(struct sctp_endpoint *ep,
> 
>  	/* Initialize the bind addr area */
>  	sctp_bind_addr_init(&ep->base.bind_addr, 0);
> -	rwlock_init(&ep->base.addr_lock);
> 
>  	/* Remember who we are attached to.  */
>  	ep->base.sk = sk;
> @@ -225,21 +224,14 @@ void sctp_endpoint_put(struct sctp_endpoint *ep)
>  struct sctp_endpoint *sctp_endpoint_is_match(struct sctp_endpoint *ep,
>  					       const union sctp_addr *laddr)
>  {
> -	struct sctp_endpoint *retval;
> +	struct sctp_endpoint *retval = NULL;
> 
> -	sctp_read_lock(&ep->base.addr_lock);
>  	if (htons(ep->base.bind_addr.port) == laddr->v4.sin_port) {
>  		if (sctp_bind_addr_match(&ep->base.bind_addr, laddr,
> -					 sctp_sk(ep->base.sk))) {
> +					 sctp_sk(ep->base.sk)))
>  			retval = ep;
> -			goto out;
> -		}
>  	}
> 
> -	retval = NULL;
> -
> -out:
> -	sctp_read_unlock(&ep->base.addr_lock);
>  	return retval;
>  }
> 
> @@ -261,9 +253,7 @@ static struct sctp_association *__sctp_endpoint_lookup_assoc(
>  	list_for_each(pos, &ep->asocs) {
>  		asoc = list_entry(pos, struct sctp_association, asocs);
>  		if (rport == asoc->peer.port) {
> -			sctp_read_lock(&asoc->base.addr_lock);
>  			*transport = sctp_assoc_lookup_paddr(asoc, paddr);
> -			sctp_read_unlock(&asoc->base.addr_lock);
> 
>  			if (*transport)
>  				return asoc;
> @@ -295,20 +285,17 @@ struct sctp_association *sctp_endpoint_lookup_assoc(
>  int sctp_endpoint_is_peeled_off(struct sctp_endpoint *ep,
>  				const union sctp_addr *paddr)
>  {
> -	struct list_head *pos;
>  	struct sctp_sockaddr_entry *addr;
>  	struct sctp_bind_addr *bp;
> 
> -	sctp_read_lock(&ep->base.addr_lock);
>  	bp = &ep->base.bind_addr;
> -	list_for_each(pos, &bp->address_list) {
> -		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> -		if (sctp_has_association(&addr->a, paddr)) {
> -			sctp_read_unlock(&ep->base.addr_lock);
> +	/* This function is called with the socket lock held,
> +	 * so the address_list can not change.
> +	 */
> +	list_for_each_entry(addr, &bp->address_list, list) {
> +		if (sctp_has_association(&addr->a, paddr))
>  			return 1;
> -		}
>  	}
> -	sctp_read_unlock(&ep->base.addr_lock);
> 
>  	return 0;
>  }
> diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
> index e12fa0a..670fd27 100644
> --- a/net/sctp/ipv6.c
> +++ b/net/sctp/ipv6.c
> @@ -302,9 +302,7 @@ static void sctp_v6_get_saddr(struct sctp_association *asoc,
>  			      union sctp_addr *saddr)
>  {
>  	struct sctp_bind_addr *bp;
> -	rwlock_t *addr_lock;
>  	struct sctp_sockaddr_entry *laddr;
> -	struct list_head *pos;
>  	sctp_scope_t scope;
>  	union sctp_addr *baddr = NULL;
>  	__u8 matchlen = 0;
> @@ -324,14 +322,14 @@ static void sctp_v6_get_saddr(struct sctp_association *asoc,
>  	scope = sctp_scope(daddr);
> 
>  	bp = &asoc->base.bind_addr;
> -	addr_lock = &asoc->base.addr_lock;
> 
>  	/* Go through the bind address list and find the best source address
>  	 * that matches the scope of the destination address.
>  	 */
> -	sctp_read_lock(addr_lock);
> -	list_for_each(pos, &bp->address_list) {
> -		laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(laddr, &bp->address_list, list) {
> +		if (!laddr->valid)
> +			continue;
>  		if ((laddr->use_as_src) &&
>  		    (laddr->a.sa.sa_family == AF_INET6) &&
>  		    (scope <= sctp_scope(&laddr->a))) {
> @@ -353,7 +351,7 @@ static void sctp_v6_get_saddr(struct sctp_association *asoc,
>  		       __FUNCTION__, asoc, NIP6(daddr->v6.sin6_addr));
>  	}
> 
> -	sctp_read_unlock(addr_lock);
> +	rcu_read_unlock();
>  }
> 
>  /* Make a copy of all potential local addresses. */
> diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
> index 7ee120e..3d036cd 100644
> --- a/net/sctp/protocol.c
> +++ b/net/sctp/protocol.c
> @@ -224,7 +224,7 @@ int sctp_copy_local_addr_list(struct sctp_bind_addr *bp, sctp_scope_t scope,
>  			      (copy_flags & SCTP_ADDR6_ALLOWED) &&
>  			      (copy_flags & SCTP_ADDR6_PEERSUPP)))) {
>  				error = sctp_add_bind_addr(bp, &addr->a, 1,
> -							   GFP_ATOMIC);
> +						    GFP_ATOMIC);
>  				if (error)
>  					goto end_copy;
>  			}
> @@ -428,9 +428,7 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
>  	struct rtable *rt;
>  	struct flowi fl;
>  	struct sctp_bind_addr *bp;
> -	rwlock_t *addr_lock;
>  	struct sctp_sockaddr_entry *laddr;
> -	struct list_head *pos;
>  	struct dst_entry *dst = NULL;
>  	union sctp_addr dst_saddr;
> 
> @@ -459,23 +457,20 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
>  		goto out;
> 
>  	bp = &asoc->base.bind_addr;
> -	addr_lock = &asoc->base.addr_lock;
> 
>  	if (dst) {
>  		/* Walk through the bind address list and look for a bind
>  		 * address that matches the source address of the returned dst.
>  		 */
> -		sctp_read_lock(addr_lock);
> -		list_for_each(pos, &bp->address_list) {
> -			laddr = list_entry(pos, struct sctp_sockaddr_entry,
> -					   list);
> -			if (!laddr->use_as_src)
> +		rcu_read_lock();
> +		list_for_each_entry_rcu(laddr, &bp->address_list, list) {
> +			if (!laddr->valid || !laddr->use_as_src)
>  				continue;
>  			sctp_v4_dst_saddr(&dst_saddr, dst, htons(bp->port));
>  			if (sctp_v4_cmp_addr(&dst_saddr, &laddr->a))
>  				goto out_unlock;
>  		}
> -		sctp_read_unlock(addr_lock);
> +		rcu_read_unlock();
> 
>  		/* None of the bound addresses match the source address of the
>  		 * dst. So release it.
> @@ -487,10 +482,10 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
>  	/* Walk through the bind address list and try to get a dst that
>  	 * matches a bind address as the source address.
>  	 */
> -	sctp_read_lock(addr_lock);
> -	list_for_each(pos, &bp->address_list) {
> -		laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
> -
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(laddr, &bp->address_list, list) {
> +		if (!laddr->valid)
> +			continue;
>  		if ((laddr->use_as_src) &&
>  		    (AF_INET == laddr->a.sa.sa_family)) {
>  			fl.fl4_src = laddr->a.v4.sin_addr.s_addr;
> @@ -502,7 +497,7 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
>  	}
> 
>  out_unlock:
> -	sctp_read_unlock(addr_lock);
> +	rcu_read_unlock();
>  out:
>  	if (dst)
>  		SCTP_DEBUG_PRINTK("rt_dst:%u.%u.%u.%u, rt_src:%u.%u.%u.%u\n",
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index 79856c9..2e34220 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -1531,7 +1531,7 @@ no_hmac:
>  	/* Also, add the destination address. */
>  	if (list_empty(&retval->base.bind_addr.address_list)) {
>  		sctp_add_bind_addr(&retval->base.bind_addr, &chunk->dest, 1,
> -				   GFP_ATOMIC);
> +				GFP_ATOMIC);
>  	}
> 
>  	retval->next_tsn = retval->c.initial_tsn;
> @@ -2613,22 +2613,16 @@ static int sctp_asconf_param_success(struct sctp_association *asoc,
> 
>  	switch (asconf_param->param_hdr.type) {
>  	case SCTP_PARAM_ADD_IP:
> -		sctp_local_bh_disable();
> -		sctp_write_lock(&asoc->base.addr_lock);
> -		list_for_each(pos, &bp->address_list) {
> -			saddr = list_entry(pos, struct sctp_sockaddr_entry, list);
> +		/* This is always done in BH context with a socket lock
> +		 * held, so the list can not change.
> +		 */
> +		list_for_each_entry(saddr, &bp->address_list, list) {
>  			if (sctp_cmp_addr_exact(&saddr->a, &addr))
>  				saddr->use_as_src = 1;
>  		}
> -		sctp_write_unlock(&asoc->base.addr_lock);
> -		sctp_local_bh_enable();
>  		break;
>  	case SCTP_PARAM_DEL_IP:
> -		sctp_local_bh_disable();
> -		sctp_write_lock(&asoc->base.addr_lock);
> -		retval = sctp_del_bind_addr(bp, &addr);
> -		sctp_write_unlock(&asoc->base.addr_lock);
> -		sctp_local_bh_enable();
> +		retval = sctp_del_bind_addr(bp, &addr, call_rcu_bh);
>  		list_for_each(pos, &asoc->peer.transport_addr_list) {
>  			transport = list_entry(pos, struct sctp_transport,
>  						 transports);
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index a3acf78..772fbfb 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -367,14 +367,10 @@ SCTP_STATIC int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
>  	if (!bp->port)
>  		bp->port = inet_sk(sk)->num;
> 
> -	/* Add the address to the bind address list.  */
> -	sctp_local_bh_disable();
> -	sctp_write_lock(&ep->base.addr_lock);
> -
> -	/* Use GFP_ATOMIC since BHs are disabled.  */
> +	/* Add the address to the bind address list.
> +	 * Use GFP_ATOMIC since BHs will be disabled.
> +	 */
>  	ret = sctp_add_bind_addr(bp, addr, 1, GFP_ATOMIC);
> -	sctp_write_unlock(&ep->base.addr_lock);
> -	sctp_local_bh_enable();
> 
>  	/* Copy back into socket for getsockname() use. */
>  	if (!ret) {
> @@ -544,15 +540,12 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
>  		if (i < addrcnt)
>  			continue;
> 
> -		/* Use the first address in bind addr list of association as
> -		 * Address Parameter of ASCONF CHUNK.
> +		/* Use the first valid address in bind addr list of
> +		 * association as Address Parameter of ASCONF CHUNK.
>  		 */
> -		sctp_read_lock(&asoc->base.addr_lock);
>  		bp = &asoc->base.bind_addr;
>  		p = bp->address_list.next;
>  		laddr = list_entry(p, struct sctp_sockaddr_entry, list);
> -		sctp_read_unlock(&asoc->base.addr_lock);
> -
>  		chunk = sctp_make_asconf_update_ip(asoc, &laddr->a, addrs,
>  						   addrcnt, SCTP_PARAM_ADD_IP);
>  		if (!chunk) {
> @@ -567,8 +560,6 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
>  		/* Add the new addresses to the bind address list with
>  		 * use_as_src set to 0.
>  		 */
> -		sctp_local_bh_disable();
> -		sctp_write_lock(&asoc->base.addr_lock);
>  		addr_buf = addrs;
>  		for (i = 0; i < addrcnt; i++) {
>  			addr = (union sctp_addr *)addr_buf;
> @@ -578,8 +569,6 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
>  						    GFP_ATOMIC);
>  			addr_buf += af->sockaddr_len;
>  		}
> -		sctp_write_unlock(&asoc->base.addr_lock);
> -		sctp_local_bh_enable();
>  	}
> 
>  out:
> @@ -651,13 +640,7 @@ static int sctp_bindx_rem(struct sock *sk, struct sockaddr *addrs, int addrcnt)
>  		 * socket routing and failover schemes. Refer to comments in
>  		 * sctp_do_bind(). -daisy
>  		 */
> -		sctp_local_bh_disable();
> -		sctp_write_lock(&ep->base.addr_lock);
> -
> -		retval = sctp_del_bind_addr(bp, sa_addr);
> -
> -		sctp_write_unlock(&ep->base.addr_lock);
> -		sctp_local_bh_enable();
> +		retval = sctp_del_bind_addr(bp, sa_addr, call_rcu);
> 
>  		addr_buf += af->sockaddr_len;
>  err_bindx_rem:
> @@ -748,14 +731,16 @@ static int sctp_send_asconf_del_ip(struct sock		*sk,
>  		 * make sure that we do not delete all the addresses in the
>  		 * association.
>  		 */
> -		sctp_read_lock(&asoc->base.addr_lock);
>  		bp = &asoc->base.bind_addr;
>  		laddr = sctp_find_unmatch_addr(bp, (union sctp_addr *)addrs,
>  					       addrcnt, sp);
> -		sctp_read_unlock(&asoc->base.addr_lock);
>  		if (!laddr)
>  			continue;
> 
> +		/* We do not need RCU protection throughout this loop
> +		 * because this is done under a socket lock from the
> +		 * setsockopt call.
> +		 */
>  		chunk = sctp_make_asconf_update_ip(asoc, laddr, addrs, addrcnt,
>  						   SCTP_PARAM_DEL_IP);
>  		if (!chunk) {
> @@ -766,23 +751,16 @@ static int sctp_send_asconf_del_ip(struct sock		*sk,
>  		/* Reset use_as_src flag for the addresses in the bind address
>  		 * list that are to be deleted.
>  		 */
> -		sctp_local_bh_disable();
> -		sctp_write_lock(&asoc->base.addr_lock);
>  		addr_buf = addrs;
>  		for (i = 0; i < addrcnt; i++) {
>  			laddr = (union sctp_addr *)addr_buf;
>  			af = sctp_get_af_specific(laddr->v4.sin_family);
> -			list_for_each(pos1, &bp->address_list) {
> -				saddr = list_entry(pos1,
> -						   struct sctp_sockaddr_entry,
> -						   list);
> +			list_for_each_entry(saddr, &bp->address_list, list) {
>  				if (sctp_cmp_addr_exact(&saddr->a, laddr))
>  					saddr->use_as_src = 0;
>  			}
>  			addr_buf += af->sockaddr_len;
>  		}
> -		sctp_write_unlock(&asoc->base.addr_lock);
> -		sctp_local_bh_enable();
> 
>  		/* Update the route and saddr entries for all the transports
>  		 * as some of the addresses in the bind address list are
> @@ -4057,11 +4035,9 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
>  					       int __user *optlen)
>  {
>  	sctp_assoc_t id;
> -	struct list_head *pos;
>  	struct sctp_bind_addr *bp;
>  	struct sctp_association *asoc;
>  	struct sctp_sockaddr_entry *addr;
> -	rwlock_t *addr_lock;
>  	int cnt = 0;
> 
>  	if (len < sizeof(sctp_assoc_t))
> @@ -4078,17 +4054,13 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
>  	 */
>  	if (0 == id) {
>  		bp = &sctp_sk(sk)->ep->base.bind_addr;
> -		addr_lock = &sctp_sk(sk)->ep->base.addr_lock;
>  	} else {
>  		asoc = sctp_id2assoc(sk, id);
>  		if (!asoc)
>  			return -EINVAL;
>  		bp = &asoc->base.bind_addr;
> -		addr_lock = &asoc->base.addr_lock;
>  	}
> 
> -	sctp_read_lock(addr_lock);
> -
>  	/* If the endpoint is bound to 0.0.0.0 or ::0, count the valid
>  	 * addresses from the global local address list.
>  	 */
> @@ -4115,12 +4087,14 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
>  		goto done;
>  	}
> 
> -	list_for_each(pos, &bp->address_list) {
> +	/* Protection on the bound address list is not needed,
> +	 * since in the socket option context we hold the socket lock,
> +	 * so there is no way that the bound address list can change.
> +	 */
> +	list_for_each_entry(addr, &bp->address_list, list) {
>  		cnt ++;
>  	}
> -
>  done:
> -	sctp_read_unlock(addr_lock);
>  	return cnt;
>  }
> 
> @@ -4204,7 +4178,6 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
>  {
>  	struct sctp_bind_addr *bp;
>  	struct sctp_association *asoc;
> -	struct list_head *pos;
>  	int cnt = 0;
>  	struct sctp_getaddrs_old getaddrs;
>  	struct sctp_sockaddr_entry *addr;
> @@ -4212,7 +4185,6 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
>  	union sctp_addr temp;
>  	struct sctp_sock *sp = sctp_sk(sk);
>  	int addrlen;
> -	rwlock_t *addr_lock;
>  	int err = 0;
>  	void *addrs;
>  	void *buf;
> @@ -4234,13 +4206,11 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
>  	 */
>  	if (0 == getaddrs.assoc_id) {
>  		bp = &sctp_sk(sk)->ep->base.bind_addr;
> -		addr_lock = &sctp_sk(sk)->ep->base.addr_lock;
>  	} else {
>  		asoc = sctp_id2assoc(sk, getaddrs.assoc_id);
>  		if (!asoc)
>  			return -EINVAL;
>  		bp = &asoc->base.bind_addr;
> -		addr_lock = &asoc->base.addr_lock;
>  	}
> 
>  	to = getaddrs.addrs;
> @@ -4254,8 +4224,6 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
>  	if (!addrs)
>  		return -ENOMEM;
> 
> -	sctp_read_lock(addr_lock);
> -
>  	/* If the endpoint is bound to 0.0.0.0 or ::0, get the valid
>  	 * addresses from the global local address list.
>  	 */
> @@ -4271,8 +4239,11 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
>  	}
> 
>  	buf = addrs;
> -	list_for_each(pos, &bp->address_list) {
> -		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> +	/* Protection on the bound address list is not needed since
> +	 * in the socket option context we hold a socket lock and
> +	 * thus the bound address list can't change.
> +	 */
> +	list_for_each_entry(addr, &bp->address_list, list) {
>  		memcpy(&temp, &addr->a, sizeof(temp));
>  		sctp_get_pf_specific(sk->sk_family)->addr_v4map(sp, &temp);
>  		addrlen = sctp_get_af_specific(temp.sa.sa_family)->sockaddr_len;
> @@ -4284,8 +4255,6 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
>  	}
> 
>  copy_getaddrs:
> -	sctp_read_unlock(addr_lock);
> -
>  	/* copy the entire address list into the user provided space */
>  	if (copy_to_user(to, addrs, bytes_copied)) {
>  		err = -EFAULT;
> @@ -4307,7 +4276,6 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
>  {
>  	struct sctp_bind_addr *bp;
>  	struct sctp_association *asoc;
> -	struct list_head *pos;
>  	int cnt = 0;
>  	struct sctp_getaddrs getaddrs;
>  	struct sctp_sockaddr_entry *addr;
> @@ -4315,7 +4283,6 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
>  	union sctp_addr temp;
>  	struct sctp_sock *sp = sctp_sk(sk);
>  	int addrlen;
> -	rwlock_t *addr_lock;
>  	int err = 0;
>  	size_t space_left;
>  	int bytes_copied = 0;
> @@ -4336,13 +4303,11 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
>  	 */
>  	if (0 == getaddrs.assoc_id) {
>  		bp = &sctp_sk(sk)->ep->base.bind_addr;
> -		addr_lock = &sctp_sk(sk)->ep->base.addr_lock;
>  	} else {
>  		asoc = sctp_id2assoc(sk, getaddrs.assoc_id);
>  		if (!asoc)
>  			return -EINVAL;
>  		bp = &asoc->base.bind_addr;
> -		addr_lock = &asoc->base.addr_lock;
>  	}
> 
>  	to = optval + offsetof(struct sctp_getaddrs,addrs);
> @@ -4352,8 +4317,6 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
>  	if (!addrs)
>  		return -ENOMEM;
> 
> -	sctp_read_lock(addr_lock);
> -
>  	/* If the endpoint is bound to 0.0.0.0 or ::0, get the valid
>  	 * addresses from the global local address list.
>  	 */
> @@ -4365,21 +4328,24 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
>  						space_left, &bytes_copied);
>  			if (cnt < 0) {
>  				err = cnt;
> -				goto error_lock;
> +				goto out;
>  			}
>  			goto copy_getaddrs;
>  		}
>  	}
> 
>  	buf = addrs;
> -	list_for_each(pos, &bp->address_list) {
> -		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> +	/* Protection on the bound address list is not needed since
> +	 * in the socket option context we hold a socket lock and
> +	 * thus the bound address list can't change.
> +	 */
> +	list_for_each_entry(addr, &bp->address_list, list) {
>  		memcpy(&temp, &addr->a, sizeof(temp));
>  		sctp_get_pf_specific(sk->sk_family)->addr_v4map(sp, &temp);
>  		addrlen = sctp_get_af_specific(temp.sa.sa_family)->sockaddr_len;
>  		if (space_left < addrlen) {
>  			err =  -ENOMEM; /*fixme: right error?*/
> -			goto error_lock;
> +			goto out;
>  		}
>  		memcpy(buf, &temp, addrlen);
>  		buf += addrlen;
> @@ -4389,8 +4355,6 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
>  	}
> 
>  copy_getaddrs:
> -	sctp_read_unlock(addr_lock);
> -
>  	if (copy_to_user(to, addrs, bytes_copied)) {
>  		err = -EFAULT;
>  		goto out;
> @@ -4401,12 +4365,6 @@ copy_getaddrs:
>  	}
>  	if (put_user(bytes_copied, optlen))
>  		err = -EFAULT;
> -
> -	goto out;
> -
> -error_lock:
> -	sctp_read_unlock(addr_lock);
> -
>  out:
>  	kfree(addrs);
>  	return err;


^ permalink raw reply

* Re: [v3 PATCH 1/2] SCTP: Add RCU synchronization around sctp_localaddr_list
From: Sridhar Samudrala @ 2007-09-13 19:57 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, lksctp
In-Reply-To: <1189712077711-git-send-email-vladislav.yasevich@hp.com>

On Thu, 2007-09-13 at 15:34 -0400, Vlad Yasevich wrote:
> sctp_localaddr_list is modified dynamically via NETDEV_UP
> and NETDEV_DOWN events, but there is no synchronization
> between the writer (event handler) and the readers.  As a result,
> the readers can access an entry that has been freed and
> crash the system.
> 
> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Acked-by: Sridhar Samudrala <sri@us.ibm.com>

Thanks
Sridhar
> ---
>  include/net/sctp/sctp.h    |    1 +
>  include/net/sctp/structs.h |    6 +++++
>  net/sctp/bind_addr.c       |    2 +
>  net/sctp/ipv6.c            |   34 +++++++++++++++++++--------
>  net/sctp/protocol.c        |   54 +++++++++++++++++++++++++++++++------------
>  net/sctp/socket.c          |   38 ++++++++++++++++++++----------
>  6 files changed, 97 insertions(+), 38 deletions(-)
> 
> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
> index d529045..c9cc00c 100644
> --- a/include/net/sctp/sctp.h
> +++ b/include/net/sctp/sctp.h
> @@ -123,6 +123,7 @@
>   * sctp/protocol.c
>   */
>  extern struct sock *sctp_get_ctl_sock(void);
> +extern void sctp_local_addr_free(struct rcu_head *head);
>  extern int sctp_copy_local_addr_list(struct sctp_bind_addr *,
>  				     sctp_scope_t, gfp_t gfp,
>  				     int flags);
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index c0d5848..a89e361 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -207,6 +207,9 @@ extern struct sctp_globals {
>  	 * It is a list of sctp_sockaddr_entry.
>  	 */
>  	struct list_head local_addr_list;
> +
> +	/* Lock that protects the local_addr_list writers */
> +	spinlock_t addr_list_lock;
>  	
>  	/* Flag to indicate if addip is enabled. */
>  	int addip_enable;
> @@ -242,6 +245,7 @@ extern struct sctp_globals {
>  #define sctp_port_alloc_lock		(sctp_globals.port_alloc_lock)
>  #define sctp_port_hashtable		(sctp_globals.port_hashtable)
>  #define sctp_local_addr_list		(sctp_globals.local_addr_list)
> +#define sctp_local_addr_lock		(sctp_globals.addr_list_lock)
>  #define sctp_addip_enable		(sctp_globals.addip_enable)
>  #define sctp_prsctp_enable		(sctp_globals.prsctp_enable)
> 
> @@ -737,8 +741,10 @@ const union sctp_addr *sctp_source(const struct sctp_chunk *chunk);
>  /* This is a structure for holding either an IPv6 or an IPv4 address.  */
>  struct sctp_sockaddr_entry {
>  	struct list_head list;
> +	struct rcu_head	rcu;
>  	union sctp_addr a;
>  	__u8 use_as_src;
> +	__u8 valid;
>  };
> 
>  typedef struct sctp_chunk *(sctp_packet_phandler_t)(struct sctp_association *);
> diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
> index fdb287a..7fc369f 100644
> --- a/net/sctp/bind_addr.c
> +++ b/net/sctp/bind_addr.c
> @@ -163,8 +163,10 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
>  		addr->a.v4.sin_port = htons(bp->port);
> 
>  	addr->use_as_src = use_as_src;
> +	addr->valid = 1;
> 
>  	INIT_LIST_HEAD(&addr->list);
> +	INIT_RCU_HEAD(&addr->rcu);
>  	list_add_tail(&addr->list, &bp->address_list);
>  	SCTP_DBG_OBJCNT_INC(addr);
> 
> diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
> index f8aa23d..e12fa0a 100644
> --- a/net/sctp/ipv6.c
> +++ b/net/sctp/ipv6.c
> @@ -77,13 +77,18 @@
> 
>  #include <asm/uaccess.h>
> 
> -/* Event handler for inet6 address addition/deletion events.  */
> +/* Event handler for inet6 address addition/deletion events.
> + * The sctp_local_addr_list needs to be protected by a spin lock since
> + * multiple notifiers (say IPv4 and IPv6) may be running at the same
> + * time and thus corrupt the list.
> + * The reader side is protected with RCU.
> + */
>  static int sctp_inet6addr_event(struct notifier_block *this, unsigned long ev,
>  				void *ptr)
>  {
>  	struct inet6_ifaddr *ifa = (struct inet6_ifaddr *)ptr;
> -	struct sctp_sockaddr_entry *addr;
> -	struct list_head *pos, *temp;
> +	struct sctp_sockaddr_entry *addr = NULL;
> +	struct sctp_sockaddr_entry *temp;
> 
>  	switch (ev) {
>  	case NETDEV_UP:
> @@ -94,19 +99,26 @@ static int sctp_inet6addr_event(struct notifier_block *this, unsigned long ev,
>  			memcpy(&addr->a.v6.sin6_addr, &ifa->addr,
>  				 sizeof(struct in6_addr));
>  			addr->a.v6.sin6_scope_id = ifa->idev->dev->ifindex;
> -			list_add_tail(&addr->list, &sctp_local_addr_list);
> +			addr->valid = 1;
> +			spin_lock_bh(&sctp_local_addr_lock);
> +			list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
> +			spin_unlock_bh(&sctp_local_addr_lock);
>  		}
>  		break;
>  	case NETDEV_DOWN:
> -		list_for_each_safe(pos, temp, &sctp_local_addr_list) {
> -			addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> -			if (ipv6_addr_equal(&addr->a.v6.sin6_addr, &ifa->addr)) {
> -				list_del(pos);
> -				kfree(addr);
> +		spin_lock_bh(&sctp_local_addr_lock);
> +		list_for_each_entry_safe(addr, temp,
> +					&sctp_local_addr_list, list) {
> +			if (ipv6_addr_equal(&addr->a.v6.sin6_addr,
> +					     &ifa->addr)) {
> +				addr->valid = 0;
> +				list_del_rcu(&addr->list);
>  				break;
>  			}
>  		}
> -
> +		spin_unlock_bh(&sctp_local_addr_lock);
> +		if (addr && !addr->valid)
> +			call_rcu(&addr->rcu, sctp_local_addr_free);
>  		break;
>  	}
> 
> @@ -367,7 +379,9 @@ static void sctp_v6_copy_addrlist(struct list_head *addrlist,
>  			addr->a.v6.sin6_port = 0;
>  			addr->a.v6.sin6_addr = ifp->addr;
>  			addr->a.v6.sin6_scope_id = dev->ifindex;
> +			addr->valid = 1;
>  			INIT_LIST_HEAD(&addr->list);
> +			INIT_RCU_HEAD(&addr->rcu);
>  			list_add_tail(&addr->list, addrlist);
>  		}
>  	}
> diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
> index e98579b..7ee120e 100644
> --- a/net/sctp/protocol.c
> +++ b/net/sctp/protocol.c
> @@ -153,6 +153,9 @@ static void sctp_v4_copy_addrlist(struct list_head *addrlist,
>  			addr->a.v4.sin_family = AF_INET;
>  			addr->a.v4.sin_port = 0;
>  			addr->a.v4.sin_addr.s_addr = ifa->ifa_local;
> +			addr->valid = 1;
> +			INIT_LIST_HEAD(&addr->list);
> +			INIT_RCU_HEAD(&addr->rcu);
>  			list_add_tail(&addr->list, addrlist);
>  		}
>  	}
> @@ -192,16 +195,24 @@ static void sctp_free_local_addr_list(void)
>  	}
>  }
> 
> +void sctp_local_addr_free(struct rcu_head *head)
> +{
> +	struct sctp_sockaddr_entry *e = container_of(head,
> +				struct sctp_sockaddr_entry, rcu);
> +	kfree(e);
> +}
> +
>  /* Copy the local addresses which are valid for 'scope' into 'bp'.  */
>  int sctp_copy_local_addr_list(struct sctp_bind_addr *bp, sctp_scope_t scope,
>  			      gfp_t gfp, int copy_flags)
>  {
>  	struct sctp_sockaddr_entry *addr;
>  	int error = 0;
> -	struct list_head *pos, *temp;
> 
> -	list_for_each_safe(pos, temp, &sctp_local_addr_list) {
> -		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
> +		if (!addr->valid)
> +			continue;
>  		if (sctp_in_scope(&addr->a, scope)) {
>  			/* Now that the address is in scope, check to see if
>  			 * the address type is really supported by the local
> @@ -221,6 +232,7 @@ int sctp_copy_local_addr_list(struct sctp_bind_addr *bp, sctp_scope_t scope,
>  	}
> 
>  end_copy:
> +	rcu_read_unlock();
>  	return error;
>  }
> 
> @@ -600,13 +612,18 @@ static void sctp_v4_seq_dump_addr(struct seq_file *seq, union sctp_addr *addr)
>  	seq_printf(seq, "%d.%d.%d.%d ", NIPQUAD(addr->v4.sin_addr));
>  }
> 
> -/* Event handler for inet address addition/deletion events.  */
> +/* Event handler for inet address addition/deletion events.
> + * The sctp_local_addr_list needs to be protected by a spin lock since
> + * multiple notifiers (say IPv4 and IPv6) may be running at the same
> + * time and thus corrupt the list.
> + * The reader side is protected with RCU.
> + */
>  static int sctp_inetaddr_event(struct notifier_block *this, unsigned long ev,
>  			       void *ptr)
>  {
>  	struct in_ifaddr *ifa = (struct in_ifaddr *)ptr;
> -	struct sctp_sockaddr_entry *addr;
> -	struct list_head *pos, *temp;
> +	struct sctp_sockaddr_entry *addr = NULL;
> +	struct sctp_sockaddr_entry *temp;
> 
>  	switch (ev) {
>  	case NETDEV_UP:
> @@ -615,19 +632,25 @@ static int sctp_inetaddr_event(struct notifier_block *this, unsigned long ev,
>  			addr->a.v4.sin_family = AF_INET;
>  			addr->a.v4.sin_port = 0;
>  			addr->a.v4.sin_addr.s_addr = ifa->ifa_local;
> -			list_add_tail(&addr->list, &sctp_local_addr_list);
> +			addr->valid = 1;
> +			spin_lock_bh(&sctp_local_addr_lock);
> +			list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
> +			spin_unlock_bh(&sctp_local_addr_lock);
>  		}
>  		break;
>  	case NETDEV_DOWN:
> -		list_for_each_safe(pos, temp, &sctp_local_addr_list) {
> -			addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> +		spin_lock_bh(&sctp_local_addr_lock);
> +		list_for_each_entry_safe(addr, temp,
> +					&sctp_local_addr_list, list) {
>  			if (addr->a.v4.sin_addr.s_addr == ifa->ifa_local) {
> -				list_del(pos);
> -				kfree(addr);
> +				addr->valid = 0;
> +				list_del_rcu(&addr->list);
>  				break;
>  			}
>  		}
> -
> +		spin_unlock_bh(&sctp_local_addr_lock);
> +		if (addr && !addr->valid)
> +			call_rcu(&addr->rcu, sctp_local_addr_free);
>  		break;
>  	}
> 
> @@ -1160,6 +1183,7 @@ SCTP_STATIC __init int sctp_init(void)
> 
>  	/* Initialize the local address list. */
>  	INIT_LIST_HEAD(&sctp_local_addr_list);
> +	spin_lock_init(&sctp_local_addr_lock);
>  	sctp_get_local_addr_list();
> 
>  	/* Register notifier for inet address additions/deletions. */
> @@ -1227,6 +1251,9 @@ SCTP_STATIC __exit void sctp_exit(void)
>  	sctp_v6_del_protocol();
>  	inet_del_protocol(&sctp_protocol, IPPROTO_SCTP);
> 
> +	/* Unregister notifier for inet address additions/deletions. */
> +	unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
> +
>  	/* Free the local address list.  */
>  	sctp_free_local_addr_list();
> 
> @@ -1240,9 +1267,6 @@ SCTP_STATIC __exit void sctp_exit(void)
>  	inet_unregister_protosw(&sctp_stream_protosw);
>  	inet_unregister_protosw(&sctp_seqpacket_protosw);
> 
> -	/* Unregister notifier for inet address additions/deletions. */
> -	unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
> -
>  	sctp_sysctl_unregister();
>  	list_del(&sctp_ipv4_specific.list);
> 
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 3335460..a3acf78 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -4057,9 +4057,9 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
>  					       int __user *optlen)
>  {
>  	sctp_assoc_t id;
> +	struct list_head *pos;
>  	struct sctp_bind_addr *bp;
>  	struct sctp_association *asoc;
> -	struct list_head *pos, *temp;
>  	struct sctp_sockaddr_entry *addr;
>  	rwlock_t *addr_lock;
>  	int cnt = 0;
> @@ -4096,15 +4096,19 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
>  		addr = list_entry(bp->address_list.next,
>  				  struct sctp_sockaddr_entry, list);
>  		if (sctp_is_any(&addr->a)) {
> -			list_for_each_safe(pos, temp, &sctp_local_addr_list) {
> -				addr = list_entry(pos,
> -						  struct sctp_sockaddr_entry,
> -						  list);
> +			rcu_read_lock();
> +			list_for_each_entry_rcu(addr,
> +						&sctp_local_addr_list, list) {
> +				if (!addr->valid)
> +					continue;
> +
>  				if ((PF_INET == sk->sk_family) &&
>  				    (AF_INET6 == addr->a.sa.sa_family))
>  					continue;
> +
>  				cnt++;
>  			}
> +			rcu_read_unlock();
>  		} else {
>  			cnt = 1;
>  		}
> @@ -4127,14 +4131,16 @@ static int sctp_copy_laddrs_old(struct sock *sk, __u16 port,
>  					int max_addrs, void *to,
>  					int *bytes_copied)
>  {
> -	struct list_head *pos, *next;
>  	struct sctp_sockaddr_entry *addr;
>  	union sctp_addr temp;
>  	int cnt = 0;
>  	int addrlen;
> 
> -	list_for_each_safe(pos, next, &sctp_local_addr_list) {
> -		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
> +		if (!addr->valid)
> +			continue;
> +
>  		if ((PF_INET == sk->sk_family) &&
>  		    (AF_INET6 == addr->a.sa.sa_family))
>  			continue;
> @@ -4149,6 +4155,7 @@ static int sctp_copy_laddrs_old(struct sock *sk, __u16 port,
>  		cnt ++;
>  		if (cnt >= max_addrs) break;
>  	}
> +	rcu_read_unlock();
> 
>  	return cnt;
>  }
> @@ -4156,14 +4163,16 @@ static int sctp_copy_laddrs_old(struct sock *sk, __u16 port,
>  static int sctp_copy_laddrs(struct sock *sk, __u16 port, void *to,
>  			    size_t space_left, int *bytes_copied)
>  {
> -	struct list_head *pos, *next;
>  	struct sctp_sockaddr_entry *addr;
>  	union sctp_addr temp;
>  	int cnt = 0;
>  	int addrlen;
> 
> -	list_for_each_safe(pos, next, &sctp_local_addr_list) {
> -		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
> +		if (!addr->valid)
> +			continue;
> +
>  		if ((PF_INET == sk->sk_family) &&
>  		    (AF_INET6 == addr->a.sa.sa_family))
>  			continue;
> @@ -4171,8 +4180,10 @@ static int sctp_copy_laddrs(struct sock *sk, __u16 port, void *to,
>  		sctp_get_pf_specific(sk->sk_family)->addr_v4map(sctp_sk(sk),
>  								&temp);
>  		addrlen = sctp_get_af_specific(temp.sa.sa_family)->sockaddr_len;
> -		if (space_left < addrlen)
> -			return -ENOMEM;
> +		if (space_left < addrlen) {
> +			cnt =  -ENOMEM;
> +			break;
> +		}
>  		memcpy(to, &temp, addrlen);
> 
>  		to += addrlen;
> @@ -4180,6 +4191,7 @@ static int sctp_copy_laddrs(struct sock *sk, __u16 port, void *to,
>  		space_left -= addrlen;
>  		*bytes_copied += addrlen;
>  	}
> +	rcu_read_unlock();
> 
>  	return cnt;
>  }
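
A note on the NETDEV_DOWN hunks above, offered as a sketch rather than a
definitive reading: when list_for_each_entry_safe() runs to completion
without a break, its cursor is not NULL; it ends up as the container
computed from the list head itself.  The "addr && !addr->valid" test after
the loop may therefore not be enough to tell "found and unlinked" from
"no match".  One common way around that is to record the match in a
separate pointer.  The userspace code below illustrates the idea with a
hand-rolled list; struct entry, list_init() and unlink_match() are
illustrative names, not SCTP or kernel code, and the deferred free is left
to the caller where the kernel would use call_rcu().

```c
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Illustrative entry type; only loosely modelled on sctp_sockaddr_entry. */
struct entry {
	struct list_head list;
	int key;
	int valid;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Unlink the first entry whose key matches and return it, or NULL if no
 * entry matches.  'found' is set only on a match, so the caller never sees
 * the loop cursor, which after a full traversal refers to the list head
 * rather than to a real entry.
 */
static struct entry *unlink_match(struct list_head *head, int key)
{
	struct list_head *pos, *tmp;
	struct entry *found = NULL;

	for (pos = head->next, tmp = pos->next; pos != head;
	     pos = tmp, tmp = pos->next) {
		struct entry *e = container_of(pos, struct entry, list);

		if (e->key == key) {
			e->valid = 0;	/* mark stale for concurrent readers */
			list_del(pos);
			found = e;
			break;
		}
	}

	return found;	/* non-NULL only when something was unlinked */
}
```

With this shape the caller's check becomes "if (found)", and the free (or
the RCU callback in kernel code) is only ever queued for a real entry.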


^ permalink raw reply

* Re: [Lksctp-developers] [RFC v3 PATCH 2/21] SCTP: Convert bind_addr_list locking to RCU
From: Sridhar Samudrala @ 2007-09-13 19:56 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: paulmck, netdev, lksctp-developers
In-Reply-To: <46E9908E.6000905@hp.com>

On Thu, 2007-09-13 at 15:33 -0400, Vlad Yasevich wrote:
> Hi Sridhar
> 
> Sridhar Samudrala wrote:
> > On Wed, 2007-09-12 at 15:33 -0700, Paul E. McKenney wrote:
> >> On Wed, Sep 12, 2007 at 05:03:42PM -0400, Vlad Yasevich wrote:
> >>> [... and here is the updated version as promised ...]
> >>>
> >>> Since the sctp_sockaddr_entry is now RCU enabled as part of
> >>> the patch to synchronize sctp_localaddr_list, it makes sense to
> >>> change all handling of these entries to RCU.  This includes the
> >>> sctp_bind_addrs structure and its list of bound addresses.
> >>>
> >>> This list is currently protected by an external rw_lock and that
> >>> looks like overkill.  There are only 2 writers to the list:
> >>> bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
> >>> These are already serialized via the socket lock, so they will
> >>> not step on each other.  These are also relatively rare, so we
> >>> should be good with RCU.
> >>>
> >>> The readers are varied and they are easily converted to RCU.
> >> Looks good from an RCU viewpoint -- I must defer to others on
> >> the networking aspects.
> >>
> >> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > Looks good to me too. Some minor typos and some comments on
> > RCU usage inline.
> > 
> > Also, I guess we can remove the sctp_[read/write]_[un]lock macros
> > from sctp.h now that you have removed all the users of rwlocks
> > in SCTP.
> > 
> 
> Looks like some of the hashing calls still use sctp_write_[un]lock
> macros, but use normal read_lock() for the read side.
> 
> I'll clean that up after these patches are accepted.

OK. You may also want to look into the generic inet_hashtable
infrastructure and see whether we can use it for SCTP.

Thanks
Sridhar


^ permalink raw reply

* Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
From: Jeff Garzik @ 2007-09-13 19:55 UTC (permalink / raw)
  To: Steve Wise; +Cc: netdev, Roland Dreier, linux-kernel, general
In-Reply-To: <46E98889.1080706@opengridcomputing.com>

Steve Wise wrote:
> Jeff Garzik wrote:
>> Steve Wise wrote:
>>> I was about to post v2 of my patch to avoid port space collisions 
>>> with the native stack.  Can we get that into 2.6.24?  It is high priority 
>>> IMO. I've tried to solicit review on it, but I think folks are 
>>> reluctant... ;-)

>> Well, if it involves /sharing/ port space with the native stack, i.e. 
>> where port 1234 is IB but 1235 is Linux, pretty much all the 
>> networking devs have NAK'd that approach AFAICS.

> Jeff, I posted a fix that doesn't do this.  No port sharing.  The iwarp 
> device will use its own ip address and subnet to avoid collisions.  You 
> should review the patch when I post v2.

Sounds promising, then!  :)

	Jeff

^ permalink raw reply

* Re: [ofa-general] [PATCH v2] iw_cxgb3: Support "iwarp-only" interfaces to avoid 4-tuple conflicts.
From: Sean Hefty @ 2007-09-13 19:54 UTC (permalink / raw)
  To: Steve Wise; +Cc: netdev, rdreier, general, linux-kernel
In-Reply-To: <20070913191617.30937.95960.stgit@dell3.ogc.int>

> The iWARP driver must translate all listens on address 0.0.0.0 to the
> set of rdma-only ip addresses for the device in question.  This prevents
> incoming connect requests to the TCP IP addresses from going up the
> rdma stack.

I've only given this a high level review at this point, and while the 
patch looks okay on first pass, is there a way to move some of this 
functionality to either the rdma_cm or iw_cm?  I don't like the idea of 
every iwarp driver having to implement address/listen list maintenance. 
  I may have some ideas after re-examining it.

> Implementation Details:

There are a couple of areas that I made a note to look at in more detail 
(because I didn't understand everything that was happening), but I did 
have one minor nit - most uses of list_del_init can just be list_del.
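
To make the list_del() versus list_del_init() distinction concrete, here
is a small userspace sketch.  The sketch_* helpers are stand-ins written
for illustration, not the kernel implementations (the kernel's list_del()
additionally overwrites the removed node's pointers with poison values).
The point it shows: the _init variant is only needed when the node will
later be tested with list_empty() or re-added without reinitialization.

```c
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Analogue of list_del(): unlink only.  The node's own next/prev are left
 * stale here, so the node must not be inspected until it is re-initialized.
 */
static void sketch_list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Analogue of list_del_init(): unlink, then turn the node back into an
 * empty list so that list_empty(n) can answer "is n still queued?".
 */
static void sketch_list_del_init(struct list_head *n)
{
	sketch_list_del(n);
	list_init(n);
}
```

If a node is freed right after removal, the extra re-initialization buys
nothing, which is presumably why plain list_del() is enough in most places.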

- Sean

^ permalink raw reply

* Re: [BUG] tg3 cannot do PXE (loses MAC address) after soft reboot
From: Michael Chan @ 2007-09-13 20:47 UTC (permalink / raw)
  To: Lucas Nussbaum; +Cc: netdev
In-Reply-To: <20070913192812.GA12053@xanadu.blop.info>

On Thu, 2007-09-13 at 21:28 +0200, Lucas Nussbaum wrote:

> Erm, wouldn't it be possible to print a warning when the driver loads,
> saying that the firmware is outdated?

It's possible, but would require the driver to parse the version string.
The driver currently reports the version string for information and for
a human to parse.
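
As a rough idea of what such parsing would involve, here is a minimal
sketch in plain C.  The string layout (a "vMAJOR.MINOR" token somewhere in
the string), the helper name fw_version_older_than() and any threshold
passed to it are assumptions made for illustration; they are not the
actual tg3 firmware string format or a driver API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper.  Looks for the first "vMAJOR.MINOR" token in ver and
 * compares it with min_major.min_minor.
 * Returns 1 if the parsed version is older, 0 if it is not older, and -1 if
 * no such token can be parsed.  The format is assumed, not taken from tg3.
 */
static int fw_version_older_than(const char *ver, unsigned int min_major,
				 unsigned int min_minor)
{
	const char *p = ver;
	unsigned int major, minor;

	while ((p = strchr(p, 'v')) != NULL) {
		if (sscanf(p, "v%u.%u", &major, &minor) == 2) {
			if (major != min_major)
				return major < min_major;
			return minor < min_minor;
		}
		p++;	/* this 'v' did not start a version token; keep looking */
	}

	return -1;
}
```

A driver could print a warning when such a check returns 1, but only if the
version string format is stable enough to rely on, which is the cost
mentioned above.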


^ permalink raw reply

* [v3 PATCH 2/2] SCTP: Convert bind_addr_list locking to RCU
From: Vlad Yasevich @ 2007-09-13 19:34 UTC (permalink / raw)
  To: netdev; +Cc: lksctp, Vlad Yasevich
In-Reply-To: <11897120771278-git-send-email-vladislav.yasevich@hp.com>

Since the sctp_sockaddr_entry is now RCU enabled as part of
the patch to synchronize sctp_localaddr_list, it makes sense to
change all handling of these entries to RCU.  This includes the
sctp_bind_addrs structure and its list of bound addresses.

This list is currently protected by an external rw_lock and that
looks like overkill.  There are only 2 writers to the list:
bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
These are already serialized via the socket lock, so they will
not step on each other.  These are also relatively rare, so we
should be good with RCU.

The readers are varied and they are easily converted to RCU.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/net/sctp/structs.h |    7 +--
 net/sctp/associola.c       |   14 +-----
 net/sctp/bind_addr.c       |   68 ++++++++++++++++++++----------
 net/sctp/endpointola.c     |   27 +++---------
 net/sctp/ipv6.c            |   12 ++---
 net/sctp/protocol.c        |   25 ++++-------
 net/sctp/sm_make_chunk.c   |   18 +++-----
 net/sctp/socket.c          |   98 ++++++++++++-------------------------------
 8 files changed, 106 insertions(+), 163 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index a89e361..c2fe2dc 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -1155,7 +1155,9 @@ int sctp_bind_addr_copy(struct sctp_bind_addr *dest,
 			int flags);
 int sctp_add_bind_addr(struct sctp_bind_addr *, union sctp_addr *,
 		       __u8 use_as_src, gfp_t gfp);
-int sctp_del_bind_addr(struct sctp_bind_addr *, union sctp_addr *);
+int sctp_del_bind_addr(struct sctp_bind_addr *, union sctp_addr *,
+			void (*rcu_call)(struct rcu_head *,
+					  void (*func)(struct rcu_head *)));
 int sctp_bind_addr_match(struct sctp_bind_addr *, const union sctp_addr *,
 			 struct sctp_sock *);
 union sctp_addr *sctp_find_unmatch_addr(struct sctp_bind_addr	*bp,
@@ -1226,9 +1228,6 @@ struct sctp_ep_common {
 	 * bind_addr.address_list is our set of local IP addresses.
 	 */
 	struct sctp_bind_addr bind_addr;
-
-	/* Protection during address list comparisons. */
-	rwlock_t   addr_lock;
 };
 
 
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 2ad1caf..9bad8ba 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -99,7 +99,6 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a
 
 	/* Initialize the bind addr area.  */
 	sctp_bind_addr_init(&asoc->base.bind_addr, ep->base.bind_addr.port);
-	rwlock_init(&asoc->base.addr_lock);
 
 	asoc->state = SCTP_STATE_CLOSED;
 
@@ -937,8 +936,6 @@ struct sctp_transport *sctp_assoc_is_match(struct sctp_association *asoc,
 {
 	struct sctp_transport *transport;
 
-	sctp_read_lock(&asoc->base.addr_lock);
-
 	if ((htons(asoc->base.bind_addr.port) == laddr->v4.sin_port) &&
 	    (htons(asoc->peer.port) == paddr->v4.sin_port)) {
 		transport = sctp_assoc_lookup_paddr(asoc, paddr);
@@ -952,7 +949,6 @@ struct sctp_transport *sctp_assoc_is_match(struct sctp_association *asoc,
 	transport = NULL;
 
 out:
-	sctp_read_unlock(&asoc->base.addr_lock);
 	return transport;
 }
 
@@ -1376,19 +1372,13 @@ int sctp_assoc_set_bind_addr_from_cookie(struct sctp_association *asoc,
 int sctp_assoc_lookup_laddr(struct sctp_association *asoc,
 			    const union sctp_addr *laddr)
 {
-	int found;
+	int found = 0;
 
-	sctp_read_lock(&asoc->base.addr_lock);
 	if ((asoc->base.bind_addr.port == ntohs(laddr->v4.sin_port)) &&
 	    sctp_bind_addr_match(&asoc->base.bind_addr, laddr,
-				 sctp_sk(asoc->base.sk))) {
+				 sctp_sk(asoc->base.sk)))
 		found = 1;
-		goto out;
-	}
 
-	found = 0;
-out:
-	sctp_read_unlock(&asoc->base.addr_lock);
 	return found;
 }
 
diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
index 7fc369f..d35cbf5 100644
--- a/net/sctp/bind_addr.c
+++ b/net/sctp/bind_addr.c
@@ -167,7 +167,11 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
 
 	INIT_LIST_HEAD(&addr->list);
 	INIT_RCU_HEAD(&addr->rcu);
-	list_add_tail(&addr->list, &bp->address_list);
+
+	/* We always hold a socket lock when calling this function,
+	 * and that acts as a writer synchronizing lock.
+	 */
+	list_add_tail_rcu(&addr->list, &bp->address_list);
 	SCTP_DBG_OBJCNT_INC(addr);
 
 	return 0;
@@ -176,23 +180,35 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
 /* Delete an address from the bind address list in the SCTP_bind_addr
  * structure.
  */
-int sctp_del_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *del_addr)
+int sctp_del_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *del_addr,
+			void (*rcu_call)(struct rcu_head *head,
+					 void (*func)(struct rcu_head *head)))
 {
-	struct list_head *pos, *temp;
-	struct sctp_sockaddr_entry *addr;
+	struct sctp_sockaddr_entry *addr, *temp;
 
-	list_for_each_safe(pos, temp, &bp->address_list) {
-		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+	/* We hold the socket lock when calling this function,
+	 * and that acts as a writer synchronizing lock.
+	 */
+	list_for_each_entry_safe(addr, temp, &bp->address_list, list) {
 		if (sctp_cmp_addr_exact(&addr->a, del_addr)) {
 			/* Found the exact match. */
-			list_del(pos);
-			kfree(addr);
-			SCTP_DBG_OBJCNT_DEC(addr);
-
-			return 0;
+			addr->valid = 0;
+			list_del_rcu(&addr->list);
+			break;
 		}
 	}
 
+	/* Call the rcu callback provided in the args.  This function is
+	 * called by both BH packet processing and user side socket option
+	 * processing, but it works on different lists in those 2 contexts.
+	 * Each context provides its own callback, whether call_rcu_bh()
+	 * or call_rcu(), to make sure that we wait for an appropriate time.
+	 */
+	if (addr && !addr->valid) {
+		rcu_call(&addr->rcu, sctp_local_addr_free);
+		SCTP_DBG_OBJCNT_DEC(addr);
+	}
+
 	return -EINVAL;
 }
 
@@ -302,15 +318,20 @@ int sctp_bind_addr_match(struct sctp_bind_addr *bp,
 			 struct sctp_sock *opt)
 {
 	struct sctp_sockaddr_entry *laddr;
-	struct list_head *pos;
-
-	list_for_each(pos, &bp->address_list) {
-		laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
-		if (opt->pf->cmp_addr(&laddr->a, addr, opt))
-			return 1;
+	int match = 0;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(laddr, &bp->address_list, list) {
+		if (!laddr->valid)
+			continue;
+		if (opt->pf->cmp_addr(&laddr->a, addr, opt)) {
+			match = 1;
+			break;
+		}
 	}
+	rcu_read_unlock();
 
-	return 0;
+	return match;
 }
 
 /* Find the first address in the bind address list that is not present in
@@ -325,18 +346,19 @@ union sctp_addr *sctp_find_unmatch_addr(struct sctp_bind_addr	*bp,
 	union sctp_addr			*addr;
 	void 				*addr_buf;
 	struct sctp_af			*af;
-	struct list_head		*pos;
 	int				i;
 
-	list_for_each(pos, &bp->address_list) {
-		laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
-
+	/* This is only called by sctp_send_asconf_del_ip() and we hold
+	 * the socket lock in that code path, so that address list
+	 * can't change.
+	 */
+	list_for_each_entry(laddr, &bp->address_list, list) {
 		addr_buf = (union sctp_addr *)addrs;
 		for (i = 0; i < addrcnt; i++) {
 			addr = (union sctp_addr *)addr_buf;
 			af = sctp_get_af_specific(addr->v4.sin_family);
 			if (!af)
-				return NULL;
+				break;
 
 			if (opt->pf->cmp_addr(&laddr->a, addr, opt))
 				break;
diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
index 1404a9e..8f485a0 100644
--- a/net/sctp/endpointola.c
+++ b/net/sctp/endpointola.c
@@ -92,7 +92,6 @@ static struct sctp_endpoint *sctp_endpoint_init(struct sctp_endpoint *ep,
 
 	/* Initialize the bind addr area */
 	sctp_bind_addr_init(&ep->base.bind_addr, 0);
-	rwlock_init(&ep->base.addr_lock);
 
 	/* Remember who we are attached to.  */
 	ep->base.sk = sk;
@@ -225,21 +224,14 @@ void sctp_endpoint_put(struct sctp_endpoint *ep)
 struct sctp_endpoint *sctp_endpoint_is_match(struct sctp_endpoint *ep,
 					       const union sctp_addr *laddr)
 {
-	struct sctp_endpoint *retval;
+	struct sctp_endpoint *retval = NULL;
 
-	sctp_read_lock(&ep->base.addr_lock);
 	if (htons(ep->base.bind_addr.port) == laddr->v4.sin_port) {
 		if (sctp_bind_addr_match(&ep->base.bind_addr, laddr,
-					 sctp_sk(ep->base.sk))) {
+					 sctp_sk(ep->base.sk)))
 			retval = ep;
-			goto out;
-		}
 	}
 
-	retval = NULL;
-
-out:
-	sctp_read_unlock(&ep->base.addr_lock);
 	return retval;
 }
 
@@ -261,9 +253,7 @@ static struct sctp_association *__sctp_endpoint_lookup_assoc(
 	list_for_each(pos, &ep->asocs) {
 		asoc = list_entry(pos, struct sctp_association, asocs);
 		if (rport == asoc->peer.port) {
-			sctp_read_lock(&asoc->base.addr_lock);
 			*transport = sctp_assoc_lookup_paddr(asoc, paddr);
-			sctp_read_unlock(&asoc->base.addr_lock);
 
 			if (*transport)
 				return asoc;
@@ -295,20 +285,17 @@ struct sctp_association *sctp_endpoint_lookup_assoc(
 int sctp_endpoint_is_peeled_off(struct sctp_endpoint *ep,
 				const union sctp_addr *paddr)
 {
-	struct list_head *pos;
 	struct sctp_sockaddr_entry *addr;
 	struct sctp_bind_addr *bp;
 
-	sctp_read_lock(&ep->base.addr_lock);
 	bp = &ep->base.bind_addr;
-	list_for_each(pos, &bp->address_list) {
-		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
-		if (sctp_has_association(&addr->a, paddr)) {
-			sctp_read_unlock(&ep->base.addr_lock);
+	/* This function is called with the socket lock held,
+	 * so the address_list can not change.
+	 */
+	list_for_each_entry(addr, &bp->address_list, list) {
+		if (sctp_has_association(&addr->a, paddr))
 			return 1;
-		}
 	}
-	sctp_read_unlock(&ep->base.addr_lock);
 
 	return 0;
 }
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index e12fa0a..670fd27 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -302,9 +302,7 @@ static void sctp_v6_get_saddr(struct sctp_association *asoc,
 			      union sctp_addr *saddr)
 {
 	struct sctp_bind_addr *bp;
-	rwlock_t *addr_lock;
 	struct sctp_sockaddr_entry *laddr;
-	struct list_head *pos;
 	sctp_scope_t scope;
 	union sctp_addr *baddr = NULL;
 	__u8 matchlen = 0;
@@ -324,14 +322,14 @@ static void sctp_v6_get_saddr(struct sctp_association *asoc,
 	scope = sctp_scope(daddr);
 
 	bp = &asoc->base.bind_addr;
-	addr_lock = &asoc->base.addr_lock;
 
 	/* Go through the bind address list and find the best source address
 	 * that matches the scope of the destination address.
 	 */
-	sctp_read_lock(addr_lock);
-	list_for_each(pos, &bp->address_list) {
-		laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
+	rcu_read_lock();
+	list_for_each_entry_rcu(laddr, &bp->address_list, list) {
+		if (!laddr->valid)
+			continue;
 		if ((laddr->use_as_src) &&
 		    (laddr->a.sa.sa_family == AF_INET6) &&
 		    (scope <= sctp_scope(&laddr->a))) {
@@ -353,7 +351,7 @@ static void sctp_v6_get_saddr(struct sctp_association *asoc,
 		       __FUNCTION__, asoc, NIP6(daddr->v6.sin6_addr));
 	}
 
-	sctp_read_unlock(addr_lock);
+	rcu_read_unlock();
 }
 
 /* Make a copy of all potential local addresses. */
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 7ee120e..3d036cd 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -224,7 +224,7 @@ int sctp_copy_local_addr_list(struct sctp_bind_addr *bp, sctp_scope_t scope,
 			      (copy_flags & SCTP_ADDR6_ALLOWED) &&
 			      (copy_flags & SCTP_ADDR6_PEERSUPP)))) {
 				error = sctp_add_bind_addr(bp, &addr->a, 1,
-							   GFP_ATOMIC);
+						    GFP_ATOMIC);
 				if (error)
 					goto end_copy;
 			}
@@ -428,9 +428,7 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
 	struct rtable *rt;
 	struct flowi fl;
 	struct sctp_bind_addr *bp;
-	rwlock_t *addr_lock;
 	struct sctp_sockaddr_entry *laddr;
-	struct list_head *pos;
 	struct dst_entry *dst = NULL;
 	union sctp_addr dst_saddr;
 
@@ -459,23 +457,20 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
 		goto out;
 
 	bp = &asoc->base.bind_addr;
-	addr_lock = &asoc->base.addr_lock;
 
 	if (dst) {
 		/* Walk through the bind address list and look for a bind
 		 * address that matches the source address of the returned dst.
 		 */
-		sctp_read_lock(addr_lock);
-		list_for_each(pos, &bp->address_list) {
-			laddr = list_entry(pos, struct sctp_sockaddr_entry,
-					   list);
-			if (!laddr->use_as_src)
+		rcu_read_lock();
+		list_for_each_entry_rcu(laddr, &bp->address_list, list) {
+			if (!laddr->valid || !laddr->use_as_src)
 				continue;
 			sctp_v4_dst_saddr(&dst_saddr, dst, htons(bp->port));
 			if (sctp_v4_cmp_addr(&dst_saddr, &laddr->a))
 				goto out_unlock;
 		}
-		sctp_read_unlock(addr_lock);
+		rcu_read_unlock();
 
 		/* None of the bound addresses match the source address of the
 		 * dst. So release it.
@@ -487,10 +482,10 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
 	/* Walk through the bind address list and try to get a dst that
 	 * matches a bind address as the source address.
 	 */
-	sctp_read_lock(addr_lock);
-	list_for_each(pos, &bp->address_list) {
-		laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
-
+	rcu_read_lock();
+	list_for_each_entry_rcu(laddr, &bp->address_list, list) {
+		if (!laddr->valid)
+			continue;
 		if ((laddr->use_as_src) &&
 		    (AF_INET == laddr->a.sa.sa_family)) {
 			fl.fl4_src = laddr->a.v4.sin_addr.s_addr;
@@ -502,7 +497,7 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
 	}
 
 out_unlock:
-	sctp_read_unlock(addr_lock);
+	rcu_read_unlock();
 out:
 	if (dst)
 		SCTP_DEBUG_PRINTK("rt_dst:%u.%u.%u.%u, rt_src:%u.%u.%u.%u\n",
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 79856c9..2e34220 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -1531,7 +1531,7 @@ no_hmac:
 	/* Also, add the destination address. */
 	if (list_empty(&retval->base.bind_addr.address_list)) {
 		sctp_add_bind_addr(&retval->base.bind_addr, &chunk->dest, 1,
-				   GFP_ATOMIC);
+				GFP_ATOMIC);
 	}
 
 	retval->next_tsn = retval->c.initial_tsn;
@@ -2613,22 +2613,16 @@ static int sctp_asconf_param_success(struct sctp_association *asoc,
 
 	switch (asconf_param->param_hdr.type) {
 	case SCTP_PARAM_ADD_IP:
-		sctp_local_bh_disable();
-		sctp_write_lock(&asoc->base.addr_lock);
-		list_for_each(pos, &bp->address_list) {
-			saddr = list_entry(pos, struct sctp_sockaddr_entry, list);
+		/* This is always done in BH context with a socket lock
+		 * held, so the list can not change.
+		 */
+		list_for_each_entry(saddr, &bp->address_list, list) {
 			if (sctp_cmp_addr_exact(&saddr->a, &addr))
 				saddr->use_as_src = 1;
 		}
-		sctp_write_unlock(&asoc->base.addr_lock);
-		sctp_local_bh_enable();
 		break;
 	case SCTP_PARAM_DEL_IP:
-		sctp_local_bh_disable();
-		sctp_write_lock(&asoc->base.addr_lock);
-		retval = sctp_del_bind_addr(bp, &addr);
-		sctp_write_unlock(&asoc->base.addr_lock);
-		sctp_local_bh_enable();
+		retval = sctp_del_bind_addr(bp, &addr, call_rcu_bh);
 		list_for_each(pos, &asoc->peer.transport_addr_list) {
 			transport = list_entry(pos, struct sctp_transport,
 						 transports);
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index a3acf78..772fbfb 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -367,14 +367,10 @@ SCTP_STATIC int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
 	if (!bp->port)
 		bp->port = inet_sk(sk)->num;
 
-	/* Add the address to the bind address list.  */
-	sctp_local_bh_disable();
-	sctp_write_lock(&ep->base.addr_lock);
-
-	/* Use GFP_ATOMIC since BHs are disabled.  */
+	/* Add the address to the bind address list.
+	 * Use GFP_ATOMIC since BHs will be disabled.
+	 */
 	ret = sctp_add_bind_addr(bp, addr, 1, GFP_ATOMIC);
-	sctp_write_unlock(&ep->base.addr_lock);
-	sctp_local_bh_enable();
 
 	/* Copy back into socket for getsockname() use. */
 	if (!ret) {
@@ -544,15 +540,12 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
 		if (i < addrcnt)
 			continue;
 
-		/* Use the first address in bind addr list of association as
-		 * Address Parameter of ASCONF CHUNK.
+		/* Use the first valid address in bind addr list of
+		 * association as Address Parameter of ASCONF CHUNK.
 		 */
-		sctp_read_lock(&asoc->base.addr_lock);
 		bp = &asoc->base.bind_addr;
 		p = bp->address_list.next;
 		laddr = list_entry(p, struct sctp_sockaddr_entry, list);
-		sctp_read_unlock(&asoc->base.addr_lock);
-
 		chunk = sctp_make_asconf_update_ip(asoc, &laddr->a, addrs,
 						   addrcnt, SCTP_PARAM_ADD_IP);
 		if (!chunk) {
@@ -567,8 +560,6 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
 		/* Add the new addresses to the bind address list with
 		 * use_as_src set to 0.
 		 */
-		sctp_local_bh_disable();
-		sctp_write_lock(&asoc->base.addr_lock);
 		addr_buf = addrs;
 		for (i = 0; i < addrcnt; i++) {
 			addr = (union sctp_addr *)addr_buf;
@@ -578,8 +569,6 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
 						    GFP_ATOMIC);
 			addr_buf += af->sockaddr_len;
 		}
-		sctp_write_unlock(&asoc->base.addr_lock);
-		sctp_local_bh_enable();
 	}
 
 out:
@@ -651,13 +640,7 @@ static int sctp_bindx_rem(struct sock *sk, struct sockaddr *addrs, int addrcnt)
 		 * socket routing and failover schemes. Refer to comments in
 		 * sctp_do_bind(). -daisy
 		 */
-		sctp_local_bh_disable();
-		sctp_write_lock(&ep->base.addr_lock);
-
-		retval = sctp_del_bind_addr(bp, sa_addr);
-
-		sctp_write_unlock(&ep->base.addr_lock);
-		sctp_local_bh_enable();
+		retval = sctp_del_bind_addr(bp, sa_addr, call_rcu);
 
 		addr_buf += af->sockaddr_len;
 err_bindx_rem:
@@ -748,14 +731,16 @@ static int sctp_send_asconf_del_ip(struct sock		*sk,
 		 * make sure that we do not delete all the addresses in the
 		 * association.
 		 */
-		sctp_read_lock(&asoc->base.addr_lock);
 		bp = &asoc->base.bind_addr;
 		laddr = sctp_find_unmatch_addr(bp, (union sctp_addr *)addrs,
 					       addrcnt, sp);
-		sctp_read_unlock(&asoc->base.addr_lock);
 		if (!laddr)
 			continue;
 
+		/* We do not need RCU protection throughout this loop
+		 * because this is done under a socket lock from the
+		 * setsockopt call.
+		 */
 		chunk = sctp_make_asconf_update_ip(asoc, laddr, addrs, addrcnt,
 						   SCTP_PARAM_DEL_IP);
 		if (!chunk) {
@@ -766,23 +751,16 @@ static int sctp_send_asconf_del_ip(struct sock		*sk,
 		/* Reset use_as_src flag for the addresses in the bind address
 		 * list that are to be deleted.
 		 */
-		sctp_local_bh_disable();
-		sctp_write_lock(&asoc->base.addr_lock);
 		addr_buf = addrs;
 		for (i = 0; i < addrcnt; i++) {
 			laddr = (union sctp_addr *)addr_buf;
 			af = sctp_get_af_specific(laddr->v4.sin_family);
-			list_for_each(pos1, &bp->address_list) {
-				saddr = list_entry(pos1,
-						   struct sctp_sockaddr_entry,
-						   list);
+			list_for_each_entry(saddr, &bp->address_list, list) {
 				if (sctp_cmp_addr_exact(&saddr->a, laddr))
 					saddr->use_as_src = 0;
 			}
 			addr_buf += af->sockaddr_len;
 		}
-		sctp_write_unlock(&asoc->base.addr_lock);
-		sctp_local_bh_enable();
 
 		/* Update the route and saddr entries for all the transports
 		 * as some of the addresses in the bind address list are
@@ -4057,11 +4035,9 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
 					       int __user *optlen)
 {
 	sctp_assoc_t id;
-	struct list_head *pos;
 	struct sctp_bind_addr *bp;
 	struct sctp_association *asoc;
 	struct sctp_sockaddr_entry *addr;
-	rwlock_t *addr_lock;
 	int cnt = 0;
 
 	if (len < sizeof(sctp_assoc_t))
@@ -4078,17 +4054,13 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
 	 */
 	if (0 == id) {
 		bp = &sctp_sk(sk)->ep->base.bind_addr;
-		addr_lock = &sctp_sk(sk)->ep->base.addr_lock;
 	} else {
 		asoc = sctp_id2assoc(sk, id);
 		if (!asoc)
 			return -EINVAL;
 		bp = &asoc->base.bind_addr;
-		addr_lock = &asoc->base.addr_lock;
 	}
 
-	sctp_read_lock(addr_lock);
-
 	/* If the endpoint is bound to 0.0.0.0 or ::0, count the valid
 	 * addresses from the global local address list.
 	 */
@@ -4115,12 +4087,14 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
 		goto done;
 	}
 
-	list_for_each(pos, &bp->address_list) {
+	/* Protection on the bound address list is not needed,
+	 * since in the socket option context we hold the socket lock,
+	 * so there is no way that the bound address list can change.
+	 */
+	list_for_each_entry(addr, &bp->address_list, list) {
 		cnt ++;
 	}
-
 done:
-	sctp_read_unlock(addr_lock);
 	return cnt;
 }
 
@@ -4204,7 +4178,6 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
 {
 	struct sctp_bind_addr *bp;
 	struct sctp_association *asoc;
-	struct list_head *pos;
 	int cnt = 0;
 	struct sctp_getaddrs_old getaddrs;
 	struct sctp_sockaddr_entry *addr;
@@ -4212,7 +4185,6 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
 	union sctp_addr temp;
 	struct sctp_sock *sp = sctp_sk(sk);
 	int addrlen;
-	rwlock_t *addr_lock;
 	int err = 0;
 	void *addrs;
 	void *buf;
@@ -4234,13 +4206,11 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
 	 */
 	if (0 == getaddrs.assoc_id) {
 		bp = &sctp_sk(sk)->ep->base.bind_addr;
-		addr_lock = &sctp_sk(sk)->ep->base.addr_lock;
 	} else {
 		asoc = sctp_id2assoc(sk, getaddrs.assoc_id);
 		if (!asoc)
 			return -EINVAL;
 		bp = &asoc->base.bind_addr;
-		addr_lock = &asoc->base.addr_lock;
 	}
 
 	to = getaddrs.addrs;
@@ -4254,8 +4224,6 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
 	if (!addrs)
 		return -ENOMEM;
 
-	sctp_read_lock(addr_lock);
-
 	/* If the endpoint is bound to 0.0.0.0 or ::0, get the valid
 	 * addresses from the global local address list.
 	 */
@@ -4271,8 +4239,11 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
 	}
 
 	buf = addrs;
-	list_for_each(pos, &bp->address_list) {
-		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+	/* Protection on the bound address list is not needed since
+	 * in the socket option context we hold a socket lock and
+	 * thus the bound address list can't change.
+	 */
+	list_for_each_entry(addr, &bp->address_list, list) {
 		memcpy(&temp, &addr->a, sizeof(temp));
 		sctp_get_pf_specific(sk->sk_family)->addr_v4map(sp, &temp);
 		addrlen = sctp_get_af_specific(temp.sa.sa_family)->sockaddr_len;
@@ -4284,8 +4255,6 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
 	}
 
 copy_getaddrs:
-	sctp_read_unlock(addr_lock);
-
 	/* copy the entire address list into the user provided space */
 	if (copy_to_user(to, addrs, bytes_copied)) {
 		err = -EFAULT;
@@ -4307,7 +4276,6 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
 {
 	struct sctp_bind_addr *bp;
 	struct sctp_association *asoc;
-	struct list_head *pos;
 	int cnt = 0;
 	struct sctp_getaddrs getaddrs;
 	struct sctp_sockaddr_entry *addr;
@@ -4315,7 +4283,6 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
 	union sctp_addr temp;
 	struct sctp_sock *sp = sctp_sk(sk);
 	int addrlen;
-	rwlock_t *addr_lock;
 	int err = 0;
 	size_t space_left;
 	int bytes_copied = 0;
@@ -4336,13 +4303,11 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
 	 */
 	if (0 == getaddrs.assoc_id) {
 		bp = &sctp_sk(sk)->ep->base.bind_addr;
-		addr_lock = &sctp_sk(sk)->ep->base.addr_lock;
 	} else {
 		asoc = sctp_id2assoc(sk, getaddrs.assoc_id);
 		if (!asoc)
 			return -EINVAL;
 		bp = &asoc->base.bind_addr;
-		addr_lock = &asoc->base.addr_lock;
 	}
 
 	to = optval + offsetof(struct sctp_getaddrs,addrs);
@@ -4352,8 +4317,6 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
 	if (!addrs)
 		return -ENOMEM;
 
-	sctp_read_lock(addr_lock);
-
 	/* If the endpoint is bound to 0.0.0.0 or ::0, get the valid
 	 * addresses from the global local address list.
 	 */
@@ -4365,21 +4328,24 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
 						space_left, &bytes_copied);
 			if (cnt < 0) {
 				err = cnt;
-				goto error_lock;
+				goto out;
 			}
 			goto copy_getaddrs;
 		}
 	}
 
 	buf = addrs;
-	list_for_each(pos, &bp->address_list) {
-		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+	/* Protection on the bound address list is not needed since
+	 * in the socket option context we hold a socket lock and
+	 * thus the bound address list can't change.
+	 */
+	list_for_each_entry(addr, &bp->address_list, list) {
 		memcpy(&temp, &addr->a, sizeof(temp));
 		sctp_get_pf_specific(sk->sk_family)->addr_v4map(sp, &temp);
 		addrlen = sctp_get_af_specific(temp.sa.sa_family)->sockaddr_len;
 		if (space_left < addrlen) {
 			err =  -ENOMEM; /*fixme: right error?*/
-			goto error_lock;
+			goto out;
 		}
 		memcpy(buf, &temp, addrlen);
 		buf += addrlen;
@@ -4389,8 +4355,6 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
 	}
 
 copy_getaddrs:
-	sctp_read_unlock(addr_lock);
-
 	if (copy_to_user(to, addrs, bytes_copied)) {
 		err = -EFAULT;
 		goto out;
@@ -4401,12 +4365,6 @@ copy_getaddrs:
 	}
 	if (put_user(bytes_copied, optlen))
 		err = -EFAULT;
-
-	goto out;
-
-error_lock:
-	sctp_read_unlock(addr_lock);
-
 out:
 	kfree(addrs);
 	return err;
-- 
1.5.2.4


^ permalink raw reply related
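
Editorial sketch (not part of the posted patch): the 2/2 patch above changes
sctp_del_bind_addr() so that the caller passes in the deferral routine
(call_rcu or call_rcu_bh). The userspace C below is a minimal illustration of
that shape under stated assumptions. Every name in it (struct entry, del_addr,
defer_now, entry_add, entry_free, pending) is hypothetical, and defer_now
frees immediately instead of waiting for a grace period. As posted, the
rewritten function appears to fall through to "return -EINVAL" even after a
match, and it tests the loop cursor after list_for_each_entry_safe(), which is
not NULL when the loop runs to completion. The sketch therefore tracks the
match in a separate pointer and returns 0 on success; that is an assumption
about the intent, not the patch's behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct entry {
	struct entry *next;
	int addr;	/* stand-in for union sctp_addr */
	int valid;	/* cleared before the entry is unlinked */
};

typedef void (*free_fn)(struct entry *e);
typedef void (*defer_fn)(struct entry *e, free_fn fn);

static int pending;	/* number of deferred frees requested so far */

static void entry_free(struct entry *e)
{
	free(e);
}

/* Stand-in for call_rcu()/call_rcu_bh(): a real implementation waits for
 * a grace period; this one only counts the request and frees at once.
 */
static void defer_now(struct entry *e, free_fn fn)
{
	pending++;
	fn(e);
}

/* Push a new, valid entry onto the front of the list. */
static struct entry *entry_add(struct entry **head, int addr)
{
	struct entry *e = malloc(sizeof(*e));

	if (!e)
		return NULL;
	e->addr = addr;
	e->valid = 1;
	e->next = *head;
	*head = e;
	return e;
}

/* Mark the matching entry invalid, unlink it, and hand it to the
 * caller-supplied deferral routine.  Returns 0 on success and -EINVAL
 * when no entry matches.
 */
static int del_addr(struct entry **head, int addr, defer_fn defer)
{
	struct entry **pp, *found = NULL;

	for (pp = head; *pp; pp = &(*pp)->next) {
		if ((*pp)->addr == addr) {
			found = *pp;
			found->valid = 0;
			*pp = found->next;	/* unlink */
			break;
		}
	}

	if (!found)
		return -EINVAL;

	defer(found, entry_free);
	return 0;
}
```

Passing the deferral routine as an argument keeps the list code unaware of
which context it runs in; each caller picks the routine that fits its own
context, as the comment in the patch describes.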

* [v3 PATCH 1/2] SCTP: Add RCU synchronization around sctp_localaddr_list
From: Vlad Yasevich @ 2007-09-13 19:34 UTC (permalink / raw)
  To: netdev; +Cc: lksctp, Vlad Yasevich
In-Reply-To: <11897120771278-git-send-email-vladislav.yasevich@hp.com>

sctp_localaddr_list is modified dynamically via NETDEV_UP
and NETDEV_DOWN events, but there is no synchronization
between the writer (event handler) and readers.  As a result,
the readers can access an entry that has been freed and
crash the system.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/net/sctp/sctp.h    |    1 +
 include/net/sctp/structs.h |    6 +++++
 net/sctp/bind_addr.c       |    2 +
 net/sctp/ipv6.c            |   34 +++++++++++++++++++--------
 net/sctp/protocol.c        |   54 +++++++++++++++++++++++++++++++------------
 net/sctp/socket.c          |   38 ++++++++++++++++++++----------
 6 files changed, 97 insertions(+), 38 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index d529045..c9cc00c 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -123,6 +123,7 @@
  * sctp/protocol.c
  */
 extern struct sock *sctp_get_ctl_sock(void);
+extern void sctp_local_addr_free(struct rcu_head *head);
 extern int sctp_copy_local_addr_list(struct sctp_bind_addr *,
 				     sctp_scope_t, gfp_t gfp,
 				     int flags);
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index c0d5848..a89e361 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -207,6 +207,9 @@ extern struct sctp_globals {
 	 * It is a list of sctp_sockaddr_entry.
 	 */
 	struct list_head local_addr_list;
+
+	/* Lock that protects the local_addr_list writers */
+	spinlock_t addr_list_lock;
 	
 	/* Flag to indicate if addip is enabled. */
 	int addip_enable;
@@ -242,6 +245,7 @@ extern struct sctp_globals {
 #define sctp_port_alloc_lock		(sctp_globals.port_alloc_lock)
 #define sctp_port_hashtable		(sctp_globals.port_hashtable)
 #define sctp_local_addr_list		(sctp_globals.local_addr_list)
+#define sctp_local_addr_lock		(sctp_globals.addr_list_lock)
 #define sctp_addip_enable		(sctp_globals.addip_enable)
 #define sctp_prsctp_enable		(sctp_globals.prsctp_enable)
 
@@ -737,8 +741,10 @@ const union sctp_addr *sctp_source(const struct sctp_chunk *chunk);
 /* This is a structure for holding either an IPv6 or an IPv4 address.  */
 struct sctp_sockaddr_entry {
 	struct list_head list;
+	struct rcu_head	rcu;
 	union sctp_addr a;
 	__u8 use_as_src;
+	__u8 valid;
 };
 
 typedef struct sctp_chunk *(sctp_packet_phandler_t)(struct sctp_association *);
diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
index fdb287a..7fc369f 100644
--- a/net/sctp/bind_addr.c
+++ b/net/sctp/bind_addr.c
@@ -163,8 +163,10 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
 		addr->a.v4.sin_port = htons(bp->port);
 
 	addr->use_as_src = use_as_src;
+	addr->valid = 1;
 
 	INIT_LIST_HEAD(&addr->list);
+	INIT_RCU_HEAD(&addr->rcu);
 	list_add_tail(&addr->list, &bp->address_list);
 	SCTP_DBG_OBJCNT_INC(addr);
 
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index f8aa23d..e12fa0a 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -77,13 +77,18 @@
 
 #include <asm/uaccess.h>
 
-/* Event handler for inet6 address addition/deletion events.  */
+/* Event handler for inet6 address addition/deletion events.
+ * The sctp_local_addr_list needs to be protected by a spin lock since
+ * multiple notifiers (say IPv4 and IPv6) may be running at the same
+ * time and thus corrupt the list.
+ * The reader side is protected with RCU.
+ */
 static int sctp_inet6addr_event(struct notifier_block *this, unsigned long ev,
 				void *ptr)
 {
 	struct inet6_ifaddr *ifa = (struct inet6_ifaddr *)ptr;
-	struct sctp_sockaddr_entry *addr;
-	struct list_head *pos, *temp;
+	struct sctp_sockaddr_entry *addr = NULL;
+	struct sctp_sockaddr_entry *temp;
 
 	switch (ev) {
 	case NETDEV_UP:
@@ -94,19 +99,26 @@ static int sctp_inet6addr_event(struct notifier_block *this, unsigned long ev,
 			memcpy(&addr->a.v6.sin6_addr, &ifa->addr,
 				 sizeof(struct in6_addr));
 			addr->a.v6.sin6_scope_id = ifa->idev->dev->ifindex;
-			list_add_tail(&addr->list, &sctp_local_addr_list);
+			addr->valid = 1;
+			spin_lock_bh(&sctp_local_addr_lock);
+			list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
+			spin_unlock_bh(&sctp_local_addr_lock);
 		}
 		break;
 	case NETDEV_DOWN:
-		list_for_each_safe(pos, temp, &sctp_local_addr_list) {
-			addr = list_entry(pos, struct sctp_sockaddr_entry, list);
-			if (ipv6_addr_equal(&addr->a.v6.sin6_addr, &ifa->addr)) {
-				list_del(pos);
-				kfree(addr);
+		spin_lock_bh(&sctp_local_addr_lock);
+		list_for_each_entry_safe(addr, temp,
+					&sctp_local_addr_list, list) {
+			if (ipv6_addr_equal(&addr->a.v6.sin6_addr,
+					     &ifa->addr)) {
+				addr->valid = 0;
+				list_del_rcu(&addr->list);
 				break;
 			}
 		}
-
+		spin_unlock_bh(&sctp_local_addr_lock);
+		if (addr && !addr->valid)
+			call_rcu(&addr->rcu, sctp_local_addr_free);
 		break;
 	}
 
@@ -367,7 +379,9 @@ static void sctp_v6_copy_addrlist(struct list_head *addrlist,
 			addr->a.v6.sin6_port = 0;
 			addr->a.v6.sin6_addr = ifp->addr;
 			addr->a.v6.sin6_scope_id = dev->ifindex;
+			addr->valid = 1;
 			INIT_LIST_HEAD(&addr->list);
+			INIT_RCU_HEAD(&addr->rcu);
 			list_add_tail(&addr->list, addrlist);
 		}
 	}
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index e98579b..7ee120e 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -153,6 +153,9 @@ static void sctp_v4_copy_addrlist(struct list_head *addrlist,
 			addr->a.v4.sin_family = AF_INET;
 			addr->a.v4.sin_port = 0;
 			addr->a.v4.sin_addr.s_addr = ifa->ifa_local;
+			addr->valid = 1;
+			INIT_LIST_HEAD(&addr->list);
+			INIT_RCU_HEAD(&addr->rcu);
 			list_add_tail(&addr->list, addrlist);
 		}
 	}
@@ -192,16 +195,24 @@ static void sctp_free_local_addr_list(void)
 	}
 }
 
+void sctp_local_addr_free(struct rcu_head *head)
+{
+	struct sctp_sockaddr_entry *e = container_of(head,
+				struct sctp_sockaddr_entry, rcu);
+	kfree(e);
+}
+
 /* Copy the local addresses which are valid for 'scope' into 'bp'.  */
 int sctp_copy_local_addr_list(struct sctp_bind_addr *bp, sctp_scope_t scope,
 			      gfp_t gfp, int copy_flags)
 {
 	struct sctp_sockaddr_entry *addr;
 	int error = 0;
-	struct list_head *pos, *temp;
 
-	list_for_each_safe(pos, temp, &sctp_local_addr_list) {
-		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+	rcu_read_lock();
+	list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
+		if (!addr->valid)
+			continue;
 		if (sctp_in_scope(&addr->a, scope)) {
 			/* Now that the address is in scope, check to see if
 			 * the address type is really supported by the local
@@ -221,6 +232,7 @@ int sctp_copy_local_addr_list(struct sctp_bind_addr *bp, sctp_scope_t scope,
 	}
 
 end_copy:
+	rcu_read_unlock();
 	return error;
 }
 
@@ -600,13 +612,18 @@ static void sctp_v4_seq_dump_addr(struct seq_file *seq, union sctp_addr *addr)
 	seq_printf(seq, "%d.%d.%d.%d ", NIPQUAD(addr->v4.sin_addr));
 }
 
-/* Event handler for inet address addition/deletion events.  */
+/* Event handler for inet address addition/deletion events.
+ * The sctp_local_addr_list needs to be protected by a spin lock since
+ * multiple notifiers (say IPv4 and IPv6) may be running at the same
+ * time and thus corrupt the list.
+ * The reader side is protected with RCU.
+ */
 static int sctp_inetaddr_event(struct notifier_block *this, unsigned long ev,
 			       void *ptr)
 {
 	struct in_ifaddr *ifa = (struct in_ifaddr *)ptr;
-	struct sctp_sockaddr_entry *addr;
-	struct list_head *pos, *temp;
+	struct sctp_sockaddr_entry *addr = NULL;
+	struct sctp_sockaddr_entry *temp;
 
 	switch (ev) {
 	case NETDEV_UP:
@@ -615,19 +632,25 @@ static int sctp_inetaddr_event(struct notifier_block *this, unsigned long ev,
 			addr->a.v4.sin_family = AF_INET;
 			addr->a.v4.sin_port = 0;
 			addr->a.v4.sin_addr.s_addr = ifa->ifa_local;
-			list_add_tail(&addr->list, &sctp_local_addr_list);
+			addr->valid = 1;
+			spin_lock_bh(&sctp_local_addr_lock);
+			list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
+			spin_unlock_bh(&sctp_local_addr_lock);
 		}
 		break;
 	case NETDEV_DOWN:
-		list_for_each_safe(pos, temp, &sctp_local_addr_list) {
-			addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+		spin_lock_bh(&sctp_local_addr_lock);
+		list_for_each_entry_safe(addr, temp,
+					&sctp_local_addr_list, list) {
 			if (addr->a.v4.sin_addr.s_addr == ifa->ifa_local) {
-				list_del(pos);
-				kfree(addr);
+				addr->valid = 0;
+				list_del_rcu(&addr->list);
 				break;
 			}
 		}
-
+		spin_unlock_bh(&sctp_local_addr_lock);
+		if (addr && !addr->valid)
+			call_rcu(&addr->rcu, sctp_local_addr_free);
 		break;
 	}
 
@@ -1160,6 +1183,7 @@ SCTP_STATIC __init int sctp_init(void)
 
 	/* Initialize the local address list. */
 	INIT_LIST_HEAD(&sctp_local_addr_list);
+	spin_lock_init(&sctp_local_addr_lock);
 	sctp_get_local_addr_list();
 
 	/* Register notifier for inet address additions/deletions. */
@@ -1227,6 +1251,9 @@ SCTP_STATIC __exit void sctp_exit(void)
 	sctp_v6_del_protocol();
 	inet_del_protocol(&sctp_protocol, IPPROTO_SCTP);
 
+	/* Unregister notifier for inet address additions/deletions. */
+	unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
+
 	/* Free the local address list.  */
 	sctp_free_local_addr_list();
 
@@ -1240,9 +1267,6 @@ SCTP_STATIC __exit void sctp_exit(void)
 	inet_unregister_protosw(&sctp_stream_protosw);
 	inet_unregister_protosw(&sctp_seqpacket_protosw);
 
-	/* Unregister notifier for inet address additions/deletions. */
-	unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
-
 	sctp_sysctl_unregister();
 	list_del(&sctp_ipv4_specific.list);
 
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 3335460..a3acf78 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4057,9 +4057,9 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
 					       int __user *optlen)
 {
 	sctp_assoc_t id;
+	struct list_head *pos;
 	struct sctp_bind_addr *bp;
 	struct sctp_association *asoc;
-	struct list_head *pos, *temp;
 	struct sctp_sockaddr_entry *addr;
 	rwlock_t *addr_lock;
 	int cnt = 0;
@@ -4096,15 +4096,19 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
 		addr = list_entry(bp->address_list.next,
 				  struct sctp_sockaddr_entry, list);
 		if (sctp_is_any(&addr->a)) {
-			list_for_each_safe(pos, temp, &sctp_local_addr_list) {
-				addr = list_entry(pos,
-						  struct sctp_sockaddr_entry,
-						  list);
+			rcu_read_lock();
+			list_for_each_entry_rcu(addr,
+						&sctp_local_addr_list, list) {
+				if (!addr->valid)
+					continue;
+
 				if ((PF_INET == sk->sk_family) &&
 				    (AF_INET6 == addr->a.sa.sa_family))
 					continue;
+
 				cnt++;
 			}
+			rcu_read_unlock();
 		} else {
 			cnt = 1;
 		}
@@ -4127,14 +4131,16 @@ static int sctp_copy_laddrs_old(struct sock *sk, __u16 port,
 					int max_addrs, void *to,
 					int *bytes_copied)
 {
-	struct list_head *pos, *next;
 	struct sctp_sockaddr_entry *addr;
 	union sctp_addr temp;
 	int cnt = 0;
 	int addrlen;
 
-	list_for_each_safe(pos, next, &sctp_local_addr_list) {
-		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+	rcu_read_lock();
+	list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
+		if (!addr->valid)
+			continue;
+
 		if ((PF_INET == sk->sk_family) &&
 		    (AF_INET6 == addr->a.sa.sa_family))
 			continue;
@@ -4149,6 +4155,7 @@ static int sctp_copy_laddrs_old(struct sock *sk, __u16 port,
 		cnt ++;
 		if (cnt >= max_addrs) break;
 	}
+	rcu_read_unlock();
 
 	return cnt;
 }
@@ -4156,14 +4163,16 @@ static int sctp_copy_laddrs_old(struct sock *sk, __u16 port,
 static int sctp_copy_laddrs(struct sock *sk, __u16 port, void *to,
 			    size_t space_left, int *bytes_copied)
 {
-	struct list_head *pos, *next;
 	struct sctp_sockaddr_entry *addr;
 	union sctp_addr temp;
 	int cnt = 0;
 	int addrlen;
 
-	list_for_each_safe(pos, next, &sctp_local_addr_list) {
-		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+	rcu_read_lock();
+	list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
+		if (!addr->valid)
+			continue;
+
 		if ((PF_INET == sk->sk_family) &&
 		    (AF_INET6 == addr->a.sa.sa_family))
 			continue;
@@ -4171,8 +4180,10 @@ static int sctp_copy_laddrs(struct sock *sk, __u16 port, void *to,
 		sctp_get_pf_specific(sk->sk_family)->addr_v4map(sctp_sk(sk),
 								&temp);
 		addrlen = sctp_get_af_specific(temp.sa.sa_family)->sockaddr_len;
-		if (space_left < addrlen)
-			return -ENOMEM;
+		if (space_left < addrlen) {
+			cnt =  -ENOMEM;
+			break;
+		}
 		memcpy(to, &temp, addrlen);
 
 		to += addrlen;
@@ -4180,6 +4191,7 @@ static int sctp_copy_laddrs(struct sock *sk, __u16 port, void *to,
 		space_left -= addrlen;
 		*bytes_copied += addrlen;
 	}
+	rcu_read_unlock();
 
 	return cnt;
 }
-- 
1.5.2.4


^ permalink raw reply related
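
Editorial sketch (not part of the posted patch): the 1/2 patch above embeds a
struct rcu_head in each sctp_sockaddr_entry and frees the entry from a
callback that recovers the enclosing structure with container_of(). The
userspace C below illustrates only that recovery step. The names cb_head,
addr_entry, addr_entry_free, run_deferred and freed_addr are hypothetical, the
container_of macro is a simplified local definition, and run_deferred invokes
the callback immediately rather than after a grace period.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rcu_head. */
struct cb_head {
	void (*func)(struct cb_head *head);
};

struct addr_entry {
	int addr;		/* stand-in for union sctp_addr */
	unsigned char valid;
	struct cb_head rcu;	/* embedded, as in the patch */
};

/* Simplified local version of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int freed_addr = -1;	/* records which entry was freed last */

/* Callback: map the embedded head back to its entry and free it. */
static void addr_entry_free(struct cb_head *head)
{
	struct addr_entry *e = container_of(head, struct addr_entry, rcu);

	freed_addr = e->addr;
	free(e);
}

/* Stand-in for call_rcu(): runs the callback at once instead of
 * deferring it until all readers have finished.
 */
static void run_deferred(struct cb_head *head,
			 void (*func)(struct cb_head *head))
{
	head->func = func;
	head->func(head);
}
```

Because the callback receives only a pointer to the embedded member, the
entry needs no separate back-pointer; the offset of the member within the
structure is enough to find the start of the allocation.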

* [v3 PATCH 0/2] Add RCU locking to SCTP address management
From: Vlad Yasevich @ 2007-09-13 19:34 UTC (permalink / raw)
  To: netdev; +Cc: lksctp

Hi All

Thanks to Sridhar Samudrala and Paul McKenney for all the help and comments.
I think this is the final version, unless someone else can spot more problems.
I've run this under heavy load and the patches behave well.

I think patch 1 is a candidate for 2.6.23 since it fixes a bug, but splitting
these seems a bit odd to me.  I'll leave it to DaveM to decide where to
put them.

Thanks
-vlad

^ permalink raw reply

* Re: [Lksctp-developers] [RFC v3 PATCH 2/2] SCTP: Convert bind_addr_list locking to RCU
From: Vlad Yasevich @ 2007-09-13 19:33 UTC (permalink / raw)
  To: Sridhar Samudrala; +Cc: paulmck, netdev, lksctp-developers
In-Reply-To: <1189706346.2748.20.camel@w-sridhar2.beaverton.ibm.com>

Hi Sridhar

Sridhar Samudrala wrote:
> On Wed, 2007-09-12 at 15:33 -0700, Paul E. McKenney wrote:
>> On Wed, Sep 12, 2007 at 05:03:42PM -0400, Vlad Yasevich wrote:
>>> [... and here is the updated version as promised ...]
>>>
>>> Since the sctp_sockaddr_entry is now RCU enabled as part of
>>> the patch to synchronize sctp_localaddr_list, it makes sense to
>>> change all handling of these entries to RCU.  This includes the
>>> sctp_bind_addrs structure and its list of bound addresses.
>>>
>>> This list is currently protected by an external rw_lock and that
>>> looks like overkill.  There are only 2 writers to the list:
>>> bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
>>> These are already serialized via the socket lock, so they will
>>> not step on each other.  These are also relatively rare, so we
>>> should be good with RCU.
>>>
>>> The readers are varied and they are easily converted to RCU.
>> Looks good from an RCU viewpoint -- I must defer to others on
>> the networking aspects.
>>
>> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> Looks good to me too. Some minor typos and some comments on
> RCU usage inline.
> 
> Also, I guess we can remove the sctp_[read/write]_[un]lock macros
> from sctp.h now that you removed all the users of rwlocks
> in SCTP.
> 

Looks like some of the hashing calls still use sctp_write_[un]lock
macros, but use normal read_lock() for the read side.

I'll clean that up after these patches are accepted.

-vlad
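
For anyone following the thread without the full series at hand, the
reader-side pattern under discussion can be sketched in user space.
This is only a model under stated assumptions: model_read_lock() and
model_read_unlock() are hypothetical stand-ins for rcu_read_lock() and
rcu_read_unlock(), and struct addr_entry and copy_valid_addrs() are
illustrative names, not the SCTP structures.  It shows the two points
from the patch: readers skip entries a writer has marked invalid, and
the error path breaks out of the loop so the unlock is always reached.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy entry: a writer clears 'valid' before the entry goes away. */
struct addr_entry {
	int valid;
	unsigned char bytes[4];
};

/* Hypothetical stand-ins for rcu_read_lock()/rcu_read_unlock(). */
static int read_depth;
static void model_read_lock(void)   { read_depth++; }
static void model_read_unlock(void) { read_depth--; }

/*
 * Copy every valid address into 'to'.  Returns the number of addresses
 * copied, or -ENOMEM if the buffer is too small.  There is a single
 * exit path so the read-side section is always closed.
 */
static int copy_valid_addrs(const struct addr_entry *list, size_t n,
			    unsigned char *to, size_t space_left)
{
	size_t i;
	int cnt = 0;

	assert(list != NULL || n == 0);
	assert(to != NULL);

	model_read_lock();
	for (i = 0; i < n; i++) {
		if (!list[i].valid)
			continue;	/* entry is being removed: skip it */
		if (space_left < sizeof(list[i].bytes)) {
			cnt = -ENOMEM;	/* break, not return: unlock below */
			break;
		}
		memcpy(to, list[i].bytes, sizeof(list[i].bytes));
		to += sizeof(list[i].bytes);
		space_left -= sizeof(list[i].bytes);
		cnt++;
	}
	model_read_unlock();

	return cnt;
}
```

An early return inside the loop would leave read_depth unbalanced in
this model, which is the user-space analogue of returning with the RCU
read-side critical section still open.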

^ permalink raw reply

* Re: [BUG] tg3 cannot do PXE (loses MAC address) after soft reboot
From: Lucas Nussbaum @ 2007-09-13 19:28 UTC (permalink / raw)
  To: Michael Chan; +Cc: netdev
In-Reply-To: <1189706701.9540.84.camel@dell>

On 13/09/07 at 11:05 -0700, Michael Chan wrote:
> On Thu, 2007-09-13 at 17:41 +0200, Lucas Nussbaum wrote:
> 
> > # ethtool -i eth0
> > driver: tg3
> > version: 3.65
> > firmware-version: 5703-v2.21a
> > bus-info: 0000:02:02.0
> 
> The firmware is quite old and needs to be upgraded to fix the problem.
> I'll have someone contact you to get it upgraded.

Erm, wouldn't it be possible to print a warning when the driver loads,
saying that the firmware is outdated?
-- 
| Lucas Nussbaum                        PhD student |
| lucas.nussbaum@imag.fr        LIG / Projet MESCAL |
| jabber: lucas@nussbaum.fr    +33 (0)6 64 71 41 65 |
| homepage:        http://www-id.imag.fr/~nussbaum/ |

^ permalink raw reply

* [PATCH v2] iw_cxgb3: Support "iwarp-only" interfaces to avoid 4-tuple conflicts.
From: Steve Wise @ 2007-09-13 19:16 UTC (permalink / raw)
  To: rdreier, sean.hefty; +Cc: netdev, linux-kernel, general


iw_cxgb3: Support "iwarp-only" interfaces to avoid 4-tuple conflicts.

Version 2:

- added a per-device mutex for the address and listening endpoints lists.

- wait for all replies if sending multiple passive_open requests to rnic.

- log warning if no addresses are available when a listen is issued.

- tested

---

Design:

The sysadmin creates "for iwarp use only" alias interfaces of the form
"devname:iw*" where devname is the native interface name (eg eth0) for the
iwarp netdev device.  The alias label can be anything starting with "iw".
The "iw" immediately after the ':' is the key used by the iw_cxgb3 driver.

EG:
	ifconfig eth0 192.168.70.123 up
	ifconfig eth0:iw1 192.168.71.123 up
	ifconfig eth0:iw2 192.168.72.123 up

In the above example, 192.168.70/24 is for TCP traffic, while
192.168.71/24 and 192.168.72/24 are for iWARP/RDMA use.

The rdma-only interface must be on its own IP subnet. This allows routing
all rdma traffic onto this interface.

The iWARP driver must translate all listens on address 0.0.0.0 to the
set of rdma-only IP addresses for the device in question.  This prevents
incoming connect requests to the TCP IP addresses from going up the
rdma stack.
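
The label rule described above can be tried out in user space with a
small sketch.  It mirrors the is_iwarp_label() helper in the patch
below, but takes a const pointer and adds a NULL check, so treat it as
an illustration rather than the driver code:

```c
#include <assert.h>
#include <string.h>

/*
 * An alias label counts as iwarp-only when the text right after the
 * first ':' starts with "iw" (e.g. "eth0:iw1").
 */
static int is_iwarp_label(const char *label)
{
	const char *colon;

	assert(label != NULL);
	colon = strchr(label, ':');
	return colon != NULL && strncmp(colon + 1, "iw", 2) == 0;
}
```

With this rule "eth0:iw1" and "eth0:iw2" from the example are
iwarp-only, while "eth0" and an ordinary alias such as "eth0:1" are not.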

Implementation Details:

- The iw_cxgb3 driver registers for inetaddr events via
register_inetaddr_notifier().  This allows tracking the iwarp-only
addresses/subnets as they get added and deleted.  The iwarp driver
maintains a list of the current iwarp-only addresses.

- The iw_cxgb3 driver builds the list of iwarp-only addresses for its
devices at module insert time.  This is needed because the inetaddr
notifier callbacks don't "replay" address-add events when someone
registers.  So the driver must build the initial list at module load time.

- When a listen is done on address 0.0.0.0, then the iw_cxgb3 driver
must translate that into a set of listens on the iwarp-only addresses.
This is implemented by maintaining a list of stid/addr entries per
listening endpoint.

- When a new iwarp-only address is added or removed, the iw_cxgb3 driver
must traverse the set of listening endpoints and update them accordingly.
This allows an application to bind to 0.0.0.0 prior to the iwarp-only
interfaces being configured.  It also allows changing the iwarp-only set
of addresses and getting the expected behavior for apps already bound
to 0.0.0.0.  This is done by maintaining a list of listening endpoints
off the device struct.

- The address list, the listening endpoint list, and each list of
stid/addrs in use per listening endpoint are all protected via a mutex
per iw_cxgb3 device.
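
As a rough user-space model of the 0.0.0.0 translation described in the
list above: expand_listen() is a hypothetical helper written for this
note, not a function in the patch.  It assumes addresses are plain
32-bit values and that the caller supplies an output array at least as
large as the address list.  The return values follow what
alloc_listener_list() does in the patch: a specific address that is not
iwarp-only fails with -EADDRNOTAVAIL, while a wildcard listen with no
iwarp-only addresses is not an error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill 'out' with the iwarp-only addresses that need a hardware listen
 * for the address the application asked for (0 means 0.0.0.0).
 * Returns the number of addresses written or -EADDRNOTAVAIL.
 */
static int expand_listen(const uint32_t *iw_addrs, size_t n_addrs,
			 uint32_t requested, uint32_t *out, size_t out_len)
{
	size_t i;
	int added = 0;

	assert(iw_addrs != NULL || n_addrs == 0);
	assert(out != NULL);
	assert(out_len >= n_addrs);

	for (i = 0; i < n_addrs; i++) {
		/* Wildcard matches every entry, otherwise exact match. */
		if (requested == 0 || requested == iw_addrs[i])
			out[added++] = iw_addrs[i];
	}
	if (requested != 0 && added == 0)
		return -EADDRNOTAVAIL;
	return added;
}
```

In the driver each resulting address gets its own stid and its own
passive open request; the sketch only covers the address selection.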

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/iwch.c    |  125 ++++++++++++++++
 drivers/infiniband/hw/cxgb3/iwch.h    |   11 +
 drivers/infiniband/hw/cxgb3/iwch_cm.c |  259 +++++++++++++++++++++++++++------
 drivers/infiniband/hw/cxgb3/iwch_cm.h |   15 ++
 4 files changed, 360 insertions(+), 50 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch.c b/drivers/infiniband/hw/cxgb3/iwch.c
index 0315c9d..296fb66 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.c
+++ b/drivers/infiniband/hw/cxgb3/iwch.c
@@ -63,6 +63,123 @@ struct cxgb3_client t3c_client = {
 static LIST_HEAD(dev_list);
 static DEFINE_MUTEX(dev_mutex);
 
+static void insert_ifa(struct iwch_dev *rnicp, struct in_ifaddr *ifa)
+{
+	struct iwch_addrlist *addr;
+
+	addr = kmalloc(sizeof *addr, GFP_KERNEL);
+	if (!addr) {
+		printk(KERN_ERR MOD "%s - failed to alloc memory!\n",
+		       __FUNCTION__);
+		return;
+	}
+	addr->ifa = ifa;
+	mutex_lock(&rnicp->mutex);
+	list_add_tail(&addr->entry, &rnicp->addrlist);
+	mutex_unlock(&rnicp->mutex);
+}
+
+static void remove_ifa(struct iwch_dev *rnicp, struct in_ifaddr *ifa)
+{
+	struct iwch_addrlist *addr, *tmp;
+
+	mutex_lock(&rnicp->mutex);
+	list_for_each_entry_safe(addr, tmp, &rnicp->addrlist, entry) {
+		if (addr->ifa == ifa) {
+			list_del_init(&addr->entry);
+			kfree(addr);
+			goto out;
+		}
+	}
+out:
+	mutex_unlock(&rnicp->mutex);
+}
+
+static int netdev_is_ours(struct iwch_dev *rnicp, struct net_device *netdev)
+{
+	int i;
+
+	for (i = 0; i < rnicp->rdev.port_info.nports; i++)
+		if (netdev == rnicp->rdev.port_info.lldevs[i])
+			return 1;
+	return 0;
+}
+
+static inline int is_iwarp_label(char *label)
+{
+	char *colon;
+
+	colon = strchr(label, ':');
+	if (colon && !strncmp(colon+1, "iw", 2))
+		return 1;
+	return 0;
+}
+
+static int nb_callback(struct notifier_block *self, unsigned long event,
+		       void *ctx)
+{
+	struct in_ifaddr *ifa = ctx;
+	struct iwch_dev *rnicp = container_of(self, struct iwch_dev, nb);
+
+	PDBG("%s rnicp %p event %lx\n", __FUNCTION__, rnicp, event);
+
+	switch (event) {
+	case NETDEV_UP:
+		if (netdev_is_ours(rnicp, ifa->ifa_dev->dev) &&
+		    is_iwarp_label(ifa->ifa_label)) {
+			PDBG("%s label %s addr 0x%x added\n",
+				__FUNCTION__, ifa->ifa_label, ifa->ifa_address);
+			insert_ifa(rnicp, ifa);
+			iwch_listeners_add_addr(rnicp, ifa->ifa_address);
+		}
+		break;
+	case NETDEV_DOWN:
+		if (netdev_is_ours(rnicp, ifa->ifa_dev->dev) &&
+		    is_iwarp_label(ifa->ifa_label)) {
+			PDBG("%s label %s addr 0x%x deleted\n",
+				__FUNCTION__, ifa->ifa_label, ifa->ifa_address);
+			iwch_listeners_del_addr(rnicp, ifa->ifa_address);
+			remove_ifa(rnicp, ifa);
+		}
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
+static void delete_addrlist(struct iwch_dev *rnicp)
+{
+	struct iwch_addrlist *addr, *tmp;
+
+	mutex_lock(&rnicp->mutex);
+	list_for_each_entry_safe(addr, tmp, &rnicp->addrlist, entry) {
+		list_del_init(&addr->entry);
+		kfree(addr);
+	}
+	mutex_unlock(&rnicp->mutex);
+}
+
+static void populate_addrlist(struct iwch_dev *rnicp)
+{
+	int i;
+	struct in_device *indev;
+
+	for (i = 0; i < rnicp->rdev.port_info.nports; i++) {
+		indev = in_dev_get(rnicp->rdev.port_info.lldevs[i]);
+		if (!indev)
+			continue;
+		for_ifa(indev)
+			if (is_iwarp_label(ifa->ifa_label)) {
+				PDBG("%s label %s addr 0x%x added\n",
+				     __FUNCTION__, ifa->ifa_label,
+				     ifa->ifa_address);
+				insert_ifa(rnicp, ifa);
+			}
+		endfor_ifa(indev);
+	}
+}
+
 static void rnic_init(struct iwch_dev *rnicp)
 {
 	PDBG("%s iwch_dev %p\n", __FUNCTION__,  rnicp);
@@ -70,6 +187,12 @@ static void rnic_init(struct iwch_dev *r
 	idr_init(&rnicp->qpidr);
 	idr_init(&rnicp->mmidr);
 	spin_lock_init(&rnicp->lock);
+	INIT_LIST_HEAD(&rnicp->addrlist);
+	INIT_LIST_HEAD(&rnicp->listen_eps);
+	mutex_init(&rnicp->mutex);
+	rnicp->nb.notifier_call = nb_callback;
+	populate_addrlist(rnicp);
+	register_inetaddr_notifier(&rnicp->nb);
 
 	rnicp->attr.vendor_id = 0x168;
 	rnicp->attr.vendor_part_id = 7;
@@ -148,6 +271,8 @@ static void close_rnic_dev(struct t3cdev
 	mutex_lock(&dev_mutex);
 	list_for_each_entry_safe(dev, tmp, &dev_list, entry) {
 		if (dev->rdev.t3cdev_p == tdev) {
+			unregister_inetaddr_notifier(&dev->nb);
+			delete_addrlist(dev);
 			list_del(&dev->entry);
 			iwch_unregister_device(dev);
 			cxio_rdev_close(&dev->rdev);
diff --git a/drivers/infiniband/hw/cxgb3/iwch.h b/drivers/infiniband/hw/cxgb3/iwch.h
index caf4e60..7fa0a47 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.h
+++ b/drivers/infiniband/hw/cxgb3/iwch.h
@@ -36,6 +36,8 @@ #include <linux/mutex.h>
 #include <linux/list.h>
 #include <linux/spinlock.h>
 #include <linux/idr.h>
+#include <linux/notifier.h>
+#include <linux/inetdevice.h>
 
 #include <rdma/ib_verbs.h>
 
@@ -101,6 +103,11 @@ struct iwch_rnic_attributes {
 	u32 cq_overflow_detection;
 };
 
+struct iwch_addrlist {
+	struct list_head entry;
+	struct in_ifaddr *ifa;
+};
+
 struct iwch_dev {
 	struct ib_device ibdev;
 	struct cxio_rdev rdev;
@@ -111,6 +118,10 @@ struct iwch_dev {
 	struct idr mmidr;
 	spinlock_t lock;
 	struct list_head entry;
+	struct notifier_block nb;
+	struct list_head addrlist;
+	struct list_head listen_eps;
+	struct mutex mutex;
 };
 
 static inline struct iwch_dev *to_iwch_dev(struct ib_device *ibdev)
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
index 1cdfcd4..954069f 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -1127,23 +1127,149 @@ static int act_open_rpl(struct t3cdev *t
 	return CPL_RET_BUF_DONE;
 }
 
-static int listen_start(struct iwch_listen_ep *ep)
+static int wait_for_reply(struct iwch_ep_common *epc)
+{
+	PDBG("%s ep %p waiting\n", __FUNCTION__, epc);
+	wait_event(epc->waitq, epc->rpl_done);
+	PDBG("%s ep %p done waiting err %d\n", __FUNCTION__, epc, epc->rpl_err);
+	return epc->rpl_err;
+}
+
+static struct iwch_listen_entry *alloc_listener(struct iwch_listen_ep *ep,
+						  __be32 addr)
+{
+	struct iwch_dev *h = to_iwch_dev(ep->com.cm_id->device);
+	struct iwch_listen_entry *le;
+
+	le = kmalloc(sizeof *le, GFP_KERNEL);
+	if (!le) {
+		printk(KERN_ERR MOD "%s - failed to alloc memory!\n",
+		       __FUNCTION__);
+		return NULL;
+	}
+	le->stid = cxgb3_alloc_stid(h->rdev.t3cdev_p,
+				    &t3c_client, ep);
+	if (le->stid == -1) {
+		printk(KERN_ERR MOD "%s - cannot alloc stid.\n",
+		       __FUNCTION__);
+		kfree(le);
+		return NULL;
+	}
+	le->addr = addr;
+	PDBG("%s stid %u addr %x port %x\n", __FUNCTION__, le->stid,
+	     ntohl(le->addr), ntohs(ep->com.local_addr.sin_port));
+	return le;
+}
+
+static void dealloc_listener(struct iwch_listen_ep *ep,
+			     struct iwch_listen_entry *le)
+{
+	PDBG("%s stid %u addr %x port %x\n", __FUNCTION__, le->stid,
+	     ntohl(le->addr), ntohs(ep->com.local_addr.sin_port));
+	cxgb3_free_stid(ep->com.tdev, le->stid);
+	kfree(le);
+}
+
+static void dealloc_listener_list(struct iwch_listen_ep *ep)
+{
+	struct iwch_listen_entry *le, *tmp;
+	struct iwch_dev *h = to_iwch_dev(ep->com.cm_id->device);
+
+	mutex_lock(&h->mutex);
+	list_for_each_entry_safe(le, tmp, &ep->listeners, entry) {
+		list_del_init(&le->entry);
+		dealloc_listener(ep, le);
+	}
+	mutex_unlock(&h->mutex);
+}
+
+static int alloc_listener_list(struct iwch_listen_ep *ep)
+{
+	struct iwch_dev *h = to_iwch_dev(ep->com.cm_id->device);
+	struct iwch_addrlist *addr;
+	struct iwch_listen_entry *le;
+	int err = 0;
+	int added=0;
+	mutex_lock(&h->mutex);
+	list_for_each_entry(addr, &h->addrlist, entry) {
+		if (ep->com.local_addr.sin_addr.s_addr == 0 ||
+		    ep->com.local_addr.sin_addr.s_addr ==
+		    addr->ifa->ifa_address) {
+			le = alloc_listener(ep, addr->ifa->ifa_address);
+			if (!le)
+				break;
+			list_add_tail(&le->entry, &ep->listeners);
+			added++;
+		}
+	}
+	mutex_unlock(&h->mutex);
+	if (ep->com.local_addr.sin_addr.s_addr != 0 && !added)
+		err = -EADDRNOTAVAIL;
+	if (!err && !added)
+		printk(KERN_WARNING MOD
+		       "No RDMA interface found for device %s\n",
+		       pci_name(h->rdev.rnic_info.pdev));
+	return err;
+}
+
+static int listen_stop_one(struct iwch_listen_ep  *ep, unsigned int stid)
 {
 	struct sk_buff *skb;
-	struct cpl_pass_open_req *req;
+	struct cpl_close_listserv_req *req;
+
+	PDBG("%s stid %u\n", __FUNCTION__, stid);
+	skb = get_skb(NULL, sizeof(*req), GFP_KERNEL);
+	if (!skb) {
+		printk(KERN_ERR MOD "%s - failed to alloc skb\n", __FUNCTION__);
+		return -ENOMEM;
+	}
+	req = (struct cpl_close_listserv_req *) skb_put(skb, sizeof(*req));
+	req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_FORWARD));
+	req->cpu_idx = 0;
+	OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_CLOSE_LISTSRV_REQ, stid));
+	skb->priority = 1;
+	ep->com.rpl_err = 0;
+	ep->com.rpl_done = 0;
+	cxgb3_ofld_send(ep->com.tdev, skb);
+	return wait_for_reply(&ep->com);
+}
+
+static int listen_stop(struct iwch_listen_ep *ep)
+{
+	struct iwch_listen_entry *le;
+	struct iwch_dev *h = to_iwch_dev(ep->com.cm_id->device);
+	int err = 0;
 
 	PDBG("%s ep %p\n", __FUNCTION__, ep);
+	mutex_lock(&h->mutex);
+	list_for_each_entry(le, &ep->listeners, entry) {
+		err = listen_stop_one(ep, le->stid);
+		if (err)
+			break;
+	}
+	mutex_unlock(&h->mutex);
+	return err;
+}
+
+static int listen_start_one(struct iwch_listen_ep *ep, unsigned int stid,
+			    __be32 addr, __be16 port)
+{
+	struct sk_buff *skb;
+	struct cpl_pass_open_req *req;
+
+	PDBG("%s stid %u addr %x port %x\n", __FUNCTION__, stid, ntohl(addr),
+	     ntohs(port));
 	skb = get_skb(NULL, sizeof(*req), GFP_KERNEL);
 	if (!skb) {
-		printk(KERN_ERR MOD "t3c_listen_start failed to alloc skb!\n");
+		printk(KERN_ERR MOD "%s - failed to alloc skb\n", __FUNCTION__);
 		return -ENOMEM;
 	}
 
 	req = (struct cpl_pass_open_req *) skb_put(skb, sizeof(*req));
 	req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_FORWARD));
-	OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_PASS_OPEN_REQ, ep->stid));
-	req->local_port = ep->com.local_addr.sin_port;
-	req->local_ip = ep->com.local_addr.sin_addr.s_addr;
+	OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_PASS_OPEN_REQ, stid));
+	req->local_port = port;
+	req->local_ip = addr;
 	req->peer_port = 0;
 	req->peer_ip = 0;
 	req->peer_netmask = 0;
@@ -1152,8 +1278,32 @@ static int listen_start(struct iwch_list
 	req->opt1 = htonl(V_CONN_POLICY(CPL_CONN_POLICY_ASK));
 
 	skb->priority = 1;
+	ep->com.rpl_err = 0;
+	ep->com.rpl_done = 0;
 	cxgb3_ofld_send(ep->com.tdev, skb);
-	return 0;
+	return wait_for_reply(&ep->com);
+}
+
+static int listen_start(struct iwch_listen_ep *ep)
+{
+	struct iwch_listen_entry *le;
+	struct iwch_dev *h = to_iwch_dev(ep->com.cm_id->device);
+	int err = 0;
+
+	PDBG("%s ep %p\n", __FUNCTION__, ep);
+	mutex_lock(&h->mutex);
+	list_for_each_entry(le, &ep->listeners, entry) {
+		err = listen_start_one(ep, le->stid, le->addr,
+				 ep->com.local_addr.sin_port);
+		if (err)
+			goto fail;
+	}
+	mutex_unlock(&h->mutex);
+	return err;
+fail:
+	mutex_unlock(&h->mutex);
+	listen_stop(ep);
+	return err;
 }
 
 static int pass_open_rpl(struct t3cdev *tdev, struct sk_buff *skb, void *ctx)
@@ -1170,39 +1320,59 @@ static int pass_open_rpl(struct t3cdev *
 	return CPL_RET_BUF_DONE;
 }
 
-static int listen_stop(struct iwch_listen_ep *ep)
-{
-	struct sk_buff *skb;
-	struct cpl_close_listserv_req *req;
-
-	PDBG("%s ep %p\n", __FUNCTION__, ep);
-	skb = get_skb(NULL, sizeof(*req), GFP_KERNEL);
-	if (!skb) {
-		printk(KERN_ERR MOD "%s - failed to alloc skb\n", __FUNCTION__);
-		return -ENOMEM;
-	}
-	req = (struct cpl_close_listserv_req *) skb_put(skb, sizeof(*req));
-	req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_FORWARD));
-	req->cpu_idx = 0;
-	OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_CLOSE_LISTSRV_REQ, ep->stid));
-	skb->priority = 1;
-	cxgb3_ofld_send(ep->com.tdev, skb);
-	return 0;
-}
-
 static int close_listsrv_rpl(struct t3cdev *tdev, struct sk_buff *skb,
 			     void *ctx)
 {
 	struct iwch_listen_ep *ep = ctx;
 	struct cpl_close_listserv_rpl *rpl = cplhdr(skb);
 
-	PDBG("%s ep %p\n", __FUNCTION__, ep);
+	PDBG("%s ep %p stid %u\n", __FUNCTION__, ep, GET_TID(rpl));
+
 	ep->com.rpl_err = status2errno(rpl->status);
 	ep->com.rpl_done = 1;
 	wake_up(&ep->com.waitq);
 	return CPL_RET_BUF_DONE;
 }
 
+void iwch_listeners_add_addr(struct iwch_dev *rnicp, __be32 addr)
+{
+	struct iwch_listen_ep *listen_ep;
+	struct iwch_listen_entry *le;
+
+	mutex_lock(&rnicp->mutex);
+	list_for_each_entry(listen_ep, &rnicp->listen_eps, entry) {
+		if (listen_ep->com.local_addr.sin_addr.s_addr)
+			continue;
+		le = alloc_listener(listen_ep, addr);
+		if (le) {
+			list_add_tail(&le->entry, &listen_ep->listeners);
+			listen_start_one(listen_ep, le->stid, addr,
+					 listen_ep->com.local_addr.sin_port);
+		}
+	}
+	mutex_unlock(&rnicp->mutex);
+}
+
+void iwch_listeners_del_addr(struct iwch_dev *rnicp, __be32 addr)
+{
+	struct iwch_listen_ep *listen_ep;
+	struct iwch_listen_entry *le, *tmp;
+
+	mutex_lock(&rnicp->mutex);
+	list_for_each_entry(listen_ep, &rnicp->listen_eps, entry) {
+		if (listen_ep->com.local_addr.sin_addr.s_addr)
+			continue;
+		list_for_each_entry_safe(le, tmp, &listen_ep->listeners,
+					 entry)
+			if (le->addr == addr) {
+				listen_stop_one(listen_ep, le->stid);
+				list_del_init(&le->entry);
+				dealloc_listener(listen_ep, le);
+			}
+	}
+	mutex_unlock(&rnicp->mutex);
+}
+
 static void accept_cr(struct iwch_ep *ep, __be32 peer_ip, struct sk_buff *skb)
 {
 	struct cpl_pass_accept_rpl *rpl;
@@ -1767,8 +1937,7 @@ int iwch_accept_cr(struct iw_cm_id *cm_i
 		goto err;
 
 	/* wait for wr_ack */
-	wait_event(ep->com.waitq, ep->com.rpl_done);
-	err = ep->com.rpl_err;
+	err = wait_for_reply(&ep->com);
 	if (err)
 		goto err;
 
@@ -1887,31 +2056,23 @@ int iwch_create_listen(struct iw_cm_id *
 	ep->com.cm_id = cm_id;
 	ep->backlog = backlog;
 	ep->com.local_addr = cm_id->local_addr;
+	INIT_LIST_HEAD(&ep->listeners);
 
-	/*
-	 * Allocate a server TID.
-	 */
-	ep->stid = cxgb3_alloc_stid(h->rdev.t3cdev_p, &t3c_client, ep);
-	if (ep->stid == -1) {
-		printk(KERN_ERR MOD "%s - cannot alloc atid.\n", __FUNCTION__);
-		err = -ENOMEM;
+	err = alloc_listener_list(ep);
+	if (err)
 		goto fail2;
-	}
 
 	state_set(&ep->com, LISTEN);
 	err = listen_start(ep);
-	if (err)
-		goto fail3;
 
-	/* wait for pass_open_rpl */
-	wait_event(ep->com.waitq, ep->com.rpl_done);
-	err = ep->com.rpl_err;
 	if (!err) {
 		cm_id->provider_data = ep;
+		mutex_lock(&h->mutex);
+		list_add_tail(&ep->entry, &h->listen_eps);
+		mutex_unlock(&h->mutex);
 		goto out;
 	}
-fail3:
-	cxgb3_free_stid(ep->com.tdev, ep->stid);
+	dealloc_listener_list(ep);
 fail2:
 	cm_id->rem_ref(cm_id);
 	put_ep(&ep->com);
@@ -1923,18 +2084,20 @@ out:
 int iwch_destroy_listen(struct iw_cm_id *cm_id)
 {
 	int err;
+	struct iwch_dev *h = to_iwch_dev(cm_id->device);
 	struct iwch_listen_ep *ep = to_listen_ep(cm_id);
 
 	PDBG("%s ep %p\n", __FUNCTION__, ep);
 
 	might_sleep();
+	mutex_lock(&h->mutex);
+	list_del_init(&ep->entry);
+	mutex_unlock(&h->mutex);
 	state_set(&ep->com, DEAD);
 	ep->com.rpl_done = 0;
 	ep->com.rpl_err = 0;
 	err = listen_stop(ep);
-	wait_event(ep->com.waitq, ep->com.rpl_done);
-	cxgb3_free_stid(ep->com.tdev, ep->stid);
-	err = ep->com.rpl_err;
+	dealloc_listener_list(ep);
 	cm_id->rem_ref(cm_id);
 	put_ep(&ep->com);
 	return err;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.h b/drivers/infiniband/hw/cxgb3/iwch_cm.h
index 6107e7c..23e5a22 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.h
@@ -162,10 +162,19 @@ struct iwch_ep_common {
 	int rpl_err;
 };
 
-struct iwch_listen_ep {
-	struct iwch_ep_common com;
+struct iwch_listen_entry {
+	struct list_head entry;
 	unsigned int stid;
+	__be32 addr;
+};
+
+struct iwch_listen_ep {
+	struct iwch_ep_common com;	/* Must be first entry! */
+	struct list_head entry;
+	struct list_head listeners;
 	int backlog;
+	int listen_count;
+	int listen_rpls;
 };
 
 struct iwch_ep {
@@ -222,6 +231,8 @@ int iwch_resume_tid(struct iwch_ep *ep);
 void __free_ep(struct kref *kref);
 void iwch_rearp(struct iwch_ep *ep);
 int iwch_ep_redirect(void *ctx, struct dst_entry *old, struct dst_entry *new, struct l2t_entry *l2t);
+void iwch_listeners_add_addr(struct iwch_dev *rnicp, __be32 addr);
+void iwch_listeners_del_addr(struct iwch_dev *rnicp, __be32 addr);
 
 int __init iwch_cm_init(void);
 void __exit iwch_cm_term(void);

^ permalink raw reply related

* Network Namespace status
From: Eric W. Biederman @ 2007-09-13 19:12 UTC (permalink / raw)
  To: Linux Containers
  Cc: netdev, David Miller, Tejun Heo, Greg Kroah-Hartman,
	Andrew Morton


Now that the network namespace work is partly merged I figure
a short status summary of where everything is at is in order.

David Miller has merged the core of the network namespace work
and that probably needs to sit just a little while to make certain
we don't have unexpected breakage.

Before enabling multiple instances of the network namespace
it is necessary to sort through a few last user interface issues.

In Greg KH's tree there is work from Tejun and myself that decouples
the sysfs dentry tree from the kobject tree, and Tejun is actively
working on completing that decoupling.  From the current sysfs state
it takes just a handful of patches to support multiple super_blocks
each displaying the network devices for a different network namespace.
Tejun and I almost agree on the last round of patches that did that.
That support is needed before we can allow network devices
to exist in anything except the initial network namespace.

In Andrew's tree there is the start of my sysctl cleanup.  Basically
just an additional sanity check in register_sysctl_table and a bunch
of fixes to avoid the errors that sanity check has found.  Pending
I have a few more general cleanups and code to support multiple
network namespaces.  Last we talked Andrew said I have sent
him enough sysctl changes for now, and to wait until after the
merge window before sending more.

The proc support in the net-2.6.24 tree is reasonable from the
direction of the networking code.  Currently I am looking at
"current->net_ns" and resolving /proc/net based upon that.  Long term
we want to refactor that code so that "current->net_ns" is captured
when we mount /proc, so that the network namespace state can be monitored
from outside applications and so that we aren't playing dangerous
games with the vfs dentry trees.

The final blocker to having multiple useful instances of network
namespaces is the loopback device.  We recognize the network namespace
of incoming packets by looking at dev->nd_net, which means that for
packets to properly loop back within a network namespace we need a
loopback device per network namespace.  There were some concerns
expressed when we posted the cleanup part of the patches that allowed
for multiple loopback devices a few weeks ago so resolving this one
may be tricky.
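
To make the loopback point concrete, here is a toy user-space model.
The struct layouts and packet_net() are illustrative assumptions, not
the kernel definitions; only the idea of reading the namespace from
dev->nd_net comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins, far smaller than the real kernel structures. */
struct net {
	int id;
};

struct net_device {
	const char *name;
	struct net *nd_net;	/* namespace the device lives in */
};

struct packet {
	struct net_device *dev;	/* device the packet arrived on */
};

/* The receiving namespace is whatever the incoming device belongs to. */
static struct net *packet_net(const struct packet *pkt)
{
	assert(pkt != NULL && pkt->dev != NULL);
	return pkt->dev->nd_net;
}
```

In this model, a packet looped back through a single shared "lo" would
always resolve to that device's namespace, whichever namespace sent it.
Giving each namespace its own loopback device is what lets the lookup
return the sender's namespace.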


Looking into my patch queue I have:
5 patches for cleaning up and making a per network namespace loopback device.
4 patches for making rtnetlink message processing per network namespace
1 patch for making AF_UNIX per network namespace
1 patch for making AF_PACKET per network namespace

The ipv4 part of my patchset is currently working but it needs some
more cleanup and reordering of patches before it is ready to go anywhere.
Nothing has been done for ipv6, but the changes should very much parallel
ipv4.

The other protocols I haven't even looked at yet.

Eric

^ permalink raw reply

* Re: [PATCH] Add IP1000A Driver
From: Francois Romieu @ 2007-09-13 19:02 UTC (permalink / raw)
  To: 黃建興-Jesse
  Cc: jeff, akpm, netdev, Stephen Hemminger
In-Reply-To: <AA68EB0EBA29BA40A06B700C33343EEF01901340@fileserver.icplus.com.tw>


黃建興-Jesse <Jesse@icplus.com.tw> :
[...]
> I wish to list three people in this file: you, me, and my leader Sorbica.

Yes.

-- 
Ueimor

^ permalink raw reply

* Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
From: Steve Wise @ 2007-09-13 18:59 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev, Roland Dreier, linux-kernel, general
In-Reply-To: <46E987E0.2010605@garzik.org>



Jeff Garzik wrote:
> Steve Wise wrote:
>> I was about to post v2 of my patch to avoid port space collisions with 
>> the native stack.  Can we get that into 2.6.24?  It is high priority IMO. 
>> I've tried to solicit review on it, but I think folks are reluctant... 
>> ;-)
> 
> Well, if it involves /sharing/ port space with the native stack, i.e. 
> where port 1234 is IB but 1235 is Linux, pretty much all the networking 
> devs have NAK'd that approach AFAICS.
> 

Jeff, I posted a fix that doesn't do this.  No port sharing.  The iwarp 
device will use its own ip address and subnet to avoid collisions.  You 
should review the patch when I post v2.

Thanks,

Steve.

^ permalink raw reply

* Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
From: Jeff Garzik @ 2007-09-13 18:56 UTC (permalink / raw)
  To: Steve Wise; +Cc: Roland Dreier, general, linux-kernel, netdev
In-Reply-To: <46E97BB0.9030106@opengridcomputing.com>

Steve Wise wrote:
> I was about to post v2 of my patch to avoid port space collisions with 
> the native stack.  Can we get that into 2.6.24?  It is high priority IMO. 
> I've tried to solicit review on it, but I think folks are reluctant... ;-)

Well, if it involves /sharing/ port space with the native stack, i.e. 
where port 1234 is IB but 1235 is Linux, pretty much all the networking 
devs have NAK'd that approach AFAICS.

	Jeff




^ permalink raw reply

