netdev.vger.kernel.org archive mirror
* [RFC PATCH 0/2] Add RCU locking to SCTP address management
@ 2007-09-10 19:46 Vlad Yasevich
  2007-09-10 19:46 ` [RFC PATCH 1/2] SCTP: Add RCU synchronization around sctp_localaddr_list Vlad Yasevich
  2007-09-10 19:46 ` [RFC PATCH 2/2] SCTP: Convert bind_addr_list locking to RCU Vlad Yasevich
  0 siblings, 2 replies; 10+ messages in thread
From: Vlad Yasevich @ 2007-09-10 19:46 UTC (permalink / raw)
  To: netdev; +Cc: lksctp-developers

Hi All

This is my first attempt to add RCU synchronization to pieces of SCTP
and I want to make sure I do this right.

The RCU docs are somewhat outdated, and the calling conventions differ
between subsystems, so I am using what I've been able to find.

A bit of a background...

The whole problem started with a panic that led me to a NULL deref while
walking a global list of addresses that SCTP maintains.  That list
is modified during NETDEV_UP and NETDEV_DOWN notifier processing,
and all readers run in user context.  It looks like the list was
modified while a user app was walking it, and we crashed.

Easy enough to fix by adding locking.  However, some notifiers
(ipv6 addrconf) run in atomic context and therefore can't sleep.
(Q1.  Why are ipv6 notifiers atomic, while IPv4 notifiers are blocking?)

So, I decided to add RCU locking around that list.  The notifier events
are already synchronized, so no additional locking is needed on the
writer side; this should make for a nice RCU conversion.
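
To make that concrete, the writer side ends up using the standard RCU
removal pattern.  A minimal sketch of what the first patch does (the
surrounding notifier code is omitted):

	/* Writer, already serialized by the notifier chain: unlink the
	 * entry, mark it logically dead, and defer the actual free to
	 * an RCU callback.
	 */
	addr->valid = 0;
	list_del_rcu(&addr->list);
	call_rcu(&addr->rcu, sctp_local_addr_free);

	/* The callback runs once all pre-existing readers are done. */
	void sctp_local_addr_free(struct rcu_head *head)
	{
		struct sctp_sockaddr_entry *e;

		e = container_of(head, struct sctp_sockaddr_entry, rcu);
		kfree(e);
	}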

While doing this conversion, I saw that the same structures are used to
maintain a list of bound addresses.  This seemed like another opportunity
to use RCU.  In this case, there are only two writers: BH processing of
specific chunks, and bind() calls.  Again, the writers are already
synchronized on the socket lock, so they will keep out of each other's
way.  The readers are widespread, but RCU takes care of that nicely.
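
The readers then just become RCU read-side critical sections.  Roughly
(a sketch; the real loops are in the patches):

	rcu_read_lock();
	list_for_each_entry_rcu(laddr, &bp->address_list, list) {
		if (!laddr->valid)
			continue;
		/* laddr can be used safely here; the entry cannot be
		 * freed until rcu_read_unlock() below. */
	}
	rcu_read_unlock();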

So, can folks please take a look and make sure I didn't mess up the
RCU conventions?

Thanks a lot
-vlad


* [RFC PATCH 1/2] SCTP: Add RCU synchronization around sctp_localaddr_list
  2007-09-10 19:46 [RFC PATCH 0/2] Add RCU locking to SCTP address management Vlad Yasevich
@ 2007-09-10 19:46 ` Vlad Yasevich
  2007-09-10 21:47   ` Paul E. McKenney
  2007-09-10 19:46 ` [RFC PATCH 2/2] SCTP: Convert bind_addr_list locking to RCU Vlad Yasevich
  1 sibling, 1 reply; 10+ messages in thread
From: Vlad Yasevich @ 2007-09-10 19:46 UTC (permalink / raw)
  To: netdev; +Cc: lksctp-developers, Vlad Yasevich

sctp_localaddr_list is modified dynamically via NETDEV_UP
and NETDEV_DOWN events, but there is no synchronization
between the writer (the event handler) and the readers.  As a
result, a reader can access an entry that has been freed and
crash the system.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
---
 include/net/sctp/sctp.h    |    1 +
 include/net/sctp/structs.h |    2 +
 net/sctp/bind_addr.c       |    2 +
 net/sctp/ipv6.c            |   33 ++++++++++++++++++++--------
 net/sctp/protocol.c        |   50 ++++++++++++++++++++++++++++++-------------
 net/sctp/socket.c          |   38 ++++++++++++++++++++++-----------
 6 files changed, 88 insertions(+), 38 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index d529045..c9cc00c 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -123,6 +123,7 @@
  * sctp/protocol.c
  */
 extern struct sock *sctp_get_ctl_sock(void);
+extern void sctp_local_addr_free(struct rcu_head *head);
 extern int sctp_copy_local_addr_list(struct sctp_bind_addr *,
 				     sctp_scope_t, gfp_t gfp,
 				     int flags);
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index c0d5848..2591c49 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -737,8 +737,10 @@ const union sctp_addr *sctp_source(const struct sctp_chunk *chunk);
 /* This is a structure for holding either an IPv6 or an IPv4 address.  */
 struct sctp_sockaddr_entry {
 	struct list_head list;
+	struct rcu_head	rcu;
 	union sctp_addr a;
 	__u8 use_as_src;
+	__u8 valid;
 };
 
 typedef struct sctp_chunk *(sctp_packet_phandler_t)(struct sctp_association *);
diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
index fdb287a..7fc369f 100644
--- a/net/sctp/bind_addr.c
+++ b/net/sctp/bind_addr.c
@@ -163,8 +163,10 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
 		addr->a.v4.sin_port = htons(bp->port);
 
 	addr->use_as_src = use_as_src;
+	addr->valid = 1;
 
 	INIT_LIST_HEAD(&addr->list);
+	INIT_RCU_HEAD(&addr->rcu);
 	list_add_tail(&addr->list, &bp->address_list);
 	SCTP_DBG_OBJCNT_INC(addr);
 
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index f8aa23d..fc2e4e2 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -77,13 +77,18 @@
 
 #include <asm/uaccess.h>
 
-/* Event handler for inet6 address addition/deletion events.  */
+/* Event handler for inet6 address addition/deletion events.
+ * This event handler is part of the atomic notifier call chain
+ * and thus runs atomically and can NOT sleep.  As a result
+ * we can't and really don't need to add any locks to guard the
+ * RCU.
+ */
 static int sctp_inet6addr_event(struct notifier_block *this, unsigned long ev,
 				void *ptr)
 {
 	struct inet6_ifaddr *ifa = (struct inet6_ifaddr *)ptr;
-	struct sctp_sockaddr_entry *addr;
-	struct list_head *pos, *temp;
+	struct sctp_sockaddr_entry *addr = NULL;
+	struct sctp_sockaddr_entry *temp;
 
 	switch (ev) {
 	case NETDEV_UP:
@@ -94,19 +99,26 @@ static int sctp_inet6addr_event(struct notifier_block *this, unsigned long ev,
 			memcpy(&addr->a.v6.sin6_addr, &ifa->addr,
 				 sizeof(struct in6_addr));
 			addr->a.v6.sin6_scope_id = ifa->idev->dev->ifindex;
-			list_add_tail(&addr->list, &sctp_local_addr_list);
+			addr->valid = 1;
+			rcu_read_lock();
+			list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
+			rcu_read_unlock();
 		}
 		break;
 	case NETDEV_DOWN:
-		list_for_each_safe(pos, temp, &sctp_local_addr_list) {
-			addr = list_entry(pos, struct sctp_sockaddr_entry, list);
-			if (ipv6_addr_equal(&addr->a.v6.sin6_addr, &ifa->addr)) {
-				list_del(pos);
-				kfree(addr);
+		rcu_read_lock();
+		list_for_each_entry_safe(addr, temp,
+					&sctp_local_addr_list, list) {
+			if (ipv6_addr_equal(&addr->a.v6.sin6_addr,
+					     &ifa->addr)) {
+				addr->valid = 0;
+				list_del_rcu(&addr->list);
 				break;
 			}
 		}
-
+		rcu_read_unlock();
+		if (addr && !addr->valid)
+			call_rcu(&addr->rcu, sctp_local_addr_free);
 		break;
 	}
 
@@ -368,6 +380,7 @@ static void sctp_v6_copy_addrlist(struct list_head *addrlist,
 			addr->a.v6.sin6_addr = ifp->addr;
 			addr->a.v6.sin6_scope_id = dev->ifindex;
 			INIT_LIST_HEAD(&addr->list);
+			INIT_RCU_HEAD(&addr->rcu);
 			list_add_tail(&addr->list, addrlist);
 		}
 	}
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index e98579b..ac52f9e 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -153,6 +153,7 @@ static void sctp_v4_copy_addrlist(struct list_head *addrlist,
 			addr->a.v4.sin_family = AF_INET;
 			addr->a.v4.sin_port = 0;
 			addr->a.v4.sin_addr.s_addr = ifa->ifa_local;
+			INIT_RCU_HEAD(&addr->rcu);
 			list_add_tail(&addr->list, addrlist);
 		}
 	}
@@ -192,16 +193,24 @@ static void sctp_free_local_addr_list(void)
 	}
 }
 
+void sctp_local_addr_free(struct rcu_head *head)
+{
+	struct sctp_sockaddr_entry *e = container_of(head,
+				struct sctp_sockaddr_entry, rcu);
+	kfree(e);
+}
+
 /* Copy the local addresses which are valid for 'scope' into 'bp'.  */
 int sctp_copy_local_addr_list(struct sctp_bind_addr *bp, sctp_scope_t scope,
 			      gfp_t gfp, int copy_flags)
 {
 	struct sctp_sockaddr_entry *addr;
 	int error = 0;
-	struct list_head *pos, *temp;
 
-	list_for_each_safe(pos, temp, &sctp_local_addr_list) {
-		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+	rcu_read_lock();
+	list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
+		if (!addr->valid)
+			continue;
 		if (sctp_in_scope(&addr->a, scope)) {
 			/* Now that the address is in scope, check to see if
 			 * the address type is really supported by the local
@@ -221,6 +230,7 @@ int sctp_copy_local_addr_list(struct sctp_bind_addr *bp, sctp_scope_t scope,
 	}
 
 end_copy:
+	rcu_read_unlock();
 	return error;
 }
 
@@ -600,13 +610,17 @@ static void sctp_v4_seq_dump_addr(struct seq_file *seq, union sctp_addr *addr)
 	seq_printf(seq, "%d.%d.%d.%d ", NIPQUAD(addr->v4.sin_addr));
 }
 
-/* Event handler for inet address addition/deletion events.  */
+/* Event handler for inet address addition/deletion events.
+ * This is part of the blocking notifier call chain that is
+ * guarded by a mutex.  As a result we don't need to add
+ * any additional guards for the RCU.
+ */
 static int sctp_inetaddr_event(struct notifier_block *this, unsigned long ev,
 			       void *ptr)
 {
 	struct in_ifaddr *ifa = (struct in_ifaddr *)ptr;
-	struct sctp_sockaddr_entry *addr;
-	struct list_head *pos, *temp;
+	struct sctp_sockaddr_entry *addr = NULL;
+	struct sctp_sockaddr_entry *temp;
 
 	switch (ev) {
 	case NETDEV_UP:
@@ -615,19 +629,25 @@ static int sctp_inetaddr_event(struct notifier_block *this, unsigned long ev,
 			addr->a.v4.sin_family = AF_INET;
 			addr->a.v4.sin_port = 0;
 			addr->a.v4.sin_addr.s_addr = ifa->ifa_local;
-			list_add_tail(&addr->list, &sctp_local_addr_list);
+			addr->valid = 1;
+			rcu_read_lock();
+			list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
+			rcu_read_unlock();
 		}
 		break;
 	case NETDEV_DOWN:
-		list_for_each_safe(pos, temp, &sctp_local_addr_list) {
-			addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+		rcu_read_lock();
+		list_for_each_entry_safe(addr, temp,
+					&sctp_local_addr_list, list) {
 			if (addr->a.v4.sin_addr.s_addr == ifa->ifa_local) {
-				list_del(pos);
-				kfree(addr);
+				addr->valid = 0;
+				list_del_rcu(&addr->list);
 				break;
 			}
 		}
-
+		rcu_read_unlock();
+		if (addr && !addr->valid)
+			call_rcu(&addr->rcu, sctp_local_addr_free);
 		break;
 	}
 
@@ -1227,6 +1247,9 @@ SCTP_STATIC __exit void sctp_exit(void)
 	sctp_v6_del_protocol();
 	inet_del_protocol(&sctp_protocol, IPPROTO_SCTP);
 
+	/* Unregister notifier for inet address additions/deletions. */
+	unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
+
 	/* Free the local address list.  */
 	sctp_free_local_addr_list();
 
@@ -1240,9 +1263,6 @@ SCTP_STATIC __exit void sctp_exit(void)
 	inet_unregister_protosw(&sctp_stream_protosw);
 	inet_unregister_protosw(&sctp_seqpacket_protosw);
 
-	/* Unregister notifier for inet address additions/deletions. */
-	unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
-
 	sctp_sysctl_unregister();
 	list_del(&sctp_ipv4_specific.list);
 
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 3335460..a3acf78 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4057,9 +4057,9 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
 					       int __user *optlen)
 {
 	sctp_assoc_t id;
+	struct list_head *pos;
 	struct sctp_bind_addr *bp;
 	struct sctp_association *asoc;
-	struct list_head *pos, *temp;
 	struct sctp_sockaddr_entry *addr;
 	rwlock_t *addr_lock;
 	int cnt = 0;
@@ -4096,15 +4096,19 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
 		addr = list_entry(bp->address_list.next,
 				  struct sctp_sockaddr_entry, list);
 		if (sctp_is_any(&addr->a)) {
-			list_for_each_safe(pos, temp, &sctp_local_addr_list) {
-				addr = list_entry(pos,
-						  struct sctp_sockaddr_entry,
-						  list);
+			rcu_read_lock();
+			list_for_each_entry_rcu(addr,
+						&sctp_local_addr_list, list) {
+				if (!addr->valid)
+					continue;
+
 				if ((PF_INET == sk->sk_family) &&
 				    (AF_INET6 == addr->a.sa.sa_family))
 					continue;
+
 				cnt++;
 			}
+			rcu_read_unlock();
 		} else {
 			cnt = 1;
 		}
@@ -4127,14 +4131,16 @@ static int sctp_copy_laddrs_old(struct sock *sk, __u16 port,
 					int max_addrs, void *to,
 					int *bytes_copied)
 {
-	struct list_head *pos, *next;
 	struct sctp_sockaddr_entry *addr;
 	union sctp_addr temp;
 	int cnt = 0;
 	int addrlen;
 
-	list_for_each_safe(pos, next, &sctp_local_addr_list) {
-		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+	rcu_read_lock();
+	list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
+		if (!addr->valid)
+			continue;
+
 		if ((PF_INET == sk->sk_family) &&
 		    (AF_INET6 == addr->a.sa.sa_family))
 			continue;
@@ -4149,6 +4155,7 @@ static int sctp_copy_laddrs_old(struct sock *sk, __u16 port,
 		cnt ++;
 		if (cnt >= max_addrs) break;
 	}
+	rcu_read_unlock();
 
 	return cnt;
 }
@@ -4156,14 +4163,16 @@ static int sctp_copy_laddrs_old(struct sock *sk, __u16 port,
 static int sctp_copy_laddrs(struct sock *sk, __u16 port, void *to,
 			    size_t space_left, int *bytes_copied)
 {
-	struct list_head *pos, *next;
 	struct sctp_sockaddr_entry *addr;
 	union sctp_addr temp;
 	int cnt = 0;
 	int addrlen;
 
-	list_for_each_safe(pos, next, &sctp_local_addr_list) {
-		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+	rcu_read_lock();
+	list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
+		if (!addr->valid)
+			continue;
+
 		if ((PF_INET == sk->sk_family) &&
 		    (AF_INET6 == addr->a.sa.sa_family))
 			continue;
@@ -4171,8 +4180,10 @@ static int sctp_copy_laddrs(struct sock *sk, __u16 port, void *to,
 		sctp_get_pf_specific(sk->sk_family)->addr_v4map(sctp_sk(sk),
 								&temp);
 		addrlen = sctp_get_af_specific(temp.sa.sa_family)->sockaddr_len;
-		if (space_left < addrlen)
-			return -ENOMEM;
+		if (space_left < addrlen) {
+			cnt =  -ENOMEM;
+			break;
+		}
 		memcpy(to, &temp, addrlen);
 
 		to += addrlen;
@@ -4180,6 +4191,7 @@ static int sctp_copy_laddrs(struct sock *sk, __u16 port, void *to,
 		space_left -= addrlen;
 		*bytes_copied += addrlen;
 	}
+	rcu_read_unlock();
 
 	return cnt;
 }
-- 
1.5.2.4



* [RFC PATCH 2/2] SCTP: Convert bind_addr_list locking to RCU
  2007-09-10 19:46 [RFC PATCH 0/2] Add RCU locking to SCTP address management Vlad Yasevich
  2007-09-10 19:46 ` [RFC PATCH 1/2] SCTP: Add RCU synchronization around sctp_localaddr_list Vlad Yasevich
@ 2007-09-10 19:46 ` Vlad Yasevich
  2007-09-10 22:08   ` Paul E. McKenney
  1 sibling, 1 reply; 10+ messages in thread
From: Vlad Yasevich @ 2007-09-10 19:46 UTC (permalink / raw)
  To: netdev; +Cc: lksctp-developers, Vlad Yasevich

Since the sctp_sockaddr_entry is now RCU-enabled as part of
the patch to synchronize sctp_localaddr_list, it makes sense to
change all handling of these entries to RCU.  This includes the
sctp_bind_addrs structure and its list of bound addresses.

This list is currently protected by an external rw_lock, and that
looks like overkill.  There are only two writers to the list:
bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
These are already serialized via the socket lock, so they will
not step on each other.  They are also relatively rare, so we
should be good with RCU.

The readers are varied, and they are easily converted to RCU.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
---
 include/net/sctp/structs.h |    3 -
 net/sctp/associola.c       |   14 +------
 net/sctp/bind_addr.c       |   59 ++++++++++++++++++----------
 net/sctp/endpointola.c     |   26 ++++---------
 net/sctp/ipv6.c            |   12 ++---
 net/sctp/protocol.c        |   25 +++++-------
 net/sctp/sm_make_chunk.c   |   17 +++-----
 net/sctp/socket.c          |   92 ++++++++++++++------------------------------
 8 files changed, 97 insertions(+), 151 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 2591c49..1d46f7d 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -1222,9 +1222,6 @@ struct sctp_ep_common {
 	 * bind_addr.address_list is our set of local IP addresses.
 	 */
 	struct sctp_bind_addr bind_addr;
-
-	/* Protection during address list comparisons. */
-	rwlock_t   addr_lock;
 };
 
 
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 2ad1caf..9bad8ba 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -99,7 +99,6 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a
 
 	/* Initialize the bind addr area.  */
 	sctp_bind_addr_init(&asoc->base.bind_addr, ep->base.bind_addr.port);
-	rwlock_init(&asoc->base.addr_lock);
 
 	asoc->state = SCTP_STATE_CLOSED;
 
@@ -937,8 +936,6 @@ struct sctp_transport *sctp_assoc_is_match(struct sctp_association *asoc,
 {
 	struct sctp_transport *transport;
 
-	sctp_read_lock(&asoc->base.addr_lock);
-
 	if ((htons(asoc->base.bind_addr.port) == laddr->v4.sin_port) &&
 	    (htons(asoc->peer.port) == paddr->v4.sin_port)) {
 		transport = sctp_assoc_lookup_paddr(asoc, paddr);
@@ -952,7 +949,6 @@ struct sctp_transport *sctp_assoc_is_match(struct sctp_association *asoc,
 	transport = NULL;
 
 out:
-	sctp_read_unlock(&asoc->base.addr_lock);
 	return transport;
 }
 
@@ -1376,19 +1372,13 @@ int sctp_assoc_set_bind_addr_from_cookie(struct sctp_association *asoc,
 int sctp_assoc_lookup_laddr(struct sctp_association *asoc,
 			    const union sctp_addr *laddr)
 {
-	int found;
+	int found = 0;
 
-	sctp_read_lock(&asoc->base.addr_lock);
 	if ((asoc->base.bind_addr.port == ntohs(laddr->v4.sin_port)) &&
 	    sctp_bind_addr_match(&asoc->base.bind_addr, laddr,
-				 sctp_sk(asoc->base.sk))) {
+				 sctp_sk(asoc->base.sk)))
 		found = 1;
-		goto out;
-	}
 
-	found = 0;
-out:
-	sctp_read_unlock(&asoc->base.addr_lock);
 	return found;
 }
 
diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
index 7fc369f..9c7db1f 100644
--- a/net/sctp/bind_addr.c
+++ b/net/sctp/bind_addr.c
@@ -167,7 +167,10 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
 
 	INIT_LIST_HEAD(&addr->list);
 	INIT_RCU_HEAD(&addr->rcu);
-	list_add_tail(&addr->list, &bp->address_list);
+
+	rcu_read_lock();
+	list_add_tail_rcu(&addr->list, &bp->address_list);
+	rcu_read_unlock();
 	SCTP_DBG_OBJCNT_INC(addr);
 
 	return 0;
@@ -178,20 +181,23 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
  */
 int sctp_del_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *del_addr)
 {
-	struct list_head *pos, *temp;
-	struct sctp_sockaddr_entry *addr;
+	struct sctp_sockaddr_entry *addr, *temp;
 
-	list_for_each_safe(pos, temp, &bp->address_list) {
-		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+	rcu_read_lock_bh();
+	list_for_each_entry_safe(addr, temp, &bp->address_list, list) {
 		if (sctp_cmp_addr_exact(&addr->a, del_addr)) {
 			/* Found the exact match. */
-			list_del(pos);
-			kfree(addr);
-			SCTP_DBG_OBJCNT_DEC(addr);
-
-			return 0;
+			addr->valid = 0;
+			list_del_rcu(&addr->list);
+			break;
 		}
 	}
+	rcu_read_unlock_bh();
+
+	if (addr && !addr->valid) {
+		call_rcu_bh(&addr->rcu, sctp_local_addr_free);
+		SCTP_DBG_OBJCNT_DEC(addr);
+	}
 
 	return -EINVAL;
 }
@@ -302,15 +308,20 @@ int sctp_bind_addr_match(struct sctp_bind_addr *bp,
 			 struct sctp_sock *opt)
 {
 	struct sctp_sockaddr_entry *laddr;
-	struct list_head *pos;
-
-	list_for_each(pos, &bp->address_list) {
-		laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
-		if (opt->pf->cmp_addr(&laddr->a, addr, opt))
-			return 1;
+	int match = 0;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(laddr, &bp->address_list, list) {
+		if (!laddr->valid)
+			continue;
+		if (opt->pf->cmp_addr(&laddr->a, addr, opt)) {
+			match = 1;
+			break;
+		}
 	}
+	rcu_read_unlock();
 
-	return 0;
+	return match;
 }
 
 /* Find the first address in the bind address list that is not present in
@@ -325,27 +336,31 @@ union sctp_addr *sctp_find_unmatch_addr(struct sctp_bind_addr	*bp,
 	union sctp_addr			*addr;
 	void 				*addr_buf;
 	struct sctp_af			*af;
-	struct list_head		*pos;
 	int				i;
 
-	list_for_each(pos, &bp->address_list) {
-		laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
+	rcu_read_lock();
+	list_for_each_entry_rcu(laddr, &bp->address_list, list) {
+		if (!laddr->valid)
+			continue;
 
 		addr_buf = (union sctp_addr *)addrs;
 		for (i = 0; i < addrcnt; i++) {
 			addr = (union sctp_addr *)addr_buf;
 			af = sctp_get_af_specific(addr->v4.sin_family);
 			if (!af)
-				return NULL;
+				break;
 
 			if (opt->pf->cmp_addr(&laddr->a, addr, opt))
 				break;
 
 			addr_buf += af->sockaddr_len;
 		}
-		if (i == addrcnt)
+		if (i == addrcnt) {
+			rcu_read_unlock();
 			return &laddr->a;
+		}
 	}
+	rcu_read_unlock();
 
 	return NULL;
 }
diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
index 1404a9e..fa10af5 100644
--- a/net/sctp/endpointola.c
+++ b/net/sctp/endpointola.c
@@ -92,7 +92,6 @@ static struct sctp_endpoint *sctp_endpoint_init(struct sctp_endpoint *ep,
 
 	/* Initialize the bind addr area */
 	sctp_bind_addr_init(&ep->base.bind_addr, 0);
-	rwlock_init(&ep->base.addr_lock);
 
 	/* Remember who we are attached to.  */
 	ep->base.sk = sk;
@@ -225,21 +224,14 @@ void sctp_endpoint_put(struct sctp_endpoint *ep)
 struct sctp_endpoint *sctp_endpoint_is_match(struct sctp_endpoint *ep,
 					       const union sctp_addr *laddr)
 {
-	struct sctp_endpoint *retval;
+	struct sctp_endpoint *retval = NULL;
 
-	sctp_read_lock(&ep->base.addr_lock);
 	if (htons(ep->base.bind_addr.port) == laddr->v4.sin_port) {
 		if (sctp_bind_addr_match(&ep->base.bind_addr, laddr,
-					 sctp_sk(ep->base.sk))) {
+					 sctp_sk(ep->base.sk)))
 			retval = ep;
-			goto out;
-		}
 	}
 
-	retval = NULL;
-
-out:
-	sctp_read_unlock(&ep->base.addr_lock);
 	return retval;
 }
 
@@ -261,9 +253,7 @@ static struct sctp_association *__sctp_endpoint_lookup_assoc(
 	list_for_each(pos, &ep->asocs) {
 		asoc = list_entry(pos, struct sctp_association, asocs);
 		if (rport == asoc->peer.port) {
-			sctp_read_lock(&asoc->base.addr_lock);
 			*transport = sctp_assoc_lookup_paddr(asoc, paddr);
-			sctp_read_unlock(&asoc->base.addr_lock);
 
 			if (*transport)
 				return asoc;
@@ -295,20 +285,20 @@ struct sctp_association *sctp_endpoint_lookup_assoc(
 int sctp_endpoint_is_peeled_off(struct sctp_endpoint *ep,
 				const union sctp_addr *paddr)
 {
-	struct list_head *pos;
 	struct sctp_sockaddr_entry *addr;
 	struct sctp_bind_addr *bp;
 
-	sctp_read_lock(&ep->base.addr_lock);
 	bp = &ep->base.bind_addr;
-	list_for_each(pos, &bp->address_list) {
-		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+	rcu_read_lock();
+	list_for_each_entry_rcu(addr, &bp->address_list, list) {
+		if (!addr->valid)
+			continue;
 		if (sctp_has_association(&addr->a, paddr)) {
-			sctp_read_unlock(&ep->base.addr_lock);
+			rcu_read_unlock();
 			return 1;
 		}
 	}
-	sctp_read_unlock(&ep->base.addr_lock);
+	rcu_read_unlock();
 
 	return 0;
 }
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index fc2e4e2..4f6dc55 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -302,9 +302,7 @@ static void sctp_v6_get_saddr(struct sctp_association *asoc,
 			      union sctp_addr *saddr)
 {
 	struct sctp_bind_addr *bp;
-	rwlock_t *addr_lock;
 	struct sctp_sockaddr_entry *laddr;
-	struct list_head *pos;
 	sctp_scope_t scope;
 	union sctp_addr *baddr = NULL;
 	__u8 matchlen = 0;
@@ -324,14 +322,14 @@ static void sctp_v6_get_saddr(struct sctp_association *asoc,
 	scope = sctp_scope(daddr);
 
 	bp = &asoc->base.bind_addr;
-	addr_lock = &asoc->base.addr_lock;
 
 	/* Go through the bind address list and find the best source address
 	 * that matches the scope of the destination address.
 	 */
-	sctp_read_lock(addr_lock);
-	list_for_each(pos, &bp->address_list) {
-		laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
+	rcu_read_lock();
+	list_for_each_entry_rcu(laddr, &bp->address_list, list) {
+		if (!laddr->valid)
+			continue;
 		if ((laddr->use_as_src) &&
 		    (laddr->a.sa.sa_family == AF_INET6) &&
 		    (scope <= sctp_scope(&laddr->a))) {
@@ -353,7 +351,7 @@ static void sctp_v6_get_saddr(struct sctp_association *asoc,
 		       __FUNCTION__, asoc, NIP6(daddr->v6.sin6_addr));
 	}
 
-	sctp_read_unlock(addr_lock);
+	rcu_read_unlock();
 }
 
 /* Make a copy of all potential local addresses. */
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index ac52f9e..a1030ed 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -222,7 +222,7 @@ int sctp_copy_local_addr_list(struct sctp_bind_addr *bp, sctp_scope_t scope,
 			      (copy_flags & SCTP_ADDR6_ALLOWED) &&
 			      (copy_flags & SCTP_ADDR6_PEERSUPP)))) {
 				error = sctp_add_bind_addr(bp, &addr->a, 1,
-							   GFP_ATOMIC);
+						    GFP_ATOMIC);
 				if (error)
 					goto end_copy;
 			}
@@ -426,9 +426,7 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
 	struct rtable *rt;
 	struct flowi fl;
 	struct sctp_bind_addr *bp;
-	rwlock_t *addr_lock;
 	struct sctp_sockaddr_entry *laddr;
-	struct list_head *pos;
 	struct dst_entry *dst = NULL;
 	union sctp_addr dst_saddr;
 
@@ -457,23 +455,20 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
 		goto out;
 
 	bp = &asoc->base.bind_addr;
-	addr_lock = &asoc->base.addr_lock;
 
 	if (dst) {
 		/* Walk through the bind address list and look for a bind
 		 * address that matches the source address of the returned dst.
 		 */
-		sctp_read_lock(addr_lock);
-		list_for_each(pos, &bp->address_list) {
-			laddr = list_entry(pos, struct sctp_sockaddr_entry,
-					   list);
-			if (!laddr->use_as_src)
+		rcu_read_lock();
+		list_for_each_entry_rcu(laddr, &bp->address_list, list) {
+			if (!laddr->valid || !laddr->use_as_src)
 				continue;
 			sctp_v4_dst_saddr(&dst_saddr, dst, htons(bp->port));
 			if (sctp_v4_cmp_addr(&dst_saddr, &laddr->a))
 				goto out_unlock;
 		}
-		sctp_read_unlock(addr_lock);
+		rcu_read_unlock();
 
 		/* None of the bound addresses match the source address of the
 		 * dst. So release it.
@@ -485,10 +480,10 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
 	/* Walk through the bind address list and try to get a dst that
 	 * matches a bind address as the source address.
 	 */
-	sctp_read_lock(addr_lock);
-	list_for_each(pos, &bp->address_list) {
-		laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
-
+	rcu_read_lock();
+	list_for_each_entry_rcu(laddr, &bp->address_list, list) {
+		if (!laddr->valid)
+			continue;
 		if ((laddr->use_as_src) &&
 		    (AF_INET == laddr->a.sa.sa_family)) {
 			fl.fl4_src = laddr->a.v4.sin_addr.s_addr;
@@ -500,7 +495,7 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
 	}
 
 out_unlock:
-	sctp_read_unlock(addr_lock);
+	rcu_read_unlock();
 out:
 	if (dst)
 		SCTP_DEBUG_PRINTK("rt_dst:%u.%u.%u.%u, rt_src:%u.%u.%u.%u\n",
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 79856c9..caaa29f 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -1531,7 +1531,7 @@ no_hmac:
 	/* Also, add the destination address. */
 	if (list_empty(&retval->base.bind_addr.address_list)) {
 		sctp_add_bind_addr(&retval->base.bind_addr, &chunk->dest, 1,
-				   GFP_ATOMIC);
+				GFP_ATOMIC);
 	}
 
 	retval->next_tsn = retval->c.initial_tsn;
@@ -2613,22 +2613,17 @@ static int sctp_asconf_param_success(struct sctp_association *asoc,
 
 	switch (asconf_param->param_hdr.type) {
 	case SCTP_PARAM_ADD_IP:
-		sctp_local_bh_disable();
-		sctp_write_lock(&asoc->base.addr_lock);
-		list_for_each(pos, &bp->address_list) {
-			saddr = list_entry(pos, struct sctp_sockaddr_entry, list);
+		rcu_read_lock_bh();
+		list_for_each_entry_rcu(saddr, &bp->address_list, list) {
+			if (!saddr->valid)
+				continue;
 			if (sctp_cmp_addr_exact(&saddr->a, &addr))
 				saddr->use_as_src = 1;
 		}
-		sctp_write_unlock(&asoc->base.addr_lock);
-		sctp_local_bh_enable();
+		rcu_read_unlock_bh();
 		break;
 	case SCTP_PARAM_DEL_IP:
-		sctp_local_bh_disable();
-		sctp_write_lock(&asoc->base.addr_lock);
 		retval = sctp_del_bind_addr(bp, &addr);
-		sctp_write_unlock(&asoc->base.addr_lock);
-		sctp_local_bh_enable();
 		list_for_each(pos, &asoc->peer.transport_addr_list) {
 			transport = list_entry(pos, struct sctp_transport,
 						 transports);
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index a3acf78..35cc30c 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -367,14 +367,10 @@ SCTP_STATIC int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
 	if (!bp->port)
 		bp->port = inet_sk(sk)->num;
 
-	/* Add the address to the bind address list.  */
-	sctp_local_bh_disable();
-	sctp_write_lock(&ep->base.addr_lock);
-
-	/* Use GFP_ATOMIC since BHs are disabled.  */
+	/* Add the address to the bind address list.
+	 * Use GFP_ATOMIC since BHs will be disabled.
+	 */
 	ret = sctp_add_bind_addr(bp, addr, 1, GFP_ATOMIC);
-	sctp_write_unlock(&ep->base.addr_lock);
-	sctp_local_bh_enable();
 
 	/* Copy back into socket for getsockname() use. */
 	if (!ret) {
@@ -497,7 +493,6 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
 	void				*addr_buf;
 	struct sctp_af			*af;
 	struct list_head		*pos;
-	struct list_head		*p;
 	int 				i;
 	int 				retval = 0;
 
@@ -544,14 +539,15 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
 		if (i < addrcnt)
 			continue;
 
-		/* Use the first address in bind addr list of association as
-		 * Address Parameter of ASCONF CHUNK.
+		/* Use the first valid address in bind addr list of
+		 * association as Address Parameter of ASCONF CHUNK.
 		 */
-		sctp_read_lock(&asoc->base.addr_lock);
 		bp = &asoc->base.bind_addr;
-		p = bp->address_list.next;
-		laddr = list_entry(p, struct sctp_sockaddr_entry, list);
-		sctp_read_unlock(&asoc->base.addr_lock);
+		rcu_read_lock();
+		list_for_each_entry_rcu(laddr, &bp->address_list, list)
+			if (laddr->valid)
+				break;
+		rcu_read_unlock();
 
 		chunk = sctp_make_asconf_update_ip(asoc, &laddr->a, addrs,
 						   addrcnt, SCTP_PARAM_ADD_IP);
@@ -567,8 +563,6 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
 		/* Add the new addresses to the bind address list with
 		 * use_as_src set to 0.
 		 */
-		sctp_local_bh_disable();
-		sctp_write_lock(&asoc->base.addr_lock);
 		addr_buf = addrs;
 		for (i = 0; i < addrcnt; i++) {
 			addr = (union sctp_addr *)addr_buf;
@@ -578,8 +572,6 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
 						    GFP_ATOMIC);
 			addr_buf += af->sockaddr_len;
 		}
-		sctp_write_unlock(&asoc->base.addr_lock);
-		sctp_local_bh_enable();
 	}
 
 out:
@@ -651,14 +643,8 @@ static int sctp_bindx_rem(struct sock *sk, struct sockaddr *addrs, int addrcnt)
 		 * socket routing and failover schemes. Refer to comments in
 		 * sctp_do_bind(). -daisy
 		 */
-		sctp_local_bh_disable();
-		sctp_write_lock(&ep->base.addr_lock);
-
 		retval = sctp_del_bind_addr(bp, sa_addr);
 
-		sctp_write_unlock(&ep->base.addr_lock);
-		sctp_local_bh_enable();
-
 		addr_buf += af->sockaddr_len;
 err_bindx_rem:
 		if (retval < 0) {
@@ -748,11 +734,9 @@ static int sctp_send_asconf_del_ip(struct sock		*sk,
 		 * make sure that we do not delete all the addresses in the
 		 * association.
 		 */
-		sctp_read_lock(&asoc->base.addr_lock);
 		bp = &asoc->base.bind_addr;
 		laddr = sctp_find_unmatch_addr(bp, (union sctp_addr *)addrs,
 					       addrcnt, sp);
-		sctp_read_unlock(&asoc->base.addr_lock);
 		if (!laddr)
 			continue;
 
@@ -766,23 +750,18 @@ static int sctp_send_asconf_del_ip(struct sock		*sk,
 		/* Reset use_as_src flag for the addresses in the bind address
 		 * list that are to be deleted.
 		 */
-		sctp_local_bh_disable();
-		sctp_write_lock(&asoc->base.addr_lock);
 		addr_buf = addrs;
 		for (i = 0; i < addrcnt; i++) {
 			laddr = (union sctp_addr *)addr_buf;
 			af = sctp_get_af_specific(laddr->v4.sin_family);
-			list_for_each(pos1, &bp->address_list) {
-				saddr = list_entry(pos1,
-						   struct sctp_sockaddr_entry,
-						   list);
+			rcu_read_lock();
+			list_for_each_entry_rcu(saddr, &bp->address_list, list) {
 				if (sctp_cmp_addr_exact(&saddr->a, laddr))
 					saddr->use_as_src = 0;
 			}
+			rcu_read_unlock();
 			addr_buf += af->sockaddr_len;
 		}
-		sctp_write_unlock(&asoc->base.addr_lock);
-		sctp_local_bh_enable();
 
 		/* Update the route and saddr entries for all the transports
 		 * as some of the addresses in the bind address list are
@@ -4057,11 +4036,9 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
 					       int __user *optlen)
 {
 	sctp_assoc_t id;
-	struct list_head *pos;
 	struct sctp_bind_addr *bp;
 	struct sctp_association *asoc;
 	struct sctp_sockaddr_entry *addr;
-	rwlock_t *addr_lock;
 	int cnt = 0;
 
 	if (len < sizeof(sctp_assoc_t))
@@ -4078,17 +4055,13 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
 	 */
 	if (0 == id) {
 		bp = &sctp_sk(sk)->ep->base.bind_addr;
-		addr_lock = &sctp_sk(sk)->ep->base.addr_lock;
 	} else {
 		asoc = sctp_id2assoc(sk, id);
 		if (!asoc)
 			return -EINVAL;
 		bp = &asoc->base.bind_addr;
-		addr_lock = &asoc->base.addr_lock;
 	}
 
-	sctp_read_lock(addr_lock);
-
 	/* If the endpoint is bound to 0.0.0.0 or ::0, count the valid
 	 * addresses from the global local address list.
 	 */
@@ -4115,12 +4088,15 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
 		goto done;
 	}
 
-	list_for_each(pos, &bp->address_list) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(addr, &bp->address_list, list) {
+		if (!addr->valid)
+			continue;
 		cnt ++;
 	}
+	rcu_read_unlock();
 
 done:
-	sctp_read_unlock(addr_lock);
 	return cnt;
 }
 
@@ -4204,7 +4180,6 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
 {
 	struct sctp_bind_addr *bp;
 	struct sctp_association *asoc;
-	struct list_head *pos;
 	int cnt = 0;
 	struct sctp_getaddrs_old getaddrs;
 	struct sctp_sockaddr_entry *addr;
@@ -4212,7 +4187,6 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
 	union sctp_addr temp;
 	struct sctp_sock *sp = sctp_sk(sk);
 	int addrlen;
-	rwlock_t *addr_lock;
 	int err = 0;
 	void *addrs;
 	void *buf;
@@ -4234,13 +4208,11 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
 	 */
 	if (0 == getaddrs.assoc_id) {
 		bp = &sctp_sk(sk)->ep->base.bind_addr;
-		addr_lock = &sctp_sk(sk)->ep->base.addr_lock;
 	} else {
 		asoc = sctp_id2assoc(sk, getaddrs.assoc_id);
 		if (!asoc)
 			return -EINVAL;
 		bp = &asoc->base.bind_addr;
-		addr_lock = &asoc->base.addr_lock;
 	}
 
 	to = getaddrs.addrs;
@@ -4254,8 +4226,6 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
 	if (!addrs)
 		return -ENOMEM;
 
-	sctp_read_lock(addr_lock);
-
 	/* If the endpoint is bound to 0.0.0.0 or ::0, get the valid
 	 * addresses from the global local address list.
 	 */
@@ -4271,8 +4241,10 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
 	}
 
 	buf = addrs;
-	list_for_each(pos, &bp->address_list) {
-		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+	rcu_read_lock();
+	list_for_each_entry_rcu(addr, &bp->address_list, list) {
+		if (!addr->valid)
+			continue;
 		memcpy(&temp, &addr->a, sizeof(temp));
 		sctp_get_pf_specific(sk->sk_family)->addr_v4map(sp, &temp);
 		addrlen = sctp_get_af_specific(temp.sa.sa_family)->sockaddr_len;
@@ -4282,10 +4254,9 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
 		cnt ++;
 		if (cnt >= getaddrs.addr_num) break;
 	}
+	rcu_read_unlock();
 
 copy_getaddrs:
-	sctp_read_unlock(addr_lock);
-
 	/* copy the entire address list into the user provided space */
 	if (copy_to_user(to, addrs, bytes_copied)) {
 		err = -EFAULT;
@@ -4307,7 +4278,6 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
 {
 	struct sctp_bind_addr *bp;
 	struct sctp_association *asoc;
-	struct list_head *pos;
 	int cnt = 0;
 	struct sctp_getaddrs getaddrs;
 	struct sctp_sockaddr_entry *addr;
@@ -4315,7 +4285,6 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
 	union sctp_addr temp;
 	struct sctp_sock *sp = sctp_sk(sk);
 	int addrlen;
-	rwlock_t *addr_lock;
 	int err = 0;
 	size_t space_left;
 	int bytes_copied = 0;
@@ -4336,13 +4305,11 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
 	 */
 	if (0 == getaddrs.assoc_id) {
 		bp = &sctp_sk(sk)->ep->base.bind_addr;
-		addr_lock = &sctp_sk(sk)->ep->base.addr_lock;
 	} else {
 		asoc = sctp_id2assoc(sk, getaddrs.assoc_id);
 		if (!asoc)
 			return -EINVAL;
 		bp = &asoc->base.bind_addr;
-		addr_lock = &asoc->base.addr_lock;
 	}
 
 	to = optval + offsetof(struct sctp_getaddrs,addrs);
@@ -4352,8 +4319,6 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
 	if (!addrs)
 		return -ENOMEM;
 
-	sctp_read_lock(addr_lock);
-
 	/* If the endpoint is bound to 0.0.0.0 or ::0, get the valid
 	 * addresses from the global local address list.
 	 */
@@ -4372,8 +4337,10 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
 	}
 
 	buf = addrs;
-	list_for_each(pos, &bp->address_list) {
-		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+	rcu_read_lock();
+	list_for_each_entry_rcu(addr, &bp->address_list, list) {
+		if (!addr->valid)
+			continue;
 		memcpy(&temp, &addr->a, sizeof(temp));
 		sctp_get_pf_specific(sk->sk_family)->addr_v4map(sp, &temp);
 		addrlen = sctp_get_af_specific(temp.sa.sa_family)->sockaddr_len;
@@ -4387,10 +4354,9 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
 		cnt ++;
 		space_left -= addrlen;
 	}
+	rcu_read_unlock();
 
 copy_getaddrs:
-	sctp_read_unlock(addr_lock);
-
 	if (copy_to_user(to, addrs, bytes_copied)) {
 		err = -EFAULT;
 		goto out;
@@ -4405,7 +4371,7 @@ copy_getaddrs:
 	goto out;
 
 error_lock:
-	sctp_read_unlock(addr_lock);
+	rcu_read_unlock();
 
 out:
 	kfree(addrs);
-- 
1.5.2.4



* Re: [RFC PATCH 1/2] SCTP: Add RCU synchronization around sctp_localaddr_list
  2007-09-10 19:46 ` [RFC PATCH 1/2] SCTP: Add RCU synchronization around sctp_localaddr_list Vlad Yasevich
@ 2007-09-10 21:47   ` Paul E. McKenney
  2007-09-11  7:24     ` Sridhar Samudrala
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2007-09-10 21:47 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, lksctp-developers

On Mon, Sep 10, 2007 at 03:46:29PM -0400, Vlad Yasevich wrote:
> sctp_localaddr_list is modified dynamically via NETDEV_UP
> and NETDEV_DOWN events, but there is no synchronization
> between the writer (the event handler) and the readers.  As a
> result, a reader can access an entry that has been freed and
> crash the system.

Good start, but a few questions interspersed below...

							Thanx, Paul

> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
> ---
>  include/net/sctp/sctp.h    |    1 +
>  include/net/sctp/structs.h |    2 +
>  net/sctp/bind_addr.c       |    2 +
>  net/sctp/ipv6.c            |   33 ++++++++++++++++++++--------
>  net/sctp/protocol.c        |   50 ++++++++++++++++++++++++++++++-------------
>  net/sctp/socket.c          |   38 ++++++++++++++++++++++-----------
>  6 files changed, 88 insertions(+), 38 deletions(-)
> 
> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
> index d529045..c9cc00c 100644
> --- a/include/net/sctp/sctp.h
> +++ b/include/net/sctp/sctp.h
> @@ -123,6 +123,7 @@
>   * sctp/protocol.c
>   */
>  extern struct sock *sctp_get_ctl_sock(void);
> +extern void sctp_local_addr_free(struct rcu_head *head);
>  extern int sctp_copy_local_addr_list(struct sctp_bind_addr *,
>  				     sctp_scope_t, gfp_t gfp,
>  				     int flags);
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index c0d5848..2591c49 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -737,8 +737,10 @@ const union sctp_addr *sctp_source(const struct sctp_chunk *chunk);
>  /* This is a structure for holding either an IPv6 or an IPv4 address.  */
>  struct sctp_sockaddr_entry {
>  	struct list_head list;
> +	struct rcu_head	rcu;
>  	union sctp_addr a;
>  	__u8 use_as_src;
> +	__u8 valid;
>  };
> 
>  typedef struct sctp_chunk *(sctp_packet_phandler_t)(struct sctp_association *);
> diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
> index fdb287a..7fc369f 100644
> --- a/net/sctp/bind_addr.c
> +++ b/net/sctp/bind_addr.c
> @@ -163,8 +163,10 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
>  		addr->a.v4.sin_port = htons(bp->port);
> 
>  	addr->use_as_src = use_as_src;
> +	addr->valid = 1;
> 
>  	INIT_LIST_HEAD(&addr->list);
> +	INIT_RCU_HEAD(&addr->rcu);
>  	list_add_tail(&addr->list, &bp->address_list);
>  	SCTP_DBG_OBJCNT_INC(addr);
> 
> diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
> index f8aa23d..fc2e4e2 100644
> --- a/net/sctp/ipv6.c
> +++ b/net/sctp/ipv6.c
> @@ -77,13 +77,18 @@
> 
>  #include <asm/uaccess.h>
> 
> -/* Event handler for inet6 address addition/deletion events.  */
> +/* Event handler for inet6 address addition/deletion events.
> + * This event handler is part of the atomic notifier call chain
> + * and thus runs atomically and can NOT sleep.  As a result
> + * we can't and really don't need to add any locks to guard the
> + * RCU.
> + */
>  static int sctp_inet6addr_event(struct notifier_block *this, unsigned long ev,
>  				void *ptr)
>  {
>  	struct inet6_ifaddr *ifa = (struct inet6_ifaddr *)ptr;
> -	struct sctp_sockaddr_entry *addr;
> -	struct list_head *pos, *temp;
> +	struct sctp_sockaddr_entry *addr = NULL;
> +	struct sctp_sockaddr_entry *temp;
> 
>  	switch (ev) {
>  	case NETDEV_UP:
> @@ -94,19 +99,26 @@ static int sctp_inet6addr_event(struct notifier_block *this, unsigned long ev,
>  			memcpy(&addr->a.v6.sin6_addr, &ifa->addr,
>  				 sizeof(struct in6_addr));
>  			addr->a.v6.sin6_scope_id = ifa->idev->dev->ifindex;
> -			list_add_tail(&addr->list, &sctp_local_addr_list);
> +			addr->valid = 1;
> +			rcu_read_lock();
> +			list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
> +			rcu_read_unlock();

If we are under the protection of the update-side mutex, the rcu_read_lock()
and rcu_read_unlock() are (harmlessly) redundant.  If we are not under
the protection of some mutex, what prevents concurrent list_add_tail_rcu()
calls from messing up the sctp_sockaddr_entry list?
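
(For illustration, the usual convention when there is no higher-level
mutex is a small writer-side lock around the update -- a sketch, with a
hypothetical lock name:

	spin_lock_bh(&sctp_local_addr_lock);	/* hypothetical */
	list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
	spin_unlock_bh(&sctp_local_addr_lock);

This serializes writers without adding any cost to the RCU readers.)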

>  		}
>  		break;
>  	case NETDEV_DOWN:
> -		list_for_each_safe(pos, temp, &sctp_local_addr_list) {
> -			addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> -			if (ipv6_addr_equal(&addr->a.v6.sin6_addr, &ifa->addr)) {
> -				list_del(pos);
> -				kfree(addr);
> +		rcu_read_lock();
> +		list_for_each_entry_safe(addr, temp,
> +					&sctp_local_addr_list, list) {
> +			if (ipv6_addr_equal(&addr->a.v6.sin6_addr,
> +					     &ifa->addr)) {
> +				addr->valid = 0;
> +				list_del_rcu(&addr->list);
>  				break;
>  			}
>  		}
> -
> +		rcu_read_unlock();
> +		if (addr && !addr->valid)
> +			call_rcu(&addr->rcu, sctp_local_addr_free);

Are we under the protection of the update-side lock here?  If not,
what prevents two different tasks from executing this in parallel,
potentially tangling both the sctp_sockaddr_entry list and
the internal RCU lists?  (It is forbidden to call_rcu() a given element
twice concurrently.)

If we are in fact under the protection of the update-side lock, the
rcu_read_lock() and rcu_read_unlock() pairs are redundant (though this
is harmless, aside from the (small) potential for confusion).

>  		break;
>  	}
> 
> @@ -368,6 +380,7 @@ static void sctp_v6_copy_addrlist(struct list_head *addrlist,
>  			addr->a.v6.sin6_addr = ifp->addr;
>  			addr->a.v6.sin6_scope_id = dev->ifindex;
>  			INIT_LIST_HEAD(&addr->list);
> +			INIT_RCU_HEAD(&addr->rcu);
>  			list_add_tail(&addr->list, addrlist);
>  		}
>  	}
> diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
> index e98579b..ac52f9e 100644
> --- a/net/sctp/protocol.c
> +++ b/net/sctp/protocol.c
> @@ -153,6 +153,7 @@ static void sctp_v4_copy_addrlist(struct list_head *addrlist,
>  			addr->a.v4.sin_family = AF_INET;
>  			addr->a.v4.sin_port = 0;
>  			addr->a.v4.sin_addr.s_addr = ifa->ifa_local;
> +			INIT_RCU_HEAD(&addr->rcu);
>  			list_add_tail(&addr->list, addrlist);
>  		}
>  	}
> @@ -192,16 +193,24 @@ static void sctp_free_local_addr_list(void)
>  	}
>  }
> 
> +void sctp_local_addr_free(struct rcu_head *head)
> +{
> +	struct sctp_sockaddr_entry *e = container_of(head,
> +				struct sctp_sockaddr_entry, rcu);
> +	kfree(e);
> +}
> +
>  /* Copy the local addresses which are valid for 'scope' into 'bp'.  */
>  int sctp_copy_local_addr_list(struct sctp_bind_addr *bp, sctp_scope_t scope,
>  			      gfp_t gfp, int copy_flags)
>  {
>  	struct sctp_sockaddr_entry *addr;
>  	int error = 0;
> -	struct list_head *pos, *temp;
> 
> -	list_for_each_safe(pos, temp, &sctp_local_addr_list) {
> -		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
> +		if (!addr->valid)
> +			continue;

What happens if the update-side code removes the element from the list
and marks it !->valid right here?

If this turns out to be harmless, why not just dispense with the ->valid
flag entirely?
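
To illustrate the window in question (a sketch of the reader side):

	rcu_read_lock();
	list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
		if (!addr->valid)
			continue;
		/* <-- an updater can list_del_rcu() this entry and
		 *     clear ->valid right here; the code below still
		 *     dereferences addr, relying on the grace period
		 *     rather than on the flag for safety. */
		...
	}
	rcu_read_unlock();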

>  		if (sctp_in_scope(&addr->a, scope)) {
>  			/* Now that the address is in scope, check to see if
>  			 * the address type is really supported by the local
> @@ -221,6 +230,7 @@ int sctp_copy_local_addr_list(struct sctp_bind_addr *bp, sctp_scope_t scope,
>  	}
> 
>  end_copy:
> +	rcu_read_unlock();
>  	return error;
>  }
> 
> @@ -600,13 +610,17 @@ static void sctp_v4_seq_dump_addr(struct seq_file *seq, union sctp_addr *addr)
>  	seq_printf(seq, "%d.%d.%d.%d ", NIPQUAD(addr->v4.sin_addr));
>  }
> 
> -/* Event handler for inet address addition/deletion events.  */
> +/* Event handler for inet address addition/deletion events.
> + * This is part of the blocking notifier call chain that is
> + * guarded by a mutex.  As a result we don't need to add
> + * any additional guards for the RCU.
> + */
>  static int sctp_inetaddr_event(struct notifier_block *this, unsigned long ev,
>  			       void *ptr)
>  {
>  	struct in_ifaddr *ifa = (struct in_ifaddr *)ptr;
> -	struct sctp_sockaddr_entry *addr;
> -	struct list_head *pos, *temp;
> +	struct sctp_sockaddr_entry *addr = NULL;
> +	struct sctp_sockaddr_entry *temp;
> 
>  	switch (ev) {
>  	case NETDEV_UP:
> @@ -615,19 +629,25 @@ static int sctp_inetaddr_event(struct notifier_block *this, unsigned long ev,
>  			addr->a.v4.sin_family = AF_INET;
>  			addr->a.v4.sin_port = 0;
>  			addr->a.v4.sin_addr.s_addr = ifa->ifa_local;
> -			list_add_tail(&addr->list, &sctp_local_addr_list);
> +			addr->valid = 1;
> +			rcu_read_lock();
> +			list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
> +			rcu_read_unlock();

Based on the additions to the header comment, I am assuming that we
hold an update-side mutex.  This means that the rcu_read_lock() and
rcu_read_unlock() are (harmlessly) redundant.

>  		}
>  		break;
>  	case NETDEV_DOWN:
> -		list_for_each_safe(pos, temp, &sctp_local_addr_list) {
> -			addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> +		rcu_read_lock();
> +		list_for_each_entry_safe(addr, temp,
> +					&sctp_local_addr_list, list) {
>  			if (addr->a.v4.sin_addr.s_addr == ifa->ifa_local) {
> -				list_del(pos);
> -				kfree(addr);
> +				addr->valid = 0;
> +				list_del_rcu(&addr->list);
>  				break;
>  			}
>  		}
> -
> +		rcu_read_unlock();

Ditto.

> +		if (addr && !addr->valid)
> +			call_rcu(&addr->rcu, sctp_local_addr_free);

This one is OK, since we hold the update-side mutex.

>  		break;
>  	}
> 
> @@ -1227,6 +1247,9 @@ SCTP_STATIC __exit void sctp_exit(void)
>  	sctp_v6_del_protocol();
>  	inet_del_protocol(&sctp_protocol, IPPROTO_SCTP);
> 
> +	/* Unregister notifier for inet address additions/deletions. */
> +	unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
> +
>  	/* Free the local address list.  */
>  	sctp_free_local_addr_list();
> 
> @@ -1240,9 +1263,6 @@ SCTP_STATIC __exit void sctp_exit(void)
>  	inet_unregister_protosw(&sctp_stream_protosw);
>  	inet_unregister_protosw(&sctp_seqpacket_protosw);
> 
> -	/* Unregister notifier for inet address additions/deletions. */
> -	unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
> -
>  	sctp_sysctl_unregister();
>  	list_del(&sctp_ipv4_specific.list);
> 
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 3335460..a3acf78 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -4057,9 +4057,9 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
>  					       int __user *optlen)
>  {
>  	sctp_assoc_t id;
> +	struct list_head *pos;
>  	struct sctp_bind_addr *bp;
>  	struct sctp_association *asoc;
> -	struct list_head *pos, *temp;
>  	struct sctp_sockaddr_entry *addr;
>  	rwlock_t *addr_lock;
>  	int cnt = 0;
> @@ -4096,15 +4096,19 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
>  		addr = list_entry(bp->address_list.next,
>  				  struct sctp_sockaddr_entry, list);
>  		if (sctp_is_any(&addr->a)) {
> -			list_for_each_safe(pos, temp, &sctp_local_addr_list) {
> -				addr = list_entry(pos,
> -						  struct sctp_sockaddr_entry,
> -						  list);
> +			rcu_read_lock();
> +			list_for_each_entry_rcu(addr,
> +						&sctp_local_addr_list, list) {
> +				if (!addr->valid)
> +					continue;
> +

Again, what happens if the element is deleted just at this point?
If harmless, might be good to get rid of ->valid.

>  				if ((PF_INET == sk->sk_family) &&
>  				    (AF_INET6 == addr->a.sa.sa_family))
>  					continue;
> +
>  				cnt++;
>  			}
> +			rcu_read_unlock();

We are just counting these things, right?  If on the other hand we are
keeping a reference outside of rcu_read_lock() protection, then there
needs to be some explicit mechanism preventing the corresponding entry
from being freed.

>  		} else {
>  			cnt = 1;
>  		}
> @@ -4127,14 +4131,16 @@ static int sctp_copy_laddrs_old(struct sock *sk, __u16 port,
>  					int max_addrs, void *to,
>  					int *bytes_copied)
>  {
> -	struct list_head *pos, *next;
>  	struct sctp_sockaddr_entry *addr;
>  	union sctp_addr temp;
>  	int cnt = 0;
>  	int addrlen;
> 
> -	list_for_each_safe(pos, next, &sctp_local_addr_list) {
> -		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
> +		if (!addr->valid)
> +			continue;
> +

Same question as before -- what happens if the element is deleted by some
other CPU (thus clearing ->valid) just at this point?

>  		if ((PF_INET == sk->sk_family) &&
>  		    (AF_INET6 == addr->a.sa.sa_family))
>  			continue;
> @@ -4149,6 +4155,7 @@ static int sctp_copy_laddrs_old(struct sock *sk, __u16 port,
>  		cnt ++;
>  		if (cnt >= max_addrs) break;
>  	}
> +	rcu_read_unlock();
> 
>  	return cnt;
>  }
> @@ -4156,14 +4163,16 @@ static int sctp_copy_laddrs_old(struct sock *sk, __u16 port,
>  static int sctp_copy_laddrs(struct sock *sk, __u16 port, void *to,
>  			    size_t space_left, int *bytes_copied)
>  {
> -	struct list_head *pos, *next;
>  	struct sctp_sockaddr_entry *addr;
>  	union sctp_addr temp;
>  	int cnt = 0;
>  	int addrlen;
> 
> -	list_for_each_safe(pos, next, &sctp_local_addr_list) {
> -		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
> +		if (!addr->valid)
> +			continue;
> +

And the same question here as well...

>  		if ((PF_INET == sk->sk_family) &&
>  		    (AF_INET6 == addr->a.sa.sa_family))
>  			continue;
> @@ -4171,8 +4180,10 @@ static int sctp_copy_laddrs(struct sock *sk, __u16 port, void *to,
>  		sctp_get_pf_specific(sk->sk_family)->addr_v4map(sctp_sk(sk),
>  								&temp);
>  		addrlen = sctp_get_af_specific(temp.sa.sa_family)->sockaddr_len;
> -		if (space_left < addrlen)
> -			return -ENOMEM;
> +		if (space_left < addrlen) {
> +			cnt =  -ENOMEM;
> +			break;
> +		}
>  		memcpy(to, &temp, addrlen);
> 
>  		to += addrlen;
> @@ -4180,6 +4191,7 @@ static int sctp_copy_laddrs(struct sock *sk, __u16 port, void *to,
>  		space_left -= addrlen;
>  		*bytes_copied += addrlen;
>  	}
> +	rcu_read_unlock();
> 
>  	return cnt;
>  }
> -- 
> 1.5.2.4
> 


* Re: [RFC PATCH 2/2] SCTP: Convert bind_addr_list locking to RCU
  2007-09-10 19:46 ` [RFC PATCH 2/2] SCTP: Convert bind_addr_list locking to RCU Vlad Yasevich
@ 2007-09-10 22:08   ` Paul E. McKenney
  2007-09-11 14:56     ` Vlad Yasevich
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2007-09-10 22:08 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, lksctp-developers

On Mon, Sep 10, 2007 at 03:46:30PM -0400, Vlad Yasevich wrote:
> Since the sctp_sockaddr_entry is now RCU-enabled as part of
> the patch to synchronize sctp_localaddr_list, it makes sense to
> change all handling of these entries to RCU.  This includes the
> sctp_bind_addrs structure and its list of bound addresses.
> 
> This list is currently protected by an external rw_lock, and that
> looks like overkill.  There are only two writers to the list:
> bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
> These are already serialized via the socket lock, so they will
> not step on each other.  They are also relatively rare, so we
> should be good with RCU.
> 
> The readers are varied, and they are easily converted to RCU.

Again, good start -- similar questions as for the other patch in this
series.  Also a few places where it looks like you are letting a pointer
to an RCU-protected data structure slip out of rcu_read_lock() protection,
and a case of mixing rcu_read_lock() and rcu_read_lock_bh() within the
same RCU-protected data structure.
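
For the mixed-flavor case, the concern is an interleaving like the
following (a sketch; free_fn stands in for the call_rcu_bh() callback,
and an rcu_bh grace period waits only for rcu_read_lock_bh() readers):

	/* reader */			/* updater */
	rcu_read_lock();
					list_del_rcu(&addr->list);
					call_rcu_bh(&addr->rcu, free_fn);
					/* the rcu_bh grace period can
					 * end here, so free_fn() may
					 * already have run... */
	use(addr);	/* ...making this a use-after-free */
	rcu_read_unlock();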

						Thanx, Paul

> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
> ---
>  include/net/sctp/structs.h |    3 -
>  net/sctp/associola.c       |   14 +------
>  net/sctp/bind_addr.c       |   59 ++++++++++++++++++----------
>  net/sctp/endpointola.c     |   26 ++++---------
>  net/sctp/ipv6.c            |   12 ++---
>  net/sctp/protocol.c        |   25 +++++-------
>  net/sctp/sm_make_chunk.c   |   17 +++-----
>  net/sctp/socket.c          |   92 ++++++++++++++------------------------------
>  8 files changed, 97 insertions(+), 151 deletions(-)
> 
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index 2591c49..1d46f7d 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -1222,9 +1222,6 @@ struct sctp_ep_common {
>  	 * bind_addr.address_list is our set of local IP addresses.
>  	 */
>  	struct sctp_bind_addr bind_addr;
> -
> -	/* Protection during address list comparisons. */
> -	rwlock_t   addr_lock;
>  };
> 
> 
> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> index 2ad1caf..9bad8ba 100644
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -99,7 +99,6 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a
> 
>  	/* Initialize the bind addr area.  */
>  	sctp_bind_addr_init(&asoc->base.bind_addr, ep->base.bind_addr.port);
> -	rwlock_init(&asoc->base.addr_lock);
> 
>  	asoc->state = SCTP_STATE_CLOSED;
> 
> @@ -937,8 +936,6 @@ struct sctp_transport *sctp_assoc_is_match(struct sctp_association *asoc,
>  {
>  	struct sctp_transport *transport;
> 
> -	sctp_read_lock(&asoc->base.addr_lock);
> -
>  	if ((htons(asoc->base.bind_addr.port) == laddr->v4.sin_port) &&
>  	    (htons(asoc->peer.port) == paddr->v4.sin_port)) {
>  		transport = sctp_assoc_lookup_paddr(asoc, paddr);
> @@ -952,7 +949,6 @@ struct sctp_transport *sctp_assoc_is_match(struct sctp_association *asoc,
>  	transport = NULL;
> 
>  out:
> -	sctp_read_unlock(&asoc->base.addr_lock);
>  	return transport;
>  }
> 
> @@ -1376,19 +1372,13 @@ int sctp_assoc_set_bind_addr_from_cookie(struct sctp_association *asoc,
>  int sctp_assoc_lookup_laddr(struct sctp_association *asoc,
>  			    const union sctp_addr *laddr)
>  {
> -	int found;
> +	int found = 0;
> 
> -	sctp_read_lock(&asoc->base.addr_lock);
>  	if ((asoc->base.bind_addr.port == ntohs(laddr->v4.sin_port)) &&
>  	    sctp_bind_addr_match(&asoc->base.bind_addr, laddr,
> -				 sctp_sk(asoc->base.sk))) {
> +				 sctp_sk(asoc->base.sk)))
>  		found = 1;
> -		goto out;
> -	}
> 
> -	found = 0;
> -out:
> -	sctp_read_unlock(&asoc->base.addr_lock);
>  	return found;
>  }
> 
> diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
> index 7fc369f..9c7db1f 100644
> --- a/net/sctp/bind_addr.c
> +++ b/net/sctp/bind_addr.c
> @@ -167,7 +167,10 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
> 
>  	INIT_LIST_HEAD(&addr->list);
>  	INIT_RCU_HEAD(&addr->rcu);
> -	list_add_tail(&addr->list, &bp->address_list);
> +
> +	rcu_read_lock();
> +	list_add_tail_rcu(&addr->list, &bp->address_list);
> +	rcu_read_unlock();

Given the original code, we presumably hold the update-side lock.  If so,
the rcu_read_lock() and rcu_read_unlock() are (harmlessly) redundant.
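For illustration, the updater under its lock can simply be (a sketch
only; the function name is made up, and the caller is assumed to hold
the socket lock):

static void sctp_bind_addr_publish(struct sctp_bind_addr *bp,
                                   struct sctp_sockaddr_entry *addr)
{
        /* The update-side lock (here the socket lock) serializes
         * writers against each other; RCU protects the readers, so
         * no rcu_read_lock() is needed around the update itself.
         */
        list_add_tail_rcu(&addr->list, &bp->address_list);
}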

>  	SCTP_DBG_OBJCNT_INC(addr);
> 
>  	return 0;
> @@ -178,20 +181,23 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
>   */
>  int sctp_del_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *del_addr)
>  {
> -	struct list_head *pos, *temp;
> -	struct sctp_sockaddr_entry *addr;
> +	struct sctp_sockaddr_entry *addr, *temp;
> 
> -	list_for_each_safe(pos, temp, &bp->address_list) {
> -		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> +	rcu_read_lock_bh();
> +	list_for_each_entry_safe(addr, temp, &bp->address_list, list) {
>  		if (sctp_cmp_addr_exact(&addr->a, del_addr)) {
>  			/* Found the exact match. */
> -			list_del(pos);
> -			kfree(addr);
> -			SCTP_DBG_OBJCNT_DEC(addr);
> -
> -			return 0;
> +			addr->valid = 0;
> +			list_del_rcu(&addr->list);
> +			break;
>  		}
>  	}
> +	rcu_read_unlock_bh();

Ditto.

> +
> +	if (addr && !addr->valid) {
> +		call_rcu_bh(&addr->rcu, sctp_local_addr_free);
> +		SCTP_DBG_OBJCNT_DEC(addr);
> +	}
> 
>  	return -EINVAL;
>  }
> @@ -302,15 +308,20 @@ int sctp_bind_addr_match(struct sctp_bind_addr *bp,
>  			 struct sctp_sock *opt)
>  {
>  	struct sctp_sockaddr_entry *laddr;
> -	struct list_head *pos;
> -
> -	list_for_each(pos, &bp->address_list) {
> -		laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
> -		if (opt->pf->cmp_addr(&laddr->a, addr, opt))
> -			return 1;
> +	int match = 0;
> +
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(laddr, &bp->address_list, list) {
> +		if (!laddr->valid)
> +			continue;

As before, what happens if the entry is deleted by some other CPU at
this point, and thus ->valid is cleared?  If harmless, why bother with
->valid?

> +		if (opt->pf->cmp_addr(&laddr->a, addr, opt)) {
> +			match = 1;
> +			break;
> +		}
>  	}
> +	rcu_read_unlock();
> 
> -	return 0;
> +	return match;
>  }
> 
>  /* Find the first address in the bind address list that is not present in
> @@ -325,27 +336,31 @@ union sctp_addr *sctp_find_unmatch_addr(struct sctp_bind_addr	*bp,
>  	union sctp_addr			*addr;
>  	void 				*addr_buf;
>  	struct sctp_af			*af;
> -	struct list_head		*pos;
>  	int				i;
> 
> -	list_for_each(pos, &bp->address_list) {
> -		laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(laddr, &bp->address_list, list) {
> +		if (!laddr->valid)
> +			continue;

Ditto...

> 
>  		addr_buf = (union sctp_addr *)addrs;
>  		for (i = 0; i < addrcnt; i++) {
>  			addr = (union sctp_addr *)addr_buf;
>  			af = sctp_get_af_specific(addr->v4.sin_family);
>  			if (!af)
> -				return NULL;
> +				break;
> 
>  			if (opt->pf->cmp_addr(&laddr->a, addr, opt))
>  				break;
> 
>  			addr_buf += af->sockaddr_len;
>  		}
> -		if (i == addrcnt)
> +		if (i == addrcnt) {
> +			rcu_read_unlock();

Since rcu_read_unlock() just happened, some other CPU is free to
free up this data structure.  In a CONFIG_PREEMPT kernel (as well as a
CONFIG_PREEMPT_RT kernel, for that matter), this task might be preempted
at this point, and a full grace period might elapse.

In which case, the following statement returns a pointer to the freelist,
which is not good.
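One way to avoid that -- a sketch only, with addr_is_unmatched() as a
made-up stand-in for the existing addrs/addrcnt scan -- is to return
the address by copy while still inside the critical section:

/* Stand-in for the inner comparison loop above. */
static int addr_is_unmatched(struct sctp_sockaddr_entry *laddr)
{
        return 1;
}

static int sctp_find_unmatch_addr_copy(struct sctp_bind_addr *bp,
                                       union sctp_addr *result)
{
        struct sctp_sockaddr_entry *laddr;
        int found = 0;

        rcu_read_lock();
        list_for_each_entry_rcu(laddr, &bp->address_list, list) {
                if (!laddr->valid || !addr_is_unmatched(laddr))
                        continue;
                /* Copy before rcu_read_unlock(), so no RCU-protected
                 * pointer ever escapes the critical section.
                 */
                memcpy(result, &laddr->a, sizeof(*result));
                found = 1;
                break;
        }
        rcu_read_unlock();
        return found;
}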

>  			return &laddr->a;
> +		}
>  	}
> +	rcu_read_unlock();
> 
>  	return NULL;
>  }
> diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
> index 1404a9e..fa10af5 100644
> --- a/net/sctp/endpointola.c
> +++ b/net/sctp/endpointola.c
> @@ -92,7 +92,6 @@ static struct sctp_endpoint *sctp_endpoint_init(struct sctp_endpoint *ep,
> 
>  	/* Initialize the bind addr area */
>  	sctp_bind_addr_init(&ep->base.bind_addr, 0);
> -	rwlock_init(&ep->base.addr_lock);
> 
>  	/* Remember who we are attached to.  */
>  	ep->base.sk = sk;
> @@ -225,21 +224,14 @@ void sctp_endpoint_put(struct sctp_endpoint *ep)
>  struct sctp_endpoint *sctp_endpoint_is_match(struct sctp_endpoint *ep,
>  					       const union sctp_addr *laddr)
>  {
> -	struct sctp_endpoint *retval;
> +	struct sctp_endpoint *retval = NULL;
> 
> -	sctp_read_lock(&ep->base.addr_lock);
>  	if (htons(ep->base.bind_addr.port) == laddr->v4.sin_port) {
>  		if (sctp_bind_addr_match(&ep->base.bind_addr, laddr,
> -					 sctp_sk(ep->base.sk))) {
> +					 sctp_sk(ep->base.sk)))
>  			retval = ep;
> -			goto out;
> -		}
>  	}
> 
> -	retval = NULL;
> -
> -out:
> -	sctp_read_unlock(&ep->base.addr_lock);
>  	return retval;
>  }
> 
> @@ -261,9 +253,7 @@ static struct sctp_association *__sctp_endpoint_lookup_assoc(
>  	list_for_each(pos, &ep->asocs) {
>  		asoc = list_entry(pos, struct sctp_association, asocs);
>  		if (rport == asoc->peer.port) {
> -			sctp_read_lock(&asoc->base.addr_lock);
>  			*transport = sctp_assoc_lookup_paddr(asoc, paddr);
> -			sctp_read_unlock(&asoc->base.addr_lock);
> 
>  			if (*transport)
>  				return asoc;
> @@ -295,20 +285,20 @@ struct sctp_association *sctp_endpoint_lookup_assoc(
>  int sctp_endpoint_is_peeled_off(struct sctp_endpoint *ep,
>  				const union sctp_addr *paddr)
>  {
> -	struct list_head *pos;
>  	struct sctp_sockaddr_entry *addr;
>  	struct sctp_bind_addr *bp;
> 
> -	sctp_read_lock(&ep->base.addr_lock);
>  	bp = &ep->base.bind_addr;
> -	list_for_each(pos, &bp->address_list) {
> -		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(addr, &bp->address_list, list) {
> +		if (!addr->valid)
> +			continue;

And ditto again...

>  		if (sctp_has_association(&addr->a, paddr)) {
> -			sctp_read_unlock(&ep->base.addr_lock);
> +			rcu_read_unlock();
>  			return 1;
>  		}
>  	}
> -	sctp_read_unlock(&ep->base.addr_lock);
> +	rcu_read_unlock();
> 
>  	return 0;
>  }
> diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
> index fc2e4e2..4f6dc55 100644
> --- a/net/sctp/ipv6.c
> +++ b/net/sctp/ipv6.c
> @@ -302,9 +302,7 @@ static void sctp_v6_get_saddr(struct sctp_association *asoc,
>  			      union sctp_addr *saddr)
>  {
>  	struct sctp_bind_addr *bp;
> -	rwlock_t *addr_lock;
>  	struct sctp_sockaddr_entry *laddr;
> -	struct list_head *pos;
>  	sctp_scope_t scope;
>  	union sctp_addr *baddr = NULL;
>  	__u8 matchlen = 0;
> @@ -324,14 +322,14 @@ static void sctp_v6_get_saddr(struct sctp_association *asoc,
>  	scope = sctp_scope(daddr);
> 
>  	bp = &asoc->base.bind_addr;
> -	addr_lock = &asoc->base.addr_lock;
> 
>  	/* Go through the bind address list and find the best source address
>  	 * that matches the scope of the destination address.
>  	 */
> -	sctp_read_lock(addr_lock);
> -	list_for_each(pos, &bp->address_list) {
> -		laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(laddr, &bp->address_list, list) {
> +		if (!laddr->valid)
> +			continue;

Ditto yet again...

>  		if ((laddr->use_as_src) &&
>  		    (laddr->a.sa.sa_family == AF_INET6) &&
>  		    (scope <= sctp_scope(&laddr->a))) {
> @@ -353,7 +351,7 @@ static void sctp_v6_get_saddr(struct sctp_association *asoc,
>  		       __FUNCTION__, asoc, NIP6(daddr->v6.sin6_addr));
>  	}
> 
> -	sctp_read_unlock(addr_lock);
> +	rcu_read_unlock();
>  }
> 
>  /* Make a copy of all potential local addresses. */
> diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
> index ac52f9e..a1030ed 100644
> --- a/net/sctp/protocol.c
> +++ b/net/sctp/protocol.c
> @@ -222,7 +222,7 @@ int sctp_copy_local_addr_list(struct sctp_bind_addr *bp, sctp_scope_t scope,
>  			      (copy_flags & SCTP_ADDR6_ALLOWED) &&
>  			      (copy_flags & SCTP_ADDR6_PEERSUPP)))) {
>  				error = sctp_add_bind_addr(bp, &addr->a, 1,
> -							   GFP_ATOMIC);
> +						    GFP_ATOMIC);
>  				if (error)
>  					goto end_copy;
>  			}
> @@ -426,9 +426,7 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
>  	struct rtable *rt;
>  	struct flowi fl;
>  	struct sctp_bind_addr *bp;
> -	rwlock_t *addr_lock;
>  	struct sctp_sockaddr_entry *laddr;
> -	struct list_head *pos;
>  	struct dst_entry *dst = NULL;
>  	union sctp_addr dst_saddr;
> 
> @@ -457,23 +455,20 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
>  		goto out;
> 
>  	bp = &asoc->base.bind_addr;
> -	addr_lock = &asoc->base.addr_lock;
> 
>  	if (dst) {
>  		/* Walk through the bind address list and look for a bind
>  		 * address that matches the source address of the returned dst.
>  		 */
> -		sctp_read_lock(addr_lock);
> -		list_for_each(pos, &bp->address_list) {
> -			laddr = list_entry(pos, struct sctp_sockaddr_entry,
> -					   list);
> -			if (!laddr->use_as_src)
> +		rcu_read_lock();
> +		list_for_each_entry_rcu(laddr, &bp->address_list, list) {
> +			if (!laddr->valid || !laddr->use_as_src)
>  				continue;

And here as well...

>  			sctp_v4_dst_saddr(&dst_saddr, dst, htons(bp->port));
>  			if (sctp_v4_cmp_addr(&dst_saddr, &laddr->a))
>  				goto out_unlock;
>  		}
> -		sctp_read_unlock(addr_lock);
> +		rcu_read_unlock();
> 
>  		/* None of the bound addresses match the source address of the
>  		 * dst. So release it.
> @@ -485,10 +480,10 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
>  	/* Walk through the bind address list and try to get a dst that
>  	 * matches a bind address as the source address.
>  	 */
> -	sctp_read_lock(addr_lock);
> -	list_for_each(pos, &bp->address_list) {
> -		laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
> -
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(laddr, &bp->address_list, list) {
> +		if (!laddr->valid)
> +			continue;

OK, this is the last one I am flagging, you can find the others.  ;-)

>  		if ((laddr->use_as_src) &&
>  		    (AF_INET == laddr->a.sa.sa_family)) {
>  			fl.fl4_src = laddr->a.v4.sin_addr.s_addr;
> @@ -500,7 +495,7 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
>  	}
> 
>  out_unlock:
> -	sctp_read_unlock(addr_lock);
> +	rcu_read_unlock();
>  out:
>  	if (dst)
>  		SCTP_DEBUG_PRINTK("rt_dst:%u.%u.%u.%u, rt_src:%u.%u.%u.%u\n",
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index 79856c9..caaa29f 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -1531,7 +1531,7 @@ no_hmac:
>  	/* Also, add the destination address. */
>  	if (list_empty(&retval->base.bind_addr.address_list)) {
>  		sctp_add_bind_addr(&retval->base.bind_addr, &chunk->dest, 1,
> -				   GFP_ATOMIC);
> +				GFP_ATOMIC);
>  	}
> 
>  	retval->next_tsn = retval->c.initial_tsn;
> @@ -2613,22 +2613,17 @@ static int sctp_asconf_param_success(struct sctp_association *asoc,
> 
>  	switch (asconf_param->param_hdr.type) {
>  	case SCTP_PARAM_ADD_IP:
> -		sctp_local_bh_disable();
> -		sctp_write_lock(&asoc->base.addr_lock);
> -		list_for_each(pos, &bp->address_list) {
> -			saddr = list_entry(pos, struct sctp_sockaddr_entry, list);
> +		rcu_read_lock_bh();
> +		list_for_each_entry_rcu(saddr, &bp->address_list, list) {
> +			if (!saddr->valid)
> +				continue;
>  			if (sctp_cmp_addr_exact(&saddr->a, &addr))
>  				saddr->use_as_src = 1;
>  		}
> -		sctp_write_unlock(&asoc->base.addr_lock);
> -		sctp_local_bh_enable();
> +		rcu_read_unlock_bh();

If you use rcu_read_lock_bh() and rcu_read_unlock_bh() in one read path
for a given data structure, you need to use them in all the other read
paths for that data structure.  In addition, you must use call_rcu_bh()
when deleting the corresponding data elements.

The normal and the _bh RCU grace periods are unrelated, so mixing them
for a given RCU-protected data structure is a bad idea.  (Or are these
somehow two independent data structures?)
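In other words, if the _bh flavor is the one chosen for this list, a
consistent sketch would look like (helper names made up):

/* Every reader of bp->address_list uses the _bh primitives... */
static void addr_list_mark_src_bh(struct sctp_bind_addr *bp,
                                  union sctp_addr *addr)
{
        struct sctp_sockaddr_entry *saddr;

        rcu_read_lock_bh();
        list_for_each_entry_rcu(saddr, &bp->address_list, list)
                if (sctp_cmp_addr_exact(&saddr->a, addr))
                        saddr->use_as_src = 1;
        rcu_read_unlock_bh();
}

/* ...and every deleter pairs list_del_rcu() with call_rcu_bh(),
 * never plain call_rcu(), so removal waits for the same _bh grace
 * period the readers rely on.  Update-side lock held by the caller.
 */
static void addr_list_del_bh(struct sctp_sockaddr_entry *addr)
{
        addr->valid = 0;
        list_del_rcu(&addr->list);
        call_rcu_bh(&addr->rcu, sctp_local_addr_free);
}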

>  		break;
>  	case SCTP_PARAM_DEL_IP:
> -		sctp_local_bh_disable();
> -		sctp_write_lock(&asoc->base.addr_lock);
>  		retval = sctp_del_bind_addr(bp, &addr);
> -		sctp_write_unlock(&asoc->base.addr_lock);
> -		sctp_local_bh_enable();
>  		list_for_each(pos, &asoc->peer.transport_addr_list) {
>  			transport = list_entry(pos, struct sctp_transport,
>  						 transports);
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index a3acf78..35cc30c 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -367,14 +367,10 @@ SCTP_STATIC int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
>  	if (!bp->port)
>  		bp->port = inet_sk(sk)->num;
> 
> -	/* Add the address to the bind address list.  */
> -	sctp_local_bh_disable();
> -	sctp_write_lock(&ep->base.addr_lock);
> -
> -	/* Use GFP_ATOMIC since BHs are disabled.  */
> +	/* Add the address to the bind address list.
> +	 * Use GFP_ATOMIC since BHs will be disabled.
> +	 */
>  	ret = sctp_add_bind_addr(bp, addr, 1, GFP_ATOMIC);
> -	sctp_write_unlock(&ep->base.addr_lock);
> -	sctp_local_bh_enable();
> 
>  	/* Copy back into socket for getsockname() use. */
>  	if (!ret) {
> @@ -497,7 +493,6 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
>  	void				*addr_buf;
>  	struct sctp_af			*af;
>  	struct list_head		*pos;
> -	struct list_head		*p;
>  	int 				i;
>  	int 				retval = 0;
> 
> @@ -544,14 +539,15 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
>  		if (i < addrcnt)
>  			continue;
> 
> -		/* Use the first address in bind addr list of association as
> -		 * Address Parameter of ASCONF CHUNK.
> +		/* Use the first valid address in bind addr list of
> +		 * association as Address Parameter of ASCONF CHUNK.
>  		 */
> -		sctp_read_lock(&asoc->base.addr_lock);
>  		bp = &asoc->base.bind_addr;
> -		p = bp->address_list.next;
> -		laddr = list_entry(p, struct sctp_sockaddr_entry, list);
> -		sctp_read_unlock(&asoc->base.addr_lock);
> +		rcu_read_lock();
> +		list_for_each_entry_rcu(laddr, &bp->address_list, list)
> +			if (laddr->valid)
> +				break;
> +		rcu_read_unlock();

Here you are carrying an RCU-protected data item (*laddr) outside of an
rcu_read_lock()/rcu_read_unlock() pair.  This is not good -- you need
to move the rcu_read_unlock() farther down to cover the full extent of
the uses of the laddr pointer.

Again, RCU is within its rights allowing a grace period to elapse, so
that past this point, laddr might well point into the freelist.
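Something like the following instead (a sketch; the case where no
valid entry is found is ignored here):

        rcu_read_lock();
        list_for_each_entry_rcu(laddr, &bp->address_list, list)
                if (laddr->valid)
                        break;
        /* laddr is still RCU-protected at this point, so reading
         * &laddr->a is safe.
         */
        chunk = sctp_make_asconf_update_ip(asoc, &laddr->a, addrs,
                                           addrcnt, SCTP_PARAM_ADD_IP);
        rcu_read_unlock();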

> 
>  		chunk = sctp_make_asconf_update_ip(asoc, &laddr->a, addrs,
>  						   addrcnt, SCTP_PARAM_ADD_IP);
> @@ -567,8 +563,6 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
>  		/* Add the new addresses to the bind address list with
>  		 * use_as_src set to 0.
>  		 */
> -		sctp_local_bh_disable();
> -		sctp_write_lock(&asoc->base.addr_lock);
>  		addr_buf = addrs;
>  		for (i = 0; i < addrcnt; i++) {
>  			addr = (union sctp_addr *)addr_buf;
> @@ -578,8 +572,6 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
>  						    GFP_ATOMIC);
>  			addr_buf += af->sockaddr_len;
>  		}
> -		sctp_write_unlock(&asoc->base.addr_lock);
> -		sctp_local_bh_enable();
>  	}
> 
>  out:
> @@ -651,14 +643,8 @@ static int sctp_bindx_rem(struct sock *sk, struct sockaddr *addrs, int addrcnt)
>  		 * socket routing and failover schemes. Refer to comments in
>  		 * sctp_do_bind(). -daisy
>  		 */
> -		sctp_local_bh_disable();
> -		sctp_write_lock(&ep->base.addr_lock);
> -
>  		retval = sctp_del_bind_addr(bp, sa_addr);
> 
> -		sctp_write_unlock(&ep->base.addr_lock);
> -		sctp_local_bh_enable();
> -
>  		addr_buf += af->sockaddr_len;
>  err_bindx_rem:
>  		if (retval < 0) {
> @@ -748,11 +734,9 @@ static int sctp_send_asconf_del_ip(struct sock		*sk,
>  		 * make sure that we do not delete all the addresses in the
>  		 * association.
>  		 */
> -		sctp_read_lock(&asoc->base.addr_lock);
>  		bp = &asoc->base.bind_addr;
>  		laddr = sctp_find_unmatch_addr(bp, (union sctp_addr *)addrs,
>  					       addrcnt, sp);
> -		sctp_read_unlock(&asoc->base.addr_lock);
>  		if (!laddr)
>  			continue;
> 
> @@ -766,23 +750,18 @@ static int sctp_send_asconf_del_ip(struct sock		*sk,
>  		/* Reset use_as_src flag for the addresses in the bind address
>  		 * list that are to be deleted.
>  		 */
> -		sctp_local_bh_disable();
> -		sctp_write_lock(&asoc->base.addr_lock);
>  		addr_buf = addrs;
>  		for (i = 0; i < addrcnt; i++) {
>  			laddr = (union sctp_addr *)addr_buf;
>  			af = sctp_get_af_specific(laddr->v4.sin_family);
> -			list_for_each(pos1, &bp->address_list) {
> -				saddr = list_entry(pos1,
> -						   struct sctp_sockaddr_entry,
> -						   list);
> +			rcu_read_lock();
> +			list_for_each_entry_rcu(saddr, &bp->address_list, list) {
>  				if (sctp_cmp_addr_exact(&saddr->a, laddr))
>  					saddr->use_as_src = 0;
>  			}
> +			rcu_read_unlock();
>  			addr_buf += af->sockaddr_len;
>  		}
> -		sctp_write_unlock(&asoc->base.addr_lock);
> -		sctp_local_bh_enable();
> 
>  		/* Update the route and saddr entries for all the transports
>  		 * as some of the addresses in the bind address list are
> @@ -4057,11 +4036,9 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
>  					       int __user *optlen)
>  {
>  	sctp_assoc_t id;
> -	struct list_head *pos;
>  	struct sctp_bind_addr *bp;
>  	struct sctp_association *asoc;
>  	struct sctp_sockaddr_entry *addr;
> -	rwlock_t *addr_lock;
>  	int cnt = 0;
> 
>  	if (len < sizeof(sctp_assoc_t))
> @@ -4078,17 +4055,13 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
>  	 */
>  	if (0 == id) {
>  		bp = &sctp_sk(sk)->ep->base.bind_addr;
> -		addr_lock = &sctp_sk(sk)->ep->base.addr_lock;
>  	} else {
>  		asoc = sctp_id2assoc(sk, id);
>  		if (!asoc)
>  			return -EINVAL;
>  		bp = &asoc->base.bind_addr;
> -		addr_lock = &asoc->base.addr_lock;
>  	}
> 
> -	sctp_read_lock(addr_lock);
> -
>  	/* If the endpoint is bound to 0.0.0.0 or ::0, count the valid
>  	 * addresses from the global local address list.
>  	 */
> @@ -4115,12 +4088,15 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
>  		goto done;
>  	}
> 
> -	list_for_each(pos, &bp->address_list) {
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(addr, &bp->address_list, list) {
> +		if (!addr->valid)
> +			continue;
>  		cnt ++;
>  	}
> +	rcu_read_unlock();
> 
>  done:
> -	sctp_read_unlock(addr_lock);
>  	return cnt;
>  }
> 
> @@ -4204,7 +4180,6 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
>  {
>  	struct sctp_bind_addr *bp;
>  	struct sctp_association *asoc;
> -	struct list_head *pos;
>  	int cnt = 0;
>  	struct sctp_getaddrs_old getaddrs;
>  	struct sctp_sockaddr_entry *addr;
> @@ -4212,7 +4187,6 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
>  	union sctp_addr temp;
>  	struct sctp_sock *sp = sctp_sk(sk);
>  	int addrlen;
> -	rwlock_t *addr_lock;
>  	int err = 0;
>  	void *addrs;
>  	void *buf;
> @@ -4234,13 +4208,11 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
>  	 */
>  	if (0 == getaddrs.assoc_id) {
>  		bp = &sctp_sk(sk)->ep->base.bind_addr;
> -		addr_lock = &sctp_sk(sk)->ep->base.addr_lock;
>  	} else {
>  		asoc = sctp_id2assoc(sk, getaddrs.assoc_id);
>  		if (!asoc)
>  			return -EINVAL;
>  		bp = &asoc->base.bind_addr;
> -		addr_lock = &asoc->base.addr_lock;
>  	}
> 
>  	to = getaddrs.addrs;
> @@ -4254,8 +4226,6 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
>  	if (!addrs)
>  		return -ENOMEM;
> 
> -	sctp_read_lock(addr_lock);
> -
>  	/* If the endpoint is bound to 0.0.0.0 or ::0, get the valid
>  	 * addresses from the global local address list.
>  	 */
> @@ -4271,8 +4241,10 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
>  	}
> 
>  	buf = addrs;
> -	list_for_each(pos, &bp->address_list) {
> -		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(addr, &bp->address_list, list) {
> +		if (!addr->valid)
> +			continue;
>  		memcpy(&temp, &addr->a, sizeof(temp));
>  		sctp_get_pf_specific(sk->sk_family)->addr_v4map(sp, &temp);
>  		addrlen = sctp_get_af_specific(temp.sa.sa_family)->sockaddr_len;
> @@ -4282,10 +4254,9 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
>  		cnt ++;
>  		if (cnt >= getaddrs.addr_num) break;
>  	}
> +	rcu_read_unlock();
> 
>  copy_getaddrs:
> -	sctp_read_unlock(addr_lock);
> -
>  	/* copy the entire address list into the user provided space */
>  	if (copy_to_user(to, addrs, bytes_copied)) {
>  		err = -EFAULT;
> @@ -4307,7 +4278,6 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
>  {
>  	struct sctp_bind_addr *bp;
>  	struct sctp_association *asoc;
> -	struct list_head *pos;
>  	int cnt = 0;
>  	struct sctp_getaddrs getaddrs;
>  	struct sctp_sockaddr_entry *addr;
> @@ -4315,7 +4285,6 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
>  	union sctp_addr temp;
>  	struct sctp_sock *sp = sctp_sk(sk);
>  	int addrlen;
> -	rwlock_t *addr_lock;
>  	int err = 0;
>  	size_t space_left;
>  	int bytes_copied = 0;
> @@ -4336,13 +4305,11 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
>  	 */
>  	if (0 == getaddrs.assoc_id) {
>  		bp = &sctp_sk(sk)->ep->base.bind_addr;
> -		addr_lock = &sctp_sk(sk)->ep->base.addr_lock;
>  	} else {
>  		asoc = sctp_id2assoc(sk, getaddrs.assoc_id);
>  		if (!asoc)
>  			return -EINVAL;
>  		bp = &asoc->base.bind_addr;
> -		addr_lock = &asoc->base.addr_lock;
>  	}
> 
>  	to = optval + offsetof(struct sctp_getaddrs,addrs);
> @@ -4352,8 +4319,6 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
>  	if (!addrs)
>  		return -ENOMEM;
> 
> -	sctp_read_lock(addr_lock);
> -
>  	/* If the endpoint is bound to 0.0.0.0 or ::0, get the valid
>  	 * addresses from the global local address list.
>  	 */
> @@ -4372,8 +4337,10 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
>  	}
> 
>  	buf = addrs;
> -	list_for_each(pos, &bp->address_list) {
> -		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(addr, &bp->address_list, list) {
> +		if (!addr->valid)
> +			continue;
>  		memcpy(&temp, &addr->a, sizeof(temp));
>  		sctp_get_pf_specific(sk->sk_family)->addr_v4map(sp, &temp);
>  		addrlen = sctp_get_af_specific(temp.sa.sa_family)->sockaddr_len;
> @@ -4387,10 +4354,9 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
>  		cnt ++;
>  		space_left -= addrlen;
>  	}
> +	rcu_read_unlock();
> 
>  copy_getaddrs:
> -	sctp_read_unlock(addr_lock);
> -
>  	if (copy_to_user(to, addrs, bytes_copied)) {
>  		err = -EFAULT;
>  		goto out;
> @@ -4405,7 +4371,7 @@ copy_getaddrs:
>  	goto out;
> 
>  error_lock:
> -	sctp_read_unlock(addr_lock);
> +	rcu_read_unlock();
> 
>  out:
>  	kfree(addrs);
> -- 
> 1.5.2.4
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 1/2] SCTP: Add RCU synchronization around sctp_localaddr_list
  2007-09-10 21:47   ` Paul E. McKenney
@ 2007-09-11  7:24     ` Sridhar Samudrala
  2007-09-11 14:05       ` Vlad Yasevich
  0 siblings, 1 reply; 10+ messages in thread
From: Sridhar Samudrala @ 2007-09-11  7:24 UTC (permalink / raw)
  To: paulmck; +Cc: Vlad Yasevich, netdev, lksctp-developers

Paul E. McKenney wrote:
> On Mon, Sep 10, 2007 at 03:46:29PM -0400, Vlad Yasevich wrote:
>> sctp_localaddr_list is modified dynamically via NETDEV_UP
>> and NETDEV_DOWN events, but there is no synchronization
>> between the writer (event handler) and readers.  As a result,
>> the readers can access an entry that has been freed and
>> crash the system.
> 
> Good start, but a few questions interspersed below...
> 
> 							Thanx, Paul
> 
>> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
>> ---
>>  include/net/sctp/sctp.h    |    1 +
>>  include/net/sctp/structs.h |    2 +
>>  net/sctp/bind_addr.c       |    2 +
>>  net/sctp/ipv6.c            |   33 ++++++++++++++++++++--------
>>  net/sctp/protocol.c        |   50 ++++++++++++++++++++++++++++++-------------
>>  net/sctp/socket.c          |   38 ++++++++++++++++++++++-----------
>>  6 files changed, 88 insertions(+), 38 deletions(-)
>>
>> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
>> index d529045..c9cc00c 100644
>> --- a/include/net/sctp/sctp.h
>> +++ b/include/net/sctp/sctp.h
>> @@ -123,6 +123,7 @@
>>   * sctp/protocol.c
>>   */
>>  extern struct sock *sctp_get_ctl_sock(void);
>> +extern void sctp_local_addr_free(struct rcu_head *head);
>>  extern int sctp_copy_local_addr_list(struct sctp_bind_addr *,
>>  				     sctp_scope_t, gfp_t gfp,
>>  				     int flags);
>> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
>> index c0d5848..2591c49 100644
>> --- a/include/net/sctp/structs.h
>> +++ b/include/net/sctp/structs.h
>> @@ -737,8 +737,10 @@ const union sctp_addr *sctp_source(const struct sctp_chunk *chunk);
>>  /* This is a structure for holding either an IPv6 or an IPv4 address.  */
>>  struct sctp_sockaddr_entry {
>>  	struct list_head list;
>> +	struct rcu_head	rcu;
>>  	union sctp_addr a;
>>  	__u8 use_as_src;
>> +	__u8 valid;
>>  };
>>
>>  typedef struct sctp_chunk *(sctp_packet_phandler_t)(struct sctp_association *);
>> diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
>> index fdb287a..7fc369f 100644
>> --- a/net/sctp/bind_addr.c
>> +++ b/net/sctp/bind_addr.c
>> @@ -163,8 +163,10 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
>>  		addr->a.v4.sin_port = htons(bp->port);
>>
>>  	addr->use_as_src = use_as_src;
>> +	addr->valid = 1;
>>
>>  	INIT_LIST_HEAD(&addr->list);
>> +	INIT_RCU_HEAD(&addr->rcu);
>>  	list_add_tail(&addr->list, &bp->address_list);
>>  	SCTP_DBG_OBJCNT_INC(addr);
>>
>> diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
>> index f8aa23d..fc2e4e2 100644
>> --- a/net/sctp/ipv6.c
>> +++ b/net/sctp/ipv6.c
>> @@ -77,13 +77,18 @@
>>
>>  #include <asm/uaccess.h>
>>
>> -/* Event handler for inet6 address addition/deletion events.  */
>> +/* Event handler for inet6 address addition/deletion events.
>> + * This event is part of the atomic notifier call chain
>> + * and thus happens atomically and can NOT sleep.  As a result
>> + * we can't and really don't need to add any locks to guard the
>> + * RCU.
>> + */
>>  static int sctp_inet6addr_event(struct notifier_block *this, unsigned long ev,
>>  				void *ptr)
>>  {
>>  	struct inet6_ifaddr *ifa = (struct inet6_ifaddr *)ptr;
>> -	struct sctp_sockaddr_entry *addr;
>> -	struct list_head *pos, *temp;
>> +	struct sctp_sockaddr_entry *addr = NULL;
>> +	struct sctp_sockaddr_entry *temp;
>>
>>  	switch (ev) {
>>  	case NETDEV_UP:
>> @@ -94,19 +99,26 @@ static int sctp_inet6addr_event(struct notifier_block *this, unsigned long ev,
>>  			memcpy(&addr->a.v6.sin6_addr, &ifa->addr,
>>  				 sizeof(struct in6_addr));
>>  			addr->a.v6.sin6_scope_id = ifa->idev->dev->ifindex;
>> -			list_add_tail(&addr->list, &sctp_local_addr_list);
>> +			addr->valid = 1;
>> +			rcu_read_lock();
>> +			list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
>> +			rcu_read_unlock();
> 
> If we are under the protection of the update-side mutex, the rcu_read_lock()
> and rcu_read_unlock() are (harmlessly) redundant.  If we are not under
> the protection of some mutex, what prevents concurrent list_add_tail_rcu()
> calls from messing up the sctp_sockaddr_entry list?

This is an atomic notifier call chain event and as such can be called from a
softirq. So I think we need a spin_lock_bh here.
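Something along these lines, perhaps (a sketch; the lock name is made
up):

static DEFINE_SPINLOCK(sctp_local_addr_lock);

        /* NETDEV_UP: serialize concurrent writers with a BH-safe
         * spinlock; list_add_tail_rcu() still publishes the entry
         * safely for the lockless readers.
         */
        addr->valid = 1;
        spin_lock_bh(&sctp_local_addr_lock);
        list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
        spin_unlock_bh(&sctp_local_addr_lock);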

> 
>>  		}
>>  		break;
>>  	case NETDEV_DOWN:
>> -		list_for_each_safe(pos, temp, &sctp_local_addr_list) {
>> -			addr = list_entry(pos, struct sctp_sockaddr_entry, list);
>> -			if (ipv6_addr_equal(&addr->a.v6.sin6_addr, &ifa->addr)) {
>> -				list_del(pos);
>> -				kfree(addr);
>> +		rcu_read_lock();
>> +		list_for_each_entry_safe(addr, temp,
>> +					&sctp_local_addr_list, list) {
>> +			if (ipv6_addr_equal(&addr->a.v6.sin6_addr,
>> +					     &ifa->addr)) {
>> +				addr->valid = 0;
>> +				list_del_rcu(&addr->list);
>>  				break;
>>  			}
>>  		}
>> -
>> +		rcu_read_unlock();
>> +		if (addr && !addr->valid)
>> +			call_rcu(&addr->rcu, sctp_local_addr_free);
> 
> Are we under the protection of the update-side lock here?  If not,
> what prevents two different tasks from executing this in parallel,
> potentially tangling both the list that the sctp_sockaddr_entry list and
> the internal RCU lists?  (It is forbidden to call_rcu() a given element
> twice concurrently.)
> 
> If we are in fact under the protection of the update-side lock, the
> rcu_read_lock() and rcu_read_unlock() pairs are redundant (though this
> is harmless, aside from the (small) potential for confusion).

There is no update-side lock protection here. We need a spin_lock_bh().
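i.e. for the NETDEV_DOWN side, again as a sketch with the same
made-up lock:

        struct sctp_sockaddr_entry *found = NULL;

        spin_lock_bh(&sctp_local_addr_lock);
        list_for_each_entry(addr, &sctp_local_addr_list, list) {
                if (ipv6_addr_equal(&addr->a.v6.sin6_addr, &ifa->addr)) {
                        addr->valid = 0;
                        list_del_rcu(&addr->list);
                        found = addr;
                        break;
                }
        }
        spin_unlock_bh(&sctp_local_addr_lock);
        /* Only one writer can have unlinked this entry, so the
         * call_rcu() below cannot be invoked twice on it.
         */
        if (found)
                call_rcu(&found->rcu, sctp_local_addr_free);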

> 
>>  		break;
>>  	}
>>
>> @@ -368,6 +380,7 @@ static void sctp_v6_copy_addrlist(struct list_head *addrlist,
>>  			addr->a.v6.sin6_addr = ifp->addr;
>>  			addr->a.v6.sin6_scope_id = dev->ifindex;
>>  			INIT_LIST_HEAD(&addr->list);
>> +			INIT_RCU_HEAD(&addr->rcu);
>>  			list_add_tail(&addr->list, addrlist);
>>  		}
>>  	}
>> diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
>> index e98579b..ac52f9e 100644
>> --- a/net/sctp/protocol.c
>> +++ b/net/sctp/protocol.c
>> @@ -153,6 +153,7 @@ static void sctp_v4_copy_addrlist(struct list_head *addrlist,
>>  			addr->a.v4.sin_family = AF_INET;
>>  			addr->a.v4.sin_port = 0;
>>  			addr->a.v4.sin_addr.s_addr = ifa->ifa_local;
>> +			INIT_RCU_HEAD(&addr->rcu);
>>  			list_add_tail(&addr->list, addrlist);
>>  		}
>>  	}
>> @@ -192,16 +193,24 @@ static void sctp_free_local_addr_list(void)
>>  	}
>>  }
>>
>> +void sctp_local_addr_free(struct rcu_head *head)
>> +{
>> +	struct sctp_sockaddr_entry *e = container_of(head,
>> +				struct sctp_sockaddr_entry, rcu);
>> +	kfree(e);
>> +}
>> +
>>  /* Copy the local addresses which are valid for 'scope' into 'bp'.  */
>>  int sctp_copy_local_addr_list(struct sctp_bind_addr *bp, sctp_scope_t scope,
>>  			      gfp_t gfp, int copy_flags)
>>  {
>>  	struct sctp_sockaddr_entry *addr;
>>  	int error = 0;
>> -	struct list_head *pos, *temp;
>>
>> -	list_for_each_safe(pos, temp, &sctp_local_addr_list) {
>> -		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
>> +	rcu_read_lock();
>> +	list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
>> +		if (!addr->valid)
>> +			continue;
> 
> What happens if the update-side code removes the element from the list
> and marks it !->valid right here?
> 
> If this turns out to be harmless, why not just dispense with the ->valid
> flag entirely?

It should be OK if an address gets removed from the list. So I agree that
the ->valid flag is not really useful.

> 
>>  		if (sctp_in_scope(&addr->a, scope)) {
>>  			/* Now that the address is in scope, check to see if
>>  			 * the address type is really supported by the local
>> @@ -221,6 +230,7 @@ int sctp_copy_local_addr_list(struct sctp_bind_addr *bp, sctp_scope_t scope,
>>  	}
>>
>>  end_copy:
>> +	rcu_read_unlock();
>>  	return error;
>>  }
>>
>> @@ -600,13 +610,17 @@ static void sctp_v4_seq_dump_addr(struct seq_file *seq, union sctp_addr *addr)
>>  	seq_printf(seq, "%d.%d.%d.%d ", NIPQUAD(addr->v4.sin_addr));
>>  }
>>
>> -/* Event handler for inet address addition/deletion events.  */
>> +/* Event handler for inet address addition/deletion events.
>> + * This is part of the blocking notifier call chain that is
>> + * guarded by a mutex.  As a result we don't need to add
>> + * any additional guards for the RCU
>> + */
>>  static int sctp_inetaddr_event(struct notifier_block *this, unsigned long ev,
>>  			       void *ptr)
>>  {
>>  	struct in_ifaddr *ifa = (struct in_ifaddr *)ptr;
>> -	struct sctp_sockaddr_entry *addr;
>> -	struct list_head *pos, *temp;
>> +	struct sctp_sockaddr_entry *addr = NULL;
>> +	struct sctp_sockaddr_entry *temp;
>>
>>  	switch (ev) {
>>  	case NETDEV_UP:
>> @@ -615,19 +629,25 @@ static int sctp_inetaddr_event(struct notifier_block *this, unsigned long ev,
>>  			addr->a.v4.sin_family = AF_INET;
>>  			addr->a.v4.sin_port = 0;
>>  			addr->a.v4.sin_addr.s_addr = ifa->ifa_local;
>> -			list_add_tail(&addr->list, &sctp_local_addr_list);
>> +			addr->valid = 1;
>> +			rcu_read_lock();
>> +			list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
>> +			rcu_read_unlock();
> 
> Based on the additions to the header comment, I am assuming that we
> hold an update-side mutex.  This means that the rcu_read_lock() and
> rcu_read_unlock() are (harmlessly) redundant.

This is called via a blocking notifier call chain and hence we could
protect it using an update-side mutex. But considering that sctp_inet6addr_event
requires a spin_lock_bh(), maybe we should use it here also to make it
simple.
> 
>>  		}
>>  		break;
>>  	case NETDEV_DOWN:
>> -		list_for_each_safe(pos, temp, &sctp_local_addr_list) {
>> -			addr = list_entry(pos, struct sctp_sockaddr_entry, list);
>> +		rcu_read_lock();
>> +		list_for_each_entry_safe(addr, temp,
>> +					&sctp_local_addr_list, list) {
>>  			if (addr->a.v4.sin_addr.s_addr == ifa->ifa_local) {
>> -				list_del(pos);
>> -				kfree(addr);
>> +				addr->valid = 0;
>> +				list_del_rcu(&addr->list);
>>  				break;
>>  			}
>>  		}
>> -
>> +		rcu_read_unlock();
> 
> Ditto.
> 
>> +		if (addr && !addr->valid)
>> +			call_rcu(&addr->rcu, sctp_local_addr_free);
> 
> This one is OK, since we hold the update-side mutex.
> 
>>  		break;
>>  	}
>>
>> @@ -1227,6 +1247,9 @@ SCTP_STATIC __exit void sctp_exit(void)
>>  	sctp_v6_del_protocol();
>>  	inet_del_protocol(&sctp_protocol, IPPROTO_SCTP);
>>
>> +	/* Unregister notifier for inet address additions/deletions. */
>> +	unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
>> +
>>  	/* Free the local address list.  */
>>  	sctp_free_local_addr_list();
>>
>> @@ -1240,9 +1263,6 @@ SCTP_STATIC __exit void sctp_exit(void)
>>  	inet_unregister_protosw(&sctp_stream_protosw);
>>  	inet_unregister_protosw(&sctp_seqpacket_protosw);
>>
>> -	/* Unregister notifier for inet address additions/deletions. */
>> -	unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
>> -
>>  	sctp_sysctl_unregister();
>>  	list_del(&sctp_ipv4_specific.list);
>>
>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>> index 3335460..a3acf78 100644
>> --- a/net/sctp/socket.c
>> +++ b/net/sctp/socket.c
>> @@ -4057,9 +4057,9 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
>>  					       int __user *optlen)
>>  {
>>  	sctp_assoc_t id;
>> +	struct list_head *pos;
>>  	struct sctp_bind_addr *bp;
>>  	struct sctp_association *asoc;
>> -	struct list_head *pos, *temp;
>>  	struct sctp_sockaddr_entry *addr;
>>  	rwlock_t *addr_lock;
>>  	int cnt = 0;
>> @@ -4096,15 +4096,19 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
>>  		addr = list_entry(bp->address_list.next,
>>  				  struct sctp_sockaddr_entry, list);
>>  		if (sctp_is_any(&addr->a)) {
>> -			list_for_each_safe(pos, temp, &sctp_local_addr_list) {
>> -				addr = list_entry(pos,
>> -						  struct sctp_sockaddr_entry,
>> -						  list);
>> +			rcu_read_lock();
>> +			list_for_each_entry_rcu(addr,
>> +						&sctp_local_addr_list, list) {
>> +				if (!addr->valid)
>> +					continue;
>> +
> 
> Again, what happens if the element is deleted just at this point?
> If harmless, might be good to get rid of ->valid.
> 
>>  				if ((PF_INET == sk->sk_family) &&
>>  				    (AF_INET6 == addr->a.sa.sa_family))
>>  					continue;
>> +
>>  				cnt++;
>>  			}
>> +			rcu_read_unlock();
> 
> We are just counting these things, right?  If on the other hand we are
> keeping a reference outside of rcu_read_lock() protection, then there
> needs to be some explicit mechanism preventing the corresponding entry
> from being freed.
> 
>>  		} else {
>>  			cnt = 1;
>>  		}
>> @@ -4127,14 +4131,16 @@ static int sctp_copy_laddrs_old(struct sock *sk, __u16 port,
>>  					int max_addrs, void *to,
>>  					int *bytes_copied)
>>  {
>> -	struct list_head *pos, *next;
>>  	struct sctp_sockaddr_entry *addr;
>>  	union sctp_addr temp;
>>  	int cnt = 0;
>>  	int addrlen;
>>
>> -	list_for_each_safe(pos, next, &sctp_local_addr_list) {
>> -		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
>> +	rcu_read_lock();
>> +	list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
>> +		if (!addr->valid)
>> +			continue;
>> +
> 
> Same question as before -- what happens if the element is deleted by some
> other CPU (thus clearing ->valid) just at this point?
> 
>>  		if ((PF_INET == sk->sk_family) &&
>>  		    (AF_INET6 == addr->a.sa.sa_family))
>>  			continue;
>> @@ -4149,6 +4155,7 @@ static int sctp_copy_laddrs_old(struct sock *sk, __u16 port,
>>  		cnt ++;
>>  		if (cnt >= max_addrs) break;
>>  	}
>> +	rcu_read_unlock();
>>
>>  	return cnt;
>>  }
>> @@ -4156,14 +4163,16 @@ static int sctp_copy_laddrs_old(struct sock *sk, __u16 port,
>>  static int sctp_copy_laddrs(struct sock *sk, __u16 port, void *to,
>>  			    size_t space_left, int *bytes_copied)
>>  {
>> -	struct list_head *pos, *next;
>>  	struct sctp_sockaddr_entry *addr;
>>  	union sctp_addr temp;
>>  	int cnt = 0;
>>  	int addrlen;
>>
>> -	list_for_each_safe(pos, next, &sctp_local_addr_list) {
>> -		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
>> +	rcu_read_lock();
>> +	list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
>> +		if (!addr->valid)
>> +			continue;
>> +
> 
> And the same question here as well...
> 
>>  		if ((PF_INET == sk->sk_family) &&
>>  		    (AF_INET6 == addr->a.sa.sa_family))
>>  			continue;
>> @@ -4171,8 +4180,10 @@ static int sctp_copy_laddrs(struct sock *sk, __u16 port, void *to,
>>  		sctp_get_pf_specific(sk->sk_family)->addr_v4map(sctp_sk(sk),
>>  								&temp);
>>  		addrlen = sctp_get_af_specific(temp.sa.sa_family)->sockaddr_len;
>> -		if (space_left < addrlen)
>> -			return -ENOMEM;
>> +		if (space_left < addrlen) {
>> +			cnt =  -ENOMEM;
>> +			break;
>> +		}
>>  		memcpy(to, &temp, addrlen);
>>
>>  		to += addrlen;
>> @@ -4180,6 +4191,7 @@ static int sctp_copy_laddrs(struct sock *sk, __u16 port, void *to,
>>  		space_left -= addrlen;
>>  		*bytes_copied += addrlen;
>>  	}
>> +	rcu_read_unlock();
>>
>>  	return cnt;
>>  }
>> -- 
>> 1.5.2.4
>>



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 1/2] SCTP: Add RCU synchronization around sctp_localaddr_list
  2007-09-11  7:24     ` Sridhar Samudrala
@ 2007-09-11 14:05       ` Vlad Yasevich
  2007-09-12 15:20         ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Vlad Yasevich @ 2007-09-11 14:05 UTC (permalink / raw)
  To: Sridhar Samudrala; +Cc: paulmck, netdev, lksctp-developers

Sridhar, Paul

Thanks for the review.  Some answers and questions below...

Sridhar Samudrala wrote:
> Paul E. McKenney wrote:
>> On Mon, Sep 10, 2007 at 03:46:29PM -0400, Vlad Yasevich wrote:
>>> sctp_localaddr_list is modified dynamically via NETDEV_UP
>>> and NETDEV_DOWN events, but there is no synchronization
>>> between the writer (event handler) and readers.  As a result,
>>> the readers can access an entry that has been freed and
>>> crash the system.
>>
>> Good start, but a few questions interspersed below...
>>
>>                             Thanx, Paul
>>

>>> diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
>>> index f8aa23d..fc2e4e2 100644
>>> --- a/net/sctp/ipv6.c
>>> +++ b/net/sctp/ipv6.c
>>> @@ -77,13 +77,18 @@
>>>
>>>  #include <asm/uaccess.h>
>>>
>>> -/* Event handler for inet6 address addition/deletion events.  */
>>> +/* Event handler for inet6 address addition/deletion events.
>>> + * This event is part of the atomic notifier call chain
>>> + * and thus happens atomically and can NOT sleep.  As a result
>>> + * we can't and really don't need to add any locks to guard the
>>> + * RCU.
>>> + */
>>>  static int sctp_inet6addr_event(struct notifier_block *this,
>>> unsigned long ev,
>>>                  void *ptr)
>>>  {
>>>      struct inet6_ifaddr *ifa = (struct inet6_ifaddr *)ptr;
>>> -    struct sctp_sockaddr_entry *addr;
>>> -    struct list_head *pos, *temp;
>>> +    struct sctp_sockaddr_entry *addr = NULL;
>>> +    struct sctp_sockaddr_entry *temp;
>>>
>>>      switch (ev) {
>>>      case NETDEV_UP:
>>> @@ -94,19 +99,26 @@ static int sctp_inet6addr_event(struct
>>> notifier_block *this, unsigned long ev,
>>>              memcpy(&addr->a.v6.sin6_addr, &ifa->addr,
>>>                   sizeof(struct in6_addr));
>>>              addr->a.v6.sin6_scope_id = ifa->idev->dev->ifindex;
>>> -            list_add_tail(&addr->list, &sctp_local_addr_list);
>>> +            addr->valid = 1;
>>> +            rcu_read_lock();
>>> +            list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
>>> +            rcu_read_unlock();
>>
>> If we are under the protection of the update-side mutex, the
>> rcu_read_lock()
>> and rcu_read_unlock() are (harmlessly) redundant.  If we are not under
>> the protection of some mutex, what prevents concurrent
>> list_add_tail_rcu()
>> calls from messing up the sctp_sockaddr_entry list?
> 
> This is an atomic notifier call chain event and as such can be called
> from a
> softirq. So I think we need a spin_lock_bh here.

But the question is, can two notifiers be called at the same time?
If yes, then I think there is a need for spin lock protection.  (And I think
bonding might need that too.)

> 
>>
>>>          }
>>>          break;
>>>      case NETDEV_DOWN:
>>> -        list_for_each_safe(pos, temp, &sctp_local_addr_list) {
>>> -            addr = list_entry(pos, struct sctp_sockaddr_entry, list);
>>> -            if (ipv6_addr_equal(&addr->a.v6.sin6_addr, &ifa->addr)) {
>>> -                list_del(pos);
>>> -                kfree(addr);
>>> +        rcu_read_lock();
>>> +        list_for_each_entry_safe(addr, temp,
>>> +                    &sctp_local_addr_list, list) {
>>> +            if (ipv6_addr_equal(&addr->a.v6.sin6_addr,
>>> +                         &ifa->addr)) {
>>> +                addr->valid = 0;
>>> +                list_del_rcu(&addr->list);
>>>                  break;
>>>              }
>>>          }
>>> -
>>> +        rcu_read_unlock();
>>> +        if (addr && !addr->valid)
>>> +            call_rcu(&addr->rcu, sctp_local_addr_free);
>>
>> Are we under the protection of the update-side lock here?  If not,
>> what prevents two different tasks from executing this in parallel,
>> potentially tangling both the list that the sctp_sockaddr_entry list and
>> the internal RCU lists?  (It is forbidden to call_rcu() a given element
>> twice concurrently.)
>>
>> If we are in fact under the protection of the update-side lock, the
>> rcu_read_lock() and rcu_read_unlock() pairs are redundant (though this
>> is harmless, aside from the (small) potential for confusion).
> 
> There is no update-side lock protection here. We need a spin_lock_bh().

Recurring theme.  Same questions about notifiers apply.

> 
>>
>>>          break;
>>>      }
>>>
>>> @@ -368,6 +380,7 @@ static void sctp_v6_copy_addrlist(struct
>>> list_head *addrlist,
>>>              addr->a.v6.sin6_addr = ifp->addr;
>>>              addr->a.v6.sin6_scope_id = dev->ifindex;
>>>              INIT_LIST_HEAD(&addr->list);
>>> +            INIT_RCU_HEAD(&addr->rcu);
>>>              list_add_tail(&addr->list, addrlist);
>>>          }
>>>      }
>>> diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
>>> index e98579b..ac52f9e 100644
>>> --- a/net/sctp/protocol.c
>>> +++ b/net/sctp/protocol.c
>>> @@ -153,6 +153,7 @@ static void sctp_v4_copy_addrlist(struct
>>> list_head *addrlist,
>>>              addr->a.v4.sin_family = AF_INET;
>>>              addr->a.v4.sin_port = 0;
>>>              addr->a.v4.sin_addr.s_addr = ifa->ifa_local;
>>> +            INIT_RCU_HEAD(&addr->rcu);
>>>              list_add_tail(&addr->list, addrlist);
>>>          }
>>>      }
>>> @@ -192,16 +193,24 @@ static void sctp_free_local_addr_list(void)
>>>      }
>>>  }
>>>
>>> +void sctp_local_addr_free(struct rcu_head *head)
>>> +{
>>> +    struct sctp_sockaddr_entry *e = container_of(head,
>>> +                struct sctp_sockaddr_entry, rcu);
>>> +    kfree(e);
>>> +}
>>> +
>>>  /* Copy the local addresses which are valid for 'scope' into 'bp'.  */
>>>  int sctp_copy_local_addr_list(struct sctp_bind_addr *bp,
>>> sctp_scope_t scope,
>>>                    gfp_t gfp, int copy_flags)
>>>  {
>>>      struct sctp_sockaddr_entry *addr;
>>>      int error = 0;
>>> -    struct list_head *pos, *temp;
>>>
>>> -    list_for_each_safe(pos, temp, &sctp_local_addr_list) {
>>> -        addr = list_entry(pos, struct sctp_sockaddr_entry, list);
>>> +    rcu_read_lock();
>>> +    list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
>>> +        if (!addr->valid)
>>> +            continue;
>>
>> What happens if the update-side code removes the element from the list
>> and marks it !->valid right here?
>>
>> If this turns out to be harmless, why not just dispense with the ->valid
>> flag entirely?
> 
> It should be OK if an address gets removed from the list. So I agree that
> the ->valid flag is not really useful.

I added the valid flag because we don't really want to give back addresses that
are going away at the RCU quiescent state.  I agree, there is a race between
the notifier marking the address invalid and these checks on the reader side.  This
was modeled after other code that uses similar semantics.

> 
>>
>>>          if (sctp_in_scope(&addr->a, scope)) {
>>>              /* Now that the address is in scope, check to see if
>>>               * the address type is really supported by the local
>>> @@ -221,6 +230,7 @@ int sctp_copy_local_addr_list(struct
>>> sctp_bind_addr *bp, sctp_scope_t scope,
>>>      }
>>>
>>>  end_copy:
>>> +    rcu_read_unlock();
>>>      return error;
>>>  }
>>>
>>> @@ -600,13 +610,17 @@ static void sctp_v4_seq_dump_addr(struct
>>> seq_file *seq, union sctp_addr *addr)
>>>      seq_printf(seq, "%d.%d.%d.%d ", NIPQUAD(addr->v4.sin_addr));
>>>  }
>>>
>>> -/* Event handler for inet address addition/deletion events.  */
>>> +/* Event handler for inet address addition/deletion events.
>>> + * This is part of the blocking notifier call chain that is
>>> + * guarded by a mutex.  As a result we don't need to add
>>> + * any additional guards for the RCU
>>> + */
>>>  static int sctp_inetaddr_event(struct notifier_block *this, unsigned
>>> long ev,
>>>                     void *ptr)
>>>  {
>>>      struct in_ifaddr *ifa = (struct in_ifaddr *)ptr;
>>> -    struct sctp_sockaddr_entry *addr;
>>> -    struct list_head *pos, *temp;
>>> +    struct sctp_sockaddr_entry *addr = NULL;
>>> +    struct sctp_sockaddr_entry *temp;
>>>
>>>      switch (ev) {
>>>      case NETDEV_UP:
>>> @@ -615,19 +629,25 @@ static int sctp_inetaddr_event(struct
>>> notifier_block *this, unsigned long ev,
>>>              addr->a.v4.sin_family = AF_INET;
>>>              addr->a.v4.sin_port = 0;
>>>              addr->a.v4.sin_addr.s_addr = ifa->ifa_local;
>>> -            list_add_tail(&addr->list, &sctp_local_addr_list);
>>> +            addr->valid = 1;
>>> +            rcu_read_lock();
>>> +            list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
>>> +            rcu_read_unlock();
>>
>> Based on the additions to the header comment, I am assuming that we
>> hold an update-side mutex.  This means that the rcu_read_lock() and
>> rcu_read_unlock() are (harmlessly) redundant.
> 
> This is called via a blocking notifier call chain and hence we could
> protect it using an update-side mutex. But considering that
> sctp_inet6addr_event
> requires a spin_lock_bh(), maybe we should use it here also to make it
> simple.

This again boils down to the question of whether notifiers can be running concurrently...
If yes, then spin_lock_bh is the right choice.


>>
>>>          }
>>>          break;
>>>      case NETDEV_DOWN:
>>> -        list_for_each_safe(pos, temp, &sctp_local_addr_list) {
>>> -            addr = list_entry(pos, struct sctp_sockaddr_entry, list);
>>> +        rcu_read_lock();
>>> +        list_for_each_entry_safe(addr, temp,
>>> +                    &sctp_local_addr_list, list) {
>>>              if (addr->a.v4.sin_addr.s_addr == ifa->ifa_local) {
>>> -                list_del(pos);
>>> -                kfree(addr);
>>> +                addr->valid = 0;
>>> +                list_del_rcu(&addr->list);
>>>                  break;
>>>              }
>>>          }
>>> -
>>> +        rcu_read_unlock();
>>
>> Ditto.
>>
>>> +        if (addr && !addr->valid)
>>> +            call_rcu(&addr->rcu, sctp_local_addr_free);
>>
>> This one is OK, since we hold the update-side mutex.

This would need to be protected as well, if the answer to the notifier
question is yes.

>>
>>>          break;
>>>      }
>>>
>>> @@ -1227,6 +1247,9 @@ SCTP_STATIC __exit void sctp_exit(void)
>>>      sctp_v6_del_protocol();
>>>      inet_del_protocol(&sctp_protocol, IPPROTO_SCTP);
>>>
>>> +    /* Unregister notifier for inet address additions/deletions. */
>>> +    unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
>>> +
>>>      /* Free the local address list.  */
>>>      sctp_free_local_addr_list();
>>>
>>> @@ -1240,9 +1263,6 @@ SCTP_STATIC __exit void sctp_exit(void)
>>>      inet_unregister_protosw(&sctp_stream_protosw);
>>>      inet_unregister_protosw(&sctp_seqpacket_protosw);
>>>
>>> -    /* Unregister notifier for inet address additions/deletions. */
>>> -    unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
>>> -
>>>      sctp_sysctl_unregister();
>>>      list_del(&sctp_ipv4_specific.list);
>>>
>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>> index 3335460..a3acf78 100644
>>> --- a/net/sctp/socket.c
>>> +++ b/net/sctp/socket.c
>>> @@ -4057,9 +4057,9 @@ static int
>>> sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
>>>                             int __user *optlen)
>>>  {
>>>      sctp_assoc_t id;
>>> +    struct list_head *pos;
>>>      struct sctp_bind_addr *bp;
>>>      struct sctp_association *asoc;
>>> -    struct list_head *pos, *temp;
>>>      struct sctp_sockaddr_entry *addr;
>>>      rwlock_t *addr_lock;
>>>      int cnt = 0;
>>> @@ -4096,15 +4096,19 @@ static int
>>> sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
>>>          addr = list_entry(bp->address_list.next,
>>>                    struct sctp_sockaddr_entry, list);
>>>          if (sctp_is_any(&addr->a)) {
>>> -            list_for_each_safe(pos, temp, &sctp_local_addr_list) {
>>> -                addr = list_entry(pos,
>>> -                          struct sctp_sockaddr_entry,
>>> -                          list);
>>> +            rcu_read_lock();
>>> +            list_for_each_entry_rcu(addr,
>>> +                        &sctp_local_addr_list, list) {
>>> +                if (!addr->valid)
>>> +                    continue;
>>> +
>>
>> Again, what happens if the element is deleted just at this point?
>> If harmless, might be good to get rid of ->valid.
>>
>>>                  if ((PF_INET == sk->sk_family) &&
>>>                      (AF_INET6 == addr->a.sa.sa_family))
>>>                      continue;
>>> +
>>>                  cnt++;
>>>              }
>>> +            rcu_read_unlock();
>>
>> We are just counting these things, right?  If on the other hand we are
>> keeping a reference outside of rcu_read_lock() protection, then there
>> needs to be some explicit mechanism preventing the corresponding entry
>> from being freed.

In this particular case, we are just counting.  There are other cases
where we make a copy of the address in the list.  The goal was to reduce the
probability that an address that is about to be deleted at the RCU
quiescent state will be copied/counted.

My other thought was to use atomics, but with all the yelling about atomic_read(),
that didn't seem any better than a simple __u8 flag.

Thanks
-vlad

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 2/2] SCTP: Convert bind_addr_list locking to RCU
  2007-09-10 22:08   ` Paul E. McKenney
@ 2007-09-11 14:56     ` Vlad Yasevich
  2007-09-12 16:50       ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Vlad Yasevich @ 2007-09-11 14:56 UTC (permalink / raw)
  To: paulmck; +Cc: netdev, lksctp-developers

Hi Paul

Thanks for the review.  I'll leave out the comments about
the ->valid usage since they are the same as for the first patch
in the series.

Other questions/responses below...

Paul E. McKenney wrote:
>>
>> diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
>> index 7fc369f..9c7db1f 100644
>> --- a/net/sctp/bind_addr.c
>> +++ b/net/sctp/bind_addr.c
>> @@ -167,7 +167,10 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
>>
>>  	INIT_LIST_HEAD(&addr->list);
>>  	INIT_RCU_HEAD(&addr->rcu);
>> -	list_add_tail(&addr->list, &bp->address_list);
>> +
>> +	rcu_read_lock();
>> +	list_add_tail_rcu(&addr->list, &bp->address_list);
>> +	rcu_read_unlock();
> 
> Given the original code, we presumably hold the update-side lock.  If so,
> the rcu_read_lock() and rcu_read_unlock() are (harmlessly) redundant.
> 

Yes, in this case, the writer would already hold the socket lock.  However, I was told during
private review that even writers need to be in an RCU critical section.  Looking at the different
users of RCU, it seems there is no consistency, i.e. sometimes writers take the rcu_read_lock()
and sometimes they don't.

Is there a rule for when a writer needs to be in an RCU critical section?
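For reference, the pattern I have been assuming (purely illustrative
names):

        spin_lock(&update_lock);                /* writer side */
        list_add_tail_rcu(&new->list, &head);
        spin_unlock(&update_lock);

        rcu_read_lock();                        /* reader side */
        list_for_each_entry_rcu(pos, &head, list)
                do_something(pos);
        rcu_read_unlock();

i.e. rcu_read_lock() marks readers only, and writers serialize among
themselves with an ordinary lock.  Is that right?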

>>  	SCTP_DBG_OBJCNT_INC(addr);
>>
>>  	return 0;
>> @@ -178,20 +181,23 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
>>   */
>>  int sctp_del_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *del_addr)
>>  {
>> -	struct list_head *pos, *temp;
>> -	struct sctp_sockaddr_entry *addr;
>> +	struct sctp_sockaddr_entry *addr, *temp;
>>
>> -	list_for_each_safe(pos, temp, &bp->address_list) {
>> -		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
>> +	rcu_read_lock_bh();
>> +	list_for_each_entry_safe(addr, temp, &bp->address_list, list) {
>>  		if (sctp_cmp_addr_exact(&addr->a, del_addr)) {
>>  			/* Found the exact match. */
>> -			list_del(pos);
>> -			kfree(addr);
>> -			SCTP_DBG_OBJCNT_DEC(addr);
>> -
>> -			return 0;
>> +			addr->valid = 0;
>> +			list_del_rcu(&addr->list);
>> +			break;
>>  		}
>>  	}
>> +	rcu_read_unlock_bh();
> 
> Ditto.
> 
>> +
>> +	if (addr && !addr->valid) {
>> +		call_rcu_bh(&addr->rcu, sctp_local_addr_free);
>> +		SCTP_DBG_OBJCNT_DEC(addr);
>> +	}
>>
>>  	return -EINVAL;
>>  }

>> @@ -325,27 +336,31 @@ union sctp_addr *sctp_find_unmatch_addr(struct sctp_bind_addr	*bp,
>>  	union sctp_addr			*addr;
>>  	void 				*addr_buf;
>>  	struct sctp_af			*af;
>> -	struct list_head		*pos;
>>  	int				i;
>>
>> -	list_for_each(pos, &bp->address_list) {
>> -		laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
>> +	rcu_read_lock();
>> +	list_for_each_entry_rcu(laddr, &bp->address_list, list) {
>> +		if (!laddr->valid)
>> +			continue;
> 
> Ditto...
> 
>>  		addr_buf = (union sctp_addr *)addrs;
>>  		for (i = 0; i < addrcnt; i++) {
>>  			addr = (union sctp_addr *)addr_buf;
>>  			af = sctp_get_af_specific(addr->v4.sin_family);
>>  			if (!af)
>> -				return NULL;
>> +				break;
>>
>>  			if (opt->pf->cmp_addr(&laddr->a, addr, opt))
>>  				break;
>>
>>  			addr_buf += af->sockaddr_len;
>>  		}
>> -		if (i == addrcnt)
>> +		if (i == addrcnt) {
>> +			rcu_read_unlock();
> 
> Since rcu_read_unlock() just happened, some other CPU is free to
> free up this data structure.  In a CONFIG_PREEMPT kernel (as well as a
> CONFIG_PREEMPT_RT kernel, for that matter), this task might be preempted
> at this point, and a full grace period might elapse.
> 
> In which case, the following statement returns a pointer to the freelist,
> which is not good.

Hm...  my saving grace here is that this happens under the protection of the
socket lock, so the rcu_read_lock() is again potentially redundant.  If it weren't
for that socket lock, the original code would also have the same race.

> 
>>  			return &laddr->a;
>> +		}
>>  	}
>> +	rcu_read_unlock();
>>
>>  	return NULL;
>>  }

>> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
>> index 79856c9..caaa29f 100644
>> --- a/net/sctp/sm_make_chunk.c
>> +++ b/net/sctp/sm_make_chunk.c
>> @@ -1531,7 +1531,7 @@ no_hmac:
>>  	/* Also, add the destination address. */
>>  	if (list_empty(&retval->base.bind_addr.address_list)) {
>>  		sctp_add_bind_addr(&retval->base.bind_addr, &chunk->dest, 1,
>> -				   GFP_ATOMIC);
>> +				GFP_ATOMIC);
>>  	}
>>
>>  	retval->next_tsn = retval->c.initial_tsn;
>> @@ -2613,22 +2613,17 @@ static int sctp_asconf_param_success(struct sctp_association *asoc,
>>
>>  	switch (asconf_param->param_hdr.type) {
>>  	case SCTP_PARAM_ADD_IP:
>> -		sctp_local_bh_disable();
>> -		sctp_write_lock(&asoc->base.addr_lock);
>> -		list_for_each(pos, &bp->address_list) {
>> -			saddr = list_entry(pos, struct sctp_sockaddr_entry, list);
>> +		rcu_read_lock_bh();
>> +		list_for_each_entry_rcu(saddr, &bp->address_list, list) {
>> +			if (!saddr->valid)
>> +				continue;
>>  			if (sctp_cmp_addr_exact(&saddr->a, &addr))
>>  				saddr->use_as_src = 1;
>>  		}
>> -		sctp_write_unlock(&asoc->base.addr_lock);
>> -		sctp_local_bh_enable();
>> +		rcu_read_unlock_bh();
> 
> If you use rcu_read_lock_bh() and rcu_read_unlock_bh() in one read path
> for a given data structure, you need to use them in all the other read
> paths for that data structure.  In addition, you must use call_rcu_bh()
> when deleting the corresponding data elements.
> 
> The normal and the _bh RCU grace periods are unrelated, so mixing them
> for a given RCU-protected data structure is a bad idea.  (Or are these
> somehow two independent data structures?)

It's the same data structure, but we update some of its contents in softirq
context.  Thankfully we don't change the list in this case.  As you can see,
the code was calling local_bh_disable() before, so I was trying to preserve
that condition.

This update also happens under the synchronization of the socket lock, so
RCU may be redundant here.  It might be sufficient to simply
call local_bh_disable() for this particular case.

> 
>>  		break;
>>  	case SCTP_PARAM_DEL_IP:
>> -		sctp_local_bh_disable();
>> -		sctp_write_lock(&asoc->base.addr_lock);
>>  		retval = sctp_del_bind_addr(bp, &addr);
>> -		sctp_write_unlock(&asoc->base.addr_lock);
>> -		sctp_local_bh_enable();

This one is actually the one I worry more about.  If you look close to 
the beginning of this patch at sctp_del_bind_addr(), you'll see that
it also uses BH conventions for its RCU calls and uses call_rcu_bh()
to free up the entry.

However, given what you said about the grace periods being unrelated, I
am questioning that choice.

I think I'll need 2 versions of the del_bind_addr() call, depending on
which list I am cleaning up.

There is a list on the 'endpoint' structure that we add to and remove from
only in user context.
There is also a list on the 'association' structure that we add to in user
context, but remove from in BH context only.  I think this one will
need to be cleaned up using rcu_read_[un]lock_bh() and call_rcu_bh() calls, while
the first one can use regular RCU calls.  Does that sound generally right?

>>  		list_for_each(pos, &asoc->peer.transport_addr_list) {
>>  			transport = list_entry(pos, struct sctp_transport,
>>  						 transports);
>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>> index a3acf78..35cc30c 100644
>> --- a/net/sctp/socket.c
>> +++ b/net/sctp/socket.c
>> @@ -367,14 +367,10 @@ SCTP_STATIC int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
>>  	if (!bp->port)
>>  		bp->port = inet_sk(sk)->num;
>>
>> -	/* Add the address to the bind address list.  */
>> -	sctp_local_bh_disable();
>> -	sctp_write_lock(&ep->base.addr_lock);
>> -
>> -	/* Use GFP_ATOMIC since BHs are disabled.  */
>> +	/* Add the address to the bind address list.
>> +	 * Use GFP_ATOMIC since BHs will be disabled.
>> +	 */
>>  	ret = sctp_add_bind_addr(bp, addr, 1, GFP_ATOMIC);
>> -	sctp_write_unlock(&ep->base.addr_lock);
>> -	sctp_local_bh_enable();
>>
>>  	/* Copy back into socket for getsockname() use. */
>>  	if (!ret) {
>> @@ -497,7 +493,6 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
>>  	void				*addr_buf;
>>  	struct sctp_af			*af;
>>  	struct list_head		*pos;
>> -	struct list_head		*p;
>>  	int 				i;
>>  	int 				retval = 0;
>>
>> @@ -544,14 +539,15 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
>>  		if (i < addrcnt)
>>  			continue;
>>
>> -		/* Use the first address in bind addr list of association as
>> -		 * Address Parameter of ASCONF CHUNK.
>> +		/* Use the first valid address in bind addr list of
>> +		 * association as Address Parameter of ASCONF CHUNK.
>>  		 */
>> -		sctp_read_lock(&asoc->base.addr_lock);
>>  		bp = &asoc->base.bind_addr;
>> -		p = bp->address_list.next;
>> -		laddr = list_entry(p, struct sctp_sockaddr_entry, list);
>> -		sctp_read_unlock(&asoc->base.addr_lock);
>> +		rcu_read_lock();
>> +		list_for_each_entry_rcu(laddr, &bp->address_list, list)
>> +			if (laddr->valid)
>> +				break;
>> +		rcu_read_unlock();
> 
> Here you are carrying an RCU-protected data item (*laddr) outside of an
> rcu_read_lock()/rcu_read_unlock() pair.  This is not good -- you need
> to move the rcu_read_unlock() farther down to cover the full extent of
> the uses of the laddr pointer.
> 
> Again, RCU is within its rights allowing a grace period to elapse, so
> that past this point, laddr might well point into the freelist.
> 

This is another area where we may not even need RCU locking, since we
are already holding a socket lock that guarantees that the list will not change.
This was a blind conversion on my part, and I can probably restore the simple
list_entry() dereference there.
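
Alternatively, if I keep the RCU form, something like the following sketch
should avoid the problem -- copy the address while still inside the
read-side critical section, so nothing RCU-protected outlives the unlock:

	union sctp_addr addr_copy;

	rcu_read_lock();
	list_for_each_entry_rcu(laddr, &bp->address_list, list) {
		if (laddr->valid) {
			/* copy before rcu_read_unlock() */
			memcpy(&addr_copy, &laddr->a, sizeof(addr_copy));
			break;
		}
	}
	rcu_read_unlock();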

<snip the rest, there were no comments>

Thanks a lot
-vlad

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 1/2] SCTP: Add RCU synchronization around sctp_localaddr_list
  2007-09-11 14:05       ` Vlad Yasevich
@ 2007-09-12 15:20         ` Paul E. McKenney
  0 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2007-09-12 15:20 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: Sridhar Samudrala, netdev, lksctp-developers

On Tue, Sep 11, 2007 at 10:05:10AM -0400, Vlad Yasevich wrote:
> Sridhar, Paul
> 
> Thanks for review.  Some answers and questions below...

NP!

> Sridhar Samudrala wrote:
> > Paul E. McKenney wrote:

[ . . . ]

> >>>                  if ((PF_INET == sk->sk_family) &&
> >>>                      (AF_INET6 == addr->a.sa.sa_family))
> >>>                      continue;
> >>> +
> >>>                  cnt++;
> >>>              }
> >>> +            rcu_read_unlock();
> >>
> >> We are just counting these things, right?  If on the other hand we are
> >> keeping a reference outside of rcu_read_lock() protection, then there
> >> needs to be some explicit mechanism preventing the corresponding entry
> >> from being freed.
> 
> In this particular case, we are just counting.  There are other cases where
> we make a copy of the address in the list.  The goal was to reduce the
> probability that an address that is about to be deleted at the next RCU
> quiescent state will be copied/counted.
> 
> My other thought was to use atomics, but with all the yelling about atomic_read(),
> that didn't seem any better than a simple __u8 flag.

If just counting, then no worries either way.  As long as you are counting
to a local variable, as in fact you are.
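
For example, a pattern like this is safe -- a schematic rather than the
actual SCTP code, but the shape is the same; only a plain integer leaves
the read-side critical section:

	int count_valid_addrs(struct list_head *head)
	{
		struct sctp_sockaddr_entry *addr;
		int cnt = 0;	/* local variable -- no RCU pointer escapes */

		rcu_read_lock();
		list_for_each_entry_rcu(addr, head, list) {
			if (!addr->valid)
				continue;
			cnt++;
		}
		rcu_read_unlock();

		return cnt;
	}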

							Thanx, Paul

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 2/2] SCTP: Convert bind_addr_list locking to RCU
  2007-09-11 14:56     ` Vlad Yasevich
@ 2007-09-12 16:50       ` Paul E. McKenney
  0 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2007-09-12 16:50 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, lksctp-developers

On Tue, Sep 11, 2007 at 10:56:09AM -0400, Vlad Yasevich wrote:
> Hi Paul
> 
> Thanks for review.  I'll leave out the comments about
> the ->valid usage since they are the same as in the first patch
> in the series.

Fair enough.

> Other questions/responses below...
> 
> Paul E. McKenney wrote:
> >>
> >> diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
> >> index 7fc369f..9c7db1f 100644
> >> --- a/net/sctp/bind_addr.c
> >> +++ b/net/sctp/bind_addr.c
> >> @@ -167,7 +167,10 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
> >>
> >>  	INIT_LIST_HEAD(&addr->list);
> >>  	INIT_RCU_HEAD(&addr->rcu);
> >> -	list_add_tail(&addr->list, &bp->address_list);
> >> +
> >> +	rcu_read_lock();
> >> +	list_add_tail_rcu(&addr->list, &bp->address_list);
> >> +	rcu_read_unlock();
> > 
> > Given the original code, we presumably hold the update-side lock.  If so,
> > the rcu_read_lock() and rcu_read_unlock() are (harmlessly) redundant.
> > 
> 
> Yes, in this case, the writer would already hold the socket lock.
> However, I was told during private review that even writers need to be
> in an RCU critical section.  Looking at the different users of RCU, it seems
> there is no consistency, i.e. sometimes writers take the rcu_read_lock()
> and sometimes they don't.
> 
> Is there a rule for when a writer needs to be in an RCU critical section?

And the answer is...

	"It depends!!!"  ;-)

Normally, the writer does -not- need to be in an RCU read-side critical
section.  However, it really doesn't hurt anything, aside from possible
confusion for the people reading the code.  Here are some situations
where it makes sense to have RCU protection even though the update-side
lock is held:

1.	Read-side code discovers a need to update while in an RCU
	read-side critical section.  Then it is best to simply acquire the
	update-side lock while still in the RCU read-side critical section,
	do the update, and release the lock.

2.	Common code called both by readers and updaters will often do
	rcu_read_lock() -- better to have the updaters take an (often)
	unmeasurable increase in overhead than to duplicate the common
	code.

3.	Sometimes update-side code for one data structure involves
	read-side code for some other data structure.  In this case,
	one must of course hold the update-side lock for the first
	data structure -and- do rcu_read_lock() to protect access
	to the second data structure.

(Hey, you asked!!!)
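
To illustrate #1, a generic sketch (the structure, lock, and field names
here are made up, not from your patches):

	struct foo {
		struct list_head list;
		int key;
		int refs;		/* protected by foo_lock */
	};

	static LIST_HEAD(foo_list);
	static DEFINE_SPINLOCK(foo_lock);	/* update-side lock */

	void reader_turned_updater(int key)
	{
		struct foo *p;

		rcu_read_lock();
		list_for_each_entry_rcu(p, &foo_list, list) {
			if (p->key != key)
				continue;
			/* Discovered a need to update -- acquire the
			 * update-side lock while still inside the RCU
			 * read-side critical section. */
			spin_lock(&foo_lock);
			p->refs++;
			spin_unlock(&foo_lock);
			break;
		}
		rcu_read_unlock();
	}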

> >>  	SCTP_DBG_OBJCNT_INC(addr);
> >>
> >>  	return 0;
> >> @@ -178,20 +181,23 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
> >>   */
> >>  int sctp_del_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *del_addr)
> >>  {
> >> -	struct list_head *pos, *temp;
> >> -	struct sctp_sockaddr_entry *addr;
> >> +	struct sctp_sockaddr_entry *addr, *temp;
> >>
> >> -	list_for_each_safe(pos, temp, &bp->address_list) {
> >> -		addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> >> +	rcu_read_lock_bh();
> >> +	list_for_each_entry_safe(addr, temp, &bp->address_list, list) {
> >>  		if (sctp_cmp_addr_exact(&addr->a, del_addr)) {
> >>  			/* Found the exact match. */
> >> -			list_del(pos);
> >> -			kfree(addr);
> >> -			SCTP_DBG_OBJCNT_DEC(addr);
> >> -
> >> -			return 0;
> >> +			addr->valid = 0;
> >> +			list_del_rcu(&addr->list);
> >> +			break;
> >>  		}
> >>  	}
> >> +	rcu_read_unlock_bh();
> > 
> > Ditto.
> > 
> >> +
> >> +	if (addr && !addr->valid) {
> >> +		call_rcu_bh(&addr->rcu, sctp_local_addr_free);
> >> +		SCTP_DBG_OBJCNT_DEC(addr);
> >> +	}
> >>
> >>  	return -EINVAL;
> >>  }
> 
> >> @@ -325,27 +336,31 @@ union sctp_addr *sctp_find_unmatch_addr(struct sctp_bind_addr	*bp,
> >>  	union sctp_addr			*addr;
> >>  	void 				*addr_buf;
> >>  	struct sctp_af			*af;
> >> -	struct list_head		*pos;
> >>  	int				i;
> >>
> >> -	list_for_each(pos, &bp->address_list) {
> >> -		laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
> >> +	rcu_read_lock();
> >> +	list_for_each_entry_rcu(laddr, &bp->address_list, list) {
> >> +		if (!laddr->valid)
> >> +			continue;
> > 
> > Ditto...
> > 
> >>  		addr_buf = (union sctp_addr *)addrs;
> >>  		for (i = 0; i < addrcnt; i++) {
> >>  			addr = (union sctp_addr *)addr_buf;
> >>  			af = sctp_get_af_specific(addr->v4.sin_family);
> >>  			if (!af)
> >> -				return NULL;
> >> +				break;
> >>
> >>  			if (opt->pf->cmp_addr(&laddr->a, addr, opt))
> >>  				break;
> >>
> >>  			addr_buf += af->sockaddr_len;
> >>  		}
> >> -		if (i == addrcnt)
> >> +		if (i == addrcnt) {
> >> +			rcu_read_unlock();
> > 
> > Since rcu_read_unlock() just happened, some other CPU is free to
> > free up this data structure.  In a CONFIG_PREEMPT kernel (as well as a
> > CONFIG_PREEMPT_RT kernel, for that matter), this task might be preempted
> > at this point, and a full grace period might elapse.
> > 
> > In which case, the following statement returns a pointer to the freelist,
> > which is not good.
> 
> Hm...  my saving grace here is that this happens under the protection of the
> socket lock, so the rcu_read_lock() is again potentially redundant.  If it weren't
> for that socket lock, the original code would also have the same race.

OK, no problem then.  Of course, that means that the rcu_read_lock() is
redundant.  ;-)

> >>  			return &laddr->a;
> >> +		}
> >>  	}
> >> +	rcu_read_unlock();
> >>
> >>  	return NULL;
> >>  }
> 
> >> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> >> index 79856c9..caaa29f 100644
> >> --- a/net/sctp/sm_make_chunk.c
> >> +++ b/net/sctp/sm_make_chunk.c
> >> @@ -1531,7 +1531,7 @@ no_hmac:
> >>  	/* Also, add the destination address. */
> >>  	if (list_empty(&retval->base.bind_addr.address_list)) {
> >>  		sctp_add_bind_addr(&retval->base.bind_addr, &chunk->dest, 1,
> >> -				   GFP_ATOMIC);
> >> +				GFP_ATOMIC);
> >>  	}
> >>
> >>  	retval->next_tsn = retval->c.initial_tsn;
> >> @@ -2613,22 +2613,17 @@ static int sctp_asconf_param_success(struct sctp_association *asoc,
> >>
> >>  	switch (asconf_param->param_hdr.type) {
> >>  	case SCTP_PARAM_ADD_IP:
> >> -		sctp_local_bh_disable();
> >> -		sctp_write_lock(&asoc->base.addr_lock);
> >> -		list_for_each(pos, &bp->address_list) {
> >> -			saddr = list_entry(pos, struct sctp_sockaddr_entry, list);
> >> +		rcu_read_lock_bh();
> >> +		list_for_each_entry_rcu(saddr, &bp->address_list, list) {
> >> +			if (!saddr->valid)
> >> +				continue;
> >>  			if (sctp_cmp_addr_exact(&saddr->a, &addr))
> >>  				saddr->use_as_src = 1;
> >>  		}
> >> -		sctp_write_unlock(&asoc->base.addr_lock);
> >> -		sctp_local_bh_enable();
> >> +		rcu_read_unlock_bh();
> > 
> > If you use rcu_read_lock_bh() and rcu_read_unlock_bh() in one read path
> > for a given data structure, you need to use them in all the other read
> > paths for that data structure.  In addition, you must use call_rcu_bh()
> > when deleting the corresponding data elements.
> > 
> > The normal and the _bh RCU grace periods are unrelated, so mixing them
> > for a given RCU-protected data structure is a bad idea.  (Or are these
> > somehow two independent data structures?)
> 
> It's the same data structure, but we update some of its contents in softirq
> context.  Thankfully we don't change the list in this case.  As you can see,
> the code was calling local_bh_disable() before, so I was trying to preserve
> that condition.
> 
> This update also happens under the synchronization of the socket lock, so
> RCU may be redundant here.  It might be sufficient to simply
> call local_bh_disable() for this particular case.

You are free to invoke rcu_read_lock() under local_bh_disable() and vice
versa, if that helps.
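
For example, this sort of nesting is perfectly legal (schematic):

	local_bh_disable();		/* keeps the old BH exclusion */
	rcu_read_lock();		/* protects the list traversal */
	/* ... walk the RCU-protected address list ... */
	rcu_read_unlock();
	local_bh_enable();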

> >>  		break;
> >>  	case SCTP_PARAM_DEL_IP:
> >> -		sctp_local_bh_disable();
> >> -		sctp_write_lock(&asoc->base.addr_lock);
> >>  		retval = sctp_del_bind_addr(bp, &addr);
> >> -		sctp_write_unlock(&asoc->base.addr_lock);
> >> -		sctp_local_bh_enable();
> 
> This one is actually the one I worry more about.  If you look close to 
> the beginning of this patch at sctp_del_bind_addr(), you'll see that
> it also uses BH conventions for its RCU calls and uses call_rcu_bh()
> to free up the entry.
> 
> However, given what you said about the grace periods being unrelated, I
> am questioning that choice.
> 
> I think I'll need 2 versions of the del_bind_addr() call, depending on
> which list I am cleaning up.

Another approach is to chain the two callbacks.  For example, use
call_rcu() from del_bind_addr(), and have the resulting callback
invoke call_rcu_bh(), and then free up the structure from that
callback.  Like this:

	...

	call_rcu(p, del_bind_addr_rcu_1);

	void del_bind_addr_rcu_1(struct rcu_head *head)
	{
		call_rcu_bh(head, del_bind_addr_rcu_2);
	}

Then del_bind_addr_rcu_2() actually frees the referenced struct, as
you already have in your patch.  This works as follows:

o	Once del_bind_addr_rcu_1() is invoked, all pre-existing RCU
	read-side critical sections using rcu_read_lock() have completed.
	Of course, RCU read-side critical sections that started -after-
	the original call_rcu() in del_bind_addr() might still be in
	progress, but those RCU read-side critical sections cannot
	possibly gain a reference to the structure being freed.

o	Once del_bind_addr_rcu_2() is invoked, all pre-existing RCU
	read-side critical sections using rcu_read_lock_bh() have
	completed.  Since both types of RCU read-side critical sections
	have now been taken care of, it is now safe to free up the
	structure.

The downside is that everything goes through two grace periods rather
than one.  Your choice -- always use the same type of RCU read-side
critical section, or make updates go through the grace periods
corresponding to each type of RCU used by the readers.
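
Filled in for your structure, the chained callbacks might look like the
following sketch (I am assuming the entry has already been unlinked with
list_del_rcu() and marked ->valid = 0, as in your patch):

	static void del_bind_addr_rcu_2(struct rcu_head *head)
	{
		struct sctp_sockaddr_entry *addr =
			container_of(head, struct sctp_sockaddr_entry, rcu);

		/* Both grace periods have elapsed -- safe to free. */
		kfree(addr);
	}

	static void del_bind_addr_rcu_1(struct rcu_head *head)
	{
		/* rcu_read_lock() readers are done; now wait out the
		 * rcu_read_lock_bh() readers as well. */
		call_rcu_bh(head, del_bind_addr_rcu_2);
	}

	/* Updater: */
	call_rcu(&addr->rcu, del_bind_addr_rcu_1);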

> There is a list on the 'endpoint' structure that we add to and remove from
> only in user context.
> There is also a list on the 'association' structure that we add to in user
> context, but remove from in BH context only.  I think this one will
> need to be cleaned up using rcu_read_[un]lock_bh() and call_rcu_bh() calls, while
> the first one can use regular RCU calls.  Does that sound generally right?

OK, yes, if you have two different data structures, then it is OK to
protect one with rcu_read_lock() and the other with rcu_read_lock_bh().
If you have a common piece of code that cleans a combined linked structure
containing both types of data structures, you will indeed need to use
the corresponding call_rcu() variant on the data structure in question.

I suppose you could pass a flag to a common cleanup function to indicate
whether that common function should use call_rcu() or call_rcu_bh(),
or even pass in a pointer to the callback-registry function (call_rcu()
or call_rcu_bh()) itself.
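
The function-pointer variant might look roughly like this (hypothetical --
your sctp_del_bind_addr() takes no such argument today, and I am reusing
your sctp_local_addr_free() as the final callback):

	static int sctp_del_bind_addr_common(struct sctp_bind_addr *bp,
			union sctp_addr *del_addr,
			void (*reg)(struct rcu_head *head,
				    void (*func)(struct rcu_head *head)))
	{
		struct sctp_sockaddr_entry *addr, *temp;

		list_for_each_entry_safe(addr, temp, &bp->address_list, list) {
			if (!sctp_cmp_addr_exact(&addr->a, del_addr))
				continue;
			addr->valid = 0;
			list_del_rcu(&addr->list);
			reg(&addr->rcu, sctp_local_addr_free);
			return 0;
		}
		return -EINVAL;
	}

	/* endpoint list -- user context only: */
	sctp_del_bind_addr_common(bp, del_addr, call_rcu);

	/* association list -- deleted from BH context: */
	sctp_del_bind_addr_common(bp, del_addr, call_rcu_bh);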

> >>  		list_for_each(pos, &asoc->peer.transport_addr_list) {
> >>  			transport = list_entry(pos, struct sctp_transport,
> >>  						 transports);
> >> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> >> index a3acf78..35cc30c 100644
> >> --- a/net/sctp/socket.c
> >> +++ b/net/sctp/socket.c
> >> @@ -367,14 +367,10 @@ SCTP_STATIC int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
> >>  	if (!bp->port)
> >>  		bp->port = inet_sk(sk)->num;
> >>
> >> -	/* Add the address to the bind address list.  */
> >> -	sctp_local_bh_disable();
> >> -	sctp_write_lock(&ep->base.addr_lock);
> >> -
> >> -	/* Use GFP_ATOMIC since BHs are disabled.  */
> >> +	/* Add the address to the bind address list.
> >> +	 * Use GFP_ATOMIC since BHs will be disabled.
> >> +	 */
> >>  	ret = sctp_add_bind_addr(bp, addr, 1, GFP_ATOMIC);
> >> -	sctp_write_unlock(&ep->base.addr_lock);
> >> -	sctp_local_bh_enable();
> >>
> >>  	/* Copy back into socket for getsockname() use. */
> >>  	if (!ret) {
> >> @@ -497,7 +493,6 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
> >>  	void				*addr_buf;
> >>  	struct sctp_af			*af;
> >>  	struct list_head		*pos;
> >> -	struct list_head		*p;
> >>  	int 				i;
> >>  	int 				retval = 0;
> >>
> >> @@ -544,14 +539,15 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
> >>  		if (i < addrcnt)
> >>  			continue;
> >>
> >> -		/* Use the first address in bind addr list of association as
> >> -		 * Address Parameter of ASCONF CHUNK.
> >> +		/* Use the first valid address in bind addr list of
> >> +		 * association as Address Parameter of ASCONF CHUNK.
> >>  		 */
> >> -		sctp_read_lock(&asoc->base.addr_lock);
> >>  		bp = &asoc->base.bind_addr;
> >> -		p = bp->address_list.next;
> >> -		laddr = list_entry(p, struct sctp_sockaddr_entry, list);
> >> -		sctp_read_unlock(&asoc->base.addr_lock);
> >> +		rcu_read_lock();
> >> +		list_for_each_entry_rcu(laddr, &bp->address_list, list)
> >> +			if (laddr->valid)
> >> +				break;
> >> +		rcu_read_unlock();
> > 
> > Here you are carrying an RCU-protected data item (*laddr) outside of an
> > rcu_read_lock()/rcu_read_unlock() pair.  This is not good -- you need
> > to move the rcu_read_unlock() farther down to cover the full extent of
> > the uses of the laddr pointer.
> > 
> > Again, RCU is within its rights allowing a grace period to elapse, so
> > that past this point, laddr might well point into the freelist.
> > 
> 
> This is another area where we may not even need RCU locking, since we
> are already holding a socket lock that guarantees that the list will not change.
> This was a blind conversion on my part, and I can probably restore the simple
> list_entry() dereference there.
> 
> <snip the rest, there were no comments>

Fair enough!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2007-09-12 16:50 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-09-10 19:46 [RFC PATH 0/2] Add RCU locking to SCTP address management Vlad Yasevich
2007-09-10 19:46 ` [RFC PATCH 1/2] SCTP: Add RCU synchronization around sctp_localaddr_list Vlad Yasevich
2007-09-10 21:47   ` Paul E. McKenney
2007-09-11  7:24     ` Sridhar Samudrala
2007-09-11 14:05       ` Vlad Yasevich
2007-09-12 15:20         ` Paul E. McKenney
2007-09-10 19:46 ` [RFC PATCH 2/2] SCTP: Convert bind_addr_list locking to RCU Vlad Yasevich
2007-09-10 22:08   ` Paul E. McKenney
2007-09-11 14:56     ` Vlad Yasevich
2007-09-12 16:50       ` Paul E. McKenney
