netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 00/13] RDS bugfixes and cleanups
@ 2010-03-11 23:49 Andy Grover
  2010-03-11 23:49 ` [PATCH 01/13] RDS: Do not BUG() on error returned from ib_post_send Andy Grover
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: Andy Grover @ 2010-03-11 23:49 UTC (permalink / raw)
  To: netdev; +Cc: rds-devel

Hi Dave,

These patches are against net-next, please apply.

Thanks -- Andy


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 01/13] RDS: Do not BUG() on error returned from ib_post_send
  2010-03-11 23:49 [PATCH net-next 00/13] RDS bugfixes and cleanups Andy Grover
@ 2010-03-11 23:49 ` Andy Grover
  2010-03-11 23:49 ` [PATCH 02/13] RDS: sendmsg() should check sndtimeo, not rcvtimeo Andy Grover
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Andy Grover @ 2010-03-11 23:49 UTC (permalink / raw)
  To: netdev; +Cc: rds-devel

BUGging on a runtime error code should be avoided. This
patch also eliminates all other BUG()s that have no real
reason to exist.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
---
 net/rds/ib_recv.c        |    4 ++--
 net/rds/ib_send.c        |    7 +++----
 net/rds/iw_recv.c        |    4 ++--
 net/rds/iw_send.c        |    3 +--
 net/rds/rdma_transport.c |    3 +--
 5 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
index 04dc0d3..c338881 100644
--- a/net/rds/ib_recv.c
+++ b/net/rds/ib_recv.c
@@ -468,8 +468,8 @@ static void rds_ib_send_ack(struct rds_ib_connection *ic, unsigned int adv_credi
 		set_bit(IB_ACK_REQUESTED, &ic->i_ack_flags);
 
 		rds_ib_stats_inc(s_ib_ack_send_failure);
-		/* Need to finesse this later. */
-		BUG();
+
+		rds_ib_conn_error(ic->conn, "sending ack failed\n");
 	} else
 		rds_ib_stats_inc(s_ib_ack_sent);
 }
diff --git a/net/rds/ib_send.c b/net/rds/ib_send.c
index a10fab6..f380c3f 100644
--- a/net/rds/ib_send.c
+++ b/net/rds/ib_send.c
@@ -574,8 +574,7 @@ int rds_ib_xmit(struct rds_connection *conn, struct rds_message *rm,
 		rds_ib_send_grab_credits(ic, 0, &posted, 1, RDS_MAX_ADV_CREDIT - adv_credits);
 		adv_credits += posted;
 		BUG_ON(adv_credits > 255);
-	} else if (ic->i_rm != rm)
-		BUG();
+	}
 
 	send = &ic->i_sends[pos];
 	first = send;
@@ -714,8 +713,8 @@ add_header:
 			ic->i_rm = prev->s_rm;
 			prev->s_rm = NULL;
 		}
-		/* Finesse this later */
-		BUG();
+
+		rds_ib_conn_error(ic->conn, "ib_post_send failed\n");
 		goto out;
 	}
 
diff --git a/net/rds/iw_recv.c b/net/rds/iw_recv.c
index 54af7d6..337e4e5 100644
--- a/net/rds/iw_recv.c
+++ b/net/rds/iw_recv.c
@@ -468,8 +468,8 @@ static void rds_iw_send_ack(struct rds_iw_connection *ic, unsigned int adv_credi
 		set_bit(IB_ACK_REQUESTED, &ic->i_ack_flags);
 
 		rds_iw_stats_inc(s_iw_ack_send_failure);
-		/* Need to finesse this later. */
-		BUG();
+
+		rds_iw_conn_error(ic->conn, "sending ack failed\n");
 	} else
 		rds_iw_stats_inc(s_iw_ack_sent);
 }
diff --git a/net/rds/iw_send.c b/net/rds/iw_send.c
index 1379e9d..52182ff 100644
--- a/net/rds/iw_send.c
+++ b/net/rds/iw_send.c
@@ -616,8 +616,7 @@ int rds_iw_xmit(struct rds_connection *conn, struct rds_message *rm,
 		rds_iw_send_grab_credits(ic, 0, &posted, 1, RDS_MAX_ADV_CREDIT - adv_credits);
 		adv_credits += posted;
 		BUG_ON(adv_credits > 255);
-	} else if (ic->i_rm != rm)
-		BUG();
+	}
 
 	send = &ic->i_sends[pos];
 	first = send;
diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
index 9ece910..31f9c72 100644
--- a/net/rds/rdma_transport.c
+++ b/net/rds/rdma_transport.c
@@ -109,8 +109,7 @@ int rds_rdma_cm_event_handler(struct rdma_cm_id *cm_id,
 
 	default:
 		/* things like device disconnect? */
-		printk(KERN_ERR "unknown event %u\n", event->event);
-		BUG();
+		printk(KERN_ERR "RDS: unknown event %u!\n", event->event);
 		break;
 	}
 
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 02/13] RDS: sendmsg() should check sndtimeo, not rcvtimeo
  2010-03-11 23:49 [PATCH net-next 00/13] RDS bugfixes and cleanups Andy Grover
  2010-03-11 23:49 ` [PATCH 01/13] RDS: Do not BUG() on error returned from ib_post_send Andy Grover
@ 2010-03-11 23:49 ` Andy Grover
  2010-03-11 23:49 ` [PATCH 03/13] RDS: update copy_to_user state in tcp transport Andy Grover
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Andy Grover @ 2010-03-11 23:49 UTC (permalink / raw)
  To: netdev; +Cc: rds-devel

Most likely cut n paste error - sendmsg() was checking sock_rcvtimeo.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
---
 net/rds/send.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/rds/send.c b/net/rds/send.c
index b2fccfc..ad2e469 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -815,7 +815,7 @@ int rds_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 	int ret = 0;
 	int queued = 0, allocated_mr = 0;
 	int nonblock = msg->msg_flags & MSG_DONTWAIT;
-	long timeo = sock_rcvtimeo(sk, nonblock);
+	long timeo = sock_sndtimeo(sk, nonblock);
 
 	/* Mirror Linux UDP mirror of BSD error message compatibility */
 	/* XXX: Perhaps MSG_MORE someday */
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 03/13] RDS: update copy_to_user state in tcp transport
  2010-03-11 23:49 [PATCH net-next 00/13] RDS bugfixes and cleanups Andy Grover
  2010-03-11 23:49 ` [PATCH 01/13] RDS: Do not BUG() on error returned from ib_post_send Andy Grover
  2010-03-11 23:49 ` [PATCH 02/13] RDS: sendmsg() should check sndtimeo, not rcvtimeo Andy Grover
@ 2010-03-11 23:49 ` Andy Grover
  2010-03-11 23:49 ` [PATCH 04/13] RDS/TCP: Wait to wake thread when write space available Andy Grover
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Andy Grover @ 2010-03-11 23:49 UTC (permalink / raw)
  To: netdev; +Cc: rds-devel

Other transports use rds_page_copy_user, which updates our
s_copy_to_user counter. TCP doesn't, so it needs to explicity
call rds_stats_add().

Reported-by: Richard Frank <richard.frank@oracle.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
---
 net/rds/tcp_recv.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/rds/tcp_recv.c b/net/rds/tcp_recv.c
index c00daff..40bfcf8 100644
--- a/net/rds/tcp_recv.c
+++ b/net/rds/tcp_recv.c
@@ -97,6 +97,7 @@ int rds_tcp_inc_copy_to_user(struct rds_incoming *inc, struct iovec *first_iov,
 				goto out;
 			}
 
+			rds_stats_add(s_copy_to_user, to_copy);
 			size -= to_copy;
 			ret += to_copy;
 			skb_off += to_copy;
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 04/13] RDS/TCP: Wait to wake thread when write space available
  2010-03-11 23:49 [PATCH net-next 00/13] RDS bugfixes and cleanups Andy Grover
                   ` (2 preceding siblings ...)
  2010-03-11 23:49 ` [PATCH 03/13] RDS: update copy_to_user state in tcp transport Andy Grover
@ 2010-03-11 23:49 ` Andy Grover
  2010-03-11 23:49 ` [PATCH 05/13] RDS: Fix congestion issues for loopback Andy Grover
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Andy Grover @ 2010-03-11 23:49 UTC (permalink / raw)
  To: netdev; +Cc: rds-devel

Instead of waking the send thread whenever any send space is available,
wait until it is at least half empty. This is modeled on how
sock_def_write_space() does it, and may help to minimize context
switches.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
---
 net/rds/tcp_send.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/net/rds/tcp_send.c b/net/rds/tcp_send.c
index 34fdcc0..a28b895 100644
--- a/net/rds/tcp_send.c
+++ b/net/rds/tcp_send.c
@@ -240,7 +240,9 @@ void rds_tcp_write_space(struct sock *sk)
 	tc->t_last_seen_una = rds_tcp_snd_una(tc);
 	rds_send_drop_acked(conn, rds_tcp_snd_una(tc), rds_tcp_is_acked);
 
-	queue_delayed_work(rds_wq, &conn->c_send_w, 0);
+        if ((atomic_read(&sk->sk_wmem_alloc) << 1) <= sk->sk_sndbuf)
+		queue_delayed_work(rds_wq, &conn->c_send_w, 0);
+
 out:
 	read_unlock(&sk->sk_callback_lock);
 
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 05/13] RDS: Fix congestion issues for loopback
  2010-03-11 23:49 [PATCH net-next 00/13] RDS bugfixes and cleanups Andy Grover
                   ` (3 preceding siblings ...)
  2010-03-11 23:49 ` [PATCH 04/13] RDS/TCP: Wait to wake thread when write space available Andy Grover
@ 2010-03-11 23:49 ` Andy Grover
  2010-03-11 23:50 ` [PATCH 06/13] RDS: Fix send locking issue Andy Grover
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Andy Grover @ 2010-03-11 23:49 UTC (permalink / raw)
  To: netdev; +Cc: rds-devel

We have two kinds of loopback: software (via loop transport)
and hardware (via IB). sw is used for 127.0.0.1, and doesn't
support rdma ops. hw is used for sends to local device IPs,
and supports rdma. Both are used in different cases.

For both of these, when there is a congestion map update, we
want to call rds_cong_map_updated() but not actually send
anything -- since loopback local and foreign congestion maps
point to the same spot, they're already in sync.

The old code never called sw loop's xmit_cong_map(),so
rds_cong_map_updated() wasn't being called for it. sw loop
ports would not work right with the congestion monitor.

Fixing that meant that hw loopback now would send congestion maps
to itself. This is also undesirable (racy), so we check for this
case in the ib-specific xmit code.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
---
 net/rds/cong.c    |    2 --
 net/rds/ib_send.c |    7 +++++++
 net/rds/loop.c    |    7 -------
 3 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/net/rds/cong.c b/net/rds/cong.c
index 6d06cac..dd2711d 100644
--- a/net/rds/cong.c
+++ b/net/rds/cong.c
@@ -218,8 +218,6 @@ void rds_cong_queue_updates(struct rds_cong_map *map)
 	spin_lock_irqsave(&rds_cong_lock, flags);
 
 	list_for_each_entry(conn, &map->m_conn_list, c_map_item) {
-		if (conn->c_loopback)
-			continue;
 		if (!test_and_set_bit(0, &conn->c_map_queued)) {
 			rds_stats_inc(s_cong_update_queued);
 			queue_delayed_work(rds_wq, &conn->c_send_w, 0);
diff --git a/net/rds/ib_send.c b/net/rds/ib_send.c
index f380c3f..c18228a 100644
--- a/net/rds/ib_send.c
+++ b/net/rds/ib_send.c
@@ -482,6 +482,13 @@ int rds_ib_xmit(struct rds_connection *conn, struct rds_message *rm,
 	BUG_ON(off % RDS_FRAG_SIZE);
 	BUG_ON(hdr_off != 0 && hdr_off != sizeof(struct rds_header));
 
+	/* Do not send cong updates to IB loopback */
+	if (conn->c_loopback
+	    && rm->m_inc.i_hdr.h_flags & RDS_FLAG_CONG_BITMAP) {
+		rds_cong_map_updated(conn->c_fcong, ~(u64) 0);
+		return sizeof(struct rds_header) + RDS_CONG_MAP_BYTES;
+	}
+
 	/* FIXME we may overallocate here */
 	if (be32_to_cpu(rm->m_inc.i_hdr.h_len) == 0)
 		i = 1;
diff --git a/net/rds/loop.c b/net/rds/loop.c
index 4a61997..93a45f1 100644
--- a/net/rds/loop.c
+++ b/net/rds/loop.c
@@ -80,16 +80,9 @@ static int rds_loop_xmit_cong_map(struct rds_connection *conn,
 				  struct rds_cong_map *map,
 				  unsigned long offset)
 {
-	unsigned long i;
-
 	BUG_ON(offset);
 	BUG_ON(map != conn->c_lcong);
 
-	for (i = 0; i < RDS_CONG_MAP_PAGES; i++) {
-		memcpy((void *)conn->c_fcong->m_page_addrs[i],
-		       (void *)map->m_page_addrs[i], PAGE_SIZE);
-	}
-
 	rds_cong_map_updated(conn->c_fcong, ~(u64) 0);
 
 	return sizeof(struct rds_header) + RDS_CONG_MAP_BYTES;
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 06/13] RDS: Fix send locking issue
  2010-03-11 23:49 [PATCH net-next 00/13] RDS bugfixes and cleanups Andy Grover
                   ` (4 preceding siblings ...)
  2010-03-11 23:49 ` [PATCH 05/13] RDS: Fix congestion issues for loopback Andy Grover
@ 2010-03-11 23:50 ` Andy Grover
  2010-03-11 23:50 ` [PATCH 07/13] RDS: Workaround for in-use MRs on close causing crash Andy Grover
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Andy Grover @ 2010-03-11 23:50 UTC (permalink / raw)
  To: netdev; +Cc: rds-devel, Tina Yang

From: Tina Yang <Tina.Yang@oracle.com>

Fix a deadlock between rds_rdma_send_complete() and
rds_send_remove_from_sock() when rds socket lock and
rds message lock are acquired out-of-order.

Signed-off-by: Tina Yang <Tina.Yang@oracle.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
---
 net/rds/send.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/rds/send.c b/net/rds/send.c
index ad2e469..1a2ef24 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -533,14 +533,13 @@ void rds_send_remove_from_sock(struct list_head *messages, int status)
 
 		if (rs != rm->m_rs) {
 			if (rs) {
-				spin_unlock(&rs->rs_lock);
 				rds_wake_sk_sleep(rs);
 				sock_put(rds_rs_to_sk(rs));
 			}
 			rs = rm->m_rs;
-			spin_lock(&rs->rs_lock);
 			sock_hold(rds_rs_to_sk(rs));
 		}
+		spin_lock(&rs->rs_lock);
 
 		if (test_and_clear_bit(RDS_MSG_ON_SOCK, &rm->m_flags)) {
 			struct rds_rdma_op *ro = rm->m_rdma_op;
@@ -560,6 +559,7 @@ void rds_send_remove_from_sock(struct list_head *messages, int status)
 			rds_message_put(rm);
 			rm->m_rs = NULL;
 		}
+		spin_unlock(&rs->rs_lock);
 
 unlock_and_drop:
 		spin_unlock(&rm->m_rs_lock);
@@ -567,7 +567,6 @@ unlock_and_drop:
 	}
 
 	if (rs) {
-		spin_unlock(&rs->rs_lock);
 		rds_wake_sk_sleep(rs);
 		sock_put(rds_rs_to_sk(rs));
 	}
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 07/13] RDS: Workaround for in-use MRs on close causing crash
  2010-03-11 23:49 [PATCH net-next 00/13] RDS bugfixes and cleanups Andy Grover
                   ` (5 preceding siblings ...)
  2010-03-11 23:50 ` [PATCH 06/13] RDS: Fix send locking issue Andy Grover
@ 2010-03-11 23:50 ` Andy Grover
  2010-03-11 23:50 ` [PATCH 08/13] RDS: Turn down alarming reconnect messages Andy Grover
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Andy Grover @ 2010-03-11 23:50 UTC (permalink / raw)
  To: netdev; +Cc: rds-devel

if a machine is shut down without closing sockets properly, and
freeing all MRs, then a BUG_ON will bring it down. This patch
changes these to WARN_ONs -- leaking MRs is not fatal (although
not ideal, and there is more work to do here for a proper fix.)

Signed-off-by: Andy Grover <andy.grover@oracle.com>
---
 net/rds/ib_rdma.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
index 4b0da86..65e668d 100644
--- a/net/rds/ib_rdma.c
+++ b/net/rds/ib_rdma.c
@@ -234,8 +234,8 @@ void rds_ib_destroy_mr_pool(struct rds_ib_mr_pool *pool)
 {
 	flush_workqueue(rds_wq);
 	rds_ib_flush_mr_pool(pool, 1);
-	BUG_ON(atomic_read(&pool->item_count));
-	BUG_ON(atomic_read(&pool->free_pinned));
+	WARN_ON(atomic_read(&pool->item_count));
+	WARN_ON(atomic_read(&pool->free_pinned));
 	kfree(pool);
 }
 
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 08/13] RDS: Turn down alarming reconnect messages
  2010-03-11 23:49 [PATCH net-next 00/13] RDS bugfixes and cleanups Andy Grover
                   ` (6 preceding siblings ...)
  2010-03-11 23:50 ` [PATCH 07/13] RDS: Workaround for in-use MRs on close causing crash Andy Grover
@ 2010-03-11 23:50 ` Andy Grover
  2010-03-11 23:50 ` [PATCH 09/13] RDS: Fix locking in rds_send_drop_to() Andy Grover
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Andy Grover @ 2010-03-11 23:50 UTC (permalink / raw)
  To: netdev; +Cc: rds-devel

RDS's error messages when a connection goes down are a little
extreme. A connection may go down, and it will be re-established,
and everything is fine. This patch links these messages through
rdsdebug(), instead of to printk directly.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
---
 net/rds/ib_cm.c          |    3 ++-
 net/rds/iw_cm.c          |    4 +++-
 net/rds/rdma_transport.c |    2 +-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index 647cb8f..e1f124b 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -203,9 +203,10 @@ static void rds_ib_qp_event_handler(struct ib_event *event, void *data)
 		rdma_notify(ic->i_cm_id, IB_EVENT_COMM_EST);
 		break;
 	default:
-		rds_ib_conn_error(conn, "RDS/IB: Fatal QP Event %u "
+		rdsdebug("Fatal QP Event %u "
 			"- connection %pI4->%pI4, reconnecting\n",
 			event->event, &conn->c_laddr, &conn->c_faddr);
+		rds_conn_drop(conn);
 		break;
 	}
 }
diff --git a/net/rds/iw_cm.c b/net/rds/iw_cm.c
index 394cf6b..6bc638f 100644
--- a/net/rds/iw_cm.c
+++ b/net/rds/iw_cm.c
@@ -156,9 +156,11 @@ static void rds_iw_qp_event_handler(struct ib_event *event, void *data)
 	case IB_EVENT_QP_REQ_ERR:
 	case IB_EVENT_QP_FATAL:
 	default:
-		rds_iw_conn_error(conn, "RDS/IW: Fatal QP Event %u - connection %pI4->%pI4...reconnecting\n",
+		rdsdebug("Fatal QP Event %u "
+			"- connection %pI4->%pI4, reconnecting\n",
 			event->event, &conn->c_laddr,
 			&conn->c_faddr);
+		rds_conn_drop(conn);
 		break;
 	}
 }
diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
index 31f9c72..5ea82fc 100644
--- a/net/rds/rdma_transport.c
+++ b/net/rds/rdma_transport.c
@@ -101,7 +101,7 @@ int rds_rdma_cm_event_handler(struct rdma_cm_id *cm_id,
 		break;
 
 	case RDMA_CM_EVENT_DISCONNECTED:
-		printk(KERN_WARNING "RDS/RDMA: DISCONNECT event - dropping connection "
+		rdsdebug("DISCONNECT event - dropping connection "
 			"%pI4->%pI4\n", &conn->c_laddr,
 			 &conn->c_faddr);
 		rds_conn_drop(conn);
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 09/13] RDS: Fix locking in rds_send_drop_to()
  2010-03-11 23:49 [PATCH net-next 00/13] RDS bugfixes and cleanups Andy Grover
                   ` (7 preceding siblings ...)
  2010-03-11 23:50 ` [PATCH 08/13] RDS: Turn down alarming reconnect messages Andy Grover
@ 2010-03-11 23:50 ` Andy Grover
  2010-03-11 23:50 ` [PATCH 10/13] RDS: only put sockets that have seen congestion on the poll_waitq Andy Grover
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Andy Grover @ 2010-03-11 23:50 UTC (permalink / raw)
  To: netdev; +Cc: rds-devel, Tina Yang

From: Tina Yang <tina.yang@oracle.com>

It seems rds_send_drop_to() called
__rds_rdma_send_complete(rs, rm, RDS_RDMA_CANCELED)
with only rds_sock lock, but not rds_message lock. It raced with
other threads that is attempting to modify the rds_message as well,
such as from within rds_rdma_send_complete().

Signed-off-by: Tina Yang <tina.yang@oracle.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
---
 net/rds/send.c |   13 +++++++------
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/rds/send.c b/net/rds/send.c
index 1a2ef24..192a480 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -632,9 +632,6 @@ void rds_send_drop_to(struct rds_sock *rs, struct sockaddr_in *dest)
 		list_move(&rm->m_sock_item, &list);
 		rds_send_sndbuf_remove(rs, rm);
 		clear_bit(RDS_MSG_ON_SOCK, &rm->m_flags);
-
-		/* If this is a RDMA operation, notify the app. */
-		__rds_rdma_send_complete(rs, rm, RDS_RDMA_CANCELED);
 	}
 
 	/* order flag updates with the rs lock */
@@ -643,9 +640,6 @@ void rds_send_drop_to(struct rds_sock *rs, struct sockaddr_in *dest)
 
 	spin_unlock_irqrestore(&rs->rs_lock, flags);
 
-	if (wake)
-		rds_wake_sk_sleep(rs);
-
 	conn = NULL;
 
 	/* now remove the messages from the conn list as needed */
@@ -653,6 +647,10 @@ void rds_send_drop_to(struct rds_sock *rs, struct sockaddr_in *dest)
 		/* We do this here rather than in the loop above, so that
 		 * we don't have to nest m_rs_lock under rs->rs_lock */
 		spin_lock_irqsave(&rm->m_rs_lock, flags2);
+		/* If this is a RDMA operation, notify the app. */
+		spin_lock(&rs->rs_lock);
+		__rds_rdma_send_complete(rs, rm, RDS_RDMA_CANCELED);
+		spin_unlock(&rs->rs_lock);
 		rm->m_rs = NULL;
 		spin_unlock_irqrestore(&rm->m_rs_lock, flags2);
 
@@ -681,6 +679,9 @@ void rds_send_drop_to(struct rds_sock *rs, struct sockaddr_in *dest)
 	if (conn)
 		spin_unlock_irqrestore(&conn->c_lock, flags);
 
+	if (wake)
+		rds_wake_sk_sleep(rs);
+
 	while (!list_empty(&list)) {
 		rm = list_entry(list.next, struct rds_message, m_sock_item);
 		list_del_init(&rm->m_sock_item);
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 10/13] RDS: only put sockets that have seen congestion on the poll_waitq
  2010-03-11 23:49 [PATCH net-next 00/13] RDS bugfixes and cleanups Andy Grover
                   ` (8 preceding siblings ...)
  2010-03-11 23:50 ` [PATCH 09/13] RDS: Fix locking in rds_send_drop_to() Andy Grover
@ 2010-03-11 23:50 ` Andy Grover
  2010-03-11 23:50 ` [PATCH 11/13] RDS: Properly unmap when getting a remote access error Andy Grover
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Andy Grover @ 2010-03-11 23:50 UTC (permalink / raw)
  To: netdev; +Cc: rds-devel

rds_poll_waitq's listeners will be awoken if we receive a congestion
notification. Bad performance may result because *all* polled sockets
contend for this single lock. However, it should not be necessary to
wake pollers when a congestion update arrives if they have never
experienced congestion, and not putting these on the waitq will
hopefully greatly reduce contention.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
---
 net/rds/af_rds.c |    7 ++++++-
 net/rds/rds.h    |    2 ++
 net/rds/send.c   |    4 +++-
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index 853c52b..937ecda 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -159,7 +159,8 @@ static unsigned int rds_poll(struct file *file, struct socket *sock,
 
 	poll_wait(file, sk->sk_sleep, wait);
 
-	poll_wait(file, &rds_poll_waitq, wait);
+	if (rs->rs_seen_congestion)
+		poll_wait(file, &rds_poll_waitq, wait);
 
 	read_lock_irqsave(&rs->rs_recv_lock, flags);
 	if (!rs->rs_cong_monitor) {
@@ -181,6 +182,10 @@ static unsigned int rds_poll(struct file *file, struct socket *sock,
 		mask |= (POLLOUT | POLLWRNORM);
 	read_unlock_irqrestore(&rs->rs_recv_lock, flags);
 
+	/* clear state any time we wake a seen-congested socket */
+	if (mask)
+		rs->rs_seen_congestion = 0;
+
 	return mask;
 }
 
diff --git a/net/rds/rds.h b/net/rds/rds.h
index 85d6f89..4bec6e2 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -388,6 +388,8 @@ struct rds_sock {
 
 	/* flag indicating we were congested or not */
 	int			rs_congested;
+	/* seen congestion (ENOBUFS) when sending? */
+	int			rs_seen_congestion;
 
 	/* rs_lock protects all these adjacent members before the newline */
 	spinlock_t		rs_lock;
diff --git a/net/rds/send.c b/net/rds/send.c
index 192a480..51e2def 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -894,8 +894,10 @@ int rds_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 		queue_delayed_work(rds_wq, &conn->c_conn_w, 0);
 
 	ret = rds_cong_wait(conn->c_fcong, dport, nonblock, rs);
-	if (ret)
+	if (ret) {
+		rs->rs_seen_congestion = 1;
 		goto out;
+	}
 
 	while (!rds_send_queue_rm(rs, conn, rm, rs->rs_bound_port,
 				  dport, &queued)) {
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 11/13] RDS: Properly unmap when getting a remote access error
  2010-03-11 23:49 [PATCH net-next 00/13] RDS bugfixes and cleanups Andy Grover
                   ` (9 preceding siblings ...)
  2010-03-11 23:50 ` [PATCH 10/13] RDS: only put sockets that have seen congestion on the poll_waitq Andy Grover
@ 2010-03-11 23:50 ` Andy Grover
  2010-03-11 23:50 ` [PATCH 12/13] RDS: Do not call set_page_dirty() with irqs off Andy Grover
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Andy Grover @ 2010-03-11 23:50 UTC (permalink / raw)
  To: netdev; +Cc: rds-devel, Sherman Pun

From: Sherman Pun <sherman.pun@sun.com>

If the RDMA op has aborted with a remote access error,
in addition to what we already do (tell userspace it has
completed with an error) also unmap it and put() the rm.

Otherwise, hangs may occur on arches that track maps and
will not exit without proper cleanup.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
---
 net/rds/ib_send.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/net/rds/ib_send.c b/net/rds/ib_send.c
index c18228a..17fa808 100644
--- a/net/rds/ib_send.c
+++ b/net/rds/ib_send.c
@@ -243,8 +243,12 @@ void rds_ib_send_cq_comp_handler(struct ib_cq *cq, void *context)
 				struct rds_message *rm;
 
 				rm = rds_send_get_message(conn, send->s_op);
-				if (rm)
+				if (rm) {
+					if (rm->m_rdma_op)
+						rds_ib_send_unmap_rdma(ic, rm->m_rdma_op);
 					rds_ib_send_rdma_complete(rm, wc.status);
+					rds_message_put(rm);
+				}
 			}
 
 			oldest = (oldest + 1) % ic->i_send_ring.w_nr;
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 12/13] RDS: Do not call set_page_dirty() with irqs off
  2010-03-11 23:49 [PATCH net-next 00/13] RDS bugfixes and cleanups Andy Grover
                   ` (10 preceding siblings ...)
  2010-03-11 23:50 ` [PATCH 11/13] RDS: Properly unmap when getting a remote access error Andy Grover
@ 2010-03-11 23:50 ` Andy Grover
  2010-03-11 23:50 ` [PATCH 13/13] RDS: Enable per-cpu workqueue threads Andy Grover
  2010-03-17  4:18 ` [PATCH net-next 00/13] RDS bugfixes and cleanups David Miller
  13 siblings, 0 replies; 15+ messages in thread
From: Andy Grover @ 2010-03-11 23:50 UTC (permalink / raw)
  To: netdev; +Cc: rds-devel

set_page_dirty() unconditionally re-enables interrupts, so
if we call it with irqs off, they will be on after the call,
and that's bad. This patch moves the call after we've re-enabled
interrupts in send_drop_to(), so it's safe.

Also, add BUG_ONs to let us know if we ever do call set_page_dirty
with interrupts off.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
---
 net/rds/ib_rdma.c |    1 +
 net/rds/rdma.c    |    4 +++-
 net/rds/send.c    |   14 ++++++++------
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
index 65e668d..cfb1d90 100644
--- a/net/rds/ib_rdma.c
+++ b/net/rds/ib_rdma.c
@@ -440,6 +440,7 @@ static void __rds_ib_teardown_mr(struct rds_ib_mr *ibmr)
 
 			/* FIXME we need a way to tell a r/w MR
 			 * from a r/o MR */
+			BUG_ON(in_interrupt());
 			set_page_dirty(page);
 			put_page(page);
 		}
diff --git a/net/rds/rdma.c b/net/rds/rdma.c
index 4c64daa..61b359d 100644
--- a/net/rds/rdma.c
+++ b/net/rds/rdma.c
@@ -438,8 +438,10 @@ void rds_rdma_free_op(struct rds_rdma_op *ro)
 		/* Mark page dirty if it was possibly modified, which
 		 * is the case for a RDMA_READ which copies from remote
 		 * to local memory */
-		if (!ro->r_write)
+		if (!ro->r_write) {
+			BUG_ON(in_interrupt());
 			set_page_dirty(page);
+		}
 		put_page(page);
 	}
 
diff --git a/net/rds/send.c b/net/rds/send.c
index 51e2def..4629a0b 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -507,12 +507,13 @@ EXPORT_SYMBOL_GPL(rds_send_get_message);
  */
 void rds_send_remove_from_sock(struct list_head *messages, int status)
 {
-	unsigned long flags = 0; /* silence gcc :P */
+	unsigned long flags;
 	struct rds_sock *rs = NULL;
 	struct rds_message *rm;
 
-	local_irq_save(flags);
 	while (!list_empty(messages)) {
+		int was_on_sock = 0;
+
 		rm = list_entry(messages->next, struct rds_message,
 				m_conn_item);
 		list_del_init(&rm->m_conn_item);
@@ -527,7 +528,7 @@ void rds_send_remove_from_sock(struct list_head *messages, int status)
 		 * while we're messing with it. It does not prevent the
 		 * message from being removed from the socket, though.
 		 */
-		spin_lock(&rm->m_rs_lock);
+		spin_lock_irqsave(&rm->m_rs_lock, flags);
 		if (!test_bit(RDS_MSG_ON_SOCK, &rm->m_flags))
 			goto unlock_and_drop;
 
@@ -556,21 +557,22 @@ void rds_send_remove_from_sock(struct list_head *messages, int status)
 					notifier->n_status = status;
 				rm->m_rdma_op->r_notifier = NULL;
 			}
-			rds_message_put(rm);
+			was_on_sock = 1;
 			rm->m_rs = NULL;
 		}
 		spin_unlock(&rs->rs_lock);
 
 unlock_and_drop:
-		spin_unlock(&rm->m_rs_lock);
+		spin_unlock_irqrestore(&rm->m_rs_lock, flags);
 		rds_message_put(rm);
+		if (was_on_sock)
+			rds_message_put(rm);
 	}
 
 	if (rs) {
 		rds_wake_sk_sleep(rs);
 		sock_put(rds_rs_to_sk(rs));
 	}
-	local_irq_restore(flags);
 }
 
 /*
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 13/13] RDS: Enable per-cpu workqueue threads
  2010-03-11 23:49 [PATCH net-next 00/13] RDS bugfixes and cleanups Andy Grover
                   ` (11 preceding siblings ...)
  2010-03-11 23:50 ` [PATCH 12/13] RDS: Do not call set_page_dirty() with irqs off Andy Grover
@ 2010-03-11 23:50 ` Andy Grover
  2010-03-17  4:18 ` [PATCH net-next 00/13] RDS bugfixes and cleanups David Miller
  13 siblings, 0 replies; 15+ messages in thread
From: Andy Grover @ 2010-03-11 23:50 UTC (permalink / raw)
  To: netdev; +Cc: rds-devel, Tina Yang

From: Tina Yang <tina.yang@oracle.com>

Create per-cpu workqueue threads instead of a single
krdsd thread. This is a step towards better scalability.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
---
 net/rds/threads.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/rds/threads.c b/net/rds/threads.c
index 00fa10e..786c20e 100644
--- a/net/rds/threads.c
+++ b/net/rds/threads.c
@@ -259,7 +259,7 @@ void rds_threads_exit(void)
 
 int __init rds_threads_init(void)
 {
-	rds_wq = create_singlethread_workqueue("krdsd");
+	rds_wq = create_workqueue("krdsd");
 	if (rds_wq == NULL)
 		return -ENOMEM;
 
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 00/13] RDS bugfixes and cleanups
  2010-03-11 23:49 [PATCH net-next 00/13] RDS bugfixes and cleanups Andy Grover
                   ` (12 preceding siblings ...)
  2010-03-11 23:50 ` [PATCH 13/13] RDS: Enable per-cpu workqueue threads Andy Grover
@ 2010-03-17  4:18 ` David Miller
  13 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2010-03-17  4:18 UTC (permalink / raw)
  To: andy.grover; +Cc: netdev, rds-devel

From: Andy Grover <andy.grover@oracle.com>
Date: Thu, 11 Mar 2010 15:49:54 -0800

> These patches are against net-next, please apply.

All applied, thanks Andy.

BTW:

+	/* Do not send cong updates to IB loopback */
+	if (conn->c_loopback
+	    && rm->m_inc.i_hdr.h_flags & RDS_FLAG_CONG_BITMAP) {
+		rds_cong_map_updated(conn->c_fcong, ~(u64) 0);
+		return sizeof(struct rds_header) + RDS_CONG_MAP_BYTES;
+	}
+

Be careful with operator precedence here.  Even if it's right,
probably better to add parenthesis just to sleep more easily
at night :-)


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2010-03-17  4:18 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-03-11 23:49 [PATCH net-next 00/13] RDS bugfixes and cleanups Andy Grover
2010-03-11 23:49 ` [PATCH 01/13] RDS: Do not BUG() on error returned from ib_post_send Andy Grover
2010-03-11 23:49 ` [PATCH 02/13] RDS: sendmsg() should check sndtimeo, not rcvtimeo Andy Grover
2010-03-11 23:49 ` [PATCH 03/13] RDS: update copy_to_user state in tcp transport Andy Grover
2010-03-11 23:49 ` [PATCH 04/13] RDS/TCP: Wait to wake thread when write space available Andy Grover
2010-03-11 23:49 ` [PATCH 05/13] RDS: Fix congestion issues for loopback Andy Grover
2010-03-11 23:50 ` [PATCH 06/13] RDS: Fix send locking issue Andy Grover
2010-03-11 23:50 ` [PATCH 07/13] RDS: Workaround for in-use MRs on close causing crash Andy Grover
2010-03-11 23:50 ` [PATCH 08/13] RDS: Turn down alarming reconnect messages Andy Grover
2010-03-11 23:50 ` [PATCH 09/13] RDS: Fix locking in rds_send_drop_to() Andy Grover
2010-03-11 23:50 ` [PATCH 10/13] RDS: only put sockets that have seen congestion on the poll_waitq Andy Grover
2010-03-11 23:50 ` [PATCH 11/13] RDS: Properly unmap when getting a remote access error Andy Grover
2010-03-11 23:50 ` [PATCH 12/13] RDS: Do not call set_page_dirty() with irqs off Andy Grover
2010-03-11 23:50 ` [PATCH 13/13] RDS: Enable per-cpu workqueue threads Andy Grover
2010-03-17  4:18 ` [PATCH net-next 00/13] RDS bugfixes and cleanups David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).