Netdev List
 help / color / mirror / Atom feed
* Re: [net-next-2.6 PATCH] igb: Clean up left over prototype of igb_get_hw_dev_name()
From: David Miller @ 2010-04-28 21:26 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, emil.s.tantilov
In-Reply-To: <20100428205949.30209.69599.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 28 Apr 2010 13:59:53 -0700

> From: Emil Tantilov <emil.s.tantilov@intel.com>
> 
> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] forcedeth: Stay in NAPI as long as there's work
From: David Miller @ 2010-04-28 21:25 UTC (permalink / raw)
  To: shemminger; +Cc: joe, therbert, netdev, aabdulla
In-Reply-To: <20100428112528.01277670@nehalam>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 28 Apr 2010 11:25:28 -0700

> The following does the same thing without the extra overhead
> of testing all the registers. It also handles the out of memory
> case.
> 
> Compile tested only...
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Tom can you test this version?

^ permalink raw reply

* Re: [PATCH]: sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173) (v4)
From: David Miller @ 2010-04-28 21:23 UTC (permalink / raw)
  To: vladislav.yasevich; +Cc: nhorman, sri, linux-sctp, eteo, netdev, security
In-Reply-To: <4BD89C70.6080406@hp.com>

From: Vlad Yasevich <vladislav.yasevich@hp.com>
Date: Wed, 28 Apr 2010 16:37:04 -0400

> 
> Looks good.
> 
> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>

Applied, thanks Neil and Vlad.

^ permalink raw reply

* Re: pull request: wireless-2.6 2010-04-15
From: David Miller @ 2010-04-28 21:23 UTC (permalink / raw)
  To: hauke-5/S+JYg5SzeELgA04lAiVw
  Cc: linville-2XuSBdqkA4R54TAoqtyWWQ,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <4BD8A6B4.20603-5/S+JYg5SzeELgA04lAiVw@public.gmane.org>

From: Hauke Mehrtens <hauke-5/S+JYg5SzeELgA04lAiVw@public.gmane.org>
Date: Wed, 28 Apr 2010 23:20:52 +0200

> Hi David,
> 
> in your merge in 5c01d5669356e13f0fb468944c1dd4c6a7e978ad you added "int
> i;" into wl1271_main.c which is unused in that function.
> 
> This patch fixes the merge problem:
> 
> Signed-off-by: Hauke Mehrtens <hauke-5/S+JYg5SzeELgA04lAiVw@public.gmane.org>

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: pull request: wireless-2.6 2010-04-15
From: Hauke Mehrtens @ 2010-04-28 21:20 UTC (permalink / raw)
  To: David Miller
  Cc: linville-2XuSBdqkA4R54TAoqtyWWQ,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20100415.142907.68448693.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

Hi David,

in your merge in 5c01d5669356e13f0fb468944c1dd4c6a7e978ad you added "int
i;" into wl1271_main.c which is unused in that function.

This patch fixes the merge problem:

Signed-off-by: Hauke Mehrtens <hauke-5/S+JYg5SzeELgA04lAiVw@public.gmane.org>

--- a/drivers/net/wireless/wl12xx/wl1271_main.c
+++ b/drivers/net/wireless/wl12xx/wl1271_main.c
@@ -1311,7 +1311,6 @@
 	struct wl1271_filter_params *fp;
 	struct netdev_hw_addr *ha;
 	struct wl1271 *wl = hw->priv;
-	int i;

 	if (unlikely(wl->state == WL1271_STATE_OFF))
 		return 0;

David Miller wrote:
> From: "John W. Linville" <linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
> Date: Thu, 15 Apr 2010 16:03:31 -0400
> 
>> Another fix intended for 2.6.34...without it some firmware wierdness can
>> induce the driver into hanging the box... :-(
>>
>> Please let me know if there are problems!
> 
> Pulled, thanks.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: vlan performance issue on outgoing traffic (solved)
From: R. Weinedel @ 2010-04-28 21:13 UTC (permalink / raw)
  To: Brandeburg, Jesse; +Cc: netdev@vger.kernel.org
In-Reply-To: <alpine.WNT.2.00.1004271110300.4368@jbrandeb-desk1.amr.corp.intel.com>

I made some further test with kernel 2.6.26 and I recognized that my cpu
is really very busy (90-100 % sw-irq's) during offloads.

After that I build and install a 2.6.33 kernel and the problem seems to
be solved. Now I got 946 Mbit/s offload (iperf) !.

Ralf Weinedel


^ permalink raw reply

* [net-next-2.6 PATCH] igb: Clean up left over prototype of igb_get_hw_dev_name()
From: Jeff Kirsher @ 2010-04-28 20:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Emil Tantilov, Jeff Kirsher

From: Emil Tantilov <emil.s.tantilov@intel.com>

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/igb/igb.h |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/igb/igb.h b/drivers/net/igb/igb.h
index 096a526..735ede9 100644
--- a/drivers/net/igb/igb.h
+++ b/drivers/net/igb/igb.h
@@ -338,7 +338,6 @@ enum igb_boards {
 extern char igb_driver_name[];
 extern char igb_driver_version[];
 
-extern char *igb_get_hw_dev_name(struct e1000_hw *hw);
 extern int igb_up(struct igb_adapter *);
 extern void igb_down(struct igb_adapter *);
 extern void igb_reinit_locked(struct igb_adapter *);


^ permalink raw reply related

* [PATCHv7] add mergeable buffers support to vhost_net
From: David L Stevens @ 2010-04-28 20:57 UTC (permalink / raw)
  To: mst; +Cc: netdev, kvm, virtualization

This patch adds mergeable receive buffer support to vhost_net.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>

diff -ruNp net-next-v0/drivers/vhost/net.c net-next-v7/drivers/vhost/net.c
--- net-next-v0/drivers/vhost/net.c	2010-04-24 21:36:54.000000000 -0700
+++ net-next-v7/drivers/vhost/net.c	2010-04-28 12:26:18.000000000 -0700
@@ -74,6 +74,23 @@ static int move_iovec_hdr(struct iovec *
 	}
 	return seg;
 }
+/* Copy iovec entries for len bytes from iovec. Return segments used. */
+static int copy_iovec_hdr(const struct iovec *from, struct iovec *to,
+			  size_t len, int iovcount)
+{
+	int seg = 0;
+	size_t size;
+	while (len && seg < iovcount) {
+		size = min(from->iov_len, len);
+		to->iov_base = from->iov_base;
+		to->iov_len = size;
+		len -= size;
+		++from;
+		++to;
+		++seg;
+	}
+	return seg;
+}
 
 /* Caller must have TX VQ lock */
 static void tx_poll_stop(struct vhost_net *net)
@@ -109,7 +126,7 @@ static void handle_tx(struct vhost_net *
 	};
 	size_t len, total_len = 0;
 	int err, wmem;
-	size_t hdr_size;
+	size_t vhost_hlen;
 	struct socket *sock = rcu_dereference(vq->private_data);
 	if (!sock)
 		return;
@@ -128,13 +145,13 @@ static void handle_tx(struct vhost_net *
 
 	if (wmem < sock->sk->sk_sndbuf / 2)
 		tx_poll_stop(net);
-	hdr_size = vq->hdr_size;
+	vhost_hlen = vq->vhost_hlen;
 
 	for (;;) {
-		head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
-					 ARRAY_SIZE(vq->iov),
-					 &out, &in,
-					 NULL, NULL);
+		head = vhost_get_desc(&net->dev, vq, vq->iov,
+				      ARRAY_SIZE(vq->iov),
+				      &out, &in,
+				      NULL, NULL);
 		/* Nothing new?  Wait for eventfd to tell us they refilled. */
 		if (head == vq->num) {
 			wmem = atomic_read(&sock->sk->sk_wmem_alloc);
@@ -155,20 +172,20 @@ static void handle_tx(struct vhost_net *
 			break;
 		}
 		/* Skip header. TODO: support TSO. */
-		s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out);
+		s = move_iovec_hdr(vq->iov, vq->hdr, vhost_hlen, out);
 		msg.msg_iovlen = out;
 		len = iov_length(vq->iov, out);
 		/* Sanity check */
 		if (!len) {
 			vq_err(vq, "Unexpected header len for TX: "
 			       "%zd expected %zd\n",
-			       iov_length(vq->hdr, s), hdr_size);
+			       iov_length(vq->hdr, s), vhost_hlen);
 			break;
 		}
 		/* TODO: Check specific error and bomb out unless ENOBUFS? */
 		err = sock->ops->sendmsg(NULL, sock, &msg, len);
 		if (unlikely(err < 0)) {
-			vhost_discard_vq_desc(vq);
+			vhost_discard_desc(vq, 1);
 			tx_poll_start(net, sock);
 			break;
 		}
@@ -187,12 +204,25 @@ static void handle_tx(struct vhost_net *
 	unuse_mm(net->dev.mm);
 }
 
+static int vhost_head_len(struct vhost_virtqueue *vq, struct sock *sk)
+{
+	struct sk_buff *head;
+	int len = 0;
+
+	lock_sock(sk);
+	head = skb_peek(&sk->sk_receive_queue);
+	if (head)
+		len = head->len + vq->sock_hlen;
+	release_sock(sk);
+	return len;
+}
+
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_rx(struct vhost_net *net)
 {
 	struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX];
-	unsigned head, out, in, log, s;
+	unsigned in, log, s;
 	struct vhost_log *vq_log;
 	struct msghdr msg = {
 		.msg_name = NULL,
@@ -203,14 +233,14 @@ static void handle_rx(struct vhost_net *
 		.msg_flags = MSG_DONTWAIT,
 	};
 
-	struct virtio_net_hdr hdr = {
-		.flags = 0,
-		.gso_type = VIRTIO_NET_HDR_GSO_NONE
+	struct virtio_net_hdr_mrg_rxbuf hdr = {
+		.hdr.flags = 0,
+		.hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE
 	};
 
 	size_t len, total_len = 0;
-	int err;
-	size_t hdr_size;
+	int err, headcount, datalen;
+	size_t vhost_hlen;
 	struct socket *sock = rcu_dereference(vq->private_data);
 	if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue))
 		return;
@@ -218,18 +248,19 @@ static void handle_rx(struct vhost_net *
 	use_mm(net->dev.mm);
 	mutex_lock(&vq->mutex);
 	vhost_disable_notify(vq);
-	hdr_size = vq->hdr_size;
+	vhost_hlen = vq->vhost_hlen;
 
 	vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
 		vq->log : NULL;
 
-	for (;;) {
-		head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
-					 ARRAY_SIZE(vq->iov),
-					 &out, &in,
-					 vq_log, &log);
+	while ((datalen = vhost_head_len(vq, sock->sk))) {
+		headcount = vhost_get_desc_n(vq, vq->heads,
+					     datalen + vhost_hlen,
+					     &in, vq_log, &log);
+		if (headcount < 0)
+			break;
 		/* OK, now we need to know about added descriptors. */
-		if (head == vq->num) {
+		if (!headcount) {
 			if (unlikely(vhost_enable_notify(vq))) {
 				/* They have slipped one in as we were
 				 * doing that: check again. */
@@ -241,46 +272,53 @@ static void handle_rx(struct vhost_net *
 			break;
 		}
 		/* We don't need to be notified again. */
-		if (out) {
-			vq_err(vq, "Unexpected descriptor format for RX: "
-			       "out %d, int %d\n",
-			       out, in);
-			break;
-		}
-		/* Skip header. TODO: support TSO/mergeable rx buffers. */
-		s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, in);
+		if (vhost_hlen)
+			/* Skip header. TODO: support TSO. */
+			s = move_iovec_hdr(vq->iov, vq->hdr, vhost_hlen, in);
+		else
+			s = copy_iovec_hdr(vq->iov, vq->hdr, vq->sock_hlen, in);
 		msg.msg_iovlen = in;
 		len = iov_length(vq->iov, in);
 		/* Sanity check */
 		if (!len) {
 			vq_err(vq, "Unexpected header len for RX: "
 			       "%zd expected %zd\n",
-			       iov_length(vq->hdr, s), hdr_size);
+			       iov_length(vq->hdr, s), vhost_hlen);
 			break;
 		}
 		err = sock->ops->recvmsg(NULL, sock, &msg,
 					 len, MSG_DONTWAIT | MSG_TRUNC);
 		/* TODO: Check specific error and bomb out unless EAGAIN? */
 		if (err < 0) {
-			vhost_discard_vq_desc(vq);
+			vhost_discard_desc(vq, headcount);
 			break;
 		}
-		/* TODO: Should check and handle checksum. */
-		if (err > len) {
-			pr_err("Discarded truncated rx packet: "
-			       " len %d > %zd\n", err, len);
-			vhost_discard_vq_desc(vq);
+		if (err != datalen) {
+			pr_err("Discarded rx packet: "
+			       " len %d, expected %zd\n", err, datalen);
+			vhost_discard_desc(vq, headcount);
 			continue;
 		}
 		len = err;
-		err = memcpy_toiovec(vq->hdr, (unsigned char *)&hdr, hdr_size);
-		if (err) {
-			vq_err(vq, "Unable to write vnet_hdr at addr %p: %d\n",
-			       vq->iov->iov_base, err);
+		if (vhost_hlen &&
+		    memcpy_toiovecend(vq->hdr, (unsigned char *)&hdr, 0,
+				      vhost_hlen)) {
+			vq_err(vq, "Unable to write vnet_hdr at addr %p\n",
+			       vq->iov->iov_base);
 			break;
 		}
-		len += hdr_size;
-		vhost_add_used_and_signal(&net->dev, vq, head, len);
+		/* TODO: Should check and handle checksum. */
+		if (vhost_has_feature(&net->dev, VIRTIO_NET_F_MRG_RXBUF) &&
+		    memcpy_toiovecend(vq->hdr, (unsigned char *)&headcount,
+				      offsetof(typeof(hdr), num_buffers),
+				      sizeof(hdr.num_buffers))) {
+			vq_err(vq, "Failed num_buffers write");
+			vhost_discard_desc(vq, headcount);
+			break;
+		}
+		len += vhost_hlen;
+		vhost_add_used_and_signal_n(&net->dev, vq, vq->heads,
+					    headcount);
 		if (unlikely(vq_log))
 			vhost_log_write(vq, vq_log, log, len);
 		total_len += len;
@@ -561,9 +599,21 @@ done:
 
 static int vhost_net_set_features(struct vhost_net *n, u64 features)
 {
-	size_t hdr_size = features & (1 << VHOST_NET_F_VIRTIO_NET_HDR) ?
-		sizeof(struct virtio_net_hdr) : 0;
+	size_t vhost_hlen, sock_hlen, hdr_len;
 	int i;
+
+	hdr_len = (features & (1 << VIRTIO_NET_F_MRG_RXBUF)) ?
+			sizeof(struct virtio_net_hdr_mrg_rxbuf) :
+			sizeof(struct virtio_net_hdr);
+	if (features & (1 << VHOST_NET_F_VIRTIO_NET_HDR)) {
+		/* vhost provides vnet_hdr */
+		vhost_hlen = hdr_len;
+		sock_hlen = 0;
+	} else {
+		/* socket provides vnet_hdr */
+		vhost_hlen = 0;
+		sock_hlen = hdr_len;
+	}
 	mutex_lock(&n->dev.mutex);
 	if ((features & (1 << VHOST_F_LOG_ALL)) &&
 	    !vhost_log_access_ok(&n->dev)) {
@@ -574,7 +624,8 @@ static int vhost_net_set_features(struct
 	smp_wmb();
 	for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
 		mutex_lock(&n->vqs[i].mutex);
-		n->vqs[i].hdr_size = hdr_size;
+		n->vqs[i].vhost_hlen = vhost_hlen;
+		n->vqs[i].sock_hlen = sock_hlen;
 		mutex_unlock(&n->vqs[i].mutex);
 	}
 	vhost_net_flush(n);
diff -ruNp net-next-v0/drivers/vhost/vhost.c net-next-v7/drivers/vhost/vhost.c
--- net-next-v0/drivers/vhost/vhost.c	2010-04-22 11:31:57.000000000 -0700
+++ net-next-v7/drivers/vhost/vhost.c	2010-04-28 11:16:13.000000000 -0700
@@ -114,7 +114,8 @@ static void vhost_vq_reset(struct vhost_
 	vq->used_flags = 0;
 	vq->log_used = false;
 	vq->log_addr = -1ull;
-	vq->hdr_size = 0;
+	vq->vhost_hlen = 0;
+	vq->sock_hlen = 0;
 	vq->private_data = NULL;
 	vq->log_base = NULL;
 	vq->error_ctx = NULL;
@@ -861,6 +862,53 @@ static unsigned get_indirect(struct vhos
 	return 0;
 }
 
+/* This is a multi-buffer version of vhost_get_desc
+ * @vq		- the relevant virtqueue
+ * datalen	- data length we'll be reading
+ * @iovcount	- returned count of io vectors we fill
+ * @log		- vhost log
+ * @log_num	- log offset
+ *	returns number of buffer heads allocated, negative on error
+ */
+int vhost_get_desc_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
+		     int datalen, unsigned *iovcount, struct vhost_log *log,
+		     unsigned int *log_num)
+{
+	unsigned int out, in;
+	int seg = 0;
+	int headcount = 0;
+	int r;
+
+	while (datalen > 0) {
+		if (headcount >= VHOST_NET_MAX_SG) {
+			r = -ENOBUFS;
+			goto err;
+		}
+		heads[headcount].id = vhost_get_desc(vq->dev, vq, vq->iov + seg,
+					      ARRAY_SIZE(vq->iov) - seg, &out,
+					      &in, log, log_num);
+		if (heads[headcount].id == vq->num) {
+			r = 0;
+			goto err;
+		}
+		if (out || in <= 0) {
+			vq_err(vq, "unexpected descriptor format for RX: "
+				"out %d, in %d\n", out, in);
+			r = -EINVAL;
+			goto err;
+		}
+		heads[headcount].len = iov_length(vq->iov + seg, in);
+		datalen -= heads[headcount].len;
+		++headcount;
+		seg += in;
+	}
+	*iovcount = seg;
+	return headcount;
+err:
+	vhost_discard_desc(vq, headcount);
+	return r;
+}
+
 /* This looks in the virtqueue and for the first available buffer, and converts
  * it to an iovec for convenient access.  Since descriptors consist of some
  * number of output then some number of input descriptors, it's actually two
@@ -868,7 +916,7 @@ static unsigned get_indirect(struct vhos
  *
  * This function returns the descriptor number found, or vq->num (which
  * is never a valid descriptor number) if none was found. */
-unsigned vhost_get_vq_desc(struct vhost_dev *dev, struct vhost_virtqueue *vq,
+unsigned vhost_get_desc(struct vhost_dev *dev, struct vhost_virtqueue *vq,
 			   struct iovec iov[], unsigned int iov_size,
 			   unsigned int *out_num, unsigned int *in_num,
 			   struct vhost_log *log, unsigned int *log_num)
@@ -986,9 +1034,9 @@ unsigned vhost_get_vq_desc(struct vhost_
 }
 
 /* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */
-void vhost_discard_vq_desc(struct vhost_virtqueue *vq)
+void vhost_discard_desc(struct vhost_virtqueue *vq, int n)
 {
-	vq->last_avail_idx--;
+	vq->last_avail_idx -= n;
 }
 
 /* After we've used one of their buffers, we tell them about it.  We'll then
@@ -1033,6 +1081,68 @@ int vhost_add_used(struct vhost_virtqueu
 	return 0;
 }
 
+static void vhost_log_used(struct vhost_virtqueue *vq,
+			   struct vring_used_elem __user *used)
+{
+	/* Make sure data is seen before log. */
+	smp_wmb();
+	/* Log used ring entry write. */
+	log_write(vq->log_base,
+		  vq->log_addr +
+		   ((void __user *)used - (void __user *)vq->used),
+		  sizeof *used);
+	/* Log used index update. */
+	log_write(vq->log_base,
+		  vq->log_addr + offsetof(struct vring_used, idx),
+		  sizeof vq->used->idx);
+	if (vq->log_ctx)
+		eventfd_signal(vq->log_ctx, 1);
+}
+
+static int __vhost_add_used_n(struct vhost_virtqueue *vq,
+			    struct vring_used_elem *heads,
+			    unsigned count)
+{
+	struct vring_used_elem __user *used;
+	int start;
+
+	start = vq->last_used_idx % vq->num;
+	used = vq->used->ring + start;
+	if (copy_to_user(used, heads, count * sizeof *used)) {
+		vq_err(vq, "Failed to write used");
+		return -EFAULT;
+	}
+	/* Make sure buffer is written before we update index. */
+	smp_wmb();
+	if (put_user(vq->last_used_idx + count, &vq->used->idx)) {
+		vq_err(vq, "Failed to increment used idx");
+		return -EFAULT;
+	}
+	if (unlikely(vq->log_used))
+		vhost_log_used(vq, used);
+	vq->last_used_idx += count;
+	return 0;
+}
+
+/* After we've used one of their buffers, we tell them about it.  We'll then
+ * want to notify the guest, using eventfd. */
+int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
+		     unsigned count)
+{
+	int start, n, r;
+
+	start = vq->last_used_idx % vq->num;
+	n = vq->num - start;
+	if (n < count) {
+		r = __vhost_add_used_n(vq, heads, n);
+		if (r < 0)
+			return r;
+		heads += n;
+		count -= n;
+	}
+	return __vhost_add_used_n(vq, heads, count);
+}
+
 /* This actually signals the guest, using eventfd. */
 void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
@@ -1062,6 +1172,15 @@ void vhost_add_used_and_signal(struct vh
 	vhost_signal(dev, vq);
 }
 
+/* multi-buffer version of vhost_add_used_and_signal */
+void vhost_add_used_and_signal_n(struct vhost_dev *dev,
+				 struct vhost_virtqueue *vq,
+				 struct vring_used_elem *heads, unsigned count)
+{
+	vhost_add_used_n(vq, heads, count);
+	vhost_signal(dev, vq);
+}
+
 /* OK, now we need to know about added descriptors. */
 bool vhost_enable_notify(struct vhost_virtqueue *vq)
 {
@@ -1086,7 +1205,7 @@ bool vhost_enable_notify(struct vhost_vi
 		return false;
 	}
 
-	return avail_idx != vq->last_avail_idx;
+	return avail_idx != vq->avail_idx;
 }
 
 /* We don't need to be notified again. */
diff -ruNp net-next-v0/drivers/vhost/vhost.h net-next-v7/drivers/vhost/vhost.h
--- net-next-v0/drivers/vhost/vhost.h	2010-04-24 21:37:41.000000000 -0700
+++ net-next-v7/drivers/vhost/vhost.h	2010-04-26 10:35:25.000000000 -0700
@@ -84,7 +84,9 @@ struct vhost_virtqueue {
 	struct iovec indirect[VHOST_NET_MAX_SG];
 	struct iovec iov[VHOST_NET_MAX_SG];
 	struct iovec hdr[VHOST_NET_MAX_SG];
-	size_t hdr_size;
+	size_t vhost_hlen;
+	size_t sock_hlen;
+	struct vring_used_elem heads[VHOST_NET_MAX_SG];
 	/* We use a kind of RCU to access private pointer.
 	 * All readers access it from workqueue, which makes it possible to
 	 * flush the workqueue instead of synchronize_rcu. Therefore readers do
@@ -120,16 +122,23 @@ long vhost_dev_ioctl(struct vhost_dev *,
 int vhost_vq_access_ok(struct vhost_virtqueue *vq);
 int vhost_log_access_ok(struct vhost_dev *);
 
-unsigned vhost_get_vq_desc(struct vhost_dev *, struct vhost_virtqueue *,
+int vhost_get_desc_n(struct vhost_virtqueue *, struct vring_used_elem *heads,
+		     int datalen, unsigned int *iovcount, struct vhost_log *log,
+		     unsigned int *log_num);
+unsigned vhost_get_desc(struct vhost_dev *, struct vhost_virtqueue *,
 			   struct iovec iov[], unsigned int iov_count,
 			   unsigned int *out_num, unsigned int *in_num,
 			   struct vhost_log *log, unsigned int *log_num);
-void vhost_discard_vq_desc(struct vhost_virtqueue *);
+void vhost_discard_desc(struct vhost_virtqueue *, int);
 
 int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len);
-void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
+int vhost_add_used_n(struct vhost_virtqueue *, struct vring_used_elem *heads,
+		     unsigned count);
 void vhost_add_used_and_signal(struct vhost_dev *, struct vhost_virtqueue *,
-			       unsigned int head, int len);
+			       unsigned int id, int len);
+void vhost_add_used_and_signal_n(struct vhost_dev *, struct vhost_virtqueue *,
+			       struct vring_used_elem *heads, unsigned count);
+void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
 void vhost_disable_notify(struct vhost_virtqueue *);
 bool vhost_enable_notify(struct vhost_virtqueue *);
 
@@ -149,7 +158,8 @@ enum {
 	VHOST_FEATURES = (1 << VIRTIO_F_NOTIFY_ON_EMPTY) |
 			 (1 << VIRTIO_RING_F_INDIRECT_DESC) |
 			 (1 << VHOST_F_LOG_ALL) |
-			 (1 << VHOST_NET_F_VIRTIO_NET_HDR),
+			 (1 << VHOST_NET_F_VIRTIO_NET_HDR) |
+			 (1 << VIRTIO_NET_F_MRG_RXBUF),
 };
 
 static inline int vhost_has_feature(struct vhost_dev *dev, int bit)



^ permalink raw reply

* Re: 2.6.34-rc5-git7 (plus all patches) -- another suspicious rcu_dereference_check() usage.
From: Paul E. McKenney @ 2010-04-28 20:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Miles Lane, Vivek Goyal, Eric Paris, Lai Jiangshan, Ingo Molnar,
	Peter Zijlstra, LKML, nauman, netdev, Jens Axboe, Gui Jianfeng,
	Li Zefan, Johannes Berg, shemminger
In-Reply-To: <1272485923.2201.22.camel@edumazet-laptop>

On Wed, Apr 28, 2010 at 10:18:43PM +0200, Eric Dumazet wrote:
> Le mercredi 28 avril 2010 à 13:09 -0700, Paul E. McKenney a écrit :
> > On Wed, Apr 28, 2010 at 09:38:11PM +0200, Eric Dumazet wrote:
> > > Le mercredi 28 avril 2010 à 10:54 -0700, Paul E. McKenney a écrit :
> > > > On Mon, Apr 26, 2010 at 08:51:06PM -0400, Miles Lane wrote:
> > > > > This one occurred during the wakeup from suspend to RAM.
> > > > > 
> > > > > [  984.724697] [ INFO: suspicious rcu_dereference_check() usage. ]
> > > > > [  984.724700] ---------------------------------------------------
> > > > > [  984.724703] include/linux/fdtable.h:88 invoked
> > > > > rcu_dereference_check() without protection!
> > > > > [  984.724706]
> > > > > [  984.724707] other info that might help us debug this:
> > > > > [  984.724708]
> > > > > [  984.724711]
> > > > > [  984.724711] rcu_scheduler_active = 1, debug_locks = 1
> > > > > [  984.724714] no locks held by dbus-daemon/4680.
> > > > > [  984.724717]
> > > > > [  984.724717] stack backtrace:
> > > > > [  984.724721] Pid: 4680, comm: dbus-daemon Not tainted 2.6.34-rc5-git7 #33
> > > > > [  984.724724] Call Trace:
> > > > > [  984.724734]  [<ffffffff81074556>] lockdep_rcu_dereference+0x9d/0xa6
> > > > > [  984.724740]  [<ffffffff810fc785>] fcheck_files+0xb1/0xc9
> > > > > [  984.724745]  [<ffffffff810fc7f5>] fget_light+0x35/0xab
> > > > > [  984.724751]  [<ffffffff81433e1b>] ? sock_poll_wait+0x13/0x18
> > > > > [  984.724755]  [<ffffffff81433e39>] ? unix_poll+0x19/0x95
> > > > > [  984.724762]  [<ffffffff8110aa95>] do_sys_poll+0x1ff/0x3e5
> > > > > [  984.724766]  [<ffffffff8110a19e>] ? __pollwait+0x0/0xc7
> > > > > [  984.724771]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [  984.724776]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [  984.724780]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [  984.724784]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [  984.724788]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [  984.724793]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [  984.724797]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [  984.724802]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [  984.724806]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > > [  984.724812]  [<ffffffff8110ae0f>] sys_poll+0x50/0xbb
> > > > > [  984.724818]  [<ffffffff81009d82>] system_call_fastpath+0x16/0x1b
> > > > 
> > > > Hmmm...  I am not convinced that this is a false positive.  Couldn't
> > > > there be a multi-threaded process where one thread is invoking poll()
> > > > on a UNIX socket just as another thread is calling close() on it?
> > > > 
> > > > The current fcheck_files() logic requires that the caller either (1) be in
> > > > an RCU read-side critical section, (2) hold ->files_lock, or (3) passing
> > > > in a files_struct with ->count equal to 1 (initialization or cleanup).
> > > > 
> > > > So I don't feel comfortable just slapping an RCU read-side critical
> > > > section around this one, at least not unless someone who understands
> > > > the locking says that doing so is OK.
> > > > 
> > > > 		
> > > 
> > > Its a single threaded program.
> > > 
> > > So fget_light() calls fcheck_files(files, fd); without rcu lock,
> > > but some /proc/pid/fd/... user temporarly raised files->count just
> > > before we perform the condition check.
> > 
> > So I should add a single-threaded check.  My first thought was to use
> > current_is_single_threaded(), but the bit about scanning the full list
> > of processes does give me pause.  However, thread_group_empty() looks
> > like a much lighter-weight alternative.
> > 
> > I believe that it is possible for a pair of single-threaded processes
> > to share a file descriptor, but that should not be a problem, as both
> > of them would need to close it for it to go away.
> > 
> > But what happens if someone does a clone() with CLONE_FILES, as some
> > of the AIO stuff seems to do?  Won't that allow one of the resulting
> > processes to close the file for both of them, even though both are
> > otherwise single-threaded?  And the ->count seems to be the only
> > distinction between these two cases.
> > 
> > And AIO does CLONE_VM as well as CLONE_FILES, but that seems to mean that
> > the check must scan the processes with current_is_single_threaded().
> > Besides which, a user could invoke clone() with only CLONE_FILES
> > specified, right?
> > 
> > Or am I just confused here?
> > 
> > 							Thanx, Paul
> 
> If a program is mono threaded, and doing a fget_light() syscall, it
> cannot possibly do a clone() in // ;)

The sequence of events that I am worried about is as follows:

1.	Single-threaded process does clone(CLONE_FILES).  The
	result is a pair of single-threaded processes that share
	file descriptors.

2.	One of these processes does files_fdtable(i) at the same
	time as the other process closes file descriptor i.

So, clone and -then- do fget_light().

> If we want to be picky, we could add a user provided condition, aka "we
> are sure we are allowed to do this because we are the owner of the files
> struct".

But yes, if I understand your trick below, the race conditions from
the above sequence of events would simply force the processes off
of the fget_light() path, which should be OK.

						Thanx, Paul

> diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
> index 6da962c..027f5e1 100644
> --- a/drivers/char/tty_io.c
> +++ b/drivers/char/tty_io.c
> @@ -2694,7 +2694,7 @@ void __do_SAK(struct tty_struct *tty)
>  			spin_lock(&p->files->file_lock);
>  			fdt = files_fdtable(p->files);
>  			for (i = 0; i < fdt->max_fds; i++) {
> -				filp = fcheck_files(p->files, i);
> +				filp = fcheck_files(p->files, i, false);
>  				if (!filp)
>  					continue;
>  				if (filp->f_op->read == tty_read &&
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index 452d02f..dabf4d8 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -119,7 +119,7 @@ SYSCALL_DEFINE2(dup2, unsigned int, oldfd, unsigned int, newfd)
>  		int retval = oldfd;
> 
>  		rcu_read_lock();
> -		if (!fcheck_files(files, oldfd))
> +		if (!fcheck_files(files, oldfd, false))
>  			retval = -EBADF;
>  		rcu_read_unlock();
>  		return retval;
> diff --git a/fs/file_table.c b/fs/file_table.c
> index 32d12b7..2865f72 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -274,7 +274,7 @@ struct file *fget(unsigned int fd)
>  	struct files_struct *files = current->files;
> 
>  	rcu_read_lock();
> -	file = fcheck_files(files, fd);
> +	file = fcheck_files(files, fd, false);
>  	if (file) {
>  		if (!atomic_long_inc_not_zero(&file->f_count)) {
>  			/* File object ref couldn't be taken */
> @@ -303,10 +303,10 @@ struct file *fget_light(unsigned int fd, int *fput_needed)
> 
>  	*fput_needed = 0;
>  	if (likely((atomic_read(&files->count) == 1))) {
> -		file = fcheck_files(files, fd);
> +		file = fcheck_files(files, fd, true);
>  	} else {
>  		rcu_read_lock();
> -		file = fcheck_files(files, fd);
> +		file = fcheck_files(files, fd, false);
>  		if (file) {
>  			if (atomic_long_inc_not_zero(&file->f_count))
>  				*fput_needed = 1;
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 8418fcc..0e89448 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -1716,7 +1716,7 @@ static int proc_fd_info(struct inode *inode, struct path *path, char *info)
>  		 * hold ->file_lock.
>  		 */
>  		spin_lock(&files->file_lock);
> -		file = fcheck_files(files, fd);
> +		file = fcheck_files(files, fd, false);
>  		if (file) {
>  			if (path) {
>  				*path = file->f_path;
> @@ -1755,7 +1755,7 @@ static int tid_fd_revalidate(struct dentry *dentry, struct nameidata *nd)
>  		files = get_files_struct(task);
>  		if (files) {
>  			rcu_read_lock();
> -			if (fcheck_files(files, fd)) {
> +			if (fcheck_files(files, fd, false)) {
>  				rcu_read_unlock();
>  				put_files_struct(files);
>  				if (task_dumpable(task)) {
> @@ -1813,7 +1813,7 @@ static struct dentry *proc_fd_instantiate(struct inode *dir,
>  	 * hold ->file_lock.
>  	 */
>  	spin_lock(&files->file_lock);
> -	file = fcheck_files(files, fd);
> +	file = fcheck_files(files, fd, false);
>  	if (!file)
>  		goto out_unlock;
>  	if (file->f_mode & FMODE_READ)
> @@ -1899,7 +1899,7 @@ static int proc_readfd_common(struct file * filp, void * dirent,
>  				char name[PROC_NUMBUF];
>  				int len;
> 
> -				if (!fcheck_files(files, fd))
> +				if (!fcheck_files(files, fd, false))
>  					continue;
>  				rcu_read_unlock();
> 
> diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
> index 013dc52..76423ad 100644
> --- a/include/linux/fdtable.h
> +++ b/include/linux/fdtable.h
> @@ -57,11 +57,12 @@ struct files_struct {
>  	struct file * fd_array[NR_OPEN_DEFAULT];
>  };
> 
> -#define rcu_dereference_check_fdtable(files, fdtfd) \
> +#define rcu_dereference_check_fdtable(files, fdtfd, cond) \
>  	(rcu_dereference_check((fdtfd), \
>  			       rcu_read_lock_held() || \
>  			       lockdep_is_held(&(files)->file_lock) || \
> -			       atomic_read(&(files)->count) == 1))
> +			       atomic_read(&(files)->count) == 1 || \
> +			       cond))
> 
>  #define files_fdtable(files) \
>  		(rcu_dereference_check_fdtable((files), (files)->fdt))
> @@ -79,13 +80,13 @@ static inline void free_fdtable(struct fdtable *fdt)
>  	call_rcu(&fdt->rcu, free_fdtable_rcu);
>  }
> 
> -static inline struct file * fcheck_files(struct files_struct *files, unsigned int fd)
> +static inline struct file * fcheck_files(struct files_struct *files, unsigned int fd, bool cond)
>  {
>  	struct file * file = NULL;
>  	struct fdtable *fdt = files_fdtable(files);
> 
>  	if (fd < fdt->max_fds)
> -		file = rcu_dereference_check_fdtable(files, fdt->fd[fd]);
> +		file = rcu_dereference_check_fdtable(files, fdt->fd[fd], cond);
>  	return file;
>  }
> 
> 
> 

^ permalink raw reply

* Re: [PATCH]: sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173) (v4)
From: Vlad Yasevich @ 2010-04-28 20:37 UTC (permalink / raw)
  To: Neil Horman; +Cc: sri, linux-sctp, eteo, netdev, davem, security
In-Reply-To: <20100428203059.GG4818@hmsreliant.think-freely.org>


Looks good.

Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>

-vlad

Neil Horman wrote:
> Ok, version 4
> 
> Change Notes:
> 1) Minor cleanups, from Vlads notes
> 
> Summary:
> 
> 
> Hey-
> 	Recently, it was reported to me that the kernel could oops in the
> following way:
> 
> <5> kernel BUG at net/core/skbuff.c:91!
> <5> invalid operand: 0000 [#1]
> <5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter
> ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U)
> vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5
> ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss
> snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore
> pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi
> mptbase sd_mod scsi_mod
> <5> CPU:    0
> <5> EIP:    0060:[<c02bff27>]    Not tainted VLI
> <5> EFLAGS: 00010216   (2.6.9-89.0.25.EL) 
> <5> EIP is at skb_over_panic+0x1f/0x2d
> <5> eax: 0000002c   ebx: c033f461   ecx: c0357d96   edx: c040fd44
> <5> esi: c033f461   edi: df653280   ebp: 00000000   esp: c040fd40
> <5> ds: 007b   es: 007b   ss: 0068
> <5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0)
> <5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180
> e0c2947d 
> <5>        00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004
> df653490 
> <5>        00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e
> 00000004 
> <5> Call Trace:
> <5>  [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp]
> <5>  [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp]
> <5>  [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp]
> <5>  [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp]
> <5>  [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp]
> <5>  [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp]
> <5>  [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp]
> <5>  [<c01555a4>] cache_grow+0x140/0x233
> <5>  [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp]
> <5>  [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp]
> <5>  [<e0c34600>] sctp_rcv+0x454/0x509 [sctp]
> <5>  [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter]
> <5>  [<c02d005e>] nf_iterate+0x40/0x81
> <5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
> <5>  [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151
> <5>  [<c02d0362>] nf_hook_slow+0x83/0xb5
> <5>  [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9
> <5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
> <5>  [<c02e103e>] ip_rcv+0x334/0x3b4
> <5>  [<c02c66fd>] netif_receive_skb+0x320/0x35b
> <5>  [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd]
> <5>  [<c02c67a4>] process_backlog+0x6c/0xd9
> <5>  [<c02c690f>] net_rx_action+0xfe/0x1f8
> <5>  [<c012a7b1>] __do_softirq+0x35/0x79
> <5>  [<c0107efb>] handle_IRQ_event+0x0/0x4f
> <5>  [<c01094de>] do_softirq+0x46/0x4d
> 
> Its an skb_over_panic BUG halt that results from processing an init chunk in
> which too many of its variable length parameters are in some way malformed.
> 
> The problem is in sctp_process_unk_param:
> if (NULL == *errp)
> 	*errp = sctp_make_op_error_space(asoc, chunk,
> 					 ntohs(chunk->chunk_hdr->length));
> 
> 	if (*errp) {
> 		sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
> 				 WORD_ROUND(ntohs(param.p->length)));
> 		sctp_addto_chunk(*errp,
> 			WORD_ROUND(ntohs(param.p->length)),
> 				  param.v);
> 
> When we allocate an error chunk, we assume that the worst case scenario requires
> that we have chunk_hdr->length data allocated, which would be correct nominally,
> given that we call sctp_addto_chunk for the violating parameter.  Unfortunately,
> we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error
> chunk, so the worst case situation in which all parameters are in violation
> requires chunk_hdr->length+(sizeof(sctp_errhdr_t)*param_count) bytes of data.
> 
> The result of this error is that a deliberately malformed packet sent to a
> listening host can cause a remote DOS, described in CVE-2010-1173:
> http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173
> 
> I've tested the below fix and confirmed that it fixes the issue.  We move to a
> strategy whereby we allocate a fixed size error chunk and ignore errors we don't
> have space to report.  Tested by me successfully
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> 
> 
>  include/net/sctp/structs.h |    1 
>  net/sctp/sm_make_chunk.c   |   62 +++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 58 insertions(+), 5 deletions(-)
> 
> 
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index ff30177..597f8e2 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -778,6 +778,7 @@ int sctp_user_addto_chunk(struct sctp_chunk *chunk, int off, int len,
>  			  struct iovec *data);
>  void sctp_chunk_free(struct sctp_chunk *);
>  void  *sctp_addto_chunk(struct sctp_chunk *, int len, const void *data);
> +void  *sctp_addto_chunk_fixed(struct sctp_chunk *, int len, const void *data);
>  struct sctp_chunk *sctp_chunkify(struct sk_buff *,
>  				 const struct sctp_association *,
>  				 struct sock *);
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index f592163..2971731 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -107,7 +107,7 @@ static const struct sctp_paramhdr prsctp_param = {
>  	cpu_to_be16(sizeof(struct sctp_paramhdr)),
>  };
>  
> -/* A helper to initialize to initialize an op error inside a
> +/* A helper to initialize an op error inside a
>   * provided chunk, as most cause codes will be embedded inside an
>   * abort chunk.
>   */
> @@ -124,6 +124,29 @@ void  sctp_init_cause(struct sctp_chunk *chunk, __be16 cause_code,
>  	chunk->subh.err_hdr = sctp_addto_chunk(chunk, sizeof(sctp_errhdr_t), &err);
>  }
>  
> +/* A helper to initialize an op error inside a
> + * provided chunk, as most cause codes will be embedded inside an
> + * abort chunk.  Differs from sctp_init_cause in that it won't oops
> + * if there isn't enough space in the op error chunk
> + */
> +int sctp_init_cause_fixed(struct sctp_chunk *chunk, __be16 cause_code,
> +		      size_t paylen)
> +{
> +	sctp_errhdr_t err;
> +	__u16 len;
> +
> +	/* Cause code constants are now defined in network order.  */
> +	err.cause = cause_code;
> +	len = sizeof(sctp_errhdr_t) + paylen;
> +	err.length  = htons(len);
> +
> +	if (skb_tailroom(chunk->skb) >  len)
> +		return -ENOSPC;
> +	chunk->subh.err_hdr = sctp_addto_chunk_fixed(chunk,
> +						     sizeof(sctp_errhdr_t),
> +						     &err);
> +	return 0;
> +}
>  /* 3.3.2 Initiation (INIT) (1)
>   *
>   * This chunk is used to initiate a SCTP association between two
> @@ -1131,6 +1154,24 @@ nodata:
>  	return retval;
>  }
>  
> +/* Create an Operation Error chunk of a fixed size,
> + * specifically, max(asoc->pathmtu, SCTP_DEFAULT_MAXSEGMENT)
> + * This is a helper function to allocate an error chunk for
> + * for those invalid parameter codes in which we may not want
> + * to report all the errors, if the incomming chunk is large
> + */
> +static inline struct sctp_chunk *sctp_make_op_error_fixed(
> +	const struct sctp_association *asoc,
> +	const struct sctp_chunk *chunk)
> +{
> +	size_t size = asoc ? asoc->pathmtu : 0;
> +
> +	if (!size)
> +		size = SCTP_DEFAULT_MAXSEGMENT;
> +
> +	return sctp_make_op_error_space(asoc, chunk, size);
> +}
> +
>  /* Create an Operation Error chunk.  */
>  struct sctp_chunk *sctp_make_op_error(const struct sctp_association *asoc,
>  				 const struct sctp_chunk *chunk,
> @@ -1373,6 +1414,18 @@ void *sctp_addto_chunk(struct sctp_chunk *chunk, int len, const void *data)
>  	return target;
>  }
>  
> +/* Append bytes to the end of a chunk. Returns NULL if there isn't sufficient
> + * space in the chunk
> + */
> +void *sctp_addto_chunk_fixed(struct sctp_chunk *chunk,
> +			     int len, const void *data)
> +{
> +	if (skb_tailroom(chunk->skb) > len)
> +		return sctp_addto_chunk(chunk, len, data);
> +	else
> +		return NULL;
> +}
> +
>  /* Append bytes from user space to the end of a chunk.  Will panic if
>   * chunk is not big enough.
>   * Returns a kernel err value.
> @@ -1976,13 +2029,12 @@ static sctp_ierror_t sctp_process_unk_param(const struct sctp_association *asoc,
>  		 * returning multiple unknown parameters.
>  		 */
>  		if (NULL == *errp)
> -			*errp = sctp_make_op_error_space(asoc, chunk,
> -					ntohs(chunk->chunk_hdr->length));
> +			*errp = sctp_make_op_error_fixed(asoc, chunk);
>  
>  		if (*errp) {
> -			sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
> +			sctp_init_cause_fixed(*errp, SCTP_ERROR_UNKNOWN_PARAM,
>  					WORD_ROUND(ntohs(param.p->length)));
> -			sctp_addto_chunk(*errp,
> +			sctp_addto_chunk_fixed(*errp,
>  					WORD_ROUND(ntohs(param.p->length)),
>  					param.v);
>  		} else {
> 

^ permalink raw reply

* Re: [PATCH]: sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173) (v4)
From: Neil Horman @ 2010-04-28 20:30 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: sri, linux-sctp, eteo, netdev, davem, security
In-Reply-To: <4BD897B6.5040405@hp.com>

Ok, version 4

Change Notes:
1) Minor cleanups, from Vlads notes

Summary:


Hey-
	Recently, it was reported to me that the kernel could oops in the
following way:

<5> kernel BUG at net/core/skbuff.c:91!
<5> invalid operand: 0000 [#1]
<5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter
ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U)
vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5
ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore
pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi
mptbase sd_mod scsi_mod
<5> CPU:    0
<5> EIP:    0060:[<c02bff27>]    Not tainted VLI
<5> EFLAGS: 00010216   (2.6.9-89.0.25.EL) 
<5> EIP is at skb_over_panic+0x1f/0x2d
<5> eax: 0000002c   ebx: c033f461   ecx: c0357d96   edx: c040fd44
<5> esi: c033f461   edi: df653280   ebp: 00000000   esp: c040fd40
<5> ds: 007b   es: 007b   ss: 0068
<5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0)
<5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180
e0c2947d 
<5>        00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004
df653490 
<5>        00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e
00000004 
<5> Call Trace:
<5>  [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp]
<5>  [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp]
<5>  [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp]
<5>  [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp]
<5>  [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp]
<5>  [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp]
<5>  [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp]
<5>  [<c01555a4>] cache_grow+0x140/0x233
<5>  [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp]
<5>  [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp]
<5>  [<e0c34600>] sctp_rcv+0x454/0x509 [sctp]
<5>  [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter]
<5>  [<c02d005e>] nf_iterate+0x40/0x81
<5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
<5>  [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151
<5>  [<c02d0362>] nf_hook_slow+0x83/0xb5
<5>  [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9
<5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
<5>  [<c02e103e>] ip_rcv+0x334/0x3b4
<5>  [<c02c66fd>] netif_receive_skb+0x320/0x35b
<5>  [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd]
<5>  [<c02c67a4>] process_backlog+0x6c/0xd9
<5>  [<c02c690f>] net_rx_action+0xfe/0x1f8
<5>  [<c012a7b1>] __do_softirq+0x35/0x79
<5>  [<c0107efb>] handle_IRQ_event+0x0/0x4f
<5>  [<c01094de>] do_softirq+0x46/0x4d

Its an skb_over_panic BUG halt that results from processing an init chunk in
which too many of its variable length parameters are in some way malformed.

The problem is in sctp_process_unk_param:
if (NULL == *errp)
	*errp = sctp_make_op_error_space(asoc, chunk,
					 ntohs(chunk->chunk_hdr->length));

	if (*errp) {
		sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
				 WORD_ROUND(ntohs(param.p->length)));
		sctp_addto_chunk(*errp,
			WORD_ROUND(ntohs(param.p->length)),
				  param.v);

When we allocate an error chunk, we assume that the worst case scenario requires
that we have chunk_hdr->length data allocated, which would be correct nominally,
given that we call sctp_addto_chunk for the violating parameter.  Unfortunately,
we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error
chunk, so the worst case situation in which all parameters are in violation
requires chunk_hdr->length+(sizeof(sctp_errhdr_t)*param_count) bytes of data.

The result of this error is that a deliberately malformed packet sent to a
listening host can cause a remote DOS, described in CVE-2010-1173:
http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173

I've tested the below fix and confirmed that it fixes the issue.  We move to a
strategy whereby we allocate a fixed size error chunk and ignore errors we don't
have space to report.  Tested by me successfully

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>


 include/net/sctp/structs.h |    1 
 net/sctp/sm_make_chunk.c   |   62 +++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 58 insertions(+), 5 deletions(-)


diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index ff30177..597f8e2 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -778,6 +778,7 @@ int sctp_user_addto_chunk(struct sctp_chunk *chunk, int off, int len,
 			  struct iovec *data);
 void sctp_chunk_free(struct sctp_chunk *);
 void  *sctp_addto_chunk(struct sctp_chunk *, int len, const void *data);
+void  *sctp_addto_chunk_fixed(struct sctp_chunk *, int len, const void *data);
 struct sctp_chunk *sctp_chunkify(struct sk_buff *,
 				 const struct sctp_association *,
 				 struct sock *);
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index f592163..2971731 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -107,7 +107,7 @@ static const struct sctp_paramhdr prsctp_param = {
 	cpu_to_be16(sizeof(struct sctp_paramhdr)),
 };
 
-/* A helper to initialize to initialize an op error inside a
+/* A helper to initialize an op error inside a
  * provided chunk, as most cause codes will be embedded inside an
  * abort chunk.
  */
@@ -124,6 +124,29 @@ void  sctp_init_cause(struct sctp_chunk *chunk, __be16 cause_code,
 	chunk->subh.err_hdr = sctp_addto_chunk(chunk, sizeof(sctp_errhdr_t), &err);
 }
 
+/* A helper to initialize an op error inside a
+ * provided chunk, as most cause codes will be embedded inside an
+ * abort chunk.  Differs from sctp_init_cause in that it won't oops
+ * if there isn't enough space in the op error chunk
+ */
+int sctp_init_cause_fixed(struct sctp_chunk *chunk, __be16 cause_code,
+		      size_t paylen)
+{
+	sctp_errhdr_t err;
+	__u16 len;
+
+	/* Cause code constants are now defined in network order.  */
+	err.cause = cause_code;
+	len = sizeof(sctp_errhdr_t) + paylen;
+	err.length  = htons(len);
+
+	if (skb_tailroom(chunk->skb) >  len)
+		return -ENOSPC;
+	chunk->subh.err_hdr = sctp_addto_chunk_fixed(chunk,
+						     sizeof(sctp_errhdr_t),
+						     &err);
+	return 0;
+}
 /* 3.3.2 Initiation (INIT) (1)
  *
  * This chunk is used to initiate a SCTP association between two
@@ -1131,6 +1154,24 @@ nodata:
 	return retval;
 }
 
+/* Create an Operation Error chunk of a fixed size,
+ * specifically, max(asoc->pathmtu, SCTP_DEFAULT_MAXSEGMENT)
+ * This is a helper function to allocate an error chunk for
+ * for those invalid parameter codes in which we may not want
+ * to report all the errors, if the incomming chunk is large
+ */
+static inline struct sctp_chunk *sctp_make_op_error_fixed(
+	const struct sctp_association *asoc,
+	const struct sctp_chunk *chunk)
+{
+	size_t size = asoc ? asoc->pathmtu : 0;
+
+	if (!size)
+		size = SCTP_DEFAULT_MAXSEGMENT;
+
+	return sctp_make_op_error_space(asoc, chunk, size);
+}
+
 /* Create an Operation Error chunk.  */
 struct sctp_chunk *sctp_make_op_error(const struct sctp_association *asoc,
 				 const struct sctp_chunk *chunk,
@@ -1373,6 +1414,18 @@ void *sctp_addto_chunk(struct sctp_chunk *chunk, int len, const void *data)
 	return target;
 }
 
+/* Append bytes to the end of a chunk. Returns NULL if there isn't sufficient
+ * space in the chunk
+ */
+void *sctp_addto_chunk_fixed(struct sctp_chunk *chunk,
+			     int len, const void *data)
+{
+	if (skb_tailroom(chunk->skb) > len)
+		return sctp_addto_chunk(chunk, len, data);
+	else
+		return NULL;
+}
+
 /* Append bytes from user space to the end of a chunk.  Will panic if
  * chunk is not big enough.
  * Returns a kernel err value.
@@ -1976,13 +2029,12 @@ static sctp_ierror_t sctp_process_unk_param(const struct sctp_association *asoc,
 		 * returning multiple unknown parameters.
 		 */
 		if (NULL == *errp)
-			*errp = sctp_make_op_error_space(asoc, chunk,
-					ntohs(chunk->chunk_hdr->length));
+			*errp = sctp_make_op_error_fixed(asoc, chunk);
 
 		if (*errp) {
-			sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
+			sctp_init_cause_fixed(*errp, SCTP_ERROR_UNKNOWN_PARAM,
 					WORD_ROUND(ntohs(param.p->length)));
-			sctp_addto_chunk(*errp,
+			sctp_addto_chunk_fixed(*errp,
 					WORD_ROUND(ntohs(param.p->length)),
 					param.v);
 		} else {

^ permalink raw reply related

* Re: 2.6.34-rc5-git7 (plus all patches) -- another suspicious rcu_dereference_check() usage.
From: Eric Dumazet @ 2010-04-28 20:18 UTC (permalink / raw)
  To: paulmck
  Cc: Miles Lane, Vivek Goyal, Eric Paris, Lai Jiangshan, Ingo Molnar,
	Peter Zijlstra, LKML, nauman, netdev, Jens Axboe, Gui Jianfeng,
	Li Zefan, Johannes Berg, shemminger
In-Reply-To: <20100428200904.GS2540@linux.vnet.ibm.com>

Le mercredi 28 avril 2010 à 13:09 -0700, Paul E. McKenney a écrit :
> On Wed, Apr 28, 2010 at 09:38:11PM +0200, Eric Dumazet wrote:
> > Le mercredi 28 avril 2010 à 10:54 -0700, Paul E. McKenney a écrit :
> > > On Mon, Apr 26, 2010 at 08:51:06PM -0400, Miles Lane wrote:
> > > > This one occurred during the wakeup from suspend to RAM.
> > > > 
> > > > [  984.724697] [ INFO: suspicious rcu_dereference_check() usage. ]
> > > > [  984.724700] ---------------------------------------------------
> > > > [  984.724703] include/linux/fdtable.h:88 invoked
> > > > rcu_dereference_check() without protection!
> > > > [  984.724706]
> > > > [  984.724707] other info that might help us debug this:
> > > > [  984.724708]
> > > > [  984.724711]
> > > > [  984.724711] rcu_scheduler_active = 1, debug_locks = 1
> > > > [  984.724714] no locks held by dbus-daemon/4680.
> > > > [  984.724717]
> > > > [  984.724717] stack backtrace:
> > > > [  984.724721] Pid: 4680, comm: dbus-daemon Not tainted 2.6.34-rc5-git7 #33
> > > > [  984.724724] Call Trace:
> > > > [  984.724734]  [<ffffffff81074556>] lockdep_rcu_dereference+0x9d/0xa6
> > > > [  984.724740]  [<ffffffff810fc785>] fcheck_files+0xb1/0xc9
> > > > [  984.724745]  [<ffffffff810fc7f5>] fget_light+0x35/0xab
> > > > [  984.724751]  [<ffffffff81433e1b>] ? sock_poll_wait+0x13/0x18
> > > > [  984.724755]  [<ffffffff81433e39>] ? unix_poll+0x19/0x95
> > > > [  984.724762]  [<ffffffff8110aa95>] do_sys_poll+0x1ff/0x3e5
> > > > [  984.724766]  [<ffffffff8110a19e>] ? __pollwait+0x0/0xc7
> > > > [  984.724771]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [  984.724776]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [  984.724780]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [  984.724784]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [  984.724788]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [  984.724793]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [  984.724797]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [  984.724802]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [  984.724806]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > > [  984.724812]  [<ffffffff8110ae0f>] sys_poll+0x50/0xbb
> > > > [  984.724818]  [<ffffffff81009d82>] system_call_fastpath+0x16/0x1b
> > > 
> > > Hmmm...  I am not convinced that this is a false positive.  Couldn't
> > > there be a multi-threaded process where one thread is invoking poll()
> > > on a UNIX socket just as another thread is calling close() on it?
> > > 
> > > The current fcheck_files() logic requires that the caller either (1) be in
> > > an RCU read-side critical section, (2) hold ->files_lock, or (3) passing
> > > in a files_struct with ->count equal to 1 (initialization or cleanup).
> > > 
> > > So I don't feel comfortable just slapping an RCU read-side critical
> > > section around this one, at least not unless someone who understands
> > > the locking says that doing so is OK.
> > > 
> > > 		
> > 
> > Its a single threaded program.
> > 
> > So fget_light() calls fcheck_files(files, fd); without rcu lock,
> > but some /proc/pid/fd/... user temporarly raised files->count just
> > before we perform the condition check.
> 
> So I should add a single-threaded check.  My first thought was to use
> current_is_single_threaded(), but the bit about scanning the full list
> of processes does give me pause.  However, thread_group_empty() looks
> like a much lighter-weight alternative.
> 
> I believe that it is possible for a pair of single-threaded processes
> to share a file descriptor, but that should not be a problem, as both
> of them would need to close it for it to go away.
> 
> But what happens if someone does a clone() with CLONE_FILES, as some
> of the AIO stuff seems to do?  Won't that allow one of the resulting
> processes to close the file for both of them, even though both are
> otherwise single-threaded?  And the ->count seems to be the only
> distinction between these two cases.
> 
> And AIO does CLONE_VM as well as CLONE_FILES, but that seems to mean that
> the check must scan the processes with current_is_single_threaded().
> Besides which, a user could invoke clone() with only CLONE_FILES
> specified, right?
> 
> Or am I just confused here?
> 
> 							Thanx, Paul

If a program is mono threaded, and doing a fget_light() syscall, it
cannot possibly do a clone() in // ;)

If we want to be picky, we could add a user provided condition, aka "we
are sure we are allowed to do this because we are the owner of the files
struct".






diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
index 6da962c..027f5e1 100644
--- a/drivers/char/tty_io.c
+++ b/drivers/char/tty_io.c
@@ -2694,7 +2694,7 @@ void __do_SAK(struct tty_struct *tty)
 			spin_lock(&p->files->file_lock);
 			fdt = files_fdtable(p->files);
 			for (i = 0; i < fdt->max_fds; i++) {
-				filp = fcheck_files(p->files, i);
+				filp = fcheck_files(p->files, i, false);
 				if (!filp)
 					continue;
 				if (filp->f_op->read == tty_read &&
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 452d02f..dabf4d8 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -119,7 +119,7 @@ SYSCALL_DEFINE2(dup2, unsigned int, oldfd, unsigned int, newfd)
 		int retval = oldfd;
 
 		rcu_read_lock();
-		if (!fcheck_files(files, oldfd))
+		if (!fcheck_files(files, oldfd, false))
 			retval = -EBADF;
 		rcu_read_unlock();
 		return retval;
diff --git a/fs/file_table.c b/fs/file_table.c
index 32d12b7..2865f72 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -274,7 +274,7 @@ struct file *fget(unsigned int fd)
 	struct files_struct *files = current->files;
 
 	rcu_read_lock();
-	file = fcheck_files(files, fd);
+	file = fcheck_files(files, fd, false);
 	if (file) {
 		if (!atomic_long_inc_not_zero(&file->f_count)) {
 			/* File object ref couldn't be taken */
@@ -303,10 +303,10 @@ struct file *fget_light(unsigned int fd, int *fput_needed)
 
 	*fput_needed = 0;
 	if (likely((atomic_read(&files->count) == 1))) {
-		file = fcheck_files(files, fd);
+		file = fcheck_files(files, fd, true);
 	} else {
 		rcu_read_lock();
-		file = fcheck_files(files, fd);
+		file = fcheck_files(files, fd, false);
 		if (file) {
 			if (atomic_long_inc_not_zero(&file->f_count))
 				*fput_needed = 1;
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 8418fcc..0e89448 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1716,7 +1716,7 @@ static int proc_fd_info(struct inode *inode, struct path *path, char *info)
 		 * hold ->file_lock.
 		 */
 		spin_lock(&files->file_lock);
-		file = fcheck_files(files, fd);
+		file = fcheck_files(files, fd, false);
 		if (file) {
 			if (path) {
 				*path = file->f_path;
@@ -1755,7 +1755,7 @@ static int tid_fd_revalidate(struct dentry *dentry, struct nameidata *nd)
 		files = get_files_struct(task);
 		if (files) {
 			rcu_read_lock();
-			if (fcheck_files(files, fd)) {
+			if (fcheck_files(files, fd, false)) {
 				rcu_read_unlock();
 				put_files_struct(files);
 				if (task_dumpable(task)) {
@@ -1813,7 +1813,7 @@ static struct dentry *proc_fd_instantiate(struct inode *dir,
 	 * hold ->file_lock.
 	 */
 	spin_lock(&files->file_lock);
-	file = fcheck_files(files, fd);
+	file = fcheck_files(files, fd, false);
 	if (!file)
 		goto out_unlock;
 	if (file->f_mode & FMODE_READ)
@@ -1899,7 +1899,7 @@ static int proc_readfd_common(struct file * filp, void * dirent,
 				char name[PROC_NUMBUF];
 				int len;
 
-				if (!fcheck_files(files, fd))
+				if (!fcheck_files(files, fd, false))
 					continue;
 				rcu_read_unlock();
 
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index 013dc52..76423ad 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -57,11 +57,12 @@ struct files_struct {
 	struct file * fd_array[NR_OPEN_DEFAULT];
 };
 
-#define rcu_dereference_check_fdtable(files, fdtfd) \
+#define rcu_dereference_check_fdtable(files, fdtfd, cond) \
 	(rcu_dereference_check((fdtfd), \
 			       rcu_read_lock_held() || \
 			       lockdep_is_held(&(files)->file_lock) || \
-			       atomic_read(&(files)->count) == 1))
+			       atomic_read(&(files)->count) == 1 || \
+			       cond))
 
 #define files_fdtable(files) \
 		(rcu_dereference_check_fdtable((files), (files)->fdt))
@@ -79,13 +80,13 @@ static inline void free_fdtable(struct fdtable *fdt)
 	call_rcu(&fdt->rcu, free_fdtable_rcu);
 }
 
-static inline struct file * fcheck_files(struct files_struct *files, unsigned int fd)
+static inline struct file * fcheck_files(struct files_struct *files, unsigned int fd, bool cond)
 {
 	struct file * file = NULL;
 	struct fdtable *fdt = files_fdtable(files);
 
 	if (fd < fdt->max_fds)
-		file = rcu_dereference_check_fdtable(files, fdt->fd[fd]);
+		file = rcu_dereference_check_fdtable(files, fdt->fd[fd], cond);
 	return file;
 }
 



^ permalink raw reply related

* Re: [PATCH]: sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173) (v3)
From: Vlad Yasevich @ 2010-04-28 20:16 UTC (permalink / raw)
  To: Neil Horman; +Cc: sri, linux-sctp, eteo, netdev, davem, security
In-Reply-To: <20100428193755.GF4818@hmsreliant.think-freely.org>



Neil Horman wrote:

... snip description ...

> 
> 
>  include/net/sctp/structs.h |    1 
>  net/sctp/sm_make_chunk.c   |   70 +++++++++++++++++++++++++++++++++++++++------
>  2 files changed, 62 insertions(+), 9 deletions(-)
> 
> 
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index ff30177..597f8e2 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -778,6 +778,7 @@ int sctp_user_addto_chunk(struct sctp_chunk *chunk, int off, int len,
>  			  struct iovec *data);
>  void sctp_chunk_free(struct sctp_chunk *);
>  void  *sctp_addto_chunk(struct sctp_chunk *, int len, const void *data);
> +void  *sctp_addto_chunk_fixed(struct sctp_chunk *, int len, const void *data);
>  struct sctp_chunk *sctp_chunkify(struct sk_buff *,
>  				 const struct sctp_association *,
>  				 struct sock *);
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index f592163..9623112 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -107,7 +107,7 @@ static const struct sctp_paramhdr prsctp_param = {
>  	cpu_to_be16(sizeof(struct sctp_paramhdr)),
>  };
>  
> -/* A helper to initialize to initialize an op error inside a
> +/* A helper to initialize an op error inside a
>   * provided chunk, as most cause codes will be embedded inside an
>   * abort chunk.
>   */
> @@ -124,6 +124,29 @@ void  sctp_init_cause(struct sctp_chunk *chunk, __be16 cause_code,
>  	chunk->subh.err_hdr = sctp_addto_chunk(chunk, sizeof(sctp_errhdr_t), &err);
>  }
>  
> +/* A helper to initialize an op error inside a
> + * provided chunk, as most cause codes will be embedded inside an
> + * abort chunk.  Differs from sctp_init_cause in that it won't oops
> + * if there isn't enough space in the op error chunk
> + */
> +int sctp_init_cause_fixed(struct sctp_chunk *chunk, __be16 cause_code,
> +		      size_t paylen)
> +{
> +	sctp_errhdr_t err;
> +	__u16 len;
> +
> +	/* Cause code constants are now defined in network order.  */
> +	err.cause = cause_code;
> +	len = sizeof(sctp_errhdr_t) + paylen;
> +	err.length  = htons(len);
> +
> +	if (skb_tailroom(chunk->skb) >  len)
> +		return -ENOSPC;
> +	chunk->subh.err_hdr = sctp_addto_chunk_fixed(chunk,
> +						     sizeof(sctp_errhdr_t),
> +						     &err);
> +	return 0;
> +}
>  /* 3.3.2 Initiation (INIT) (1)
>   *
>   * This chunk is used to initiate a SCTP association between two
> @@ -1131,6 +1154,24 @@ nodata:
>  	return retval;
>  }
>  
> +/* Create an Operation Error chunk of a fixed size,
> + * specifically, max(asoc->pathmtu, SCTP_DEFAULT_MAXSEGMENT)
> + * This is a helper function to allocate an error chunk for
> + * for those invalid parameter codes in which we may not want
> + * to report all the errors, if the incomming chunk is large
> + */
> +static inline struct sctp_chunk *sctp_make_op_error_fixed(
> +	const struct sctp_association *asoc,
> +	const struct sctp_chunk *chunk)
> +{
> +	size_t size = asoc ? asoc->pathmtu : 0;
> +
> +	if (size > SCTP_DEFAULT_MAXSEGMENT)
> +		size = SCTP_DEFAULT_MAXSEGMENT;
> +

This doesn't look right.  If you don't have an association or if pmtu is
not initialized, you will end up with a size of 0.  I think you simply
want to a
	if (!size)

there.


> +	return sctp_make_op_error_space(asoc, chunk, size);
> +}
> +
>  /* Create an Operation Error chunk.  */
>  struct sctp_chunk *sctp_make_op_error(const struct sctp_association *asoc,
>  				 const struct sctp_chunk *chunk,
> @@ -1373,6 +1414,18 @@ void *sctp_addto_chunk(struct sctp_chunk *chunk, int len, const void *data)
>  	return target;
>  }
>  
> +/* Append bytes to the end of a chunk. Returns NULL if there isn't sufficient
> + * space in the chunk
> + */
> +void *sctp_addto_chunk_fixed(struct sctp_chunk *chunk,
> +			     int len, const void *data)
> +{
> +	if (skb_tailroom(chunk->skb) > len)
> +		return sctp_addto_chunk(chunk, len, data);
> +	else
> +		return NULL;
> +}
> +
>  /* Append bytes from user space to the end of a chunk.  Will panic if
>   * chunk is not big enough.
>   * Returns a kernel err value.
> @@ -1793,9 +1846,9 @@ static int sctp_process_missing_param(const struct sctp_association *asoc,
>  	if (*errp) {
>  		report.num_missing = htonl(1);
>  		report.type = paramtype;
> -		sctp_init_cause(*errp, SCTP_ERROR_MISS_PARAM,
> -				sizeof(report));
> -		sctp_addto_chunk(*errp, sizeof(report), &report);
> +		sctp_init_cause_fixed(*errp, SCTP_ERROR_MISS_PARAM,
> +				      sizeof(report));
> +		sctp_addto_chunk_fixed(*errp, sizeof(report), &report);
>  	}
>  
>  	/* Stop processing this chunk. */
> @@ -1813,7 +1866,7 @@ static int sctp_process_inv_mandatory(const struct sctp_association *asoc,
>  		*errp = sctp_make_op_error_space(asoc, chunk, 0);
>  
>  	if (*errp)
> -		sctp_init_cause(*errp, SCTP_ERROR_INV_PARAM, 0);
> +		sctp_init_cause_fixed(*errp, SCTP_ERROR_INV_PARAM, 0);
>  
>  	/* Stop processing this chunk. */
>  	return 0;

I don't think missing or mandatory parameters are effected by this.  They are a
once and done deal.  There don't report multiple errors and we don't add any
more error after them.

> @@ -1976,13 +2029,12 @@ static sctp_ierror_t sctp_process_unk_param(const struct sctp_association *asoc,
>  		 * returning multiple unknown parameters.
>  		 */
>  		if (NULL == *errp)
> -			*errp = sctp_make_op_error_space(asoc, chunk,
> -					ntohs(chunk->chunk_hdr->length));
> +			*errp = sctp_make_op_error_fixed(asoc, chunk);
>  
>  		if (*errp) {
> -			sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
> +			sctp_init_cause_fixed(*errp, SCTP_ERROR_UNKNOWN_PARAM,
>  					WORD_ROUND(ntohs(param.p->length)));
> -			sctp_addto_chunk(*errp,
> +			sctp_addto_chunk_fixed(*errp,
>  					WORD_ROUND(ntohs(param.p->length)),
>  					param.v);
>  		} else {
> 

So we completely get rid of variable size error chunk in this case, which I can
live with.  It simplifies the code.

-vlad

^ permalink raw reply

* Re: 2.6.34-rc5-git7 (plus all patches) -- another suspicious rcu_dereference_check() usage.
From: Paul E. McKenney @ 2010-04-28 20:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Miles Lane, Vivek Goyal, Eric Paris, Lai Jiangshan, Ingo Molnar,
	Peter Zijlstra, LKML, nauman, netdev, Jens Axboe, Gui Jianfeng,
	Li Zefan, Johannes Berg, shemminger
In-Reply-To: <1272483491.2201.9.camel@edumazet-laptop>

On Wed, Apr 28, 2010 at 09:38:11PM +0200, Eric Dumazet wrote:
> Le mercredi 28 avril 2010 à 10:54 -0700, Paul E. McKenney a écrit :
> > On Mon, Apr 26, 2010 at 08:51:06PM -0400, Miles Lane wrote:
> > > This one occurred during the wakeup from suspend to RAM.
> > > 
> > > [  984.724697] [ INFO: suspicious rcu_dereference_check() usage. ]
> > > [  984.724700] ---------------------------------------------------
> > > [  984.724703] include/linux/fdtable.h:88 invoked
> > > rcu_dereference_check() without protection!
> > > [  984.724706]
> > > [  984.724707] other info that might help us debug this:
> > > [  984.724708]
> > > [  984.724711]
> > > [  984.724711] rcu_scheduler_active = 1, debug_locks = 1
> > > [  984.724714] no locks held by dbus-daemon/4680.
> > > [  984.724717]
> > > [  984.724717] stack backtrace:
> > > [  984.724721] Pid: 4680, comm: dbus-daemon Not tainted 2.6.34-rc5-git7 #33
> > > [  984.724724] Call Trace:
> > > [  984.724734]  [<ffffffff81074556>] lockdep_rcu_dereference+0x9d/0xa6
> > > [  984.724740]  [<ffffffff810fc785>] fcheck_files+0xb1/0xc9
> > > [  984.724745]  [<ffffffff810fc7f5>] fget_light+0x35/0xab
> > > [  984.724751]  [<ffffffff81433e1b>] ? sock_poll_wait+0x13/0x18
> > > [  984.724755]  [<ffffffff81433e39>] ? unix_poll+0x19/0x95
> > > [  984.724762]  [<ffffffff8110aa95>] do_sys_poll+0x1ff/0x3e5
> > > [  984.724766]  [<ffffffff8110a19e>] ? __pollwait+0x0/0xc7
> > > [  984.724771]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [  984.724776]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [  984.724780]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [  984.724784]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [  984.724788]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [  984.724793]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [  984.724797]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [  984.724802]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [  984.724806]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > > [  984.724812]  [<ffffffff8110ae0f>] sys_poll+0x50/0xbb
> > > [  984.724818]  [<ffffffff81009d82>] system_call_fastpath+0x16/0x1b
> > 
> > Hmmm...  I am not convinced that this is a false positive.  Couldn't
> > there be a multi-threaded process where one thread is invoking poll()
> > on a UNIX socket just as another thread is calling close() on it?
> > 
> > The current fcheck_files() logic requires that the caller either (1) be in
> > an RCU read-side critical section, (2) hold ->files_lock, or (3) passing
> > in a files_struct with ->count equal to 1 (initialization or cleanup).
> > 
> > So I don't feel comfortable just slapping an RCU read-side critical
> > section around this one, at least not unless someone who understands
> > the locking says that doing so is OK.
> > 
> > 		
> 
> Its a single threaded program.
> 
> So fget_light() calls fcheck_files(files, fd); without rcu lock,
> but some /proc/pid/fd/... user temporarly raised files->count just
> before we perform the condition check.

So I should add a single-threaded check.  My first thought was to use
current_is_single_threaded(), but the bit about scanning the full list
of processes does give me pause.  However, thread_group_empty() looks
like a much lighter-weight alternative.

I believe that it is possible for a pair of single-threaded processes
to share a file descriptor, but that should not be a problem, as both
of them would need to close it for it to go away.

But what happens if someone does a clone() with CLONE_FILES, as some
of the AIO stuff seems to do?  Won't that allow one of the resulting
processes to close the file for both of them, even though both are
otherwise single-threaded?  And the ->count seems to be the only
distinction between these two cases.

And AIO does CLONE_VM as well as CLONE_FILES, but that seems to mean that
the check must scan the processes with current_is_single_threaded().
Besides which, a user could invoke clone() with only CLONE_FILES
specified, right?

Or am I just confused here?

							Thanx, Paul

^ permalink raw reply

* Re: [PATCH net-next-2.6 7/7] Bugfix: Link selection was swapped in switch.
From: David Miller @ 2010-04-28 20:03 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-7-git-send-email-sjur.brandeland@stericsson.com>

From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:40 +0200

> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
> 
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6 6/7] caif: Bugfixes in CAIF netdevice for close and flow control
From: David Miller @ 2010-04-28 20:03 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-6-git-send-email-sjur.brandeland@stericsson.com>

From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:39 +0200

> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
> 
> Changes:
> o Bugfix: Flow control was causing the device to be destroyed.
> o Bugfix: Handle CAIF channel connect failures.
> o If the underlying link layer is gone the net-device is no longer removed,
>   but closed.
> 
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6 5/7] caif: Rewritten socket implementation
From: David Miller @ 2010-04-28 20:03 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-5-git-send-email-sjur.brandeland@stericsson.com>

From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:38 +0200

> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
> 
> Changes:
>  This is a complete re-write of the socket layer. Making the socket
>  implementation more aligned with the other socket layers and using more
>  of the support functions available in sock.c. Lots of code is copied
>  from af_unix (and some from af_irda).
>  Non-blocking mode should be working as well.
> 
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6 4/7] caif: Disconnect without waiting for response
From: David Miller @ 2010-04-28 20:03 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-4-git-send-email-sjur.brandeland@stericsson.com>

From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:37 +0200

> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
> 
> Changes:
> o Function cfcnfg_disconn_adapt_layer is changed to do asynchronous
>   disconnect, not waiting for any response from the modem. Due to this
>   the function cfcnfg_linkdestroy_rsp does nothing anymore.
> o Because disconnect may take down a connection before a connect response
>   is received the function cfcnfg_linkup_rsp is checking if the client is
>   still waiting for the response, if not a disconnect request is sent to
>   the modem.
> o cfctrl is no longer keeping track of pending disconnect requests.
> o Added function cfctrl_cancel_req, which is used for deleting a pending
>   connect request if disconnect is done before connect response is received.
> o Removed unused function cfctrl_insert_req2
> o Added better handling of connect reject from modem.
> 
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6 3/7] caif: Add reference counting to service layer
From: David Miller @ 2010-04-28 20:03 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-3-git-send-email-sjur.brandeland@stericsson.com>

From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:36 +0200

> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
> 
> Changes:
> o Added functions cfsrvl_get and cfsrvl_put.
> o Added support release_client to use by socket and net device.
> o Increase reference counting for in-flight packets from cfmuxl
> 
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6 2/7] caif: Rename functions in cfcnfg and caif_dev
From: David Miller @ 2010-04-28 20:03 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-2-git-send-email-sjur.brandeland@stericsson.com>

From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:35 +0200

> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
> 
> Changes:
>  o Renamed cfcnfg_del_adapt_layer to cfcnfg_disconn_adapt_layer
>  o Fixed typo cfcfg to cfcnfg
>  o Renamed linkid to channel_id
>  o Updated documentation in caif_dev.h
>  o Minor formatting changes
> 
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6 1/7] caif: Ldisc add permission check and mem-alloc error check
From: David Miller @ 2010-04-28 20:02 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-1-git-send-email-sjur.brandeland@stericsson.com>

From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:34 +0200

> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
> 
> Changes:
>    o Added permission checks for installing. CAP_SYS_ADMIN and
>      CAP_SYS_TTY_CONFIG can install the ldisc.
>    o Check if allocation of skb was successful.
> 
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>

Applied.

^ permalink raw reply

* [PATCH v2] net/sb1250: register mdio bus in probe
From: Sebastian Andrzej Siewior @ 2010-04-28 19:57 UTC (permalink / raw)
  To: David Miller; +Cc: ralf, netdev
In-Reply-To: <20100427.155215.228945283.davem@davemloft.net>

"ifconfig eth0 up && ifconfig eth0 down" triggers:
| kobject (a8000000cfa5a480): tried to init an initialized object, something is seriously wrong.
| Call Trace:
| [<ffffffff8010aabc>] dump_stack+0x8/0x34
| [<ffffffff80293128>] kobject_init+0xe8/0xf0
| [<ffffffff802d922c>] device_initialize+0x2c/0x98
| [<ffffffff802d9cfc>] device_register+0x14/0x28
| [<ffffffff80312cd4>] mdiobus_register+0xdc/0x1e0
| [<ffffffff80314cf0>] sbmac_open+0x58/0x220
| [<ffffffff803519bc>] __dev_open+0x11c/0x180
| [<ffffffff8034d578>] __dev_change_flags+0x120/0x180
| [<ffffffff80351848>] dev_change_flags+0x20/0x78
| [<ffffffff803a753c>] devinet_ioctl+0x7cc/0x820
| [<ffffffff80339ac8>] sock_do_ioctl+0x38/0x90
| [<ffffffff8033a258>] compat_sock_ioctl_trans+0x408/0x1030
| [<ffffffff8033af30>] compat_sock_ioctl+0xb0/0xd0
| [<ffffffff80208b08>] compat_sys_ioctl+0xa0/0x18b8
| [<ffffffff80102f94>] handle_sys+0x114/0x130
|
| sb1250-mac-mdio: probed

mdiobus_register() calls device_register() which initializes the kobj of
the device. mdiobus_unregister() calls only device_del() so we have one
reference left. That one is leaving with mdiobus_free() which is only
called on remove.
Since I don't see any reason why mdiobus_register()/mdiobus_unregister()
should happen in ->open()/->close() I move them to probe & exit.

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
---
v2: register mdio bus before registering netdev device

 drivers/net/sb1250-mac.c |   67 ++++++++++++++++++++++-----------------------
 1 files changed, 33 insertions(+), 34 deletions(-)

diff --git a/drivers/net/sb1250-mac.c b/drivers/net/sb1250-mac.c
index 3f882d5..b0161b4 100644
--- a/drivers/net/sb1250-mac.c
+++ b/drivers/net/sb1250-mac.c
@@ -2256,17 +2256,36 @@ static int sbmac_init(struct platform_device *pldev, long long base)
 
 	sc->mii_bus = mdiobus_alloc();
 	if (sc->mii_bus == NULL) {
-		sbmac_uninitctx(sc);
-		return -ENOMEM;
+		err = -ENOMEM;
+		goto uninit_ctx;
 	}
 
+	sc->mii_bus->name = sbmac_mdio_string;
+	snprintf(sc->mii_bus->id, MII_BUS_ID_SIZE, "%x", idx);
+	sc->mii_bus->priv = sc;
+	sc->mii_bus->read = sbmac_mii_read;
+	sc->mii_bus->write = sbmac_mii_write;
+	sc->mii_bus->irq = sc->phy_irq;
+	for (i = 0; i < PHY_MAX_ADDR; ++i)
+		sc->mii_bus->irq[i] = SBMAC_PHY_INT;
+
+	sc->mii_bus->parent = &pldev->dev;
+	/*
+	 * Probe PHY address
+	 */
+	err = mdiobus_register(sc->mii_bus);
+	if (err) {
+		printk(KERN_ERR "%s: unable to register MDIO bus\n",
+		       dev->name);
+		goto free_mdio;
+	}
+	dev_set_drvdata(&pldev->dev, sc->mii_bus);
+
 	err = register_netdev(dev);
 	if (err) {
 		printk(KERN_ERR "%s.%d: unable to register netdev\n",
 		       sbmac_string, idx);
-		mdiobus_free(sc->mii_bus);
-		sbmac_uninitctx(sc);
-		return err;
+		goto unreg_mdio;
 	}
 
 	pr_info("%s.%d: registered as %s\n", sbmac_string, idx, dev->name);
@@ -2282,19 +2301,15 @@ static int sbmac_init(struct platform_device *pldev, long long base)
 	pr_info("%s: SiByte Ethernet at 0x%08Lx, address: %pM\n",
 	       dev->name, base, eaddr);
 
-	sc->mii_bus->name = sbmac_mdio_string;
-	snprintf(sc->mii_bus->id, MII_BUS_ID_SIZE, "%x", idx);
-	sc->mii_bus->priv = sc;
-	sc->mii_bus->read = sbmac_mii_read;
-	sc->mii_bus->write = sbmac_mii_write;
-	sc->mii_bus->irq = sc->phy_irq;
-	for (i = 0; i < PHY_MAX_ADDR; ++i)
-		sc->mii_bus->irq[i] = SBMAC_PHY_INT;
-
-	sc->mii_bus->parent = &pldev->dev;
-	dev_set_drvdata(&pldev->dev, sc->mii_bus);
-
 	return 0;
+unreg_mdio:
+	mdiobus_unregister(sc->mii_bus);
+	dev_set_drvdata(&pldev->dev, NULL);
+free_mdio:
+	mdiobus_free(sc->mii_bus);
+uninit_ctx:
+	sbmac_uninitctx(sc);
+	return err;
 }
 
 
@@ -2320,16 +2335,6 @@ static int sbmac_open(struct net_device *dev)
 		goto out_err;
 	}
 
-	/*
-	 * Probe PHY address
-	 */
-	err = mdiobus_register(sc->mii_bus);
-	if (err) {
-		printk(KERN_ERR "%s: unable to register MDIO bus\n",
-		       dev->name);
-		goto out_unirq;
-	}
-
 	sc->sbm_speed = sbmac_speed_none;
 	sc->sbm_duplex = sbmac_duplex_none;
 	sc->sbm_fc = sbmac_fc_none;
@@ -2360,11 +2365,7 @@ static int sbmac_open(struct net_device *dev)
 	return 0;
 
 out_unregister:
-	mdiobus_unregister(sc->mii_bus);
-
-out_unirq:
 	free_irq(dev->irq, dev);
-
 out_err:
 	return err;
 }
@@ -2553,9 +2554,6 @@ static int sbmac_close(struct net_device *dev)
 
 	phy_disconnect(sc->phy_dev);
 	sc->phy_dev = NULL;
-
-	mdiobus_unregister(sc->mii_bus);
-
 	free_irq(dev->irq, dev);
 
 	sbdma_emptyring(&(sc->sbm_txdma));
@@ -2663,6 +2661,7 @@ static int __exit sbmac_remove(struct platform_device *pldev)
 
 	unregister_netdev(dev);
 	sbmac_uninitctx(sc);
+	mdiobus_unregister(sc->mii_bus);
 	mdiobus_free(sc->mii_bus);
 	iounmap(sc->sbm_base);
 	free_netdev(dev);
-- 
1.6.6.1

^ permalink raw reply related

* Re: [PATCH 17/17] sfc: Create multiple TX queues
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272483043.2549.33.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:30:43 +0100

> Create a core TX queue and 2 hardware TX queues for each channel.
> If separate_tx_channels is set, create equal numbers of RX and TX
> channels instead.
> 
> Rewrite the channel and queue iteration macros accordingly.
> Eliminate efx_channel::used_flags as redundant.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 16/17] sfc: Test only the first pair of TX queues
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272483030.2549.32.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:30:30 +0100

> This makes no immediate difference, but we definitely do not want
> to test all TX queues once we allocate a pair of TX queues to each
> channel.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 15/17] sfc: Add Siena PHY BIST and cable diagnostic support
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272483022.2549.31.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:30:22 +0100

> From: Steve Hodgson <shodgson@solarflare.com>
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox