Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [Patch]8139too: remove unnecessary cast of ioread32()'s return value
From: Jeff Garzik @ 2010-05-30 17:35 UTC (permalink / raw)
  To: davem, romieu, netdev
In-Reply-To: <20100530122213.GB1146@host-a-55.ustcsz.edu.cn>

On 05/30/2010 08:22 AM, Junchang Wang wrote:
> ioread32() returns a 32-bit integer on all platforms.
> There is no need to cast its return value.
>
> Signed-off-by: Junchang Wang<junchangwang@gmail.com>
> ---
>   drivers/net/8139too.c |    8 ++++----
>   1 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/8139too.c b/drivers/net/8139too.c
> index 4ba7293..cc7d462 100644
> --- a/drivers/net/8139too.c
> +++ b/drivers/net/8139too.c
> @@ -662,7 +662,7 @@ static const struct ethtool_ops rtl8139_ethtool_ops;
>   /* read MMIO register */
>   #define RTL_R8(reg)		ioread8 (ioaddr + (reg))
>   #define RTL_R16(reg)		ioread16 (ioaddr + (reg))
> -#define RTL_R32(reg)		((unsigned long) ioread32 (ioaddr + (reg)))
> +#define RTL_R32(reg)		ioread32 (ioaddr + (reg))

Have you verified this matches all architectures definition of readl()?

	Jeff




^ permalink raw reply

* Re: [Patch]r8169: remove unnecessary cast of readl()'s return value
From: Jeff Garzik @ 2010-05-30 17:36 UTC (permalink / raw)
  To: davem, romieu, netdev
In-Reply-To: <20100530122606.GC1146@host-a-55.ustcsz.edu.cn>

On 05/30/2010 08:26 AM, Junchang Wang wrote:
> readl() returns a 32-bit integer on all platforms.
> There is no need to cast its return value.
>
> Signed-off-by: Junchang Wang<junchangwang@gmail.com>
> ---
>   drivers/net/r8169.c |    2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
> index 217e709..ca93cdf 100644
> --- a/drivers/net/r8169.c
> +++ b/drivers/net/r8169.c
> @@ -88,7 +88,7 @@ static const int multicast_filter_limit = 32;
>   #define RTL_W32(reg, val32)	writel ((val32), ioaddr + (reg))
>   #define RTL_R8(reg)		readb (ioaddr + (reg))
>   #define RTL_R16(reg)		readw (ioaddr + (reg))
> -#define RTL_R32(reg)		((unsigned long) readl (ioaddr + (reg)))
> +#define RTL_R32(reg)		readl (ioaddr + (reg))

Ditto last email:  have you verified this matches all arch's definition 
of readl()?

	Jeff





^ permalink raw reply

* Subject: [PATCH] net/ipv6: Use GFP_ATOMIC when a lock is held
From: Julia Lawall @ 2010-05-30 19:48 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI <yosh

From: Julia Lawall <julia@diku.dk>

A spin lock is taken near the beginning of the enclosing function.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
@@

spin_lock(...)
... when != spin_unlock(...)
-GFP_KERNEL
+GFP_ATOMIC
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>

---
 net/ipv6/sit.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -u -p a/net/ipv6/sit.c b/net/ipv6/sit.c
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -358,7 +358,7 @@ ipip6_tunnel_add_prl(struct ip_tunnel *t
 		goto out;
 	}
 
-	p = kzalloc(sizeof(struct ip_tunnel_prl_entry), GFP_KERNEL);
+	p = kzalloc(sizeof(struct ip_tunnel_prl_entry), GFP_ATOMIC);
 	if (!p) {
 		err = -ENOBUFS;
 		goto out;

^ permalink raw reply

* Re: Subject: [PATCH] net/ipv6: Use GFP_ATOMIC when a lock is held
From: Eric Dumazet @ 2010-05-30 20:11 UTC (permalink / raw)
  To: Julia Lawall
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev,
	linux-kernel, kernel-janitors
In-Reply-To: <Pine.LNX.4.64.1005302147310.19253@ask.diku.dk>

Le dimanche 30 mai 2010 à 21:48 +0200, Julia Lawall a écrit :
> From: Julia Lawall <julia@diku.dk>
> 
> A spin lock is taken near the beginning of the enclosing function.
> 
> The semantic patch that makes this change is as follows:
> (http://coccinelle.lip6.fr/)
> 
> // <smpl>
> @@
> @@
> 
> spin_lock(...)
> ... when != spin_unlock(...)
> -GFP_KERNEL
> +GFP_ATOMIC
> // </smpl>
> 
> Signed-off-by: Julia Lawall <julia@diku.dk>
> 
> ---
>  net/ipv6/sit.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff -u -p a/net/ipv6/sit.c b/net/ipv6/sit.c
> --- a/net/ipv6/sit.c
> +++ b/net/ipv6/sit.c
> @@ -358,7 +358,7 @@ ipip6_tunnel_add_prl(struct ip_tunnel *t
>  		goto out;
>  	}
>  
> -	p = kzalloc(sizeof(struct ip_tunnel_prl_entry), GFP_KERNEL);
> +	p = kzalloc(sizeof(struct ip_tunnel_prl_entry), GFP_ATOMIC);
>  	if (!p) {
>  		err = -ENOBUFS;
>  		goto out;

Nice catch, but what about allocating this outside of the locked
section ?

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index e51e650..ff3dd84 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -340,6 +340,10 @@ ipip6_tunnel_add_prl(struct ip_tunnel *t, struct ip_tunnel_prl *a, int chg)
 	if (a->addr == htonl(INADDR_ANY))
 		return -EINVAL;
 
+	p = kzalloc(sizeof(struct ip_tunnel_prl_entry), GFP_KERNEL);
+	if (!p)
+		return -ENOBUFS;
+
 	spin_lock(&ipip6_prl_lock);
 
 	for (p = t->prl; p; p = p->next) {
@@ -358,19 +362,16 @@ ipip6_tunnel_add_prl(struct ip_tunnel *t, struct ip_tunnel_prl *a, int chg)
 		goto out;
 	}
 
-	p = kzalloc(sizeof(struct ip_tunnel_prl_entry), GFP_KERNEL);
-	if (!p) {
-		err = -ENOBUFS;
-		goto out;
-	}
 
 	p->next = t->prl;
 	p->addr = a->addr;
 	p->flags = a->flags;
 	t->prl_count++;
 	rcu_assign_pointer(t->prl, p);
+	p = NULL;
 out:
 	spin_unlock(&ipip6_prl_lock);
+	kfree(p);
 	return err;
 }
 



^ permalink raw reply related

* [PATCH 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Tejun Heo @ 2010-05-30 20:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Oleg Nesterov, Sridhar Samudrala, netdev, lkml,
	kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
	Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <20100530112925.GB27611@redhat.com>

Replace vhost_workqueue with per-vhost kthread.  Other than callback
argument change from struct work_struct * to struct vhost_poll *,
there's no visible change to vhost_poll_*() interface.

This conversion is to make each vhost use a dedicated kthread so that
resource control via cgroup can be applied.

Partially based on Sridhar Samudrala's patch.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Sridhar Samudrala <samudrala.sridhar@gmail.com>
---
Okay, here is three patch series to convert vhost to use per-vhost
kthread, add cgroup_attach_task_current_cg() and apply it to the vhost
kthreads.  The conversion is mostly straight forward although flush is
slightly tricky.

The problem is that I have no idea how to test this.  It builds fine
and I read it several times but it's entirely possible / likely that I
missed something.  Please proceed with caution (so, no sign off yet).

Thanks.

 drivers/vhost/net.c   |   58 +++++++++++++----------------
 drivers/vhost/vhost.c |   99 ++++++++++++++++++++++++++++++++++++--------------
 drivers/vhost/vhost.h |   32 +++++++++-------
 3 files changed, 117 insertions(+), 72 deletions(-)

Index: work/drivers/vhost/net.c
===================================================================
--- work.orig/drivers/vhost/net.c
+++ work/drivers/vhost/net.c
@@ -294,54 +294,60 @@ static void handle_rx(struct vhost_net *
 	unuse_mm(net->dev.mm);
 }

-static void handle_tx_kick(struct work_struct *work)
+static void handle_tx_kick(struct vhost_poll *poll)
 {
-	struct vhost_virtqueue *vq;
-	struct vhost_net *net;
-	vq = container_of(work, struct vhost_virtqueue, poll.work);
-	net = container_of(vq->dev, struct vhost_net, dev);
+	struct vhost_virtqueue *vq =
+		container_of(poll, struct vhost_virtqueue, poll);
+	struct vhost_net *net = container_of(vq->dev, struct vhost_net, dev);
+
 	handle_tx(net);
 }

-static void handle_rx_kick(struct work_struct *work)
+static void handle_rx_kick(struct vhost_poll *poll)
 {
-	struct vhost_virtqueue *vq;
-	struct vhost_net *net;
-	vq = container_of(work, struct vhost_virtqueue, poll.work);
-	net = container_of(vq->dev, struct vhost_net, dev);
+	struct vhost_virtqueue *vq =
+		container_of(poll, struct vhost_virtqueue, poll);
+	struct vhost_net *net = container_of(vq->dev, struct vhost_net, dev);
+
 	handle_rx(net);
 }

-static void handle_tx_net(struct work_struct *work)
+static void handle_tx_net(struct vhost_poll *poll)
 {
-	struct vhost_net *net;
-	net = container_of(work, struct vhost_net, poll[VHOST_NET_VQ_TX].work);
+	struct vhost_net *net =
+		container_of(poll, struct vhost_net, poll[VHOST_NET_VQ_TX]);
+
 	handle_tx(net);
 }

-static void handle_rx_net(struct work_struct *work)
+static void handle_rx_net(struct vhost_poll *poll)
 {
-	struct vhost_net *net;
-	net = container_of(work, struct vhost_net, poll[VHOST_NET_VQ_RX].work);
+	struct vhost_net *net =
+		container_of(poll, struct vhost_net, poll[VHOST_NET_VQ_RX]);
+
 	handle_rx(net);
 }

 static int vhost_net_open(struct inode *inode, struct file *f)
 {
 	struct vhost_net *n = kmalloc(sizeof *n, GFP_KERNEL);
+	struct vhost_dev *dev;
 	int r;
+
 	if (!n)
 		return -ENOMEM;
+
+	dev = &n->dev;
 	n->vqs[VHOST_NET_VQ_TX].handle_kick = handle_tx_kick;
 	n->vqs[VHOST_NET_VQ_RX].handle_kick = handle_rx_kick;
-	r = vhost_dev_init(&n->dev, n->vqs, VHOST_NET_VQ_MAX);
+	r = vhost_dev_init(dev, n->vqs, VHOST_NET_VQ_MAX);
 	if (r < 0) {
 		kfree(n);
 		return r;
 	}

-	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT);
-	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN);
+	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
+	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
 	n->tx_poll_state = VHOST_NET_POLL_DISABLED;

 	f->private_data = n;
@@ -644,25 +650,13 @@ static struct miscdevice vhost_net_misc

 static int vhost_net_init(void)
 {
-	int r = vhost_init();
-	if (r)
-		goto err_init;
-	r = misc_register(&vhost_net_misc);
-	if (r)
-		goto err_reg;
-	return 0;
-err_reg:
-	vhost_cleanup();
-err_init:
-	return r;
-
+	return misc_register(&vhost_net_misc);
 }
 module_init(vhost_net_init);

 static void vhost_net_exit(void)
 {
 	misc_deregister(&vhost_net_misc);
-	vhost_cleanup();
 }
 module_exit(vhost_net_exit);

Index: work/drivers/vhost/vhost.c
===================================================================
--- work.orig/drivers/vhost/vhost.c
+++ work/drivers/vhost/vhost.c
@@ -17,12 +17,12 @@
 #include <linux/mm.h>
 #include <linux/miscdevice.h>
 #include <linux/mutex.h>
-#include <linux/workqueue.h>
 #include <linux/rcupdate.h>
 #include <linux/poll.h>
 #include <linux/file.h>
 #include <linux/highmem.h>
 #include <linux/slab.h>
+#include <linux/kthread.h>

 #include <linux/net.h>
 #include <linux/if_packet.h>
@@ -37,8 +37,6 @@ enum {
 	VHOST_MEMORY_F_LOG = 0x1,
 };

-static struct workqueue_struct *vhost_workqueue;
-
 static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh,
 			    poll_table *pt)
 {
@@ -52,23 +50,27 @@ static void vhost_poll_func(struct file
 static int vhost_poll_wakeup(wait_queue_t *wait, unsigned mode, int sync,
 			     void *key)
 {
-	struct vhost_poll *poll;
-	poll = container_of(wait, struct vhost_poll, wait);
+	struct vhost_poll *poll = container_of(wait, struct vhost_poll, wait);
+
 	if (!((unsigned long)key & poll->mask))
 		return 0;

-	queue_work(vhost_workqueue, &poll->work);
+	vhost_poll_queue(poll);
 	return 0;
 }

 /* Init poll structure */
-void vhost_poll_init(struct vhost_poll *poll, work_func_t func,
-		     unsigned long mask)
+void vhost_poll_init(struct vhost_poll *poll, vhost_poll_fn_t fn,
+		     unsigned long mask, struct vhost_dev *dev)
 {
-	INIT_WORK(&poll->work, func);
+	poll->fn = fn;
 	init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
 	init_poll_funcptr(&poll->table, vhost_poll_func);
+	INIT_LIST_HEAD(&poll->node);
+	init_waitqueue_head(&poll->done);
 	poll->mask = mask;
+	poll->dev = dev;
+	poll->queue_seq = poll->done_seq = 0;
 }

 /* Start polling a file. We add ourselves to file's wait queue. The caller must
@@ -88,16 +90,28 @@ void vhost_poll_stop(struct vhost_poll *
 	remove_wait_queue(poll->wqh, &poll->wait);
 }

-/* Flush any work that has been scheduled. When calling this, don't hold any
+/* Flush any poll that has been scheduled. When calling this, don't hold any
  * locks that are also used by the callback. */
 void vhost_poll_flush(struct vhost_poll *poll)
 {
-	flush_work(&poll->work);
+	int seq = poll->queue_seq;
+
+	if (seq - poll->done_seq > 0)
+		wait_event(poll->done, seq - poll->done_seq <= 0);
+	smp_rmb();	/* paired with wmb in vhost_poller() */
 }

 void vhost_poll_queue(struct vhost_poll *poll)
 {
-	queue_work(vhost_workqueue, &poll->work);
+	struct vhost_dev *dev = poll->dev;
+
+	spin_lock(&dev->poller_lock);
+	if (list_empty(&poll->node)) {
+		list_add_tail(&poll->node, &dev->poll_list);
+		poll->queue_seq++;
+		wake_up_process(dev->poller);
+	}
+	spin_unlock(&dev->poller_lock);
 }

 static void vhost_vq_reset(struct vhost_dev *dev,
@@ -125,10 +139,50 @@ static void vhost_vq_reset(struct vhost_
 	vq->log_ctx = NULL;
 }

+static int vhost_poller(void *data)
+{
+	struct vhost_dev *dev = data;
+	struct vhost_poll *poll;
+
+repeat:
+	set_current_state(TASK_INTERRUPTIBLE);	/* mb paired w/ kthread_stop */
+
+	if (kthread_should_stop()) {
+		__set_current_state(TASK_RUNNING);
+		return 0;
+	}
+
+	poll = NULL;
+	spin_lock(&dev->poller_lock);
+	if (!list_empty(&dev->poll_list)) {
+		poll = list_first_entry(&dev->poll_list,
+					struct vhost_poll, node);
+		list_del_init(&poll->node);
+	}
+	spin_unlock(&dev->poller_lock);
+
+	if (poll) {
+		__set_current_state(TASK_RUNNING);
+		poll->fn(poll);
+		smp_wmb();	/* paired with rmb in vhost_poll_flush() */
+		poll->done_seq = poll->queue_seq;
+		wake_up_all(&poll->done);
+	} else
+		schedule();
+
+	goto repeat;
+}
+
 long vhost_dev_init(struct vhost_dev *dev,
 		    struct vhost_virtqueue *vqs, int nvqs)
 {
+	struct task_struct *poller;
 	int i;
+
+	poller = kthread_create(vhost_poller, dev, "vhost-%d", current->pid);
+	if (IS_ERR(poller))
+		return PTR_ERR(poller);
+
 	dev->vqs = vqs;
 	dev->nvqs = nvqs;
 	mutex_init(&dev->mutex);
@@ -136,6 +190,9 @@ long vhost_dev_init(struct vhost_dev *de
 	dev->log_file = NULL;
 	dev->memory = NULL;
 	dev->mm = NULL;
+	spin_lock_init(&dev->poller_lock);
+	INIT_LIST_HEAD(&dev->poll_list);
+	dev->poller = poller;

 	for (i = 0; i < dev->nvqs; ++i) {
 		dev->vqs[i].dev = dev;
@@ -143,8 +200,7 @@ long vhost_dev_init(struct vhost_dev *de
 		vhost_vq_reset(dev, dev->vqs + i);
 		if (dev->vqs[i].handle_kick)
 			vhost_poll_init(&dev->vqs[i].poll,
-					dev->vqs[i].handle_kick,
-					POLLIN);
+					dev->vqs[i].handle_kick, POLLIN, dev);
 	}
 	return 0;
 }
@@ -217,6 +273,8 @@ void vhost_dev_cleanup(struct vhost_dev
 	if (dev->mm)
 		mmput(dev->mm);
 	dev->mm = NULL;
+
+	kthread_stop(dev->poller);
 }

 static int log_access_ok(void __user *log_base, u64 addr, unsigned long sz)
@@ -1113,16 +1171,3 @@ void vhost_disable_notify(struct vhost_v
 		vq_err(vq, "Failed to enable notification at %p: %d\n",
 		       &vq->used->flags, r);
 }
-
-int vhost_init(void)
-{
-	vhost_workqueue = create_singlethread_workqueue("vhost");
-	if (!vhost_workqueue)
-		return -ENOMEM;
-	return 0;
-}
-
-void vhost_cleanup(void)
-{
-	destroy_workqueue(vhost_workqueue);
-}
Index: work/drivers/vhost/vhost.h
===================================================================
--- work.orig/drivers/vhost/vhost.h
+++ work/drivers/vhost/vhost.h
@@ -5,7 +5,6 @@
 #include <linux/vhost.h>
 #include <linux/mm.h>
 #include <linux/mutex.h>
-#include <linux/workqueue.h>
 #include <linux/poll.h>
 #include <linux/file.h>
 #include <linux/skbuff.h>
@@ -20,19 +19,26 @@ enum {
 	VHOST_NET_MAX_SG = MAX_SKB_FRAGS + 2,
 };

+struct vhost_poll;
+typedef void (*vhost_poll_fn_t)(struct vhost_poll *poll);
+
 /* Poll a file (eventfd or socket) */
 /* Note: there's nothing vhost specific about this structure. */
 struct vhost_poll {
+	vhost_poll_fn_t		  fn;
 	poll_table                table;
 	wait_queue_head_t        *wqh;
 	wait_queue_t              wait;
-	/* struct which will handle all actual work. */
-	struct work_struct        work;
+	struct list_head	  node;
+	wait_queue_head_t	  done;
 	unsigned long		  mask;
+	struct vhost_dev	 *dev;
+	int			  queue_seq;
+	int			  done_seq;
 };

-void vhost_poll_init(struct vhost_poll *poll, work_func_t func,
-		     unsigned long mask);
+void vhost_poll_init(struct vhost_poll *poll, vhost_poll_fn_t fn,
+		     unsigned long mask, struct vhost_dev *dev);
 void vhost_poll_start(struct vhost_poll *poll, struct file *file);
 void vhost_poll_stop(struct vhost_poll *poll);
 void vhost_poll_flush(struct vhost_poll *poll);
@@ -63,7 +69,7 @@ struct vhost_virtqueue {
 	struct vhost_poll poll;

 	/* The routine to call when the Guest pings us, or timeout. */
-	work_func_t handle_kick;
+	vhost_poll_fn_t handle_kick;

 	/* Last available index we saw. */
 	u16 last_avail_idx;
@@ -86,11 +92,11 @@ struct vhost_virtqueue {
 	struct iovec hdr[VHOST_NET_MAX_SG];
 	size_t hdr_size;
 	/* We use a kind of RCU to access private pointer.
-	 * All readers access it from workqueue, which makes it possible to
-	 * flush the workqueue instead of synchronize_rcu. Therefore readers do
+	 * All readers access it from poller, which makes it possible to
+	 * flush the vhost_poll instead of synchronize_rcu. Therefore readers do
 	 * not need to call rcu_read_lock/rcu_read_unlock: the beginning of
-	 * work item execution acts instead of rcu_read_lock() and the end of
-	 * work item execution acts instead of rcu_read_lock().
+	 * vhost_poll execution acts instead of rcu_read_lock() and the end of
+	 * vhost_poll execution acts instead of rcu_read_lock().
 	 * Writers use virtqueue mutex. */
 	void *private_data;
 	/* Log write descriptors */
@@ -110,6 +116,9 @@ struct vhost_dev {
 	int nvqs;
 	struct file *log_file;
 	struct eventfd_ctx *log_ctx;
+	spinlock_t poller_lock;
+	struct list_head poll_list;
+	struct task_struct *poller;
 };

 long vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue *vqs, int nvqs);
@@ -136,9 +145,6 @@ bool vhost_enable_notify(struct vhost_vi
 int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
 		    unsigned int log_num, u64 len);

-int vhost_init(void);
-void vhost_cleanup(void);
-
 #define vq_err(vq, fmt, ...) do {                                  \
 		pr_debug(pr_fmt(fmt), ##__VA_ARGS__);       \
 		if ((vq)->error_ctx)                               \

^ permalink raw reply

* [PATCH 2/3] cgroups: Add an API to attach a task to current task's cgroup
From: Tejun Heo @ 2010-05-30 20:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Oleg Nesterov, Sridhar Samudrala, netdev, lkml,
	kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
	Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <20100530112925.GB27611@redhat.com>

From: Sridhar Samudrala <samudrala.sridhar@gmail.com>

Add a new kernel API to attach a task to current task's cgroup
in all the active hierarchies.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
---
 include/linux/cgroup.h |    1 +
 kernel/cgroup.c        |   23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)

Index: work/include/linux/cgroup.h
===================================================================
--- work.orig/include/linux/cgroup.h
+++ work/include/linux/cgroup.h
@@ -570,6 +570,7 @@ struct task_struct *cgroup_iter_next(str
 void cgroup_iter_end(struct cgroup *cgrp, struct cgroup_iter *it);
 int cgroup_scan_tasks(struct cgroup_scanner *scan);
 int cgroup_attach_task(struct cgroup *, struct task_struct *);
+int cgroup_attach_task_current_cg(struct task_struct *);

 /*
  * CSS ID is ID for cgroup_subsys_state structs under subsys. This only works
Index: work/kernel/cgroup.c
===================================================================
--- work.orig/kernel/cgroup.c
+++ work/kernel/cgroup.c
@@ -1788,6 +1788,29 @@ out:
 	return retval;
 }

+/**
+ * cgroup_attach_task_current_cg - attach task 'tsk' to current task's cgroup
+ * @tsk: the task to be attached
+ */
+int cgroup_attach_task_current_cg(struct task_struct *tsk)
+{
+	struct cgroupfs_root *root;
+	struct cgroup *cur_cg;
+	int retval = 0;
+
+	cgroup_lock();
+	for_each_active_root(root) {
+		cur_cg = task_cgroup_from_root(current, root);
+		retval = cgroup_attach_task(cur_cg, tsk);
+		if (retval)
+			break;
+	}
+	cgroup_unlock();
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(cgroup_attach_task_current_cg);
+
 /*
  * Attach task with pid 'pid' to cgroup 'cgrp'. Call with cgroup_mutex
  * held. May take task_lock of task

^ permalink raw reply

* [PATCH 3/3] vhost: apply cpumask and cgroup to vhost pollers
From: Tejun Heo @ 2010-05-30 20:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Oleg Nesterov, Sridhar Samudrala, netdev, lkml,
	kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
	Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <20100530112925.GB27611@redhat.com>

Apply the cpumask and cgroup of the initializing task to the created
vhost poller.

Based on Sridhar Samudrala's patch.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Sridhar Samudrala <samudrala.sridhar@gmail.com>
---
 drivers/vhost/vhost.c |   36 +++++++++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 5 deletions(-)

Index: work/drivers/vhost/vhost.c
===================================================================
--- work.orig/drivers/vhost/vhost.c
+++ work/drivers/vhost/vhost.c
@@ -23,6 +23,7 @@
 #include <linux/highmem.h>
 #include <linux/slab.h>
 #include <linux/kthread.h>
+#include <linux/cgroup.h>

 #include <linux/net.h>
 #include <linux/if_packet.h>
@@ -176,12 +177,30 @@ repeat:
 long vhost_dev_init(struct vhost_dev *dev,
 		    struct vhost_virtqueue *vqs, int nvqs)
 {
-	struct task_struct *poller;
-	int i;
+	struct task_struct *poller = NULL;
+	cpumask_var_t mask;
+	int i, ret = -ENOMEM;
+
+	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
+		goto out;

 	poller = kthread_create(vhost_poller, dev, "vhost-%d", current->pid);
-	if (IS_ERR(poller))
-		return PTR_ERR(poller);
+	if (IS_ERR(poller)) {
+		ret = PTR_ERR(poller);
+		goto out;
+	}
+
+	ret = sched_getaffinity(current->pid, mask);
+	if (ret)
+		goto out;
+
+	ret = sched_setaffinity(poller->pid, mask);
+	if (ret)
+		goto out;
+
+	ret = cgroup_attach_task_current_cg(poller);
+	if (ret)
+		goto out;

 	dev->vqs = vqs;
 	dev->nvqs = nvqs;
@@ -202,7 +221,14 @@ long vhost_dev_init(struct vhost_dev *de
 			vhost_poll_init(&dev->vqs[i].poll,
 					dev->vqs[i].handle_kick, POLLIN, dev);
 	}
-	return 0;
+
+	wake_up_process(poller);	/* avoid contributing to loadavg */
+	ret = 0;
+out:
+	if (ret)
+		kthread_stop(poller);
+	free_cpumask_var(mask);
+	return ret;
 }

 /* Caller should have device mutex */

^ permalink raw reply

* Re: Subject: [PATCH] net/ipv6: Use GFP_ATOMIC when a lock is held
From: Julia Lawall @ 2010-05-30 20:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev,
	linux-kernel, kernel-janitors
In-Reply-To: <1275250288.2472.21.camel@edumazet-laptop>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2442 bytes --]

On Sun, 30 May 2010, Eric Dumazet wrote:

> Le dimanche 30 mai 2010 à 21:48 +0200, Julia Lawall a écrit :
> > From: Julia Lawall <julia@diku.dk>
> > 
> > A spin lock is taken near the beginning of the enclosing function.
> > 
> > The semantic patch that makes this change is as follows:
> > (http://coccinelle.lip6.fr/)
> > 
> > // <smpl>
> > @@
> > @@
> > 
> > spin_lock(...)
> > ... when != spin_unlock(...)
> > -GFP_KERNEL
> > +GFP_ATOMIC
> > // </smpl>
> > 
> > Signed-off-by: Julia Lawall <julia@diku.dk>
> > 
> > ---
> >  net/ipv6/sit.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff -u -p a/net/ipv6/sit.c b/net/ipv6/sit.c
> > --- a/net/ipv6/sit.c
> > +++ b/net/ipv6/sit.c
> > @@ -358,7 +358,7 @@ ipip6_tunnel_add_prl(struct ip_tunnel *t
> >  		goto out;
> >  	}
> >  
> > -	p = kzalloc(sizeof(struct ip_tunnel_prl_entry), GFP_KERNEL);
> > +	p = kzalloc(sizeof(struct ip_tunnel_prl_entry), GFP_ATOMIC);
> >  	if (!p) {
> >  		err = -ENOBUFS;
> >  		goto out;
> 
> Nice catch, but what about allocating this outside of the locked
> section ?

I think the proposed patch does not work, because the for loop overwrites 
p.  That use of p looks like it is completely local to the for loop, so 
perhaps a new variable p1 could be added to be used there?

julia

> diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> index e51e650..ff3dd84 100644
> --- a/net/ipv6/sit.c
> +++ b/net/ipv6/sit.c
> @@ -340,6 +340,10 @@ ipip6_tunnel_add_prl(struct ip_tunnel *t, struct ip_tunnel_prl *a, int chg)
>  	if (a->addr == htonl(INADDR_ANY))
>  		return -EINVAL;
>  
> +	p = kzalloc(sizeof(struct ip_tunnel_prl_entry), GFP_KERNEL);
> +	if (!p)
> +		return -ENOBUFS;
> +
>  	spin_lock(&ipip6_prl_lock);
>  
>  	for (p = t->prl; p; p = p->next) {
> @@ -358,19 +362,16 @@ ipip6_tunnel_add_prl(struct ip_tunnel *t, struct ip_tunnel_prl *a, int chg)
>  		goto out;
>  	}
>  
> -	p = kzalloc(sizeof(struct ip_tunnel_prl_entry), GFP_KERNEL);
> -	if (!p) {
> -		err = -ENOBUFS;
> -		goto out;
> -	}
>  
>  	p->next = t->prl;
>  	p->addr = a->addr;
>  	p->flags = a->flags;
>  	t->prl_count++;
>  	rcu_assign_pointer(t->prl, p);
> +	p = NULL;
>  out:
>  	spin_unlock(&ipip6_prl_lock);
> +	kfree(p);
>  	return err;
>  }
>  
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kernel-janitors" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: Subject: [PATCH] net/ipv6: Use GFP_ATOMIC when a lock is held
From: Eric Dumazet @ 2010-05-30 20:55 UTC (permalink / raw)
  To: Julia Lawall
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev,
	linux-kernel, kernel-janitors
In-Reply-To: <Pine.LNX.4.64.1005302248500.19253@ask.diku.dk>

Le dimanche 30 mai 2010 à 22:50 +0200, Julia Lawall a écrit :

> I think the proposed patch does not work, because the for loop overwrites 
> p.  That use of p looks like it is completely local to the for loop, so 
> perhaps a new variable p1 could be added to be used there?

Please do so.

I just wanted to tell you changing GFP_KERNEL to GFP_ATOMIC is not an
appropriate way to solve this kind of problems. My patch was to get an
idea, not a full and tested patch :)

^ permalink raw reply

* Re: Subject: [PATCH] net/ipv6: Use GFP_ATOMIC when a lock is held
From: Julia Lawall @ 2010-05-30 21:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev,
	linux-kernel, kernel-janitors
In-Reply-To: <1275252912.2472.23.camel@edumazet-laptop>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1286 bytes --]

On Sun, 30 May 2010, Eric Dumazet wrote:

> Le dimanche 30 mai 2010 à 22:50 +0200, Julia Lawall a écrit :
> 
> > I think the proposed patch does not work, because the for loop overwrites 
> > p.  That use of p looks like it is completely local to the for loop, so 
> > perhaps a new variable p1 could be added to be used there?
> 
> Please do so.
> 
> I just wanted to tell you changing GFP_KERNEL to GFP_ATOMIC is not an
> appropriate way to solve this kind of problems. My patch was to get an
> idea, not a full and tested patch :)

Looking at it again, there is still a problem, because in the original 
code, the loop:

        for (p = t->prl; p; p = p->next) {
                if (p->addr == a->addr) {
                        if (chg) {
                                p->flags = a->flags;
                                goto out;
                        }
                        err = -EEXIST;
                        goto out;
                }
        }

could exit with success without the kzalloc ever being called.  If the 
kzalloc is moved up, it could fail and then it returns immediately without 
executing the loop.  A solution could be to leave the NULL test on p where 
it is, and only move up the kzalloc.  Or perhaps the change in behavior 
doesn't matter?

julia

^ permalink raw reply

* Re: Subject: [PATCH] net/ipv6: Use GFP_ATOMIC when a lock is held
From: Eric Dumazet @ 2010-05-30 21:18 UTC (permalink / raw)
  To: Julia Lawall
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev,
	linux-kernel, kernel-janitors
In-Reply-To: <Pine.LNX.4.64.1005302303180.19253@ask.diku.dk>

Le dimanche 30 mai 2010 à 23:09 +0200, Julia Lawall a écrit :

> could exit with success without the kzalloc ever being called.  If the 
> kzalloc is moved up, it could fail and then it returns immediately without 
> executing the loop.  A solution could be to leave the NULL test on p where 
> it is, and only move up the kzalloc.  Or perhaps the change in behavior 
> doesn't matter?

If a GFP_KERNEL allocation fails, we are in a big trouble anyway :)

GFP_ATOMIC are more problematic in this area :)

^ permalink raw reply

* Re: [PATCH v2] act_nat: fix the wrong checksum when addr isn't in old_addr/mask
From: Herbert Xu @ 2010-05-30 21:53 UTC (permalink / raw)
  To: Changli Gao; +Cc: jamal, David S. Miller, netdev
In-Reply-To: <AANLkTinDe-AluGZx87q3nvxCfushfeH0jS35Fav95IXk@mail.gmail.com>

On Sun, May 30, 2010 at 09:33:22PM +0800, Changli Gao wrote:
>
> Thinking about this topologic:
> 
> client -> DNAT -> router -> server.
> 
> DNAT is used to map a public IP to server's private IP. If a
> DEST_UNREACH ICMP packet is sent out by router, in order to handle
> this ICMP packet correctly, I have to pass it to act_nat.c. How can I
> filter out the other packets? By inspecting the inner IP destination
> address of this ICMP packet? Maybe I can use u32 with complicate
> parameters.

You should filter out all the non-ICMP packets, just like you
do here.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [Patch]8139too: remove unnecessary cast of ioread32()'s return value
From: David Miller @ 2010-05-30 22:34 UTC (permalink / raw)
  To: jeff; +Cc: romieu, netdev
In-Reply-To: <4C02A1F7.6030701@garzik.org>

From: Jeff Garzik <jeff@garzik.org>
Date: Sun, 30 May 2010 13:35:51 -0400

> Have you verified this matches all architectures definition of
> readl()?

Jeff, when you come out of hiding after months if not years
of not reviewing network driver changes, could you provide
some useful commentary instead of some trite stuff like this?

It does match, that's why I told this person to write these patches.
And if you have been following the thread where we discussed this, you
wouldn't feel the need to ask this question about these two patches.

And if it doesn't match, that's an arch bug which should be fixed and
in any event there is only one possibility of a non-match and that is
if the routine returns "unsigned long"

Which, surprise surprise Jeff, retains current behavior!

So there is no risk whatsoever possible from this change.

^ permalink raw reply

* Re: [Patch]8139too: remove unnecessary cast of ioread32()'s return value
From: Jeff Garzik @ 2010-05-30 23:24 UTC (permalink / raw)
  To: David Miller; +Cc: romieu, netdev
In-Reply-To: <20100530.153451.193700815.davem@davemloft.net>

On 05/30/2010 06:34 PM, David Miller wrote:
> From: Jeff Garzik<jeff@garzik.org>
> Date: Sun, 30 May 2010 13:35:51 -0400
>
>> Have you verified this matches all architectures definition of
>> readl()?

> And if it doesn't match, that's an arch bug which should be fixed and
> in any event there is only one possibility of a non-match and that is
> if the routine returns "unsigned long"

That was the genesis of the question.  Some arches still use unsigned long.

	Jeff




^ permalink raw reply

* Re: [Patch]8139too: remove unnecessary cast of ioread32()'s return value
From: Junchang Wang @ 2010-05-31  0:22 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: davem, romieu, netdev
In-Reply-To: <4C02A1F7.6030701@garzik.org>

On Sun, May 30, 2010 at 01:35:51PM -0400, Jeff Garzik wrote:
>
>Have you verified this matches all architectures definition of readl()?
>

Hi Jeff,

Thanks for your question. Just browsed the kernel. ioread32() returns either 
unsigned int or u32 in all arches. There is no arch that uses unsigned long 
or something else.

Secondly, There's a bug if an arch returns unsigned long. What happen when 
programmers invoke sizeof(ioread32()) on 64-bit platforms?

--Junchang

^ permalink raw reply

* Re: [audit] Suppress runtime loading of audit module.
From: James Morris @ 2010-05-31  0:27 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Audit-ML, linux-security-module, netdev
In-Reply-To: <201005242057.EBJ00004.JOOFVHLFFQOStM@I-love.SAKURA.ne.jp>

Might be best for the networking folk to look at this.



On Mon, 24 May 2010, Tetsuo Handa wrote:

> I noticed that request_module("net-pf-16-proto-9") is issued whenever 'Enter'
> key is pressed if CONFIG_AUDIT=n and console is /dev/tty0 . net-pf-16-proto-9
> is known as audit but CONFIG_AUDIT is "bool". Therefore, trying to load
> net-pf-16-proto-9 at runtime does not make sense.
> 
> Call trace obtained by inserting WARN_ON(protocol == 9);
> ------------[ cut here ]------------
> WARNING: at net/netlink/af_netlink.c:450 netlink_create+0x1cd/0x1e0()
> Hardware name: VMware Virtual Platform
> Modules linked in: mptspi mptscsih mptbase scsi_transport_spi ext3 jbd mbcache
> Pid: 1, comm: bash Not tainted 2.6.34 #10
> Call Trace:
>  [<c06227ad>] ? netlink_create+0x1cd/0x1e0
>  [<c043c9cc>] warn_slowpath_common+0x7c/0xa0
>  [<c06227ad>] ? netlink_create+0x1cd/0x1e0
>  [<c043ca05>] warn_slowpath_null+0x15/0x20
>  [<c06227ad>] netlink_create+0x1cd/0x1e0
>  [<c0600790>] ? __sock_create+0xe0/0x240
>  [<c06225e0>] ? netlink_create+0x0/0x1e0
>  [<c06007b8>] __sock_create+0x108/0x240
>  [<c060072e>] ? __sock_create+0x7e/0x240
>  [<c060095a>] sock_create+0x3a/0x50
>  [<c0600ae6>] sys_socket+0x36/0x60
>  [<c0601f69>] sys_socketcall+0x89/0x290
>  [<c0402b83>] ? sysenter_exit+0xf/0x18
>  [<c0524db4>] ? trace_hardirqs_on_thunk+0xc/0x10
>  [<c0402b50>] sysenter_do_call+0x12/0x36
> ---[ end trace 4da698b4c0bf1613 ]---
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> ---
>  net/netlink/af_netlink.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- linux-2.6.34.orig/net/netlink/af_netlink.c
> +++ linux-2.6.34/net/netlink/af_netlink.c
> @@ -446,7 +446,7 @@ static int netlink_create(struct net *ne
>  
>  	netlink_lock_table();
>  #ifdef CONFIG_MODULES
> -	if (!nl_table[protocol].registered) {
> +	if (!nl_table[protocol].registered && protocol != 9) {
>  		netlink_unlock_table();
>  		request_module("net-pf-%d-proto-%d", PF_NETLINK, protocol);
>  		netlink_lock_table();
> --
> To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply

* Re: [PATCH 2/3] cgroups: Add an API to attach a task to current task's cgroup
From: Li Zefan @ 2010-05-31  1:07 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Michael S. Tsirkin, Oleg Nesterov, Sridhar Samudrala, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <4C02C987.6020805@kernel.org>

04:24, Tejun Heo wrote:
> From: Sridhar Samudrala <samudrala.sridhar@gmail.com>
> 
> Add a new kernel API to attach a task to current task's cgroup
> in all the active hierarchies.
> 
> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>

Acked-by: Li Zefan <lizf@cn.fujitsu.com>

btw: you lost the reviewed-by tag given by Paul Menage.

^ permalink raw reply

* Re: [PATCH 3/3] vhost: apply cpumask and cgroup to vhost pollers
From: Li Zefan @ 2010-05-31  1:11 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Michael S. Tsirkin, Oleg Nesterov, Sridhar Samudrala, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <4C02C99D.9070204@kernel.org>

Tejun Heo wrote:
> Apply the cpumask and cgroup of the initializing task to the created
> vhost poller.
> 
> Based on Sridhar Samudrala's patch.
> 
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Sridhar Samudrala <samudrala.sridhar@gmail.com>
> ---
>  drivers/vhost/vhost.c |   36 +++++++++++++++++++++++++++++++-----
>  1 file changed, 31 insertions(+), 5 deletions(-)
> 
> Index: work/drivers/vhost/vhost.c
> ===================================================================
> --- work.orig/drivers/vhost/vhost.c
> +++ work/drivers/vhost/vhost.c
> @@ -23,6 +23,7 @@
>  #include <linux/highmem.h>
>  #include <linux/slab.h>
>  #include <linux/kthread.h>
> +#include <linux/cgroup.h>
> 
>  #include <linux/net.h>
>  #include <linux/if_packet.h>
> @@ -176,12 +177,30 @@ repeat:
>  long vhost_dev_init(struct vhost_dev *dev,
>  		    struct vhost_virtqueue *vqs, int nvqs)
>  {
> -	struct task_struct *poller;
> -	int i;
> +	struct task_struct *poller = NULL;
> +	cpumask_var_t mask;
> +	int i, ret = -ENOMEM;
> +
> +	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
> +		goto out;
> 

If we "goto out", we will end up calling kthread_stop(poller), but
seems kthread_stop() requires the task_struct pointer != NULL.

>  	poller = kthread_create(vhost_poller, dev, "vhost-%d", current->pid);
> -	if (IS_ERR(poller))
> -		return PTR_ERR(poller);
> +	if (IS_ERR(poller)) {
> +		ret = PTR_ERR(poller);
> +		goto out;
> +	}
> +
> +	ret = sched_getaffinity(current->pid, mask);
> +	if (ret)
> +		goto out;
> +
> +	ret = sched_setaffinity(poller->pid, mask);
> +	if (ret)
> +		goto out;
> +
> +	ret = cgroup_attach_task_current_cg(poller);
> +	if (ret)
> +		goto out;
> 
>  	dev->vqs = vqs;
>  	dev->nvqs = nvqs;
> @@ -202,7 +221,14 @@ long vhost_dev_init(struct vhost_dev *de
>  			vhost_poll_init(&dev->vqs[i].poll,
>  					dev->vqs[i].handle_kick, POLLIN, dev);
>  	}
> -	return 0;
> +
> +	wake_up_process(poller);	/* avoid contributing to loadavg */
> +	ret = 0;
> +out:
> +	if (ret)
> +		kthread_stop(poller);
> +	free_cpumask_var(mask);
> +	return ret;
>  }
> 
>  /* Caller should have device mutex */

^ permalink raw reply

* Re: [Patch]8139too: remove unnecessary cast of ioread32()'s return value
From: David Miller @ 2010-05-31  1:29 UTC (permalink / raw)
  To: jeff; +Cc: romieu, netdev
In-Reply-To: <4C02F3A2.90100@garzik.org>

From: Jeff Garzik <jeff@garzik.org>
Date: Sun, 30 May 2010 19:24:18 -0400

> That was the genesis of the question.  Some arches still use unsigned
> long.

They are 32-bit.

^ permalink raw reply

* Re: [Patch]8139too: remove unnecessary cast of ioread32()'s return value
From: David Miller @ 2010-05-31  1:35 UTC (permalink / raw)
  To: jeff; +Cc: romieu, netdev
In-Reply-To: <20100530.182948.71104948.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Sun, 30 May 2010 18:29:48 -0700 (PDT)

> From: Jeff Garzik <jeff@garzik.org>
> Date: Sun, 30 May 2010 19:24:18 -0400
> 
>> That was the genesis of the question.  Some arches still use unsigned
>> long.
> 
> They are 32-bit.

In fact the only two offenders are h8300 and m32r, which are
both 32-bit.

This is really in the realm of "who cares."

^ permalink raw reply

* [PATCH] cls_u32: use skb_copy_bits() to dereference data safely
From: Changli Gao @ 2010-05-31  2:24 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: David S. Miller, netdev, Changli Gao

use skb_copy_bits() to dereference data safely

the original skb->data dereference isn't safe, as there isn't any skb->len or
skb_is_nonlinear() check. skb_copy_bits() is used instead in this patch. And
when the skb isn't long enough, we terminate the function u32_classify()
immediately with -1.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 net/sched/cls_u32.c |   40 +++++++++++++++++++++++++++++-----------
 1 file changed, 29 insertions(+), 11 deletions(-)
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 9627542..db35197 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -98,11 +98,11 @@ static int u32_classify(struct sk_buff *skb, struct tcf_proto *tp, struct tcf_re
 {
 	struct {
 		struct tc_u_knode *knode;
-		u8		  *ptr;
+		unsigned int	  off;
 	} stack[TC_U32_MAXDEPTH];
 
 	struct tc_u_hnode *ht = (struct tc_u_hnode*)tp->root;
-	u8 *ptr = skb_network_header(skb);
+	unsigned int off = skb_network_header(skb) - skb->data;
 	struct tc_u_knode *n;
 	int sdepth = 0;
 	int off2 = 0;
@@ -134,8 +134,13 @@ next_knode:
 #endif
 
 		for (i = n->sel.nkeys; i>0; i--, key++) {
+			unsigned int toff;
+			__be32 data;
 
-			if ((*(__be32*)(ptr+key->off+(off2&key->offmask))^key->val)&key->mask) {
+			toff = off + key->off + (off2 & key->offmask);
+			if (skb_copy_bits(skb, toff, &data, 4))
+				goto out;
+			if ((data ^ key->val) & key->mask) {
 				n = n->next;
 				goto next_knode;
 			}
@@ -174,29 +179,41 @@ check_terminal:
 		if (sdepth >= TC_U32_MAXDEPTH)
 			goto deadloop;
 		stack[sdepth].knode = n;
-		stack[sdepth].ptr = ptr;
+		stack[sdepth].off = off;
 		sdepth++;
 
 		ht = n->ht_down;
 		sel = 0;
-		if (ht->divisor)
-			sel = ht->divisor&u32_hash_fold(*(__be32*)(ptr+n->sel.hoff), &n->sel,n->fshift);
+		if (ht->divisor) {
+			__be32 data;
 
+			if (skb_copy_bits(skb, off + n->sel.hoff, &data, 4))
+				goto out;
+			sel = ht->divisor & u32_hash_fold(data, &n->sel,
+							  n->fshift);
+		}
 		if (!(n->sel.flags&(TC_U32_VAROFFSET|TC_U32_OFFSET|TC_U32_EAT)))
 			goto next_ht;
 
 		if (n->sel.flags&(TC_U32_OFFSET|TC_U32_VAROFFSET)) {
 			off2 = n->sel.off + 3;
-			if (n->sel.flags&TC_U32_VAROFFSET)
-				off2 += ntohs(n->sel.offmask & *(__be16*)(ptr+n->sel.offoff)) >>n->sel.offshift;
+			if (n->sel.flags & TC_U32_VAROFFSET) {
+				__be16 data;
+
+				if (skb_copy_bits(skb, off + n->sel.offoff,
+						  &data, 2))
+					goto out;
+				off2 += ntohs(n->sel.offmask & data) >>
+					n->sel.offshift;
+			}
 			off2 &= ~3;
 		}
 		if (n->sel.flags&TC_U32_EAT) {
-			ptr += off2;
+			off += off2;
 			off2 = 0;
 		}
 
-		if (ptr < skb_tail_pointer(skb))
+		if (off < skb->len)
 			goto next_ht;
 	}
 
@@ -204,9 +221,10 @@ check_terminal:
 	if (sdepth--) {
 		n = stack[sdepth].knode;
 		ht = n->ht_up;
-		ptr = stack[sdepth].ptr;
+		off = stack[sdepth].off;
 		goto check_terminal;
 	}
+out:
 	return -1;
 
 deadloop:

^ permalink raw reply related

* [PATCH] ipconfig: send host-name in DHCP requests
From: Wu Fengguang @ 2010-05-31  3:19 UTC (permalink / raw)
  To: David S. Miller, netdev; +Cc: LKML, Andi Kleen

Normally dhclient can be configured to send the "host-name" option
in DHCP requests to update the client's DNS record. However for an
NFSROOT system, dhclient shall never be called (which may change the
IP addr and therefore lose your root NFS mount connection).

So enable updating the DNS record with kernel parameter

	ip=::::$HOST_NAME::dhcp

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 net/ipv4/ipconfig.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 067ce9e..db54343 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -665,6 +665,13 @@ ic_dhcp_init_options(u8 *options)
 		memcpy(e, ic_req_params, sizeof(ic_req_params));
 		e += sizeof(ic_req_params);

+		if (ic_host_name_set) {
+			*e++ = 12;	/* host-name */
+			len = strlen(utsname()->nodename);
+			*e++ = len;
+			memcpy(e, utsname()->nodename, len);
+			e += len;
+		}
 		if (*vendor_class_identifier) {
 			printk(KERN_INFO "DHCP: sending class identifier \"%s\"\n",
 			       vendor_class_identifier);
-- 
1.6.6

^ permalink raw reply related

* Re: [PATCH] bnx2: Fix IRQ failures during kdump.
From: Michael Chan @ 2010-05-31  4:43 UTC (permalink / raw)
  To: 'Andi Kleen'
  Cc: 'davem@davemloft.net', 'netdev@vger.kernel.org',
	'linux-pci@vger.kernel.org'
In-Reply-To: <20100530173014.GB14556@basil.fritz.box>

Andi Kleen wrote:

> On Sun, May 30, 2010 at 09:12:15AM -0700, Michael Chan wrote:
> > Andi Kleen wrote:
> >
> > > "Michael Chan" <mchan@broadcom.com> writes:
> > >
> > > > When switching from the crashed kernel to the kdump kernel
> without
> > > going
> > > > through PCI reset, IRQs may not work if a different IRQ mode is
> used
> > > on
> > >
> > > PCIe with AER actually does support per link root port reset
> > > (e.g. used for AER)
> >
> > Do you mean the slot_reset function in the pci_error_handlers?  This
> 
> Well the fallback code in the PCIE root port driver
> that does the actual resets.

aer_root_reset() in aerdrv.c?

> 
> It could be called directly before kexec.
> 
> > needs to be called in the context of the crashed kernel, right?
> 
> It could be done on kexec, however of course you would rely
> on PCI root port data structures still being intact on a crash
> (I guess that's reasonable, they are not very complicated)
> 
> >
> > >
> > > I've been wondering for some time if kexec should not simply
> > > use that to reset all the devices, instead of addings hacks
> > > around this to all drivers.
> > >
> > > That would fix your problems too, right?
> >
> > If it is called in the context of the crashed kernel, it won't work.
> > We would reset it and put in back into the same IRQ mode.
> 
> Who would put it back? Your driver wouldn't be called anymore.

The bnx2 driver like many other drivers has a slot_reset function in the
pci_driver struct's err_handler.  If the AER code calls this function,
we would reset the chip and put it back to the same IRQ mode.  Without
calling this per driver reset function, I'm not sure if you can reset
the device if the device does not support Function Level Reset.

> 
> >
> > >
> > > The question is just if AER is widely enough supported for this.
> > >
> >
> > Some newer PCIe devices support Function Level Reset, and that would
> > be ideal.  But most existing devices including bnx2 devices don't
> have
> > this feature.
> 
> Root port reset should be fine for this case. Even if some
> innocent device on the same root port gets reset too that shouldn't
> matter.
> Only drawback for the NIC would be that you have to renegotiate links I
> think.
> 
> Also there are systems without AER support.
> 
> -Andi
> --
> ak@linux.intel.com -- Speaking for myself only.



^ permalink raw reply

* [PATCH] ipv6: get rid of ipip6_prl_lock
From: Eric Dumazet @ 2010-05-31  5:04 UTC (permalink / raw)
  To: Julia Lawall
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev,
	linux-kernel, kernel-janitors
In-Reply-To: <Pine.LNX.4.64.1005302303180.19253@ask.diku.dk>

Le dimanche 30 mai 2010 à 23:09 +0200, Julia Lawall a écrit :
> On Sun, 30 May 2010, Eric Dumazet wrote:
> 
> > Le dimanche 30 mai 2010 à 22:50 +0200, Julia Lawall a écrit :
> > 
> > > I think the proposed patch does not work, because the for loop overwrites 
> > > p.  That use of p looks like it is completely local to the for loop, so 
> > > perhaps a new variable p1 could be added to be used there?
> > 
> > Please do so.
> > 
> > I just wanted to tell you changing GFP_KERNEL to GFP_ATOMIC is not an
> > appropriate way to solve this kind of problems. My patch was to get an
> > idea, not a full and tested patch :)
> 
> Looking at it again, there is still a problem, because in the original 
> code, the loop:
> 
...
> 
> could exit with success without the kzalloc ever being called.  If the 
> kzalloc is moved up, it could fail and then it returns immediately without 
> executing the loop.  A solution could be to leave the NULL test on p where 
> it is, and only move up the kzalloc.  Or perhaps the change in behavior 
> doesn't matter?
> 


[PATCH] ipv6: get rid of ipip6_prl_lock

As noticed by Julia Lawall, ipip6_tunnel_add_prl() incorrectly calls 
kzallloc(..., GFP_KERNEL) while a spinlock is held. He provided
a patch to use GFP_ATOMIC instead.

One possibility would be to convert this spinlock to a mutex, or
preallocate the thing before taking the lock.

After RCU conversion, it appears we dont need this lock, since 
caller already holds RTNL

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv6/sit.c |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index e51e650..702c532 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -249,8 +249,6 @@ failed:
 	return NULL;
 }
 
-static DEFINE_SPINLOCK(ipip6_prl_lock);
-
 #define for_each_prl_rcu(start)			\
 	for (prl = rcu_dereference(start);	\
 	     prl;				\
@@ -340,7 +338,7 @@ ipip6_tunnel_add_prl(struct ip_tunnel *t, struct ip_tunnel_prl *a, int chg)
 	if (a->addr == htonl(INADDR_ANY))
 		return -EINVAL;
 
-	spin_lock(&ipip6_prl_lock);
+	ASSERT_RTNL();
 
 	for (p = t->prl; p; p = p->next) {
 		if (p->addr == a->addr) {
@@ -370,7 +368,6 @@ ipip6_tunnel_add_prl(struct ip_tunnel *t, struct ip_tunnel_prl *a, int chg)
 	t->prl_count++;
 	rcu_assign_pointer(t->prl, p);
 out:
-	spin_unlock(&ipip6_prl_lock);
 	return err;
 }
 
@@ -397,7 +394,7 @@ ipip6_tunnel_del_prl(struct ip_tunnel *t, struct ip_tunnel_prl *a)
 	struct ip_tunnel_prl_entry *x, **p;
 	int err = 0;
 
-	spin_lock(&ipip6_prl_lock);
+	ASSERT_RTNL();
 
 	if (a && a->addr != htonl(INADDR_ANY)) {
 		for (p = &t->prl; *p; p = &(*p)->next) {
@@ -419,7 +416,6 @@ ipip6_tunnel_del_prl(struct ip_tunnel *t, struct ip_tunnel_prl *a)
 		}
 	}
 out:
-	spin_unlock(&ipip6_prl_lock);
 	return err;
 }
 

^ permalink raw reply related

* Re: [v5 Patch 1/3] netpoll: add generic support for bridge and bonding devices
From: Cong Wang @ 2010-05-31  5:29 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Flavio Leitner, linux-kernel, Matt Mackall, netdev, bridge,
	Andy Gospodarek, Neil Horman, Jeff Moyer, Stephen Hemminger,
	bonding-devel, David Miller
In-Reply-To: <24802.1275080599@death.nxdomain.ibm.com>

On 05/29/10 05:03, Jay Vosburgh wrote:
> Flavio Leitner<fbl@sysclose.org>  wrote:
>
>> On Fri, May 28, 2010 at 04:16:34PM +0800, Cong Wang wrote:
>>> On 05/28/10 02:05, Flavio Leitner wrote:
>>>>
>>>> Hi guys!
>>>>
>>>> I finally could test this to see if an old problem reported on bugzilla[1] was
>>>> fixed now, but unfortunately it is still there.
>>>>
>>>> The ticket is private I guess, but basically the problem happens when bonding
>>>> driver tries to print something after it had taken the write_lock (monitor
>>>> functions, enslave/de-enslave), so the printk() will pass through netpoll, then
>>>> on bonding again which no matter what mode you use, it will try to read_lock()
>>>> the lock again. The result is a deadlock and the entire system hangs.
>>>>
>>>
>>> Does the attached patch fix this hang?
>>
>> I got another issue now:
>>
>> [   89.523062] bonding: bond0: enslaving eth0 as a backup interface with a down link.
>> [   89.580746] bonding: bond0: enslaving eth2 as a backup interface with a down link.
>> [   91.198527] e1000: eth2 NIC Link is Up 100 Mbps Half Duplex, Flow Control: None
>> [   91.238245] bonding: bond0: link status definitely up for interface eth2.
>>
>> [   91.245381] BUG: scheduling while atomic: bond0/2716/0x10000100
>> [   91.251565] 5 locks held by bond0/2716:
>> [   91.255663]  #0:  ((bond_dev->name)){+.+.+.}, at: [<ffffffff81045fb4>] worker_thread+0x19a/0x2e2
>> [   91.265179]  #1:  ((&(&bond->mii_work)->work)){+.+.+.}, at: [<ffffffff81045fb4>] worker_thread+0x19a/0x2e2
>> [   91.275554]  #2:  (rtnl_mutex){+.+.+.}, at: [<ffffffff812daf38>] rtnl_lock+0x12/0x14
>> [   91.284018]  #3:  (&bond->lock){++.+.+}, at: [<ffffffffa029e06a>] bond_mii_monitor+0x2a2/0x4ed [bonding]
>> [   91.294230]  #4:  (&bond->curr_slave_lock){+...+.}, at: [<ffffffffa029e239>] bond_mii_monitor+0x471/0x4ed [bonding]
>> [   91.305387] Modules linked in: bonding sunrpc ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 dm_mirror dm_region_hash dm_log dm_multipath uinput snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm ppdev parport_pc parport rtc_cmos snd_timer tg3 snd ide_cd_mod i5000_edac i2c_i801 libphy rtc_core rtc_lib edac_core pcspkr e1000 dcdbas uhci_hcd tulip shpchp i2c_core cdrom serio_raw soundcore sg snd_page_alloc raid0 sd_mod button [last unloaded: mperf]
>> [   91.357735] Pid: 2716, comm: bond0 Not tainted 2.6.34-04700-gd938a70-dirty #36
>> [   91.371112] Call Trace:
>> [   91.373825]  [<ffffffff81056002>] ? __debug_show_held_locks+0x22/0x24
>> [   91.380530]  [<ffffffff8102e4a2>] __schedule_bug+0x6d/0x72
>> [   91.386284]  [<ffffffff81363f6e>] schedule+0xc9/0x791
>> [   91.391600]  [<ffffffff81032540>] __cond_resched+0x25/0x30
>> [   91.397350]  [<ffffffff81364757>] _cond_resched+0x27/0x32
>> [   91.403013]  [<ffffffff810ab243>] kmem_cache_alloc+0x2b/0xac
>> [   91.408936]  [<ffffffff812c61fd>] skb_clone+0x42/0x5d
>> [   91.414253]  [<ffffffff812ec696>] netlink_broadcast+0x192/0x369
>> [   91.420436]  [<ffffffff812ecdc3>] nlmsg_notify+0x43/0x89
>> [   91.426012]  [<ffffffff812dabc7>] rtnl_notify+0x2b/0x2d
>> [   91.431501]  [<ffffffff812dacbc>] rtmsg_ifinfo+0xf3/0x118
>> [   91.437165]  [<ffffffff812dad0c>] rtnetlink_event+0x2b/0x2f
>> [   91.443003]  [<ffffffff81369fe4>] notifier_call_chain+0x32/0x5e
>> [   91.449188]  [<ffffffff8104d618>] raw_notifier_call_chain+0xf/0x11
>> [   91.455634]  [<ffffffff812cfc73>] call_netdevice_notifiers+0x45/0x4a
>> [   91.462253]  [<ffffffff812d04f7>] netdev_bonding_change+0x12/0x14
>
> 	This warning is because the notifier call is happening with spin
> locks held.
>
>> [   91.468614]  [<ffffffffa029d589>] bond_select_active_slave+0xe8/0x123 [bonding]
>> [   91.476408]  [<ffffffffa029e241>] bond_mii_monitor+0x479/0x4ed [bonding]
>> [   91.483375]  [<ffffffff81046009>] worker_thread+0x1ef/0x2e2
>> [   91.489212]  [<ffffffff81045fb4>] ? worker_thread+0x19a/0x2e2
>> [   91.495227]  [<ffffffffa029ddc8>] ? bond_mii_monitor+0x0/0x4ed [bonding]
>> [   91.502192]  [<ffffffff81049c71>] ? autoremove_wake_function+0x0/0x34
>> [   91.508897]  [<ffffffff81045e1a>] ? worker_thread+0x0/0x2e2
>> [   91.514734]  [<ffffffff810498bb>] kthread+0x7a/0x82
>> [   91.519878]  [<ffffffff81003714>] kernel_thread_helper+0x4/0x10
>> [   91.526060]  [<ffffffff81366ffc>] ? restore_args+0x0/0x30
>> [   91.531723]  [<ffffffff81049841>] ? kthread+0x0/0x82
>> [   91.536953]  [<ffffffff81003710>] ? kernel_thread_helper+0x0/0x10
>> [   91.543343] bonding: bond0: making interface eth2 the new active one.
>> [   91.550554] bonding: bond0: first active interface up!
>> [   91.556859] ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
>>
>>
>> No other patch applied. Just started netconsole over bonding, so no need
>> to pull the cable from slaves. Reproduced twice, one I got the
>> backtrace above, and on the other one the system hangs completely
>> after the BUG: scheduling message.
>>
>> fbl
>>
>>
>>>
>>> Thanks!
>>>
>>> ----------------------->
>>>
>>> We should notify netconsole that bond is changing its slaves
>>> when we use active-backup mode.
>>>
>>> Signed-off-by: WANG Cong<amwang@redhat.com>
>>>
>>> ----
>>>
>>
>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>> index 5e12462..9494c02 100644
>>> --- a/drivers/net/bonding/bond_main.c
>>> +++ b/drivers/net/bonding/bond_main.c
>>> @@ -1199,6 +1199,7 @@ void bond_select_active_slave(struct bonding *bond)
>>>
>>>   	best_slave = bond_find_best_slave(bond);
>>>   	if (best_slave != bond->curr_active_slave) {
>>> +		netdev_bonding_change(bond->dev, NETDEV_BONDING_DESLAVE);
>>>   		bond_change_active_slave(bond, best_slave);
>>>   		rv = bond_set_carrier(bond);
>>>   		if (!rv)
>
> 	You can't do this here; the driver is holding various spin
> locks, and notifier calls can sleep (hence the warning).  If you look at
> the bond_change_active_slave function, it drops all locks other than
> RTNL before making a notifier call, e.g.,
>
> void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
> {
> [...]
> 	if (bond->params.mode == BOND_MODE_ACTIVEBACKUP) {
> [...]	
> 			write_unlock_bh(&bond->curr_slave_lock);
> 			read_unlock(&bond->lock);
>
> 			netdev_bonding_change(bond->dev, NETDEV_BONDING_FAILOVER);
>
> 			read_lock(&bond->lock);
> 			write_lock_bh(&bond->curr_slave_lock);
> 		}
>
>
> 	You may be able to add your notifier to this case, or change
> your handler to notice the _FAILOVER notifier.


Thanks for your analysis! Hmm, I think let netconsole to handle
NETDEV_BONDING_FAILOVER here is a better solution.

>
>>> @@ -2154,6 +2155,7 @@ static int bond_ioctl_change_active(struct net_device *bond_dev, struct net_devi
>>>   	    (old_active)&&
>>>   	(new_active->link == BOND_LINK_UP)&&
>>>   	IS_UP(new_active->dev)) {
>>> +		netdev_bonding_change(bond->dev, NETDEV_BONDING_DESLAVE);
>>>   		write_lock_bh(&bond->curr_slave_lock);
>>>   		bond_change_active_slave(bond, new_active);
>>>   		write_unlock_bh(&bond->curr_slave_lock);
>
> 	This case will have the same problem, but will only be hit if a
> user does a manual "ifenslave -c bond0 ethX".
>
> 	You also probably wanted to do the sysfs path, but if the
> notifier goes into the change_active_slave function itself, then I don't
> think additional notifications would be necessary.
>

Okay, sounds above solution should also handle this case.

Thanks.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox