Netdev List
* [PATCH 3/3] vhost: apply cpumask and cgroup to vhost pollers
From: Tejun Heo @ 2010-05-30 20:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Oleg Nesterov, Sridhar Samudrala, netdev, lkml,
	kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
	Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <20100530112925.GB27611@redhat.com>

Apply the cpumask and cgroup of the initializing task to the created
vhost poller.

Based on Sridhar Samudrala's patch.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Sridhar Samudrala <samudrala.sridhar@gmail.com>
---
 drivers/vhost/vhost.c |   36 +++++++++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 5 deletions(-)

Index: work/drivers/vhost/vhost.c
===================================================================
--- work.orig/drivers/vhost/vhost.c
+++ work/drivers/vhost/vhost.c
@@ -23,6 +23,7 @@
 #include <linux/highmem.h>
 #include <linux/slab.h>
 #include <linux/kthread.h>
+#include <linux/cgroup.h>

 #include <linux/net.h>
 #include <linux/if_packet.h>
@@ -176,12 +177,30 @@ repeat:
 long vhost_dev_init(struct vhost_dev *dev,
 		    struct vhost_virtqueue *vqs, int nvqs)
 {
-	struct task_struct *poller;
-	int i;
+	struct task_struct *poller = NULL;
+	cpumask_var_t mask;
+	int i, ret = -ENOMEM;
+
+	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
+		goto out;

 	poller = kthread_create(vhost_poller, dev, "vhost-%d", current->pid);
-	if (IS_ERR(poller))
-		return PTR_ERR(poller);
+	if (IS_ERR(poller)) {
+		ret = PTR_ERR(poller);
+		goto out;
+	}
+
+	ret = sched_getaffinity(current->pid, mask);
+	if (ret)
+		goto out;
+
+	ret = sched_setaffinity(poller->pid, mask);
+	if (ret)
+		goto out;
+
+	ret = cgroup_attach_task_current_cg(poller);
+	if (ret)
+		goto out;

 	dev->vqs = vqs;
 	dev->nvqs = nvqs;
@@ -202,7 +221,14 @@ long vhost_dev_init(struct vhost_dev *de
 			vhost_poll_init(&dev->vqs[i].poll,
 					dev->vqs[i].handle_kick, POLLIN, dev);
 	}
-	return 0;
+
+	wake_up_process(poller);	/* avoid contributing to loadavg */
+	ret = 0;
+out:
+	if (ret)
+		kthread_stop(poller);
+	free_cpumask_var(mask);
+	return ret;
 }

 /* Caller should have device mutex */


* Re: Subject: [PATCH] net/ipv6: Use GFP_ATOMIC when a lock is held
From: Julia Lawall @ 2010-05-30 20:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev,
	linux-kernel, kernel-janitors
In-Reply-To: <1275250288.2472.21.camel@edumazet-laptop>

On Sun, 30 May 2010, Eric Dumazet wrote:

> On Sunday 30 May 2010 at 21:48 +0200, Julia Lawall wrote:
> > From: Julia Lawall <julia@diku.dk>
> > 
> > A spin lock is taken near the beginning of the enclosing function.
> > 
> > The semantic patch that makes this change is as follows:
> > (http://coccinelle.lip6.fr/)
> > 
> > // <smpl>
> > @@
> > @@
> > 
> > spin_lock(...)
> > ... when != spin_unlock(...)
> > -GFP_KERNEL
> > +GFP_ATOMIC
> > // </smpl>
> > 
> > Signed-off-by: Julia Lawall <julia@diku.dk>
> > 
> > ---
> >  net/ipv6/sit.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff -u -p a/net/ipv6/sit.c b/net/ipv6/sit.c
> > --- a/net/ipv6/sit.c
> > +++ b/net/ipv6/sit.c
> > @@ -358,7 +358,7 @@ ipip6_tunnel_add_prl(struct ip_tunnel *t
> >  		goto out;
> >  	}
> >  
> > -	p = kzalloc(sizeof(struct ip_tunnel_prl_entry), GFP_KERNEL);
> > +	p = kzalloc(sizeof(struct ip_tunnel_prl_entry), GFP_ATOMIC);
> >  	if (!p) {
> >  		err = -ENOBUFS;
> >  		goto out;
> 
> Nice catch, but what about allocating this outside of the locked
> section ?

I think the proposed patch does not work, because the for loop overwrites 
p.  That use of p looks like it is completely local to the for loop, so 
perhaps a new variable p1 could be added to be used there?

julia

> diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> index e51e650..ff3dd84 100644
> --- a/net/ipv6/sit.c
> +++ b/net/ipv6/sit.c
> @@ -340,6 +340,10 @@ ipip6_tunnel_add_prl(struct ip_tunnel *t, struct ip_tunnel_prl *a, int chg)
>  	if (a->addr == htonl(INADDR_ANY))
>  		return -EINVAL;
>  
> +	p = kzalloc(sizeof(struct ip_tunnel_prl_entry), GFP_KERNEL);
> +	if (!p)
> +		return -ENOBUFS;
> +
>  	spin_lock(&ipip6_prl_lock);
>  
>  	for (p = t->prl; p; p = p->next) {
> @@ -358,19 +362,16 @@ ipip6_tunnel_add_prl(struct ip_tunnel *t, struct ip_tunnel_prl *a, int chg)
>  		goto out;
>  	}
>  
> -	p = kzalloc(sizeof(struct ip_tunnel_prl_entry), GFP_KERNEL);
> -	if (!p) {
> -		err = -ENOBUFS;
> -		goto out;
> -	}
>  
>  	p->next = t->prl;
>  	p->addr = a->addr;
>  	p->flags = a->flags;
>  	t->prl_count++;
>  	rcu_assign_pointer(t->prl, p);
> +	p = NULL;
>  out:
>  	spin_unlock(&ipip6_prl_lock);
> +	kfree(p);
>  	return err;
>  }
>  
> 
> 


* Re: Subject: [PATCH] net/ipv6: Use GFP_ATOMIC when a lock is held
From: Eric Dumazet @ 2010-05-30 20:55 UTC (permalink / raw)
  To: Julia Lawall
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev,
	linux-kernel, kernel-janitors
In-Reply-To: <Pine.LNX.4.64.1005302248500.19253@ask.diku.dk>

On Sunday 30 May 2010 at 22:50 +0200, Julia Lawall wrote:

> I think the proposed patch does not work, because the for loop overwrites 
> p.  That use of p looks like it is completely local to the for loop, so 
> perhaps a new variable p1 could be added to be used there?

Please do so.

I just wanted to point out that changing GFP_KERNEL to GFP_ATOMIC is not an
appropriate way to solve this kind of problem. My patch was meant to convey
the idea; it is not a complete, tested patch :)


* Re: Subject: [PATCH] net/ipv6: Use GFP_ATOMIC when a lock is held
From: Julia Lawall @ 2010-05-30 21:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev,
	linux-kernel, kernel-janitors
In-Reply-To: <1275252912.2472.23.camel@edumazet-laptop>

On Sun, 30 May 2010, Eric Dumazet wrote:

> On Sunday 30 May 2010 at 22:50 +0200, Julia Lawall wrote:
> 
> > I think the proposed patch does not work, because the for loop overwrites 
> > p.  That use of p looks like it is completely local to the for loop, so 
> > perhaps a new variable p1 could be added to be used there?
> 
> Please do so.
> 
> I just wanted to point out that changing GFP_KERNEL to GFP_ATOMIC is not an
> appropriate way to solve this kind of problem. My patch was meant to convey
> the idea; it is not a complete, tested patch :)

Looking at it again, there is still a problem, because in the original 
code, the loop:

        for (p = t->prl; p; p = p->next) {
                if (p->addr == a->addr) {
                        if (chg) {
                                p->flags = a->flags;
                                goto out;
                        }
                        err = -EEXIST;
                        goto out;
                }
        }

could exit with success without the kzalloc ever being called.  If the 
kzalloc is moved up, it could fail and then it returns immediately without 
executing the loop.  A solution could be to leave the NULL test on p where 
it is, and only move up the kzalloc.  Or perhaps the change in behavior 
doesn't matter?
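
The option described above (move the allocation up, keep the NULL test where it
is, and use a separate loop cursor) can be sketched in userspace C. This is only
an illustration under stated assumptions: struct prl_entry, struct tunnel,
alloc_entry() and add_prl_sketch() are hypothetical stand-ins, and the
lock_held flag merely models the spinlock so that an allocation attempted
under the lock trips an assertion.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for the PRL entry and tunnel. */
struct prl_entry {
	struct prl_entry *next;
	unsigned int addr;
	unsigned int flags;
};

struct tunnel {
	struct prl_entry *prl;
	int prl_count;
};

static bool lock_held;	/* models the spinlock for illustration only */

/* Models a sleeping allocation: it must not run while the lock is held. */
static struct prl_entry *alloc_entry(void)
{
	assert(!lock_held);
	return calloc(1, sizeof(struct prl_entry));
}

static int add_prl_sketch(struct tunnel *t, unsigned int addr,
			  unsigned int flags, int chg)
{
	struct prl_entry *new_entry, *p;	/* p is only the loop cursor */
	int err = 0;

	new_entry = alloc_entry();	/* may be NULL; checked only if needed */

	lock_held = true;
	for (p = t->prl; p; p = p->next) {
		if (p->addr == addr) {
			if (chg)
				p->flags = flags;
			else
				err = -EEXIST;
			goto out;
		}
	}

	if (!new_entry) {
		err = -ENOBUFS;
		goto out;
	}

	new_entry->next = t->prl;
	new_entry->addr = addr;
	new_entry->flags = flags;
	t->prl = new_entry;
	t->prl_count++;
	new_entry = NULL;	/* ownership moved to the list */
out:
	lock_held = false;
	free(new_entry);	/* frees an unused preallocation; free(NULL) is a no-op */
	return err;
}
```

With this shape, a failed allocation only matters when a new entry is really
needed, so the "update existing entry" and -EEXIST paths behave as in the
original code.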

julia


* Re: Subject: [PATCH] net/ipv6: Use GFP_ATOMIC when a lock is held
From: Eric Dumazet @ 2010-05-30 21:18 UTC (permalink / raw)
  To: Julia Lawall
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev,
	linux-kernel, kernel-janitors
In-Reply-To: <Pine.LNX.4.64.1005302303180.19253@ask.diku.dk>

On Sunday 30 May 2010 at 23:09 +0200, Julia Lawall wrote:

> could exit with success without the kzalloc ever being called.  If the 
> kzalloc is moved up, it could fail and then it returns immediately without 
> executing the loop.  A solution could be to leave the NULL test on p where 
> it is, and only move up the kzalloc.  Or perhaps the change in behavior 
> doesn't matter?

If a GFP_KERNEL allocation fails, we are in big trouble anyway :)

GFP_ATOMIC allocations are more problematic in this area :)


* Re: [PATCH v2] act_nat: fix the wrong checksum when addr isn't in old_addr/mask
From: Herbert Xu @ 2010-05-30 21:53 UTC (permalink / raw)
  To: Changli Gao; +Cc: jamal, David S. Miller, netdev
In-Reply-To: <AANLkTinDe-AluGZx87q3nvxCfushfeH0jS35Fav95IXk@mail.gmail.com>

On Sun, May 30, 2010 at 09:33:22PM +0800, Changli Gao wrote:
>
> Consider this topology:
> 
> client -> DNAT -> router -> server.
> 
> DNAT is used to map a public IP to server's private IP. If a
> DEST_UNREACH ICMP packet is sent out by router, in order to handle
> this ICMP packet correctly, I have to pass it to act_nat.c. How can I
> filter out the other packets? By inspecting the inner IP destination
> address of this ICMP packet? Maybe I can use u32 with complicated
> parameters.

You should filter out all the non-ICMP packets, just like you
do here.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [Patch]8139too: remove unnecessary cast of ioread32()'s return value
From: David Miller @ 2010-05-30 22:34 UTC (permalink / raw)
  To: jeff; +Cc: romieu, netdev
In-Reply-To: <4C02A1F7.6030701@garzik.org>

From: Jeff Garzik <jeff@garzik.org>
Date: Sun, 30 May 2010 13:35:51 -0400

> Have you verified this matches all architectures' definitions of
> readl()?

Jeff, when you come out of hiding after months if not years
of not reviewing network driver changes, could you provide
some useful commentary instead of some trite stuff like this?

It does match, that's why I told this person to write these patches.
And if you have been following the thread where we discussed this, you
wouldn't feel the need to ask this question about these two patches.

And if it doesn't match, that's an arch bug which should be fixed and
in any event there is only one possibility of a non-match and that is
if the routine returns "unsigned long"

Which, surprise surprise Jeff, retains current behavior!

So there is no risk whatsoever possible from this change.


* Re: [Patch]8139too: remove unnecessary cast of ioread32()'s return value
From: Jeff Garzik @ 2010-05-30 23:24 UTC (permalink / raw)
  To: David Miller; +Cc: romieu, netdev
In-Reply-To: <20100530.153451.193700815.davem@davemloft.net>

On 05/30/2010 06:34 PM, David Miller wrote:
> From: Jeff Garzik<jeff@garzik.org>
> Date: Sun, 30 May 2010 13:35:51 -0400
>
>> Have you verified this matches all architectures' definitions of
>> readl()?

> And if it doesn't match, that's an arch bug which should be fixed and
> in any event there is only one possibility of a non-match and that is
> if the routine returns "unsigned long"

That was the genesis of the question.  Some arches still use unsigned long.

	Jeff





* Re: [Patch]8139too: remove unnecessary cast of ioread32()'s return value
From: Junchang Wang @ 2010-05-31  0:22 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: davem, romieu, netdev
In-Reply-To: <4C02A1F7.6030701@garzik.org>

On Sun, May 30, 2010 at 01:35:51PM -0400, Jeff Garzik wrote:
>
>Have you verified this matches all architectures' definitions of readl()?
>

Hi Jeff,

Thanks for your question. I just browsed the kernel: ioread32() returns either
unsigned int or u32 on all arches. No arch uses unsigned long or anything
else.

Secondly, it would be a bug for an arch to return unsigned long. What happens
when programmers invoke sizeof(ioread32()) on 64-bit platforms?
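
To make the sizeof() point concrete, here is a small userspace sketch. The two
accessors are hypothetical stand-ins for the two possible prototypes, not the
kernel's ioread32(); the only thing shown is that sizeof applied to a call
expression follows the declared return type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessors standing in for two possible ioread32() prototypes. */
uint32_t read32_as_u32(void)
{
	return 0xdeadbeefu;
}

unsigned long read32_as_ulong(void)
{
	return 0xdeadbeefUL;
}

/* sizeof is applied to the return type; the call is not evaluated. */
size_t size_of_u32_read(void)
{
	return sizeof(read32_as_u32());
}

size_t size_of_ulong_read(void)
{
	return sizeof(read32_as_ulong());
}

/* The value read is the same either way; only the type width differs. */
void check_read_widths(void)
{
	assert(size_of_u32_read() == 4);
	assert(size_of_ulong_read() == sizeof(unsigned long));
	assert((uint32_t)read32_as_ulong() == read32_as_u32());
}
```

On an LP64 target sizeof(unsigned long) is 8, so the second variant would
report 8 bytes for a 32-bit register read; on a 32-bit target both report 4.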

--Junchang


* Re: [audit] Suppress runtime loading of audit module.
From: James Morris @ 2010-05-31  0:27 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Audit-ML, linux-security-module, netdev
In-Reply-To: <201005242057.EBJ00004.JOOFVHLFFQOStM@I-love.SAKURA.ne.jp>

Might be best for the networking folk to look at this.



On Mon, 24 May 2010, Tetsuo Handa wrote:

> I noticed that request_module("net-pf-16-proto-9") is issued whenever 'Enter'
> key is pressed if CONFIG_AUDIT=n and console is /dev/tty0 . net-pf-16-proto-9
> is known as audit but CONFIG_AUDIT is "bool". Therefore, trying to load
> net-pf-16-proto-9 at runtime does not make sense.
> 
> Call trace obtained by inserting WARN_ON(protocol == 9);
> ------------[ cut here ]------------
> WARNING: at net/netlink/af_netlink.c:450 netlink_create+0x1cd/0x1e0()
> Hardware name: VMware Virtual Platform
> Modules linked in: mptspi mptscsih mptbase scsi_transport_spi ext3 jbd mbcache
> Pid: 1, comm: bash Not tainted 2.6.34 #10
> Call Trace:
>  [<c06227ad>] ? netlink_create+0x1cd/0x1e0
>  [<c043c9cc>] warn_slowpath_common+0x7c/0xa0
>  [<c06227ad>] ? netlink_create+0x1cd/0x1e0
>  [<c043ca05>] warn_slowpath_null+0x15/0x20
>  [<c06227ad>] netlink_create+0x1cd/0x1e0
>  [<c0600790>] ? __sock_create+0xe0/0x240
>  [<c06225e0>] ? netlink_create+0x0/0x1e0
>  [<c06007b8>] __sock_create+0x108/0x240
>  [<c060072e>] ? __sock_create+0x7e/0x240
>  [<c060095a>] sock_create+0x3a/0x50
>  [<c0600ae6>] sys_socket+0x36/0x60
>  [<c0601f69>] sys_socketcall+0x89/0x290
>  [<c0402b83>] ? sysenter_exit+0xf/0x18
>  [<c0524db4>] ? trace_hardirqs_on_thunk+0xc/0x10
>  [<c0402b50>] sysenter_do_call+0x12/0x36
> ---[ end trace 4da698b4c0bf1613 ]---
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> ---
>  net/netlink/af_netlink.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- linux-2.6.34.orig/net/netlink/af_netlink.c
> +++ linux-2.6.34/net/netlink/af_netlink.c
> @@ -446,7 +446,7 @@ static int netlink_create(struct net *ne
>  
>  	netlink_lock_table();
>  #ifdef CONFIG_MODULES
> -	if (!nl_table[protocol].registered) {
> +	if (!nl_table[protocol].registered && protocol != 9) {
>  		netlink_unlock_table();
>  		request_module("net-pf-%d-proto-%d", PF_NETLINK, protocol);
>  		netlink_lock_table();

-- 
James Morris
<jmorris@namei.org>


* Re: [PATCH 2/3] cgroups: Add an API to attach a task to current task's cgroup
From: Li Zefan @ 2010-05-31  1:07 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Michael S. Tsirkin, Oleg Nesterov, Sridhar Samudrala, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <4C02C987.6020805@kernel.org>

04:24, Tejun Heo wrote:
> From: Sridhar Samudrala <samudrala.sridhar@gmail.com>
> 
> Add a new kernel API to attach a task to current task's cgroup
> in all the active hierarchies.
> 
> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>

Acked-by: Li Zefan <lizf@cn.fujitsu.com>

btw: you lost the reviewed-by tag given by Paul Menage.


* Re: [PATCH 3/3] vhost: apply cpumask and cgroup to vhost pollers
From: Li Zefan @ 2010-05-31  1:11 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Michael S. Tsirkin, Oleg Nesterov, Sridhar Samudrala, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <4C02C99D.9070204@kernel.org>

Tejun Heo wrote:
> Apply the cpumask and cgroup of the initializing task to the created
> vhost poller.
> 
> Based on Sridhar Samudrala's patch.
> 
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Sridhar Samudrala <samudrala.sridhar@gmail.com>
> ---
>  drivers/vhost/vhost.c |   36 +++++++++++++++++++++++++++++++-----
>  1 file changed, 31 insertions(+), 5 deletions(-)
> 
> Index: work/drivers/vhost/vhost.c
> ===================================================================
> --- work.orig/drivers/vhost/vhost.c
> +++ work/drivers/vhost/vhost.c
> @@ -23,6 +23,7 @@
>  #include <linux/highmem.h>
>  #include <linux/slab.h>
>  #include <linux/kthread.h>
> +#include <linux/cgroup.h>
> 
>  #include <linux/net.h>
>  #include <linux/if_packet.h>
> @@ -176,12 +177,30 @@ repeat:
>  long vhost_dev_init(struct vhost_dev *dev,
>  		    struct vhost_virtqueue *vqs, int nvqs)
>  {
> -	struct task_struct *poller;
> -	int i;
> +	struct task_struct *poller = NULL;
> +	cpumask_var_t mask;
> +	int i, ret = -ENOMEM;
> +
> +	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
> +		goto out;
> 

If we "goto out" here, we end up calling kthread_stop(poller) with
poller == NULL, but kthread_stop() seems to require a non-NULL
task_struct pointer.
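
One common way to avoid this is to give each unwinding step its own label, so
that a failure before the thread exists never reaches the stop call. The
following is a userspace sketch only: struct worker, worker_stop() and
dev_init_sketch() are hypothetical stand-ins, and the fail_at parameter exists
purely to exercise each error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the poller task; not a kernel type. */
struct worker {
	bool running;
};

static int stop_calls;	/* counts worker_stop() calls, for illustration */

/* In this model, as with kthread_stop(), the pointer must be valid. */
static void worker_stop(struct worker *w)
{
	assert(w != NULL);
	stop_calls++;
	free(w);
}

/*
 * fail_at selects which step fails: 1 = mask allocation,
 * 2 = worker creation, 3 = a later setup step, 0 = nothing fails.
 */
static int dev_init_sketch(int fail_at, struct worker **out)
{
	unsigned long *mask;
	struct worker *w;
	int ret;

	mask = (fail_at == 1) ? NULL : malloc(sizeof(*mask));
	if (!mask)
		return -ENOMEM;		/* nothing to unwind yet */

	w = (fail_at == 2) ? NULL : calloc(1, sizeof(*w));
	if (!w) {
		ret = -ENOMEM;
		goto out_free_mask;	/* no worker exists, so skip the stop */
	}

	if (fail_at == 3) {
		ret = -EINVAL;
		goto out_stop;		/* worker exists and must be stopped */
	}

	w->running = true;
	*out = w;
	ret = 0;
	goto out_free_mask;

out_stop:
	worker_stop(w);
out_free_mask:
	free(mask);
	return ret;
}
```

The labels are ordered so that each later failure falls through the earlier
cleanup steps, and the stop is only reachable once a worker has been created.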

>  	poller = kthread_create(vhost_poller, dev, "vhost-%d", current->pid);
> -	if (IS_ERR(poller))
> -		return PTR_ERR(poller);
> +	if (IS_ERR(poller)) {
> +		ret = PTR_ERR(poller);
> +		goto out;
> +	}
> +
> +	ret = sched_getaffinity(current->pid, mask);
> +	if (ret)
> +		goto out;
> +
> +	ret = sched_setaffinity(poller->pid, mask);
> +	if (ret)
> +		goto out;
> +
> +	ret = cgroup_attach_task_current_cg(poller);
> +	if (ret)
> +		goto out;
> 
>  	dev->vqs = vqs;
>  	dev->nvqs = nvqs;
> @@ -202,7 +221,14 @@ long vhost_dev_init(struct vhost_dev *de
>  			vhost_poll_init(&dev->vqs[i].poll,
>  					dev->vqs[i].handle_kick, POLLIN, dev);
>  	}
> -	return 0;
> +
> +	wake_up_process(poller);	/* avoid contributing to loadavg */
> +	ret = 0;
> +out:
> +	if (ret)
> +		kthread_stop(poller);
> +	free_cpumask_var(mask);
> +	return ret;
>  }
> 
>  /* Caller should have device mutex */


* Re: [Patch]8139too: remove unnecessary cast of ioread32()'s return value
From: David Miller @ 2010-05-31  1:29 UTC (permalink / raw)
  To: jeff; +Cc: romieu, netdev
In-Reply-To: <4C02F3A2.90100@garzik.org>

From: Jeff Garzik <jeff@garzik.org>
Date: Sun, 30 May 2010 19:24:18 -0400

> That was the genesis of the question.  Some arches still use unsigned
> long.

They are 32-bit.


* Re: [Patch]8139too: remove unnecessary cast of ioread32()'s return value
From: David Miller @ 2010-05-31  1:35 UTC (permalink / raw)
  To: jeff; +Cc: romieu, netdev
In-Reply-To: <20100530.182948.71104948.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Sun, 30 May 2010 18:29:48 -0700 (PDT)

> From: Jeff Garzik <jeff@garzik.org>
> Date: Sun, 30 May 2010 19:24:18 -0400
> 
>> That was the genesis of the question.  Some arches still use unsigned
>> long.
> 
> They are 32-bit.

In fact the only two offenders are h8300 and m32r, which are
both 32-bit.

This is really in the realm of "who cares."


* [PATCH] cls_u32: use skb_copy_bits() to dereference data safely
From: Changli Gao @ 2010-05-31  2:24 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: David S. Miller, netdev, Changli Gao

Use skb_copy_bits() to dereference data safely.

The original skb->data dereference isn't safe, as there is no skb->len or
skb_is_nonlinear() check. This patch uses skb_copy_bits() instead. When the
skb isn't long enough, u32_classify() terminates immediately and returns -1.
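
The idea of replacing a raw pointer dereference with a bounds-checked copy can
be sketched in userspace as follows. This is an assumption-laden model:
struct pkt, pkt_copy_bits() and key_matches() are hypothetical, and the buffer
here is linear, whereas the real skb_copy_bits() also handles data held outside
the linear area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical linear packet buffer; a real sk_buff may also hold paged data. */
struct pkt {
	const unsigned char *data;
	size_t len;
};

/* Copy n bytes at offset off into to; returns 0 on success, -1 if out of range. */
static int pkt_copy_bits(const struct pkt *p, size_t off, void *to, size_t n)
{
	assert(p != NULL && to != NULL);
	if (off > p->len || n > p->len - off)
		return -1;
	memcpy(to, p->data + off, n);
	return 0;
}

/* Returns 1 on match, 0 on mismatch, -1 when the packet is too short. */
static int key_matches(const struct pkt *p, size_t off, uint32_t val,
		       uint32_t mask)
{
	uint32_t data;

	if (pkt_copy_bits(p, off, &data, sizeof(data)))
		return -1;
	return ((data ^ val) & mask) == 0;
}
```

Copying into a local variable also sidesteps unaligned loads, which a cast such
as *(__be32 *)ptr does not.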

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
---
 net/sched/cls_u32.c |   40 +++++++++++++++++++++++++++++-----------
 1 file changed, 29 insertions(+), 11 deletions(-)
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 9627542..db35197 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -98,11 +98,11 @@ static int u32_classify(struct sk_buff *skb, struct tcf_proto *tp, struct tcf_re
 {
 	struct {
 		struct tc_u_knode *knode;
-		u8		  *ptr;
+		unsigned int	  off;
 	} stack[TC_U32_MAXDEPTH];
 
 	struct tc_u_hnode *ht = (struct tc_u_hnode*)tp->root;
-	u8 *ptr = skb_network_header(skb);
+	unsigned int off = skb_network_header(skb) - skb->data;
 	struct tc_u_knode *n;
 	int sdepth = 0;
 	int off2 = 0;
@@ -134,8 +134,13 @@ next_knode:
 #endif
 
 		for (i = n->sel.nkeys; i>0; i--, key++) {
+			unsigned int toff;
+			__be32 data;
 
-			if ((*(__be32*)(ptr+key->off+(off2&key->offmask))^key->val)&key->mask) {
+			toff = off + key->off + (off2 & key->offmask);
+			if (skb_copy_bits(skb, toff, &data, 4))
+				goto out;
+			if ((data ^ key->val) & key->mask) {
 				n = n->next;
 				goto next_knode;
 			}
@@ -174,29 +179,41 @@ check_terminal:
 		if (sdepth >= TC_U32_MAXDEPTH)
 			goto deadloop;
 		stack[sdepth].knode = n;
-		stack[sdepth].ptr = ptr;
+		stack[sdepth].off = off;
 		sdepth++;
 
 		ht = n->ht_down;
 		sel = 0;
-		if (ht->divisor)
-			sel = ht->divisor&u32_hash_fold(*(__be32*)(ptr+n->sel.hoff), &n->sel,n->fshift);
+		if (ht->divisor) {
+			__be32 data;
 
+			if (skb_copy_bits(skb, off + n->sel.hoff, &data, 4))
+				goto out;
+			sel = ht->divisor & u32_hash_fold(data, &n->sel,
+							  n->fshift);
+		}
 		if (!(n->sel.flags&(TC_U32_VAROFFSET|TC_U32_OFFSET|TC_U32_EAT)))
 			goto next_ht;
 
 		if (n->sel.flags&(TC_U32_OFFSET|TC_U32_VAROFFSET)) {
 			off2 = n->sel.off + 3;
-			if (n->sel.flags&TC_U32_VAROFFSET)
-				off2 += ntohs(n->sel.offmask & *(__be16*)(ptr+n->sel.offoff)) >>n->sel.offshift;
+			if (n->sel.flags & TC_U32_VAROFFSET) {
+				__be16 data;
+
+				if (skb_copy_bits(skb, off + n->sel.offoff,
+						  &data, 2))
+					goto out;
+				off2 += ntohs(n->sel.offmask & data) >>
+					n->sel.offshift;
+			}
 			off2 &= ~3;
 		}
 		if (n->sel.flags&TC_U32_EAT) {
-			ptr += off2;
+			off += off2;
 			off2 = 0;
 		}
 
-		if (ptr < skb_tail_pointer(skb))
+		if (off < skb->len)
 			goto next_ht;
 	}
 
@@ -204,9 +221,10 @@ check_terminal:
 	if (sdepth--) {
 		n = stack[sdepth].knode;
 		ht = n->ht_up;
-		ptr = stack[sdepth].ptr;
+		off = stack[sdepth].off;
 		goto check_terminal;
 	}
+out:
 	return -1;
 
 deadloop:


* [PATCH] ipconfig: send host-name in DHCP requests
From: Wu Fengguang @ 2010-05-31  3:19 UTC (permalink / raw)
  To: David S. Miller, netdev; +Cc: LKML, Andi Kleen

Normally dhclient can be configured to send the "host-name" option
in DHCP requests to update the client's DNS record. However, on an
NFSROOT system dhclient must never be run, since it may change the
IP address and thereby break the root NFS mount.

So let the kernel send the host name itself, enabled with the kernel

	ip=::::$HOST_NAME::dhcp
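
For reference, the host-name option is a plain code/length/value triple with
option code 12. The sketch below shows that encoding in userspace C; the
helper name put_host_name_option() and its explicit space and length checks
are assumptions added for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DHCP_OPT_HOST_NAME 12	/* option code used in the patch */

/*
 * Append a host-name option (code, length, bytes) at e.
 * Returns the number of bytes written, or 0 if the name is empty,
 * longer than 255 bytes, or does not fit in the remaining space.
 */
static size_t put_host_name_option(uint8_t *e, size_t space, const char *name)
{
	size_t len;

	assert(e != NULL && name != NULL);
	len = strlen(name);
	if (len == 0 || len > 255 || space < len + 2)
		return 0;

	e[0] = DHCP_OPT_HOST_NAME;
	e[1] = (uint8_t)len;
	memcpy(e + 2, name, len);	/* no trailing NUL is sent */
	return len + 2;
}
```

The length field is a single byte, which is why the sketch rejects names longer
than 255 bytes before writing anything.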

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 net/ipv4/ipconfig.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 067ce9e..db54343 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -665,6 +665,13 @@ ic_dhcp_init_options(u8 *options)
 		memcpy(e, ic_req_params, sizeof(ic_req_params));
 		e += sizeof(ic_req_params);
 
+		if (ic_host_name_set) {
+			*e++ = 12;	/* host-name */
+			len = strlen(utsname()->nodename);
+			*e++ = len;
+			memcpy(e, utsname()->nodename, len);
+			e += len;
+		}
 		if (*vendor_class_identifier) {
 			printk(KERN_INFO "DHCP: sending class identifier \"%s\"\n",
 			       vendor_class_identifier);
-- 
1.6.6


* Re: [PATCH] bnx2: Fix IRQ failures during kdump.
From: Michael Chan @ 2010-05-31  4:43 UTC (permalink / raw)
  To: 'Andi Kleen'
  Cc: 'davem@davemloft.net', 'netdev@vger.kernel.org',
	'linux-pci@vger.kernel.org'
In-Reply-To: <20100530173014.GB14556@basil.fritz.box>

Andi Kleen wrote:

> On Sun, May 30, 2010 at 09:12:15AM -0700, Michael Chan wrote:
> > Andi Kleen wrote:
> >
> > > "Michael Chan" <mchan@broadcom.com> writes:
> > >
> > > > When switching from the crashed kernel to the kdump kernel without
> > > > going through PCI reset, IRQs may not work if a different IRQ mode
> > > > is used on
> > >
> > > PCIe with AER actually does support per link root port reset
> > > (e.g. used for AER)
> >
> > Do you mean the slot_reset function in the pci_error_handlers?  This
> 
> Well the fallback code in the PCIE root port driver
> that does the actual resets.

aer_root_reset() in aerdrv.c?

> 
> It could be called directly before kexec.
> 
> > needs to be called in the context of the crashed kernel, right?
> 
> It could be done on kexec, however of course you would rely
> on PCI root port data structures still being intact on a crash
> (I guess that's reasonable, they are not very complicated)
> 
> >
> > >
> > > I've been wondering for some time if kexec should not simply
> > > use that to reset all the devices, instead of adding hacks
> > > around this to all drivers.
> > >
> > > That would fix your problems too, right?
> >
> > If it is called in the context of the crashed kernel, it won't work.
> > We would reset it and put it back into the same IRQ mode.
> 
> Who would put it back? Your driver wouldn't be called anymore.

The bnx2 driver like many other drivers has a slot_reset function in the
pci_driver struct's err_handler.  If the AER code calls this function,
we would reset the chip and put it back to the same IRQ mode.  Without
calling this per driver reset function, I'm not sure if you can reset
the device if the device does not support Function Level Reset.

> 
> >
> > >
> > > The question is just if AER is widely enough supported for this.
> > >
> >
> > Some newer PCIe devices support Function Level Reset, and that would
> > be ideal.  But most existing devices including bnx2 devices don't have
> > this feature.
> 
> Root port reset should be fine for this case. Even if some
> innocent device on the same root port gets reset too that shouldn't
> matter.
> The only drawback for the NIC would be that you have to renegotiate
> links, I think.
> 
> Also there are systems without AER support.
> 
> -Andi
> --
> ak@linux.intel.com -- Speaking for myself only.




* [PATCH] ipv6: get rid of ipip6_prl_lock
From: Eric Dumazet @ 2010-05-31  5:04 UTC (permalink / raw)
  To: Julia Lawall
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev,
	linux-kernel, kernel-janitors
In-Reply-To: <Pine.LNX.4.64.1005302303180.19253@ask.diku.dk>

On Sunday 30 May 2010 at 23:09 +0200, Julia Lawall wrote:
> On Sun, 30 May 2010, Eric Dumazet wrote:
> 
> > On Sunday 30 May 2010 at 22:50 +0200, Julia Lawall wrote:
> > 
> > > I think the proposed patch does not work, because the for loop overwrites 
> > > p.  That use of p looks like it is completely local to the for loop, so 
> > > perhaps a new variable p1 could be added to be used there?
> > 
> > Please do so.
> > 
> > I just wanted to point out that changing GFP_KERNEL to GFP_ATOMIC is not an
> > appropriate way to solve this kind of problem. My patch was meant to convey
> > the idea; it is not a complete, tested patch :)
> 
> Looking at it again, there is still a problem, because in the original 
> code, the loop:
> 
...
> 
> could exit with success without the kzalloc ever being called.  If the 
> kzalloc is moved up, it could fail and then it returns immediately without 
> executing the loop.  A solution could be to leave the NULL test on p where 
> it is, and only move up the kzalloc.  Or perhaps the change in behavior 
> doesn't matter?
> 


[PATCH] ipv6: get rid of ipip6_prl_lock

As noticed by Julia Lawall, ipip6_tunnel_add_prl() incorrectly calls
kzalloc(..., GFP_KERNEL) while a spinlock is held. She provided
a patch to use GFP_ATOMIC instead.

One possibility would be to convert this spinlock to a mutex, or
preallocate the entry before taking the lock.

After RCU conversion, it appears we don't need this lock, since the
caller already holds RTNL.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv6/sit.c |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index e51e650..702c532 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -249,8 +249,6 @@ failed:
 	return NULL;
 }
 
-static DEFINE_SPINLOCK(ipip6_prl_lock);
-
 #define for_each_prl_rcu(start)			\
 	for (prl = rcu_dereference(start);	\
 	     prl;				\
@@ -340,7 +338,7 @@ ipip6_tunnel_add_prl(struct ip_tunnel *t, struct ip_tunnel_prl *a, int chg)
 	if (a->addr == htonl(INADDR_ANY))
 		return -EINVAL;
 
-	spin_lock(&ipip6_prl_lock);
+	ASSERT_RTNL();
 
 	for (p = t->prl; p; p = p->next) {
 		if (p->addr == a->addr) {
@@ -370,7 +368,6 @@ ipip6_tunnel_add_prl(struct ip_tunnel *t, struct ip_tunnel_prl *a, int chg)
 	t->prl_count++;
 	rcu_assign_pointer(t->prl, p);
 out:
-	spin_unlock(&ipip6_prl_lock);
 	return err;
 }
 
@@ -397,7 +394,7 @@ ipip6_tunnel_del_prl(struct ip_tunnel *t, struct ip_tunnel_prl *a)
 	struct ip_tunnel_prl_entry *x, **p;
 	int err = 0;
 
-	spin_lock(&ipip6_prl_lock);
+	ASSERT_RTNL();
 
 	if (a && a->addr != htonl(INADDR_ANY)) {
 		for (p = &t->prl; *p; p = &(*p)->next) {
@@ -419,7 +416,6 @@ ipip6_tunnel_del_prl(struct ip_tunnel *t, struct ip_tunnel_prl *a)
 		}
 	}
 out:
-	spin_unlock(&ipip6_prl_lock);
 	return err;
 }
 


* Re: [v5 Patch 1/3] netpoll: add generic support for bridge and bonding devices
From: Cong Wang @ 2010-05-31  5:29 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Flavio Leitner, linux-kernel, Matt Mackall, netdev, bridge,
	Andy Gospodarek, Neil Horman, Jeff Moyer, Stephen Hemminger,
	bonding-devel, David Miller
In-Reply-To: <24802.1275080599@death.nxdomain.ibm.com>

On 05/29/10 05:03, Jay Vosburgh wrote:
> Flavio Leitner<fbl@sysclose.org>  wrote:
>
>> On Fri, May 28, 2010 at 04:16:34PM +0800, Cong Wang wrote:
>>> On 05/28/10 02:05, Flavio Leitner wrote:
>>>>
>>>> Hi guys!
>>>>
>>>> I finally could test this to see if an old problem reported on bugzilla[1] was
>>>> fixed now, but unfortunately it is still there.
>>>>
>>>> The ticket is private I guess, but basically the problem happens when bonding
>>>> driver tries to print something after it had taken the write_lock (monitor
>>>> functions, enslave/de-enslave), so the printk() will pass through netpoll, then
>>>> on bonding again which no matter what mode you use, it will try to read_lock()
>>>> the lock again. The result is a deadlock and the entire system hangs.
>>>>
>>>
>>> Does the attached patch fix this hang?
>>
>> I got another issue now:
>>
>> [   89.523062] bonding: bond0: enslaving eth0 as a backup interface with a down link.
>> [   89.580746] bonding: bond0: enslaving eth2 as a backup interface with a down link.
>> [   91.198527] e1000: eth2 NIC Link is Up 100 Mbps Half Duplex, Flow Control: None
>> [   91.238245] bonding: bond0: link status definitely up for interface eth2.
>>
>> [   91.245381] BUG: scheduling while atomic: bond0/2716/0x10000100
>> [   91.251565] 5 locks held by bond0/2716:
>> [   91.255663]  #0:  ((bond_dev->name)){+.+.+.}, at: [<ffffffff81045fb4>] worker_thread+0x19a/0x2e2
>> [   91.265179]  #1:  ((&(&bond->mii_work)->work)){+.+.+.}, at: [<ffffffff81045fb4>] worker_thread+0x19a/0x2e2
>> [   91.275554]  #2:  (rtnl_mutex){+.+.+.}, at: [<ffffffff812daf38>] rtnl_lock+0x12/0x14
>> [   91.284018]  #3:  (&bond->lock){++.+.+}, at: [<ffffffffa029e06a>] bond_mii_monitor+0x2a2/0x4ed [bonding]
>> [   91.294230]  #4:  (&bond->curr_slave_lock){+...+.}, at: [<ffffffffa029e239>] bond_mii_monitor+0x471/0x4ed [bonding]
>> [   91.305387] Modules linked in: bonding sunrpc ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 dm_mirror dm_region_hash dm_log dm_multipath uinput snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm ppdev parport_pc parport rtc_cmos snd_timer tg3 snd ide_cd_mod i5000_edac i2c_i801 libphy rtc_core rtc_lib edac_core pcspkr e1000 dcdbas uhci_hcd tulip shpchp i2c_core cdrom serio_raw soundcore sg snd_page_alloc raid0 sd_mod button [last unloaded: mperf]
>> [   91.357735] Pid: 2716, comm: bond0 Not tainted 2.6.34-04700-gd938a70-dirty #36
>> [   91.371112] Call Trace:
>> [   91.373825]  [<ffffffff81056002>] ? __debug_show_held_locks+0x22/0x24
>> [   91.380530]  [<ffffffff8102e4a2>] __schedule_bug+0x6d/0x72
>> [   91.386284]  [<ffffffff81363f6e>] schedule+0xc9/0x791
>> [   91.391600]  [<ffffffff81032540>] __cond_resched+0x25/0x30
>> [   91.397350]  [<ffffffff81364757>] _cond_resched+0x27/0x32
>> [   91.403013]  [<ffffffff810ab243>] kmem_cache_alloc+0x2b/0xac
>> [   91.408936]  [<ffffffff812c61fd>] skb_clone+0x42/0x5d
>> [   91.414253]  [<ffffffff812ec696>] netlink_broadcast+0x192/0x369
>> [   91.420436]  [<ffffffff812ecdc3>] nlmsg_notify+0x43/0x89
>> [   91.426012]  [<ffffffff812dabc7>] rtnl_notify+0x2b/0x2d
>> [   91.431501]  [<ffffffff812dacbc>] rtmsg_ifinfo+0xf3/0x118
>> [   91.437165]  [<ffffffff812dad0c>] rtnetlink_event+0x2b/0x2f
>> [   91.443003]  [<ffffffff81369fe4>] notifier_call_chain+0x32/0x5e
>> [   91.449188]  [<ffffffff8104d618>] raw_notifier_call_chain+0xf/0x11
>> [   91.455634]  [<ffffffff812cfc73>] call_netdevice_notifiers+0x45/0x4a
>> [   91.462253]  [<ffffffff812d04f7>] netdev_bonding_change+0x12/0x14
>
> 	This warning is because the notifier call is happening with spin
> locks held.
>
>> [   91.468614]  [<ffffffffa029d589>] bond_select_active_slave+0xe8/0x123 [bonding]
>> [   91.476408]  [<ffffffffa029e241>] bond_mii_monitor+0x479/0x4ed [bonding]
>> [   91.483375]  [<ffffffff81046009>] worker_thread+0x1ef/0x2e2
>> [   91.489212]  [<ffffffff81045fb4>] ? worker_thread+0x19a/0x2e2
>> [   91.495227]  [<ffffffffa029ddc8>] ? bond_mii_monitor+0x0/0x4ed [bonding]
>> [   91.502192]  [<ffffffff81049c71>] ? autoremove_wake_function+0x0/0x34
>> [   91.508897]  [<ffffffff81045e1a>] ? worker_thread+0x0/0x2e2
>> [   91.514734]  [<ffffffff810498bb>] kthread+0x7a/0x82
>> [   91.519878]  [<ffffffff81003714>] kernel_thread_helper+0x4/0x10
>> [   91.526060]  [<ffffffff81366ffc>] ? restore_args+0x0/0x30
>> [   91.531723]  [<ffffffff81049841>] ? kthread+0x0/0x82
>> [   91.536953]  [<ffffffff81003710>] ? kernel_thread_helper+0x0/0x10
>> [   91.543343] bonding: bond0: making interface eth2 the new active one.
>> [   91.550554] bonding: bond0: first active interface up!
>> [   91.556859] ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
>>
>>
>> No other patch applied. Just started netconsole over bonding, so no need
>> to pull the cable from slaves. Reproduced twice, one I got the
>> backtrace above, and on the other one the system hangs completely
>> after the BUG: scheduling message.
>>
>> fbl
>>
>>
>>>
>>> Thanks!
>>>
>>> ----------------------->
>>>
>>> We should notify netconsole that bond is changing its slaves
>>> when we use active-backup mode.
>>>
>>> Signed-off-by: WANG Cong<amwang@redhat.com>
>>>
>>> ----
>>>
>>
>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>> index 5e12462..9494c02 100644
>>> --- a/drivers/net/bonding/bond_main.c
>>> +++ b/drivers/net/bonding/bond_main.c
>>> @@ -1199,6 +1199,7 @@ void bond_select_active_slave(struct bonding *bond)
>>>
>>>   	best_slave = bond_find_best_slave(bond);
>>>   	if (best_slave != bond->curr_active_slave) {
>>> +		netdev_bonding_change(bond->dev, NETDEV_BONDING_DESLAVE);
>>>   		bond_change_active_slave(bond, best_slave);
>>>   		rv = bond_set_carrier(bond);
>>>   		if (!rv)
>
> 	You can't do this here; the driver is holding various spin
> locks, and notifier calls can sleep (hence the warning).  If you look at
> the bond_change_active_slave function, it drops all locks other than
> RTNL before making a notifier call, e.g.,
>
> void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
> {
> [...]
> 	if (bond->params.mode == BOND_MODE_ACTIVEBACKUP) {
> [...]	
> 			write_unlock_bh(&bond->curr_slave_lock);
> 			read_unlock(&bond->lock);
>
> 			netdev_bonding_change(bond->dev, NETDEV_BONDING_FAILOVER);
>
> 			read_lock(&bond->lock);
> 			write_lock_bh(&bond->curr_slave_lock);
> 		}
>
>
> 	You may be able to add your notifier to this case, or change
> your handler to notice the _FAILOVER notifier.


Thanks for your analysis! Hmm, I think letting netconsole handle
NETDEV_BONDING_FAILOVER here is a better solution.
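
To illustrate the locking rule Jay describes above (drop the spin locks
before making a notifier call that may sleep), here is a minimal
user-space sketch. It is not the bonding code: lock(), unlock(),
notify() and both change_active_*() helpers are hypothetical stand-ins,
and the locks are modelled as a simple counter.

```c
#include <stdio.h>

/* Hypothetical stand-ins for spin locks: only count how many are held. */
static int locks_held;
static int sleeping_while_atomic;	/* number of rule violations seen */

static void lock(void)   { locks_held++; }
static void unlock(void) { locks_held--; }

/* Models a notifier call that may sleep; illegal while a lock is held. */
static void notify(const char *event)
{
	if (locks_held > 0) {
		sleeping_while_atomic++;
		printf("BUG: %s notified with %d lock(s) held\n",
		       event, locks_held);
	}
}

/* Wrong: notifies while both locks are still held. */
static void change_active_bad(void)
{
	lock();			/* models bond->lock */
	lock();			/* models bond->curr_slave_lock */
	notify("DESLAVE");
	unlock();
	unlock();
}

/* Right: drops the locks around the notification, then re-takes them. */
static void change_active_good(void)
{
	lock();
	lock();
	/* ... pick the new active slave under the locks ... */
	unlock();
	unlock();

	notify("FAILOVER");

	lock();
	lock();
	/* ... continue under the locks ... */
	unlock();
	unlock();
}
```

Calling change_active_bad() trips the check once, while
change_active_good() leaves sleeping_while_atomic untouched.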

>
>>> @@ -2154,6 +2155,7 @@ static int bond_ioctl_change_active(struct net_device *bond_dev, struct net_devi
>>>   	    (old_active)&&
>>>   	(new_active->link == BOND_LINK_UP)&&
>>>   	IS_UP(new_active->dev)) {
>>> +		netdev_bonding_change(bond->dev, NETDEV_BONDING_DESLAVE);
>>>   		write_lock_bh(&bond->curr_slave_lock);
>>>   		bond_change_active_slave(bond, new_active);
>>>   		write_unlock_bh(&bond->curr_slave_lock);
>
> 	This case will have the same problem, but will only be hit if a
> user does a manual "ifenslave -c bond0 ethX".
>
> 	You also probably wanted to do the sysfs path, but if the
> notifier goes into the change_active_slave function itself, then I don't
> think additional notifications would be necessary.
>

Okay, it sounds like the above solution should also handle this case.

Thanks.


* Re: [v5 Patch 1/3] netpoll: add generic support for bridge and bonding devices
From: Cong Wang @ 2010-05-31  5:37 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Flavio Leitner, linux-kernel, Matt Mackall, netdev, bridge,
	Andy Gospodarek, Neil Horman, Jeff Moyer, Stephen Hemminger,
	bonding-devel, David Miller
In-Reply-To: <4C03494B.1010802@redhat.com>

On 05/31/10 13:29, Cong Wang wrote:
> On 05/29/10 05:03, Jay Vosburgh wrote:
>> Flavio Leitner<fbl@sysclose.org> wrote:
>>
>>> On Fri, May 28, 2010 at 04:16:34PM +0800, Cong Wang wrote:
>>>> On 05/28/10 02:05, Flavio Leitner wrote:
>>>>>
>>>>> Hi guys!
>>>>>
>>>>> I finally could test this to see if an old problem reported on
>>>>> bugzilla[1] was
>>>>> fixed now, but unfortunately it is still there.
>>>>>
>>>>> The ticket is private I guess, but basically the problem happens
>>>>> when bonding
>>>>> driver tries to print something after it had taken the write_lock
>>>>> (monitor
>>>>> functions, enslave/de-enslave), so the printk() will pass through
>>>>> netpoll, then
>>>>> on bonding again which no matter what mode you use, it will try to
>>>>> read_lock()
>>>>> the lock again. The result is a deadlock and the entire system hangs.
>>>>>
>>>>
>>>> Does the attached patch fix this hang?
>>>
>>> I got another issue now:
>>>
>>> [ 89.523062] bonding: bond0: enslaving eth0 as a backup interface
>>> with a down link.
>>> [ 89.580746] bonding: bond0: enslaving eth2 as a backup interface
>>> with a down link.
>>> [ 91.198527] e1000: eth2 NIC Link is Up 100 Mbps Half Duplex, Flow
>>> Control: None
>>> [ 91.238245] bonding: bond0: link status definitely up for interface
>>> eth2.
>>>
>>> [ 91.245381] BUG: scheduling while atomic: bond0/2716/0x10000100
>>> [ 91.251565] 5 locks held by bond0/2716:
>>> [ 91.255663] #0: ((bond_dev->name)){+.+.+.}, at: [<ffffffff81045fb4>]
>>> worker_thread+0x19a/0x2e2
>>> [ 91.265179] #1: ((&(&bond->mii_work)->work)){+.+.+.}, at:
>>> [<ffffffff81045fb4>] worker_thread+0x19a/0x2e2
>>> [ 91.275554] #2: (rtnl_mutex){+.+.+.}, at: [<ffffffff812daf38>]
>>> rtnl_lock+0x12/0x14
>>> [ 91.284018] #3: (&bond->lock){++.+.+}, at: [<ffffffffa029e06a>]
>>> bond_mii_monitor+0x2a2/0x4ed [bonding]
>>> [ 91.294230] #4: (&bond->curr_slave_lock){+...+.}, at:
>>> [<ffffffffa029e239>] bond_mii_monitor+0x471/0x4ed [bonding]
>>> [ 91.305387] Modules linked in: bonding sunrpc ip6t_REJECT xt_tcpudp
>>> nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables
>>> x_tables ipv6 dm_mirror dm_region_hash dm_log dm_multipath uinput
>>> snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_seq
>>> snd_seq_device snd_pcm ppdev parport_pc parport rtc_cmos snd_timer
>>> tg3 snd ide_cd_mod i5000_edac i2c_i801 libphy rtc_core rtc_lib
>>> edac_core pcspkr e1000 dcdbas uhci_hcd tulip shpchp i2c_core cdrom
>>> serio_raw soundcore sg snd_page_alloc raid0 sd_mod button [last
>>> unloaded: mperf]
>>> [ 91.357735] Pid: 2716, comm: bond0 Not tainted
>>> 2.6.34-04700-gd938a70-dirty #36
>>> [ 91.371112] Call Trace:
>>> [ 91.373825] [<ffffffff81056002>] ? __debug_show_held_locks+0x22/0x24
>>> [ 91.380530] [<ffffffff8102e4a2>] __schedule_bug+0x6d/0x72
>>> [ 91.386284] [<ffffffff81363f6e>] schedule+0xc9/0x791
>>> [ 91.391600] [<ffffffff81032540>] __cond_resched+0x25/0x30
>>> [ 91.397350] [<ffffffff81364757>] _cond_resched+0x27/0x32
>>> [ 91.403013] [<ffffffff810ab243>] kmem_cache_alloc+0x2b/0xac
>>> [ 91.408936] [<ffffffff812c61fd>] skb_clone+0x42/0x5d
>>> [ 91.414253] [<ffffffff812ec696>] netlink_broadcast+0x192/0x369
>>> [ 91.420436] [<ffffffff812ecdc3>] nlmsg_notify+0x43/0x89
>>> [ 91.426012] [<ffffffff812dabc7>] rtnl_notify+0x2b/0x2d
>>> [ 91.431501] [<ffffffff812dacbc>] rtmsg_ifinfo+0xf3/0x118
>>> [ 91.437165] [<ffffffff812dad0c>] rtnetlink_event+0x2b/0x2f
>>> [ 91.443003] [<ffffffff81369fe4>] notifier_call_chain+0x32/0x5e
>>> [ 91.449188] [<ffffffff8104d618>] raw_notifier_call_chain+0xf/0x11
>>> [ 91.455634] [<ffffffff812cfc73>] call_netdevice_notifiers+0x45/0x4a
>>> [ 91.462253] [<ffffffff812d04f7>] netdev_bonding_change+0x12/0x14
>>
>> This warning is because the notifier call is happening with spin
>> locks held.
>>
>>> [ 91.468614] [<ffffffffa029d589>] bond_select_active_slave+0xe8/0x123
>>> [bonding]
>>> [ 91.476408] [<ffffffffa029e241>] bond_mii_monitor+0x479/0x4ed [bonding]
>>> [ 91.483375] [<ffffffff81046009>] worker_thread+0x1ef/0x2e2
>>> [ 91.489212] [<ffffffff81045fb4>] ? worker_thread+0x19a/0x2e2
>>> [ 91.495227] [<ffffffffa029ddc8>] ? bond_mii_monitor+0x0/0x4ed [bonding]
>>> [ 91.502192] [<ffffffff81049c71>] ? autoremove_wake_function+0x0/0x34
>>> [ 91.508897] [<ffffffff81045e1a>] ? worker_thread+0x0/0x2e2
>>> [ 91.514734] [<ffffffff810498bb>] kthread+0x7a/0x82
>>> [ 91.519878] [<ffffffff81003714>] kernel_thread_helper+0x4/0x10
>>> [ 91.526060] [<ffffffff81366ffc>] ? restore_args+0x0/0x30
>>> [ 91.531723] [<ffffffff81049841>] ? kthread+0x0/0x82
>>> [ 91.536953] [<ffffffff81003710>] ? kernel_thread_helper+0x0/0x10
>>> [ 91.543343] bonding: bond0: making interface eth2 the new active one.
>>> [ 91.550554] bonding: bond0: first active interface up!
>>> [ 91.556859] ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
>>>
>>>
>>> No other patch applied. Just started netconsole over bonding, so no need
>>> to pull the cable from slaves. Reproduced twice, one I got the
>>> backtrace above, and on the other one the system hangs completely
>>> after the BUG: scheduling message.
>>>
>>> fbl
>>>
>>>
>>>>
>>>> Thanks!
>>>>
>>>> ----------------------->
>>>>
>>>> We should notify netconsole that bond is changing its slaves
>>>> when we use active-backup mode.
>>>>
>>>> Signed-off-by: WANG Cong<amwang@redhat.com>
>>>>
>>>> ----
>>>>
>>>
>>>> diff --git a/drivers/net/bonding/bond_main.c
>>>> b/drivers/net/bonding/bond_main.c
>>>> index 5e12462..9494c02 100644
>>>> --- a/drivers/net/bonding/bond_main.c
>>>> +++ b/drivers/net/bonding/bond_main.c
>>>> @@ -1199,6 +1199,7 @@ void bond_select_active_slave(struct bonding
>>>> *bond)
>>>>
>>>> best_slave = bond_find_best_slave(bond);
>>>> if (best_slave != bond->curr_active_slave) {
>>>> + netdev_bonding_change(bond->dev, NETDEV_BONDING_DESLAVE);
>>>> bond_change_active_slave(bond, best_slave);
>>>> rv = bond_set_carrier(bond);
>>>> if (!rv)
>>
>> You can't do this here; the driver is holding various spin
>> locks, and notifier calls can sleep (hence the warning). If you look at
>> the bond_change_active_slave function, it drops all locks other than
>> RTNL before making a notifier call, e.g.,
>>
>> void bond_change_active_slave(struct bonding *bond, struct slave
>> *new_active)
>> {
>> [...]
>> if (bond->params.mode == BOND_MODE_ACTIVEBACKUP) {
>> [...]
>> write_unlock_bh(&bond->curr_slave_lock);
>> read_unlock(&bond->lock);
>>
>> netdev_bonding_change(bond->dev, NETDEV_BONDING_FAILOVER);
>>
>> read_lock(&bond->lock);
>> write_lock_bh(&bond->curr_slave_lock);
>> }
>>
>>
>> You may be able to add your notifier to this case, or change
>> your handler to notice the _FAILOVER notifier.
>
>
> Thanks for your analysis! Hmm, I think letting netconsole handle
> NETDEV_BONDING_FAILOVER here is a better solution.
>

No, bond_change_active_slave() sends the notification after
printing messages, so that will not solve the problem here;
we need to notify netconsole before printing any messages.
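
The ordering problem can be sketched in user space as follows. This is
only a model under stated assumptions: write_locked stands in for
bond->lock held for writing, target_enabled for netconsole's
nt->enabled, and log_msg() for the printk() -> netpoll -> bonding path
that would try to read_lock() the same lock. All function names are
hypothetical.

```c
#include <stdbool.h>

static bool write_locked;		/* models bond->lock held for writing */
static bool target_enabled = true;	/* models netconsole's nt->enabled */
static int would_deadlock;		/* times a message re-entered bonding */

/* Models printk() being routed through netconsole over the bond. */
static void log_msg(const char *msg)
{
	(void)msg;
	if (target_enabled && write_locked)
		would_deadlock++;	/* read_lock() under write_lock() */
}

/* Models the notifier handler that disables the netconsole target. */
static void notify_failover(void)
{
	target_enabled = false;
}

/* Notification comes too late: the message is printed first. */
static void failover_print_then_notify(void)
{
	write_locked = true;
	log_msg("making interface the new active one");
	write_locked = false;
	notify_failover();
}

/* Notification first: the target is already disabled when we print. */
static void failover_notify_then_print(void)
{
	notify_failover();
	write_locked = true;
	log_msg("making interface the new active one");
	write_locked = false;
}
```

Only the second ordering keeps would_deadlock at zero, which is the
point made above about notifying netconsole before any message is
printed.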

Thanks.


* Re: [v5 Patch 1/3] netpoll: add generic support for bridge and bonding devices
From: Cong Wang @ 2010-05-31  5:56 UTC (permalink / raw)
  To: Flavio Leitner
  Cc: linux-kernel, Matt Mackall, netdev, bridge, Andy Gospodarek,
	Neil Horman, Jeff Moyer, Stephen Hemminger, bonding-devel,
	Jay Vosburgh, David Miller
In-Reply-To: <20100528194041.GC2345@sysclose.org>

[-- Attachment #1: Type: text/plain, Size: 111 bytes --]

Hi, Flavio,

Please use the attached patch instead and see if it solves
all your problems.

Thanks a lot!


[-- Attachment #2: drivers-net-bonding-fix-activebackup-deadlock.diff --]
[-- Type: text/x-patch, Size: 837 bytes --]

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index ca142c4..2d1d594 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -666,7 +666,8 @@ static int netconsole_netdev_event(struct notifier_block *this,
 	struct net_device *dev = ptr;
 
 	if (!(event == NETDEV_CHANGENAME || event == NETDEV_UNREGISTER ||
-	      event == NETDEV_BONDING_DESLAVE || event == NETDEV_GOING_DOWN))
+	      event == NETDEV_BONDING_DESLAVE || event == NETDEV_GOING_DOWN ||
+	      event == NETDEV_BONDING_FAILOVER))
 		goto done;
 
 	spin_lock_irqsave(&target_list_lock, flags);
@@ -682,6 +683,7 @@ static int netconsole_netdev_event(struct notifier_block *this,
 				/* Fall through */
 			case NETDEV_GOING_DOWN:
 			case NETDEV_BONDING_DESLAVE:
+			case NETDEV_BONDING_FAILOVER:
 				nt->enabled = 0;
 				break;
 			}


* RE: [PATCH v3] can: Add driver for esd CAN-USB/2 device
From: Viral Mehta @ 2010-05-31  6:35 UTC (permalink / raw)
  To: Matthias Fuchs
  Cc: netdev@vger.kernel.org, Socketcan-core@lists.berlios.de,
	linux-usb@vger.kernel.org
In-Reply-To: <201005281725.32049.matthias.fuchs@esd.eu>

Hi,

> -----Original Message-----
> From: Matthias Fuchs [mailto:matthias.fuchs@esd.eu]
> Sent: Friday, May 28, 2010 8:56 PM
> To: Viral Mehta
> Cc: netdev@vger.kernel.org; Socketcan-core@lists.berlios.de; linux-
> usb@vger.kernel.org
> Subject: Re: [PATCH v3] can: Add driver for esd CAN-USB/2 device
>
> Hi Viral,
>
> thanks for review. Please see some comments below.
>
> On Wednesday 26 May 2010 13:31, Viral Mehta wrote:
> > Hi,
> > >________________________________________
> > >From: linux-usb-owner@vger.kernel.org [linux-usb-
> owner@vger.kernel.org] On Behalf Of Matthias Fuchs
> [matthias.fuchs@esd.eu]
> > >Sent: Wednesday, May 26, 2010 2:44 PM
> > >To: netdev@vger.kernel.org
> > >Cc: Socketcan-core@lists.berlios.de; linux-usb@vger.kernel.org
> > >Subject: [PATCH v3] can: Add driver for esd CAN-USB/2 device
> > >+
> > >+       BUG_ON(!context);
> >
> > It is preferred to used WARN_ON and avoid using BUG_ON and thus dont
> kill the whole system....
> Really? Even when next line will reference a NULL pointer in this case?
> Ok. Will be changed.

How about:

if (!context)
        return;

Look at the other drivers for examples.
Personally, I don't mind either way since I won't be using your driver, but I am sure other people won't like BUG_ON either.
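
As a self-contained illustration of the early-return pattern suggested
above, here is a small sketch. struct tx_context and tx_complete() are
hypothetical and only loosely modelled on the quoted code; they are not
taken from the driver.

```c
#include <stddef.h>

/* Hypothetical per-URB context, loosely modelled on the quoted patch. */
struct tx_context {
	int echo_index;
};

static int completions_handled;

/* Completion handler that tolerates a missing context instead of BUG_ON(). */
static void tx_complete(struct tx_context *context)
{
	if (!context)
		return;		/* nothing to clean up; avoid a NULL dereference */

	completions_handled++;
	context->echo_index = -1;	/* mark the slot as free */
}
```

A NULL context is then ignored rather than taking the whole system
down.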

>
> > [...]
> > >+
> > >+       priv = context->priv;
> > >+       netdev = priv->netdev;
> > >+       dev = priv->usb2;
> > >+       err = usb_submit_urb(urb, GFP_ATOMIC);
> > >+       if (err) {
> > >+               can_free_echo_skb(netdev, context->echo_index);
> > >+
> > >+               atomic_dec(&priv->active_tx_jobs);
> > >+               usb_unanchor_urb(urb);
> > >+
> > >+               stats->tx_dropped++;
> > >+
> > >+               if (err == -ENODEV)
> > >+                       netif_device_detach(netdev);
> > >+               else
> > >+                       dev_warn(netdev->dev.parent, "failed tx_urb
> %d\n", err);
> > >+
> > >+               goto releasebuf;
> >
> > You probably want to set "ret" here or do you really want to return
> NETDEV_TX_OK
> As far as I can see netword device drivers xmit_start() return
> NETDEV_TX_OK or _BUSY.
> There are no real alternatives.
Yes, this is okay.
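
A rough sketch of the return-value convention discussed here, assuming a
simplified transmit path: TX_OK and TX_BUSY are local stand-ins for
NETDEV_TX_OK and NETDEV_TX_BUSY, and start_xmit(), free_slots and
submit_err are hypothetical names, not the driver's.

```c
/* Local stand-ins for NETDEV_TX_OK / NETDEV_TX_BUSY. */
enum tx_result { TX_OK, TX_BUSY };

struct tx_stats {
	unsigned long tx_submitted;
	unsigned long tx_dropped;
};

/*
 * free_slots models the number of free tx contexts; submit_err models
 * the return value of the URB submission (0 on success).
 */
static enum tx_result start_xmit(struct tx_stats *stats, int free_slots,
				 int submit_err)
{
	if (free_slots == 0)
		return TX_BUSY;		/* caller keeps the packet and retries */

	if (submit_err) {
		stats->tx_dropped++;	/* packet is consumed and dropped */
		return TX_OK;		/* still "OK": nothing to retry */
	}

	stats->tx_submitted++;
	return TX_OK;
}
```

The failed submission is accounted for in tx_dropped rather than in the
return code, which matches the reasoning quoted above.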

>But please correct me when I am wrong.
>
> Your other comments will be fixed by version 4 of my patch.
>
> Matthias
>


* [PATCH net-next]atl1c: Add AR8151 v2 support and change L0s/L1 routine
From: jie.yang @ 2010-05-31  6:42 UTC (permalink / raw)
  To: davem; +Cc: Luis.Rodriguez, netdev, linux-kernel, Jie.Yang

From: Jie.Yang@atheros.com

Add AR8151 v2.0 Gigabit 1000 support.
Change the jumbo frame size to 6K.
Update the L0s/L1 routine:
	when the link speed is 100M or 1G, set the L1 link timer to 4 for
	l1d_2 and l2c_b2, to 7 for l2c_b, and to 0xF for all others.
Update the atl1c_suspend routine:
	refactor the function and add the atl1c_phy_power_saving routine.
	When Wake-on-LAN is enabled, this function is called to save power;
	it re-autonegotiates the PHY to 10/100M speed depending on the link
	partner's link capability.
Update atl1c_configure_des_ring:
	do not use the l2c_b default SRAM configuration.

Signed-off-by: Jie.Yang@atheros.com
---
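
Note: the L1 link timer rule from the changelog can be sketched as
below. This is an illustration only, not code from the patch: the enum
and function names are hypothetical, and the assumption that the timer
lands in the L1 entry timer field (mask 0xF, shift 16, as defined in
atl1c_hw.h) is mine.

```c
#include <stdint.h>

/* Hypothetical mirror of the driver's NIC types. */
enum nic_type { NIC_L1C, NIC_L2C, NIC_L2C_B, NIC_L2C_B2, NIC_L1D, NIC_L1D_2 };

#define L1_ENTRY_TIMER_MASK	0xFu
#define L1_ENTRY_TIMER_SHIFT	16

/* Timer value per chip when the link speed is 100M or 1G. */
static uint32_t l1_link_timer(enum nic_type type)
{
	switch (type) {
	case NIC_L1D_2:
	case NIC_L2C_B2:
		return 4;
	case NIC_L2C_B:
		return 7;
	default:
		return 0xF;
	}
}

/* Replace the timer field in a PM control word, leaving other bits alone. */
static uint32_t set_l1_timer(uint32_t pm_ctrl, enum nic_type type)
{
	pm_ctrl &= ~(L1_ENTRY_TIMER_MASK << L1_ENTRY_TIMER_SHIFT);
	pm_ctrl |= (l1_link_timer(type) & L1_ENTRY_TIMER_MASK) <<
		   L1_ENTRY_TIMER_SHIFT;
	return pm_ctrl;
}
```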

 drivers/net/atl1c/atl1c.h      |    9 +-
 drivers/net/atl1c/atl1c_hw.c   |  107 +++++++++++--
 drivers/net/atl1c/atl1c_hw.h   |   49 ++++++-
 drivers/net/atl1c/atl1c_main.c |  348 +++++++++++++++++++++++-----------------
 4 files changed, 345 insertions(+), 168 deletions(-)
diff --git a/drivers/net/atl1c/atl1c.h b/drivers/net/atl1c/atl1c.h
index 84ae905..52abbbd 100644
--- a/drivers/net/atl1c/atl1c.h
+++ b/drivers/net/atl1c/atl1c.h
@@ -73,7 +73,8 @@
 #define FULL_DUPLEX        2
 
 #define AT_RX_BUF_SIZE		(ETH_FRAME_LEN + VLAN_HLEN + ETH_FCS_LEN)
-#define MAX_JUMBO_FRAME_SIZE 	(9*1024)
+#define MAX_JUMBO_FRAME_SIZE	(6*1024)
+#define MAX_TSO_FRAME_SIZE      (7*1024)
 #define MAX_TX_OFFLOAD_THRESH	(9*1024)
 
 #define AT_MAX_RECEIVE_QUEUE    4
@@ -87,10 +88,11 @@
 #define AT_MAX_INT_WORK		5
 #define AT_TWSI_EEPROM_TIMEOUT 	100
 #define AT_HW_MAX_IDLE_DELAY 	10
-#define AT_SUSPEND_LINK_TIMEOUT 28
+#define AT_SUSPEND_LINK_TIMEOUT 100
 
 #define AT_ASPM_L0S_TIMER	6
 #define AT_ASPM_L1_TIMER	12
+#define AT_LCKDET_TIMER		12
 
 #define ATL1C_PCIE_L0S_L1_DISABLE 	0x01
 #define ATL1C_PCIE_PHY_RESET		0x02
@@ -316,6 +318,7 @@ enum atl1c_nic_type {
 	athr_l2c_b,
 	athr_l2c_b2,
 	athr_l1d,
+	athr_l1d_2,
 };
 
 enum atl1c_trans_queue {
@@ -392,6 +395,8 @@ struct atl1c_hw {
 	u16 subsystem_id;
 	u16 subsystem_vendor_id;
 	u8 revision_id;
+	u16 phy_id1;
+	u16 phy_id2;
 
 	u32 intr_mask;
 	u8 dmaw_dly_cnt;
diff --git a/drivers/net/atl1c/atl1c_hw.c b/drivers/net/atl1c/atl1c_hw.c
index f1389d6..d8501f0 100644
--- a/drivers/net/atl1c/atl1c_hw.c
+++ b/drivers/net/atl1c/atl1c_hw.c
@@ -37,6 +37,9 @@ int atl1c_check_eeprom_exist(struct atl1c_hw *hw)
 	if (data & TWSI_DEBUG_DEV_EXIST)
 		return 1;
 
+	AT_READ_REG(hw, REG_MASTER_CTRL, &data);
+	if (data & MASTER_CTRL_OTP_SEL)
+		return 1;
 	return 0;
 }
 
@@ -69,6 +72,8 @@ static int atl1c_get_permanent_address(struct atl1c_hw *hw)
 	u32 i;
 	u32 otp_ctrl_data;
 	u32 twsi_ctrl_data;
+	u32 ltssm_ctrl_data;
+	u32 wol_data;
 	u8  eth_addr[ETH_ALEN];
 	u16 phy_data;
 	bool raise_vol = false;
@@ -104,6 +109,15 @@ static int atl1c_get_permanent_address(struct atl1c_hw *hw)
 			udelay(20);
 			raise_vol = true;
 		}
+		/* close open bit of ReadOnly*/
+		AT_READ_REG(hw, REG_LTSSM_ID_CTRL, &ltssm_ctrl_data);
+		ltssm_ctrl_data &= ~LTSSM_ID_EN_WRO;
+		AT_WRITE_REG(hw, REG_LTSSM_ID_CTRL, ltssm_ctrl_data);
+
+		/* clear any WOL settings */
+		AT_WRITE_REG(hw, REG_WOL_CTRL, 0);
+		AT_READ_REG(hw, REG_WOL_CTRL, &wol_data);
+
 
 		AT_READ_REG(hw, REG_TWSI_CTRL, &twsi_ctrl_data);
 		twsi_ctrl_data |= TWSI_CTRL_SW_LDSTART;
@@ -119,17 +133,15 @@ static int atl1c_get_permanent_address(struct atl1c_hw *hw)
 	}
 	/* Disable OTP_CLK */
 	if ((hw->nic_type == athr_l1c || hw->nic_type == athr_l2c)) {
-		if (otp_ctrl_data & OTP_CTRL_CLK_EN) {
-			otp_ctrl_data &= ~OTP_CTRL_CLK_EN;
-			AT_WRITE_REG(hw, REG_OTP_CTRL, otp_ctrl_data);
-			AT_WRITE_FLUSH(hw);
-			msleep(1);
-		}
+		otp_ctrl_data &= ~OTP_CTRL_CLK_EN;
+		AT_WRITE_REG(hw, REG_OTP_CTRL, otp_ctrl_data);
+		msleep(1);
 	}
 	if (raise_vol) {
 		if (hw->nic_type == athr_l2c_b ||
 		    hw->nic_type == athr_l2c_b2 ||
-		    hw->nic_type == athr_l1d) {
+		    hw->nic_type == athr_l1d ||
+		    hw->nic_type == athr_l1d_2) {
 			atl1c_write_phy_reg(hw, MII_DBG_ADDR, 0x00);
 			if (atl1c_read_phy_reg(hw, MII_DBG_DATA, &phy_data))
 				goto out;
@@ -456,14 +468,22 @@ int atl1c_phy_reset(struct atl1c_hw *hw)
 
 	if (hw->nic_type == athr_l2c_b ||
 	    hw->nic_type == athr_l2c_b2 ||
-	    hw->nic_type == athr_l1d) {
+	    hw->nic_type == athr_l1d ||
+	    hw->nic_type == athr_l1d_2) {
 		atl1c_write_phy_reg(hw, MII_DBG_ADDR, 0x3B);
 		atl1c_read_phy_reg(hw, MII_DBG_DATA, &phy_data);
 		atl1c_write_phy_reg(hw, MII_DBG_DATA, phy_data & 0xFFF7);
 		msleep(20);
 	}
-
-	/*Enable PHY LinkChange Interrupt */
+	if (hw->nic_type == athr_l1d) {
+		atl1c_write_phy_reg(hw, MII_DBG_ADDR, 0x29);
+		atl1c_write_phy_reg(hw, MII_DBG_DATA, 0x929D);
+	}
+	if (hw->nic_type == athr_l1c || hw->nic_type == athr_l2c_b2
+		|| hw->nic_type == athr_l2c || hw->nic_type == athr_l2c) {
+		atl1c_write_phy_reg(hw, MII_DBG_ADDR, 0x29);
+		atl1c_write_phy_reg(hw, MII_DBG_DATA, 0xB6DD);
+	}
 	err = atl1c_write_phy_reg(hw, MII_IER, mii_ier_data);
 	if (err) {
 		if (netif_msg_hw(adapter))
@@ -482,12 +502,10 @@ int atl1c_phy_init(struct atl1c_hw *hw)
 	struct pci_dev *pdev = adapter->pdev;
 	int ret_val;
 	u16 mii_bmcr_data = BMCR_RESET;
-	u16 phy_id1, phy_id2;
 
-	if ((atl1c_read_phy_reg(hw, MII_PHYSID1, &phy_id1) != 0) ||
-		(atl1c_read_phy_reg(hw, MII_PHYSID2, &phy_id2) != 0)) {
-			if (netif_msg_link(adapter))
-				dev_err(&pdev->dev, "Error get phy ID\n");
+	if ((atl1c_read_phy_reg(hw, MII_PHYSID1, &hw->phy_id1) != 0) ||
+		(atl1c_read_phy_reg(hw, MII_PHYSID2, &hw->phy_id2) != 0)) {
+		dev_err(&pdev->dev, "Error get phy ID\n");
 		return -1;
 	}
 	switch (hw->media_type) {
@@ -572,6 +590,65 @@ int atl1c_get_speed_and_duplex(struct atl1c_hw *hw, u16 *speed, u16 *duplex)
 	return 0;
 }
 
+int atl1c_phy_power_saving(struct atl1c_hw *hw)
+{
+	struct atl1c_adapter *adapter = (struct atl1c_adapter *)hw->adapter;
+	struct pci_dev *pdev = adapter->pdev;
+	int ret = 0;
+	u16 autoneg_advertised = ADVERTISED_10baseT_Half;
+	u16 save_autoneg_advertised;
+	u16 phy_data;
+	u16 mii_lpa_data;
+	u16 speed = SPEED_0;
+	u16 duplex = FULL_DUPLEX;
+	int i;
+
+	atl1c_read_phy_reg(hw, MII_BMSR, &phy_data);
+	atl1c_read_phy_reg(hw, MII_BMSR, &phy_data);
+	if (phy_data & BMSR_LSTATUS) {
+		atl1c_read_phy_reg(hw, MII_LPA, &mii_lpa_data);
+		if (mii_lpa_data & LPA_10FULL)
+			autoneg_advertised = ADVERTISED_10baseT_Full;
+		else if (mii_lpa_data & LPA_10HALF)
+			autoneg_advertised = ADVERTISED_10baseT_Half;
+		else if (mii_lpa_data & LPA_100HALF)
+			autoneg_advertised = ADVERTISED_100baseT_Half;
+		else if (mii_lpa_data & LPA_100FULL)
+			autoneg_advertised = ADVERTISED_100baseT_Full;
+
+		save_autoneg_advertised = hw->autoneg_advertised;
+		hw->phy_configured = false;
+		hw->autoneg_advertised = autoneg_advertised;
+		if (atl1c_restart_autoneg(hw) != 0) {
+			dev_dbg(&pdev->dev, "phy autoneg failed\n");
+			ret = -1;
+		}
+		hw->autoneg_advertised = save_autoneg_advertised;
+
+		if (mii_lpa_data) {
+			for (i = 0; i < AT_SUSPEND_LINK_TIMEOUT; i++) {
+				mdelay(100);
+				atl1c_read_phy_reg(hw, MII_BMSR, &phy_data);
+				atl1c_read_phy_reg(hw, MII_BMSR, &phy_data);
+				if (phy_data & BMSR_LSTATUS) {
+					if (atl1c_get_speed_and_duplex(hw, &speed,
+									&duplex) != 0)
+						dev_dbg(&pdev->dev,
+							"get speed and duplex failed\n");
+					break;
+				}
+			}
+		}
+	} else {
+		speed = SPEED_10;
+		duplex = HALF_DUPLEX;
+	}
+	adapter->link_speed = speed;
+	adapter->link_duplex = duplex;
+
+	return ret;
+}
+
 int atl1c_restart_autoneg(struct atl1c_hw *hw)
 {
 	int err = 0;
diff --git a/drivers/net/atl1c/atl1c_hw.h b/drivers/net/atl1c/atl1c_hw.h
index 1eeb3ed..3dd6759 100644
--- a/drivers/net/atl1c/atl1c_hw.h
+++ b/drivers/net/atl1c/atl1c_hw.h
@@ -42,7 +42,7 @@ bool atl1c_read_eeprom(struct atl1c_hw *hw, u32 offset, u32 *p_value);
 int atl1c_phy_init(struct atl1c_hw *hw);
 int atl1c_check_eeprom_exist(struct atl1c_hw *hw);
 int atl1c_restart_autoneg(struct atl1c_hw *hw);
-
+int atl1c_phy_power_saving(struct atl1c_hw *hw);
 /* register definition */
 #define REG_DEVICE_CAP              	0x5C
 #define DEVICE_CAP_MAX_PAYLOAD_MASK     0x7
@@ -120,6 +120,12 @@ int atl1c_restart_autoneg(struct atl1c_hw *hw);
 #define REG_PCIE_PHYMISC	    	0x1000
 #define PCIE_PHYMISC_FORCE_RCV_DET	0x4
 
+#define REG_PCIE_PHYMISC2		0x1004
+#define PCIE_PHYMISC2_SERDES_CDR_MASK	0x3
+#define PCIE_PHYMISC2_SERDES_CDR_SHIFT	16
+#define PCIE_PHYMISC2_SERDES_TH_MASK	0x3
+#define PCIE_PHYMISC2_SERDES_TH_SHIFT	18
+
 #define REG_TWSI_DEBUG			0x1108
 #define TWSI_DEBUG_DEV_EXIST		0x20000000
 
@@ -150,24 +156,28 @@ int atl1c_restart_autoneg(struct atl1c_hw *hw);
 #define PM_CTRL_ASPM_L0S_EN		0x00001000
 #define PM_CTRL_CLK_SWH_L1		0x00002000
 #define PM_CTRL_CLK_PWM_VER1_1		0x00004000
-#define PM_CTRL_PCIE_RECV		0x00008000
+#define PM_CTRL_RCVR_WT_TIMER		0x00008000
 #define PM_CTRL_L1_ENTRY_TIMER_MASK	0xF
 #define PM_CTRL_L1_ENTRY_TIMER_SHIFT	16
 #define PM_CTRL_PM_REQ_TIMER_MASK	0xF
 #define PM_CTRL_PM_REQ_TIMER_SHIFT	20
-#define PM_CTRL_LCKDET_TIMER_MASK	0x3F
+#define PM_CTRL_LCKDET_TIMER_MASK	0xF
 #define PM_CTRL_LCKDET_TIMER_SHIFT	24
 #define PM_CTRL_EN_BUFS_RX_L0S		0x10000000
 #define PM_CTRL_SA_DLY_EN		0x20000000
 #define PM_CTRL_MAC_ASPM_CHK		0x40000000
 #define PM_CTRL_HOTRST			0x80000000
 
+#define REG_LTSSM_ID_CTRL		0x12FC
+#define LTSSM_ID_EN_WRO			0x1000
 /* Selene Master Control Register */
 #define REG_MASTER_CTRL			0x1400
 #define MASTER_CTRL_SOFT_RST            0x1
 #define MASTER_CTRL_TEST_MODE_MASK	0x3
 #define MASTER_CTRL_TEST_MODE_SHIFT	2
 #define MASTER_CTRL_BERT_START		0x10
+#define MASTER_CTRL_OOB_DIS_OFF		0x40
+#define MASTER_CTRL_SA_TIMER_EN		0x80
 #define MASTER_CTRL_MTIMER_EN           0x100
 #define MASTER_CTRL_MANUAL_INT          0x200
 #define MASTER_CTRL_TX_ITIMER_EN	0x400
@@ -220,6 +230,12 @@ int atl1c_restart_autoneg(struct atl1c_hw *hw);
 		GPHY_CTRL_PWDOWN_HW	|\
 		GPHY_CTRL_PHY_IDDQ)
 
+#define GPHY_CTRL_POWER_SAVING (	\
+		GPHY_CTRL_SEL_ANA_RST	|\
+		GPHY_CTRL_HIB_EN	|\
+		GPHY_CTRL_HIB_PULSE	|\
+		GPHY_CTRL_PWDOWN_HW	|\
+		GPHY_CTRL_PHY_IDDQ)
 /* Block IDLE Status Register */
 #define REG_IDLE_STATUS  		0x1410
 #define IDLE_STATUS_MASK		0x00FF
@@ -287,6 +303,14 @@ int atl1c_restart_autoneg(struct atl1c_hw *hw);
 #define SERDES_LOCK_DETECT          	0x1  /* SerDes lock detected. This signal
 					      * comes from Analog SerDes */
 #define SERDES_LOCK_DETECT_EN       	0x2  /* 1: Enable SerDes Lock detect function */
+#define SERDES_LOCK_STS_SELFB_PLL_SHIFT 0xE
+#define SERDES_LOCK_STS_SELFB_PLL_MASK  0x3
+#define SERDES_OVCLK_18_25		0x0
+#define SERDES_OVCLK_12_18		0x1
+#define SERDES_OVCLK_0_4		0x2
+#define SERDES_OVCLK_4_12		0x3
+#define SERDES_MAC_CLK_SLOWDOWN		0x20000
+#define SERDES_PYH_CLK_SLOWDOWN		0x40000
 
 /* MAC Control Register  */
 #define REG_MAC_CTRL         		0x1480
@@ -693,6 +717,21 @@ int atl1c_restart_autoneg(struct atl1c_hw *hw);
 #define REG_MAC_TX_STATUS_BIN 		0x1760
 #define REG_MAC_TX_STATUS_END 		0x17c0
 
+#define REG_CLK_GATING_CTRL		0x1814
+#define CLK_GATING_DMAW_EN		0x0001
+#define CLK_GATING_DMAR_EN		0x0002
+#define CLK_GATING_TXQ_EN		0x0004
+#define CLK_GATING_RXQ_EN		0x0008
+#define CLK_GATING_TXMAC_EN		0x0010
+#define CLK_GATING_RXMAC_EN		0x0020
+
+#define CLK_GATING_EN_ALL	(CLK_GATING_DMAW_EN |\
+				 CLK_GATING_DMAR_EN |\
+				 CLK_GATING_TXQ_EN  |\
+				 CLK_GATING_RXQ_EN  |\
+				 CLK_GATING_TXMAC_EN|\
+				 CLK_GATING_RXMAC_EN)
+
 /* DEBUG ADDR */
 #define REG_DEBUG_DATA0 		0x1900
 #define REG_DEBUG_DATA1 		0x1904
@@ -734,6 +773,10 @@ int atl1c_restart_autoneg(struct atl1c_hw *hw);
 
 #define MII_PHYSID1			0x02
 #define MII_PHYSID2			0x03
+#define L1D_MPW_PHYID1			0xD01C  /* V7 */
+#define L1D_MPW_PHYID2			0xD01D  /* V1-V6 */
+#define L1D_MPW_PHYID3			0xD01E  /* V8 */
+
 
 /* Autoneg Advertisement Register */
 #define MII_ADVERTISE			0x04
diff --git a/drivers/net/atl1c/atl1c_main.c b/drivers/net/atl1c/atl1c_main.c
index 1c3c046..c7b8ef5 100644
--- a/drivers/net/atl1c/atl1c_main.c
+++ b/drivers/net/atl1c/atl1c_main.c
@@ -21,7 +21,7 @@
 
 #include "atl1c.h"
 
-#define ATL1C_DRV_VERSION "1.0.0.2-NAPI"
+#define ATL1C_DRV_VERSION "1.0.1.0-NAPI"
 char atl1c_driver_name[] = "atl1c";
 char atl1c_driver_version[] = ATL1C_DRV_VERSION;
 #define PCI_DEVICE_ID_ATTANSIC_L2C      0x1062
@@ -29,7 +29,7 @@ char atl1c_driver_version[] = ATL1C_DRV_VERSION;
 #define PCI_DEVICE_ID_ATHEROS_L2C_B	0x2060 /* AR8152 v1.1 Fast 10/100 */
 #define PCI_DEVICE_ID_ATHEROS_L2C_B2	0x2062 /* AR8152 v2.0 Fast 10/100 */
 #define PCI_DEVICE_ID_ATHEROS_L1D	0x1073 /* AR8151 v1.0 Gigabit 1000 */
-
+#define PCI_DEVICE_ID_ATHEROS_L1D_2_0	0x1083 /* AR8151 v2.0 Gigabit 1000 */
 #define L2CB_V10			0xc0
 #define L2CB_V11			0xc1
 
@@ -97,7 +97,28 @@ static const u16 atl1c_rrd_addr_lo_regs[AT_MAX_RECEIVE_QUEUE] =
 
 static const u32 atl1c_default_msg = NETIF_MSG_DRV | NETIF_MSG_PROBE |
 	NETIF_MSG_LINK | NETIF_MSG_TIMER | NETIF_MSG_IFDOWN | NETIF_MSG_IFUP;
+static void atl1c_pcie_patch(struct atl1c_hw *hw)
+{
+	u32 data;
 
+	AT_READ_REG(hw, REG_PCIE_PHYMISC, &data);
+	data |= PCIE_PHYMISC_FORCE_RCV_DET;
+	AT_WRITE_REG(hw, REG_PCIE_PHYMISC, data);
+
+	if (hw->nic_type == athr_l2c_b && hw->revision_id == L2CB_V10) {
+		AT_READ_REG(hw, REG_PCIE_PHYMISC2, &data);
+
+		data &= ~(PCIE_PHYMISC2_SERDES_CDR_MASK <<
+			PCIE_PHYMISC2_SERDES_CDR_SHIFT);
+		data |= 3 << PCIE_PHYMISC2_SERDES_CDR_SHIFT;
+		data &= ~(PCIE_PHYMISC2_SERDES_TH_MASK <<
+			PCIE_PHYMISC2_SERDES_TH_SHIFT);
+		data |= 3 << PCIE_PHYMISC2_SERDES_TH_SHIFT;
+		AT_WRITE_REG(hw, REG_PCIE_PHYMISC2, data);
+	}
+}
+
+/* FIXME: no need any more ? */
 /*
  * atl1c_init_pcie - init PCIE module
  */
@@ -127,6 +148,11 @@ static void atl1c_reset_pcie(struct atl1c_hw *hw, u32 flag)
 	data &= ~PCIE_UC_SERVRITY_FCP;
 	AT_WRITE_REG(hw, REG_PCIE_UC_SEVERITY, data);
 
+	AT_READ_REG(hw, REG_LTSSM_ID_CTRL, &data);
+	data &= ~LTSSM_ID_EN_WRO;
+	AT_WRITE_REG(hw, REG_LTSSM_ID_CTRL, data);
+
+	atl1c_pcie_patch(hw);
 	if (flag & ATL1C_PCIE_L0S_L1_DISABLE)
 		atl1c_disable_l0s_l1(hw);
 	if (flag & ATL1C_PCIE_PHY_RESET)
@@ -135,7 +161,7 @@ static void atl1c_reset_pcie(struct atl1c_hw *hw, u32 flag)
 		AT_WRITE_REG(hw, REG_GPHY_CTRL,
 			GPHY_CTRL_DEFAULT | GPHY_CTRL_EXT_RESET);
 
-	msleep(1);
+	msleep(5);
 }
 
 /*
@@ -159,6 +185,7 @@ static inline void atl1c_irq_disable(struct atl1c_adapter *adapter)
 {
 	atomic_inc(&adapter->irq_sem);
 	AT_WRITE_REG(&adapter->hw, REG_IMR, 0);
+	AT_WRITE_REG(&adapter->hw, REG_ISR, ISR_DIS_INT);
 	AT_WRITE_FLUSH(&adapter->hw);
 	synchronize_irq(adapter->pdev->irq);
 }
@@ -231,15 +258,15 @@ static void atl1c_check_link_status(struct atl1c_adapter *adapter)
 
 	if ((phy_data & BMSR_LSTATUS) == 0) {
 		/* link down */
-		if (netif_carrier_ok(netdev)) {
-			hw->hibernate = true;
-			if (atl1c_stop_mac(hw) != 0)
-				if (netif_msg_hw(adapter))
-					dev_warn(&pdev->dev,
-						"stop mac failed\n");
-			atl1c_set_aspm(hw, false);
-		}
+		hw->hibernate = true;
+		if (atl1c_stop_mac(hw) != 0)
+			if (netif_msg_hw(adapter))
+				dev_warn(&pdev->dev, "stop mac failed\n");
+		atl1c_set_aspm(hw, false);
 		netif_carrier_off(netdev);
+		netif_stop_queue(netdev);
+		atl1c_phy_reset(hw);
+		atl1c_phy_init(&adapter->hw);
 	} else {
 		/* Link Up */
 		hw->hibernate = false;
@@ -308,6 +335,7 @@ static void atl1c_common_task(struct work_struct *work)
 	netdev = adapter->netdev;
 
 	if (adapter->work_event & ATL1C_WORK_EVENT_RESET) {
+		adapter->work_event &= ~ATL1C_WORK_EVENT_RESET;
 		netif_device_detach(netdev);
 		atl1c_down(adapter);
 		atl1c_up(adapter);
@@ -315,8 +343,11 @@ static void atl1c_common_task(struct work_struct *work)
 		return;
 	}
 
-	if (adapter->work_event & ATL1C_WORK_EVENT_LINK_CHANGE)
+	if (adapter->work_event & ATL1C_WORK_EVENT_LINK_CHANGE) {
+		adapter->work_event &= ~ATL1C_WORK_EVENT_LINK_CHANGE;
 		atl1c_check_link_status(adapter);
+	}
+	return;
 }
 
 
@@ -476,6 +507,13 @@ static int atl1c_change_mtu(struct net_device *netdev, int new_mtu)
 		netdev->mtu = new_mtu;
 		adapter->hw.max_frame_size = new_mtu;
 		atl1c_set_rxbufsize(adapter, netdev);
+		if (new_mtu > MAX_TSO_FRAME_SIZE) {
+			adapter->netdev->features &= ~NETIF_F_TSO;
+			adapter->netdev->features &= ~NETIF_F_TSO6;
+		} else {
+			adapter->netdev->features |= NETIF_F_TSO;
+			adapter->netdev->features |= NETIF_F_TSO6;
+		}
 		atl1c_down(adapter);
 		atl1c_up(adapter);
 		clear_bit(__AT_RESETTING, &adapter->flags);
@@ -613,6 +651,9 @@ static void atl1c_set_mac_type(struct atl1c_hw *hw)
 	case PCI_DEVICE_ID_ATHEROS_L1D:
 		hw->nic_type = athr_l1d;
 		break;
+	case PCI_DEVICE_ID_ATHEROS_L1D_2_0:
+		hw->nic_type = athr_l1d_2;
+		break;
 	default:
 		break;
 	}
@@ -627,9 +668,7 @@ static int atl1c_setup_mac_funcs(struct atl1c_hw *hw)
 	AT_READ_REG(hw, REG_PHY_STATUS, &phy_status_data);
 	AT_READ_REG(hw, REG_LINK_CTRL, &link_ctrl_data);
 
-	hw->ctrl_flags = ATL1C_INTR_CLEAR_ON_READ |
-			 ATL1C_INTR_MODRT_ENABLE  |
-			 ATL1C_RX_IPV6_CHKSUM	  |
+	hw->ctrl_flags = ATL1C_INTR_MODRT_ENABLE  |
 			 ATL1C_TXQ_MODE_ENHANCE;
 	if (link_ctrl_data & LINK_CTRL_L0S_EN)
 		hw->ctrl_flags |= ATL1C_ASPM_L0S_SUPPORT;
@@ -637,12 +676,12 @@ static int atl1c_setup_mac_funcs(struct atl1c_hw *hw)
 		hw->ctrl_flags |= ATL1C_ASPM_L1_SUPPORT;
 	if (link_ctrl_data & LINK_CTRL_EXT_SYNC)
 		hw->ctrl_flags |= ATL1C_LINK_EXT_SYNC;
+	hw->ctrl_flags |= ATL1C_ASPM_CTRL_MON;
 
 	if (hw->nic_type == athr_l1c ||
-	    hw->nic_type == athr_l1d) {
-		hw->ctrl_flags |= ATL1C_ASPM_CTRL_MON;
+	    hw->nic_type == athr_l1d ||
+	    hw->nic_type == athr_l1d_2)
 		hw->link_cap_flags |= ATL1C_LINK_CAP_1000M;
-	}
 	return 0;
 }
 /*
@@ -657,6 +696,8 @@ static int __devinit atl1c_sw_init(struct atl1c_adapter *adapter)
 {
 	struct atl1c_hw *hw   = &adapter->hw;
 	struct pci_dev	*pdev = adapter->pdev;
+	u32 revision;
+
 
 	adapter->wol = 0;
 	adapter->link_speed = SPEED_0;
@@ -669,7 +710,8 @@ static int __devinit atl1c_sw_init(struct atl1c_adapter *adapter)
 	hw->device_id = pdev->device;
 	hw->subsystem_vendor_id = pdev->subsystem_vendor;
 	hw->subsystem_id = pdev->subsystem_device;
-
+	AT_READ_REG(hw, PCI_CLASS_REVISION, &revision);
+	hw->revision_id = revision & 0xFF;
 	/* before link up, we assume hibernate is true */
 	hw->hibernate = true;
 	hw->media_type = MEDIA_TYPE_AUTO_SENSOR;
@@ -974,6 +1016,7 @@ static void atl1c_configure_des_ring(struct atl1c_adapter *adapter)
 	struct atl1c_cmb *cmb = (struct atl1c_cmb *) &adapter->cmb;
 	struct atl1c_smb *smb = (struct atl1c_smb *) &adapter->smb;
 	int i;
+	u32 data;
 
 	/* TPD */
 	AT_WRITE_REG(hw, REG_TX_BASE_ADDR_HI,
@@ -1017,6 +1060,23 @@ static void atl1c_configure_des_ring(struct atl1c_adapter *adapter)
 			(u32)((smb->dma & AT_DMA_HI_ADDR_MASK) >> 32));
 	AT_WRITE_REG(hw, REG_SMB_BASE_ADDR_LO,
 			(u32)(smb->dma & AT_DMA_LO_ADDR_MASK));
+	if (hw->nic_type == athr_l2c_b) {
+		AT_WRITE_REG(hw, REG_SRAM_RXF_LEN, 0x02a0L);
+		AT_WRITE_REG(hw, REG_SRAM_TXF_LEN, 0x0100L);
+		AT_WRITE_REG(hw, REG_SRAM_RXF_ADDR, 0x029f0000L);
+		AT_WRITE_REG(hw, REG_SRAM_RFD0_INFO, 0x02bf02a0L);
+		AT_WRITE_REG(hw, REG_SRAM_TXF_ADDR, 0x03bf02c0L);
+		AT_WRITE_REG(hw, REG_SRAM_TRD_ADDR, 0x03df03c0L);
+		AT_WRITE_REG(hw, REG_TXF_WATER_MARK, 0);	/* TX watermark, to enter l1 state.*/
+		AT_WRITE_REG(hw, REG_RXD_DMA_CTRL, 0);		/* RXD threshold.*/
+	}
+	if (hw->nic_type == athr_l2c_b || hw->nic_type == athr_l1d_2) {
+			/* Power Saving for L2c_B */
+		AT_READ_REG(hw, REG_SERDES_LOCK, &data);
+		data |= SERDES_MAC_CLK_SLOWDOWN;
+		data |= SERDES_PYH_CLK_SLOWDOWN;
+		AT_WRITE_REG(hw, REG_SERDES_LOCK, data);
+	}
 	/* Load all of base address above */
 	AT_WRITE_REG(hw, REG_LOAD_PTR, 1);
 }
@@ -1029,6 +1089,7 @@ static void atl1c_configure_tx(struct atl1c_adapter *adapter)
 	u16 tx_offload_thresh;
 	u32 txq_ctrl_data;
 	u32 extra_size = 0;     /* Jumbo frame threshold in QWORD unit */
+	u32 max_pay_load_data;
 
 	extra_size = ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN;
 	tx_offload_thresh = MAX_TX_OFFLOAD_THRESH;
@@ -1046,8 +1107,11 @@ static void atl1c_configure_tx(struct atl1c_adapter *adapter)
 			TXQ_NUM_TPD_BURST_SHIFT;
 	if (hw->ctrl_flags & ATL1C_TXQ_MODE_ENHANCE)
 		txq_ctrl_data |= TXQ_CTRL_ENH_MODE;
-	txq_ctrl_data |= (atl1c_pay_load_size[hw->dmar_block] &
+	max_pay_load_data = (atl1c_pay_load_size[hw->dmar_block] &
 			TXQ_TXF_BURST_NUM_MASK) << TXQ_TXF_BURST_NUM_SHIFT;
+	if (hw->nic_type == athr_l2c_b || hw->nic_type == athr_l2c_b2)
+		max_pay_load_data >>= 1;
+	txq_ctrl_data |= max_pay_load_data;
 
 	AT_WRITE_REG(hw, REG_TXQ_CTRL, txq_ctrl_data);
 }
@@ -1078,7 +1142,7 @@ static void atl1c_configure_rx(struct atl1c_adapter *adapter)
 	rxq_ctrl_data |= (hw->rss_hash_bits & RSS_HASH_BITS_MASK) <<
 			RSS_HASH_BITS_SHIFT;
 	if (hw->ctrl_flags & ATL1C_ASPM_CTRL_MON)
-		rxq_ctrl_data |= (ASPM_THRUPUT_LIMIT_100M &
+		rxq_ctrl_data |= (ASPM_THRUPUT_LIMIT_1M &
 			ASPM_THRUPUT_LIMIT_MASK) << ASPM_THRUPUT_LIMIT_SHIFT;
 
 	AT_WRITE_REG(hw, REG_RXQ_CTRL, rxq_ctrl_data);
@@ -1198,21 +1262,23 @@ static int atl1c_reset_mac(struct atl1c_hw *hw)
 {
 	struct atl1c_adapter *adapter = (struct atl1c_adapter *)hw->adapter;
 	struct pci_dev *pdev = adapter->pdev;
-	int ret;
+	u32 master_ctrl_data = 0;
 
 	AT_WRITE_REG(hw, REG_IMR, 0);
 	AT_WRITE_REG(hw, REG_ISR, ISR_DIS_INT);
 
-	ret = atl1c_stop_mac(hw);
-	if (ret)
-		return ret;
+	atl1c_stop_mac(hw);
 	/*
 	 * Issue Soft Reset to the MAC.  This will reset the chip's
 	 * transmit, receive, DMA.  It will not effect
 	 * the current PCI configuration.  The global reset bit is self-
 	 * clearing, and should clear within a microsecond.
 	 */
-	AT_WRITE_REGW(hw, REG_MASTER_CTRL, MASTER_CTRL_SOFT_RST);
+	AT_READ_REG(hw, REG_MASTER_CTRL, &master_ctrl_data);
+	master_ctrl_data |= MASTER_CTRL_OOB_DIS_OFF;
+	AT_WRITE_REGW(hw, REG_MASTER_CTRL, ((master_ctrl_data | MASTER_CTRL_SOFT_RST)
+			& 0xFFFF));
+
 	AT_WRITE_FLUSH(hw);
 	msleep(10);
 	/* Wait at least 10ms for All module to be Idle */
@@ -1253,42 +1319,39 @@ static void atl1c_set_aspm(struct atl1c_hw *hw, bool linkup)
 {
 	u32 pm_ctrl_data;
 	u32 link_ctrl_data;
+	u32 link_l1_timer = 0xF;
 
 	AT_READ_REG(hw, REG_PM_CTRL, &pm_ctrl_data);
 	AT_READ_REG(hw, REG_LINK_CTRL, &link_ctrl_data);
-	pm_ctrl_data &= ~PM_CTRL_SERDES_PD_EX_L1;
 
+	pm_ctrl_data &= ~PM_CTRL_SERDES_PD_EX_L1;
 	pm_ctrl_data &=  ~(PM_CTRL_L1_ENTRY_TIMER_MASK <<
 			PM_CTRL_L1_ENTRY_TIMER_SHIFT);
 	pm_ctrl_data &= ~(PM_CTRL_LCKDET_TIMER_MASK <<
-			  PM_CTRL_LCKDET_TIMER_SHIFT);
-
-	pm_ctrl_data |= PM_CTRL_MAC_ASPM_CHK;
-	pm_ctrl_data &= ~PM_CTRL_ASPM_L1_EN;
-	pm_ctrl_data |= PM_CTRL_RBER_EN;
-	pm_ctrl_data |= PM_CTRL_SDES_EN;
+			PM_CTRL_LCKDET_TIMER_SHIFT);
+	pm_ctrl_data |= AT_LCKDET_TIMER	<< PM_CTRL_LCKDET_TIMER_SHIFT;
 
-	if (hw->nic_type == athr_l2c_b ||
-	    hw->nic_type == athr_l1d ||
-	    hw->nic_type == athr_l2c_b2) {
+	if (hw->nic_type == athr_l2c_b || hw->nic_type == athr_l1d ||
+		hw->nic_type == athr_l2c_b2 || hw->nic_type == athr_l1d_2) {
 		link_ctrl_data &= ~LINK_CTRL_EXT_SYNC;
 		if (!(hw->ctrl_flags & ATL1C_APS_MODE_ENABLE)) {
-			if (hw->nic_type == athr_l2c_b &&
-			    hw->revision_id == L2CB_V10)
+			if (hw->nic_type == athr_l2c_b && hw->revision_id == L2CB_V10)
 				link_ctrl_data |= LINK_CTRL_EXT_SYNC;
 		}
 
 		AT_WRITE_REG(hw, REG_LINK_CTRL, link_ctrl_data);
 
-		pm_ctrl_data |= PM_CTRL_PCIE_RECV;
-		pm_ctrl_data |= AT_ASPM_L1_TIMER << PM_CTRL_PM_REQ_TIMER_SHIFT;
-		pm_ctrl_data &= ~PM_CTRL_EN_BUFS_RX_L0S;
+		pm_ctrl_data |= PM_CTRL_RCVR_WT_TIMER;
+		pm_ctrl_data &= ~(PM_CTRL_PM_REQ_TIMER_MASK <<
+			PM_CTRL_PM_REQ_TIMER_SHIFT);
+		pm_ctrl_data |= AT_ASPM_L1_TIMER <<
+			PM_CTRL_PM_REQ_TIMER_SHIFT;
 		pm_ctrl_data &= ~PM_CTRL_SA_DLY_EN;
 		pm_ctrl_data &= ~PM_CTRL_HOTRST;
 		pm_ctrl_data |= 1 << PM_CTRL_L1_ENTRY_TIMER_SHIFT;
 		pm_ctrl_data |= PM_CTRL_SERDES_PD_EX_L1;
 	}
-
+	pm_ctrl_data |= PM_CTRL_MAC_ASPM_CHK;
 	if (linkup) {
 		pm_ctrl_data &= ~PM_CTRL_ASPM_L1_EN;
 		pm_ctrl_data &= ~PM_CTRL_ASPM_L0S_EN;
@@ -1297,27 +1360,26 @@ static void atl1c_set_aspm(struct atl1c_hw *hw, bool linkup)
 		if (hw->ctrl_flags & ATL1C_ASPM_L0S_SUPPORT)
 			pm_ctrl_data |= PM_CTRL_ASPM_L0S_EN;
 
-		if (hw->nic_type == athr_l2c_b ||
-		    hw->nic_type == athr_l1d ||
-		    hw->nic_type == athr_l2c_b2) {
+		if (hw->nic_type == athr_l2c_b || hw->nic_type == athr_l1d ||
+			hw->nic_type == athr_l2c_b2 || hw->nic_type == athr_l1d_2) {
 			if (hw->nic_type == athr_l2c_b)
 				if (!(hw->ctrl_flags & ATL1C_APS_MODE_ENABLE))
-					pm_ctrl_data &= PM_CTRL_ASPM_L0S_EN;
+					pm_ctrl_data &= ~PM_CTRL_ASPM_L0S_EN;
 			pm_ctrl_data &= ~PM_CTRL_SERDES_L1_EN;
 			pm_ctrl_data &= ~PM_CTRL_SERDES_PLL_L1_EN;
 			pm_ctrl_data &= ~PM_CTRL_SERDES_BUDS_RX_L1_EN;
 			pm_ctrl_data |= PM_CTRL_CLK_SWH_L1;
-			if (hw->adapter->link_speed == SPEED_100 ||
-			    hw->adapter->link_speed == SPEED_1000) {
-				pm_ctrl_data &=
-					~(PM_CTRL_L1_ENTRY_TIMER_MASK <<
-					  PM_CTRL_L1_ENTRY_TIMER_SHIFT);
-				if (hw->nic_type == athr_l1d)
-					pm_ctrl_data |= 0xF <<
-						PM_CTRL_L1_ENTRY_TIMER_SHIFT;
-				else
-					pm_ctrl_data |= 7 <<
-						PM_CTRL_L1_ENTRY_TIMER_SHIFT;
+		if (hw->adapter->link_speed == SPEED_100 ||
+				hw->adapter->link_speed == SPEED_1000) {
+				pm_ctrl_data &=  ~(PM_CTRL_L1_ENTRY_TIMER_MASK <<
+					PM_CTRL_L1_ENTRY_TIMER_SHIFT);
+				if (hw->nic_type == athr_l2c_b)
+					link_l1_timer = 7;
+				else if (hw->nic_type == athr_l2c_b2 ||
+					hw->nic_type == athr_l1d_2)
+					link_l1_timer = 4;
+				pm_ctrl_data |= link_l1_timer <<
+					PM_CTRL_L1_ENTRY_TIMER_SHIFT;
 			}
 		} else {
 			pm_ctrl_data |= PM_CTRL_SERDES_L1_EN;
@@ -1326,24 +1388,12 @@ static void atl1c_set_aspm(struct atl1c_hw *hw, bool linkup)
 			pm_ctrl_data &= ~PM_CTRL_CLK_SWH_L1;
 			pm_ctrl_data &= ~PM_CTRL_ASPM_L0S_EN;
 			pm_ctrl_data &= ~PM_CTRL_ASPM_L1_EN;
-		}
-		atl1c_write_phy_reg(hw, MII_DBG_ADDR, 0x29);
-		if (hw->adapter->link_speed == SPEED_10)
-			if (hw->nic_type == athr_l1d)
-				atl1c_write_phy_reg(hw, MII_DBG_ADDR, 0xB69D);
-			else
-				atl1c_write_phy_reg(hw, MII_DBG_DATA, 0xB6DD);
-		else if (hw->adapter->link_speed == SPEED_100)
-			atl1c_write_phy_reg(hw, MII_DBG_DATA, 0xB2DD);
-		else
-			atl1c_write_phy_reg(hw, MII_DBG_DATA, 0x96DD);
 
+		}
 	} else {
-		pm_ctrl_data &= ~PM_CTRL_SERDES_BUDS_RX_L1_EN;
 		pm_ctrl_data &= ~PM_CTRL_SERDES_L1_EN;
 		pm_ctrl_data &= ~PM_CTRL_ASPM_L0S_EN;
 		pm_ctrl_data &= ~PM_CTRL_SERDES_PLL_L1_EN;
-
 		pm_ctrl_data |= PM_CTRL_CLK_SWH_L1;
 
 		if (hw->ctrl_flags & ATL1C_ASPM_L1_SUPPORT)
@@ -1351,8 +1401,9 @@ static void atl1c_set_aspm(struct atl1c_hw *hw, bool linkup)
 		else
 			pm_ctrl_data &= ~PM_CTRL_ASPM_L1_EN;
 	}
-
 	AT_WRITE_REG(hw, REG_PM_CTRL, pm_ctrl_data);
+
+	return;
 }
 
 static void atl1c_setup_mac_ctrl(struct atl1c_adapter *adapter)
@@ -1391,7 +1442,8 @@ static void atl1c_setup_mac_ctrl(struct atl1c_adapter *adapter)
 		mac_ctrl_data |= MAC_CTRL_MC_ALL_EN;
 
 	mac_ctrl_data |= MAC_CTRL_SINGLE_PAUSE_EN;
-	if (hw->nic_type == athr_l1d || hw->nic_type == athr_l2c_b2) {
+	if (hw->nic_type == athr_l1d || hw->nic_type == athr_l2c_b2 ||
+	    hw->nic_type == athr_l1d_2) {
 		mac_ctrl_data |= MAC_CTRL_SPEED_MODE_SW;
 		mac_ctrl_data |= MAC_CTRL_HASH_ALG_CRC32;
 	}
@@ -1409,6 +1461,7 @@ static int atl1c_configure(struct atl1c_adapter *adapter)
 	struct atl1c_hw *hw = &adapter->hw;
 	u32 master_ctrl_data = 0;
 	u32 intr_modrt_data;
+	u32 data;
 
 	/* clear interrupt status */
 	AT_WRITE_REG(hw, REG_ISR, 0xFFFFFFFF);
@@ -1418,6 +1471,15 @@ static int atl1c_configure(struct atl1c_adapter *adapter)
 	 * HW will enable self to assert interrupt event to system after
 	 * waiting x-time for software to notify it accept interrupt.
 	 */
+
+	data = CLK_GATING_EN_ALL;
+	if (hw->ctrl_flags & ATL1C_CLK_GATING_EN) {
+		if (hw->nic_type == athr_l2c_b)
+			data &= ~CLK_GATING_RXMAC_EN;
+	} else
+		data = 0;
+	AT_WRITE_REG(hw, REG_CLK_GATING_CTRL, data);
+
 	AT_WRITE_REG(hw, REG_INT_RETRIG_TIMER,
 		hw->ict & INT_RETRIG_TIMER_MASK);
 
@@ -1436,6 +1498,7 @@ static int atl1c_configure(struct atl1c_adapter *adapter)
 	if (hw->ctrl_flags & ATL1C_INTR_CLEAR_ON_READ)
 		master_ctrl_data |= MASTER_CTRL_INT_RDCLR;
 
+	master_ctrl_data |= MASTER_CTRL_SA_TIMER_EN;
 	AT_WRITE_REG(hw, REG_MASTER_CTRL, master_ctrl_data);
 
 	if (hw->ctrl_flags & ATL1C_CMB_ENABLE) {
@@ -1624,11 +1687,9 @@ static irqreturn_t atl1c_intr(int irq, void *data)
 					"atl1c hardware error (status = 0x%x)\n",
 					status & ISR_ERROR);
 			/* reset MAC */
-			hw->intr_mask &= ~ISR_ERROR;
-			AT_WRITE_REG(hw, REG_IMR, hw->intr_mask);
 			adapter->work_event |= ATL1C_WORK_EVENT_RESET;
 			schedule_work(&adapter->common_task);
-			break;
+			return IRQ_HANDLED;
 		}
 
 		if (status & ISR_OVER)
@@ -2303,7 +2364,6 @@ void atl1c_down(struct atl1c_adapter *adapter)
 	napi_disable(&adapter->napi);
 	atl1c_irq_disable(adapter);
 	atl1c_free_irq(adapter);
-	AT_WRITE_REG(&adapter->hw, REG_ISR, ISR_DIS_INT);
 	/* reset MAC to disable all RX/TX */
 	atl1c_reset_mac(&adapter->hw);
 	msleep(1);
@@ -2387,79 +2447,68 @@ static int atl1c_suspend(struct pci_dev *pdev, pm_message_t state)
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct atl1c_adapter *adapter = netdev_priv(netdev);
 	struct atl1c_hw *hw = &adapter->hw;
-	u32 ctrl;
-	u32 mac_ctrl_data;
-	u32 master_ctrl_data;
+	u32 mac_ctrl_data = 0;
+	u32 master_ctrl_data = 0;
 	u32 wol_ctrl_data = 0;
-	u16 mii_bmsr_data;
-	u16 save_autoneg_advertised;
-	u16 mii_intr_status_data;
+	u16 mii_intr_status_data = 0;
 	u32 wufc = adapter->wol;
-	u32 i;
 	int retval = 0;
 
+	atl1c_disable_l0s_l1(hw);
 	if (netif_running(netdev)) {
 		WARN_ON(test_bit(__AT_RESETTING, &adapter->flags));
 		atl1c_down(adapter);
 	}
 	netif_device_detach(netdev);
-	atl1c_disable_l0s_l1(hw);
 	retval = pci_save_state(pdev);
 	if (retval)
 		return retval;
+
+	if (wufc)
+		if (atl1c_phy_power_saving(hw) != 0)
+			dev_dbg(&pdev->dev, "phy power saving failed");
+
+	AT_READ_REG(hw, REG_MASTER_CTRL, &master_ctrl_data);
+	AT_READ_REG(hw, REG_MAC_CTRL, &mac_ctrl_data);
+
+	master_ctrl_data &= ~MASTER_CTRL_CLK_SEL_DIS;
+	mac_ctrl_data &= ~(MAC_CTRL_PRMLEN_MASK << MAC_CTRL_PRMLEN_SHIFT);
+	mac_ctrl_data |= (((u32)adapter->hw.preamble_len &
+			MAC_CTRL_PRMLEN_MASK) <<
+			MAC_CTRL_PRMLEN_SHIFT);
+	mac_ctrl_data &= ~(MAC_CTRL_SPEED_MASK << MAC_CTRL_SPEED_SHIFT);
+	mac_ctrl_data &= ~MAC_CTRL_DUPLX;
+
 	if (wufc) {
-		AT_READ_REG(hw, REG_MASTER_CTRL, &master_ctrl_data);
-		master_ctrl_data &= ~MASTER_CTRL_CLK_SEL_DIS;
-
-		/* get link status */
-		atl1c_read_phy_reg(hw, MII_BMSR, (u16 *)&mii_bmsr_data);
-		atl1c_read_phy_reg(hw, MII_BMSR, (u16 *)&mii_bmsr_data);
-		save_autoneg_advertised = hw->autoneg_advertised;
-		hw->autoneg_advertised = ADVERTISED_10baseT_Half;
-		if (atl1c_restart_autoneg(hw) != 0)
-			if (netif_msg_link(adapter))
-				dev_warn(&pdev->dev, "phy autoneg failed\n");
-		hw->phy_configured = false; /* re-init PHY when resume */
-		hw->autoneg_advertised = save_autoneg_advertised;
+		mac_ctrl_data |= MAC_CTRL_RX_EN;
+		if (adapter->link_speed == SPEED_1000 ||
+			adapter->link_speed == SPEED_0) {
+			mac_ctrl_data |= atl1c_mac_speed_1000 <<
+					MAC_CTRL_SPEED_SHIFT;
+			mac_ctrl_data |= MAC_CTRL_DUPLX;
+		} else
+			mac_ctrl_data |= atl1c_mac_speed_10_100 <<
+					MAC_CTRL_SPEED_SHIFT;
+
+		if (adapter->link_duplex == DUPLEX_FULL)
+			mac_ctrl_data |= MAC_CTRL_DUPLX;
+
 		/* turn on magic packet wol */
 		if (wufc & AT_WUFC_MAG)
-			wol_ctrl_data = WOL_MAGIC_EN | WOL_MAGIC_PME_EN;
+			wol_ctrl_data |= WOL_MAGIC_EN | WOL_MAGIC_PME_EN;
 
 		if (wufc & AT_WUFC_LNKC) {
-			for (i = 0; i < AT_SUSPEND_LINK_TIMEOUT; i++) {
-				msleep(100);
-				atl1c_read_phy_reg(hw, MII_BMSR,
-					(u16 *)&mii_bmsr_data);
-				if (mii_bmsr_data & BMSR_LSTATUS)
-					break;
-			}
-			if ((mii_bmsr_data & BMSR_LSTATUS) == 0)
-				if (netif_msg_link(adapter))
-					dev_warn(&pdev->dev,
-						"%s: Link may change"
-						"when suspend\n",
-						atl1c_driver_name);
 			wol_ctrl_data |=  WOL_LINK_CHG_EN | WOL_LINK_CHG_PME_EN;
 			/* only link up can wake up */
 			if (atl1c_write_phy_reg(hw, MII_IER, IER_LINK_UP) != 0) {
-				if (netif_msg_link(adapter))
-					dev_err(&pdev->dev,
-						"%s: read write phy "
-						"register failed.\n",
-						atl1c_driver_name);
-				goto wol_dis;
+				dev_dbg(&pdev->dev, "%s: read write phy "
+						  "register failed.\n",
+						  atl1c_driver_name);
 			}
 		}
 		/* clear phy interrupt */
 		atl1c_read_phy_reg(hw, MII_ISR, &mii_intr_status_data);
 		/* Config MAC Ctrl register */
-		mac_ctrl_data = MAC_CTRL_RX_EN;
-		/* set to 10/100M halt duplex */
-		mac_ctrl_data |= atl1c_mac_speed_10_100 << MAC_CTRL_SPEED_SHIFT;
-		mac_ctrl_data |= (((u32)adapter->hw.preamble_len &
-				 MAC_CTRL_PRMLEN_MASK) <<
-				 MAC_CTRL_PRMLEN_SHIFT);
-
 		if (adapter->vlgrp)
 			mac_ctrl_data |= MAC_CTRL_RMV_VLAN;
 
@@ -2467,37 +2516,30 @@ static int atl1c_suspend(struct pci_dev *pdev, pm_message_t state)
 		if (wufc & AT_WUFC_MAG)
 			mac_ctrl_data |= MAC_CTRL_BC_EN;
 
-		if (netif_msg_hw(adapter))
-			dev_dbg(&pdev->dev,
-				"%s: suspend MAC=0x%x\n",
-				atl1c_driver_name, mac_ctrl_data);
+		dev_dbg(&pdev->dev,
+			"%s: suspend MAC=0x%x\n",
+			atl1c_driver_name, mac_ctrl_data);
 		AT_WRITE_REG(hw, REG_MASTER_CTRL, master_ctrl_data);
 		AT_WRITE_REG(hw, REG_WOL_CTRL, wol_ctrl_data);
 		AT_WRITE_REG(hw, REG_MAC_CTRL, mac_ctrl_data);
 
 		/* pcie patch */
-		AT_READ_REG(hw, REG_PCIE_PHYMISC, &ctrl);
-		ctrl |= PCIE_PHYMISC_FORCE_RCV_DET;
-		AT_WRITE_REG(hw, REG_PCIE_PHYMISC, ctrl);
+		device_set_wakeup_enable(&pdev->dev, 1);
 
-		pci_enable_wake(pdev, pci_choose_state(pdev, state), 1);
-		goto suspend_exit;
+		AT_WRITE_REG(hw, REG_GPHY_CTRL, GPHY_CTRL_DEFAULT |
+			GPHY_CTRL_EXT_RESET);
+		pci_prepare_to_sleep(pdev);
+	} else {
+		AT_WRITE_REG(hw, REG_GPHY_CTRL, GPHY_CTRL_POWER_SAVING);
+		master_ctrl_data |= MASTER_CTRL_CLK_SEL_DIS;
+		mac_ctrl_data |= atl1c_mac_speed_10_100 << MAC_CTRL_SPEED_SHIFT;
+		mac_ctrl_data |= MAC_CTRL_DUPLX;
+		AT_WRITE_REG(hw, REG_MASTER_CTRL, master_ctrl_data);
+		AT_WRITE_REG(hw, REG_MAC_CTRL, mac_ctrl_data);
+		AT_WRITE_REG(hw, REG_WOL_CTRL, 0);
+		hw->phy_configured = false; /* re-init PHY when resume */
+		pci_enable_wake(pdev, pci_choose_state(pdev, state), 0);
 	}
-wol_dis:
-
-	/* WOL disabled */
-	AT_WRITE_REG(hw, REG_WOL_CTRL, 0);
-
-	/* pcie patch */
-	AT_READ_REG(hw, REG_PCIE_PHYMISC, &ctrl);
-	ctrl |= PCIE_PHYMISC_FORCE_RCV_DET;
-	AT_WRITE_REG(hw, REG_PCIE_PHYMISC, ctrl);
-
-	atl1c_phy_disable(hw);
-	hw->phy_configured = false; /* re-init PHY when resume */
-
-	pci_enable_wake(pdev, pci_choose_state(pdev, state), 0);
-suspend_exit:
 
 	pci_disable_device(pdev);
 	pci_set_power_state(pdev, pci_choose_state(pdev, state));
@@ -2516,9 +2558,19 @@ static int atl1c_resume(struct pci_dev *pdev)
 	pci_enable_wake(pdev, PCI_D3cold, 0);
 
 	AT_WRITE_REG(&adapter->hw, REG_WOL_CTRL, 0);
+	atl1c_reset_pcie(&adapter->hw, ATL1C_PCIE_L0S_L1_DISABLE |
+			ATL1C_PCIE_PHY_RESET);
 
 	atl1c_phy_reset(&adapter->hw);
 	atl1c_reset_mac(&adapter->hw);
+	atl1c_phy_init(&adapter->hw);
+
+#if 0
+	AT_READ_REG(&adapter->hw, REG_PM_CTRLSTAT, &pm_data);
+	pm_data &= ~PM_CTRLSTAT_PME_EN;
+	AT_WRITE_REG(&adapter->hw, REG_PM_CTRLSTAT, pm_data);
+#endif
+
 	netif_device_attach(netdev);
 	if (netif_running(netdev))
 		atl1c_up(adapter);


* [PATCH UPDATED 3/3] vhost: apply cpumask and cgroup to vhost pollers
From: Tejun Heo @ 2010-05-31  6:58 UTC (permalink / raw)
  To: Li Zefan
  Cc: Michael S. Tsirkin, Oleg Nesterov, Sridhar Samudrala, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <4C030CB8.505@cn.fujitsu.com>

Apply the cpumask and cgroup of the initializing task to the created
vhost poller.

Based on Sridhar Samudrala's patch.  Li Zefan spotted a bug in the
error path, which is now fixed.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Sridhar Samudrala <samudrala.sridhar@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
---
Updated accordingly.  Thanks.
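
For readers following the thread, here is a minimal userspace sketch of
the single-exit cleanup pattern that vhost_dev_init() adopts in the
patch below.  It is an illustration under stated assumptions, not the
kernel code: struct worker, worker_create(), worker_stop(), dev_init()
and the fail_step parameter are all hypothetical stand-ins, and failure
is signalled with NULL here, whereas kthread_create() returns an
ERR_PTR() value.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the poller thread. */
struct worker {
	int started;
};

/* Number of workers created and not yet stopped. */
static int live_workers;

/* Stand-in for kthread_create(); returns NULL on failure. */
static struct worker *worker_create(void)
{
	struct worker *w = calloc(1, sizeof(*w));

	if (w)
		live_workers++;
	return w;
}

/* Stand-in for kthread_stop(). */
static void worker_stop(struct worker *w)
{
	assert(w != NULL);
	free(w);
	live_workers--;
}

/*
 * fail_step selects which step fails, purely for demonstration:
 * 0 = none, 1 = mask allocation, 2 = worker creation,
 * 3 = a later configuration step (affinity or cgroup in the patch).
 */
static int dev_init(int fail_step, struct worker **out_worker)
{
	struct worker *w = NULL;
	unsigned long *mask;
	int ret = -ENOMEM;

	mask = fail_step == 1 ? NULL : malloc(sizeof(*mask));
	if (!mask)
		goto out;

	w = fail_step == 2 ? NULL : worker_create();
	if (!w)
		goto out;

	if (fail_step == 3) {
		ret = -EINVAL;
		goto out;
	}

	w->started = 1;		/* analogue of wake_up_process() */
	*out_worker = w;
	ret = 0;
out:
	if (ret && w)		/* undo creation only on failure */
		worker_stop(w);
	free(mask);		/* scratch mask is released on every path */
	return ret;
}
```

Every failure after the worker exists funnels through the same label,
so the worker is stopped exactly once and the scratch mask is freed on
both the success and the failure paths.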

 drivers/vhost/vhost.c |   36 +++++++++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 5 deletions(-)

Index: work/drivers/vhost/vhost.c
===================================================================
--- work.orig/drivers/vhost/vhost.c
+++ work/drivers/vhost/vhost.c
@@ -23,6 +23,7 @@
 #include <linux/highmem.h>
 #include <linux/slab.h>
 #include <linux/kthread.h>
+#include <linux/cgroup.h>

 #include <linux/net.h>
 #include <linux/if_packet.h>
@@ -176,12 +177,30 @@ repeat:
 long vhost_dev_init(struct vhost_dev *dev,
 		    struct vhost_virtqueue *vqs, int nvqs)
 {
-	struct task_struct *poller;
-	int i;
+	struct task_struct *poller = NULL;
+	cpumask_var_t mask;
+	int i, ret = -ENOMEM;
+
+	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
+		goto out;

 	poller = kthread_create(vhost_poller, dev, "vhost-%d", current->pid);
-	if (IS_ERR(poller))
-		return PTR_ERR(poller);
+	if (IS_ERR(poller)) {
+		ret = PTR_ERR(poller);
+		goto out;
+	}
+
+	ret = sched_getaffinity(current->pid, mask);
+	if (ret)
+		goto out;
+
+	ret = sched_setaffinity(poller->pid, mask);
+	if (ret)
+		goto out;
+
+	ret = cgroup_attach_task_current_cg(poller);
+	if (ret)
+		goto out;

 	dev->vqs = vqs;
 	dev->nvqs = nvqs;
@@ -202,7 +221,14 @@ long vhost_dev_init(struct vhost_dev *de
 			vhost_poll_init(&dev->vqs[i].poll,
 					dev->vqs[i].handle_kick, POLLIN, dev);
 	}
-	return 0;
+
+	wake_up_process(poller);	/* avoid contributing to loadavg */
+	ret = 0;
+out:
+	if (ret && poller)
+		kthread_stop(poller);
+	free_cpumask_var(mask);
+	return ret;
 }

 /* Caller should have device mutex */


* Re: [PATCH 2/3] cgroups: Add an API to attach a task to current task's cgroup
From: Tejun Heo @ 2010-05-31  7:00 UTC (permalink / raw)
  To: Li Zefan
  Cc: Michael S. Tsirkin, Oleg Nesterov, Sridhar Samudrala, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <4C030BC1.6000607@cn.fujitsu.com>

On 05/31/2010 03:07 AM, Li Zefan wrote:
> 04:24, Tejun Heo wrote:
>> From: Sridhar Samudrala <samudrala.sridhar@gmail.com>
>>
>> Add a new kernel API to attach a task to current task's cgroup
>> in all the active hierarchies.
>>
>> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
> 
> Acked-by: Li Zefan <lizf@cn.fujitsu.com>
> 
> btw: you lost the reviewed-by tag given by Paul Menage.

I only got bounced the original posting.  Michael, can you please add
it if/when you commit these?

Thank you.

-- 
tejun


