Netdev List
 help / color / mirror / Atom feed
* Re: tc: RTM_GETQDISC causes kernel OOPS
From: Patrick McHardy @ 2010-05-22  9:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Ben Pfaff, Jamal Hadi Salim, netdev
In-Reply-To: <1274512687.5020.21.camel@edumazet-laptop>

Eric Dumazet wrote:
> Le vendredi 21 mai 2010 à 15:42 -0700, Ben Pfaff a écrit :
>> Hi.  While working on some library code for working with qdiscs and
>> classes I came upon a kernel OOPS.  Originally I came across it with a
>> 2.6.26 kernel, but I can also reproduce it with unmodified v2.6.34 from
>> kernel.org.
>>
>> At the end of this mail I'm appending both an example of the OOPS and a
>> simple test program that reliably reproduces the problem for me when I
>> invoke it with "lo" as argument.  The program does not need to be run as
>> root.
>>
>> After the OOPS, a lot of networking and other system functions stop
>> working, so it seems to me a serious issue.
>>
>> The null pointer dereference that causes the OOPS is the dereference of
>> the return value of qdisc_dev() in tc_fill_qdisc() in
>> net/sched/sch_api.c line 1163:
>>
>>     1161	tcm->tcm__pad1 = 0;
>>     1162	tcm->tcm__pad2 = 0;
>>     1163	tcm->tcm_ifindex = qdisc_dev(q)->ifindex;
>>     1164	tcm->tcm_parent = clid;
>>     1165	tcm->tcm_handle = q->handle;
>>
>> I am pretty sure about that, because if I add "WARN_ON(!qdisc_dev(q));"
>> just before line 1163 then that warning triggers.
>>
>> Thanks,
> 
> Indeed, thanks for this very useful report !
> 
> We could add a check for TCQ_F_BUILTIN flag, or just make 
> qdisc_notify() checks consistent for both old and new qdisc
> 
> What other people thinks ?

We already use TCQ_F_BUILTIN in tc_qdisc_dump_ignore(), so I
think it would be more consistent than checking for a handle
to use it here as well.

^ permalink raw reply

* [PATCH 26/27] drivers/isdn: Use memdup_user
From: Julia Lawall @ 2010-05-22  8:26 UTC (permalink / raw)
  To: Karsten Keil, netdev, linux-kernel, kernel-janitors

From: Julia Lawall <julia@diku.dk>

Use memdup_user when user data is immediately copied into the
allocated region.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression from,to,size,flag;
position p;
identifier l1,l2;
@@

-  to = \(kmalloc@p\|kzalloc@p\)(size,flag);
+  to = memdup_user(from,size);
   if (
-      to==NULL
+      IS_ERR(to)
                 || ...) {
   <+... when != goto l1;
-  -ENOMEM
+  PTR_ERR(to)
   ...+>
   }
-  if (copy_from_user(to, from, size) != 0) {
-    <+... when != goto l2;
-    -EFAULT
-    ...+>
-  }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>

---
 drivers/isdn/i4l/isdn_ppp.c |   11 +++--------
 drivers/isdn/pcbit/drv.c    |   10 +++-------
 drivers/isdn/sc/ioctl.c     |   23 ++++++-----------------
 3 files changed, 12 insertions(+), 32 deletions(-)

diff --git a/drivers/isdn/i4l/isdn_ppp.c b/drivers/isdn/i4l/isdn_ppp.c
index f37b8f6..8c46bae 100644
--- a/drivers/isdn/i4l/isdn_ppp.c
+++ b/drivers/isdn/i4l/isdn_ppp.c
@@ -449,14 +449,9 @@ static int get_filter(void __user *arg, struct sock_filter **p)
 
 	/* uprog.len is unsigned short, so no overflow here */
 	len = uprog.len * sizeof(struct sock_filter);
-	code = kmalloc(len, GFP_KERNEL);
-	if (code == NULL)
-		return -ENOMEM;
-
-	if (copy_from_user(code, uprog.filter, len)) {
-		kfree(code);
-		return -EFAULT;
-	}
+	code = memdup_user(uprog.filter, len);
+	if (IS_ERR(code))
+		return PTR_ERR(code);
 
 	err = sk_chk_filter(code, uprog.len);
 	if (err) {
diff --git a/drivers/isdn/pcbit/drv.c b/drivers/isdn/pcbit/drv.c
index 123c1d6..1507d2e 100644
--- a/drivers/isdn/pcbit/drv.c
+++ b/drivers/isdn/pcbit/drv.c
@@ -411,14 +411,10 @@ static int pcbit_writecmd(const u_char __user *buf, int len, int driver, int cha
 			return -EINVAL;
 		}
 
-		cbuf = kmalloc(len, GFP_KERNEL);
-		if (!cbuf)
-			return -ENOMEM;
+		cbuf = memdup_user(buf, len);
+		if (IS_ERR(cbuf))
+			return PTR_ERR(cbuf);
 
-		if (copy_from_user(cbuf, buf, len)) {
-			kfree(cbuf);
-			return -EFAULT;
-		}
 		memcpy_toio(dev->sh_mem, cbuf, len);
 		kfree(cbuf);
 		return len;
diff --git a/drivers/isdn/sc/ioctl.c b/drivers/isdn/sc/ioctl.c
index 1081091..43c5dc3 100644
--- a/drivers/isdn/sc/ioctl.c
+++ b/drivers/isdn/sc/ioctl.c
@@ -215,19 +215,13 @@ int sc_ioctl(int card, scs_ioctl *data)
 		pr_debug("%s: DCBIOSETSPID: ioctl received\n",
 				sc_adapter[card]->devicename);
 
-		spid = kmalloc(SCIOC_SPIDSIZE, GFP_KERNEL);
-		if(!spid) {
-			kfree(rcvmsg);
-			return -ENOMEM;
-		}
-
 		/*
 		 * Get the spid from user space
 		 */
-		if (copy_from_user(spid, data->dataptr, SCIOC_SPIDSIZE)) {
+		spid = memdup_user(data->dataptr, SCIOC_SPIDSIZE);
+		if (IS_ERR(spid)) {
 			kfree(rcvmsg);
-			kfree(spid);
-			return -EFAULT;
+			return PTR_ERR(spid);
 		}
 
 		pr_debug("%s: SCIOCSETSPID: setting channel %d spid to %s\n", 
@@ -296,18 +290,13 @@ int sc_ioctl(int card, scs_ioctl *data)
 		pr_debug("%s: SCIOSETDN: ioctl received\n",
 				sc_adapter[card]->devicename);
 
-		dn = kmalloc(SCIOC_DNSIZE, GFP_KERNEL);
-		if (!dn) {
-			kfree(rcvmsg);
-			return -ENOMEM;
-		}
 		/*
 		 * Get the spid from user space
 		 */
-		if (copy_from_user(dn, data->dataptr, SCIOC_DNSIZE)) {
+		dn = memdup_user(data->dataptr, SCIOC_DNSIZE);
+		if (IS_ERR(dn)) {
 			kfree(rcvmsg);
-			kfree(dn);
-			return -EFAULT;
+			return PTR_ERR(dn);
 		}
 
 		pr_debug("%s: SCIOCSETDN: setting channel %d dn to %s\n", 

^ permalink raw reply related

* [PATCH 24/27] drivers/net/wan: Use memdup_user
From: Julia Lawall @ 2010-05-22  8:26 UTC (permalink / raw)
  To: Kevin Curtis, netdev, linux-kernel, kernel-janitors

From: Julia Lawall <julia@diku.dk>

Use memdup_user when user data is immediately copied into the
allocated region.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression from,to,size,flag;
position p;
identifier l1,l2;
@@

-  to = \(kmalloc@p\|kzalloc@p\)(size,flag);
+  to = memdup_user(from,size);
   if (
-      to==NULL
+      IS_ERR(to)
                 || ...) {
   <+... when != goto l1;
-  -ENOMEM
+  PTR_ERR(to)
   ...+>
   }
-  if (copy_from_user(to, from, size) != 0) {
-    <+... when != goto l2;
-    -EFAULT
-    ...+>
-  }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>

---
 drivers/net/wan/farsync.c |   14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/net/wan/farsync.c b/drivers/net/wan/farsync.c
index e087b9a..43b7727 100644
--- a/drivers/net/wan/farsync.c
+++ b/drivers/net/wan/farsync.c
@@ -2038,16 +2038,10 @@ fst_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 
 		/* Now copy the data to the card. */
 
-		buf = kmalloc(wrthdr.size, GFP_KERNEL);
-		if (!buf)
-			return -ENOMEM;
-
-		if (copy_from_user(buf,
-				   ifr->ifr_data + sizeof (struct fstioc_write),
-				   wrthdr.size)) {
-			kfree(buf);
-			return -EFAULT;
-		}
+		buf = memdup_user(ifr->ifr_data + sizeof(struct fstioc_write),
+				  wrthdr.size);
+		if (IS_ERR(buf))
+			return PTR_ERR(buf);
 
 		memcpy_toio(card->mem + wrthdr.offset, buf, wrthdr.size);
 		kfree(buf);

^ permalink raw reply related

* [PATCH 23/27] drivers/net/wireless/prism54: Use memdup_user
From: Julia Lawall @ 2010-05-22  8:25 UTC (permalink / raw)
  To: Luis R. Rodriguez, John W. Linville, linux-wireless, netdev,
	linux-kernel, kernel-janit

From: Julia Lawall <julia@diku.dk>

Use memdup_user when user data is immediately copied into the
allocated region.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression from,to,size,flag;
position p;
identifier l1,l2;
@@

-  to = \(kmalloc@p\|kzalloc@p\)(size,flag);
+  to = memdup_user(from,size);
   if (
-      to==NULL
+      IS_ERR(to)
                 || ...) {
   <+... when != goto l1;
-  -ENOMEM
+  PTR_ERR(to)
   ...+>
   }
-  if (copy_from_user(to, from, size) != 0) {
-    <+... when != goto l2;
-    -EFAULT
-    ...+>
-  }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>

---
 drivers/net/wireless/prism54/isl_ioctl.c |   11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/wireless/prism54/isl_ioctl.c b/drivers/net/wireless/prism54/isl_ioctl.c
index 8d1190c..2a62c60 100644
--- a/drivers/net/wireless/prism54/isl_ioctl.c
+++ b/drivers/net/wireless/prism54/isl_ioctl.c
@@ -2751,14 +2751,9 @@ prism54_hostapd(struct net_device *ndev, struct iw_point *p)
            p->length > PRISM2_HOSTAPD_MAX_BUF_SIZE || !p->pointer)
                return -EINVAL;
 
-       param = kmalloc(p->length, GFP_KERNEL);
-       if (param == NULL)
-               return -ENOMEM;
-
-       if (copy_from_user(param, p->pointer, p->length)) {
-               kfree(param);
-               return -EFAULT;
-       }
+	param = memdup_user(p->pointer, p->length);
+	if (IS_ERR(param))
+		return PTR_ERR(param);
 
        switch (param->cmd) {
        case PRISM2_SET_ENCRYPTION:

^ permalink raw reply related

* [PATCH 22/27] net/dccp: Use memdup_user
From: Julia Lawall @ 2010-05-22  8:25 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, David S. Miller, dccp, netdev,
	linux-kernel, kernel-janitor

From: Julia Lawall <julia@diku.dk>

Use memdup_user when user data is immediately copied into the
allocated region.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression from,to,size,flag;
position p;
identifier l1,l2;
@@

-  to = \(kmalloc@p\|kzalloc@p\)(size,flag);
+  to = memdup_user(from,size);
   if (
-      to==NULL
+      IS_ERR(to)
                 || ...) {
   <+... when != goto l1;
-  -ENOMEM
+  PTR_ERR(to)
   ...+>
   }
-  if (copy_from_user(to, from, size) != 0) {
-    <+... when != goto l2;
-    -EFAULT
-    ...+>
-  }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>

---
 net/dccp/proto.c |   11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index b03ecf6..f79bcef 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -473,14 +473,9 @@ static int dccp_setsockopt_ccid(struct sock *sk, int type,
 	if (optlen < 1 || optlen > DCCP_FEAT_MAX_SP_VALS)
 		return -EINVAL;
 
-	val = kmalloc(optlen, GFP_KERNEL);
-	if (val == NULL)
-		return -ENOMEM;
-
-	if (copy_from_user(val, optval, optlen)) {
-		kfree(val);
-		return -EFAULT;
-	}
+	val = memdup_user(optval, optlen);
+	if (IS_ERR(val))
+		return PTR_ERR(val);
 
 	lock_sock(sk);
 	if (type == DCCP_SOCKOPT_TX_CCID || type == DCCP_SOCKOPT_CCID)

^ permalink raw reply related

* [PATCH 7/27] drivers/net/wan: Use memdup_user
From: Julia Lawall @ 2010-05-22  8:20 UTC (permalink / raw)
  To: Mike McLagan, netdev, linux-kernel, kernel-janitors

From: Julia Lawall <julia@diku.dk>

Use memdup_user when user data is immediately copied into the
allocated region.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression from,to,size,flag;
position p;
identifier l1,l2;
@@

-  to = \(kmalloc@p\|kzalloc@p\)(size,flag);
+  to = memdup_user(from,size);
   if (
-      to==NULL
+      IS_ERR(to)
                 || ...) {
   <+... when != goto l1;
-  -ENOMEM
+  PTR_ERR(to)
   ...+>
   }
-  if (copy_from_user(to, from, size) != 0) {
-    <+... when != goto l2;
-    -EFAULT
-    ...+>
-  }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>

---
 drivers/net/wan/sdla.c |   11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/wan/sdla.c b/drivers/net/wan/sdla.c
index 43ae6f4..258efc2 100644
--- a/drivers/net/wan/sdla.c
+++ b/drivers/net/wan/sdla.c
@@ -1211,14 +1211,9 @@ static int sdla_xfer(struct net_device *dev, struct sdla_mem __user *info, int r
 	}
 	else
 	{
-		temp = kmalloc(mem.len, GFP_KERNEL);
-		if (!temp)
-			return(-ENOMEM);
-		if(copy_from_user(temp, mem.data, mem.len))
-		{
-			kfree(temp);
-			return -EFAULT;
-		}
+		temp = memdup_user(mem.data, mem.len);
+		if (IS_ERR(temp))
+			return PTR_ERR(temp);
 		sdla_write(dev, mem.addr, temp, mem.len);
 		kfree(temp);
 	}

^ permalink raw reply related

* [PATCH 6/27] drivers/net/cxgb3: Use memdup_user
From: Julia Lawall @ 2010-05-22  8:20 UTC (permalink / raw)
  To: Divy Le Ray, netdev, linux-kernel, kernel-janitors

From: Julia Lawall <julia@diku.dk>

Use memdup_user when user data is immediately copied into the
allocated region.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression from,to,size,flag;
position p;
identifier l1,l2;
@@

-  to = \(kmalloc@p\|kzalloc@p\)(size,flag);
+  to = memdup_user(from,size);
   if (
-      to==NULL
+      IS_ERR(to)
                 || ...) {
   <+... when != goto l1;
-  -ENOMEM
+  PTR_ERR(to)
   ...+>
   }
-  if (copy_from_user(to, from, size) != 0) {
-    <+... when != goto l2;
-    -EFAULT
-    ...+>
-  }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>

---
 drivers/net/cxgb3/cxgb3_main.c |   12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index e3f1b85..066fd5b 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -2311,15 +2311,9 @@ static int cxgb_extension_ioctl(struct net_device *dev, void __user *useraddr)
 		if (copy_from_user(&t, useraddr, sizeof(t)))
 			return -EFAULT;
 		/* Check t.len sanity ? */
-		fw_data = kmalloc(t.len, GFP_KERNEL);
-		if (!fw_data)
-			return -ENOMEM;
-
-		if (copy_from_user
-			(fw_data, useraddr + sizeof(t), t.len)) {
-			kfree(fw_data);
-			return -EFAULT;
-		}
+		fw_data = memdup_user(useraddr + sizeof(t), t.len);
+		if (IS_ERR(fw_data))
+			return PTR_ERR(fw_data);
 
 		ret = t3_load_fw(adapter, fw_data, t.len);
 		kfree(fw_data);

^ permalink raw reply related

* [PATCH 2/27] drivers/net: Use memdup_user
From: Julia Lawall @ 2010-05-22  8:18 UTC (permalink / raw)
  To: Paul Mackerras, linux-ppp, netdev, linux-kernel, kernel-janitors

From: Julia Lawall <julia@diku.dk>

Use memdup_user when user data is immediately copied into the
allocated region.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression from,to,size,flag;
position p;
identifier l1,l2;
@@

-  to = \(kmalloc@p\|kzalloc@p\)(size,flag);
+  to = memdup_user(from,size);
   if (
-      to==NULL
+      IS_ERR(to)
                 || ...) {
   <+... when != goto l1;
-  -ENOMEM
+  PTR_ERR(to)
   ...+>
   }
-  if (copy_from_user(to, from, size) != 0) {
-    <+... when != goto l2;
-    -EFAULT
-    ...+>
-  }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>

---
 drivers/net/ppp_generic.c |   11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ppp_generic.c b/drivers/net/ppp_generic.c
index 5441688..da7943d 100644
--- a/drivers/net/ppp_generic.c
+++ b/drivers/net/ppp_generic.c
@@ -539,14 +539,9 @@ static int get_filter(void __user *arg, struct sock_filter **p)
 	}
 
 	len = uprog.len * sizeof(struct sock_filter);
-	code = kmalloc(len, GFP_KERNEL);
-	if (code == NULL)
-		return -ENOMEM;
-
-	if (copy_from_user(code, uprog.filter, len)) {
-		kfree(code);
-		return -EFAULT;
-	}
+	code = memdup_user(uprog.filter, len);
+	if (IS_ERR(code))
+		return PTR_ERR(code);
 
 	err = sk_chk_filter(code, uprog.len);
 	if (err) {

^ permalink raw reply related

* [PATCH 1/27] net/can: Use memdup_user
From: Julia Lawall @ 2010-05-22  8:18 UTC (permalink / raw)
  To: Oliver Hartkopp, Urs Thuermann, David S. Miller, socketcan-core,
	netdev

From: Julia Lawall <julia@diku.dk>

Use memdup_user when user data is immediately copied into the
allocated region.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression from,to,size,flag;
position p;
identifier l1,l2;
@@

-  to = \(kmalloc@p\|kzalloc@p\)(size,flag);
+  to = memdup_user(from,size);
   if (
-      to==NULL
+      IS_ERR(to)
                 || ...) {
   <+... when != goto l1;
-  -ENOMEM
+  PTR_ERR(to)
   ...+>
   }
-  if (copy_from_user(to, from, size) != 0) {
-    <+... when != goto l2;
-    -EFAULT
-    ...+>
-  }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>

---
 net/can/raw.c |   11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/net/can/raw.c b/net/can/raw.c
index da99cf1..ccfe633 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -436,14 +436,9 @@ static int raw_setsockopt(struct socket *sock, int level, int optname,
 
 		if (count > 1) {
 			/* filter does not fit into dfilter => alloc space */
-			filter = kmalloc(optlen, GFP_KERNEL);
-			if (!filter)
-				return -ENOMEM;
-
-			if (copy_from_user(filter, optval, optlen)) {
-				kfree(filter);
-				return -EFAULT;
-			}
+			filter = memdup_user(optval, optlen);
+			if (IS_ERR(filter))
+				return PTR_ERR(filter);
 		} else if (count == 1) {
 			if (copy_from_user(&sfilter, optval, sizeof(sfilter)))
 				return -EFAULT;

^ permalink raw reply related

* Re: tc: RTM_GETQDISC causes kernel OOPS
From: Eric Dumazet @ 2010-05-22  7:18 UTC (permalink / raw)
  To: Ben Pfaff; +Cc: Jamal Hadi Salim, netdev, Patrick McHardy
In-Reply-To: <20100521224243.GD10247@nicira.com>

Le vendredi 21 mai 2010 à 15:42 -0700, Ben Pfaff a écrit :
> Hi.  While working on some library code for working with qdiscs and
> classes I came upon a kernel OOPS.  Originally I came across it with a
> 2.6.26 kernel, but I can also reproduce it with unmodified v2.6.34 from
> kernel.org.
> 
> At the end of this mail I'm appending both an example of the OOPS and a
> simple test program that reliably reproduces the problem for me when I
> invoke it with "lo" as argument.  The program does not need to be run as
> root.
> 
> After the OOPS, a lot of networking and other system functions stop
> working, so it seems to me a serious issue.
> 
> The null pointer dereference that causes the OOPS is the dereference of
> the return value of qdisc_dev() in tc_fill_qdisc() in
> net/sched/sch_api.c line 1163:
> 
>     1161	tcm->tcm__pad1 = 0;
>     1162	tcm->tcm__pad2 = 0;
>     1163	tcm->tcm_ifindex = qdisc_dev(q)->ifindex;
>     1164	tcm->tcm_parent = clid;
>     1165	tcm->tcm_handle = q->handle;
> 
> I am pretty sure about that, because if I add "WARN_ON(!qdisc_dev(q));"
> just before line 1163 then that warning triggers.
> 
> Thanks,

Indeed, thanks for this very useful report !

We could add a check for TCQ_F_BUILTIN flag, or just make 
qdisc_notify() checks consistent for both old and new qdisc

What other people thinks ?

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index fe35c1f..e454c73 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1210,7 +1210,7 @@ static int qdisc_notify(struct net *net, struct sk_buff *oskb,
 		if (tc_fill_qdisc(skb, old, clid, pid, n->nlmsg_seq, 0, RTM_DELQDISC) < 0)
 			goto err_out;
 	}
-	if (new) {
+	if (new && new->handle) {
 		if (tc_fill_qdisc(skb, new, clid, pid, n->nlmsg_seq, old ? NLM_F_REPLACE : 0, RTM_NEWQDISC) < 0)
 			goto err_out;
 	}







^ permalink raw reply related

* Re: [PATCH] rtnetlink: Fix error handling in do_setlink()
From: Chris Wright @ 2010-05-22  6:52 UTC (permalink / raw)
  To: David Howells; +Cc: netdev, Chris Wright, David S. Miller
In-Reply-To: <20100521122527.20442.77793.stgit@warthog.procyon.org.uk>

* David Howells (dhowells@redhat.com) wrote:
> Commit c02db8c6290bb992442fec1407643c94cc414375:
> 
> 	Author:  Chris Wright <chrisw@sous-sol.org>
> 	Date:    Sun May 16 01:05:45 2010 -0700
> 	Subject: rtnetlink: make SR-IOV VF interface symmetric
> 
> adds broken error handling to do_setlink() in net/core/rtnetlink.c.  The
> problem is the following chunk of code:
> 
> 	if (tb[IFLA_VFINFO_LIST]) {
> 		struct nlattr *attr;
> 		int rem;
> 		nla_for_each_nested(attr, tb[IFLA_VFINFO_LIST], rem) {
> 			if (nla_type(attr) != IFLA_VF_INFO)
>   ---->				goto errout;
> 			err = do_setvfinfo(dev, attr);
> 			if (err < 0)
> 				goto errout;
> 			modified = 1;
> 		}
> 	}
> 
> which can get to errout without setting err, resulting in the following error:
> 
> net/core/rtnetlink.c: In function 'do_setlink':
> net/core/rtnetlink.c:904: warning: 'err' may be used uninitialized in this function
> 
> Change the code to return -EINVAL in this case.  Note that this might not be
> the appropriate error though.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>

Acked-by: Chris Wright <chrisw@sous-sol.org>

Thank you David, that's correct.  I have some other pending changes
here, so I don't mind collecting them together.

thanks,
-chris

^ permalink raw reply

* Re: VLANs, bonding redux: vlan state does not follow ethernet
From: George B. @ 2010-05-22  6:45 UTC (permalink / raw)
  To: netdev
In-Reply-To: <AANLkTimxJHKjAEm7wMYObFn9U3Sc1WPRYUuFFIKwwAyD@mail.gmail.com>

On Fri, May 21, 2010 at 11:24 PM, George B. <georgeb@gmail.com> wrote:
> Using 2.6.34 I am trying to remove bottlenecks.  Instead of bonding
> two ethernet interfaces and applying vlans to the bond, I am applying
> the vlans to the ethernet and bonding the vlans creating a separate
> bond interface for each vlan.
>
> The trouble now is that the bond interface does not see when the
> ethernet interface goes down.  The vlan reports to the bonding driver
> that it is up when the ethernet it is connected to is down.  This
> results in packet loss through the bond interface as the bond driver
> attempts to use that vlan.
>
> eth1 shows having no link:
>
> root@sandbox:/proc/net# ethtool eth1
> Settings for eth1:
>        Supported ports: [ TP ]
>        Supported link modes:   10baseT/Half 10baseT/Full
>                                100baseT/Half 100baseT/Full
>                                1000baseT/Full
>        Supports auto-negotiation: Yes
>        Advertised link modes:  10baseT/Half 10baseT/Full
>                                100baseT/Half 100baseT/Full
>                                1000baseT/Full
>        Advertised pause frame use: No
>        Advertised auto-negotiation: Yes
>        Link partner advertised link modes:  Not reported
>        Link partner advertised pause frame use: No
>        Link partner advertised auto-negotiation: No
>        Speed: Unknown!
>        Duplex: Unknown! (255)
>        Port: Twisted Pair
>        PHYAD: 1
>        Transceiver: internal
>        Auto-negotiation: on
>        MDI-X: Unknown
>        Supports Wake-on: pumbag
>        Wake-on: g
>        Current message level: 0x00000001 (1)
>        Link detected: no
>
> bonding driver says eth1.99 reports MII status up:
>
> root@sandbox:/proc/net# cat bonding/bond0
> Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
>
> Bonding Mode: load balancing (round-robin)
> MII Status: up
> MII Polling Interval (ms): 0
> Up Delay (ms): 0
> Down Delay (ms): 0
>
> Slave Interface: eth0.99
> MII Status: up
> Link Failure Count: 0
> Permanent HW addr: 00:26:9e:1c:d3:3e
>
> Slave Interface: eth1.99
> MII Status: up
> Link Failure Count: 0
> Permanent HW addr: 00:26:9e:1c:d3:3f
>
> is there some parameter I can give that tells the vlan driver to
> follow the state of the interface it is attached to?  Having a vlan
> that reports being up all the time even when its underlying interface
> is down is less than useful. It would seem intuitive that a vlan's
> state would follow that of the interface it is attached to.
>
> root@sandbox:/proc/net# cat vlan/eth0.99
> eth0.99  VID: 99         REORDER_HDR: 1  dev->priv_flags: 21
>         total frames received           32
>          total bytes received         4735
>      Broadcast/Multicast Rcvd            0
>
>      total frames transmitted           50
>       total bytes transmitted         3852
>            total headroom inc            0
>           total encap on xmit            0
> Device: eth0
> INGRESS priority mappings: 0:0  1:0  2:0  3:0  4:0  5:0  6:0 7:0
>  EGRESS priority mappings:
> root@sandbox:/proc/net# cat vlan/eth1.99
> eth1.99  VID: 99         REORDER_HDR: 1  dev->priv_flags: 21
>         total frames received            0
>          total bytes received            0
>      Broadcast/Multicast Rcvd            0
>
>      total frames transmitted            0
>       total bytes transmitted            0
>            total headroom inc            0
>           total encap on xmit            0
> Device: eth1
> INGRESS priority mappings: 0:0  1:0  2:0  3:0  4:0  5:0  6:0 7:0
>  EGRESS priority mappings:
>
> root@sandbox:/proc/net# ping 10.1.99.1
> PING 10.1.99.1 (10.1.99.1) 56(84) bytes of data.
> 64 bytes from 10.1.99.1: icmp_seq=2 ttl=255 time=0.299 ms
> 64 bytes from 10.1.99.1: icmp_seq=4 ttl=255 time=0.311 ms
> 64 bytes from 10.1.99.1: icmp_seq=6 ttl=255 time=0.325 ms
> 64 bytes from 10.1.99.1: icmp_seq=8 ttl=255 time=0.291 ms
> 64 bytes from 10.1.99.1: icmp_seq=10 ttl=255 time=0.308 ms
>
> George
>

But interestingly, mii-tool reports the correct result:

root@sandbox:/usr/src/linux-source-2.6.34/Documentation/networking#
mii-tool -v eth1.99

eth1.99: no link
  product info: vendor 00:50:43, model 10 rev 0
  basic mode:   autonegotiation enabled
  basic status: no link
  capabilities: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control

^ permalink raw reply

* VLANs, bonding redux: vlan state does not follow ethernet
From: George B. @ 2010-05-22  6:24 UTC (permalink / raw)
  To: netdev

Using 2.6.34 I am trying to remove bottlenecks.  Instead of bonding
two ethernet interfaces and applying vlans to the bond, I am applying
the vlans to the ethernet and bonding the vlans creating a separate
bond interface for each vlan.

The trouble now is that the bond interface does not see when the
ethernet interface goes down.  The vlan reports to the bonding driver
that it is up when the ethernet it is connected to is down.  This
results in packet loss through the bond interface as the bond driver
attempts to use that vlan.

eth1 shows having no link:

root@sandbox:/proc/net# ethtool eth1
Settings for eth1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Link partner advertised link modes:  Not reported
        Link partner advertised pause frame use: No
        Link partner advertised auto-negotiation: No
        Speed: Unknown!
        Duplex: Unknown! (255)
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown
        Supports Wake-on: pumbag
        Wake-on: g
        Current message level: 0x00000001 (1)
        Link detected: no

bonding driver says eth1.99 reports MII status up:

root@sandbox:/proc/net# cat bonding/bond0
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0.99
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:26:9e:1c:d3:3e

Slave Interface: eth1.99
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:26:9e:1c:d3:3f

is there some parameter I can give that tells the vlan driver to
follow the state of the interface it is attached to?  Having a vlan
that reports being up all the time even when its underlying interface
is down is less than useful. It would seem intuitive that a vlan's
state would follow that of the interface it is attached to.

root@sandbox:/proc/net# cat vlan/eth0.99
eth0.99  VID: 99         REORDER_HDR: 1  dev->priv_flags: 21
         total frames received           32
          total bytes received         4735
      Broadcast/Multicast Rcvd            0

      total frames transmitted           50
       total bytes transmitted         3852
            total headroom inc            0
           total encap on xmit            0
Device: eth0
INGRESS priority mappings: 0:0  1:0  2:0  3:0  4:0  5:0  6:0 7:0
 EGRESS priority mappings:
root@sandbox:/proc/net# cat vlan/eth1.99
eth1.99  VID: 99         REORDER_HDR: 1  dev->priv_flags: 21
         total frames received            0
          total bytes received            0
      Broadcast/Multicast Rcvd            0

      total frames transmitted            0
       total bytes transmitted            0
            total headroom inc            0
           total encap on xmit            0
Device: eth1
INGRESS priority mappings: 0:0  1:0  2:0  3:0  4:0  5:0  6:0 7:0
 EGRESS priority mappings:

root@sandbox:/proc/net# ping 10.1.99.1
PING 10.1.99.1 (10.1.99.1) 56(84) bytes of data.
64 bytes from 10.1.99.1: icmp_seq=2 ttl=255 time=0.299 ms
64 bytes from 10.1.99.1: icmp_seq=4 ttl=255 time=0.311 ms
64 bytes from 10.1.99.1: icmp_seq=6 ttl=255 time=0.325 ms
64 bytes from 10.1.99.1: icmp_seq=8 ttl=255 time=0.291 ms
64 bytes from 10.1.99.1: icmp_seq=10 ttl=255 time=0.308 ms

George

^ permalink raw reply

* cls_cgroup: Store classid in struct sock
From: Herbert Xu @ 2010-05-22  1:49 UTC (permalink / raw)
  To: Neil Horman; +Cc: David Miller, eric.dumazet, bmb, tgraf, nhorman, netdev
In-Reply-To: <20100521164054.GB29521@hmsreliant.think-freely.org>

On Fri, May 21, 2010 at 12:40:54PM -0400, Neil Horman wrote:
> 
> There may also be an issue with the setting of the classid (possible use of the
> wrong subsys id value when grabbing our cgroup_subsys_state), but I'm checking
> on that now.

Actually, I think it's because my patch mistook CONFIG_CGROUP
for CONFIG_CGROUPS.

Here is a fixed version (also includes a build fix on sk->classid).

cls_cgroup: Store classid in struct sock

Up until now cls_cgroup has relied on fetching the classid out of
the current executing thread.  This runs into trouble when a packet
processing is delayed in which case it may execute out of another
thread's context.

Furthermore, even when a packet is not delayed we may fail to
classify it if soft IRQs have been disabled, because this scenario
is indistinguishable from one where a packet unrelated to the
current thread is processed by a real soft IRQ.

In fact, the current semantics is inherently broken, as a single
skb may be constructed out of the writes of two different tasks.
A different manifestation of this problem is when the TCP stack
transmits in response of an incoming ACK.  This is currently
unclassified.

As we already have a concept of packet ownership for accounting
purposes in the skb->sk pointer, this is a natural place to store
the classid in a persistent manner.

This patch adds the cls_cgroup classid in struct sock, filling up
an existing hole on 64-bit :)

The value is set at socket creation time.  So all sockets created
via socket(2) automatically gains the ID of the thread creating it.
Whenever another process touches the socket by either reading or
writing to it, we will change the socket classid to that of the
process if it has a valid (non-zero) classid.

For sockets created on inbound connections through accept(2), we
inherit the classid of the original listening socket through
sk_clone, possibly preceding the actual accept(2) call.

In order to minimise risks, I have not made this the authoritative
classid.  For now it is only used as a backup when we execute
with soft IRQs disabled.  Once we're completely happy with its
semantics we can use it as the sole classid.

Footnote: I have rearranged the error path on cls_group module
creation.  If we didn't do this, then there is a window where
someone could create a tc rule using cls_group before the cgroup
subsystem has been registered.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/include/net/cls_cgroup.h b/include/net/cls_cgroup.h
new file mode 100644
index 0000000..f0e244e
--- /dev/null
+++ b/include/net/cls_cgroup.h
@@ -0,0 +1,63 @@
+/*
+ * cls_cgroup.h			Control Group Classifier
+ * 
+ * Authors:	Thomas Graf <tgraf@suug.ch>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option) 
+ * any later version.
+ *
+ */
+
+#ifndef _NET_CLS_CGROUP_H
+#define _NET_CLS_CGROUP_H
+
+#include <linux/cgroup.h>
+#include <linux/hardirq.h>
+#include <linux/rcupdate.h>
+
+struct cgroup_cls_state
+{
+	struct cgroup_subsys_state css;
+	u32 classid;
+};
+
+#ifdef CONFIG_CGROUPS
+#ifdef CONFIG_NET_CLS_CGROUP
+static inline u32 task_cls_classid(struct task_struct *p)
+{
+	if (in_interrupt())
+		return 0;
+
+	return container_of(task_subsys_state(p, net_cls_subsys_id),
+			    struct cgroup_cls_state, css).classid;
+}
+#else
+extern int net_cls_subsys_id;
+
+static inline u32 task_cls_classid(struct task_struct *p)
+{
+	int id;
+	u32 classid;
+
+	if (in_interrupt())
+		return 0;
+
+	rcu_read_lock();
+	id = rcu_dereference(net_cls_subsys_id);
+	if (id >= 0)
+		classid = container_of(task_subsys_state(p, id),
+				       struct cgroup_cls_state, css)->classid;
+	rcu_read_unlock();
+
+	return classid;
+}
+#endif
+#else
+static inline u32 task_cls_classid(struct task_struct *p)
+{
+	return 0;
+}
+#endif
+#endif  /* _NET_CLS_CGROUP_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index 5697caf..d24f382 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -312,7 +312,7 @@ struct sock {
 	void			*sk_security;
 #endif
 	__u32			sk_mark;
-	/* XXX 4 bytes hole on 64 bit */
+	u32			sk_classid;
 	void			(*sk_state_change)(struct sock *sk);
 	void			(*sk_data_ready)(struct sock *sk, int bytes);
 	void			(*sk_write_space)(struct sock *sk);
@@ -1074,6 +1074,14 @@ extern void *sock_kmalloc(struct sock *sk, int size,
 extern void sock_kfree_s(struct sock *sk, void *mem, int size);
 extern void sk_send_sigurg(struct sock *sk);
 
+#ifdef CONFIG_CGROUPS
+extern void sock_update_classid(struct sock *sk);
+#else
+static inline void sock_update_classid(struct sock *sk)
+{
+}
+#endif
+
 /*
  * Functions to fill in entries in struct proto_ops when a protocol
  * does not implement a particular function.
diff --git a/net/core/sock.c b/net/core/sock.c
index bf88a16..a05ae7f 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -123,6 +123,7 @@
 #include <linux/net_tstamp.h>
 #include <net/xfrm.h>
 #include <linux/ipsec.h>
+#include <net/cls_cgroup.h>
 
 #include <linux/filter.h>
 
@@ -217,6 +218,11 @@ __u32 sysctl_rmem_default __read_mostly = SK_RMEM_MAX;
 int sysctl_optmem_max __read_mostly = sizeof(unsigned long)*(2*UIO_MAXIOV+512);
 EXPORT_SYMBOL(sysctl_optmem_max);
 
+#if defined(CONFIG_CGROUPS) && !defined(CONFIG_NET_CLS_CGROUP)
+int net_cls_subsys_id = -1;
+EXPORT_SYMBOL_GPL(net_cls_subsys_id);
+#endif
+
 static int sock_set_timeout(long *timeo_p, char __user *optval, int optlen)
 {
 	struct timeval tv;
@@ -1050,6 +1056,16 @@ static void sk_prot_free(struct proto *prot, struct sock *sk)
 	module_put(owner);
 }
 
+#ifdef CONFIG_CGROUPS
+void sock_update_classid(struct sock *sk)
+{
+	u32 classid = task_cls_classid(current);
+
+	if (classid && classid != sk->sk_classid)
+		sk->sk_classid = classid;
+}
+#endif
+
 /**
  *	sk_alloc - All socket objects are allocated here
  *	@net: the applicable net namespace
@@ -1073,6 +1089,8 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		sock_lock_init(sk);
 		sock_net_set(sk, get_net(net));
 		atomic_set(&sk->sk_wmem_alloc, 1);
+
+		sock_update_classid(sk);
 	}
 
 	return sk;
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index 2211803..766124b 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -16,14 +16,11 @@
 #include <linux/errno.h>
 #include <linux/skbuff.h>
 #include <linux/cgroup.h>
+#include <linux/rcupdate.h>
 #include <net/rtnetlink.h>
 #include <net/pkt_cls.h>
-
-struct cgroup_cls_state
-{
-	struct cgroup_subsys_state css;
-	u32 classid;
-};
+#include <net/sock.h>
+#include <net/cls_cgroup.h>
 
 static struct cgroup_subsys_state *cgrp_create(struct cgroup_subsys *ss,
 					       struct cgroup *cgrp);
@@ -112,6 +109,10 @@ static int cls_cgroup_classify(struct sk_buff *skb, struct tcf_proto *tp,
 	struct cls_cgroup_head *head = tp->root;
 	u32 classid;
 
+	rcu_read_lock();
+	classid = task_cls_state(current)->classid;
+	rcu_read_unlock();
+
 	/*
 	 * Due to the nature of the classifier it is required to ignore all
 	 * packets originating from softirq context as accessing `current'
@@ -122,12 +123,12 @@ static int cls_cgroup_classify(struct sk_buff *skb, struct tcf_proto *tp,
 	 * calls by looking at the number of nested bh disable calls because
 	 * softirqs always disables bh.
 	 */
-	if (softirq_count() != SOFTIRQ_OFFSET)
-		return -1;
-
-	rcu_read_lock();
-	classid = task_cls_state(current)->classid;
-	rcu_read_unlock();
+	if (softirq_count() != SOFTIRQ_OFFSET) {
+	 	/* If there is an sk_classid we'll use that. */
+		if (!skb->sk)
+			return -1;
+		classid = skb->sk->sk_classid;
+	}
 
 	if (!classid)
 		return -1;
@@ -289,18 +290,35 @@ static struct tcf_proto_ops cls_cgroup_ops __read_mostly = {
 
 static int __init init_cgroup_cls(void)
 {
-	int ret = register_tcf_proto_ops(&cls_cgroup_ops);
-	if (ret)
-		return ret;
+	int ret;
+
 	ret = cgroup_load_subsys(&net_cls_subsys);
 	if (ret)
-		unregister_tcf_proto_ops(&cls_cgroup_ops);
+		goto out;
+
+#ifndef CONFIG_NET_CLS_CGROUP
+	/* We can't use rcu_assign_pointer because this is an int. */
+	smp_wmb();
+	net_cls_subsys_id = net_cls_subsys.subsys_id;
+#endif
+
+	ret = register_tcf_proto_ops(&cls_cgroup_ops);
+	if (ret)
+		cgroup_unload_subsys(&net_cls_subsys);
+
+out:
 	return ret;
 }
 
 static void __exit exit_cgroup_cls(void)
 {
 	unregister_tcf_proto_ops(&cls_cgroup_ops);
+
+#ifndef CONFIG_NET_CLS_CGROUP
+	net_cls_subsys_id = -1;
+	synchronize_rcu();
+#endif
+
 	cgroup_unload_subsys(&net_cls_subsys);
 }
 
diff --git a/net/socket.c b/net/socket.c
index f9f7d08..367d547 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -94,6 +94,7 @@
 
 #include <net/compat.h>
 #include <net/wext.h>
+#include <net/cls_cgroup.h>
 
 #include <net/sock.h>
 #include <linux/netfilter.h>
@@ -558,6 +559,8 @@ static inline int __sock_sendmsg(struct kiocb *iocb, struct socket *sock,
 	struct sock_iocb *si = kiocb_to_siocb(iocb);
 	int err;
 
+	sock_update_classid(sock->sk);
+
 	si->sock = sock;
 	si->scm = NULL;
 	si->msg = msg;
@@ -684,6 +687,8 @@ static inline int __sock_recvmsg_nosec(struct kiocb *iocb, struct socket *sock,
 {
 	struct sock_iocb *si = kiocb_to_siocb(iocb);
 
+	sock_update_classid(sock->sk);
+
 	si->sock = sock;
 	si->scm = NULL;
 	si->msg = msg;
@@ -777,6 +782,8 @@ static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
 	if (unlikely(!sock->ops->splice_read))
 		return -EINVAL;
 
+	sock_update_classid(sock->sk);
+
 	return sock->ops->splice_read(sock, ppos, pipe, len, flags);
 }
 
@@ -3069,6 +3076,8 @@ int kernel_setsockopt(struct socket *sock, int level, int optname,
 int kernel_sendpage(struct socket *sock, struct page *page, int offset,
 		    size_t size, int flags)
 {
+	sock_update_classid(sock->sk);
+
 	if (sock->ops->sendpage)
 		return sock->ops->sendpage(sock, page, offset, size, flags);

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related

* [PATCH]: Fix warning in _hfcpci_softirq()
From: Prarit Bhargava @ 2010-05-22  0:33 UTC (permalink / raw)
  To: netdev, isdn; +Cc: Prarit Bhargava

Fix warning:

drivers/isdn/hardware/mISDN/hfcpci.c: In function `hfcpci_softirq':
drivers/isdn/hardware/mISDN/hfcpci.c:2321: warning: ignoring return value of `driver_for_each_device', declared with attribute warn_unused_result

Signed-off-by: Prarit Bhargava <prarit@redhat.com>

diff --git a/drivers/isdn/hardware/mISDN/hfcpci.c b/drivers/isdn/hardware/mISDN/hfcpci.c
index 5940a2c..12484d7 100644
--- a/drivers/isdn/hardware/mISDN/hfcpci.c
+++ b/drivers/isdn/hardware/mISDN/hfcpci.c
@@ -2317,8 +2317,10 @@ _hfcpci_softirq(struct device *dev, void *arg)
 static void
 hfcpci_softirq(void *arg)
 {
-	(void) driver_for_each_device(&hfc_driver.driver, NULL, arg,
-					_hfcpci_softirq);
+	int ret;
+
+	ret = driver_for_each_device(&hfc_driver.driver, NULL, arg,
+				     _hfcpci_softirq);
 
 	/* if next event would be in the past ... */
 	if ((s32)(hfc_jiffies + tics - jiffies) <= 0)

^ permalink raw reply related

* tc: RTM_GETQDISC causes kernel OOPS
From: Ben Pfaff @ 2010-05-21 22:42 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: netdev

Hi.  While working on some library code for working with qdiscs and
classes I came upon a kernel OOPS.  Originally I came across it with a
2.6.26 kernel, but I can also reproduce it with unmodified v2.6.34 from
kernel.org.

At the end of this mail I'm appending both an example of the OOPS and a
simple test program that reliably reproduces the problem for me when I
invoke it with "lo" as argument.  The program does not need to be run as
root.

After the OOPS, a lot of networking and other system functions stop
working, so it seems to me a serious issue.

The null pointer dereference that causes the OOPS is the dereference of
the return value of qdisc_dev() in tc_fill_qdisc() in
net/sched/sch_api.c line 1163:

    1161	tcm->tcm__pad1 = 0;
    1162	tcm->tcm__pad2 = 0;
    1163	tcm->tcm_ifindex = qdisc_dev(q)->ifindex;
    1164	tcm->tcm_parent = clid;
    1165	tcm->tcm_handle = q->handle;

I am pretty sure about that, because if I add "WARN_ON(!qdisc_dev(q));"
just before line 1163 then that warning triggers.

Thanks,

Ben.

----------------------------------------------------------------------

BUG: unable to handle kernel NULL pointer dereference at 00000050
IP: [<c12280c0>] tc_fill_qdisc+0x68/0x1e5
*pde = 00000000 
Oops: 0000 [#1] SMP 
last sysfs file: 
Modules linked in:

Pid: 600, comm: qdisc Not tainted 2.6.34 #16 /
EIP: 0060:[<c12280c0>] EFLAGS: 00010282 CPU: 0
EIP is at tc_fill_qdisc+0x68/0x1e5
EAX: 00000000 EBX: ffffffff ECX: 00000000 EDX: c7222070
ESI: c14576e0 EDI: c7115200 EBP: c7239ca0 ESP: c7239c3c
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process qdisc (pid: 600, ti=c7239000 task=c720b700 task.ti=c7239000)
Stack:
 00000024 00000014 00000000 c14323a0 c7222060 c7222060 c10a7abd 00001030
<0> 000000d0 c7222060 000000d0 c1228329 000000d0 00000fc4 000000d0 c7115200
<0> 000000d0 00000ec0 c7239cac c12104b1 00000ec0 c1457a98 c7115200 00000258
Call Trace:
 [<c10a7abd>] ? __kmalloc_track_caller+0x122/0x131
 [<c1228329>] ? qdisc_notify+0x2a/0xc8
 [<c12104b1>] ? __alloc_skb+0x4e/0x115
 [<c122838a>] ? qdisc_notify+0x8b/0xc8
 [<c12287ea>] ? tc_get_qdisc+0x143/0x15d
 [<c12286a7>] ? tc_get_qdisc+0x0/0x15d
 [<c1220c28>] ? rtnetlink_rcv_msg+0x195/0x1af
 [<c1220a93>] ? rtnetlink_rcv_msg+0x0/0x1af
 [<c12329e2>] ? netlink_rcv_skb+0x30/0x75
 [<c1220a8b>] ? rtnetlink_rcv+0x1e/0x26
 [<c123281b>] ? netlink_unicast+0xc4/0x11a
 [<c1232eda>] ? netlink_sendmsg+0x223/0x230
 [<c120978c>] ? sock_sendmsg+0xa8/0xbf
 [<c10514fc>] ? print_lock_contention_bug+0x14/0xd7
 [<c101ebe4>] ? __wake_up+0x15/0x3b
 [<c101ebe4>] ? __wake_up+0x15/0x3b
 [<c101ec00>] ? __wake_up+0x31/0x3b
 [<c10acda4>] ? fget_light+0x2d/0xaf
 [<c109488d>] ? might_fault+0x47/0x81
 [<c120acf9>] ? sys_sendto+0xa4/0xc0
 [<c116b82c>] ? _copy_from_user+0x2e/0x108
 [<c120ad92>] ? sys_connect+0x63/0x6e
 [<c120ad2d>] ? sys_send+0x18/0x1a
 [<c120aedb>] ? sys_socketcall+0xd4/0x1a5
 [<c12ca931>] ? syscall_call+0x7/0xb
Code: 50 8b 55 08 89 f8 6a 14 ff 75 14 e8 49 fa ff ff 89 c2 83 c2 10 89 45 ac c6 42 01 00 66 c7 42 02 00 00 c6 40 10 00 8b 46 40 8b 00 <8b> 40 50 89 5a 0c 89 42 04 8b 46 20 89 42 08 8b 46 28 89 42 10 
EIP: [<c12280c0>] tc_fill_qdisc+0x68/0x1e5 SS:ESP 0068:c7239c3c
CR2: 0000000000000050
---[ end trace 6fb85bbc66de8f42 ]---

----------------------------------------------------------------------

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/rtnetlink.h>
#include <linux/pkt_sched.h>
#include <net/if.h>

int
main(int argc, char *argv[])
{
    struct {
        struct nlmsghdr nlmsg;
        struct tcmsg tcmsg;
    } msg;

    struct sockaddr_nl local, remote;
    int ifindex;
    int fd;

    if (argc != 2) {
        fprintf(stderr,
                "usage: %s <netdev>\n"
                "where <netdev> is a network device, e.g. \"lo\"\n",
                argv[0]);
        return EXIT_FAILURE;
    }

    /* Get ifindex. */
    ifindex = if_nametoindex(argv[1]);
    if (!ifindex) {
        fprintf(stderr, "no network device named \"%s\"", argv[1]);
        return EXIT_FAILURE;
    }

    /* Make rtnetlink socket. */
    fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0) {
        perror("socket");
        return EXIT_FAILURE;
    }

    /* Bind local address as our selected pid. */
    memset(&local, 0, sizeof local);
    local.nl_family = AF_NETLINK;
    local.nl_pid = getpid();
    if (bind(fd, (struct sockaddr *) &local, sizeof local) < 0) {
        perror("bind");
        return EXIT_FAILURE;
    }

    /* Bind remote address as the kernel (pid 0). */
    memset(&remote, 0, sizeof remote);
    remote.nl_family = AF_NETLINK;
    remote.nl_pid = 0;
    if (connect(fd, (struct sockaddr *) &remote, sizeof remote) < 0) {
        perror("connect");
        return EXIT_FAILURE;
    }

    /* Send "get" request. */
    memset(&msg, 0, sizeof msg);
    msg.nlmsg.nlmsg_len = sizeof msg;
    msg.nlmsg.nlmsg_type = RTM_GETQDISC;
    msg.nlmsg.nlmsg_flags = NLM_F_REQUEST | NLM_F_ECHO | NLM_F_ACK;
    msg.nlmsg.nlmsg_seq = 1;
    msg.nlmsg.nlmsg_pid = getpid();
    msg.tcmsg.tcm_family = AF_UNSPEC;
    msg.tcmsg.tcm_ifindex = ifindex;
    msg.tcmsg.tcm_handle = 0;
    msg.tcmsg.tcm_parent = TC_H_ROOT;
    if (send(fd, &msg, sizeof msg, 0) < 0) {
        perror("send");
        return EXIT_FAILURE;
    }

    return 0;
}

^ permalink raw reply

* Re: [PATCH 1/4] IPv6: keep route for tentative address
From: Emil S Tantilov @ 2010-05-21 22:21 UTC (permalink / raw)
  To: Stephen Hemminger, David S. Miller, NetDev; +Cc: Tantilov, Emil S
In-Reply-To: <EA929A9653AAE14F841771FB1DE5A1365FE4FCE12C@rrsmsx501.amr.corp.intel.com>

On Mon, Apr 12, 2010 at 1:17 PM, Tantilov, Emil S
<emil.s.tantilov@intel.com> wrote:
> Stephen Hemminger wrote:
>> Recent changes preserve IPv6 address when link goes down (good).
>> But would cause address to point to dead dst entry (bad).
>> The simplest fix is to just not delete route if address is
>> being held for later use.
>>
>> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>>
>>
>> --- a/net/ipv6/addrconf.c     2010-04-11 12:19:37.938082190 -0700
>> +++ b/net/ipv6/addrconf.c     2010-04-11 12:25:05.349309074 -0700
>> @@ -4046,7 +4046,8 @@ static void __ipv6_ifa_notify(int event,
>>                       addrconf_leave_anycast(ifp);
>>               addrconf_leave_solict(ifp->idev, &ifp->addr);
>>               dst_hold(&ifp->rt->u.dst);
>> -             if (ip6_del_rt(ifp->rt))
>> +
>> +             if (ifp->dead && ip6_del_rt(ifp->rt))
>>                       dst_free(&ifp->rt->u.dst);
>>               break;
>>       }
>
> Stephen,
>
> With these series of patches (1-4) applied I am no longer seeing the
> hangs and warnings associated with ipv6. I ran few rounds of
> tests with resetting the interface and loading/unloading modules.

Looks like these patches did not make it into net-2.6 and the issue can be seen
in the latest stable 2.6.34.

Any chance that the patches can be pushed to 2.6.34-stable?

Thanks,
Emil

^ permalink raw reply

* Re: bug fix patch lost: git problem or just incorrect merge?
From: David Miller @ 2010-05-21 22:07 UTC (permalink / raw)
  To: torvalds; +Cc: James.Bottomley, linux-kernel, netdev, linux-scsi
In-Reply-To: <alpine.LFD.2.00.1005210931010.4243@i5.linux-foundation.org>

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 21 May 2010 09:45:49 -0700 (PDT)

> One of the reasons I ask people to let me merge is that it results in 
> cleaner history to not have criss-cross merges. And another is that I'm 
> pretty good at it, and letting me make merges also means that I am more 
> aware of problem spots.

That wasn't possible in this case.

This happened more than a week ago, as I needed to merge your tree into
net-2.6 to resolve a conflict there.  That's what took in the iscsi
bug fix, and this is way before the merge window.

Then I needed to pull net-2.6 into net-next-2.6 to resolve conflicts
existing between those two trees.

And this is why I ended up having to do the merge :-)

>> Either way, of course, we need the patch back ...
> 
> I'll fix it up.

Thanks Linus.

^ permalink raw reply

* Re: bug fix patch lost: git problem or just incorrect merge?
From: David Miller @ 2010-05-21 22:02 UTC (permalink / raw)
  To: James.Bottomley; +Cc: torvalds, linux-kernel, netdev, linux-scsi
In-Reply-To: <1274456515.9022.14.camel@mulgrave.site>

From: James Bottomley <James.Bottomley@suse.de>
Date: Fri, 21 May 2010 10:41:55 -0500

> Is this a git problem ... or is it just a mismerge in the net tree?

Mismerge, because sk->sk_sleep() doesn't exist any more I mistakenly
updated the original line to do the sk_sleep() stuff.

Sorry about that.

^ permalink raw reply

* [PATCH] net-caif: drop redundant Kconfig entries
From: Mike Frysinger @ 2010-05-21 20:45 UTC (permalink / raw)
  To: netdev, David S. Miller, Sjur Braendeland

There is already a submenu entry that is always displayed, so there is
no need to also show a dedicated CAIF comment.

Drop dead commented code while we're here, and change the submenu text
to better match the style everyone else is using.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 net/caif/Kconfig |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/net/caif/Kconfig b/net/caif/Kconfig
index cd1daf6..ed65178 100644
--- a/net/caif/Kconfig
+++ b/net/caif/Kconfig
@@ -2,10 +2,8 @@
 # CAIF net configurations
 #
 
-#menu "CAIF Support"
-comment "CAIF Support"
 menuconfig CAIF
-	tristate "Enable CAIF support"
+	tristate "CAIF support"
 	select CRC_CCITT
 	default n
 	---help---
@@ -45,4 +43,3 @@ config CAIF_NETDEV
 	If unsure say Y.
 
 endif
-#endmenu
-- 
1.7.1


^ permalink raw reply related

* ixgbe: macvlan on PF/VF when SRIOV is enabled
From: Shirley Ma @ 2010-05-21 20:30 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, kvm, netdev, e1000-devel

Hello Jeff,

macvlan doesn't work on PF when SRIOV is enabled. Creating macvlan has
been successful, but ping (icmp request) goes to VF interface not
PF/macvlan even arp entry is correct. I patched ixgbe driver, and
macvlan/PF has worked with the patch. But I am not sure whether it is
right since I don't have the HW spec. What I did for ixgbe driver was: 

1. PF's rar index is 0, VMDQ index is adatper->num_vfs;
2. VF's rar is based on rar_used_count and mc_addr_in_rar_count, VMDQ
index is ;
3. PF's secondary addresses is PF's rar index + i, VMDQ index is
adapter->num_vfs.


Before I submit the patch, I want to understand the right index
assignment for both rar index and VMDQ index, when SRIOV enabled:
1. VMDQ index for PF is adapter->num_vfs, or 0? rar index is 0?
2. PF's secondary address rar index is based on
rar_used_count/mc_addr_in_rar_count?
2. VF's VPDQ index is based on vf number?
3. VF's rar index is vf + 1, or should be based on rar_used_count?

I am also working on macvlan on VF. The question here is whether macvlan
on VF should work or not? Looks like ixgbevf secondary addresses are not
in receiver address filter, so macvlan on VF doesn't work.

Thanks
Shirley


^ permalink raw reply

* [RFC PATCH] packet_mmap: expose hw packet timestamps to network packet capture utilities
From: Mcmillan, Scott A @ 2010-05-21 20:24 UTC (permalink / raw)
  To: netdev@vger.kernel.org, davem@davemloft.net
  Cc: tcpdump-workers@lists.tcpdump.org

This patch adds a setting, PACKET_TIMESTAMP, to specify the packet timestamp source that is exported to capture utilities like tcpdump by packet_mmap.  

PACKET_TIMESTAMP accepts the same integer bit field as SO_TIMESTAMPING.  However, only the SOF_TIMESTAMPING_SYS_HARDWARE
and SOF_TIMESTAMPING_RAW_HARDWARE values are currently recognized by PACKET_TIMESTAMP.  SOF_TIMESTAMPING_SYS_HARDWARE takes precedence over SOF_TIMESTAMPING_RAW_HARDWARE if both bits are set.

If PACKET_TIMESTAMP is not set, a software timestamp generated inside the networking stack is used (the behavior before this setting was added).

I am concurrently submitting a patch to the tcpdump / libpcap maintainers adding support for this capability.  

Thanks,
Scott

Signed-off-by: Scott McMillan <scott.a.mcmillan@intel.com>

--- a/include/linux/if_packet.h	2010-05-18 17:22:59.000000000 -0500
+++ b/include/linux/if_packet.h	2010-05-21 14:37:30.000000000 -0500
@@ -48,6 +48,7 @@
 #define PACKET_LOSS			14
 #define PACKET_VNET_HDR			15
 #define PACKET_TX_TIMESTAMP		16
+#define PACKET_TIMESTAMP		17
 
 struct tpacket_stats {
 	unsigned int	tp_packets;
--- a/net/packet/af_packet.c	2010-05-18 17:21:48.000000000 -0500
+++ b/net/packet/af_packet.c	2010-05-21 14:48:41.000000000 -0500
@@ -83,6 +83,7 @@
 #include <linux/if_vlan.h>
 #include <linux/virtio_net.h>
 #include <linux/errqueue.h>
+#include <linux/net_tstamp.h>
 
 #ifdef CONFIG_INET
 #include <net/inet_common.h>
@@ -202,6 +203,7 @@
 	unsigned int		tp_hdrlen;
 	unsigned int		tp_reserve;
 	unsigned int		tp_loss:1;
+	unsigned int		tp_tstamp;
 	struct packet_type	prot_hook ____cacheline_aligned_in_smp;
 };
 
@@ -656,6 +658,7 @@
 	struct sk_buff *copy_skb = NULL;
 	struct timeval tv;
 	struct timespec ts;
+	struct skb_shared_hwtstamps *shhwtstamps = skb_hwtstamps(skb);
 
 	if (skb->pkt_type == PACKET_LOOPBACK)
 		goto drop;
@@ -737,7 +740,13 @@
 		h.h1->tp_snaplen = snaplen;
 		h.h1->tp_mac = macoff;
 		h.h1->tp_net = netoff;
-		if (skb->tstamp.tv64)
+		if ((po->tp_tstamp & SOF_TIMESTAMPING_SYS_HARDWARE)
+				&& shhwtstamps->syststamp.tv64)
+			tv = ktime_to_timeval(shhwtstamps->syststamp);
+		else if ((po->tp_tstamp & SOF_TIMESTAMPING_RAW_HARDWARE)
+				&& shhwtstamps->hwtstamp.tv64)
+			tv = ktime_to_timeval(shhwtstamps->hwtstamp);
+		else if (skb->tstamp.tv64)
 			tv = ktime_to_timeval(skb->tstamp);
 		else
 			do_gettimeofday(&tv);
@@ -750,7 +759,13 @@
 		h.h2->tp_snaplen = snaplen;
 		h.h2->tp_mac = macoff;
 		h.h2->tp_net = netoff;
-		if (skb->tstamp.tv64)
+		if ((po->tp_tstamp & SOF_TIMESTAMPING_SYS_HARDWARE)
+				&& shhwtstamps->syststamp.tv64)
+			ts = ktime_to_timespec(shhwtstamps->syststamp);
+		else if ((po->tp_tstamp & SOF_TIMESTAMPING_RAW_HARDWARE)
+				&& shhwtstamps->hwtstamp.tv64)
+			ts = ktime_to_timespec(shhwtstamps->hwtstamp);
+		else if (skb->tstamp.tv64)
 			ts = ktime_to_timespec(skb->tstamp);
 		else
 			getnstimeofday(&ts);
@@ -2027,6 +2042,18 @@
 		po->has_vnet_hdr = !!val;
 		return 0;
 	}
+	case PACKET_TIMESTAMP:
+	{
+		int val;
+
+		if (optlen != sizeof(val))
+			return -EINVAL;
+		if (copy_from_user(&val, optval, sizeof(val)))
+			return -EFAULT;
+
+		po->tp_tstamp = val;
+		return 0;
+	}
 	default:
 		return -ENOPROTOOPT;
 	}
@@ -2119,6 +2146,12 @@
 		val = po->tp_loss;
 		data = &val;
 		break;
+	case PACKET_TIMESTAMP:
+		if (len > sizeof(int))
+			len = sizeof(int);
+		val = po->tp_tstamp;
+		data = &val;
+		break;
 	default:
 		return -ENOPROTOOPT;
 	}
--- a/Documentation/networking/packet_mmap.txt	2010-05-18 17:24:18.000000000 -0500
+++ b/Documentation/networking/packet_mmap.txt	2010-05-21 14:39:48.000000000 -0500
@@ -493,6 +493,32 @@
     pfd.events = POLLOUT;
     retval = poll(&pfd, 1, timeout);
 
+-------------------------------------------------------------------------------
++ PACKET_TIMESTAMP
+-------------------------------------------------------------------------------
+
+The PACKET_TIMESTAMP setting determines the source of the timestamp in
+the packet meta information.  If your NIC is capable of timestamping
+packets in hardware, you can request those hardware timestamps to used.
+Note: you may need to enable the generation of hardware timestamps with
+SIOCSHWTSTAMP.
+
+PACKET_TIMESTAMP accepts the same integer bit field as
+SO_TIMESTAMPING.  However, only the SOF_TIMESTAMPING_SYS_HARDWARE
+and SOF_TIMESTAMPING_RAW_HARDWARE values are recognized by
+PACKET_TIMESTAMP.  SOF_TIMESTAMPING_SYS_HARDWARE takes precedence over
+SOF_TIMESTAMPING_RAW_HARDWARE if both bits are set.
+
+    int req = 0;
+    req |= SOF_TIMESTAMPING_SYS_HARDWARE;
+    setsockopt(fd, SOL_PACKET, PACKET_TIMESTAMP, (void *) &req, sizeof(req))
+
+If PACKET_TIMESTAMP is not set, a software timestamp generated inside
+the networking stack is used (the behavior before this setting was added).
+
+See include/linux/net_tstamp.h and Documentation/networking/timestamping
+for more information on hardware timestamps.
+
 --------------------------------------------------------------------------------
 + THANKS
 --------------------------------------------------------------------------------

^ permalink raw reply

* RE: [PATCH] net: add additional lock to qdisc to increase throughput
From: Duyck, Alexander H @ 2010-05-21 20:04 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <1274454480.2439.418.camel@edumazet-laptop>

Eric Dumazet wrote:
> Tests with following script gave a boost from ~50.000 pps to ~600.000
> pps on a dual quad core machine (E5450 @3.00GHz), tg3 driver.
> (A single netperf flow can reach ~800.000 pps on this platform)
> 
> for j in `seq 0 3`; do
>   for i in `seq 0 7`; do
>     netperf -H 192.168.0.1 -t UDP_STREAM -l 60 -N -T $i -- -m 6 &
>   done
> done

Running the same script with your patch my results went from 200Kpps to 1.2Mpps on a dual Xeon 5570.

Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>


^ permalink raw reply

* RE: ixgbe and SRIOV failure in driver?
From: Rose, Gregory V @ 2010-05-21 19:18 UTC (permalink / raw)
  To: Fischer, Anna, e1000-devel@lists.sourceforge.net,
	netdev@vger.kernel.org
In-Reply-To: <0199E0D51A61344794750DC57738F58E70B08A2A20@GVW1118EXC.americas.hpqcorp.net>

>-----Original Message-----
>From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
>On Behalf Of Fischer, Anna
>Sent: Friday, May 21, 2010 9:33 AM
>To: e1000-devel@lists.sourceforge.net; netdev@vger.kernel.org
>Subject: ixgbe and SRIOV failure in driver?
>
>I am running a system with 3 Intel NICs. Two of them are 82598 devices,
>and one is a SRIOV capable 82599.
>
>All devices use the ixgbe driver. What happens (I believe) now is that
>when the driver loads at first, it sees the 82598 first (because of its
>position in the PCI tree) and then it says "Device not IOV capable -
>switching off IOV."
>
>So then it switches into non-IOV mode, and I can never enable SRIOV on
>my 82599, because the driver does not enable it any more for further
>devices.
>
>So to get around this issue, I tried to use pciback.hide to hide the
>82598 devices from the OS. That way I was hoping that the driver would
>switch on SRIOV on my 82599. However, then I got a kernel panic on boot
>(see below).
>
>I am running Xen 4 and the Dom0 kernel is a 2.6.31 kernel.

The ixgbe driver included with the 2.6.31 kernel does not support SR-IOV.
Where did you get the driver that does?  Please run ethtool -i <ethx> and
post the results.

- Greg Rose
Intel Corp.
Lan Access Division


^ permalink raw reply

* Re: [PATCH v2 2/2] phylib: Convert MDIO bitbang to new MDIO 45 format
From: Ben Hutchings @ 2010-05-21 18:55 UTC (permalink / raw)
  To: Andy Fleming; +Cc: davem, netdev
In-Reply-To: <1274466711-24962-3-git-send-email-afleming@freescale.com>

On Fri, 2010-05-21 at 13:31 -0500, Andy Fleming wrote:
> Now that we've added somewhat more complete MDIO 45 support to the PHY
> Lib, convert the MDIO bitbang driver to use this new infrastructure.
> 
> Signed-off-by: Andy Fleming <afleming@freescale.com>
> ---
>  drivers/net/phy/mdio-bitbang.c |   29 +++++++++++++++--------------
>  1 files changed, 15 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/phy/mdio-bitbang.c b/drivers/net/phy/mdio-bitbang.c
> index 2f6f02e..be7ae74 100644
> --- a/drivers/net/phy/mdio-bitbang.c
> +++ b/drivers/net/phy/mdio-bitbang.c
[...]
> @@ -157,11 +154,13 @@ static int mdiobb_read(struct mii_bus *bus, int phy, int devad, int reg)
>  	struct mdiobb_ctrl *ctrl = bus->priv;
>  	int ret, i;
>  
> -	if (reg & MII_ADDR_C45) {
> -		reg = mdiobb_cmd_addr(ctrl, phy, reg);
> -		mdiobb_cmd(ctrl, MDIO_C45_READ, phy, reg);
> -	} else
> +	/* Clause 22 PHYs only use devad = 0, and Clause 45 only use nonzero */
> +	if (devad == MDIO_DEVAD_NONE)
>  		mdiobb_cmd(ctrl, MDIO_READ, phy, reg);
> +	else {
> +		mdiobb_cmd_addr(ctrl, phy, devad, reg);
> +		mdiobb_cmd(ctrl, MDIO_C45_READ, phy, devad);
> +	}
[...]

This comment is now wrong.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox