Netdev List



* Re: libpcap and tc filters
From: Adam Katz @ 2011-07-05 14:21 UTC (permalink / raw)
  To: jhs; +Cc: netdev
In-Reply-To: <1309874213.1765.45.camel@mojatatu>

Yes. I understand the difference between ETH_P_ALL and ETH_P_IP...

Jamal, I've now tested both solutions - changing the rule to "protocol
all" and patching tcpreplay to use ETH_P_IP and both produced the
exact same problem as before...


On Tue, Jul 5, 2011 at 4:56 PM, jamal <hadi@cyberus.ca> wrote:
> On Tue, 2011-07-05 at 16:07 +0300, Adam Katz wrote:
>
>> second, I just took a look at the libpcap source code and it seems it's using
>> the same ETH_P_ALL option when binding to an interface. So based on
>> what you're saying, the same solution of patching libpcap and
>> replacing ETH_P_ALL with  ETH_P_IP should also make these rules work
>> with traffic sent using pure libpcap or any libpcap - based
>> application.
>
> ETH_P_ALL makes sense if you are unsure it is going to be IP. So i would
> change/optimize apps only for IP if they are intended to deal with IP
> only (same for ARP etc).
> In your case, it seems it is tcp only - which runs on top of IP. So
> it makes sense to do it for that specific use case etc.
>
> cheers,
> jamal
>
>
>
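
The protocol mismatch discussed in this thread can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the function name `filter_protocol_matches` and the `SKETCH_` constants are hypothetical, the constant values mirror `ETH_P_ALL` and `ETH_P_IP` from `<linux/if_ether.h>`, byte order is ignored, and the comparison is a simplification of the classifier's behaviour rather than the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in <linux/if_ether.h>, repeated here so the sketch
 * builds without kernel headers. */
#define SKETCH_ETH_P_ALL 0x0003
#define SKETCH_ETH_P_IP  0x0800

/*
 * Hypothetical model: a filter is consulted for a packet when the filter's
 * protocol equals the protocol recorded for the packet, or when the filter
 * was added with "protocol all".
 */
bool filter_protocol_matches(uint16_t filter_proto, uint16_t pkt_proto)
{
	assert(filter_proto != 0);
	return filter_proto == pkt_proto || filter_proto == SKETCH_ETH_P_ALL;
}
```

Under this model a "protocol ip" filter does not see a packet whose recorded protocol is `ETH_P_ALL`, while a "protocol all" filter does, which is why both workarounds mentioned above were worth testing.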


* [net-next PATCH v2 2/2] dcbnl: Add CEE notification
From: Shmulik Ravid @ 2011-07-05 16:16 UTC (permalink / raw)
  To: davem; +Cc: John Fastabend, netdev

This patch adds an unsolicited notification of the DCBX negotiated
parameters for the CEE flavor of the DCBX protocol. The notification
message is identical to the aggregated CEE get operation and holds all
the pertinent local and peer information. The notification routine is
exported so it can be invoked by drivers supporting an embedded DCBX
stack.

Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
---
 include/net/dcbnl.h |    5 +-
 net/dcb/dcbnl.c     |  415 ++++++++++++++++++++++++++++-----------------------
 2 files changed, 229 insertions(+), 191 deletions(-)

diff --git a/include/net/dcbnl.h b/include/net/dcbnl.h
index d5bbb79..f5aa399 100644
--- a/include/net/dcbnl.h
+++ b/include/net/dcbnl.h
@@ -34,7 +34,10 @@ int dcb_ieee_setapp(struct net_device *, struct dcb_app *);
 int dcb_ieee_delapp(struct net_device *, struct dcb_app *);
 u8 dcb_ieee_getapp_mask(struct net_device *, struct dcb_app *);
 
-int dcbnl_notify(struct net_device *dev, int event, int cmd, u32 seq, u32 pid);
+int dcbnl_ieee_notify(struct net_device *dev, int event, int cmd,
+		      u32 seq, u32 pid);
+int dcbnl_cee_notify(struct net_device *dev, int event, int cmd,
+		     u32 seq, u32 pid);
 
 /*
  * Ops struct for the netlink callbacks.  Used by DCB-enabled drivers through
diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index d5b45a2..6a015f2 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -1310,8 +1310,196 @@ nla_put_failure:
 	return err;
 }
 
-int dcbnl_notify(struct net_device *dev, int event, int cmd,
-			u32 seq, u32 pid)
+static int dcbnl_cee_pg_fill(struct sk_buff *skb, struct net_device *dev,
+			     int dir)
+{
+	u8 pgid, up_map, prio, tc_pct;
+	const struct dcbnl_rtnl_ops *ops = dev->dcbnl_ops;
+	int i = dir ? DCB_ATTR_CEE_TX_PG : DCB_ATTR_CEE_RX_PG;
+	struct nlattr *pg = nla_nest_start(skb, i);
+
+	if (!pg)
+		goto nla_put_failure;
+
+	for (i = DCB_PG_ATTR_TC_0; i <= DCB_PG_ATTR_TC_7; i++) {
+		struct nlattr *tc_nest = nla_nest_start(skb, i);
+
+		if (!tc_nest)
+			goto nla_put_failure;
+
+		pgid = DCB_ATTR_VALUE_UNDEFINED;
+		prio = DCB_ATTR_VALUE_UNDEFINED;
+		tc_pct = DCB_ATTR_VALUE_UNDEFINED;
+		up_map = DCB_ATTR_VALUE_UNDEFINED;
+
+		if (!dir)
+			ops->getpgtccfgrx(dev, i - DCB_PG_ATTR_TC_0,
+					  &prio, &pgid, &tc_pct, &up_map);
+		else
+			ops->getpgtccfgtx(dev, i - DCB_PG_ATTR_TC_0,
+					  &prio, &pgid, &tc_pct, &up_map);
+
+		NLA_PUT_U8(skb, DCB_TC_ATTR_PARAM_PGID, pgid);
+		NLA_PUT_U8(skb, DCB_TC_ATTR_PARAM_UP_MAPPING, up_map);
+		NLA_PUT_U8(skb, DCB_TC_ATTR_PARAM_STRICT_PRIO, prio);
+		NLA_PUT_U8(skb, DCB_TC_ATTR_PARAM_BW_PCT, tc_pct);
+		nla_nest_end(skb, tc_nest);
+	}
+
+	for (i = DCB_PG_ATTR_BW_ID_0; i <= DCB_PG_ATTR_BW_ID_7; i++) {
+		tc_pct = DCB_ATTR_VALUE_UNDEFINED;
+
+		if (!dir)
+			ops->getpgbwgcfgrx(dev, i - DCB_PG_ATTR_BW_ID_0,
+					   &tc_pct);
+		else
+			ops->getpgbwgcfgtx(dev, i - DCB_PG_ATTR_BW_ID_0,
+					   &tc_pct);
+		NLA_PUT_U8(skb, i, tc_pct);
+	}
+	nla_nest_end(skb, pg);
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
+static int dcbnl_cee_fill(struct sk_buff *skb, struct net_device *netdev)
+{
+	struct nlattr *cee, *app;
+	struct dcb_app_type *itr;
+	const struct dcbnl_rtnl_ops *ops = netdev->dcbnl_ops;
+	int dcbx, i, err = -EMSGSIZE;
+	u8 value;
+
+	NLA_PUT_STRING(skb, DCB_ATTR_IFNAME, netdev->name);
+
+	cee = nla_nest_start(skb, DCB_ATTR_CEE);
+	if (!cee)
+		goto nla_put_failure;
+
+	/* local pg */
+	if (ops->getpgtccfgtx && ops->getpgbwgcfgtx) {
+		err = dcbnl_cee_pg_fill(skb, netdev, 1);
+		if (err)
+			goto nla_put_failure;
+	}
+
+	if (ops->getpgtccfgrx && ops->getpgbwgcfgrx) {
+		err = dcbnl_cee_pg_fill(skb, netdev, 0);
+		if (err)
+			goto nla_put_failure;
+	}
+
+	/* local pfc */
+	if (ops->getpfccfg) {
+		struct nlattr *pfc_nest = nla_nest_start(skb, DCB_ATTR_CEE_PFC);
+
+		if (!pfc_nest)
+			goto nla_put_failure;
+
+		for (i = DCB_PFC_UP_ATTR_0; i <= DCB_PFC_UP_ATTR_7; i++) {
+			ops->getpfccfg(netdev, i - DCB_PFC_UP_ATTR_0, &value);
+			NLA_PUT_U8(skb, i, value);
+		}
+		nla_nest_end(skb, pfc_nest);
+	}
+
+	/* local app */
+	spin_lock(&dcb_lock);
+	app = nla_nest_start(skb, DCB_ATTR_CEE_APP_TABLE);
+	if (!app)
+		goto nla_put_failure;
+
+	list_for_each_entry(itr, &dcb_app_list, list) {
+		if (strncmp(itr->name, netdev->name, IFNAMSIZ) == 0) {
+			struct nlattr *app_nest = nla_nest_start(skb,
+								 DCB_ATTR_APP);
+			if (!app_nest)
+				goto dcb_unlock;
+
+			err = nla_put_u8(skb, DCB_APP_ATTR_IDTYPE,
+					 itr->app.selector);
+			if (err)
+				goto dcb_unlock;
+
+			err = nla_put_u16(skb, DCB_APP_ATTR_ID,
+					  itr->app.protocol);
+			if (err)
+				goto dcb_unlock;
+
+			err = nla_put_u8(skb, DCB_APP_ATTR_PRIORITY,
+					 itr->app.priority);
+			if (err)
+				goto dcb_unlock;
+
+			nla_nest_end(skb, app_nest);
+		}
+	}
+	nla_nest_end(skb, app);
+
+	if (netdev->dcbnl_ops->getdcbx)
+		dcbx = netdev->dcbnl_ops->getdcbx(netdev);
+	else
+		dcbx = -EOPNOTSUPP;
+
+	spin_unlock(&dcb_lock);
+
+	/* features flags */
+	if (ops->getfeatcfg) {
+		struct nlattr *feat = nla_nest_start(skb, DCB_ATTR_CEE_FEAT);
+		if (!feat)
+			goto nla_put_failure;
+
+		for (i = DCB_FEATCFG_ATTR_ALL + 1; i <= DCB_FEATCFG_ATTR_MAX;
+		     i++)
+			if (!ops->getfeatcfg(netdev, i, &value))
+				NLA_PUT_U8(skb, i, value);
+
+		nla_nest_end(skb, feat);
+	}
+
+	/* peer info if available */
+	if (ops->cee_peer_getpg) {
+		struct cee_pg pg;
+		err = ops->cee_peer_getpg(netdev, &pg);
+		if (!err)
+			NLA_PUT(skb, DCB_ATTR_CEE_PEER_PG, sizeof(pg), &pg);
+	}
+
+	if (ops->cee_peer_getpfc) {
+		struct cee_pfc pfc;
+		err = ops->cee_peer_getpfc(netdev, &pfc);
+		if (!err)
+			NLA_PUT(skb, DCB_ATTR_CEE_PEER_PFC, sizeof(pfc), &pfc);
+	}
+
+	if (ops->peer_getappinfo && ops->peer_getapptable) {
+		err = dcbnl_build_peer_app(netdev, skb,
+					   DCB_ATTR_CEE_PEER_APP_TABLE,
+					   DCB_ATTR_CEE_PEER_APP_INFO,
+					   DCB_ATTR_CEE_PEER_APP);
+		if (err)
+			goto nla_put_failure;
+	}
+	nla_nest_end(skb, cee);
+
+	/* DCBX state */
+	if (dcbx >= 0) {
+		err = nla_put_u8(skb, DCB_ATTR_DCBX, dcbx);
+		if (err)
+			goto nla_put_failure;
+	}
+	return 0;
+
+dcb_unlock:
+	spin_unlock(&dcb_lock);
+nla_put_failure:
+	return err;
+}
+
+static int dcbnl_notify(struct net_device *dev, int event, int cmd,
+			u32 seq, u32 pid, int dcbx_ver)
 {
 	struct net *net = dev_net(dev);
 	struct sk_buff *skb;
@@ -1337,7 +1525,11 @@ int dcbnl_notify(struct net_device *dev, int event, int cmd,
 	dcb->dcb_family = AF_UNSPEC;
 	dcb->cmd = cmd;
 
-	err = dcbnl_ieee_fill(skb, dev);
+	if (dcbx_ver == DCB_CAP_DCBX_VER_IEEE)
+		err = dcbnl_ieee_fill(skb, dev);
+	else
+		err = dcbnl_cee_fill(skb, dev);
+
 	if (err < 0) {
 		/* Report error to broadcast listeners */
 		nlmsg_cancel(skb, nlh);
@@ -1351,7 +1543,20 @@ int dcbnl_notify(struct net_device *dev, int event, int cmd,
 
 	return err;
 }
-EXPORT_SYMBOL(dcbnl_notify);
+
+int dcbnl_ieee_notify(struct net_device *dev, int event, int cmd,
+		      u32 seq, u32 pid)
+{
+	return dcbnl_notify(dev, event, cmd, seq, pid, DCB_CAP_DCBX_VER_IEEE);
+}
+EXPORT_SYMBOL(dcbnl_ieee_notify);
+
+int dcbnl_cee_notify(struct net_device *dev, int event, int cmd,
+		     u32 seq, u32 pid)
+{
+	return dcbnl_notify(dev, event, cmd, seq, pid, DCB_CAP_DCBX_VER_CEE);
+}
+EXPORT_SYMBOL(dcbnl_cee_notify);
 
 /* Handle IEEE 802.1Qaz SET commands. If any requested operation can not
  * be completed the entire msg is aborted and error value is returned.
@@ -1411,7 +1616,7 @@ static int dcbnl_ieee_set(struct net_device *netdev, struct nlattr **tb,
 err:
 	dcbnl_reply(err, RTM_SETDCB, DCB_CMD_IEEE_SET, DCB_ATTR_IEEE,
 		    pid, seq, flags);
-	dcbnl_notify(netdev, RTM_SETDCB, DCB_CMD_IEEE_SET, seq, 0);
+	dcbnl_ieee_notify(netdev, RTM_SETDCB, DCB_CMD_IEEE_SET, seq, 0);
 	return err;
 }
 
@@ -1495,7 +1700,7 @@ static int dcbnl_ieee_del(struct net_device *netdev, struct nlattr **tb,
 err:
 	dcbnl_reply(err, RTM_SETDCB, DCB_CMD_IEEE_DEL, DCB_ATTR_IEEE,
 		    pid, seq, flags);
-	dcbnl_notify(netdev, RTM_SETDCB, DCB_CMD_IEEE_DEL, seq, 0);
+	dcbnl_ieee_notify(netdev, RTM_SETDCB, DCB_CMD_IEEE_DEL, seq, 0);
 	return err;
 }
 
@@ -1642,72 +1847,16 @@ err:
 	return ret;
 }
 
-static int dcbnl_cee_pg_fill(struct sk_buff *skb, struct net_device *dev,
-			     int dir)
-{
-	u8 pgid, up_map, prio, tc_pct;
-	const struct dcbnl_rtnl_ops *ops = dev->dcbnl_ops;
-	int i = dir ? DCB_ATTR_CEE_TX_PG : DCB_ATTR_CEE_RX_PG;
-	struct nlattr *pg = nla_nest_start(skb, i);
-
-	if (!pg)
-		goto nla_put_failure;
-
-	for (i = DCB_PG_ATTR_TC_0; i <= DCB_PG_ATTR_TC_7; i++) {
-		struct nlattr *tc_nest = nla_nest_start(skb, i);
-
-		if (!tc_nest)
-			goto nla_put_failure;
-
-		pgid = DCB_ATTR_VALUE_UNDEFINED;
-		prio = DCB_ATTR_VALUE_UNDEFINED;
-		tc_pct = DCB_ATTR_VALUE_UNDEFINED;
-		up_map = DCB_ATTR_VALUE_UNDEFINED;
-
-		if (!dir)
-			ops->getpgtccfgrx(dev, i - DCB_PG_ATTR_TC_0,
-					  &prio, &pgid, &tc_pct, &up_map);
-		else
-			ops->getpgtccfgtx(dev, i - DCB_PG_ATTR_TC_0,
-					  &prio, &pgid, &tc_pct, &up_map);
-
-		NLA_PUT_U8(skb, DCB_TC_ATTR_PARAM_PGID, pgid);
-		NLA_PUT_U8(skb, DCB_TC_ATTR_PARAM_UP_MAPPING, up_map);
-		NLA_PUT_U8(skb, DCB_TC_ATTR_PARAM_STRICT_PRIO, prio);
-		NLA_PUT_U8(skb, DCB_TC_ATTR_PARAM_BW_PCT, tc_pct);
-		nla_nest_end(skb, tc_nest);
-	}
-
-	for (i = DCB_PG_ATTR_BW_ID_0; i <= DCB_PG_ATTR_BW_ID_7; i++) {
-		tc_pct = DCB_ATTR_VALUE_UNDEFINED;
-
-		if (!dir)
-			ops->getpgbwgcfgrx(dev, i - DCB_PG_ATTR_BW_ID_0,
-					   &tc_pct);
-		else
-			ops->getpgbwgcfgtx(dev, i - DCB_PG_ATTR_BW_ID_0,
-					   &tc_pct);
-		NLA_PUT_U8(skb, i, tc_pct);
-	}
-	nla_nest_end(skb, pg);
-	return 0;
-
-nla_put_failure:
-	return -EMSGSIZE;
-}
-
 /* Handle CEE DCBX GET commands. */
 static int dcbnl_cee_get(struct net_device *netdev, struct nlattr **tb,
 			 u32 pid, u32 seq, u16 flags)
 {
+	struct net *net = dev_net(netdev);
 	struct sk_buff *skb;
 	struct nlmsghdr *nlh;
 	struct dcbmsg *dcb;
-	struct nlattr *cee, *app;
-	struct dcb_app_type *itr;
 	const struct dcbnl_rtnl_ops *ops = netdev->dcbnl_ops;
-	int dcbx, i, err = -EMSGSIZE;
-	u8 value;
+	int err;
 
 	if (!ops)
 		return -EOPNOTSUPP;
@@ -1716,139 +1865,25 @@ static int dcbnl_cee_get(struct net_device *netdev, struct nlattr **tb,
 	if (!skb)
 		return -ENOBUFS;
 
-	nlh = NLMSG_NEW(skb, pid, seq, RTM_GETDCB, sizeof(*dcb), flags);
+	nlh = nlmsg_put(skb, pid, seq, RTM_GETDCB, sizeof(*dcb), flags);
+	if (nlh == NULL) {
+		nlmsg_free(skb);
+		return -EMSGSIZE;
+	}
 
 	dcb = NLMSG_DATA(nlh);
 	dcb->dcb_family = AF_UNSPEC;
 	dcb->cmd = DCB_CMD_CEE_GET;
 
-	NLA_PUT_STRING(skb, DCB_ATTR_IFNAME, netdev->name);
-
-	cee = nla_nest_start(skb, DCB_ATTR_CEE);
-	if (!cee)
-		goto nla_put_failure;
-
-	/* local pg */
-	if (ops->getpgtccfgtx && ops->getpgbwgcfgtx) {
-		err = dcbnl_cee_pg_fill(skb, netdev, 1);
-		if (err)
-			goto nla_put_failure;
-	}
-
-	if (ops->getpgtccfgrx && ops->getpgbwgcfgrx) {
-		err = dcbnl_cee_pg_fill(skb, netdev, 0);
-		if (err)
-			goto nla_put_failure;
-	}
-
-	/* local pfc */
-	if (ops->getpfccfg) {
-		struct nlattr *pfc_nest = nla_nest_start(skb, DCB_ATTR_CEE_PFC);
-
-		if (!pfc_nest)
-			goto nla_put_failure;
+	err = dcbnl_cee_fill(skb, netdev);
 
-		for (i = DCB_PFC_UP_ATTR_0; i <= DCB_PFC_UP_ATTR_7; i++) {
-			ops->getpfccfg(netdev, i - DCB_PFC_UP_ATTR_0, &value);
-			NLA_PUT_U8(skb, i, value);
-		}
-		nla_nest_end(skb, pfc_nest);
-	}
-
-	/* local app */
-	spin_lock(&dcb_lock);
-	app = nla_nest_start(skb, DCB_ATTR_CEE_APP_TABLE);
-	if (!app)
-		goto nla_put_failure;
-
-	list_for_each_entry(itr, &dcb_app_list, list) {
-		if (strncmp(itr->name, netdev->name, IFNAMSIZ) == 0) {
-			struct nlattr *app_nest = nla_nest_start(skb,
-								 DCB_ATTR_APP);
-			if (!app_nest)
-				goto dcb_unlock;
-
-			err = nla_put_u8(skb, DCB_APP_ATTR_IDTYPE,
-					 itr->app.selector);
-			if (err)
-				goto dcb_unlock;
-
-			err = nla_put_u16(skb, DCB_APP_ATTR_ID,
-					  itr->app.protocol);
-			if (err)
-				goto dcb_unlock;
-
-			err = nla_put_u8(skb, DCB_APP_ATTR_PRIORITY,
-					 itr->app.priority);
-			if (err)
-				goto dcb_unlock;
-
-			nla_nest_end(skb, app_nest);
-		}
-	}
-	nla_nest_end(skb, app);
-
-	if (netdev->dcbnl_ops->getdcbx)
-		dcbx = netdev->dcbnl_ops->getdcbx(netdev);
-	else
-		dcbx = -EOPNOTSUPP;
-
-	spin_unlock(&dcb_lock);
-
-	/* features flags */
-	if (ops->getfeatcfg) {
-		struct nlattr *feat = nla_nest_start(skb, DCB_ATTR_CEE_FEAT);
-		if (!feat)
-			goto nla_put_failure;
-
-		for (i = DCB_FEATCFG_ATTR_ALL + 1; i <= DCB_FEATCFG_ATTR_MAX;
-		     i++)
-			if (!ops->getfeatcfg(netdev, i, &value))
-				NLA_PUT_U8(skb, i, value);
-
-		nla_nest_end(skb, feat);
-	}
-
-	/* peer info if available */
-	if (ops->cee_peer_getpg) {
-		struct cee_pg pg;
-		err = ops->cee_peer_getpg(netdev, &pg);
-		if (!err)
-			NLA_PUT(skb, DCB_ATTR_CEE_PEER_PG, sizeof(pg), &pg);
-	}
-
-	if (ops->cee_peer_getpfc) {
-		struct cee_pfc pfc;
-		err = ops->cee_peer_getpfc(netdev, &pfc);
-		if (!err)
-			NLA_PUT(skb, DCB_ATTR_CEE_PEER_PFC, sizeof(pfc), &pfc);
-	}
-
-	if (ops->peer_getappinfo && ops->peer_getapptable) {
-		err = dcbnl_build_peer_app(netdev, skb,
-					   DCB_ATTR_CEE_PEER_APP_TABLE,
-					   DCB_ATTR_CEE_PEER_APP_INFO,
-					   DCB_ATTR_CEE_PEER_APP);
-		if (err)
-			goto nla_put_failure;
-	}
-	nla_nest_end(skb, cee);
-
-	/* DCBX state */
-	if (dcbx >= 0) {
-		err = nla_put_u8(skb, DCB_ATTR_DCBX, dcbx);
-		if (err)
-			goto nla_put_failure;
+	if (err < 0) {
+		nlmsg_cancel(skb, nlh);
+		nlmsg_free(skb);
+	} else {
+		nlmsg_end(skb, nlh);
+		err = rtnl_unicast(skb, net, pid);
 	}
-	nlmsg_end(skb, nlh);
-	return rtnl_unicast(skb, &init_net, pid);
-
-dcb_unlock:
-	spin_unlock(&dcb_lock);
-nla_put_failure:
-	nlmsg_cancel(skb, nlh);
-nlmsg_failure:
-	nlmsg_free(skb);
 	return err;
 }
 
-- 
1.7.3.5
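
The shape of the change above (one internal notify routine that takes a version argument, plus two thin exported wrappers) can be modelled outside the kernel. The following is a minimal sketch, not the patch's code: every `sketch_` name is hypothetical, a character buffer stands in for the netlink message, and a return of -1 stands in for `-EMSGSIZE`.

```c
#include <assert.h>
#include <string.h>

enum sketch_dcbx_ver { SKETCH_VER_CEE, SKETCH_VER_IEEE };

/* Stand-in for the fill routines: copies a tag instead of netlink attrs. */
int sketch_fill(char *buf, size_t len, const char *tag)
{
	if (strlen(tag) + 1 > len)
		return -1;	/* models -EMSGSIZE */
	strcpy(buf, tag);
	return 0;
}

/* Internal routine: picks the fill function from the version argument. */
int sketch_notify(char *buf, size_t len, enum sketch_dcbx_ver ver)
{
	assert(buf != NULL);
	if (ver == SKETCH_VER_IEEE)
		return sketch_fill(buf, len, "ieee");
	return sketch_fill(buf, len, "cee");
}

/* Thin wrappers, playing the role of the two exported symbols. */
int sketch_ieee_notify(char *buf, size_t len)
{
	return sketch_notify(buf, len, SKETCH_VER_IEEE);
}

int sketch_cee_notify(char *buf, size_t len)
{
	return sketch_notify(buf, len, SKETCH_VER_CEE);
}
```

Keeping the version argument out of the exported interface means callers cannot pass an unexpected value, at the cost of one wrapper per DCBX flavor.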






* [net-next PATCH v2 1/2] dcbnl: Aggregated CEE GET operation
From: Shmulik Ravid @ 2011-07-05 16:16 UTC (permalink / raw)
  To: davem; +Cc: John Fastabend, netdev

The following two patches add to dcbnl an unsolicited notification of
the DCB configuration for the CEE flavor of the DCBX protocol. This
is useful when the user-mode DCB client is not responsible for
conducting and resolving the DCBX negotiation (either because the DCBX
stack is embedded in the HW or the negotiation is handled by another
agent in the host), but still needs to get the negotiated parameters.
This functionality already exists for the IEEE flavor of the DCBX
protocol and these patches add it to the older CEE flavor.

The first patch extends the CEE attribute GET operation to include not
only the peer information, but also all the pertinent local
configuration (negotiated parameters). The second patch adds and exports
a CEE specific notification routine.

Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
---
 include/linux/dcbnl.h |   23 +++++++-
 net/dcb/dcbnl.c       |  159 ++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 173 insertions(+), 9 deletions(-)

diff --git a/include/linux/dcbnl.h b/include/linux/dcbnl.h
index 66a6723..65a2562 100644
--- a/include/linux/dcbnl.h
+++ b/include/linux/dcbnl.h
@@ -333,18 +333,30 @@ enum ieee_attrs_app {
 #define DCB_ATTR_IEEE_APP_MAX (__DCB_ATTR_IEEE_APP_MAX - 1)
 
 /**
- * enum cee_attrs - CEE DCBX get attributes
+ * enum cee_attrs - CEE DCBX get attributes.
  *
  * @DCB_ATTR_CEE_UNSPEC: unspecified
  * @DCB_ATTR_CEE_PEER_PG: peer PG configuration - get only
  * @DCB_ATTR_CEE_PEER_PFC: peer PFC configuration - get only
- * @DCB_ATTR_CEE_PEER_APP: peer APP tlv - get only
+ * @DCB_ATTR_CEE_PEER_APP_TABLE: peer APP tlv - get only
+ * @DCB_ATTR_CEE_TX_PG: TX PG configuration (DCB_CMD_PGTX_GCFG)
+ * @DCB_ATTR_CEE_RX_PG: RX PG configuration (DCB_CMD_PGRX_GCFG)
+ * @DCB_ATTR_CEE_PFC: PFC configuration (DCB_CMD_PFC_GCFG)
+ * @DCB_ATTR_CEE_APP_TABLE: APP configuration (multi DCB_CMD_GAPP)
+ * @DCB_ATTR_CEE_FEAT: DCBX features flags (DCB_CMD_GFEATCFG)
+ *
+ * An aggregated collection of the cee std negotiated parameters.
  */
 enum cee_attrs {
 	DCB_ATTR_CEE_UNSPEC,
 	DCB_ATTR_CEE_PEER_PG,
 	DCB_ATTR_CEE_PEER_PFC,
 	DCB_ATTR_CEE_PEER_APP_TABLE,
+	DCB_ATTR_CEE_TX_PG,
+	DCB_ATTR_CEE_RX_PG,
+	DCB_ATTR_CEE_PFC,
+	DCB_ATTR_CEE_APP_TABLE,
+	DCB_ATTR_CEE_FEAT,
 	__DCB_ATTR_CEE_MAX
 };
 #define DCB_ATTR_CEE_MAX (__DCB_ATTR_CEE_MAX - 1)
@@ -357,6 +369,13 @@ enum peer_app_attr {
 };
 #define DCB_ATTR_CEE_PEER_APP_MAX (__DCB_ATTR_CEE_PEER_APP_MAX - 1)
 
+enum cee_attrs_app {
+	DCB_ATTR_CEE_APP_UNSPEC,
+	DCB_ATTR_CEE_APP,
+	__DCB_ATTR_CEE_APP_MAX
+};
+#define DCB_ATTR_CEE_APP_MAX (__DCB_ATTR_CEE_APP_MAX - 1)
+
 /**
  * enum dcbnl_pfc_attrs - DCB Priority Flow Control user priority nested attrs
  *
diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index fc56e85..d5b45a2 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -1642,6 +1642,60 @@ err:
 	return ret;
 }
 
+static int dcbnl_cee_pg_fill(struct sk_buff *skb, struct net_device *dev,
+			     int dir)
+{
+	u8 pgid, up_map, prio, tc_pct;
+	const struct dcbnl_rtnl_ops *ops = dev->dcbnl_ops;
+	int i = dir ? DCB_ATTR_CEE_TX_PG : DCB_ATTR_CEE_RX_PG;
+	struct nlattr *pg = nla_nest_start(skb, i);
+
+	if (!pg)
+		goto nla_put_failure;
+
+	for (i = DCB_PG_ATTR_TC_0; i <= DCB_PG_ATTR_TC_7; i++) {
+		struct nlattr *tc_nest = nla_nest_start(skb, i);
+
+		if (!tc_nest)
+			goto nla_put_failure;
+
+		pgid = DCB_ATTR_VALUE_UNDEFINED;
+		prio = DCB_ATTR_VALUE_UNDEFINED;
+		tc_pct = DCB_ATTR_VALUE_UNDEFINED;
+		up_map = DCB_ATTR_VALUE_UNDEFINED;
+
+		if (!dir)
+			ops->getpgtccfgrx(dev, i - DCB_PG_ATTR_TC_0,
+					  &prio, &pgid, &tc_pct, &up_map);
+		else
+			ops->getpgtccfgtx(dev, i - DCB_PG_ATTR_TC_0,
+					  &prio, &pgid, &tc_pct, &up_map);
+
+		NLA_PUT_U8(skb, DCB_TC_ATTR_PARAM_PGID, pgid);
+		NLA_PUT_U8(skb, DCB_TC_ATTR_PARAM_UP_MAPPING, up_map);
+		NLA_PUT_U8(skb, DCB_TC_ATTR_PARAM_STRICT_PRIO, prio);
+		NLA_PUT_U8(skb, DCB_TC_ATTR_PARAM_BW_PCT, tc_pct);
+		nla_nest_end(skb, tc_nest);
+	}
+
+	for (i = DCB_PG_ATTR_BW_ID_0; i <= DCB_PG_ATTR_BW_ID_7; i++) {
+		tc_pct = DCB_ATTR_VALUE_UNDEFINED;
+
+		if (!dir)
+			ops->getpgbwgcfgrx(dev, i - DCB_PG_ATTR_BW_ID_0,
+					   &tc_pct);
+		else
+			ops->getpgbwgcfgtx(dev, i - DCB_PG_ATTR_BW_ID_0,
+					   &tc_pct);
+		NLA_PUT_U8(skb, i, tc_pct);
+	}
+	nla_nest_end(skb, pg);
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
 /* Handle CEE DCBX GET commands. */
 static int dcbnl_cee_get(struct net_device *netdev, struct nlattr **tb,
 			 u32 pid, u32 seq, u16 flags)
@@ -1649,9 +1703,11 @@ static int dcbnl_cee_get(struct net_device *netdev, struct nlattr **tb,
 	struct sk_buff *skb;
 	struct nlmsghdr *nlh;
 	struct dcbmsg *dcb;
-	struct nlattr *cee;
+	struct nlattr *cee, *app;
+	struct dcb_app_type *itr;
 	const struct dcbnl_rtnl_ops *ops = netdev->dcbnl_ops;
-	int err;
+	int dcbx, i, err = -EMSGSIZE;
+	u8 value;
 
 	if (!ops)
 		return -EOPNOTSUPP;
@@ -1672,7 +1728,88 @@ static int dcbnl_cee_get(struct net_device *netdev, struct nlattr **tb,
 	if (!cee)
 		goto nla_put_failure;
 
-	/* get peer info if available */
+	/* local pg */
+	if (ops->getpgtccfgtx && ops->getpgbwgcfgtx) {
+		err = dcbnl_cee_pg_fill(skb, netdev, 1);
+		if (err)
+			goto nla_put_failure;
+	}
+
+	if (ops->getpgtccfgrx && ops->getpgbwgcfgrx) {
+		err = dcbnl_cee_pg_fill(skb, netdev, 0);
+		if (err)
+			goto nla_put_failure;
+	}
+
+	/* local pfc */
+	if (ops->getpfccfg) {
+		struct nlattr *pfc_nest = nla_nest_start(skb, DCB_ATTR_CEE_PFC);
+
+		if (!pfc_nest)
+			goto nla_put_failure;
+
+		for (i = DCB_PFC_UP_ATTR_0; i <= DCB_PFC_UP_ATTR_7; i++) {
+			ops->getpfccfg(netdev, i - DCB_PFC_UP_ATTR_0, &value);
+			NLA_PUT_U8(skb, i, value);
+		}
+		nla_nest_end(skb, pfc_nest);
+	}
+
+	/* local app */
+	spin_lock(&dcb_lock);
+	app = nla_nest_start(skb, DCB_ATTR_CEE_APP_TABLE);
+	if (!app)
+		goto nla_put_failure;
+
+	list_for_each_entry(itr, &dcb_app_list, list) {
+		if (strncmp(itr->name, netdev->name, IFNAMSIZ) == 0) {
+			struct nlattr *app_nest = nla_nest_start(skb,
+								 DCB_ATTR_APP);
+			if (!app_nest)
+				goto dcb_unlock;
+
+			err = nla_put_u8(skb, DCB_APP_ATTR_IDTYPE,
+					 itr->app.selector);
+			if (err)
+				goto dcb_unlock;
+
+			err = nla_put_u16(skb, DCB_APP_ATTR_ID,
+					  itr->app.protocol);
+			if (err)
+				goto dcb_unlock;
+
+			err = nla_put_u8(skb, DCB_APP_ATTR_PRIORITY,
+					 itr->app.priority);
+			if (err)
+				goto dcb_unlock;
+
+			nla_nest_end(skb, app_nest);
+		}
+	}
+	nla_nest_end(skb, app);
+
+	if (netdev->dcbnl_ops->getdcbx)
+		dcbx = netdev->dcbnl_ops->getdcbx(netdev);
+	else
+		dcbx = -EOPNOTSUPP;
+
+	spin_unlock(&dcb_lock);
+
+	/* features flags */
+	if (ops->getfeatcfg) {
+		struct nlattr *feat = nla_nest_start(skb, DCB_ATTR_CEE_FEAT);
+		if (!feat)
+			goto nla_put_failure;
+
+		for (i = DCB_FEATCFG_ATTR_ALL + 1; i <= DCB_FEATCFG_ATTR_MAX;
+		     i++)
+			if (!ops->getfeatcfg(netdev, i, &value))
+				NLA_PUT_U8(skb, i, value);
+
+		nla_nest_end(skb, feat);
+	}
+
+	/* peer info if available */
 	if (ops->cee_peer_getpg) {
 		struct cee_pg pg;
 		err = ops->cee_peer_getpg(netdev, &pg);
@@ -1695,16 +1832,24 @@ static int dcbnl_cee_get(struct net_device *netdev, struct nlattr **tb,
 		if (err)
 			goto nla_put_failure;
 	}
-
 	nla_nest_end(skb, cee);
-	nlmsg_end(skb, nlh);
 
+	/* DCBX state */
+	if (dcbx >= 0) {
+		err = nla_put_u8(skb, DCB_ATTR_DCBX, dcbx);
+		if (err)
+			goto nla_put_failure;
+	}
+	nlmsg_end(skb, nlh);
 	return rtnl_unicast(skb, &init_net, pid);
+
+dcb_unlock:
+	spin_unlock(&dcb_lock);
 nla_put_failure:
 	nlmsg_cancel(skb, nlh);
 nlmsg_failure:
-	kfree_skb(skb);
-	return -1;
+	nlmsg_free(skb);
+	return err;
 }
 
 static int dcb_doit(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
-- 
1.7.3.5
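
The enum change in this patch appends the new attributes after the existing ones, so the numeric values already in use stay the same and `DCB_ATTR_CEE_MAX` follows automatically. The sketch below copies the enum from the diff to make that visible. The helper `cee_attr_is_pre_patch` is hypothetical and exists only for illustration.

```c
#include <assert.h>

/* Copied from the patched include/linux/dcbnl.h hunk above. */
enum cee_attrs {
	DCB_ATTR_CEE_UNSPEC,
	DCB_ATTR_CEE_PEER_PG,
	DCB_ATTR_CEE_PEER_PFC,
	DCB_ATTR_CEE_PEER_APP_TABLE,
	DCB_ATTR_CEE_TX_PG,		/* new in this patch */
	DCB_ATTR_CEE_RX_PG,		/* new in this patch */
	DCB_ATTR_CEE_PFC,		/* new in this patch */
	DCB_ATTR_CEE_APP_TABLE,		/* new in this patch */
	DCB_ATTR_CEE_FEAT,		/* new in this patch */
	__DCB_ATTR_CEE_MAX
};
#define DCB_ATTR_CEE_MAX (__DCB_ATTR_CEE_MAX - 1)

/* Hypothetical helper: 1 if attr already existed before this patch. */
int cee_attr_is_pre_patch(int attr)
{
	assert(attr >= DCB_ATTR_CEE_UNSPEC && attr <= DCB_ATTR_CEE_MAX);
	return attr <= DCB_ATTR_CEE_PEER_APP_TABLE;
}
```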







* Re: [PATCH RFC] igb: Fix false positive return of igb_get_auto_rd_done for 82580
From: Guenter Roeck @ 2011-07-05 14:31 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org,
	Brandeburg, Jesse, Tong Ho, Ronciak, John
In-Reply-To: <1308691047.22851.60.camel@jtkirshe-mobl>

On Tue, Jun 21, 2011 at 05:17:26PM -0400, Jeff Kirsher wrote:
> On Tue, 2011-06-21 at 12:02 -0700, Guenter Roeck wrote:
> > From: Tong Ho <tong.ho@ericsson.com>
> > 
> > 82580 re-reads the port specific portion of eeprom after port reset.
> > If called immediately after a reset, igb_get_auto_rd_done() returns
> > false positive because the done bit has yet to transition from 1 to 0.
> > 
> > Add wrfl() immediately after resetting 82580 port or device,
> > plus a 1ms delay, to avoid the problem.
> > 
> > Signed-off-by: Tong Ho <tong.ho@ericsson.com>
> > Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
> > ---
> > Sent as RFC since I am not entirely sure if the solution is the
> > correct one
> > to address the problem we are seeing. If there is a better solution,
> > please
> > let me know. 
> 
> Thank you for the suggested patch.  Carolyn is the maintainer for igb
> and we will look into this issue you are seeing and the suggested fix. 

Hi Jeff and Carolyn,

Any update ?

Thanks,
Guenter
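
The false positive described in the quoted patch can be reproduced in a toy simulation. This is a sketch under loud assumptions: it does not touch igb registers, the tick values are invented, and all `sketch_` names are hypothetical. It only models a done bit that still reads 1 right after a reset, drops to 0 once the re-read starts, and returns to 1 when the re-read finishes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented timeline, in arbitrary ticks after the reset is issued. */
#define SKETCH_CLEAR_TICK 2	/* done bit drops from 1 to 0 */
#define SKETCH_DONE_TICK  10	/* re-read finishes, done bit is 1 again */

struct sketch_port {
	unsigned int tick;
};

/* Simulated register read of the done bit. */
bool sketch_read_done_bit(const struct sketch_port *p)
{
	return p->tick < SKETCH_CLEAR_TICK || p->tick >= SKETCH_DONE_TICK;
}

/*
 * Issue a reset at tick 0, wait settle_ticks, then poll once per tick.
 * Returns the tick at which the poll reported "done", or -1 on timeout.
 */
int sketch_wait_done(struct sketch_port *p, unsigned int settle_ticks,
		     unsigned int max_polls)
{
	assert(p != NULL);
	p->tick = settle_ticks;
	for (unsigned int i = 0; i < max_polls; i++, p->tick++)
		if (sketch_read_done_bit(p))
			return (int)p->tick;
	return -1;
}
```

With no settle time the poll reports completion at tick 0, before the re-read has even started. With a settle time of at least `SKETCH_CLEAR_TICK` it reports completion at `SKETCH_DONE_TICK`, which is the effect the proposed delay aims for.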


* Re: libpcap and tc filters
From: jamal @ 2011-07-05 14:41 UTC (permalink / raw)
  To: Adam Katz; +Cc: netdev
In-Reply-To: <CAA0qwj74cvZmkkmA8zBFuXeHdidMco2=de7Li9rDN5Wcp=-G7w@mail.gmail.com>

On Tue, 2011-07-05 at 17:21 +0300, Adam Katz wrote:
> Yes. I understand the difference between ETH_P_ALL and ETH_P_IP...
> 
> Jamal, I've now tested both solutions - changing the rule to "protocol
> all" and patching tcpreplay to use ETH_P_IP and both produced the
> exact same problem as before...

Sorry - don't have much time to chase further, but it works for me.

---
hadi@mojatatu10:~$ sudo tc qdisc del dev eth0 root handle 1:
RTNETLINK answers: Invalid argument
hadi@mojatatu10:~$ sudo tc qdisc add dev eth0 root handle 1: prio priomap 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
hadi@mojatatu10:~$ sudo tc qdisc add dev eth0 parent 1:1 handle 10: pfifo
hadi@mojatatu10:~$ sudo tc qdisc add dev eth0 parent 1:2 handle 20: pfifo
hadi@mojatatu10:~$ sudo tc qdisc add dev eth0 parent 1:3 handle 30: pfifo
hadi@mojatatu10:~$ sudo tc filter add dev eth0 protocol all parent 1: prio 1 u32 match ip dport 22 0xffff flowid 1:1 action ok
hadi@mojatatu10:~$ sudo tc -s filter ls dev eth0
filter parent 1: protocol all pref 1 u32 
filter parent 1: protocol all pref 1 u32 fh 800: ht divisor 1 
filter parent 1: protocol all pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1 
  match 00000016/0000ffff at 20
	action order 1: gact action pass
	 random type none pass val 0
	 index 1 ref 1 bind 1 installed 15 sec used 15 sec
 	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
	backlog 0b 0p requeues 0 

Note - the "OK" action is just a placeholder to count packets.
Now replay Adam's pcap file:

hadi@mojatatu10:~/Downloads$ sudo tcpreplay --intf1=eth0 ./port22example.pcap

sending out eth0 
processing file: ./port22example.pcap
Actual: 50 packets (11594 bytes) sent in 3.66 seconds
Rated: 3167.8 bps, 0.02 Mbps, 13.66 pps
Statistics for network device: eth0
	Attempted packets:         50
	Successful packets:        50
	Failed packets:            0
	Retried packets (ENOBUFS): 0
	Retried packets (EAGAIN):  0

I don't have any ssh running on this machine. So
let's check to see if anything was captured by the filter.

-----
hadi@mojatatu10:~$ sudo tc -s filter ls dev eth0
filter parent 1: protocol all pref 1 u32 
filter parent 1: protocol all pref 1 u32 fh 800: ht divisor 1 
filter parent 1: protocol all pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1 
  match 00000016/0000ffff at 20
	action order 1: gact action pass
	 random type none pass val 0
	 index 1 ref 1 bind 1 installed 76 sec used 1 sec
 	Action statistics:
	Sent 7763 bytes 26 pkt (dropped 0, overlimits 0 requeues 0) 
	backlog 0b 0p requeues 0 
------
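
The line "match 00000016/0000ffff at 20" in the output above is a value/mask/offset triple. A minimal userspace sketch of that comparison follows. The `sketch_` names are hypothetical, the offset is taken relative to the start of the IP header, and the example assumes a 20-byte IP header without options, so that the 32-bit word at offset 20 holds the TCP source port in its high half and the destination port in its low half.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian (network order) word starting at pkt[off]. */
uint32_t sketch_be32_at(const uint8_t *pkt, size_t off)
{
	return ((uint32_t)pkt[off] << 24) | ((uint32_t)pkt[off + 1] << 16) |
	       ((uint32_t)pkt[off + 2] << 8) | (uint32_t)pkt[off + 3];
}

/* True when the masked word at off equals val; false if pkt is too short. */
bool sketch_u32_match(const uint8_t *pkt, size_t len, size_t off,
		      uint32_t val, uint32_t mask)
{
	assert(pkt != NULL);
	if (off + 4 > len)
		return false;
	return (sketch_be32_at(pkt, off) & mask) == val;
}
```

With `val` 0x00000016, `mask` 0x0000ffff and `off` 20, the mask discards the source port and the comparison succeeds only for destination port 22.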

cheers,
jamal

> 
> On Tue, Jul 5, 2011 at 4:56 PM, jamal <hadi@cyberus.ca> wrote:
> > On Tue, 2011-07-05 at 16:07 +0300, Adam Katz wrote:
> >
> >> second, I just took a look at the libpcap source code and it seems it's using
> >> the same ETH_P_ALL option when binding to an interface. So based on
> >> what you're saying, the same solution of patching libpcap and
> >> replacing ETH_P_ALL with  ETH_P_IP should also make these rules work
> >> with traffic sent using pure libpcap or any libpcap - based
> >> application.
> >
> > ETH_P_ALL makes sense if you are unsure it is going to be IP. So i would
> > change/optimize apps only for IP if they are intended to deal with IP
> > only (same for ARP etc).
> > In your case, it seems it is tcp only - which runs on top of IP. So
> > it makes sense to do it for that specific use case etc.
> >
> > cheers,
> > jamal
> >
> >
> >




* Re: [PATCH v2 net-next af-packet 1/2] Enhance af-packet to provide (near zero)lossless packet capture functionality.
From: chetan loke @ 2011-07-05 14:53 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, eric.dumazet, joe, bhutchings, shemminger, linux-kernel
In-Reply-To: <20110701.153633.267893668051099806.davem@davemloft.net>

On Fri, Jul 1, 2011 at 6:36 PM, David Miller <davem@davemloft.net> wrote:
> From: Chetan Loke <loke.chetan@gmail.com>
> Date: Tue, 21 Jun 2011 22:10:49 -0400
>
>> +struct bd_v1 {
>  -
>> +     __u32   block_status;
>> +     __u32   num_pkts;
>> +     __u32   offset_to_first_pkt;
>  -
>> +     __u32   blk_len;
>  -
>> +     __u64   seq_num;
>  ...
>> +     union {
>> +             struct {
>> +                     __u32   words[4];
>> +                     __u64   dword;
>> +             } __attribute__ ((__packed__));
>> +             struct bd_v1 bd1;
>  ...
>> +#define BLOCK_STATUS(x)      ((x)->words[0])
>> +#define BLOCK_NUM_PKTS(x)    ((x)->words[1])
>> +#define BLOCK_O2FP(x)                ((x)->words[2])
>> +#define BLOCK_LEN(x)         ((x)->words[3])
>> +#define BLOCK_SNUM(x)                ((x)->dword)
>

Sorry, I was out for the long weekend, so I couldn't get to this sooner.

> This BLOCK_SNUM definition is buggy.  It modifies the
> first 64-bit word in the block descriptor.
>
> But the sequence number lives 16 bytes into the descriptor.

hmm? the words/dword are enveloped within a 'struct'. Can you please
double check?
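
One way to check the layout question is to rebuild the overlay from the quoted fragments and compare offsets. This is a sketch, assuming the lines elided from the quote carry no additional fields, that the union sits anonymously inside the descriptor, and that fixed-width `stdint.h` types stand in for `__u32`/`__u64`. The `sketch_` names are hypothetical, and the code relies on C11 anonymous members plus the GCC/Clang `packed` attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sketch_bd_v1 {
	uint32_t block_status;
	uint32_t num_pkts;
	uint32_t offset_to_first_pkt;
	uint32_t blk_len;
	uint64_t seq_num;
};

struct sketch_block_desc {
	union {
		struct {
			uint32_t words[4];
			uint64_t dword;
		} __attribute__((__packed__));
		struct sketch_bd_v1 bd1;
	};
};

/* Offset reached by a BLOCK_SNUM-style access, i.e. (x)->dword. */
size_t sketch_snum_offset(void)
{
	return offsetof(struct sketch_block_desc, dword);
}

/* Offset of the named sequence number field. */
size_t sketch_seq_num_offset(void)
{
	return offsetof(struct sketch_block_desc, bd1.seq_num);
}

/* Store v through dword and read it back through bd1.seq_num. */
uint64_t sketch_snum_roundtrip(uint64_t v)
{
	struct sketch_block_desc d;

	memset(&d, 0, sizeof(d));
	d.dword = v;
	assert(d.words[0] == 0);	/* the store left block_status alone */
	return d.bd1.seq_num;
}
```

Under these assumptions both offsets come out as 16, because `dword` follows the four 32-bit words inside the inner struct rather than starting at the beginning of the union.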

>
> This value is only written to once and never used by anything.
> I would just remove it entirely.
>

It is used by the applications. Look at the code comments:
	/*
	 * Quite a few uses of sequence number:
	 * 1. Make sure cache flush etc worked.
	 *    Well, one can argue - why not use the increasing ts below?
	 *    But look at 2. below first.
	 * 2. When you pass around blocks to other user space decoders,
	 *    you can see which blk[s] is[are] outstanding etc.
	 * 3. Validate kernel code.
	 */

> Next, having this overlay thing is entirely pointless.  Just refer to

It is useful.
Also, future versions of the block-descriptor can append a new field.
When that happens, none of the code needs to worry about the version
for the unchanged fields. Look at setsockopt - I had to add a 'union'
and pass that around to keep code churn minimal.
So the overlay may not be pointless.

> the block descriptor members directly!  You certainly wouldn't have
> had this sequence number bug if you had done that.
>
Look at the sample app posted on:
git://lolpcap.git.sourceforge.net/gitroot/lolpcap/lolpcap

function - void validate_blk_seq_num(struct block_desc *pbd)

This function validates the block_sequence_number (which is
incremented sequentially).
The application attempts to validate the entire block layout.

Chetan Loke

^ permalink raw reply

* Re: [PATCH v2 net-next af-packet 1/2] Enhance af-packet to provide (near zero)lossless packet capture functionality.
From: David Miller @ 2011-07-05 15:01 UTC (permalink / raw)
  To: loke.chetan
  Cc: netdev, eric.dumazet, joe, bhutchings, shemminger, linux-kernel
In-Reply-To: <CAAsGZS5vvmb_qX+cG507=hU_+kwnowEEojXGNMt5ShEZ9+ZeAA@mail.gmail.com>

From: chetan loke <loke.chetan@gmail.com>
Date: Tue, 5 Jul 2011 10:53:26 -0400

>> Next, having this overlay thing is entirely pointless.  Just refer to
> 
> It is useful.
> Also, future versions of the block-descriptor can append a new field.
> When that happens,
> none of the code needs to worry about the version etc for the unchanged fields.

That issue only exists because you haven't defined a common header
struct that the current, and all future, block descriptor variants can
include at the start of their definitions.

I still contend that all of these abstractions are too much and
unnecessary.

Use real data structures, not opaque "offset+size" poking into the
descriptors.
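
A minimal sketch of the "common header struct" idea, written for userspace
so it can be compiled on its own. The field names are taken from the quoted
struct bd_v1 (its elided parts are not reproduced); the struct names, the
extra v2 field and the accessor are assumptions made up for illustration,
not part of any posted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical common header: the fields every descriptor version shares. */
struct bd_common_hdr {
	uint32_t block_status;
	uint32_t num_pkts;
	uint32_t offset_to_first_pkt;
	uint32_t blk_len;
	uint64_t seq_num;
};

/* Hypothetical v1: nothing beyond the common part. */
struct bd_v1_sketch {
	struct bd_common_hdr h;
};

/* Hypothetical v2: appends a field, the common part stays where it was. */
struct bd_v2_sketch {
	struct bd_common_hdr h;
	uint64_t extra;		/* made-up field for illustration */
};

/* Version-independent accessor: valid for any variant that starts with the header. */
static inline uint64_t bd_seq_num(const void *desc)
{
	return ((const struct bd_common_hdr *)desc)->seq_num;
}
```

With this layout, code that only needs the shared fields uses named members
and never has to look at the descriptor version.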

^ permalink raw reply

* [PATCH v2 0/3] Add device tree probe support for imx fec driver
From: Shawn Guo @ 2011-07-05 15:13 UTC (permalink / raw)
  To: netdev; +Cc: linux-arm-kernel, devicetree-discuss, patches

The first two patches are a little off topic.  Patch #1 adds a helper
function, of_get_phy_mode, to of_net, and #2 converts the ibm_newemac net
driver to use this helper function.  Patch #3 is the actual one adding
device tree probe support for the imx fec driver, using of_get_phy_mode.

Changes since v1:
 * Address review comments given by Grant
 * Add patch #1 and #2

Shawn Guo (3):
      dt/net: add helper function of_get_phy_mode
      net: ibm_newemac: convert it to use of_get_phy_mode
      net/fec: add device tree probe support

 Documentation/devicetree/bindings/net/fsl-fec.txt |   24 +++++
 drivers/net/fec.c                                 |   99 +++++++++++++++++++-
 drivers/net/ibm_newemac/core.c                    |   33 +------
 drivers/net/ibm_newemac/emac.h                    |   19 ++--
 drivers/net/ibm_newemac/phy.c                     |    7 +-
 drivers/of/of_net.c                               |   45 +++++++++
 include/linux/of_net.h                            |    1 +
 include/linux/phy.h                               |    4 +-
 8 files changed, 186 insertions(+), 46 deletions(-)


^ permalink raw reply

* [PATCH v2 1/3] dt/net: add helper function of_get_phy_mode
From: Shawn Guo @ 2011-07-05 15:13 UTC (permalink / raw)
  To: netdev
  Cc: linux-arm-kernel, devicetree-discuss, patches, Shawn Guo,
	Grant Likely
In-Reply-To: <1309878839-25743-1-git-send-email-shawn.guo@linaro.org>

It adds the helper function of_get_phy_mode, which gets the phy
interface mode from the device tree.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
---
 drivers/of/of_net.c    |   43 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/of_net.h |    1 +
 2 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/drivers/of/of_net.c b/drivers/of/of_net.c
index 86f334a..cc117db 100644
--- a/drivers/of/of_net.c
+++ b/drivers/of/of_net.c
@@ -8,6 +8,49 @@
 #include <linux/etherdevice.h>
 #include <linux/kernel.h>
 #include <linux/of_net.h>
+#include <linux/phy.h>
+
+/**
+ * It maps 'enum phy_interface_t' found in include/linux/phy.h
+ * into the device tree binding of 'phy-mode', so that Ethernet
+ * device driver can get phy interface from device tree.
+ */
+static const char *phy_modes[] = {
+	[PHY_INTERFACE_MODE_MII]	= "mii",
+	[PHY_INTERFACE_MODE_GMII]	= "gmii",
+	[PHY_INTERFACE_MODE_SGMII]	= "sgmii",
+	[PHY_INTERFACE_MODE_TBI]	= "tbi",
+	[PHY_INTERFACE_MODE_RMII]	= "rmii",
+	[PHY_INTERFACE_MODE_RGMII]	= "rgmii",
+	[PHY_INTERFACE_MODE_RGMII_ID]	= "rgmii-id",
+	[PHY_INTERFACE_MODE_RGMII_RXID]	= "rgmii-rxid",
+	[PHY_INTERFACE_MODE_RGMII_TXID] = "rgmii-txid",
+	[PHY_INTERFACE_MODE_RTBI]	= "rtbi",
+};
+
+/**
+ * of_get_phy_mode - Get phy mode for given device_node
+ * @np:	Pointer to the given device_node
+ *
+ * The function gets phy interface string from property 'phy-mode',
+ * and return its index in phy_modes table, or errno in error case.
+ */
+const int of_get_phy_mode(struct device_node *np)
+{
+	const char *pm;
+	int err, i;
+
+	err = of_property_read_string(np, "phy-mode", &pm);
+	if (err < 0)
+		return err;
+
+	for (i = 0; i < ARRAY_SIZE(phy_modes); i++)
+		if (!strcasecmp(pm, phy_modes[i]))
+			return i;
+
+	return -ENODEV;
+}
+EXPORT_SYMBOL_GPL(of_get_phy_mode);
 
 /**
  * Search the device tree for the best MAC address to use.  'mac-address' is
diff --git a/include/linux/of_net.h b/include/linux/of_net.h
index e913081..f474641 100644
--- a/include/linux/of_net.h
+++ b/include/linux/of_net.h
@@ -9,6 +9,7 @@
 
 #ifdef CONFIG_OF_NET
 #include <linux/of.h>
+extern const int of_get_phy_mode(struct device_node *np);
 extern const void *of_get_mac_address(struct device_node *np);
 #endif
 
-- 
1.7.4.1
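
For readers who want to try the table lookup outside the kernel, here is a
userspace sketch of the same technique: an array of strings indexed by enum
value through designated initializers, searched with strcasecmp(). The enum,
table and function names are hypothetical and much smaller than the real
ones. The NULL check is an addition of this sketch; the table in the patch
has no holes as posted, but a hole would appear if an enum value were ever
added without a matching string.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>	/* strcasecmp (POSIX) */

/* Reduced, hypothetical enum; the real one lives in include/linux/phy.h. */
enum sketch_phy_mode {
	SKETCH_MODE_MII,
	SKETCH_MODE_GMII,
	SKETCH_MODE_UNNAMED,	/* deliberately has no string below */
	SKETCH_MODE_RGMII_ID,
	SKETCH_MODE_COUNT
};

static const char *const sketch_modes[SKETCH_MODE_COUNT] = {
	[SKETCH_MODE_MII]	= "mii",
	[SKETCH_MODE_GMII]	= "gmii",
	[SKETCH_MODE_RGMII_ID]	= "rgmii-id",
};

/* Returns the enum index whose string matches name, or -1 if none does. */
static int sketch_lookup_phy_mode(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_modes) / sizeof(sketch_modes[0]); i++) {
		/* Skip holes left by the designated initializers. */
		if (sketch_modes[i] && !strcasecmp(name, sketch_modes[i]))
			return (int)i;
	}
	return -1;
}
```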



^ permalink raw reply related

* [PATCH v2 2/3] net: ibm_newemac: convert it to use of_get_phy_mode
From: Shawn Guo @ 2011-07-05 15:13 UTC (permalink / raw)
  To: netdev
  Cc: linux-arm-kernel, devicetree-discuss, patches, Shawn Guo,
	David S. Miller, Grant Likely
In-Reply-To: <1309878839-25743-1-git-send-email-shawn.guo@linaro.org>

The patch extends 'enum phy_interface_t' and of_get_phy_mode with
PHY_INTERFACE_MODE_NA and PHY_INTERFACE_MODE_SMII, and then converts
the ibm_newemac net driver to use of_get_phy_mode to get the phy mode
from the device tree.

It also resolves the namespace conflict on phy_read/write between the
common mdiobus interface and the ibm_newemac private one.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Grant Likely <grant.likely@secretlab.ca>
---
 drivers/net/ibm_newemac/core.c |   33 ++++-----------------------------
 drivers/net/ibm_newemac/emac.h |   19 ++++++++++---------
 drivers/net/ibm_newemac/phy.c  |    7 +++++--
 drivers/of/of_net.c            |    2 ++
 include/linux/phy.h            |    4 +++-
 5 files changed, 24 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ibm_newemac/core.c b/drivers/net/ibm_newemac/core.c
index 725399e..70cb7d8 100644
--- a/drivers/net/ibm_newemac/core.c
+++ b/drivers/net/ibm_newemac/core.c
@@ -39,6 +39,7 @@
 #include <linux/bitops.h>
 #include <linux/workqueue.h>
 #include <linux/of.h>
+#include <linux/of_net.h>
 #include <linux/slab.h>
 
 #include <asm/processor.h>
@@ -2506,18 +2507,6 @@ static int __devinit emac_init_config(struct emac_instance *dev)
 {
 	struct device_node *np = dev->ofdev->dev.of_node;
 	const void *p;
-	unsigned int plen;
-	const char *pm, *phy_modes[] = {
-		[PHY_MODE_NA] = "",
-		[PHY_MODE_MII] = "mii",
-		[PHY_MODE_RMII] = "rmii",
-		[PHY_MODE_SMII] = "smii",
-		[PHY_MODE_RGMII] = "rgmii",
-		[PHY_MODE_TBI] = "tbi",
-		[PHY_MODE_GMII] = "gmii",
-		[PHY_MODE_RTBI] = "rtbi",
-		[PHY_MODE_SGMII] = "sgmii",
-	};
 
 	/* Read config from device-tree */
 	if (emac_read_uint_prop(np, "mal-device", &dev->mal_ph, 1))
@@ -2566,23 +2555,9 @@ static int __devinit emac_init_config(struct emac_instance *dev)
 		dev->mal_burst_size = 256;
 
 	/* PHY mode needs some decoding */
-	dev->phy_mode = PHY_MODE_NA;
-	pm = of_get_property(np, "phy-mode", &plen);
-	if (pm != NULL) {
-		int i;
-		for (i = 0; i < ARRAY_SIZE(phy_modes); i++)
-			if (!strcasecmp(pm, phy_modes[i])) {
-				dev->phy_mode = i;
-				break;
-			}
-	}
-
-	/* Backward compat with non-final DT */
-	if (dev->phy_mode == PHY_MODE_NA && pm != NULL && plen == 4) {
-		u32 nmode = *(const u32 *)pm;
-		if (nmode > PHY_MODE_NA && nmode <= PHY_MODE_SGMII)
-			dev->phy_mode = nmode;
-	}
+	dev->phy_mode = of_get_phy_mode(np);
+	if (dev->phy_mode < 0)
+		dev->phy_mode = PHY_MODE_NA;
 
 	/* Check EMAC version */
 	if (of_device_is_compatible(np, "ibm,emac4sync")) {
diff --git a/drivers/net/ibm_newemac/emac.h b/drivers/net/ibm_newemac/emac.h
index 8a61b597..1568278 100644
--- a/drivers/net/ibm_newemac/emac.h
+++ b/drivers/net/ibm_newemac/emac.h
@@ -26,6 +26,7 @@
 #define __IBM_NEWEMAC_H
 
 #include <linux/types.h>
+#include <linux/phy.h>
 
 /* EMAC registers 			Write Access rules */
 struct emac_regs {
@@ -106,15 +107,15 @@ struct emac_regs {
 /*
  * PHY mode settings (EMAC <-> ZMII/RGMII bridge <-> PHY)
  */
-#define PHY_MODE_NA	0
-#define PHY_MODE_MII	1
-#define PHY_MODE_RMII	2
-#define PHY_MODE_SMII	3
-#define PHY_MODE_RGMII	4
-#define PHY_MODE_TBI	5
-#define PHY_MODE_GMII	6
-#define PHY_MODE_RTBI	7
-#define PHY_MODE_SGMII	8
+#define PHY_MODE_NA	PHY_INTERFACE_MODE_NA
+#define PHY_MODE_MII	PHY_INTERFACE_MODE_MII
+#define PHY_MODE_RMII	PHY_INTERFACE_MODE_RMII
+#define PHY_MODE_SMII	PHY_INTERFACE_MODE_SMII
+#define PHY_MODE_RGMII	PHY_INTERFACE_MODE_RGMII
+#define PHY_MODE_TBI	PHY_INTERFACE_MODE_TBI
+#define PHY_MODE_GMII	PHY_INTERFACE_MODE_GMII
+#define PHY_MODE_RTBI	PHY_INTERFACE_MODE_RTBI
+#define PHY_MODE_SGMII	PHY_INTERFACE_MODE_SGMII
 
 /* EMACx_MR0 */
 #define EMAC_MR0_RXI			0x80000000
diff --git a/drivers/net/ibm_newemac/phy.c b/drivers/net/ibm_newemac/phy.c
index ac9d964..ab4e596 100644
--- a/drivers/net/ibm_newemac/phy.c
+++ b/drivers/net/ibm_newemac/phy.c
@@ -28,12 +28,15 @@
 #include "emac.h"
 #include "phy.h"
 
-static inline int phy_read(struct mii_phy *phy, int reg)
+#define phy_read _phy_read
+#define phy_write _phy_write
+
+static inline int _phy_read(struct mii_phy *phy, int reg)
 {
 	return phy->mdio_read(phy->dev, phy->address, reg);
 }
 
-static inline void phy_write(struct mii_phy *phy, int reg, int val)
+static inline void _phy_write(struct mii_phy *phy, int reg, int val)
 {
 	phy->mdio_write(phy->dev, phy->address, reg, val);
 }
diff --git a/drivers/of/of_net.c b/drivers/of/of_net.c
index cc117db..bb18471 100644
--- a/drivers/of/of_net.c
+++ b/drivers/of/of_net.c
@@ -16,6 +16,7 @@
  * device driver can get phy interface from device tree.
  */
 static const char *phy_modes[] = {
+	[PHY_INTERFACE_MODE_NA]		= "",
 	[PHY_INTERFACE_MODE_MII]	= "mii",
 	[PHY_INTERFACE_MODE_GMII]	= "gmii",
 	[PHY_INTERFACE_MODE_SGMII]	= "sgmii",
@@ -26,6 +27,7 @@ static const char *phy_modes[] = {
 	[PHY_INTERFACE_MODE_RGMII_RXID]	= "rgmii-rxid",
 	[PHY_INTERFACE_MODE_RGMII_TXID] = "rgmii-txid",
 	[PHY_INTERFACE_MODE_RTBI]	= "rtbi",
+	[PHY_INTERFACE_MODE_SMII]	= "smii",
 };
 
 /**
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 7da5fa8..1622081 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -53,6 +53,7 @@
 
 /* Interface Mode definitions */
 typedef enum {
+	PHY_INTERFACE_MODE_NA,
 	PHY_INTERFACE_MODE_MII,
 	PHY_INTERFACE_MODE_GMII,
 	PHY_INTERFACE_MODE_SGMII,
@@ -62,7 +63,8 @@ typedef enum {
 	PHY_INTERFACE_MODE_RGMII_ID,
 	PHY_INTERFACE_MODE_RGMII_RXID,
 	PHY_INTERFACE_MODE_RGMII_TXID,
-	PHY_INTERFACE_MODE_RTBI
+	PHY_INTERFACE_MODE_RTBI,
+	PHY_INTERFACE_MODE_SMII,
 } phy_interface_t;
 
 
-- 
1.7.4.1



^ permalink raw reply related

* [PATCH v2 3/3] net/fec: add device tree probe support
From: Shawn Guo @ 2011-07-05 15:13 UTC (permalink / raw)
  To: netdev
  Cc: linux-arm-kernel, devicetree-discuss, patches, Shawn Guo,
	Jason Liu, David S. Miller, Grant Likely
In-Reply-To: <1309878839-25743-1-git-send-email-shawn.guo@linaro.org>

It adds device tree probe support for fec driver.

Signed-off-by: Jason Liu <jason.hui@linaro.org>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
---
 Documentation/devicetree/bindings/net/fsl-fec.txt |   24 +++++
 drivers/net/fec.c                                 |   99 +++++++++++++++++++-
 2 files changed, 118 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/fsl-fec.txt

diff --git a/Documentation/devicetree/bindings/net/fsl-fec.txt b/Documentation/devicetree/bindings/net/fsl-fec.txt
new file mode 100644
index 0000000..1dad888
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/fsl-fec.txt
@@ -0,0 +1,24 @@
+* Freescale Fast Ethernet Controller (FEC)
+
+Required properties:
+- compatible : Should be "fsl,<soc>-fec"
+- reg : Address and length of the register set for the device
+- interrupts : Should contain fec interrupt
+- phy-mode : String, operation mode of the PHY interface.
+  Supported values are: "mii", "gmii", "sgmii", "tbi", "rmii",
+  "rgmii", "rgmii-id", "rgmii-rxid", "rgmii-txid", "rtbi".
+- gpios : Should specify the gpio for phy reset
+
+Optional properties:
+- local-mac-address : 6 bytes, mac address
+
+Example:
+
+fec@83fec000 {
+	compatible = "fsl,imx51-fec", "fsl,imx27-fec";
+	reg = <0x83fec000 0x4000>;
+	interrupts = <87>;
+	phy-mode = "mii";
+	gpios = <&gpio1 14 0>; /* phy-reset, GPIO2_14 */
+	local-mac-address = [00 04 9F 01 1B B9];
+};
diff --git a/drivers/net/fec.c b/drivers/net/fec.c
index 7ae3f28..dec94f4 100644
--- a/drivers/net/fec.c
+++ b/drivers/net/fec.c
@@ -44,6 +44,10 @@
 #include <linux/platform_device.h>
 #include <linux/phy.h>
 #include <linux/fec.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/of_gpio.h>
+#include <linux/of_net.h>
 
 #include <asm/cacheflush.h>
 
@@ -78,6 +82,17 @@ static struct platform_device_id fec_devtype[] = {
 	{ }
 };
 
+enum fec_type {
+	IMX27_FEC,
+	IMX28_FEC,
+};
+
+static const struct of_device_id fec_dt_ids[] = {
+	{ .compatible = "fsl,imx27-fec", .data = &fec_devtype[IMX27_FEC], },
+	{ .compatible = "fsl,imx28-fec", .data = &fec_devtype[IMX28_FEC], },
+	{ /* sentinel */ }
+};
+
 static unsigned char macaddr[ETH_ALEN];
 module_param_array(macaddr, byte, NULL, 0);
 MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
@@ -734,8 +749,22 @@ static void __inline__ fec_get_mac(struct net_device *ndev)
 	 */
 	iap = macaddr;
 
+#ifdef CONFIG_OF
+	/*
+	 * 2) from device tree data
+	 */
+	if (!is_valid_ether_addr(iap)) {
+		struct device_node *np = fep->pdev->dev.of_node;
+		if (np) {
+			const char *mac = of_get_mac_address(np);
+			if (mac)
+				iap = (unsigned char *) mac;
+		}
+	}
+#endif
+
 	/*
-	 * 2) from flash or fuse (via platform data)
+	 * 3) from flash or fuse (via platform data)
 	 */
 	if (!is_valid_ether_addr(iap)) {
 #ifdef CONFIG_M5272
@@ -748,7 +777,7 @@ static void __inline__ fec_get_mac(struct net_device *ndev)
 	}
 
 	/*
-	 * 3) FEC mac registers set by bootloader
+	 * 4) FEC mac registers set by bootloader
 	 */
 	if (!is_valid_ether_addr(iap)) {
 		*((unsigned long *) &tmpaddr[0]) =
@@ -1358,6 +1387,53 @@ static int fec_enet_init(struct net_device *ndev)
 	return 0;
 }
 
+#ifdef CONFIG_OF
+static int __devinit fec_get_phy_mode_dt(struct platform_device *pdev)
+{
+	struct device_node *np = pdev->dev.of_node;
+
+	if (np)
+		return of_get_phy_mode(np);
+
+	return -ENODEV;
+}
+
+static int __devinit fec_reset_phy(struct platform_device *pdev)
+{
+	int err, phy_reset;
+	struct device_node *np = pdev->dev.of_node;
+
+	if (!np)
+		return -ENODEV;
+
+	phy_reset = of_get_gpio(np, 0);
+	err = gpio_request_one(phy_reset, GPIOF_OUT_INIT_LOW, "phy-reset");
+	if (err) {
+		pr_warn("FEC: failed to get gpio phy-reset: %d\n", err);
+		return err;
+	}
+
+	msleep(1);
+	gpio_set_value(phy_reset, 1);
+
+	return 0;
+}
+#else /* CONFIG_OF */
+static inline int fec_get_phy_mode_dt(struct platform_device *pdev)
+{
+	return -ENODEV;
+}
+
+static inline int fec_reset_phy(struct platform_device *pdev)
+{
+	/*
+	 * In case of platform probe, the reset has been done
+	 * by machine code.
+	 */
+	return 0;
+}
+#endif /* CONFIG_OF */
+
 static int __devinit
 fec_probe(struct platform_device *pdev)
 {
@@ -1366,6 +1442,11 @@ fec_probe(struct platform_device *pdev)
 	struct net_device *ndev;
 	int i, irq, ret = 0;
 	struct resource *r;
+	const struct of_device_id *of_id;
+
+	of_id = of_match_device(fec_dt_ids, &pdev->dev);
+	if (of_id)
+		pdev->id_entry = of_id->data;
 
 	r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	if (!r)
@@ -1397,9 +1478,16 @@ fec_probe(struct platform_device *pdev)
 
 	platform_set_drvdata(pdev, ndev);
 
-	pdata = pdev->dev.platform_data;
-	if (pdata)
-		fep->phy_interface = pdata->phy;
+	fep->phy_interface = fec_get_phy_mode_dt(pdev);
+	if (fep->phy_interface < 0) {
+		pdata = pdev->dev.platform_data;
+		if (pdata)
+			fep->phy_interface = pdata->phy;
+		else
+			fep->phy_interface = PHY_INTERFACE_MODE_MII;
+	}
+
+	fec_reset_phy(pdev);
 
 	/* This device has up to three irqs on some platforms */
 	for (i = 0; i < 3; i++) {
@@ -1534,6 +1622,7 @@ static struct platform_driver fec_driver = {
 #ifdef CONFIG_PM
 		.pm	= &fec_pm_ops,
 #endif
+		.of_match_table = fec_dt_ids,
 	},
 	.id_table = fec_devtype,
 	.probe	= fec_probe,
-- 
1.7.4.1



^ permalink raw reply related

* Re: libpcap and tc filters
From: Adam Katz @ 2011-07-05 15:16 UTC (permalink / raw)
  To: jhs; +Cc: netdev
In-Reply-To: <1309876868.1765.53.camel@mojatatu>

Strange.
I've now tried the exact same configuration and it simply refuses to
work. Maybe your tcpreplay is configured differently...

What distro are you using? What kernel? What version of libpcap?


On Tue, Jul 5, 2011 at 5:41 PM, jamal <hadi@cyberus.ca> wrote:
> On Tue, 2011-07-05 at 17:21 +0300, Adam Katz wrote:
>> Yes. I understand the difference between ETH_P_ALL and ETH_P_IP...
>>
>> Jamal, I've now tested both solutions - changing the rule to "protocol
>> all" and patching tcpreplay to use ETH_P_IP and both produced the
>> exact same problem as before...
>
> Sorry - don't have much time to chase further, but it works for me.
>
> ---
> hadi@mojatatu10:~$ sudo tc qdisc del dev eth0 root handle 1:
> RTNETLINK answers: Invalid argument
> hadi@mojatatu10:~$ sudo tc qdisc add dev eth0 root handle 1: prio
> priomap 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
> hadi@mojatatu10:~$ sudo tc qdisc add dev eth0 parent 1:1 handle 10:
> pfifo
> hadi@mojatatu10:~$ sudo tc qdisc add dev eth0 parent 1:2 handle 20:
> pfifo
> hadi@mojatatu10:~$ sudo tc qdisc add dev eth0 parent 1:3 handle 30:
> pfifo
> hadi@mojatatu10:~$ sudo tc filter add dev eth0 protocol all parent 1:
> prio 1 u32 match ip dport 22 0xffff flowid 1:1 action ok
> hadi@mojatatu10:~$ sudo tc -s filter ls dev eth0
> filter parent 1: protocol all pref 1 u32
> filter parent 1: protocol all pref 1 u32 fh 800: ht divisor 1
> filter parent 1: protocol all pref 1 u32 fh 800::800 order 2048 key ht
> 800 bkt 0 flowid 1:1
>  match 00000016/0000ffff at 20
>        action order 1: gact action pass
>         random type none pass val 0
>         index 1 ref 1 bind 1 installed 15 sec used 15 sec
>        Action statistics:
>        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>        backlog 0b 0p requeues 0
>
> Note - the "OK" action is just a place holder to count packets.
> Now replay Adam's pcap file:
>
> hadi@mojatatu10:~/Downloads$ sudo tcpreplay
> --intf1=eth0 ./port22example.pcap
>
> sending out eth0
> processing file: ./port22example.pcap
> Actual: 50 packets (11594 bytes) sent in 3.66 seconds
> Rated: 3167.8 bps, 0.02 Mbps, 13.66 pps
> Statistics for network device: eth0
>        Attempted packets:         50
>        Successful packets:        50
>        Failed packets:            0
>        Retried packets (ENOBUFS): 0
>        Retried packets (EAGAIN):  0
>
> I don't have any ssh running on this machine. So
> let's check to see if anything was captured by the filter.
>
> -----
> hadi@mojatatu10:~$ sudo tc -s filter ls dev eth0
> filter parent 1: protocol all pref 1 u32
> filter parent 1: protocol all pref 1 u32 fh 800: ht divisor 1
> filter parent 1: protocol all pref 1 u32 fh 800::800 order 2048 key ht
> 800 bkt 0 flowid 1:1
>  match 00000016/0000ffff at 20
>        action order 1: gact action pass
>         random type none pass val 0
>         index 1 ref 1 bind 1 installed 76 sec used 1 sec
>        Action statistics:
>        Sent 7763 bytes 26 pkt (dropped 0, overlimits 0 requeues 0)
>        backlog 0b 0p requeues 0
> ------
>
> cheers,
> jamal
>
>>
>> On Tue, Jul 5, 2011 at 4:56 PM, jamal <hadi@cyberus.ca> wrote:
>> > On Tue, 2011-07-05 at 16:07 +0300, Adam Katz wrote:
>> >
>> >> second, I just took at the libpcap source code and it seems it's using
>> >> the same ETH_P_ALL option when binding to an interface. So based on
>> >> what you're saying, the same solution of patching libpcap and
>> >> replacing ETH_P_ALL with  ETH_P_IP should also make these rules work
>> >> with traffic sent using pure libpcap or any libpcap - based
>> >> application.
>> >
>> > ETH_P_ALL makes sense if you are unsure it is going to be IP. So i would
>> > change/optimize apps only for IP if they are intended to deal with IP
>> > only (same for ARP etc).
>> > In your case, it seems it is tcp only - which runs on top of IP. So
>> > it makes sense to do it for that specific use case etc.
>> >
>> > cheers,
>> > jamal
>> >
>> >
>> >
>
>
>
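
For reference, the key shown in the dump above ("match 00000016/0000ffff at
20") is a masked 32-bit compare at byte offset 20 from the start of the IP
header. With a 20-byte IP header (no options), that word holds the source
and destination ports, so the low 16 bits are the destination port and 0x16
is port 22. The sketch below mimics that comparison in userspace; the helper
names are hypothetical and this is not the kernel's u32 classifier code.

```c
#include <assert.h>
#include <stdint.h>

/* Read a big-endian 32-bit word at byte offset off from the start of the IP header. */
static uint32_t load_be32(const uint8_t *hdr, unsigned int off)
{
	return ((uint32_t)hdr[off] << 24) | ((uint32_t)hdr[off + 1] << 16) |
	       ((uint32_t)hdr[off + 2] << 8) | (uint32_t)hdr[off + 3];
}

/* Hypothetical helper mirroring "match VALUE/MASK at OFFSET". */
static int u32_key_matches(const uint8_t *hdr, unsigned int off,
			   uint32_t value, uint32_t mask)
{
	return (load_be32(hdr, off) & mask) == value;
}
```

Because the offset is fixed, a packet carrying IP options would put the
ports somewhere else and the key would be compared against the wrong bytes.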

^ permalink raw reply

* [RFC] non-preemptible kernel socket for RAMster
From: Dan Magenheimer @ 2011-07-05 15:54 UTC (permalink / raw)
  To: netdev; +Cc: Konrad Wilk, linux-mm

In working on a kernel project called RAMster* (where RAM on a
remote system may be used for clean page cache pages and for swap
pages), I found I need a kernel socket that can be used while in a
non-preemptible state.  I admit to being a networking idiot,
but I have been successfully using the following small patch.
I'm not sure whether I have just been lucky so far... perhaps more
sockets or larger/different loads will require a lot more
changes (or maybe even make my objective impossible).
So I thought I'd post it for comment.  I'd appreciate
any thoughts or suggestions.

Thanks,
Dan

* http://events.linuxfoundation.org/events/linuxcon/magenheimer 

diff -Napur linux-2.6.37/net/core/sock.c linux-2.6.37-ramster/net/core/sock.c
--- linux-2.6.37/net/core/sock.c	2011-07-03 19:14:52.267853088 -0600
+++ linux-2.6.37-ramster/net/core/sock.c	2011-07-03 19:10:04.340980799 -0600
@@ -1587,6 +1587,14 @@ static void __lock_sock(struct sock *sk)
 	__acquires(&sk->sk_lock.slock)
 {
 	DEFINE_WAIT(wait);
+	if (!preemptible()) {
+		while (sock_owned_by_user(sk)) {
+			spin_unlock_bh(&sk->sk_lock.slock);
+			cpu_relax();
+			spin_lock_bh(&sk->sk_lock.slock);
+		}
+		return;
+	}
 
 	for (;;) {
 		prepare_to_wait_exclusive(&sk->sk_lock.wq, &wait,
@@ -1623,7 +1631,8 @@ static void __release_sock(struct sock *
 			 * This is safe to do because we've taken the backlog
 			 * queue private:
 			 */
-			cond_resched_softirq();
+			if (preemptible())
+				cond_resched_softirq();
 			skb = next;
 		} while (skb != NULL);


^ permalink raw reply

* RE: [PATCH 2/2] packet: Add fanout support.
From: Loke, Chetan @ 2011-07-05 16:03 UTC (permalink / raw)
  To: Eric Dumazet, Victor Julien; +Cc: David Miller, netdev
In-Reply-To: <1309849214.2720.45.camel@edumazet-laptop>

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Eric Dumazet
> Sent: July 05, 2011 3:00 AM
> To: Victor Julien
> Cc: David Miller; netdev@vger.kernel.org
> Subject: Re: [PATCH 2/2] packet: Add fanout support.
> 
> Le mardi 05 juillet 2011 à 08:56 +0200, Victor Julien a écrit :
> 
> > Is this still also true for IP fragments?
> >
> 
> This point was already raised. IP fragments have rxhash = 0, obviously,
> since we dont have full information (source / destination ports for
> example)

Can we not do something like:

a = src_ip_addr;
b = dst_ip_addr;

if (ip_is_fragment(ip_hdr(skb)))
	c = ip_hdr(skb)->id;
else
	c = src_port | dest_port; /* port_32 etc - Similar to what we have today */

/* swap a/b etc */
jhash_3words(a, b, c, hashrnd);



Chetan Loke

^ permalink raw reply

* Re: [PATCH v2 2/4] packet: Add fanout support.
From: Eric Dumazet @ 2011-07-05 16:04 UTC (permalink / raw)
  To: David Miller; +Cc: victor, netdev
In-Reply-To: <20110705.020419.1998951566353182397.davem@davemloft.net>

Le mardi 05 juillet 2011 à 02:04 -0700, David Miller a écrit :


> +static int fanout_add(struct sock *sk, u16 id, u8 type)
> +{
> +	struct packet_sock *po = pkt_sk(sk);
> +	struct packet_fanout *f, *match;
> +	int err;
> +
> +	switch (type) {
> +	case PACKET_FANOUT_HASH:
> +	case PACKET_FANOUT_LB:
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	if (!po->running)
> +		return -EINVAL;
> +
> +	if (po->fanout)
> +		return -EALREADY;
> +
> +	mutex_lock(&fanout_mutex);
> +	match = NULL;
> +	list_for_each_entry(f, &fanout_list, list) {
> +		if (f->id == id &&
> +		    read_pnet(&f->net) == sock_net(sk)) {
> +			match = f;
> +			break;
> +		}
> +	}
> +	if (!match) {
> +		match = kzalloc(sizeof(*match), GFP_KERNEL);
> +		if (match) {
> +			write_pnet(&match->net, sock_net(sk));
> +			match->id = id;
> +			match->type = type;
> +			atomic_set(&match->rr_cur, 0);
> +			INIT_LIST_HEAD(&match->list);
> +			spin_lock_init(&match->lock);
> +			atomic_set(&match->sk_ref, 0);
> +			match->prot_hook.type = po->prot_hook.type;
> +			match->prot_hook.dev = po->prot_hook.dev;
> +			switch (type) {
> +			case PACKET_FANOUT_HASH:
> +				match->prot_hook.func = packet_rcv_fanout_hash;
> +				break;
> +			case PACKET_FANOUT_LB:
> +				match->prot_hook.func = packet_rcv_fanout_lb;
> +				break;
> +			}
> +			match->prot_hook.af_packet_priv = match;
> +			dev_add_pack(&match->prot_hook);

There is a small window where __fanout_link(sk, po) is not yet called,
but packets can be received on other cpus since dev_add_pack() was
called.

There is no divide by 0, but fanout_demux_hash() can return a NULL sk,
thus we can crash...

> +			list_add(&match->list, &fanout_list);
> +		}
> +	}
> +	err = -ENOMEM;
> +	if (match) {
> +		err = -EINVAL;
> +		if (match->type == type &&
> +		    match->prot_hook.type == po->prot_hook.type &&
> +		    match->prot_hook.dev == po->prot_hook.dev) {
> +			err = -ENOSPC;
> +			if (atomic_read(&match->sk_ref) < PACKET_FANOUT_MAX) {
> +				__dev_remove_pack(&po->prot_hook);
> +				po->fanout = match;
> +				atomic_inc(&match->sk_ref);
> +				__fanout_link(sk, po);
> +				err = 0;
> +			}
> +		}
> +	}
> +	mutex_unlock(&fanout_mutex);
> +	return err;
> +}
> +



^ permalink raw reply

* Re: [Bugme-new] [Bug 38102] New: BUG kmalloc-2048: Poison overwritten
From: Neil Horman @ 2011-07-05 16:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexey Zaytsev, Michael Büsch, Andrew Morton, netdev,
	Gary Zambrano, bugme-daemon, David S. Miller, Pekka Pietikainen,
	Florian Schirmer, Felix Fietkau, Michael Buesch
In-Reply-To: <1309845573.2720.41.camel@edumazet-laptop>

On Tue, Jul 05, 2011 at 07:59:33AM +0200, Eric Dumazet wrote:
> Le mardi 05 juillet 2011 à 07:33 +0200, Eric Dumazet a écrit :
> > Le mardi 05 juillet 2011 à 09:18 +0400, Alexey Zaytsev a écrit :
> > 
> > > Actually, I've added a trace to show b44_init_rings and b44_free_rings
> > > calls, and they are only called once, right after the driver is
> > > loaded. So it can't be related to START_RFO. Will attach the diff and
> > > dmesg.
> > 
> > Thanks
> > 
> > I was wondering if DMA could be faster if providing word aligned
> > addresses, could you try :
> > 
> > -#define RX_PKT_OFFSET          (RX_HEADER_LEN + 2)
> > +#define RX_PKT_OFFSET          (RX_HEADER_LEN + NET_IP_ALIGN)
> > 
> > (On x86, we now have NET_IP_ALIGN = 0 since commit ea812ca1)
> > 
> 
> I suspect a hardware bug.
> 
I'm not sure if this helps, but I've been reading over this bug, and it seems
that the rx path never checks the status of a buffer's rx header prior to
unmapping it or otherwise modifying it in hardware.  If we were to start munging
pointers in the rx channel while a dma was still active in it, it seems like the
observed corruption might be the result.  The docs aren't super clear on this,
but I think a descriptor needs to be in the idle wait or stopped state prior to
being accessed.  This patch might help out there (although I don't have hardware
to test)
Neil

diff --git a/drivers/net/b44.c b/drivers/net/b44.c
index 3d247f3..48540ad 100644
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -769,7 +769,19 @@ static int b44_rx(struct b44 *bp, int budget)
 		dma_addr_t map = rp->mapping;
 		struct rx_header *rh;
 		u16 len;
-
+		u32 state = br32(bp, B44_DMARX_STAT) & DMARX_STAT_SMASK;
+		state >>= 12;
+
+		/*
+ 		 * I _think_ descriptors need to be in the idle or stopped state
+ 		 * before its safe to access them.  If the current buffer
+ 		 * pointed to by the dma channel is in state 1 or lower (active
+ 		 * or disabled), then we should just stop receving until the
+ 		 * next interrupt kicks us again (I think)
+ 		 */
+		if (state < 2)
+			return;
+ 
 		dma_sync_single_for_cpu(bp->sdev->dev, map,
 					    RX_PKT_BUF_SZ,
 					    DMA_FROM_DEVICE);

^ permalink raw reply related

* Re: [PATCH 2/2] packet: Add fanout support.
From: David Miller @ 2011-07-05 16:08 UTC (permalink / raw)
  To: Chetan.Loke; +Cc: eric.dumazet, victor, netdev
In-Reply-To: <D3F292ADF945FB49B35E96C94C2061B91257D61B@nsmail.netscout.com>

From: "Loke, Chetan" <Chetan.Loke@netscout.com>
Date: Tue, 5 Jul 2011 12:03:29 -0400

> Can we not do something like:
> 
> a = src_ip_addr;
> b = dst_ip_addr;
> 
> if (ip_is_fragment(ip_hdr(skb)))
> 	c = ip_hdr->id;
> else
> 	c = src_port | dest_port ; /* port_32 etc - Similar to what we have today 

A UDP flow can be composed of fragmented and non-fragmented
parts, we want all of the packets from that flow to land
on the same hash.

Your scheme does not provide that essential property.
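
To make the objection concrete, here is a toy userspace model. The struct,
the mixer and both hash functions are hypothetical; the mixer merely stands
in for a real hash such as jhash. It compares the proposed "IP id for
fragments, ports otherwise" key with a key built from the addresses alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the fields a flow hash might look at. */
struct pkt_key {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;		/* only known when the packet is not a fragment */
	uint16_t dport;		/* only known when the packet is not a fragment */
	uint16_t ip_id;
	bool     is_fragment;
};

/* Toy mixer standing in for a real hash; not the kernel's jhash. */
static uint32_t toy_mix(uint32_t a, uint32_t b, uint32_t c)
{
	return (a ^ b) * 2654435761u + c;
}

/* The proposed scheme: IP id for fragments, ports otherwise. */
static uint32_t hash_id_or_ports(const struct pkt_key *k)
{
	uint32_t c = k->is_fragment ? k->ip_id
				    : ((uint32_t)k->sport << 16) | k->dport;

	return toy_mix(k->saddr, k->daddr, c);
}

/* Addresses only: the same value whether or not the packet is fragmented. */
static uint32_t hash_addrs_only(const struct pkt_key *k)
{
	return toy_mix(k->saddr, k->daddr, 0);
}
```

Feeding one unfragmented datagram and one fragment of the same UDP flow
through both functions shows the difference: the first function keys them
differently, the second does not, at the cost of ignoring the ports.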

^ permalink raw reply

* Re: [PATCH v2 2/4] packet: Add fanout support.
From: David Miller @ 2011-07-05 16:08 UTC (permalink / raw)
  To: eric.dumazet; +Cc: victor, netdev
In-Reply-To: <1309881840.2271.15.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 05 Jul 2011 18:04:00 +0200

> There is a small window where __fanout_link(sk, po) is not yet called,
> but packets can be received on other cpus since dev_add_pack() was
> called.
> 
> There is no divide by 0, but fanout_demux_hash() can return a NULL sk,
> thus we can crash...

Indeed, you're right, I'll fix this.

^ permalink raw reply

* Re: [Bugme-new] [Bug 38102] New: BUG kmalloc-2048: Poison overwritten
From: Eric Dumazet @ 2011-07-05 16:12 UTC (permalink / raw)
  To: Neil Horman
  Cc: Alexey Zaytsev, Michael Büsch, Andrew Morton, netdev,
	Gary Zambrano, bugme-daemon, David S. Miller, Pekka Pietikainen,
	Florian Schirmer, Felix Fietkau, Michael Buesch
In-Reply-To: <20110705160531.GC2959@hmsreliant.think-freely.org>

Le mardi 05 juillet 2011 à 12:05 -0400, Neil Horman a écrit :
> On Tue, Jul 05, 2011 at 07:59:33AM +0200, Eric Dumazet wrote:
> > Le mardi 05 juillet 2011 à 07:33 +0200, Eric Dumazet a écrit :
> > > Le mardi 05 juillet 2011 à 09:18 +0400, Alexey Zaytsev a écrit :
> > > 
> > > > Actually, I've added a trace to show b44_init_rings and b44_free_rings
> > > > calls, and they are only called once, right after the driver is
> > > > loaded. So it can't be related to START_RFO. Will attach the diff and
> > > > dmesg.
> > > 
> > > Thanks
> > > 
> > > I was wondering if DMA could be faster if providing word aligned
> > > addresses, could you try :
> > > 
> > > -#define RX_PKT_OFFSET          (RX_HEADER_LEN + 2)
> > > +#define RX_PKT_OFFSET          (RX_HEADER_LEN + NET_IP_ALIGN)
> > > 
> > > (On x86, we now have NET_IP_ALIGN = 0 since commit ea812ca1)
> > > 
> > 
> > I suspect a hardware bug.
> > 
> I'm not sure if this helps, but I've been reading over this bug, and it seems
> that the rx path never checks the status of a buffer's rx header prior to
> unmapping it or otherwise modifying it in hardware.  If we were to start munging
> pointers in the rx channel while a dma was still active in it, it seems like the
> observed corruption might be the result.  The docs aren't super clear on this,
> but I think a descriptor needs to be in the idle wait or stopped state prior to
> being accessed.  This patch might help out there (although I don't have hardware
> to test)
> Neil
> 
> diff --git a/drivers/net/b44.c b/drivers/net/b44.c
> index 3d247f3..48540ad 100644
> --- a/drivers/net/b44.c
> +++ b/drivers/net/b44.c
> @@ -769,7 +769,19 @@ static int b44_rx(struct b44 *bp, int budget)
>  		dma_addr_t map = rp->mapping;
>  		struct rx_header *rh;
>  		u16 len;
> -
> +		u32 state = br32(bp, B44_DMARX_STAT) & DMARX_STAT_SMASK;
> +		state >>= 12;
> +
> +		/*
> + 		 * I _think_ descriptors need to be in the idle or stopped state
> + 		 * before its safe to access them.  If the current buffer
> + 		 * pointed to by the dma channel is in state 1 or lower (active
> + 		 * or disabled), then we should just stop receving until the
> + 		 * next interrupt kicks us again (I think)
> + 		 */
> +		if (state < 2)
> +			return;
> + 
>  		dma_sync_single_for_cpu(bp->sdev->dev, map,
>  					    RX_PKT_BUF_SZ,
>  					    DMA_FROM_DEVICE);

Hmm... We are in a NAPI handler... There won't be a new interrupt.

Plus, we do at start of b44_rx() :

prod  = br32(bp, B44_DMARX_STAT) & DMARX_STAT_CDMASK;

So all descriptors before prod are guaranteed to be ready for the host to
consume... The fact that a DMA access is running on the 'next descriptor'
should be irrelevant.

IMHO peeking at B44_DMARX_STAT for each packet would be a waste of time.
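
To illustrate the point about sampling the producer index once, here is a minimal userspace sketch of a ring consumer. It is not the b44 code: `struct rx_ring`, `RING_SIZE`, `hw_prod` (a simulated register value) and `rx_poll()` are all hypothetical names. The producer index is read a single time per poll, and only slots strictly before it are touched, whatever the hardware is doing with the slot at `prod`.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical ring size, must be a power of two */

/* Hypothetical stand-in for per-device RX state. */
struct rx_ring {
	uint32_t cons;        /* next slot the host will consume */
	uint32_t hw_prod;     /* simulated "current descriptor" register */
	int done[RING_SIZE];  /* set once the host has handled a slot */
};

/* Consume completed slots, at most `budget` of them; returns the count. */
static int rx_poll(struct rx_ring *r, int budget)
{
	/* One register read per poll, not one per packet. */
	uint32_t prod = r->hw_prod & (RING_SIZE - 1);
	int received = 0;

	assert(budget >= 0);
	while (r->cons != prod && received < budget) {
		r->done[r->cons] = 1; /* slot is before prod: host owns it */
		r->cons = (r->cons + 1) & (RING_SIZE - 1);
		received++;
	}
	return received;
}
```

With `cons = 6` and `hw_prod = 2`, one poll handles slots 6, 7, 0 and 1 and leaves slot 2, which the device may still be writing, untouched.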





* Re: libpcap and tc filters
From: Eric Dumazet @ 2011-07-05 16:14 UTC
  To: Adam Katz; +Cc: jhs, netdev
In-Reply-To: <CAA0qwj49XzNa-nY82X3d_eZ95seS15qELbJDYCNfRkg03OJosQ@mail.gmail.com>

On Tuesday, 05 July 2011 at 18:16 +0300, Adam Katz wrote:
> strange.
> I've now tried the exact same configuration and it simply refuses to
> work. Maybe your tcpreplay is configured differently...
> 
> What distro are you using? What kernel? What version of libpcap?

I did the same tests here and it works correctly for me.

latest kernel 3.0-rc6

# /usr/local/bin/tcpreplay -V
tcpreplay version: 3.4.4 (build 2450)
Copyright 2000-2010 by Aaron Turner <aturner at synfin dot net>
Cache file supported: 04
Not compiled with libdnet.
Compiled against libpcap: 1.1.1
64 bit packet counters: enabled
Verbose printing via tcpdump: enabled
Packet editing: disabled
Fragroute engine: disabled
Injection method: PF_PACKET send()


* RE: [PATCH 2/2] packet: Add fanout support.
From: Eric Dumazet @ 2011-07-05 16:16 UTC
  To: Loke, Chetan; +Cc: Victor Julien, David Miller, netdev
In-Reply-To: <D3F292ADF945FB49B35E96C94C2061B91257D61B@nsmail.netscout.com>

On Tuesday, 05 July 2011 at 12:03 -0400, Loke, Chetan wrote:
> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org [mailto:netdev-
> > owner@vger.kernel.org] On Behalf Of Eric Dumazet
> > Sent: July 05, 2011 3:00 AM
> > To: Victor Julien
> > Cc: David Miller; netdev@vger.kernel.org
> > Subject: Re: [PATCH 2/2] packet: Add fanout support.
> > 
> > On Tuesday, 05 July 2011 at 08:56 +0200, Victor Julien wrote:
> > 
> > > Is this still also true for IP fragments?
> > >
> > 
> > This point was already raised. IP fragments have rxhash = 0, obviously,
> > since we don't have full information (source / destination ports for
> > example)
> 
> Can we not do something like:
> 
> a = src_ip_addr;
> b = dst_ip_addr;
> 
> if (ip_is_fragment(ip_hdr(skb)))
> 	c = ip_hdr->id;
> else
> 	c = src_port | dest_port ; /* port_32 etc - Similar to what we have today */
> 
> /* swap a/b etc */
> jhash3_words(a,b,c);
> 
> 
> 

Sure, but non-fragmented packets will then get a different rxhash.

Remember, the goal is that _all_ packets of a given flow end up in the same queue.
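
A small userspace sketch of why the proposed scheme splits a flow. Everything here is hypothetical: `struct pkt` is a toy packet descriptor, `mix3()` is an FNV-style stand-in (not the kernel's jhash), and the a/b swap for direction symmetry is left out. The third hash input is the IP id for fragments and the port pair otherwise, as in the proposal quoted above.

```c
#include <stdint.h>

/* Hypothetical toy packet descriptor, not a kernel structure. */
struct pkt {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint16_t ip_id;
	int is_fragment;
};

/* Stand-in mixing function (FNV-1a style over three words), not jhash. */
static uint32_t mix3(uint32_t a, uint32_t b, uint32_t c)
{
	uint32_t h = 2166136261u;

	h = (h ^ a) * 16777619u;
	h = (h ^ b) * 16777619u;
	h = (h ^ c) * 16777619u;
	return h;
}

/* The proposed scheme: IP id for fragments, ports for everything else. */
static uint32_t hash_proposed(const struct pkt *p)
{
	uint32_t c = p->is_fragment
		? (uint32_t)p->ip_id
		: ((uint32_t)p->sport << 16) | p->dport;

	return mix3(p->saddr, p->daddr, c);
}
```

For one connection, a whole packet and a fragment feed different third words into the hash, and two fragmented datagrams carry different IP ids, so the same flow is spread over several hash values and can land in several queues.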





* Re: [PATCH 2/2] packet: Add fanout support.
From: Victor Julien @ 2011-07-05 16:21 UTC
  To: Eric Dumazet; +Cc: Loke, Chetan, David Miller, netdev
In-Reply-To: <1309882577.2271.23.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On 07/05/2011 06:16 PM, Eric Dumazet wrote:
> Remember, goal is that _all_ packets of a given flow end in same queue.
> 

What about a hashing scheme based on just the IP addresses? It would make
rxhash useless for this purpose, but might be a lot simpler overall...

-- 
---------------------------------------------
Victor Julien
http://www.inliniac.net/
PGP: http://www.inliniac.net/victorjulien.asc
---------------------------------------------


* Re: [PATCH 2/2] packet: Add fanout support.
From: Eric Dumazet @ 2011-07-05 16:23 UTC
  To: Victor Julien; +Cc: Loke, Chetan, David Miller, netdev
In-Reply-To: <4E133A09.8030907@inliniac.net>

On Tuesday, 05 July 2011 at 18:21 +0200, Victor Julien wrote:
> On 07/05/2011 06:16 PM, Eric Dumazet wrote:
> > Remember, goal is that _all_ packets of a given flow end in same queue.
> > 
> 
> What about a hashing scheme based on just the ip addresses? Would make
> rxhash useless for this purpose, but would be a lot simpler overall maybe...
> 

What about loads where a single IP address is used?

I wonder what's the problem, since David added a defrag unit ;)
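
The single-address objection can be shown with a short sketch. The names are hypothetical (`struct flow`, `mix2()`, `queue_addrs_only()`), and the mixing function is an FNV-style stand-in, not anything from the kernel. Because the ports never enter the hash, every connection between the same two hosts maps to the same queue.

```c
#include <stdint.h>

/* Hypothetical flow key; the ports are deliberately ignored below. */
struct flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Stand-in mixing function (FNV-1a style over two words). */
static uint32_t mix2(uint32_t a, uint32_t b)
{
	uint32_t h = 2166136261u;

	h = (h ^ a) * 16777619u;
	h = (h ^ b) * 16777619u;
	return h;
}

/* Pick a queue from the addresses only; nqueues must be non-zero. */
static unsigned int queue_addrs_only(const struct flow *f, unsigned int nqueues)
{
	return mix2(f->saddr, f->daddr) % nqueues;
}
```

Fragments and whole packets of a flow now agree, but a load made of many connections between one client and one server keeps a single queue busy while the others sit idle.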
