Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next] bridge: remove a redundant synchronize_net()
From: Eric Dumazet @ 2013-04-04 16:03 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: David Miller, vfalico, netdev, stephen
In-Reply-To: <1365090287.3308.3.camel@edumazet-glaptop>

On Thu, 2013-04-04 at 08:44 -0700, Eric Dumazet wrote:

> Because maybe the synchronize_net() in netdev_rx_handler_unregister()
> is enough and you dont even need the call_rcu(). Thats was my question.

So we have the following sequence in team_port_del()

        netdev_rx_handler_unregister(port_dev);

        netdev_upper_dev_unlink(port_dev, dev);
        team_port_disable_netpoll(port);
        vlan_vids_del_by_dev(port_dev, dev);
        dev_uc_unsync(port_dev, dev);
        dev_mc_unsync(port_dev, dev);
        dev_close(port_dev);
        team_port_leave(team, port);

        __team_option_inst_mark_removed_port(team, port);
        __team_options_change_check(team);
        __team_option_inst_del_port(team, port);
        __team_port_change_port_removed(port);

        team_port_set_orig_dev_addr(port);
        dev_set_mtu(port_dev, port->orig.mtu);
        synchronize_rcu();
        kfree(port);

And I suspect we can remove synchronize_rcu() call.

But as this is a long list of operations, maybe some of them requires
the rcu grace period before kfree(port)

^ permalink raw reply

* Re: [PATCH] lib80211: make lib80211 can be enabled independently
From: Johannes Berg @ 2013-04-04 16:05 UTC (permalink / raw)
  To: Wang YanQing; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <20130404160130.GA2577@udknight>

On Fri, 2013-04-05 at 00:01 +0800, Wang YanQing wrote:
> Current we can only enable lib80211 by enable a driver
> in tree use it which will select it, but some out tree's
> drivers also use it, so I think it has sense to make lib80211
> can be enabled independently.
> 
> A example of the out tree's drivers use lib80211 is:
> hybird driver(wl) for Broadcom Corporation BCM43225 802.11b/g/n

Tough luck.

NACK.

johannes

^ permalink raw reply

* Re: [PATCH net-next] bridge: remove a redundant synchronize_net()
From: Jiri Pirko @ 2013-04-04 16:07 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, vfalico, netdev, stephen
In-Reply-To: <1365090287.3308.3.camel@edumazet-glaptop>

Thu, Apr 04, 2013 at 05:44:47PM CEST, eric.dumazet@gmail.com wrote:
>On Thu, 2013-04-04 at 17:35 +0200, Jiri Pirko wrote:
>
>> It can be converted now to call_rcu. synchronize_rcu is making sure
>> no packet is in flight when changing modes.
>
>What changes exactly ? You don't really answer to my question with this
>very vague sentence.

Sorry for vagueness. We discussed this in thread from Oct 2011:
Subject: [patch net-next V2] net: introduce ethernet teaming device

purpose of synchronize_rcu() here (instead of call_rcu) was to ensure that
rx_handler can not be in progress when __team_change_mode() changes
team->ops.receive to NULL

>
>Because maybe the synchronize_net() in netdev_rx_handler_unregister()
>is enough and you dont even need the call_rcu(). Thats was my question.

Yes, you are right. kfree(port) can be called right away.

>
>RCU barriers are not magical things we add when we are not exactly sure
>of what is happening.

I always thought that synchronize_rcu() is from the magic land of elfs and fairies...

>
>Like other barriers (wmb(), smb_wmb(), ...) we should document or
>understand why they are needed.
>
>

^ permalink raw reply

* Re: [PATCH net-next 3/3] net/mlx4_en: Enable open-lldp DCB support
From: Sagi Grimberg @ 2013-04-04 16:08 UTC (permalink / raw)
  To: John Fastabend; +Cc: Or Gerlitz, davem, netdev, amirv
In-Reply-To: <515DA342.5020503@intel.com>

On 4/4/2013 6:58 PM, John Fastabend wrote:
> On 4/4/2013 7:26 AM, Or Gerlitz wrote:
>> From: Sagi Grimberg <sagig@mellanox.com>
>>
>> The lldpad daemon queries the driver caps via the getcaps and getstate
>> routines. Added the prpoer dbcnl_ops entries to support that.
>>
>> Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
>> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
>> ---
>
> Does lldpad work now with the mlx4_en driver?
>
> Reviewed-by: John Fastabend <john.r.fastabend@intel.com>
>

Yes.

-Sagi

^ permalink raw reply

* Re: [PATCH net-next] bridge: remove a redundant synchronize_net()
From: Jiri Pirko @ 2013-04-04 16:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, vfalico, netdev, stephen
In-Reply-To: <1365091425.3308.7.camel@edumazet-glaptop>

Thu, Apr 04, 2013 at 06:03:45PM CEST, eric.dumazet@gmail.com wrote:
>On Thu, 2013-04-04 at 08:44 -0700, Eric Dumazet wrote:
>
>> Because maybe the synchronize_net() in netdev_rx_handler_unregister()
>> is enough and you dont even need the call_rcu(). Thats was my question.
>
>So we have the following sequence in team_port_del()
>
>        netdev_rx_handler_unregister(port_dev);
>
>        netdev_upper_dev_unlink(port_dev, dev);
>        team_port_disable_netpoll(port);
>        vlan_vids_del_by_dev(port_dev, dev);
>        dev_uc_unsync(port_dev, dev);
>        dev_mc_unsync(port_dev, dev);
>        dev_close(port_dev);
>        team_port_leave(team, port);
>
>        __team_option_inst_mark_removed_port(team, port);
>        __team_options_change_check(team);
>        __team_option_inst_del_port(team, port);
>        __team_port_change_port_removed(port);
>
>        team_port_set_orig_dev_addr(port);
>        dev_set_mtu(port_dev, port->orig.mtu);
>        synchronize_rcu();
>        kfree(port);
>
>And I suspect we can remove synchronize_rcu() call.

I agree.

>
>But as this is a long list of operations, maybe some of them requires
>the rcu grace period before kfree(port)

None of them requires that.

>
>
>

^ permalink raw reply

* Re: [PATCH net-next] bridge: remove a redundant synchronize_net()
From: Eric Dumazet @ 2013-04-04 16:18 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: David Miller, vfalico, netdev, stephen
In-Reply-To: <20130404160706.GB1688@minipsycho.brq.redhat.com>

On Thu, 2013-04-04 at 18:07 +0200, Jiri Pirko wrote:

> I always thought that synchronize_rcu() is from the magic land of elfs and fairies...


;)

^ permalink raw reply

* Re: [net-next PATCH V3] net: frag queue per hash bucket locking
From: Eric Dumazet @ 2013-04-04 16:24 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: David S. Miller, netdev, Florian Westphal, Daniel Borkmann,
	Hannes Frederic Sowa
In-Reply-To: <20130404093816.3613.42475.stgit@dragon>

On Thu, 2013-04-04 at 11:38 +0200, Jesper Dangaard Brouer wrote:
> This patch implements per hash bucket locking for the frag queue
> hash.  This removes two write locks, and the only remaining write
> lock is for protecting hash rebuild.  This essentially reduce the
> readers-writer lock to a rebuild lock.

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH net-next 1/2] net: ipv6: add tokenized interface identifier support
From: YOSHIFUJI Hideaki @ 2013-04-04 16:29 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, netdev, Hannes Frederic Sowa, YOSHIFUJI Hideaki
In-Reply-To: <1365086258-4512-2-git-send-email-dborkman@redhat.com>

Daniel Borkmann wrote:
> This patch adds support for tokenized IIDs, that allow for
> administrators to assign well-known host-part addresses to
> nodes whilst still obtaining global network prefix from
> Router Advertisements. It is currently in IETF RFC draft
> status [1]:
> 
>   The primary target for such support is server platforms
>   where addresses are usually manually configured, rather
>   than using DHCPv6 or SLAAC. By using tokenised identifiers,
>   hosts can still determine their network prefix by use of
>   SLAAC, but more readily be automatically renumbered should
>   their network prefix change.
> 
>  [1] http://tools.ietf.org/html/draft-chown-6man-tokenised-ipv6-identifiers-02
> 
> The implementation is partially based on top of Mark K.
> Thompson's proof of concept. Successfully tested by myself.
> 
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> ---
>  include/net/if_inet6.h       |    2 +
>  include/net/ipv6.h           |    2 +
>  include/uapi/linux/if_link.h |    1 +
>  net/ipv6/addrconf.c          |   87 ++++++++++++++++++++++++++++++++++++++++-
>  net/ipv6/addrconf_core.c     |    2 -
>  5 files changed, 89 insertions(+), 5 deletions(-)
> 
> diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h
> index 9356322..f1063d6 100644
> --- a/include/net/if_inet6.h
> +++ b/include/net/if_inet6.h
> @@ -187,6 +187,8 @@ struct inet6_dev {
>  	struct list_head	tempaddr_list;
>  #endif
>  
> +	struct in6_addr		token;
> +
>  	struct neigh_parms	*nd_parms;
>  	struct inet6_dev	*next;
>  	struct ipv6_devconf	cnf;
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index 0810aa5..da8c11e 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -88,6 +88,8 @@
>  #define IPV6_ADDR_SCOPE_ORGLOCAL	0x08
>  #define IPV6_ADDR_SCOPE_GLOBAL		0x0e
>  
> +#define IPV6_ADDR_SCOPE_TYPE(scope)	((scope) << 16)
> +
>  /*
>   *	Addr flags
>   */
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index c4edfe1..6b35c42 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -201,6 +201,7 @@ enum {
>  	IFLA_INET6_MCAST,	/* MC things. What of them?	*/
>  	IFLA_INET6_CACHEINFO,	/* time values and max reasm size */
>  	IFLA_INET6_ICMP6STATS,	/* statistics (icmpv6)		*/
> +	IFLA_INET6_TOKEN,	/* device token			*/
>  	__IFLA_INET6_MAX
>  };
>  
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index a33b157..fb0e8a0 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -422,6 +422,7 @@ static struct inet6_dev *ipv6_add_dev(struct net_device *dev)
>  		ipv6_regen_rndid((unsigned long) ndev);
>  	}
>  #endif
> +	memset(ndev->token.s6_addr, 0, sizeof(ndev->token.s6_addr));
>  
>  	if (netif_running(dev) && addrconf_qdisc_ok(dev))
>  		ndev->if_flags |= IF_READY;
> @@ -2136,8 +2137,14 @@ void addrconf_prefix_rcv(struct net_device *dev, u8 *opt, int len, bool sllao)
>  
>  		if (pinfo->prefix_len == 64) {
>  			memcpy(&addr, &pinfo->prefix, 8);
> -			if (ipv6_generate_eui64(addr.s6_addr + 8, dev) &&
> -			    ipv6_inherit_eui64(addr.s6_addr + 8, in6_dev)) {
> +
> +			if (!ipv6_addr_any(&in6_dev->token)) {
> +				read_lock_bh(&in6_dev->lock);
> +				memcpy(addr.s6_addr + 8,
> +				       in6_dev->token.s6_addr + 8, 8);
> +				read_unlock_bh(&in6_dev->lock);
> +			} else if (ipv6_generate_eui64(addr.s6_addr + 8, dev) &&
> +				   ipv6_inherit_eui64(addr.s6_addr + 8, in6_dev)) {
>  				in6_dev_put(in6_dev);
>  				return;
>  			}

Why not initialize token by interface-identifier and then allow
users to "override"?

--yoshfuji

^ permalink raw reply

* [PATCH net-next 0/2] ieee802154_mlme_ops oops removal and cleanup
From: Werner Almesberger @ 2013-04-04 16:31 UTC (permalink / raw)
  To: netdev; +Cc: linux-zigbee-devel

This fixes a kernel oops we get when some linux-zigbee user
space tools try to invoke unimplemented operations.

The second of the two patches checks whether the respective
functions are present and returne EOPNOTSUPP if they aren't.
It also updates the documentation.

The first one removes the unused get_bsn function, which was
a small obstacle for the other patch.

- Werner

Werner Almesberger (2):
  IEEE 802.15.4: remove get_bsn from "struct ieee802154_mlme_ops"
  ieee802154/nl-mac.c: make some MLME operations optional

 Documentation/networking/ieee802154.txt |    5 +++--
 drivers/net/ieee802154/fakehard.c       |   21 ---------------------
 include/net/ieee802154_netdev.h         |    5 ++++-
 net/ieee802154/nl-mac.c                 |   25 ++++++++++++++++++++-----
 4 files changed, 27 insertions(+), 29 deletions(-)

-- 
1.7.10.4

^ permalink raw reply

* [PATCH net-next 1/2] IEEE 802.15.4: remove get_bsn from "struct ieee802154_mlme_ops"
From: Werner Almesberger @ 2013-04-04 16:32 UTC (permalink / raw)
  To: netdev; +Cc: linux-zigbee-devel
In-Reply-To: <cover.1365092820.git.werner@almesberger.net>

It served no purpose: we never call it from anywhere in the stack
and the only driver that did implement it (fakehard) merely provided
a dummy value.

There is also considerable doubt whether it would make sense to
even attempt beacon processing at this level in the Linux kernel.

Signed-off-by: Werner Almesberger <werner@almesberger.net>
---
 drivers/net/ieee802154/fakehard.c |   21 ---------------------
 include/net/ieee802154_netdev.h   |    1 -
 2 files changed, 22 deletions(-)

diff --git a/drivers/net/ieee802154/fakehard.c b/drivers/net/ieee802154/fakehard.c
index 8f1c256..bf0d55e 100644
--- a/drivers/net/ieee802154/fakehard.c
+++ b/drivers/net/ieee802154/fakehard.c
@@ -106,26 +106,6 @@ static u8 fake_get_dsn(const struct net_device *dev)
 }
 
 /**
- * fake_get_bsn - Retrieve the BSN of the device.
- * @dev: The network device to retrieve the BSN for.
- *
- * Returns the IEEE 802.15.4 BSN for the network device.
- * The BSN is the sequence number which will be added to each
- * beacon frame sent by the MAC.
- *
- * BSN means 'Beacon Sequence Number'.
- *
- * Note: This is in section 7.2.1.2 of the IEEE 802.15.4-2006
- *       document.
- */
-static u8 fake_get_bsn(const struct net_device *dev)
-{
-	BUG_ON(dev->type != ARPHRD_IEEE802154);
-
-	return 0x00; /* BSN are implemented in HW, so return just 0 */
-}
-
-/**
  * fake_assoc_req - Make an association request to the HW.
  * @dev: The network device which we are associating to a network.
  * @addr: The coordinator with which we wish to associate.
@@ -264,7 +244,6 @@ static struct ieee802154_mlme_ops fake_mlme = {
 	.get_pan_id = fake_get_pan_id,
 	.get_short_addr = fake_get_short_addr,
 	.get_dsn = fake_get_dsn,
-	.get_bsn = fake_get_bsn,
 };
 
 static int ieee802154_fake_open(struct net_device *dev)
diff --git a/include/net/ieee802154_netdev.h b/include/net/ieee802154_netdev.h
index d104c88..642f94c 100644
--- a/include/net/ieee802154_netdev.h
+++ b/include/net/ieee802154_netdev.h
@@ -110,7 +110,6 @@ struct ieee802154_mlme_ops {
 	u16 (*get_pan_id)(const struct net_device *dev);
 	u16 (*get_short_addr)(const struct net_device *dev);
 	u8 (*get_dsn)(const struct net_device *dev);
-	u8 (*get_bsn)(const struct net_device *dev);
 };
 
 /* The IEEE 802.15.4 standard defines 2 type of the devices:
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH net-next 2/2] ieee802154/nl-mac.c: make some MLME operations optional
From: Werner Almesberger @ 2013-04-04 16:32 UTC (permalink / raw)
  To: netdev; +Cc: linux-zigbee-devel
In-Reply-To: <cover.1365092820.git.werner@almesberger.net>

Check for NULL before calling the following operations from "struct
ieee802154_mlme_ops": assoc_req, assoc_resp, disassoc_req, start_req,
and scan_req.

This fixes a current oops where those functions are called but not
implemented. It also updates the documentation to clarify that they
are now optional by design. If a call to an unimplemented function
is attempted, the kernel returns EOPNOTSUPP via netlink.

The following operations are still required: get_phy, get_pan_id,
get_short_addr, and get_dsn.

Note that the places where this patch changes the initialization
of "ret" should not affect the rest of the code since "ret" was
always set (again) before returning its value.

Signed-off-by: Werner Almesberger <werner@almesberger.net>
---
 Documentation/networking/ieee802154.txt |    5 +++--
 include/net/ieee802154_netdev.h         |    4 ++++
 net/ieee802154/nl-mac.c                 |   25 ++++++++++++++++++++-----
 3 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/Documentation/networking/ieee802154.txt b/Documentation/networking/ieee802154.txt
index 703cf43..67a9cb2 100644
--- a/Documentation/networking/ieee802154.txt
+++ b/Documentation/networking/ieee802154.txt
@@ -71,8 +71,9 @@ submits skb to qdisc), so if you need something from that cb later, you should
 store info in the skb->data on your own.
 
 To hook the MLME interface you have to populate the ml_priv field of your
-net_device with a pointer to struct ieee802154_mlme_ops instance. All fields are
-required.
+net_device with a pointer to struct ieee802154_mlme_ops instance. The fields
+assoc_req, assoc_resp, disassoc_req, start_req, and scan_req are optional.
+All other fields are required.
 
 We provide an example of simple HardMAC driver at drivers/ieee802154/fakehard.c
 
diff --git a/include/net/ieee802154_netdev.h b/include/net/ieee802154_netdev.h
index 642f94c..8196d5d 100644
--- a/include/net/ieee802154_netdev.h
+++ b/include/net/ieee802154_netdev.h
@@ -85,6 +85,8 @@ struct wpan_phy;
  * Use wpan_wpy_put to put that reference.
  */
 struct ieee802154_mlme_ops {
+	/* The following fields are optional (can be NULL). */
+
 	int (*assoc_req)(struct net_device *dev,
 			struct ieee802154_addr *addr,
 			u8 channel, u8 page, u8 cap);
@@ -101,6 +103,8 @@ struct ieee802154_mlme_ops {
 	int (*scan_req)(struct net_device *dev,
 			u8 type, u32 channels, u8 page, u8 duration);
 
+	/* The fields below are required. */
+
 	struct wpan_phy *(*get_phy)(const struct net_device *dev);
 
 	/*
diff --git a/net/ieee802154/nl-mac.c b/net/ieee802154/nl-mac.c
index 96bb08a..b0bdd8c 100644
--- a/net/ieee802154/nl-mac.c
+++ b/net/ieee802154/nl-mac.c
@@ -315,7 +315,7 @@ static int ieee802154_associate_req(struct sk_buff *skb,
 	struct net_device *dev;
 	struct ieee802154_addr addr;
 	u8 page;
-	int ret = -EINVAL;
+	int ret = -EOPNOTSUPP;
 
 	if (!info->attrs[IEEE802154_ATTR_CHANNEL] ||
 	    !info->attrs[IEEE802154_ATTR_COORD_PAN_ID] ||
@@ -327,6 +327,8 @@ static int ieee802154_associate_req(struct sk_buff *skb,
 	dev = ieee802154_nl_get_dev(info);
 	if (!dev)
 		return -ENODEV;
+	if (!ieee802154_mlme_ops(dev)->assoc_req)
+		goto out;
 
 	if (info->attrs[IEEE802154_ATTR_COORD_HW_ADDR]) {
 		addr.addr_type = IEEE802154_ADDR_LONG;
@@ -350,6 +352,7 @@ static int ieee802154_associate_req(struct sk_buff *skb,
 			page,
 			nla_get_u8(info->attrs[IEEE802154_ATTR_CAPABILITY]));
 
+out:
 	dev_put(dev);
 	return ret;
 }
@@ -359,7 +362,7 @@ static int ieee802154_associate_resp(struct sk_buff *skb,
 {
 	struct net_device *dev;
 	struct ieee802154_addr addr;
-	int ret = -EINVAL;
+	int ret = -EOPNOTSUPP;
 
 	if (!info->attrs[IEEE802154_ATTR_STATUS] ||
 	    !info->attrs[IEEE802154_ATTR_DEST_HW_ADDR] ||
@@ -369,6 +372,8 @@ static int ieee802154_associate_resp(struct sk_buff *skb,
 	dev = ieee802154_nl_get_dev(info);
 	if (!dev)
 		return -ENODEV;
+	if (!ieee802154_mlme_ops(dev)->assoc_resp)
+		goto out;
 
 	addr.addr_type = IEEE802154_ADDR_LONG;
 	nla_memcpy(addr.hwaddr, info->attrs[IEEE802154_ATTR_DEST_HW_ADDR],
@@ -380,6 +385,7 @@ static int ieee802154_associate_resp(struct sk_buff *skb,
 		nla_get_u16(info->attrs[IEEE802154_ATTR_DEST_SHORT_ADDR]),
 		nla_get_u8(info->attrs[IEEE802154_ATTR_STATUS]));
 
+out:
 	dev_put(dev);
 	return ret;
 }
@@ -389,7 +395,7 @@ static int ieee802154_disassociate_req(struct sk_buff *skb,
 {
 	struct net_device *dev;
 	struct ieee802154_addr addr;
-	int ret = -EINVAL;
+	int ret = -EOPNOTSUPP;
 
 	if ((!info->attrs[IEEE802154_ATTR_DEST_HW_ADDR] &&
 		!info->attrs[IEEE802154_ATTR_DEST_SHORT_ADDR]) ||
@@ -399,6 +405,8 @@ static int ieee802154_disassociate_req(struct sk_buff *skb,
 	dev = ieee802154_nl_get_dev(info);
 	if (!dev)
 		return -ENODEV;
+	if (!ieee802154_mlme_ops(dev)->disassoc_req)
+		goto out;
 
 	if (info->attrs[IEEE802154_ATTR_DEST_HW_ADDR]) {
 		addr.addr_type = IEEE802154_ADDR_LONG;
@@ -415,6 +423,7 @@ static int ieee802154_disassociate_req(struct sk_buff *skb,
 	ret = ieee802154_mlme_ops(dev)->disassoc_req(dev, &addr,
 			nla_get_u8(info->attrs[IEEE802154_ATTR_REASON]));
 
+out:
 	dev_put(dev);
 	return ret;
 }
@@ -432,7 +441,7 @@ static int ieee802154_start_req(struct sk_buff *skb, struct genl_info *info)
 	u8 channel, bcn_ord, sf_ord;
 	u8 page;
 	int pan_coord, blx, coord_realign;
-	int ret;
+	int ret = -EOPNOTSUPP;
 
 	if (!info->attrs[IEEE802154_ATTR_COORD_PAN_ID] ||
 	    !info->attrs[IEEE802154_ATTR_COORD_SHORT_ADDR] ||
@@ -448,6 +457,8 @@ static int ieee802154_start_req(struct sk_buff *skb, struct genl_info *info)
 	dev = ieee802154_nl_get_dev(info);
 	if (!dev)
 		return -ENODEV;
+	if (!ieee802154_mlme_ops(dev)->start_req)
+		goto out;
 
 	addr.addr_type = IEEE802154_ADDR_SHORT;
 	addr.short_addr = nla_get_u16(
@@ -476,6 +487,7 @@ static int ieee802154_start_req(struct sk_buff *skb, struct genl_info *info)
 	ret = ieee802154_mlme_ops(dev)->start_req(dev, &addr, channel, page,
 		bcn_ord, sf_ord, pan_coord, blx, coord_realign);
 
+out:
 	dev_put(dev);
 	return ret;
 }
@@ -483,7 +495,7 @@ static int ieee802154_start_req(struct sk_buff *skb, struct genl_info *info)
 static int ieee802154_scan_req(struct sk_buff *skb, struct genl_info *info)
 {
 	struct net_device *dev;
-	int ret;
+	int ret = -EOPNOTSUPP;
 	u8 type;
 	u32 channels;
 	u8 duration;
@@ -497,6 +509,8 @@ static int ieee802154_scan_req(struct sk_buff *skb, struct genl_info *info)
 	dev = ieee802154_nl_get_dev(info);
 	if (!dev)
 		return -ENODEV;
+	if (!ieee802154_mlme_ops(dev)->scan_req)
+		goto out;
 
 	type = nla_get_u8(info->attrs[IEEE802154_ATTR_SCAN_TYPE]);
 	channels = nla_get_u32(info->attrs[IEEE802154_ATTR_CHANNELS]);
@@ -511,6 +525,7 @@ static int ieee802154_scan_req(struct sk_buff *skb, struct genl_info *info)
 	ret = ieee802154_mlme_ops(dev)->scan_req(dev, type, channels, page,
 			duration);
 
+out:
 	dev_put(dev);
 	return ret;
 }
-- 
1.7.10.4

^ permalink raw reply related

* Re: this cpu documentation
From: Tejun Heo @ 2013-04-04 16:28 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Randy Dunlap, Eric Dumazet, RongQing Li, Shan Wei, netdev
In-Reply-To: <0000013dd57e7ad7-d5959142-ca41-4a53-9497-7559766aafc9-000000@email.amazonses.com>

On Thu, Apr 04, 2013 at 02:41:08PM +0000, Christoph Lameter wrote:
> From: Christoph Lameter <cl@linux.com>
> Subject: this_cpu: Add documentation V2
> 
> Document the rationale and the way to use this_cpu operations.
> 
> V2: Improved after feedback from Randy Dunlap
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>

Updated patch applied to wq/for-3.10.

Thanks.

-- 
tejun

^ permalink raw reply

* Fw: [Bug 56221] New: bonding: ARP monitoring doesn't work with bridges
From: Stephen Hemminger @ 2013-04-04 16:46 UTC (permalink / raw)
  To: Jay Vosburgh, Andy Gospodarek; +Cc: netdev

Begin forwarded message:

Date: Thu, 4 Apr 2013 06:58:50 -0700
From: "bugzilla-daemon@bugzilla.kernel.org" <bugzilla-daemon@bugzilla.kernel.org>
To: "stephen@networkplumber.org" <stephen@networkplumber.org>
Subject: [Bug 56221] New: bonding: ARP monitoring doesn't work with bridges

https://bugzilla.kernel.org/show_bug.cgi?id=56221

           Summary: bonding: ARP monitoring doesn't work with bridges
           Product: Networking
           Version: 2.5
    Kernel Version: 3.8.5
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Other
        AssignedTo: shemminger@linux-foundation.org
        ReportedBy: c.ruppert@babiel.com
        Regression: No

Hi,

we tried to setup VLANs and bridges on top of the bonding device.

Creating a bond over eth0 and eth1:
eth0 \
      --> bond0
eth1 /

Adding VLANs 1, 2 and 3 on top of the bond:
       --> bond0.1
      /
bond0 ---> bond0.2
      \
       --> bond0.3

And finally adding bridges on top of the VLAN bonds:
bond0.1 -> br1
bond0.2 -> br2
bond0.3 -> br3

Those three VLANs are all tagged. br1 will also get the default route.

So the ARP monitoring doesn't work as soon as a bridge gets the default route
(or at least the route to the ARP IP target).
<snip>
[76655.096076] bonding: bond0: no path to arp_ip_target 172.16.0.1 via rt.dev
br1
</snip>

I then tried it with just the VLANs (no bridges) and it works fine.
Then I tried it with just bridges and no VLAN - it doesn't.

Here are some steps to reproduce it:
ifconfig eth0 0.0.0.0 up
ifconfig eth1 0.0.0.0 up

modprobe bonding mode="active-backup" primary="eth0" arp_interval=3000
arp_ip_target="172.16.0.1"

ifconfig bond0 0.0.0.0 up
ifenslave bond0 eth0
ifenslave bond0 eth1

brctl addbr br1 
brctl addif br1 bond0
ifconfig br1 172.16.0.2/28 up

With tcpdump you'll see that there are no ARP requests/pings at all and in
either dmesg or the kernel log you'll notice the "no path to ..." warnings.

So I took a look into the bonding driver souces (mainly bond_main.c):
The bonding driver asks for the route to the arp_ip_target which is br1 in this
case and it then compares it against the bond device(s) so bond0 and/or
bond0.1, bond0.2 and so forth. So neither bond->dev nor vlan_dev will ever be
the same as rt->dst.dev as long as you add the route to the bridge.
There is no check for bridged devices at all.
The driver should get an event/notify when the device (bond) became a member of
a bridge or has been removed and so forth, basically the same that has been
added for the VLAN stuff.

Just let me know if you need anything else. Thanks in advance.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply

* Re: [PATCH net-next 1/2] net: ipv6: add tokenized interface identifier support
From: Daniel Borkmann @ 2013-04-04 16:48 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki; +Cc: davem, netdev, Hannes Frederic Sowa
In-Reply-To: <515DAA6E.1050407@linux-ipv6.org>

On 04/04/2013 06:29 PM, YOSHIFUJI Hideaki wrote:
> Daniel Borkmann wrote:
>> This patch adds support for tokenized IIDs, that allow for
>> administrators to assign well-known host-part addresses to
>> nodes whilst still obtaining global network prefix from
>> Router Advertisements. It is currently in IETF RFC draft
>> status [1]:
>>
>>    The primary target for such support is server platforms
>>    where addresses are usually manually configured, rather
>>    than using DHCPv6 or SLAAC. By using tokenised identifiers,
>>    hosts can still determine their network prefix by use of
>>    SLAAC, but more readily be automatically renumbered should
>>    their network prefix change.
>>
>>   [1] http://tools.ietf.org/html/draft-chown-6man-tokenised-ipv6-identifiers-02
>>
>> The implementation is partially based on top of Mark K.
>> Thompson's proof of concept. Successfully tested by myself.
>>
>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
>> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
>> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>> ---
[...]
>>   		if (pinfo->prefix_len == 64) {
>>   			memcpy(&addr, &pinfo->prefix, 8);
>> -			if (ipv6_generate_eui64(addr.s6_addr + 8, dev) &&
>> -			    ipv6_inherit_eui64(addr.s6_addr + 8, in6_dev)) {
>> +
>> +			if (!ipv6_addr_any(&in6_dev->token)) {
>> +				read_lock_bh(&in6_dev->lock);
>> +				memcpy(addr.s6_addr + 8,
>> +				       in6_dev->token.s6_addr + 8, 8);
>> +				read_unlock_bh(&in6_dev->lock);
>> +			} else if (ipv6_generate_eui64(addr.s6_addr + 8, dev) &&
>> +				   ipv6_inherit_eui64(addr.s6_addr + 8, in6_dev)) {
>>   				in6_dev_put(in6_dev);
>>   				return;
>>   			}
> 
> Why not initialize token by interface-identifier and then allow
> users to "override"?

Sure this would simplify this part above ...

... maybe I'm wrong, but then, probably, if someone changes the netdev's
hw address during runtime, we could not keep track of that anymore as
dynamically done in e.g. ipv6_generate_eui64(), since we've already done
the token init at an earlier point in time, no?

With the current patch, we would have a clear separation of both concepts
like ``either you use token iids, or you don't''.

^ permalink raw reply

* [PATCH net-next] enic: be less verbose about non-critical firmware errors
From: Stefan Assmann @ 2013-04-04 16:57 UTC (permalink / raw)
  To: netdev; +Cc: davem, neepatel, sassmann

If a feature is not supported by firmware no need to print an error message.
This surpresses the following harmless message on boot up and ethtool query.
enic: Error 1 devcmd 36

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
---
 drivers/net/ethernet/cisco/enic/vnic_dev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/cisco/enic/vnic_dev.c b/drivers/net/ethernet/cisco/enic/vnic_dev.c
index 605b222..97455c5 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_dev.c
+++ b/drivers/net/ethernet/cisco/enic/vnic_dev.c
@@ -308,6 +308,9 @@ static int _vnic_dev_cmd(struct vnic_dev *vdev, enum vnic_devcmd_cmd cmd,
 
 			if (status & STAT_ERROR) {
 				err = (int)readq(&devcmd->args[0]);
+				if (err == ERR_EINVAL &&
+				    cmd == CMD_CAPABILITY)
+					return err;
 				if (err != ERR_ECMDUNKNOWN ||
 				    cmd != CMD_CAPABILITY)
 					pr_err("Error %d devcmd %d\n",
-- 
1.8.1.4

^ permalink raw reply related

* Re: this cpu documentation
From: Randy Dunlap @ 2013-04-04 17:19 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev, Tejun Heo
In-Reply-To: <0000013dd57e7ad7-d5959142-ca41-4a53-9497-7559766aafc9-000000@email.amazonses.com>

On 04/04/13 07:41, Christoph Lameter wrote:
> From: Christoph Lameter <cl@linux.com>
> Subject: this_cpu: Add documentation V2
> 
> Document the rationale and the way to use this_cpu operations.
> 
> V2: Improved after feedback from Randy Dunlap

Thanks.  I have a few more corrections to V2 (please see below).


> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Index: linux/Documentation/this_cpu_ops
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux/Documentation/this_cpu_ops	2013-04-04 09:40:06.431946280 -0500
> @@ -0,0 +1,197 @@
> +this_cpu operations
> +-------------------
> +
> +this_cpu operations are a way of optimizing access to per cpu variables
> +associated with the *currently* executing processor
> +through the use of segment registers (or a dedicated register where the cpu
> +permanently stored the beginning of the per cpu area for a specific
> +processor).
> +
> +The this_cpu operations add a per cpu variable offset to the processor
> +specific percpu base and encode that operation in the instruction operating
> +on the per cpu variable.
> +
> +This meanthere are no atomicity issues between the calculation

        means there

> +of the offset and the operation on the data. Therefore it is not necessary
> +to disable preempt or interrupts to ensure that the processor is not changed
> +between the calculation of the address and the operation on the data.
> +
> +Read-modify-write operations are of particular interest. Frequently
> +processors have special lower latency instructions that can operate without
> +the typical synchronization overhead but still provide some sort of relaxed
> +atomicity guarantee. The x86 for example can execute RMV (Read Modify Write)
> +instructions like inc/dec/cmpxchg without the lock prefix and the
> +associated latency penalty.
> +
> +Access to the variable without the lock prefix is not synchronized but
> +synchronization is not necessary since we are dealing with per cpu data
> +specific to the currently executing processor. Only the current processor
> +should be accessing that variable and therefore there are no concurirency

                                                                concurrency

> +issues with other processors in the system.
> +
> +On x86 the fs: or the gs: segment registers contain the base of the per cpu area. It is
> +then possible to simply use the segment override to relocate a per cpu relative address
> +to the proper per cpu area for the processor. So the relocation to the per cpu base
> +is encoded in the instruction via a segment register prefix.
> +
> +For example:
> +
> +	DEFINE_PER_CPU(int, x);
> +	int z;
> +
> +	z = this_cpu_read(x);
> +
> +results in a single instruction
> +
> +	mov ax, gs:[x]
> +
> +instead of a sequence of calculation of the address and then a fetch from
> +that address which occurs with the percpu operations. Before this_cpu_ops
> +such sequence also required preempt disable/enable to prevent the kernel from
> +moving the thread to a different processor while the calculation is performed.
> +
> +
> +The main use of the this_cpu operations has been to optimize counter operations.
> +
> +
> +	this_cpu_inc(x)
> +
> +results in the following single instruction (no lock prefix!)
> +
> +	inc gs:[x]
> +
> +
> +instead of the following operations required if there is no segment register.
> +
> +	int *y;
> +	int cpu;
> +
> +	cpu = get_cpu();
> +	y = per_cpu_ptr(&x, cpu);
> +	(*y)++;
> +	put_cpu();
> +
> +
> +Note that these operations can only be used on percpu data that is reserved for
> +a specific processor. Without disabling preemption in the surrounding code
> +this_cpu_inc() will only guarantee that one of the percpu counters is correctly
> +incremented. However, there is no guarantee that the OS will not move the process
> +directly before or after the this_cpu instruction is executed. In general this
> +means that the value of the individual counters for each processor are
> +meaningless. The sum of all the per cpu counters is the only value that is of
> +interest.
> +
> +Per cpu variables are used for performance reasons. Bouncing cache lines can
> +be avoided if multiple processors concurrently go through the same code paths.
> +Since each processor has its own per cpu variables no concurrent cacheline
> +updates take place. The price that has to be paid for this optimization is
> +the need to add up the per cpu counters when the value of the counter is
> +needed.
> +
> +
> +Special operations:
> +-------------------
> +
> +	y = this_cpu_ptr(&x)
> +
> +Takes the offset of a per cpu variable (&x !) and returns the address of the
> +per cpu variable that belongs to the currently executing processor.
> +this_cpu_ptr avoids multiple steps that the common get_cpu/put_cpu sequence
> +requires. No processor number is available. Instead the offset of the local
> +per cpu area is simply added to the percpu offset.
> +
> +
> +
> +Per cpu variables and offsets
> +-----------------------------
> +
> +Per cpu variables have *offsets* to the beginning of the percpu area. They do
> +not have addresses although they look like that in the code. Offsets
> +cannot be directly dereferenced. The offset must be added to a base pointer of
> +a percpu area of a processor in order to form a valid address.
> +
> +Therefore the use of x or &x outside of the context of per cpu operations
> +is invalid and will generally be treated like a NULL pointer dereference.
> +
> +In the context of per cpu operations
> +
> +	x is a per cpu variable. Most this_cpu operations take a cpu variable.
> +
> +	&x is the *offset* a per cpu variable. this_cpu_ptr() takes the offset
> +		of a per cpu variable which makes this look a bit strange.
> +
> +
> +
> +Operations on a field of a per cpu structure
> +--------------------------------------------
> +
> +Let's say we have a percpu structure
> +
> +	struct s {
> +		int n,m;
> +	};
> +
> +	DEFINE_PER_CPU(struct s, p);
> +
> +
> +Operations on these fields are straightforward
> +
> +	this_cpu_inc(p.m)
> +
> +	z = this_cpu_cmpxchg(p.m, 0, 1);
> +
> +
> +If we have an offset to struct s:
> +
> +	struct s __percpu *ps = &p;
> +
> +	z = this_cpu_dec(ps->m);
> +
> +	z = this_cpu_inc_return(ps->n);
> +
> +
> +The calculation of the pointer may require the use of this_cpu_ptr() if we
> +do not make use of this_cpu ops later to manipulate fields:
> +
> +	struct s *pp;
> +
> +	pp = this_cpu_ptr(&p);
> +
> +	pp->m--;
> +
> +	z = pp->n++;
> +
> +
> +Variants of this_cpu ops
> +-------------------------
> +
> +this_cpu ops are interrupt safe. Some architecture do not support these per
> +cpu local operations. In that case the operation must be replaced by code
> +that disables interrupts, then does the operations that are guaranteed to be
> +atomic and then reenable interrupts. Doing so is expensive. If there are
> +other reasons why the scheduler cannot change the processor we are executing
> +on then there is no reason to disable interrupts. For that purpose
> +the __this_cpu operations are provided. For example.
> +
> +	__this_cpu_inc(x);
> +
> +Will increment x and will not fallback to code that disables interrupts on
> +platforms that cannot accomplish atomicity through address relocation and
> +an Read-Modify-Write operation in the same instruction.

   a

> +
> +
> +
> +&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
> +--------------------------------------------
> +
> +The first operation takes the offset and forms an address and then adds
> +the offset of the n field.
> +
> +The second one first adds the two offsets and then does the relocation.
> +IMHO the second form looks cleaner and has an easier time with (). The
> +second form also is consistent with the way this_cpu_read() and friends
> +are used.
> +
> +
> +Christoph Lameter, April 3rd, 2013



-- 
~Randy

^ permalink raw reply

* Re: [PATCH v2] MPLS: Add limited GSO support
From: Jesse Gross @ 2013-04-04 17:20 UTC (permalink / raw)
  To: Simon Horman
  Cc: dev@openvswitch.org, netdev, Jarno Rajahalme, Pravin B Shelar
In-Reply-To: <20130404005539.GA17694@verge.net.au>

On Wed, Apr 3, 2013 at 5:55 PM, Simon Horman <horms@verge.net.au> wrote:
> On Wed, Apr 03, 2013 at 05:44:17PM -0700, Jesse Gross wrote:
>> On Wed, Apr 3, 2013 at 5:28 PM, Simon Horman <horms@verge.net.au> wrote:
>> > On Wed, Apr 03, 2013 at 04:51:38PM -0700, Jesse Gross wrote:
>> >> On Wed, Apr 3, 2013 at 2:11 AM, Simon Horman <horms@verge.net.au> wrote:
>> >> > In the case where a non-MPLS packet is recieved and an MPLS stack is
>> >> > added it may well be the case that the original skb is GSO but the
>> >> > NIC used for transmit does not support GSO of MPLS packets.
>> >> >
>> >> > The aim of this code is to provide GSO in software for MPLS packets
>> >> > whose skbs are GSO.
>> >> >
>> >> > When an implementation adds an MPLS stack to a non-MPLS packet it should do
>> >> > the following to skb metadata:
>> >> >
>> >> > * Set skb_mac_header(skb)->protocol to the new MPLS ethertype.
>> >> >   That is, either ETH_P_MPLS_UC or ETH_P_MPLS_MC.
>> >> >
>> >> > * Leave skb->protocol as the old non-MPLS ethertype.
>> >> >
>> >> > * Set skb->encapsulation = 1.
>> >> >
>> >> >   This may not strictly be necessary as I believe that checking
>> >> >   skb_mac_header(skb)->protocol and skb->protocol should be necessary and
>> >> >   sufficient.
>> >> >
>> >> >   However, it does seem to fit nicely with the current implementation of
>> >> >   dev_hard_start_xmit() where the more expensive check of
>> >> >   skb_mac_header(skb)->protocol may be guarded by an existing check of
>> >> >   skb->encapsulation.
>> >> >
>> >> > One aspect of this patch that I am unsure about is the modification I have
>> >> > made to skb_segment(). This seems to be necessary as checskum accelearation
>> >> > may no longer be possible as the packet has changed to be MPLS from some
>> >> > other packet type which may have been supported by the hardware in use.
>> >> >
>> >> > I will post a patch, "[PATCH v3.24] datapath: Add basic MPLS support to
>> >> > kernel" which adds MPLS support to the kernel datapath of Open vSwtich.
>> >> > That patch sets the above requirements in
>> >> > datapath/actions.c:set_ethertype() and was used to exercise the MPLS GSO
>> >> > code. The datapath patch is against the Open vSwtich tree but it is
>> >> > intended that it be added to the Open vSwtich code present in the mainline
>> >> > Linux kernel at some point.
>> >> >
>> >> > Suggested by Jesse Gross. Based heavily on "v4 GRE: Add TCP segmentation
>> >> > offload for GRE" by Pravin B Shelar.
>> >> >
>> >> > Cc: Jesse Gross <jesse@nicira.com>
>> >> > Cc: Pravin B Shelar <pshelar@nicira.com>
>> >> > Signed-off-by: Simon Horman <horms@verge.net.au>
>> >>
>> >> MPLS is very similar to both the Ethernet header and vlans in that GSO
>> >> only requires replication without any modification.  That means that
>> >> if we look at the mac_len as containing all three then we can just
>> >> copy it without any special knowledge.  I don't know that we carefully
>> >> maintain mac_len in all places but you are already doing that in your
>> >> MPLS patches.
>> >
>> > At least for the cases that I am aware of I think that mac_len is
>> > predictable. But I'm a little unsure of what you are getting at here.
>>
>> If you have the MAC len then you don't need any new MPLS code here at
>> all; just replicate the whole thing onto each packet.
>
> The MAC len is set to cover everything up to the top of the MPLS stack.
> So it seems that something needs to be done to account for the length
> of the MPLS stack.
>
> Are you suggesting that if Open vSwtich set up the MAC len to extend
> to the end of the MPLS stack then mpls_gro_segment() would not be necessary?

Something along those lines.  I think this is very similar to the
skb->protocol discussion (and likely influenced by the outcome of
that).  MPLS is a little weird with respect to the existing layer
pointers but if we can find a good definition then I think that the
GSO code should be pretty minimal.

^ permalink raw reply

* Re: this cpu documentation
From: Tejun Heo @ 2013-04-04 17:26 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Christoph Lameter, Eric Dumazet, RongQing Li, Shan Wei, netdev
In-Reply-To: <515DB623.4060802@infradead.org>

On Thu, Apr 4, 2013 at 10:19 AM, Randy Dunlap <rdunlap@infradead.org> wrote:
> On 04/04/13 07:41, Christoph Lameter wrote:
>> From: Christoph Lameter <cl@linux.com>
>> Subject: this_cpu: Add documentation V2
>>
>> Document the rationale and the way to use this_cpu operations.
>>
>> V2: Improved after feedback from Randy Dunlap
>
> Thanks.  I have a few more corrections to V2 (please see below).

Updated the tree w/ v3. I also re-filled all the paragraphs to 75 column.

Thanks.

--
tejun

^ permalink raw reply

* Re: this cpu documentation
From: Christoph Lameter @ 2013-04-04 17:40 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev, Tejun Heo
In-Reply-To: <515DB623.4060802@infradead.org>

On Thu, 4 Apr 2013, Randy Dunlap wrote:

> Thanks.  I have a few more corrections to V2 (please see below).

From: Christoph Lameter <cl@linux.com>
Subject: this_cpu: Add documentation V3

Document the rationale and the way to use this_cpu operations.

V2/V3: Improved after feedback from Randy Dunlap

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/Documentation/this_cpu_ops
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux/Documentation/this_cpu_ops	2013-04-04 12:39:38.479720028 -0500
@@ -0,0 +1,197 @@
+this_cpu operations
+-------------------
+
+this_cpu operations are a way of optimizing access to per cpu variables
+associated with the *currently* executing processor
+through the use of segment registers (or a dedicated register where the cpu
+permanently stored the beginning of the per cpu area for a specific
+processor).
+
+The this_cpu operations add a per cpu variable offset to the processor
+specific percpu base and encode that operation in the instruction operating
+on the per cpu variable.
+
+This means there are no atomicity issues between the calculation
+of the offset and the operation on the data. Therefore it is not necessary
+to disable preempt or interrupts to ensure that the processor is not changed
+between the calculation of the address and the operation on the data.
+
+Read-modify-write operations are of particular interest. Frequently
+processors have special lower latency instructions that can operate without
+the typical synchronization overhead but still provide some sort of relaxed
+atomicity guarantee. The x86 for example can execute RMV (Read Modify Write)
+instructions like inc/dec/cmpxchg without the lock prefix and the
+associated latency penalty.
+
+Access to the variable without the lock prefix is not synchronized but
+synchronization is not necessary since we are dealing with per cpu data
+specific to the currently executing processor. Only the current processor
+should be accessing that variable and therefore there are no concurirency
+issues with other processors in the system.
+
+On x86 the fs: or the gs: segment registers contain the base of the per cpu area. It is
+then possible to simply use the segment override to relocate a per cpu relative address
+to the proper per cpu area for the processor. So the relocation to the per cpu base
+is encoded in the instruction via a segment register prefix.
+
+For example:
+
+	DEFINE_PER_CPU(int, x);
+	int z;
+
+	z = this_cpu_read(x);
+
+results in a single instruction
+
+	mov ax, gs:[x]
+
+instead of a sequence of calculation of the address and then a fetch from
+that address which occurs with the percpu operations. Before this_cpu_ops
+such sequence also required preempt disable/enable to prevent the kernel from
+moving the thread to a different processor while the calculation is performed.
+
+
+The main use of the this_cpu operations has been to optimize counter operations.
+
+
+	this_cpu_inc(x)
+
+results in the following single instruction (no lock prefix!)
+
+	inc gs:[x]
+
+
+instead of the following operations required if there is no segment register.
+
+	int *y;
+	int cpu;
+
+	cpu = get_cpu();
+	y = per_cpu_ptr(&x, cpu);
+	(*y)++;
+	put_cpu();
+
+
+Note that these operations can only be used on percpu data that is reserved for
+a specific processor. Without disabling preemption in the surrounding code
+this_cpu_inc() will only guarantee that one of the percpu counters is correctly
+incremented. However, there is no guarantee that the OS will not move the process
+directly before or after the this_cpu instruction is executed. In general this
+means that the value of the individual counters for each processor are
+meaningless. The sum of all the per cpu counters is the only value that is of
+interest.
+
+Per cpu variables are used for performance reasons. Bouncing cache lines can
+be avoided if multiple processors concurrently go through the same code paths.
+Since each processor has its own per cpu variables no concurrent cacheline
+updates take place. The price that has to be paid for this optimization is
+the need to add up the per cpu counters when the value of the counter is
+needed.
+
+
+Special operations:
+-------------------
+
+	y = this_cpu_ptr(&x)
+
+Takes the offset of a per cpu variable (&x !) and returns the address of the
+per cpu variable that belongs to the currently executing processor.
+this_cpu_ptr avoids multiple steps that the common get_cpu/put_cpu sequence
+requires. No processor number is available. Instead the offset of the local
+per cpu area is simply added to the percpu offset.
+
+
+
+Per cpu variables and offsets
+-----------------------------
+
+Per cpu variables have *offsets* to the beginning of the percpu area. They do
+not have addresses although they look like that in the code. Offsets
+cannot be directly dereferenced. The offset must be added to a base pointer of
+a percpu area of a processor in order to form a valid address.
+
+Therefore the use of x or &x outside of the context of per cpu operations
+is invalid and will generally be treated like a NULL pointer dereference.
+
+In the context of per cpu operations
+
+	x is a per cpu variable. Most this_cpu operations take a cpu variable.
+
+	&x is the *offset* a per cpu variable. this_cpu_ptr() takes the offset
+		of a per cpu variable which makes this look a bit strange.
+
+
+
+Operations on a field of a per cpu structure
+--------------------------------------------
+
+Let's say we have a percpu structure
+
+	struct s {
+		int n,m;
+	};
+
+	DEFINE_PER_CPU(struct s, p);
+
+
+Operations on these fields are straightforward
+
+	this_cpu_inc(p.m)
+
+	z = this_cpu_cmpxchg(p.m, 0, 1);
+
+
+If we have an offset to struct s:
+
+	struct s __percpu *ps = &p;
+
+	z = this_cpu_dec(ps->m);
+
+	z = this_cpu_inc_return(ps->n);
+
+
+The calculation of the pointer may require the use of this_cpu_ptr() if we
+do not make use of this_cpu ops later to manipulate fields:
+
+	struct s *pp;
+
+	pp = this_cpu_ptr(&p);
+
+	pp->m--;
+
+	z = pp->n++;
+
+
+Variants of this_cpu ops
+-------------------------
+
+this_cpu ops are interrupt safe. Some architecture do not support these per
+cpu local operations. In that case the operation must be replaced by code
+that disables interrupts, then does the operations that are guaranteed to be
+atomic and then reenable interrupts. Doing so is expensive. If there are
+other reasons why the scheduler cannot change the processor we are executing
+on then there is no reason to disable interrupts. For that purpose
+the __this_cpu operations are provided. For example.
+
+	__this_cpu_inc(x);
+
+Will increment x and will not fallback to code that disables interrupts on
+platforms that cannot accomplish atomicity through address relocation and
+a Read-Modify-Write operation in the same instruction.
+
+
+
+&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
+--------------------------------------------
+
+The first operation takes the offset and forms an address and then adds
+the offset of the n field.
+
+The second one first adds the two offsets and then does the relocation.
+IMHO the second form looks cleaner and has an easier time with (). The
+second form also is consistent with the way this_cpu_read() and friends
+are used.
+
+
+Christoph Lameter, April 4th, 2013
+

^ permalink raw reply

* Re: [Suggestion] ISDN: isdnloop: C grammar issue, '}' miss match 'if' and 'switch' statement.
From: David Miller @ 2013-04-04 18:09 UTC (permalink / raw)
  To: gang.chen
  Cc: torvalds, eric.dumazet, mkubecek, fengguang.wu, isdn, netdev, joe
In-Reply-To: <515D3A09.7040202@asianux.com>

From: Chen Gang <gang.chen@asianux.com>
Date: Thu, 04 Apr 2013 16:30:01 +0800

>> Of course, nobody sane actually cares about ISDN any more, so I think
>> this is all pretty academic. I think even Germany (where ISDN *used*
>> to be very common due to telephone monopolies and odd rules) no longer
>> uses it. I can't imagine that anybody else does either.
>> 
> 
>   can we delete it ?

I think the point is no that we can delete it, but rather that we
should concentrate our efforts on code that more people use rather
than trying to clean up antiquated code with very few users.

^ permalink raw reply

* [patch net] net: ipv4: notify when address lifetime changes
From: Jiri Pirko @ 2013-04-04 18:33 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuznet, jmorris, yoshfuji, kaber

if userspace changes lifetime of address, send netlink notification and
call notifier.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 net/ipv4/devinet.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index f678507..96083b7 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -802,8 +802,10 @@ static int inet_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg
 		if (nlh->nlmsg_flags & NLM_F_EXCL ||
 		    !(nlh->nlmsg_flags & NLM_F_REPLACE))
 			return -EEXIST;
-
-		set_ifa_lifetime(ifa_existing, valid_lft, prefered_lft);
+		ifa = ifa_existing;
+		set_ifa_lifetime(ifa, valid_lft, prefered_lft);
+		rtmsg_ifa(RTM_NEWADDR, ifa, nlh, NETLINK_CB(skb).portid);
+		blocking_notifier_call_chain(&inetaddr_chain, NETDEV_UP, ifa);
 	}
 	return 0;
 }
-- 
1.8.1.2

^ permalink raw reply related

* Re: this cpu documentation
From: Randy Dunlap @ 2013-04-04 18:35 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev, Tejun Heo
In-Reply-To: <0000013dd622cebd-a7fee90b-b297-4e92-9143-87f4771718a4-000000@email.amazonses.com>

On 04/04/13 10:40, Christoph Lameter wrote:
> On Thu, 4 Apr 2013, Randy Dunlap wrote:
> 
>> Thanks.  I have a few more corrections to V2 (please see below).
> 
> From: Christoph Lameter <cl@linux.com>
> Subject: this_cpu: Add documentation V3
> 
> Document the rationale and the way to use this_cpu operations.
> 
> V2/V3: Improved after feedback from Randy Dunlap
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Index: linux/Documentation/this_cpu_ops
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux/Documentation/this_cpu_ops	2013-04-04 12:39:38.479720028 -0500
> @@ -0,0 +1,197 @@
> +
> +Access to the variable without the lock prefix is not synchronized but
> +synchronization is not necessary since we are dealing with per cpu data
> +specific to the currently executing processor. Only the current processor
> +should be accessing that variable and therefore there are no concurirency

                                                                concurrency
again.  but hopefully Tejun has already corrected that.

Thanks.

> +issues with other processors in the system.
> +


-- 
~Randy

^ permalink raw reply

* Re: this cpu documentation
From: Tejun Heo @ 2013-04-04 18:52 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Christoph Lameter, Eric Dumazet, RongQing Li, Shan Wei, netdev
In-Reply-To: <515DC80B.5090403@infradead.org>

On Thu, Apr 04, 2013 at 11:35:55AM -0700, Randy Dunlap wrote:
> > +should be accessing that variable and therefore there are no concurirency
> 
>                                                                 concurrency
> again.  but hopefully Tejun has already corrected that.

Yeap, the committed version is at

  https://git.kernel.org/cgit/linux/kernel/git/tj/percpu.git/tree/Documentation/this_cpu_ops.txt?h=for-3.10

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH net-next 3/3] net/mlx4_en: Enable open-lldp DCB support
From: John Fastabend @ 2013-04-04 19:01 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Or Gerlitz, davem, netdev, amirv
In-Reply-To: <515DA579.3050006@mellanox.com>

On 4/4/2013 9:08 AM, Sagi Grimberg wrote:
> On 4/4/2013 6:58 PM, John Fastabend wrote:
>> On 4/4/2013 7:26 AM, Or Gerlitz wrote:
>>> From: Sagi Grimberg <sagig@mellanox.com>
>>>
>>> The lldpad daemon queries the driver caps via the getcaps and getstate
>>> routines. Added the prpoer dbcnl_ops entries to support that.
>>>
>>> Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
>>> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
>>> ---
>>
>> Does lldpad work now with the mlx4_en driver?
>>
>> Reviewed-by: John Fastabend <john.r.fastabend@intel.com>
>>
>
> Yes.
>
> -Sagi

So I looked at your code and lldpad a bit closer. I think this
just works around a bug in lldpad. Also with this patch lldpad
would try to xmit lldp pkts which would conflict with your firmware
agent right?

For IEEE DCBX modes lldpad should check the return values from
the getdcbx dcbnl_rtnl_ops which you already have filled out. I
would expect your driver to return DCB_CAP_DCBX_LLD_MANAGED to
indicate it is being managed by firmware.

See bnx2x_dcbnl_get_dcbx() for an example of an implementation
which I believe is correct. I'll fix the user space lldpad bug.

Looks like I added my reveiwed-by line too quickly.

Thanks,
.John

^ permalink raw reply

* Re: [net-next 05/13] igb: Support to read and export SFF-8472/8079 data
From: Ben Hutchings @ 2013-04-04 19:07 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: davem, Akeem G. Abodunrin, netdev, gospo, sassmann,
	Aurélien Guillaume
In-Reply-To: <1365075480-20183-6-git-send-email-jeffrey.t.kirsher@intel.com>

On Thu, 2013-04-04 at 04:37 -0700, Jeff Kirsher wrote:
> From: "Akeem G. Abodunrin" <akeem.g.abodunrin@intel.com>
> 
> This patch adds support to read and export SFF-8472/8079 (SFP data)
> over i2c, through Ethtool.
[...]
> --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
> +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
[...]
> +static int igb_get_module_eeprom(struct net_device *netdev,
> +				 struct ethtool_eeprom *ee, u8 *data)
> +{
> +	struct igb_adapter *adapter = netdev_priv(netdev);
> +	struct e1000_hw *hw = &adapter->hw;
> +	u32 status = E1000_SUCCESS;
> +	u16 dataword = 0xFF;
> +	int i = 0;
> +
> +	/* Read the first block, SFF-8079, 2 bytes at a time */
> +	for (i = 0; i < ETH_MODULE_SFF_8079_LEN; i += 2) {
> +		status = igb_read_phy_reg_i2c(hw, i, &dataword);
> +		if (status != E1000_SUCCESS)
> +			/* Error occurred while reading module */
> +			return -EIO;
> +
> +		data[i] = (dataword >> 8) & 0xFF;
> +		data[i+1] = dataword & 0xFF;
> +	}

What if ee->offset != 0 or ee->len < ETH_MODULE_SFF_8079_LEN?

> +	/* If the second block is requested, check if SFF-8472 is supported. */
> +	if (ee->len == ETH_MODULE_SFF_8472_LEN) {
> +		if (data[IGB_SFF_8472_COMP] == IGB_SFF_8472_UNSUP)
> +			return -EOPNOTSUPP;
> +
> +		/* Read the second block, SFF-8472, 2 bytes at a time */
> +		for (i = ETH_MODULE_SFF_8079_LEN;
> +		     i < ETH_MODULE_SFF_8472_LEN; i += 2) {
> +			status = igb_read_phy_reg_i2c(hw, i, &dataword);
> +			if (status != E1000_SUCCESS)
> +				return -EIO;
> +
> +			data[i] = (dataword >> 8) & 0xFF;
> +			data[i+1] = dataword & 0xFF;
> +		}
> +	}
[...]

What if ee->offset != 0 or ETH_MODULE_SFF_8079_LEN < ee->len <
ETH_MODULE_SFF_8472_LEN?

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox