* Re: [net-next 09/13] igb: Pull adapter out of main path in igb_xmit_frame_ring
From: Sergei Shtylyov @ 2013-04-04 14:15 UTC
To: Jeff Kirsher; +Cc: davem, Alexander Duyck, netdev, gospo, sassmann
In-Reply-To: <1365075480-20183-10-git-send-email-jeffrey.t.kirsher@intel.com>
Hello.
On 04-04-2013 15:37, Jeff Kirsher wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
> We only need the adapter pointer in the case of ptp. As such we can pull the
> adapter out of the main path and place it inside the if statement to avoid
> the temptation of accessing the adapter pointer in the fast path.
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Tested-by: Aaron Brown <aaron.f.brown@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Two minor nits.
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index 29facb5..6043245 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
[...]
> @@ -4628,15 +4627,17 @@ netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb,
>
> skb_tx_timestamp(skb);
>
> - if (unlikely((skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) &&
> - !(adapter->ptp_tx_skb))) {
> - skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
> - tx_flags |= IGB_TX_FLAGS_TSTAMP;
> + if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)) {
> + struct igb_adapter *adapter = netdev_priv(tx_ring->netdev);
An empty line wouldn't hurt here.
> + if (!(adapter->ptp_tx_skb)) {
Parens not needed here.
WBR, Sergei
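To make the two nits concrete, here is a minimal userspace sketch of the hunk with both applied: a blank line after the local declaration and no parentheses around adapter->ptp_tx_skb. It assumes hypothetical mock types and flag values (struct mock_adapter, struct mock_ring, the MOCK_* macros) in place of the real igb and skb structures, and it drops unlikely(), which is kernel-only.

```c
#include <stddef.h>

/* Hypothetical stand-ins for SKBTX_HW_TSTAMP, SKBTX_IN_PROGRESS and
 * IGB_TX_FLAGS_TSTAMP; the values are illustrative only. */
#define MOCK_SKBTX_HW_TSTAMP	(1u << 0)
#define MOCK_SKBTX_IN_PROGRESS	(1u << 1)
#define MOCK_TX_FLAGS_TSTAMP	(1u << 0)

struct mock_adapter {
	void *ptp_tx_skb;		/* non-NULL while a PTP timestamp is pending */
};

struct mock_ring {
	struct mock_adapter *adapter;	/* stands in for netdev_priv(tx_ring->netdev) */
};

/* Mirrors the patched hunk: the adapter is only looked up when a hardware
 * timestamp is requested, so the common path never touches it. */
static unsigned int mock_tstamp_flags(const struct mock_ring *tx_ring,
				      unsigned int *skb_tx_flags)
{
	unsigned int tx_flags = 0;

	if (*skb_tx_flags & MOCK_SKBTX_HW_TSTAMP) {
		struct mock_adapter *adapter = tx_ring->adapter;

		if (!adapter->ptp_tx_skb) {
			*skb_tx_flags |= MOCK_SKBTX_IN_PROGRESS;
			tx_flags |= MOCK_TX_FLAGS_TSTAMP;
		}
	}

	return tx_flags;
}
```

Calling mock_tstamp_flags() with MOCK_SKBTX_HW_TSTAMP clear never dereferences the adapter, which is the point of the patch: the lookup only happens on the PTP path.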
* Re: [net-next 07/13] igb: random code and comments fix
From: Sergei Shtylyov @ 2013-04-04 14:22 UTC
To: Jeff Kirsher; +Cc: davem, Akeem G. Abodunrin, netdev, gospo, sassmann
In-Reply-To: <1365075480-20183-8-git-send-email-jeffrey.t.kirsher@intel.com>
Hello.
On 04-04-2013 15:37, Jeff Kirsher wrote:
> From: "Akeem G. Abodunrin" <akeem.g.abodunrin@intel.com>
> This patch fixes code and comments as identified in the driver.
It seems you are doing 3 different things in 3 different files in this
patch... it would be better to split it up.
> Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
> Tested-by: Aaron Brown <aaron.f.brown@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
[...]
> diff --git a/drivers/net/ethernet/intel/igb/e1000_mac.c b/drivers/net/ethernet/intel/igb/e1000_mac.c
> index a5c7200..5d407f4 100644
> --- a/drivers/net/ethernet/intel/igb/e1000_mac.c
> +++ b/drivers/net/ethernet/intel/igb/e1000_mac.c
> @@ -1007,9 +1007,9 @@ s32 igb_config_fc_after_link_up(struct e1000_hw *hw)
> * be asked to delay transmission of packets than asking
> * our link partner to pause transmission of frames.
> */
> - else if ((hw->fc.requested_mode == e1000_fc_none ||
> - hw->fc.requested_mode == e1000_fc_tx_pause) ||
> - hw->fc.strict_ieee) {
> + else if ((hw->fc.requested_mode == e1000_fc_none) ||
> + (hw->fc.requested_mode == e1000_fc_tx_pause) ||
> + (hw->fc.strict_ieee)) {
The code was alright before this change, so this isn't really a fix at
all. Aside from that, () around == are not needed, and even less so around
'hw->fc.strict_ieee'.
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index fb162ef..8752f4f 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -3859,9 +3859,8 @@ static bool igb_thermal_sensor_event(struct e1000_hw *hw, u32 event)
> ctrl_ext = rd32(E1000_CTRL_EXT);
>
> if ((hw->phy.media_type == e1000_media_type_copper) &&
> - !(ctrl_ext & E1000_CTRL_EXT_LINK_MODE_SGMII)) {
> + !(ctrl_ext & E1000_CTRL_EXT_LINK_MODE_SGMII))
> ret = !!(thstat & event);
> - }
This was checkpatch.pl's complaint, right?
WBR, Sergei
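Since == binds more tightly than || in C, the original, the patched and the minimal spellings of the flow-control condition are equivalent. A small sketch, with hypothetical stand-ins for the e1000_fc_* modes, that checks every combination:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the e1000_fc_* flow-control modes. */
enum mock_fc_mode {
	MOCK_FC_NONE,
	MOCK_FC_RX_PAUSE,
	MOCK_FC_TX_PAUSE,
	MOCK_FC_FULL,
};

/* Form in the driver before the patch. */
static bool cond_orig(enum mock_fc_mode mode, bool strict_ieee)
{
	return (mode == MOCK_FC_NONE ||
		mode == MOCK_FC_TX_PAUSE) ||
	       strict_ieee;
}

/* Form proposed by the patch. */
static bool cond_patch(enum mock_fc_mode mode, bool strict_ieee)
{
	return (mode == MOCK_FC_NONE) ||
	       (mode == MOCK_FC_TX_PAUSE) ||
	       (strict_ieee);
}

/* Minimal form: no parentheses are needed at all. */
static bool cond_min(enum mock_fc_mode mode, bool strict_ieee)
{
	return mode == MOCK_FC_NONE || mode == MOCK_FC_TX_PAUSE || strict_ieee;
}
```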
* Re: [PERCPU] Remove & in front of this_cpu_ptr
From: Tejun Heo @ 2013-04-04 14:25 UTC
To: Christoph Lameter; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev
In-Reply-To: <0000013dd56ce988-aff8039a-dcd8-4267-b1c8-b8db60b4e4cd-000000@email.amazonses.com>
On Thu, Apr 04, 2013 at 02:21:57PM +0000, Christoph Lameter wrote:
> On Thu, 4 Apr 2013, Tejun Heo wrote:
>
> > Right, this is true, and we *do* wanna support this_cpu ops other than
> > this_cpu_ptr on per-cpu struct fields. The usage is still somewhat
> > unusual tho. Can we please add documentation in the comments too?
>
> I posted a patch adding documentation yesterday and you took it.
> ???
>
> Add comments where?
I was thinking above the this_cpu_*() ops. Let's make it as conspicuous
as reasonably possible. It's a similar problem with declaring per-cpu
arrays - there are a couple ways to do it and there's no way to
automatically reject the one which isn't preferred. I don't know.
Maybe all we can do is periodic sweep through the source tree and fix
up the "wrong" ones.
Thanks.
--
tejun
* [PATCH net-next 0/3] Mellanox Core and Ethernet driver updates 2013-04-04
From: Or Gerlitz @ 2013-04-04 14:26 UTC
To: davem; +Cc: netdev, amirv, Or Gerlitz
Hi Dave,
Here's a batch of mlx4 driver updates for 3.10, mostly DCB related.
Series done against the net-next tree as of commit a210576c "Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net"
Or.
Or Gerlitz (2):
net/mlx4_core: Added proper description for two device capabilities
net/mlx4_en: Enable DCB ETS ops only when supported by the firmware
Sagi Grimberg (1):
net/mlx4_en: Enable open-lldp DCB support
drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c | 42 +++++++++++++++++++++++-
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 12 ++++++-
drivers/net/ethernet/mellanox/mlx4/fw.c | 4 ++-
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 5 ++-
include/linux/mlx4/device.h | 1 +
5 files changed, 59 insertions(+), 5 deletions(-)
* [PATCH net-next 1/3] net/mlx4_core: Added proper description for two device capabilities
From: Or Gerlitz @ 2013-04-04 14:26 UTC
To: davem; +Cc: netdev, amirv, Or Gerlitz
In-Reply-To: <1365085574-12057-1-git-send-email-ogerlitz@mellanox.com>
Add readable descriptions for the DPDP and port link type sensing device capabilities.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/fw.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index f624557..8764397 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -91,7 +91,7 @@ static void dump_dev_cap_flags(struct mlx4_dev *dev, u64 flags)
[ 8] = "P_Key violation counter",
[ 9] = "Q_Key violation counter",
[10] = "VMM",
- [12] = "DPDP",
+ [12] = "Dual Port Different Protocol (DPDP) support",
[15] = "Big LSO headers",
[16] = "MW support",
[17] = "APM support",
@@ -109,6 +109,7 @@ static void dump_dev_cap_flags(struct mlx4_dev *dev, u64 flags)
[41] = "Unicast VEP steering support",
[42] = "Multicast VEP steering support",
[48] = "Counters support",
+ [55] = "Port link type sensing support",
[59] = "Port management change event support",
[61] = "64 byte EQE support",
[62] = "64 byte CQE support",
--
1.7.1
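The capability table in dump_dev_cap_flags() uses C99 designated initializers indexed by bit number, so bits without a description stay NULL. A minimal sketch of how such a table can be walked, using a subset of the strings from this series; the loop and the mock_* names are an illustration, not necessarily the driver's exact code.

```c
#include <stdint.h>
#include <stdio.h>

/* Subset of the table from the series, indexed by capability bit. */
static const char *const mock_cap_names[64] = {
	[12] = "Dual Port Different Protocol (DPDP) support",
	[48] = "Counters support",
	[53] = "Port ETS Scheduler support",
	[55] = "Port link type sensing support",
};

/* Prints one line per set bit that has a description and returns how
 * many lines were printed; set bits without a name are skipped. */
static int mock_dump_cap_flags(FILE *out, uint64_t flags)
{
	int printed = 0;

	for (int i = 0; i < 64; i++) {
		if ((flags & (1ULL << i)) && mock_cap_names[i]) {
			fprintf(out, "    %s\n", mock_cap_names[i]);
			printed++;
		}
	}
	return printed;
}
```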
* [PATCH net-next 2/3] net/mlx4_en: Enable DCB ETS ops only when supported by the firmware
From: Or Gerlitz @ 2013-04-04 14:26 UTC
To: davem; +Cc: netdev, amirv, Or Gerlitz, Eugenia Emantayev
In-Reply-To: <1365085574-12057-1-git-send-email-ogerlitz@mellanox.com>
Enable the DCB ETS ops only when supported by the firmware. For older firmware/cards
which don't support ETS, advertise only the PFC DCB ops.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c | 5 +++++
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 10 ++++++++--
drivers/net/ethernet/mellanox/mlx4/fw.c | 1 +
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 +
include/linux/mlx4/device.h | 1 +
5 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c b/drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c
index b799ab1..b7dc59f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c
@@ -253,3 +253,8 @@ const struct dcbnl_rtnl_ops mlx4_en_dcbnl_ops = {
.getdcbx = mlx4_en_dcbnl_getdcbx,
.setdcbx = mlx4_en_dcbnl_setdcbx,
};
+
+const struct dcbnl_rtnl_ops mlx4_en_dcbnl_pfc_ops = {
+ .ieee_getpfc = mlx4_en_dcbnl_ieee_getpfc,
+ .ieee_setpfc = mlx4_en_dcbnl_ieee_setpfc,
+};
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 61b5678..62795b5 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2013,8 +2013,14 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
INIT_WORK(&priv->linkstate_task, mlx4_en_linkstate);
INIT_DELAYED_WORK(&priv->stats_task, mlx4_en_do_get_stats);
#ifdef CONFIG_MLX4_EN_DCB
- if (!mlx4_is_slave(priv->mdev->dev))
- dev->dcbnl_ops = &mlx4_en_dcbnl_ops;
+ if (!mlx4_is_slave(priv->mdev->dev)) {
+ if (mdev->dev->caps.flags & MLX4_DEV_CAP_FLAG_SET_ETH_SCHED) {
+ dev->dcbnl_ops = &mlx4_en_dcbnl_ops;
+ } else {
+ en_info(priv, "enabling only PFC DCB ops\n");
+ dev->dcbnl_ops = &mlx4_en_dcbnl_pfc_ops;
+ }
+ }
#endif
for (i = 0; i < MLX4_EN_MAC_HASH_SIZE; ++i)
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 8764397..ab470d9 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -109,6 +109,7 @@ static void dump_dev_cap_flags(struct mlx4_dev *dev, u64 flags)
[41] = "Unicast VEP steering support",
[42] = "Multicast VEP steering support",
[48] = "Counters support",
+ [53] = "Port ETS Scheduler support",
[55] = "Port link type sensing support",
[59] = "Port management change event support",
[61] = "64 byte EQE support",
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index f710b7c..d4cb5d3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -624,6 +624,7 @@ int mlx4_en_QUERY_PORT(struct mlx4_en_dev *mdev, u8 port);
#ifdef CONFIG_MLX4_EN_DCB
extern const struct dcbnl_rtnl_ops mlx4_en_dcbnl_ops;
+extern const struct dcbnl_rtnl_ops mlx4_en_dcbnl_pfc_ops;
#endif
int mlx4_en_setup_tc(struct net_device *dev, u8 up);
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 811f91c..1bc5a75 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -140,6 +140,7 @@ enum {
MLX4_DEV_CAP_FLAG_VEP_UC_STEER = 1LL << 41,
MLX4_DEV_CAP_FLAG_VEP_MC_STEER = 1LL << 42,
MLX4_DEV_CAP_FLAG_COUNTERS = 1LL << 48,
+ MLX4_DEV_CAP_FLAG_SET_ETH_SCHED = 1LL << 53,
MLX4_DEV_CAP_FLAG_SENSE_SUPPORT = 1LL << 55,
MLX4_DEV_CAP_FLAG_PORT_MNG_CHG_EV = 1LL << 59,
MLX4_DEV_CAP_FLAG_64B_EQE = 1LL << 61,
--
1.7.1
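The ops selection added to mlx4_en_init_netdev() can be modeled as a pure function. This is a sketch with a hypothetical, trimmed-down ops struct; only bit 53 (MLX4_DEV_CAP_FLAG_SET_ETH_SCHED in the patch) is taken from the source, and a NULL return stands for a slave leaving dev->dcbnl_ops unset.

```c
#include <stddef.h>
#include <stdint.h>

/* Same bit as MLX4_DEV_CAP_FLAG_SET_ETH_SCHED in the patch. */
#define MOCK_DEV_CAP_FLAG_SET_ETH_SCHED	(1ULL << 53)

/* Hypothetical, trimmed-down stand-in for struct dcbnl_rtnl_ops. */
struct mock_dcbnl_ops {
	const char *name;
	int has_ets;	/* whether ieee_getets/ieee_setets would be wired up */
};

static const struct mock_dcbnl_ops mock_full_ops = { "ets+pfc", 1 };
static const struct mock_dcbnl_ops mock_pfc_ops = { "pfc-only", 0 };

/* Mirrors the CONFIG_MLX4_EN_DCB block: slaves get no DCB ops; everyone
 * else gets the ETS ops only if the firmware advertises the scheduler. */
static const struct mock_dcbnl_ops *mock_pick_ops(int is_slave, uint64_t caps)
{
	if (is_slave)
		return NULL;
	if (caps & MOCK_DEV_CAP_FLAG_SET_ETH_SCHED)
		return &mock_full_ops;
	return &mock_pfc_ops;
}
```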
* [PATCH net-next 3/3] net/mlx4_en: Enable open-lldp DCB support
From: Or Gerlitz @ 2013-04-04 14:26 UTC
To: davem; +Cc: netdev, amirv, Sagi Grimberg, Or Gerlitz
In-Reply-To: <1365085574-12057-1-git-send-email-ogerlitz@mellanox.com>
From: Sagi Grimberg <sagig@mellanox.com>
The lldpad daemon queries the driver caps via the getcap and getstate
routines. Add the proper dcbnl_ops entries to support that.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c | 37 +++++++++++++++++++++++-
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 2 +
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 4 ++-
3 files changed, 41 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c b/drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c
index b7dc59f..f9f9164 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c
@@ -36,6 +36,31 @@
#include "mlx4_en.h"
+static u8 mlx4_en_dcbnl_getcap(struct net_device *dev, int capid, u8 *cap)
+{
+ struct mlx4_en_priv *priv = netdev_priv(dev);
+
+ switch (capid) {
+ case DCB_CAP_ATTR_PFC:
+ *cap = true;
+ break;
+ case DCB_CAP_ATTR_UP2TC:
+ if (priv->mdev->dev->caps.flags & MLX4_DEV_CAP_FLAG_SET_ETH_SCHED)
+ *cap = true;
+ else
+ *cap = false;
+ break;
+ case DCB_CAP_ATTR_DCBX:
+ *cap = priv->dcbx_cap;
+ break;
+ default:
+ *cap = false;
+ break;
+ }
+
+ return 0;
+}
+
static int mlx4_en_dcbnl_ieee_getets(struct net_device *dev,
struct ieee_ets *ets)
{
@@ -217,6 +242,13 @@ static int mlx4_en_dcbnl_ieee_getmaxrate(struct net_device *dev,
return 0;
}
+static u8 mlx4_en_dcbnl_get_state(struct net_device *dev)
+{
+ struct mlx4_en_priv *priv = netdev_priv(dev);
+
+ return !!(priv->flags & MLX4_EN_FLAG_DCB_ENABLED);
+}
+
static int mlx4_en_dcbnl_ieee_setmaxrate(struct net_device *dev,
struct ieee_maxrate *maxrate)
{
@@ -243,18 +275,21 @@ static int mlx4_en_dcbnl_ieee_setmaxrate(struct net_device *dev,
}
const struct dcbnl_rtnl_ops mlx4_en_dcbnl_ops = {
+ .getstate = mlx4_en_dcbnl_get_state,
.ieee_getets = mlx4_en_dcbnl_ieee_getets,
.ieee_setets = mlx4_en_dcbnl_ieee_setets,
.ieee_getmaxrate = mlx4_en_dcbnl_ieee_getmaxrate,
.ieee_setmaxrate = mlx4_en_dcbnl_ieee_setmaxrate,
.ieee_getpfc = mlx4_en_dcbnl_ieee_getpfc,
.ieee_setpfc = mlx4_en_dcbnl_ieee_setpfc,
-
+ .getcap = mlx4_en_dcbnl_getcap,
.getdcbx = mlx4_en_dcbnl_getdcbx,
.setdcbx = mlx4_en_dcbnl_setdcbx,
};
const struct dcbnl_rtnl_ops mlx4_en_dcbnl_pfc_ops = {
+ .getstate = mlx4_en_dcbnl_get_state,
.ieee_getpfc = mlx4_en_dcbnl_ieee_getpfc,
.ieee_setpfc = mlx4_en_dcbnl_ieee_setpfc,
+ .getcap = mlx4_en_dcbnl_getcap,
};
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 62795b5..a390de0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2013,6 +2013,8 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
INIT_WORK(&priv->linkstate_task, mlx4_en_linkstate);
INIT_DELAYED_WORK(&priv->stats_task, mlx4_en_do_get_stats);
#ifdef CONFIG_MLX4_EN_DCB
+ priv->dcbx_cap = DCB_CAP_DCBX_HOST;
+ priv->flags |= MLX4_EN_FLAG_DCB_ENABLED;
if (!mlx4_is_slave(priv->mdev->dev)) {
if (mdev->dev->caps.flags & MLX4_DEV_CAP_FLAG_SET_ETH_SCHED) {
dev->dcbnl_ops = &mlx4_en_dcbnl_ops;
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index d4cb5d3..71960b1 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -440,7 +440,8 @@ enum {
MLX4_EN_FLAG_ENABLE_HW_LOOPBACK = (1 << 2),
/* whether we need to drop packets that hardware loopback-ed */
MLX4_EN_FLAG_RX_FILTER_NEEDED = (1 << 3),
- MLX4_EN_FLAG_FORCE_PROMISC = (1 << 4)
+ MLX4_EN_FLAG_FORCE_PROMISC = (1 << 4),
+ MLX4_EN_FLAG_DCB_ENABLED = (1 << 5)
};
#define MLX4_EN_MAC_HASH_SIZE (1 << BITS_PER_BYTE)
@@ -529,6 +530,7 @@ struct mlx4_en_priv {
#ifdef CONFIG_MLX4_EN_DCB
struct ieee_ets ets;
u16 maxrate[IEEE_8021QAZ_MAX_TCS];
+ u8 dcbx_cap;
#endif
#ifdef CONFIG_RFS_ACCEL
spinlock_t filters_lock;
--
1.7.1
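The DCB_CAP_ATTR_UP2TC branch in mlx4_en_dcbnl_getcap() could use the same !! idiom that mlx4_en_dcbnl_get_state() already uses. A sketch, assuming a local copy of the bit-53 flag, showing that both forms agree and why a plain cast of the masked value to u8 would not work (bit 53 is truncated away):

```c
#include <stdint.h>

/* Local copy of MLX4_DEV_CAP_FLAG_SET_ETH_SCHED (bit 53). */
#define MOCK_DEV_CAP_FLAG_SET_ETH_SCHED	(1ULL << 53)

/* if/else form, as in the patch. */
static uint8_t up2tc_ifelse(uint64_t flags)
{
	if (flags & MOCK_DEV_CAP_FLAG_SET_ETH_SCHED)
		return 1;
	return 0;
}

/* !! collapses any non-zero mask result to exactly 1. */
static uint8_t up2tc_bang(uint64_t flags)
{
	return !!(flags & MOCK_DEV_CAP_FLAG_SET_ETH_SCHED);
}

/* Pitfall: storing the raw mask in a u8 keeps only the low 8 bits. */
static uint8_t up2tc_truncated(uint64_t flags)
{
	return (uint8_t)(flags & MOCK_DEV_CAP_FLAG_SET_ETH_SCHED);
}
```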
* Re: [PERCPU] Remove & in front of this_cpu_ptr
From: Eric Dumazet @ 2013-04-04 14:29 UTC
To: Christoph Lameter; +Cc: Tejun Heo, RongQing Li, Shan Wei, netdev
In-Reply-To: <0000013dd5517d80-151c27ac-ebf3-4ec7-bcd4-c85a7975852e-000000@email.amazonses.com>
On Thu, 2013-04-04 at 13:52 +0000, Christoph Lameter wrote:
> On Wed, 3 Apr 2013, Eric Dumazet wrote:
>
> > I agree with you, I prefer &this_cpu_ptr(percpu_pointer)->field
> >
> > The offset is added after getting the address of the (percpu) base
> > object.
>
> There are two offsets being added!
I was speaking of the offsetof(struct ..., field), not of the 'offset'
you have in mind (the percpu one).
That's why I prefer &this_cpu_ptr(percpu_pointer)->field
It's clearer for me, but that's a very minor issue.
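A simplified userspace model of the two-offsets point, assuming the per-cpu copies live in a plain array and that relocating a pointer just adds a per-cpu offset (the real this_cpu_ptr() is architecture-specific and uses the current CPU; this mock takes the CPU explicitly, closer to per_cpu_ptr()). Because the per-cpu offset and offsetof(struct, field) are both plain additions, &this_cpu_ptr(p)->field and this_cpu_ptr(&p->field) land on the same address; the choice between them is about readability. All mock_* names are hypothetical.

```c
#include <stddef.h>

#define MOCK_NR_CPUS	4

/* Hypothetical per-cpu object. */
struct mock_stats {
	unsigned long rx;
	unsigned long tx;
};

/* One copy per CPU; CPU 0's copy plays the role of the per-cpu "base". */
static struct mock_stats mock_percpu_area[MOCK_NR_CPUS];

/* Distance from CPU 0's copy to the given CPU's copy. */
static ptrdiff_t mock_cpu_offset(int cpu)
{
	return (char *)&mock_percpu_area[cpu] - (char *)&mock_percpu_area[0];
}

/* Relocates any address inside the base copy to the given CPU's copy. */
static void *mock_this_cpu_ptr(void *ptr, int cpu)
{
	return (char *)ptr + mock_cpu_offset(cpu);
}

/* Eric's form: relocate the object first, then add offsetof(tx). */
static unsigned long *tx_via_object(struct mock_stats *pcp, int cpu)
{
	struct mock_stats *obj = mock_this_cpu_ptr(pcp, cpu);

	return &obj->tx;
}

/* Christoph's form: add offsetof(tx) first, then relocate. */
static unsigned long *tx_via_field(struct mock_stats *pcp, int cpu)
{
	return mock_this_cpu_ptr(&pcp->tx, cpu);
}
```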
* Re: [PERCPU] Remove & in front of this_cpu_ptr
From: Christoph Lameter @ 2013-04-04 14:21 UTC
To: Tejun Heo; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev
In-Reply-To: <20130404140040.GB9425@htj.dyndns.org>
On Thu, 4 Apr 2013, Tejun Heo wrote:
> Right, this is true, and we *do* wanna support this_cpu ops other than
> this_cpu_ptr on per-cpu struct fields. The usage is still somewhat
> unusual tho. Can we please add documentation in the comments too?
I posted a patch adding documentation yesterday and you took it.
???
Add comments where?
* [PATCH RFC 0/2] Add IPv6 tokenized interface identifier support
From: Daniel Borkmann @ 2013-04-04 14:37 UTC
To: davem; +Cc: netdev
This RFC patchset adds IPv6 tokenized interface identifier
support for the net-next kernel as well as for iproute2 in
order to configure a networking device for IPv6 Token IIDs.
For a more detailed description, have a look at the two
patches directly.
--
1.7.11.7
* [PATCH net-next 1/2] net: ipv6: add tokenized interface identifier support
From: Daniel Borkmann @ 2013-04-04 14:37 UTC
To: davem; +Cc: netdev, Hannes Frederic Sowa, YOSHIFUJI Hideaki
In-Reply-To: <1365086258-4512-1-git-send-email-dborkman@redhat.com>
This patch adds support for tokenized IIDs, which allow
administrators to assign well-known host-part addresses to
nodes whilst still obtaining the global network prefix from
Router Advertisements. It is currently in IETF Internet-Draft
status [1].
The primary target for such support is server platforms
where addresses are usually manually configured, rather
than using DHCPv6 or SLAAC. By using tokenised identifiers,
hosts can still determine their network prefix by use of
SLAAC, but more readily be automatically renumbered should
their network prefix change.
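The address formation in the patched addrconf_prefix_rcv() (upper 64 bits from the Router Advertisement /64 prefix, lower 64 bits from the token) can be reproduced in userspace with inet_pton() and inet_ntop(). A minimal sketch; mock_token_addr() is a hypothetical helper, and the example prefix is from the 2001:db8::/32 documentation range.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

/* Combines a /64 prefix with the interface-identifier half of a token,
 * as the patch does with memcpy(addr.s6_addr + 8, token + 8, 8).
 * Returns 0 on success, -1 on a parse or formatting error. */
static int mock_token_addr(const char *prefix, const char *token,
			   char *out, socklen_t outlen)
{
	struct in6_addr p, t, addr;

	if (inet_pton(AF_INET6, prefix, &p) != 1 ||
	    inet_pton(AF_INET6, token, &t) != 1)
		return -1;

	memcpy(addr.s6_addr, p.s6_addr, 8);		/* network prefix from the RA */
	memcpy(addr.s6_addr + 8, t.s6_addr + 8, 8);	/* host part from the token */

	return inet_ntop(AF_INET6, &addr, out, outlen) ? 0 : -1;
}
```

Only the lower half of the token matters: with a token of ffff::1 the upper 64 bits are ignored and the result under 2001:db8::/64 is 2001:db8::1.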
[1] http://tools.ietf.org/html/draft-chown-6man-tokenised-ipv6-identifiers-02
The implementation is partially based on Mark K. Thompson's
proof of concept. Successfully tested by myself.
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
include/net/if_inet6.h | 2 +
include/net/ipv6.h | 2 +
include/uapi/linux/if_link.h | 1 +
net/ipv6/addrconf.c | 87 ++++++++++++++++++++++++++++++++++++++++-
net/ipv6/addrconf_core.c | 2 -
5 files changed, 89 insertions(+), 5 deletions(-)
diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h
index 9356322..f1063d6 100644
--- a/include/net/if_inet6.h
+++ b/include/net/if_inet6.h
@@ -187,6 +187,8 @@ struct inet6_dev {
struct list_head tempaddr_list;
#endif
+ struct in6_addr token;
+
struct neigh_parms *nd_parms;
struct inet6_dev *next;
struct ipv6_devconf cnf;
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 0810aa5..da8c11e 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -88,6 +88,8 @@
#define IPV6_ADDR_SCOPE_ORGLOCAL 0x08
#define IPV6_ADDR_SCOPE_GLOBAL 0x0e
+#define IPV6_ADDR_SCOPE_TYPE(scope) ((scope) << 16)
+
/*
* Addr flags
*/
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index c4edfe1..6b35c42 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -201,6 +201,7 @@ enum {
IFLA_INET6_MCAST, /* MC things. What of them? */
IFLA_INET6_CACHEINFO, /* time values and max reasm size */
IFLA_INET6_ICMP6STATS, /* statistics (icmpv6) */
+ IFLA_INET6_TOKEN, /* device token */
__IFLA_INET6_MAX
};
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index a33b157..fb0e8a0 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -422,6 +422,7 @@ static struct inet6_dev *ipv6_add_dev(struct net_device *dev)
ipv6_regen_rndid((unsigned long) ndev);
}
#endif
+ memset(ndev->token.s6_addr, 0, sizeof(ndev->token.s6_addr));
if (netif_running(dev) && addrconf_qdisc_ok(dev))
ndev->if_flags |= IF_READY;
@@ -2136,8 +2137,14 @@ void addrconf_prefix_rcv(struct net_device *dev, u8 *opt, int len, bool sllao)
if (pinfo->prefix_len == 64) {
memcpy(&addr, &pinfo->prefix, 8);
- if (ipv6_generate_eui64(addr.s6_addr + 8, dev) &&
- ipv6_inherit_eui64(addr.s6_addr + 8, in6_dev)) {
+
+ if (!ipv6_addr_any(&in6_dev->token)) {
+ read_lock_bh(&in6_dev->lock);
+ memcpy(addr.s6_addr + 8,
+ in6_dev->token.s6_addr + 8, 8);
+ read_unlock_bh(&in6_dev->lock);
+ } else if (ipv6_generate_eui64(addr.s6_addr + 8, dev) &&
+ ipv6_inherit_eui64(addr.s6_addr + 8, in6_dev)) {
in6_dev_put(in6_dev);
return;
}
@@ -4165,7 +4172,8 @@ static inline size_t inet6_ifla6_size(void)
+ nla_total_size(sizeof(struct ifla_cacheinfo))
+ nla_total_size(DEVCONF_MAX * 4) /* IFLA_INET6_CONF */
+ nla_total_size(IPSTATS_MIB_MAX * 8) /* IFLA_INET6_STATS */
- + nla_total_size(ICMP6_MIB_MAX * 8); /* IFLA_INET6_ICMP6STATS */
+ + nla_total_size(ICMP6_MIB_MAX * 8) /* IFLA_INET6_ICMP6STATS */
+ + nla_total_size(sizeof(struct in6_addr)); /* IFLA_INET6_TOKEN */
}
static inline size_t inet6_if_nlmsg_size(void)
@@ -4252,6 +4260,13 @@ static int inet6_fill_ifla6_attrs(struct sk_buff *skb, struct inet6_dev *idev)
goto nla_put_failure;
snmp6_fill_stats(nla_data(nla), idev, IFLA_INET6_ICMP6STATS, nla_len(nla));
+ nla = nla_reserve(skb, IFLA_INET6_TOKEN, sizeof(struct in6_addr));
+ if (nla == NULL)
+ goto nla_put_failure;
+ read_lock_bh(&idev->lock);
+ memcpy(nla_data(nla), idev->token.s6_addr, nla_len(nla));
+ read_unlock_bh(&idev->lock);
+
return 0;
nla_put_failure:
@@ -4279,6 +4294,71 @@ static int inet6_fill_link_af(struct sk_buff *skb, const struct net_device *dev)
return 0;
}
+static int inet6_set_iftoken(struct inet6_dev *idev, struct in6_addr *token)
+{
+ struct in6_addr ll_addr;
+ struct inet6_ifaddr *ifp;
+ struct net_device *dev = idev->dev;
+
+ if (token == NULL)
+ return -EINVAL;
+ if (ipv6_addr_any(token))
+ return -EINVAL;
+ if (dev->flags & (IFF_LOOPBACK | IFF_NOARP))
+ return -EINVAL;
+ if (idev->dead || !(idev->if_flags & IF_READY))
+ return -EINVAL;
+ if (!ipv6_accept_ra(idev))
+ return -EINVAL;
+ if (idev->cnf.rtr_solicits <= 0)
+ return -EINVAL;
+
+ write_lock_bh(&idev->lock);
+
+ BUILD_BUG_ON(sizeof(token->s6_addr) != 16);
+ memcpy(idev->token.s6_addr + 8, token->s6_addr + 8, 8);
+
+ write_unlock_bh(&idev->lock);
+
+ ipv6_get_lladdr(dev, &ll_addr, IFA_F_TENTATIVE | IFA_F_OPTIMISTIC);
+ ndisc_send_rs(dev, &ll_addr, &in6addr_linklocal_allrouters);
+
+ write_lock_bh(&idev->lock);
+ idev->if_flags |= IF_RS_SENT;
+
+ /* Well, that's kinda nasty ... */
+ list_for_each_entry(ifp, &idev->addr_list, if_list) {
+ spin_lock(&ifp->lock);
+ if (__ipv6_addr_type(&ifp->addr) &
+ IPV6_ADDR_SCOPE_TYPE(IPV6_ADDR_SCOPE_GLOBAL)) {
+ ifp->valid_lft = 0;
+ ifp->prefered_lft = 0;
+ }
+ spin_unlock(&ifp->lock);
+ }
+
+ write_unlock_bh(&idev->lock);
+ return 0;
+}
+
+static int inet6_set_link_af(struct net_device *dev, const struct nlattr *nla)
+{
+ int err = -EINVAL;
+ struct inet6_dev *idev = __in6_dev_get(dev);
+ struct nlattr *tb[IFLA_INET6_MAX + 1];
+
+ if (!idev)
+ return -EAFNOSUPPORT;
+
+ if (nla_parse_nested(tb, IFLA_INET6_MAX, nla, NULL) < 0)
+ BUG();
+
+ if (tb[IFLA_INET6_TOKEN])
+ err = inet6_set_iftoken(idev, nla_data(tb[IFLA_INET6_TOKEN]));
+
+ return err;
+}
+
static int inet6_fill_ifinfo(struct sk_buff *skb, struct inet6_dev *idev,
u32 portid, u32 seq, int event, unsigned int flags)
{
@@ -4981,6 +5061,7 @@ static struct rtnl_af_ops inet6_ops = {
.family = AF_INET6,
.fill_link_af = inet6_fill_link_af,
.get_link_af_size = inet6_get_link_af_size,
+ .set_link_af = inet6_set_link_af,
};
/*
diff --git a/net/ipv6/addrconf_core.c b/net/ipv6/addrconf_core.c
index d051e5f..8b723de 100644
--- a/net/ipv6/addrconf_core.c
+++ b/net/ipv6/addrconf_core.c
@@ -6,8 +6,6 @@
#include <linux/export.h>
#include <net/ipv6.h>
-#define IPV6_ADDR_SCOPE_TYPE(scope) ((scope) << 16)
-
static inline unsigned int ipv6_addr_scope2type(unsigned int scope)
{
switch (scope) {
--
1.7.1
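For the new IFLA_INET6_TOKEN attribute, inet6_ifla6_size() reserves nla_total_size(sizeof(struct in6_addr)). A sketch of that arithmetic, using local copies of the NLA_ALIGN()/NLA_HDRLEN rules from the uapi <linux/netlink.h> (4-byte attribute header, 4-byte alignment) so it builds without kernel headers; the mock_* names are hypothetical. A 16-byte address costs 20 bytes in the message.

```c
#include <stddef.h>
#include <netinet/in.h>	/* struct in6_addr */

/* Local copies of the netlink attribute alignment rules. */
#define MOCK_NLA_ALIGNTO	4
#define MOCK_NLA_ALIGN(len) \
	(((len) + MOCK_NLA_ALIGNTO - 1) & ~(size_t)(MOCK_NLA_ALIGNTO - 1))
#define MOCK_NLA_HDRLEN		MOCK_NLA_ALIGN((size_t)4)	/* sizeof(struct nlattr) */

/* Same arithmetic as the kernel's nla_total_size(): header plus payload,
 * padded up to the next 4-byte boundary. */
static size_t mock_nla_total_size(size_t payload)
{
	return MOCK_NLA_ALIGN(MOCK_NLA_HDRLEN + payload);
}
```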
* [PATCH iproute2 2/2] ip: ipv6: add tokenized interface identifier support
From: Daniel Borkmann @ 2013-04-04 14:37 UTC
To: davem; +Cc: netdev, Hannes Frederic Sowa, YOSHIFUJI Hideaki
In-Reply-To: <1365086258-4512-1-git-send-email-dborkman@redhat.com>
This is experimental support for tokenized IIDs, which enable
administrators to assign well-known host-part addresses to nodes
whilst still obtaining the global network prefix from Router
Advertisements. It is currently in IETF Internet-Draft status [1].
Example commands with iproute2:
Setting a device token:
# ip token set ::1a:2b:3c:4d/64 dev eth1
Getting a device token:
# ip token get dev eth1
token ::1a:2b:3c:4d dev eth1
Listing all tokens:
# ip token list (or: ip token)
token :: dev eth0
token ::1a:2b:3c:4d dev eth1
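Subcommands are resolved by prefix in do_iptoken(). A sketch, assuming iproute2's matches() returns 0 when its first argument is a prefix of the second (modeled here as mock_matches()): because list/show are tested first, a bare "s" resolves to show, so set needs at least "se". The action codes are hypothetical stand-ins for the functions do_iptoken() calls.

```c
#include <stddef.h>
#include <string.h>

/* Prefix match in the style of iproute2's matches():
 * returns 0 if cmd is a prefix of pattern. */
static int mock_matches(const char *cmd, const char *pattern)
{
	size_t len = strlen(cmd);

	if (len > strlen(pattern))
		return -1;
	return memcmp(pattern, cmd, len);
}

/* Hypothetical action codes for iptoken_list(), iptoken_set(), usage(). */
enum mock_action { MOCK_LIST, MOCK_SET, MOCK_HELP, MOCK_UNKNOWN };

/* Same test order as do_iptoken() in the patch. */
static enum mock_action mock_dispatch(const char *cmd)
{
	if (cmd == NULL)
		return MOCK_LIST;	/* bare "ip token" */
	if (mock_matches(cmd, "list") == 0 || mock_matches(cmd, "show") == 0)
		return MOCK_LIST;
	if (mock_matches(cmd, "set") == 0 || mock_matches(cmd, "add") == 0)
		return MOCK_SET;
	if (mock_matches(cmd, "get") == 0)
		return MOCK_LIST;	/* "get" reuses iptoken_list() */
	if (mock_matches(cmd, "help") == 0)
		return MOCK_HELP;
	return MOCK_UNKNOWN;
}
```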
[1] http://tools.ietf.org/html/draft-chown-6man-tokenised-ipv6-identifiers-02
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
include/linux/if_link.h | 1 +
ip/Makefile | 2 +-
ip/ip.c | 3 +-
ip/ip_common.h | 1 +
ip/iptoken.c | 208 ++++++++++++++++++++++++++++++++++++++++++++++++
man/man8/Makefile | 2 +-
man/man8/ip-token.8 | 66 +++++++++++++++
7 files changed, 280 insertions(+), 3 deletions(-)
create mode 100644 ip/iptoken.c
create mode 100644 man/man8/ip-token.8
diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 40167af..f3a1b29 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -199,6 +199,7 @@ enum {
IFLA_INET6_MCAST, /* MC things. What of them? */
IFLA_INET6_CACHEINFO, /* time values and max reasm size */
IFLA_INET6_ICMP6STATS, /* statistics (icmpv6) */
+ IFLA_INET6_TOKEN, /* device token */
__IFLA_INET6_MAX
};
diff --git a/ip/Makefile b/ip/Makefile
index 2b606d4..48bd4a1 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -1,6 +1,6 @@
IPOBJ=ip.o ipaddress.o ipaddrlabel.o iproute.o iprule.o ipnetns.o \
rtm_map.o iptunnel.o ip6tunnel.o tunnel.o ipneigh.o ipntable.o iplink.o \
- ipmaddr.o ipmonitor.o ipmroute.o ipprefix.o iptuntap.o \
+ ipmaddr.o ipmonitor.o ipmroute.o ipprefix.o iptuntap.o iptoken.o \
ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o \
iplink_vlan.o link_veth.o link_gre.o iplink_can.o \
iplink_macvlan.o iplink_macvtap.o ipl2tp.o link_vti.o \
diff --git a/ip/ip.c b/ip/ip.c
index e10ddb2..69bd5ff 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -45,7 +45,7 @@ static void usage(void)
" ip [ -force ] -batch filename\n"
"where OBJECT := { link | addr | addrlabel | route | rule | neigh | ntable |\n"
" tunnel | tuntap | maddr | mroute | mrule | monitor | xfrm |\n"
-" netns | l2tp | tcp_metrics }\n"
+" netns | l2tp | tcp_metrics | token }\n"
" OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] |\n"
" -f[amily] { inet | inet6 | ipx | dnet | bridge | link } |\n"
" -4 | -6 | -I | -D | -B | -0 |\n"
@@ -80,6 +80,7 @@ static const struct cmd {
{ "tunl", do_iptunnel },
{ "tuntap", do_iptuntap },
{ "tap", do_iptuntap },
+ { "token", do_iptoken },
{ "tcpmetrics", do_tcp_metrics },
{ "tcp_metrics",do_tcp_metrics },
{ "monitor", do_ipmonitor },
diff --git a/ip/ip_common.h b/ip/ip_common.h
index de56810..f9b4734 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -49,6 +49,7 @@ extern int do_xfrm(int argc, char **argv);
extern int do_ipl2tp(int argc, char **argv);
extern int do_tcp_metrics(int argc, char **argv);
extern int do_ipnetconf(int argc, char **argv);
+extern int do_iptoken(int argc, char **argv);
static inline int rtm_get_table(struct rtmsg *r, struct rtattr **tb)
{
diff --git a/ip/iptoken.c b/ip/iptoken.c
new file mode 100644
index 0000000..1dd071d
--- /dev/null
+++ b/ip/iptoken.c
@@ -0,0 +1,208 @@
+/*
+ * iptoken.c "ip token"
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors: Daniel Borkmann, <borkmann@redhat.com>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <netinet/ip.h>
+#include <arpa/inet.h>
+#include <linux/types.h>
+#include <linux/if.h>
+
+#include "rt_names.h"
+#include "utils.h"
+#include "ip_common.h"
+
+extern struct rtnl_handle rth;
+
+struct rtnl_dump_args {
+ FILE *fp;
+ int ifindex;
+};
+
+static void usage(void) __attribute__((noreturn));
+
+static void usage(void)
+{
+ fprintf(stderr, "Usage: ip token [ list | set | get ] [ TOKEN ] [ dev DEV ]\n");
+ exit(-1);
+}
+
+static int print_token(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+{
+ struct rtnl_dump_args *args = arg;
+ FILE *fp = args->fp;
+ int ifindex = args->ifindex;
+ struct ifinfomsg *ifi = NLMSG_DATA(n);
+ int len = n->nlmsg_len;
+ struct rtattr *tb[IFLA_MAX + 1];
+ struct rtattr *ltb[IFLA_INET6_MAX + 1];
+ char abuf[256];
+
+ if (n->nlmsg_type != RTM_NEWLINK)
+ return -1;
+
+ len -= NLMSG_LENGTH(sizeof(*ifi));
+ if (len < 0)
+ return -1;
+
+ if (ifi->ifi_family != AF_INET6)
+ return -1;
+ if (ifi->ifi_index == 0)
+ return -1;
+ if (ifindex > 0 && ifi->ifi_index != ifindex)
+ return 0;
+ if (ifi->ifi_flags & (IFF_LOOPBACK | IFF_NOARP))
+ return 0;
+
+ parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
+ if (!tb[IFLA_PROTINFO])
+ return -1;
+
+ parse_rtattr_nested(ltb, IFLA_INET6_MAX, tb[IFLA_PROTINFO]);
+ if (!ltb[IFLA_INET6_TOKEN])
+ return -1;
+
+ fprintf(fp, "token %s ",
+ format_host(ifi->ifi_family,
+ RTA_PAYLOAD(ltb[IFLA_INET6_TOKEN]),
+ RTA_DATA(ltb[IFLA_INET6_TOKEN]),
+ abuf, sizeof(abuf)));
+ fprintf(fp, "dev %s ", ll_index_to_name(ifi->ifi_index));
+ fprintf(fp, "\n");
+ fflush(fp);
+
+ return 0;
+}
+
+static int iptoken_list(int argc, char **argv)
+{
+ int af = AF_INET6;
+ struct rtnl_dump_args da;
+ const struct rtnl_dump_filter_arg a[2] = {
+ { .filter = print_token, .arg1 = &da, },
+ { .filter = NULL, .arg1 = NULL, },
+ };
+
+ memset(&da, 0, sizeof(da));
+ da.fp = stdout;
+
+ while (argc > 0) {
+ if (strcmp(*argv, "dev") == 0) {
+ NEXT_ARG();
+ if ((da.ifindex = ll_name_to_index(*argv)) == 0)
+ invarg("dev is invalid\n", *argv);
+ break;
+ }
+ argc--; argv++;
+ }
+
+ if (rtnl_wilddump_request(&rth, af, RTM_GETLINK) < 0) {
+ perror("Cannot send dump request");
+ return -1;
+ }
+
+ if (rtnl_dump_filter_l(&rth, a) < 0) {
+ fprintf(stderr, "Dump terminated\n");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int iptoken_set(int argc, char **argv)
+{
+ struct {
+ struct nlmsghdr n;
+ struct ifinfomsg ifi;
+ char buf[512];
+ } req;
+ struct rtattr *afs, *afs6;
+ bool have_token = false, have_dev = false;
+ inet_prefix addr;
+
+ memset(&addr, 0, sizeof(addr));
+ memset(&req, 0, sizeof(req));
+
+ req.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
+ req.n.nlmsg_flags = NLM_F_REQUEST;
+ req.n.nlmsg_type = RTM_SETLINK;
+ req.ifi.ifi_family = AF_INET6;
+
+ while (argc > 0) {
+ if (strcmp(*argv, "dev") == 0) {
+ NEXT_ARG();
+ if (!have_dev) {
+ if ((req.ifi.ifi_index =
+ ll_name_to_index(*argv)) == 0)
+ invarg("dev is invalid\n", *argv);
+ have_dev = true;
+ }
+ } else {
+ if (matches(*argv, "help") == 0)
+ usage();
+ if (!have_token) {
+ afs = addattr_nest(&req.n, sizeof(req), IFLA_AF_SPEC);
+ afs6 = addattr_nest(&req.n, sizeof(req), AF_INET6);
+ get_prefix(&addr, *argv, req.ifi.ifi_family);
+ addattr_l(&req.n, sizeof(req), IFLA_INET6_TOKEN,
+ &addr.data, addr.bytelen);
+ addattr_nest_end(&req.n, afs6);
+ addattr_nest_end(&req.n, afs);
+ have_token = true;
+ }
+ }
+ argc--; argv++;
+ }
+
+ if (!have_token) {
+ fprintf(stderr, "Not enough information: token "
+ "is required.\n");
+ return -1;
+ }
+ if (!have_dev) {
+ fprintf(stderr, "Not enough information: \"dev\" "
+ "argument is required.\n");
+ return -1;
+ }
+
+ if (rtnl_talk(&rth, &req.n, 0, 0, NULL) < 0)
+ return -2;
+
+ return 0;
+}
+
+int do_iptoken(int argc, char **argv)
+{
+ ll_init_map(&rth);
+
+ if (argc < 1) {
+ return iptoken_list(0, NULL);
+ } else if (matches(argv[0], "list") == 0 ||
+ matches(argv[0], "show") == 0) {
+ return iptoken_list(argc - 1, argv + 1);
+ } else if (matches(argv[0], "set") == 0 ||
+ matches(argv[0], "add") == 0) {
+ return iptoken_set(argc - 1, argv + 1);
+ } else if (matches(argv[0], "get") == 0) {
+ return iptoken_list(argc - 1, argv + 1);
+ } else if (matches(argv[0], "help") == 0)
+ usage();
+
+ fprintf(stderr, "Command \"%s\" is unknown, try \"ip token help\".\n", *argv);
+ exit(-1);
+}
diff --git a/man/man8/Makefile b/man/man8/Makefile
index d208f3b..ff80c98 100644
--- a/man/man8/Makefile
+++ b/man/man8/Makefile
@@ -9,7 +9,7 @@ MAN8PAGES = $(TARGETS) ip.8 arpd.8 lnstat.8 routel.8 rtacct.8 rtmon.8 ss.8 \
ip-addrlabel.8 ip-l2tp.8 \
ip-maddress.8 ip-monitor.8 ip-mroute.8 ip-neighbour.8 \
ip-netns.8 ip-ntable.8 ip-rule.8 ip-tunnel.8 ip-xfrm.8 \
- ip-tcp_metrics.8 ip-netconf.8
+ ip-tcp_metrics.8 ip-netconf.8 ip-token.8
all: $(TARGETS)
diff --git a/man/man8/ip-token.8 b/man/man8/ip-token.8
new file mode 100644
index 0000000..2085cb5
--- /dev/null
+++ b/man/man8/ip-token.8
@@ -0,0 +1,66 @@
+.TH IP\-TOKEN 8 "28 Mar 2013" "iproute2" "Linux"
+.SH "NAME"
+ip-token \- tokenized interface identifier support
+.SH "SYNOPSIS"
+.sp
+.ad l
+.in +8
+.ti -8
+.B ip token
+.RI " { " COMMAND " | "
+.BR help " }"
+.sp
+
+.ti -8
+.BR "ip token" " { " set " } "
+.IR TOKEN
+.B dev
+.IR DEV
+
+.ti -8
+.BR "ip token" " { " get " } "
+.B dev
+.IR DEV
+
+.ti -8
+.BR "ip token" " { " list " }"
+
+.SH "DESCRIPTION"
+IPv6 tokenized interface identifier support is used for assigning well-known
+host-part addresses to nodes whilst still obtaining a global network prefix
+from Router Advertisements. The primary target for tokenized identifiers is
+server platforms where addresses are usually manually configured, rather than
+using DHCPv6 or SLAAC. By using tokenized identifiers, hosts can still
+determine their network prefix by use of SLAAC, but more readily be
+automatically renumbered should their network prefix change [1]. Tokenized
+IPv6 Identifiers are described in the RFC draft
+[1]: <draft-chown-6man-tokenised-ipv6-identifiers-02>.
+
+.SS ip token set - set an interface token
+set the interface token in the kernel. Once a token is set, it cannot be
+removed from the interface, only overwritten.
+.TP
+.I TOKEN
+the interface identifier token address.
+.TP
+.BI dev " DEV"
+the networking interface.
+
+.SS ip token get - get the interface token from the kernel
+show a tokenized interface identifier of a particular networking device.
+.B Arguments:
+coincide with the arguments of
+.B ip token set
+but the
+.I TOKEN
+must be left out.
+.SS ip token list - list all interface tokens
+list all tokenized interface identifiers for the networking interfaces from
+the kernel.
+
+.SH SEE ALSO
+.br
+.BR ip (8)
+
+.SH AUTHOR
+Manpage by Daniel Borkmann
--
1.7.11.7
^ permalink raw reply related
* Re: [Suggestion] ISDN: isdnloop: C grammar issue, '}' miss match 'if' and 'switch' statement.
From: Joe Perches @ 2013-04-04 14:42 UTC (permalink / raw)
To: Chen Gang
Cc: Michal Kubecek, fengguang.wu, isdn, Linus Torvalds, David Miller,
netdev
In-Reply-To: <515D4253.5040205@asianux.com>
On Thu, 2013-04-04 at 17:05 +0800, Chen Gang wrote:
> > As far as I can see, this rather comes from
> > commit 475be4d85a274d0961593db41cf85689db1d583c
[]
> Joe Perches only beautified the code, not change the contents.
I modified all those files using emacs
c-indent-line-or-region
which does a decent job in most cases
but emacs is easily confused.
^ permalink raw reply
* Re: this cpu documentation
From: Christoph Lameter @ 2013-04-04 14:41 UTC (permalink / raw)
To: Randy Dunlap; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev, Tejun Heo
In-Reply-To: <515F679E.8020203@infradead.org>
From: Christoph Lameter <cl@linux.com>
Subject: this_cpu: Add documentation V2
Document the rationale and the way to use this_cpu operations.
V2: Improved after feedback from Randy Dunlap
Signed-off-by: Christoph Lameter <cl@linux.com>
Index: linux/Documentation/this_cpu_ops
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux/Documentation/this_cpu_ops 2013-04-04 09:40:06.431946280 -0500
@@ -0,0 +1,197 @@
+this_cpu operations
+-------------------
+
+this_cpu operations are a way of optimizing access to per cpu variables
+associated with the *currently* executing processor
+through the use of segment registers (or a dedicated register where the cpu
+permanently stores the beginning of the per cpu area for a specific
+processor).
+
+The this_cpu operations add a per cpu variable offset to the processor
+specific percpu base and encode that operation in the instruction operating
+on the per cpu variable.
+
+This means that there are no atomicity issues between the calculation
+of the offset and the operation on the data. Therefore it is not necessary
+to disable preemption or interrupts to ensure that the processor is not changed
+between the calculation of the address and the operation on the data.
+
+Read-modify-write operations are of particular interest. Frequently
+processors have special lower latency instructions that can operate without
+the typical synchronization overhead but still provide some sort of relaxed
+atomicity guarantee. The x86 for example can execute RMW (Read-Modify-Write)
+instructions like inc/dec/cmpxchg without the lock prefix and the
+associated latency penalty.
+
+Access to the variable without the lock prefix is not synchronized but
+synchronization is not necessary since we are dealing with per cpu data
+specific to the currently executing processor. Only the current processor
+should be accessing that variable and therefore there are no concurrency
+issues with other processors in the system.
+
+On x86 the fs: or the gs: segment registers contain the base of the per cpu area. It is
+then possible to simply use the segment override to relocate a per cpu relative address
+to the proper per cpu area for the processor. So the relocation to the per cpu base
+is encoded in the instruction via a segment register prefix.
+
+For example:
+
+ DEFINE_PER_CPU(int, x);
+ int z;
+
+ z = this_cpu_read(x);
+
+results in a single instruction
+
+ mov ax, gs:[x]
+
+instead of a sequence that first calculates the address and then fetches from
+that address, as the plain percpu operations do. Before this_cpu_ops such a
+sequence also required disabling and re-enabling preemption to prevent the kernel
+from moving the thread to a different processor while the calculation is performed.
+
+
+The main use of the this_cpu operations has been to optimize counter operations.
+
+
+ this_cpu_inc(x)
+
+results in the following single instruction (no lock prefix!)
+
+ inc gs:[x]
+
+
+instead of the following operations required if there is no segment register.
+
+ int *y;
+ int cpu;
+
+ cpu = get_cpu();
+ y = per_cpu_ptr(&x, cpu);
+ (*y)++;
+ put_cpu();
+
+
+Note that these operations can only be used on percpu data that is reserved for
+a specific processor. Without disabling preemption in the surrounding code
+this_cpu_inc() will only guarantee that one of the percpu counters is correctly
+incremented. However, there is no guarantee that the OS will not move the process
+directly before or after the this_cpu instruction is executed. In general this
+means that the values of the individual counters for each processor are
+meaningless. The sum of all the per cpu counters is the only value that is of
+interest.
+
+Per cpu variables are used for performance reasons. They avoid bouncing cache
+lines when multiple processors concurrently go through the same code paths.
+Since each processor has its own per cpu variables no concurrent cacheline
+updates take place. The price that has to be paid for this optimization is
+the need to add up the per cpu counters when the value of the counter is
+needed.
+
+
+Special operations:
+-------------------
+
+ y = this_cpu_ptr(&x)
+
+Takes the offset of a per cpu variable (&x !) and returns the address of the
+per cpu variable that belongs to the currently executing processor.
+this_cpu_ptr avoids multiple steps that the common get_cpu/put_cpu sequence
+requires. No processor number is available. Instead the offset of the local
+per cpu area is simply added to the percpu offset.
+
+
+
+Per cpu variables and offsets
+-----------------------------
+
+Per cpu variables have *offsets* to the beginning of the percpu area. They do
+not have addresses although they look like addresses in the code. Offsets
+cannot be directly dereferenced. The offset must be added to a base pointer of
+a percpu area of a processor in order to form a valid address.
+
+Therefore the use of x or &x outside of the context of per cpu operations
+is invalid and will generally be treated like a NULL pointer dereference.
+
+In the context of per cpu operations
+
+ x is a per cpu variable. Most this_cpu operations take a per cpu variable.
+
+ &x is the *offset* of a per cpu variable. this_cpu_ptr() takes the offset
+ of a per cpu variable which makes this look a bit strange.
+
+
+
+Operations on a field of a per cpu structure
+--------------------------------------------
+
+Let's say we have a percpu structure
+
+ struct s {
+ int n,m;
+ };
+
+ DEFINE_PER_CPU(struct s, p);
+
+
+Operations on these fields are straightforward
+
+ this_cpu_inc(p.m);
+
+ z = this_cpu_cmpxchg(p.m, 0, 1);
+
+
+If we have an offset to struct s:
+
+ struct s __percpu *ps = &p;
+
+ this_cpu_dec(ps->m);
+
+ z = this_cpu_inc_return(ps->n);
+
+
+The calculation of the pointer may require the use of this_cpu_ptr() if we
+do not make use of this_cpu ops later to manipulate fields:
+
+ struct s *pp;
+
+ pp = this_cpu_ptr(&p);
+
+ pp->m--;
+
+ z = pp->n++;
+
+
+Variants of this_cpu ops
+-------------------------
+
+this_cpu ops are interrupt safe. Some architectures do not support these per
+cpu local operations. In that case the operation must be replaced by code
+that disables interrupts, performs the operation (which is then guaranteed
+to be atomic) and then re-enables interrupts. Doing so is expensive. If there
+are other reasons why the scheduler cannot change the processor we are
+executing on, then there is no reason to disable interrupts. For that purpose
+the __this_cpu operations are provided. For example:
+
+ __this_cpu_inc(x);
+
+Will increment x and will not fall back to code that disables interrupts on
+platforms that cannot accomplish atomicity through address relocation and
+a Read-Modify-Write operation in the same instruction.
+
+
+
+&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
+--------------------------------------------
+
+The first operation takes the offset and forms an address and then adds
+the offset of the n field.
+
+The second one first adds the two offsets and then does the relocation.
+IMHO the second form looks cleaner and has an easier time with (). The
+second form also is consistent with the way this_cpu_read() and friends
+are used.
+
+
+Christoph Lameter, April 3rd, 2013
+
^ permalink raw reply
* Re: [PERCPU] Remove & in front of this_cpu_ptr
From: Christoph Lameter @ 2013-04-04 15:02 UTC (permalink / raw)
To: Tejun Heo; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev
In-Reply-To: <20130404142526.GG9425@htj.dyndns.org>
On Thu, 4 Apr 2013, Tejun Heo wrote:
> I was thinking about this_cpu_*() ops. Let's make it as conspicuous
> as reasonably possible. It's a similar problem with declaring per-cpu
> arrays - there are a couple ways to do it and there's no way to
> automatically reject the one which isn't preferred. I don't know.
> Maybe all we can do is periodic sweep through the source tree and fix
> up the "wrong" ones.
Both ways are working just fine. I'd like to use more of these though and
would like to tighten things up a bit before doing sweeps through the
kernel.
^ permalink raw reply
* Re: [net-next.git 2/7] stmmac: review barriers
From: Eric Dumazet @ 2013-04-04 15:08 UTC (permalink / raw)
To: Giuseppe CAVALLARO
Cc: Shiraz HASHIM, netdev@vger.kernel.org, Deepak Sikri,
sergei.shtylyov
In-Reply-To: <515D1863.2070801@st.com>
On Thu, 2013-04-04 at 08:06 +0200, Giuseppe CAVALLARO wrote:
> In fact, if we can demonstrate that barriers are needed no problem to
> keep them in the code. Otherwise I prefer to remove them.
>
> What do you think?
I think they are needed, and it's really obvious.
Now maybe your arch can define wmb() as a pure compiler barrier(), but
that's a completely different patch.
^ permalink raw reply
* [RFC PATCH ipsec] xfrm: use the right dev to fill xdst
From: Nicolas Dichtel @ 2013-04-04 15:12 UTC (permalink / raw)
To: steffen.klassert, herbert, davem; +Cc: netdev, dbaluta, Nicolas Dichtel
Commit bc8e4b954e46 (xfrm6: ensure to use the same dev when building a bundle)
broke IPsec for IPv4 over IPv6 tunnels (because dev points to an IPv4 only
interface, hence in6_dev_get(dev) returns NULL).
After looking again into commit 25ee3286dcbc ([IPSEC]: Merge common code into
xfrm_bundle_create), it seems that previously we were using dev from the route,
for both IPv4 and IPv6.
In fact, xfrm_fill_dst() is called during a loop on chained dst, but dev points
always to the same device.
By analogy, I made the same change for IPv4 side (only IPv6 part is tested).
Reported-by: Daniel Baluta <dbaluta@ixiacom.com>
Tested-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
This patch is only an RFC, it needs more tests. Any comments/help are welcome to
understand if the patch does the right thing or if the bug is somewhere else.
If the patch is correct, I can also remove the argument dev from
xfrm[4|6]_fill_dst, because it will not be used anymore.
FYI, the initial thread for commit bc8e4b954e46 can be found here:
http://kerneltrap.org/mailarchive/linux-netdev/2010/4/15/6274817
net/ipv4/xfrm4_policy.c | 4 ++--
net/ipv6/xfrm6_policy.c | 6 +++---
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 9a459be..3cffae9 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -81,8 +81,8 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
xdst->u.rt.rt_iif = fl4->flowi4_iif;
- xdst->u.dst.dev = dev;
- dev_hold(dev);
+ xdst->u.dst.dev = rt->dst.dev;
+ dev_hold(rt->dst.dev);
/* Sheit... I remember I did this right. Apparently,
* it was magically lost, so this code needs audit */
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 4ef7bdb..680b890 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -99,10 +99,10 @@ static int xfrm6_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
{
struct rt6_info *rt = (struct rt6_info*)xdst->route;
- xdst->u.dst.dev = dev;
- dev_hold(dev);
+ xdst->u.dst.dev = rt->dst.dev;
+ dev_hold(rt->dst.dev);
- xdst->u.rt6.rt6i_idev = in6_dev_get(dev);
+ xdst->u.rt6.rt6i_idev = in6_dev_get(rt->dst.dev);
if (!xdst->u.rt6.rt6i_idev)
return -ENODEV;
--
1.8.0.1
^ permalink raw reply related
* Re: [PATCH net-next] bridge: remove a redundant synchronize_net()
From: Jiri Pirko @ 2013-04-04 15:35 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, vfalico, netdev, stephen
In-Reply-To: <1364920077.5113.185.camel@edumazet-glaptop>
Tue, Apr 02, 2013 at 06:27:57PM CEST, eric.dumazet@gmail.com wrote:
>On Tue, 2013-04-02 at 12:12 -0400, David Miller wrote:
>
>> Note that we have a few spots now that do two synchronize_net()'s per
>> operation, such as team port removal, and openvswitch has such a path
>> as well. They all are of the form:
>>
>> netdev_rx_handler_unregister()
>> ...
>> lots of other stuff
>> ...
>> synchronize_net();
>>
>> So might be harder to factor back out than this br_if.c case.
>>
>
>Strange, I do see call_rcu() in openvswitch, not a synchronize_{net|
>rcu}(). Probably OK to leave as is, as it's not a big deal.
>
>I'll let Jiri handle the team driver change, as its not clear what
>synchronize_rcu() call in team_port_del() is protecting
It can be converted now to call_rcu. synchronize_rcu is making sure
no packet is in flight when changing modes.
>
>
>
^ permalink raw reply
* Re: [PATCH 1/5] net: Add EMAC ethernet driver found on Allwinner A10 SoC's
From: Stefan Roese @ 2013-04-04 15:37 UTC (permalink / raw)
To: Florian Fainelli
Cc: linux-arm-kernel, Maxime Ripard, linux-doc, Alejandro Mery,
netdev, devicetree-discuss, linux-kernel, Rob Herring,
Grant Likely, Rob Landley, sunny, shuge, kevin
In-Reply-To: <201303242003.52827.florian@openwrt.org>
Hi Florian,
On 24.03.2013 20:03, Florian Fainelli wrote:
> Your phylib implementation looks good now, just some minor comments below:
Thanks for the review. I'll try to address your new comments in a few
days (currently swamped).
Thanks,
Stefan
^ permalink raw reply
* Re: [PATCH net-next] bridge: remove a redundant synchronize_net()
From: Eric Dumazet @ 2013-04-04 15:44 UTC (permalink / raw)
To: Jiri Pirko; +Cc: David Miller, vfalico, netdev, stephen
In-Reply-To: <20130404153530.GA1688@minipsycho.brq.redhat.com>
On Thu, 2013-04-04 at 17:35 +0200, Jiri Pirko wrote:
> It can be converted now to call_rcu. synchronize_rcu is making sure
> no packet is in flight when changing modes.
What changes exactly? You don't really answer my question with this
very vague sentence.
Because maybe the synchronize_net() in netdev_rx_handler_unregister()
is enough and you don't even need the call_rcu(). That was my question.
RCU barriers are not magical things we add when we are not exactly sure
of what is happening.
Like other barriers (wmb(), smp_wmb(), ...) we should document or
understand why they are needed.
^ permalink raw reply
* Re: [PATCH net-next 1/2] net: ipv6: add tokenized interface identifier support
From: Hannes Frederic Sowa @ 2013-04-04 15:58 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: davem, netdev, YOSHIFUJI Hideaki
In-Reply-To: <1365086258-4512-2-git-send-email-dborkman@redhat.com>
On Thu, Apr 04, 2013 at 04:37:37PM +0200, Daniel Borkmann wrote:
> This patch adds support for tokenized IIDs, that allow for
> administrators to assign well-known host-part addresses to
> nodes whilst still obtaining global network prefix from
> Router Advertisements. It is currently in IETF RFC draft
> status [1]:
>
> The primary target for such support is server platforms
> where addresses are usually manually configured, rather
> than using DHCPv6 or SLAAC. By using tokenised identifiers,
> hosts can still determine their network prefix by use of
> SLAAC, but more readily be automatically renumbered should
> their network prefix change.
>
> [1] http://tools.ietf.org/html/draft-chown-6man-tokenised-ipv6-identifiers-02
>
> The implementation is partially based on top of Mark K.
> Thompson's proof of concept. Successfully tested by myself.
Cool, this looks really useful.
One comment so far:
> +#define IPV6_ADDR_SCOPE_TYPE(scope) ((scope) << 16)
> +
I think we should not export this macro but instead...
> + /* Well, that's kinda nasty ... */
> + list_for_each_entry(ifp, &idev->addr_list, if_list) {
> + spin_lock(&ifp->lock);
> + if (__ipv6_addr_type(&ifp->addr) &
> + IPV6_ADDR_SCOPE_TYPE(IPV6_ADDR_SCOPE_GLOBAL)) {
...use
if (ipv6_addr_src_scope(&ifp->addr) == IPV6_ADDR_SCOPE_GLOBAL) {
here.
> diff --git a/net/ipv6/addrconf_core.c b/net/ipv6/addrconf_core.c
> index d051e5f..8b723de 100644
> --- a/net/ipv6/addrconf_core.c
> +++ b/net/ipv6/addrconf_core.c
> @@ -6,8 +6,6 @@
> #include <linux/export.h>
> #include <net/ipv6.h>
>
> -#define IPV6_ADDR_SCOPE_TYPE(scope) ((scope) << 16)
> -
This hunk can be dropped then.
Thanks,
Hannes
^ permalink raw reply
* Re: [PATCH net-next 3/3] net/mlx4_en: Enable open-lldp DCB support
From: John Fastabend @ 2013-04-04 15:58 UTC (permalink / raw)
To: Or Gerlitz, Sagi Grimberg; +Cc: davem, netdev, amirv
In-Reply-To: <1365085574-12057-4-git-send-email-ogerlitz@mellanox.com>
On 4/4/2013 7:26 AM, Or Gerlitz wrote:
> From: Sagi Grimberg <sagig@mellanox.com>
>
> The lldpad daemon queries the driver caps via the getcaps and getstate
> routines. Added the proper dcbnl_ops entries to support that.
>
> Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
> ---
Does lldpad work now with the mlx4_en driver?
Reviewed-by: John Fastabend <john.r.fastabend@intel.com>
^ permalink raw reply
* [PATCH] lib80211: make lib80211 can be enabled independently
From: Wang YanQing @ 2013-04-04 16:01 UTC (permalink / raw)
To: johannes; +Cc: linux-wireless, netdev, linux-kernel
Currently we can only enable lib80211 by enabling an in-tree driver
that selects it, but some out-of-tree drivers also use it, so I think
it makes sense to allow lib80211 to be enabled independently.
An example of an out-of-tree driver that uses lib80211 is the
hybrid driver (wl) for Broadcom Corporation BCM43225 802.11b/g/n.
Signed-off-by: Wang YanQing <udknight@gmail.com>
---
net/wireless/Kconfig | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/net/wireless/Kconfig b/net/wireless/Kconfig
index 16d08b3..6e83f0a 100644
--- a/net/wireless/Kconfig
+++ b/net/wireless/Kconfig
@@ -140,22 +140,34 @@ config CFG80211_WEXT
extensions with cfg80211-based drivers.
config LIB80211
- tristate
+ tristate "common routines used by IEEE802.11 wireless LAN drivers"
default n
help
This options enables a library of common routines used
by IEEE802.11 wireless LAN drivers.
- Drivers should select this themselves if needed.
+ Drivers could select this themselves if needed.
config LIB80211_CRYPT_WEP
- tristate
+ tristate "host-based WEP encryption implementation for lib80211"
+ depends on LIB80211
+ default n
+ ---help---
+ host-based WEP encryption implementation for lib80211
config LIB80211_CRYPT_CCMP
- tristate
+ tristate "host-based CCMP encryption implementation for lib80211"
+ depends on LIB80211
+ default n
+ ---help---
+ host-based CCMP encryption implementation for lib80211
config LIB80211_CRYPT_TKIP
- tristate
+ tristate "host-based TKIP encryption implementation for lib80211"
+ depends on LIB80211
+ default n
+ ---help---
+ host-based TKIP encryption implementation for lib80211
config LIB80211_DEBUG
bool "lib80211 debugging messages"
--
1.7.12.4.dirty
^ permalink raw reply related
* Re: [PATCH net-next 1/2] net: ipv6: add tokenized interface identifier support
From: Daniel Borkmann @ 2013-04-04 16:02 UTC (permalink / raw)
To: hannes; +Cc: davem, netdev, YOSHIFUJI Hideaki
In-Reply-To: <20130404155837.GA23056@order.stressinduktion.org>
On 04/04/2013 05:58 PM, Hannes Frederic Sowa wrote:
> On Thu, Apr 04, 2013 at 04:37:37PM +0200, Daniel Borkmann wrote:
>> This patch adds support for tokenized IIDs, that allow for
>> administrators to assign well-known host-part addresses to
>> nodes whilst still obtaining global network prefix from
>> Router Advertisements. It is currently in IETF RFC draft
>> status [1]:
>>
>> The primary target for such support is server platforms
>> where addresses are usually manually configured, rather
>> than using DHCPv6 or SLAAC. By using tokenised identifiers,
>> hosts can still determine their network prefix by use of
>> SLAAC, but more readily be automatically renumbered should
>> their network prefix change.
>>
>> [1] http://tools.ietf.org/html/draft-chown-6man-tokenised-ipv6-identifiers-02
>>
>> The implementation is partially based on top of Mark K.
>> Thompson's proof of concept. Successfully tested by myself.
>
> Cool, this looks really useful.
>
> One comment so far:
>
>> +#define IPV6_ADDR_SCOPE_TYPE(scope) ((scope) << 16)
>> +
>
> I think we should not export this macro but instead...
>
>> + /* Well, that's kinda nasty ... */
>> + list_for_each_entry(ifp, &idev->addr_list, if_list) {
>> + spin_lock(&ifp->lock);
>> + if (__ipv6_addr_type(&ifp->addr) &
>> + IPV6_ADDR_SCOPE_TYPE(IPV6_ADDR_SCOPE_GLOBAL)) {
>
> ...use
>
> if (ipv6_addr_src_scope(&ifp->addr) == IPV6_ADDR_SCOPE_GLOBAL) {
>
> here.
>
>> diff --git a/net/ipv6/addrconf_core.c b/net/ipv6/addrconf_core.c
>> index d051e5f..8b723de 100644
>> --- a/net/ipv6/addrconf_core.c
>> +++ b/net/ipv6/addrconf_core.c
>> @@ -6,8 +6,6 @@
>> #include <linux/export.h>
>> #include <net/ipv6.h>
>>
>> -#define IPV6_ADDR_SCOPE_TYPE(scope) ((scope) << 16)
>> -
>
> This hunk can be dropped then.
Thanks for the review Hannes, I'll do that in a version 2 of the set.
Thanks,
Daniel
^ permalink raw reply
* Re: [PATCH net-next] bridge: remove a redundant synchronize_net()
From: Eric Dumazet @ 2013-04-04 16:03 UTC (permalink / raw)
To: Jiri Pirko; +Cc: David Miller, vfalico, netdev, stephen
In-Reply-To: <1365090287.3308.3.camel@edumazet-glaptop>
On Thu, 2013-04-04 at 08:44 -0700, Eric Dumazet wrote:
> Because maybe the synchronize_net() in netdev_rx_handler_unregister()
> is enough and you dont even need the call_rcu(). Thats was my question.
So we have the following sequence in team_port_del()
netdev_rx_handler_unregister(port_dev);
netdev_upper_dev_unlink(port_dev, dev);
team_port_disable_netpoll(port);
vlan_vids_del_by_dev(port_dev, dev);
dev_uc_unsync(port_dev, dev);
dev_mc_unsync(port_dev, dev);
dev_close(port_dev);
team_port_leave(team, port);
__team_option_inst_mark_removed_port(team, port);
__team_options_change_check(team);
__team_option_inst_del_port(team, port);
__team_port_change_port_removed(port);
team_port_set_orig_dev_addr(port);
dev_set_mtu(port_dev, port->orig.mtu);
synchronize_rcu();
kfree(port);
And I suspect we can remove the synchronize_rcu() call.
But as this is a long list of operations, maybe some of them require
the rcu grace period before kfree(port).
^ permalink raw reply