Linux userland API discussions
 help / color / mirror / Atom feed
* [PATCH net-next v2 07/17] net: ipvlan: use __ethtool_get_ksettings
From: David Decotigny @ 2015-02-04 19:53 UTC (permalink / raw)
  To: David S. Miller, Ben Hutchings, Amir Vadai, linux-kernel, netdev,
	linux-api, linux-mips, fcoe-devel
  Cc: Eric Dumazet, Eugenia Emantayev, Or Gerlitz, Ido Shamay,
	Joe Perches, Saeed Mahameed, Govindarajulu Varadarajan,
	Venkata Duvvuru, Jeff Kirsher, Eyal Perry, Pravin B Shelar,
	Ed Swierk, Upinder Malhi, Robert Love, James E.J. Bottomley,
	David Decotigny
In-Reply-To: <1423079621-1374-1-git-send-email-ddecotig@gmail.com>

From: David Decotigny <decot@googlers.com>

Signed-off-by: David Decotigny <decot@googlers.com>
---
 drivers/net/ipvlan/ipvlan_main.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 4f4099d..79e3516 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -342,12 +342,12 @@ static const struct header_ops ipvlan_header_ops = {
 	.cache_update	= eth_header_cache_update,
 };
 
-static int ipvlan_ethtool_get_settings(struct net_device *dev,
-				       struct ethtool_cmd *cmd)
+static int ipvlan_ethtool_get_ksettings(struct net_device *dev,
+					struct ethtool_ksettings *cmd)
 {
 	const struct ipvl_dev *ipvlan = netdev_priv(dev);
 
-	return __ethtool_get_settings(ipvlan->phy_dev, cmd);
+	return __ethtool_get_ksettings(ipvlan->phy_dev, cmd);
 }
 
 static void ipvlan_ethtool_get_drvinfo(struct net_device *dev,
@@ -373,7 +373,7 @@ static void ipvlan_ethtool_set_msglevel(struct net_device *dev, u32 value)
 
 static const struct ethtool_ops ipvlan_ethtool_ops = {
 	.get_link	= ethtool_op_get_link,
-	.get_settings	= ipvlan_ethtool_get_settings,
+	.get_ksettings	= ipvlan_ethtool_get_ksettings,
 	.get_drvinfo	= ipvlan_ethtool_get_drvinfo,
 	.get_msglevel	= ipvlan_ethtool_get_msglevel,
 	.set_msglevel	= ipvlan_ethtool_set_msglevel,
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related

* [PATCH net-next v2 08/17] net: macvlan: use __ethtool_get_ksettings
From: David Decotigny @ 2015-02-04 19:53 UTC (permalink / raw)
  To: David S. Miller, Ben Hutchings, Amir Vadai, linux-kernel, netdev,
	linux-api, linux-mips, fcoe-devel
  Cc: Eric Dumazet, Eugenia Emantayev, Or Gerlitz, Ido Shamay,
	Joe Perches, Saeed Mahameed, Govindarajulu Varadarajan,
	Venkata Duvvuru, Jeff Kirsher, Eyal Perry, Pravin B Shelar,
	Ed Swierk, Upinder Malhi, Robert Love, James E.J. Bottomley,
	David Decotigny
In-Reply-To: <1423079621-1374-1-git-send-email-ddecotig@gmail.com>

From: David Decotigny <decot@googlers.com>

Signed-off-by: David Decotigny <decot@googlers.com>
---
 drivers/net/macvlan.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 1df38bd..ce934c3 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -925,12 +925,12 @@ static void macvlan_ethtool_get_drvinfo(struct net_device *dev,
 	strlcpy(drvinfo->version, "0.1", sizeof(drvinfo->version));
 }
 
-static int macvlan_ethtool_get_settings(struct net_device *dev,
-					struct ethtool_cmd *cmd)
+static int macvlan_ethtool_get_ksettings(struct net_device *dev,
+					 struct ethtool_ksettings *cmd)
 {
 	const struct macvlan_dev *vlan = netdev_priv(dev);
 
-	return __ethtool_get_settings(vlan->lowerdev, cmd);
+	return __ethtool_get_ksettings(vlan->lowerdev, cmd);
 }
 
 static netdev_features_t macvlan_fix_features(struct net_device *dev,
@@ -998,7 +998,7 @@ static void macvlan_dev_netpoll_cleanup(struct net_device *dev)
 
 static const struct ethtool_ops macvlan_ethtool_ops = {
 	.get_link		= ethtool_op_get_link,
-	.get_settings		= macvlan_ethtool_get_settings,
+	.get_ksettings		= macvlan_ethtool_get_ksettings,
 	.get_drvinfo		= macvlan_ethtool_get_drvinfo,
 };
 
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related

* [PATCH net-next v2 09/17] net: team: use __ethtool_get_ksettings
From: David Decotigny @ 2015-02-04 19:53 UTC (permalink / raw)
  To: David S. Miller, Ben Hutchings, Amir Vadai, linux-kernel, netdev,
	linux-api, linux-mips, fcoe-devel
  Cc: Eric Dumazet, Eugenia Emantayev, Or Gerlitz, Ido Shamay,
	Joe Perches, Saeed Mahameed, Govindarajulu Varadarajan,
	Venkata Duvvuru, Jeff Kirsher, Eyal Perry, Pravin B Shelar,
	Ed Swierk, Upinder Malhi, Robert Love, James E.J. Bottomley,
	David Decotigny
In-Reply-To: <1423079621-1374-1-git-send-email-ddecotig@gmail.com>

From: David Decotigny <decot@googlers.com>

Signed-off-by: David Decotigny <decot@googlers.com>
---
 drivers/net/team/team.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 0e62274..85196c5 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -2770,12 +2770,12 @@ static void __team_port_change_send(struct team_port *port, bool linkup)
 	port->state.linkup = linkup;
 	team_refresh_port_linkup(port);
 	if (linkup) {
-		struct ethtool_cmd ecmd;
+		struct ethtool_ksettings ecmd;
 
-		err = __ethtool_get_settings(port->dev, &ecmd);
+		err = __ethtool_get_ksettings(port->dev, &ecmd);
 		if (!err) {
-			port->state.speed = ethtool_cmd_speed(&ecmd);
-			port->state.duplex = ecmd.duplex;
+			port->state.speed = ecmd.parent.speed;
+			port->state.duplex = ecmd.parent.duplex;
 			goto send_event;
 		}
 	}
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related

* [PATCH net-next v2 10/17] net: fcoe: use __ethtool_get_ksettings
From: David Decotigny @ 2015-02-04 19:53 UTC (permalink / raw)
  To: David S. Miller, Ben Hutchings, Amir Vadai, linux-kernel, netdev,
	linux-api, linux-mips, fcoe-devel
  Cc: Eric Dumazet, Eugenia Emantayev, Or Gerlitz, Ido Shamay,
	Joe Perches, Saeed Mahameed, Govindarajulu Varadarajan,
	Venkata Duvvuru, Jeff Kirsher, Eyal Perry, Pravin B Shelar,
	Ed Swierk, Upinder Malhi, Robert Love, James E.J. Bottomley,
	David Decotigny
In-Reply-To: <1423079621-1374-1-git-send-email-ddecotig@gmail.com>

From: David Decotigny <decot@googlers.com>

Signed-off-by: David Decotigny <decot@googlers.com>
---
 drivers/scsi/fcoe/fcoe_transport.c | 36 ++++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/fcoe/fcoe_transport.c b/drivers/scsi/fcoe/fcoe_transport.c
index bdc8989..6097f0d 100644
--- a/drivers/scsi/fcoe/fcoe_transport.c
+++ b/drivers/scsi/fcoe/fcoe_transport.c
@@ -93,36 +93,40 @@ static struct notifier_block libfcoe_notifier = {
 int fcoe_link_speed_update(struct fc_lport *lport)
 {
 	struct net_device *netdev = fcoe_get_netdev(lport);
-	struct ethtool_cmd ecmd;
+	struct ethtool_ksettings ecmd;
 
-	if (!__ethtool_get_settings(netdev, &ecmd)) {
+	if (!__ethtool_get_ksettings(netdev, &ecmd)) {
 		lport->link_supported_speeds &= ~(FC_PORTSPEED_1GBIT  |
 		                                  FC_PORTSPEED_10GBIT |
 		                                  FC_PORTSPEED_20GBIT |
 		                                  FC_PORTSPEED_40GBIT);
 
-		if (ecmd.supported & (SUPPORTED_1000baseT_Half |
-		                      SUPPORTED_1000baseT_Full |
-		                      SUPPORTED_1000baseKX_Full))
+		if (ecmd.link_modes.supported.mask[0] & (
+			    SUPPORTED_1000baseT_Half |
+			    SUPPORTED_1000baseT_Full |
+			    SUPPORTED_1000baseKX_Full))
 			lport->link_supported_speeds |= FC_PORTSPEED_1GBIT;
 
-		if (ecmd.supported & (SUPPORTED_10000baseT_Full   |
-		                      SUPPORTED_10000baseKX4_Full |
-		                      SUPPORTED_10000baseKR_Full  |
-		                      SUPPORTED_10000baseR_FEC))
+		if (ecmd.link_modes.supported.mask[0] & (
+			    SUPPORTED_10000baseT_Full   |
+			    SUPPORTED_10000baseKX4_Full |
+			    SUPPORTED_10000baseKR_Full  |
+			    SUPPORTED_10000baseR_FEC))
 			lport->link_supported_speeds |= FC_PORTSPEED_10GBIT;
 
-		if (ecmd.supported & (SUPPORTED_20000baseMLD2_Full |
-		                      SUPPORTED_20000baseKR2_Full))
+		if (ecmd.link_modes.supported.mask[0] & (
+			    SUPPORTED_20000baseMLD2_Full |
+			    SUPPORTED_20000baseKR2_Full))
 			lport->link_supported_speeds |= FC_PORTSPEED_20GBIT;
 
-		if (ecmd.supported & (SUPPORTED_40000baseKR4_Full |
-		                      SUPPORTED_40000baseCR4_Full |
-		                      SUPPORTED_40000baseSR4_Full |
-		                      SUPPORTED_40000baseLR4_Full))
+		if (ecmd.link_modes.supported.mask[0] & (
+			    SUPPORTED_40000baseKR4_Full |
+			    SUPPORTED_40000baseCR4_Full |
+			    SUPPORTED_40000baseSR4_Full |
+			    SUPPORTED_40000baseLR4_Full))
 			lport->link_supported_speeds |= FC_PORTSPEED_40GBIT;
 
-		switch (ethtool_cmd_speed(&ecmd)) {
+		switch (ecmd.parent.speed) {
 		case SPEED_1000:
 			lport->link_speed = FC_PORTSPEED_1GBIT;
 			break;
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related

* [PATCH net-next v2 11/17] net: rdma: use __ethtool_get_ksettings
From: David Decotigny @ 2015-02-04 19:53 UTC (permalink / raw)
  To: David S. Miller, Ben Hutchings, Amir Vadai, linux-kernel, netdev,
	linux-api, linux-mips, fcoe-devel
  Cc: Eric Dumazet, Eugenia Emantayev, Or Gerlitz, Ido Shamay,
	Joe Perches, Saeed Mahameed, Govindarajulu Varadarajan,
	Venkata Duvvuru, Jeff Kirsher, Eyal Perry, Pravin B Shelar,
	Ed Swierk, Upinder Malhi, Robert Love, James E.J. Bottomley,
	David Decotigny
In-Reply-To: <1423079621-1374-1-git-send-email-ddecotig@gmail.com>

From: David Decotigny <decot@googlers.com>

Signed-off-by: David Decotigny <decot@googlers.com>
---
 include/rdma/ib_addr.h | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h
index ce55906..782bb8c 100644
--- a/include/rdma/ib_addr.h
+++ b/include/rdma/ib_addr.h
@@ -241,24 +241,22 @@ static inline enum ib_mtu iboe_get_mtu(int mtu)
 
 static inline int iboe_get_rate(struct net_device *dev)
 {
-	struct ethtool_cmd cmd;
-	u32 speed;
+	struct ethtool_ksettings cmd;
 	int err;
 
 	rtnl_lock();
-	err = __ethtool_get_settings(dev, &cmd);
+	err = __ethtool_get_ksettings(dev, &cmd);
 	rtnl_unlock();
 	if (err)
 		return IB_RATE_PORT_CURRENT;
 
-	speed = ethtool_cmd_speed(&cmd);
-	if (speed >= 40000)
+	if (cmd.parent.speed >= 40000)
 		return IB_RATE_40_GBPS;
-	else if (speed >= 30000)
+	else if (cmd.parent.speed >= 30000)
 		return IB_RATE_30_GBPS;
-	else if (speed >= 20000)
+	else if (cmd.parent.speed >= 20000)
 		return IB_RATE_20_GBPS;
-	else if (speed >= 10000)
+	else if (cmd.parent.speed >= 10000)
 		return IB_RATE_10_GBPS;
 	else
 		return IB_RATE_PORT_CURRENT;
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related

* [PATCH net-next v2 12/17] net: 8021q: use __ethtool_get_ksettings
From: David Decotigny @ 2015-02-04 19:53 UTC (permalink / raw)
  To: David S. Miller, Ben Hutchings, Amir Vadai, linux-kernel, netdev,
	linux-api, linux-mips, fcoe-devel
  Cc: Eric Dumazet, Eugenia Emantayev, Or Gerlitz, Ido Shamay,
	Joe Perches, Saeed Mahameed, Govindarajulu Varadarajan,
	Venkata Duvvuru, Jeff Kirsher, Eyal Perry, Pravin B Shelar,
	Ed Swierk, Upinder Malhi, Robert Love, James E.J. Bottomley,
	David Decotigny
In-Reply-To: <1423079621-1374-1-git-send-email-ddecotig@gmail.com>

From: David Decotigny <decot@googlers.com>

Signed-off-by: David Decotigny <decot@googlers.com>
---
 net/8021q/vlan_dev.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 1189564..b301bf6 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -655,12 +655,12 @@ static netdev_features_t vlan_dev_fix_features(struct net_device *dev,
 	return features;
 }
 
-static int vlan_ethtool_get_settings(struct net_device *dev,
-				     struct ethtool_cmd *cmd)
+static int vlan_ethtool_get_ksettings(struct net_device *dev,
+				      struct ethtool_ksettings *cmd)
 {
 	const struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
 
-	return __ethtool_get_settings(vlan->real_dev, cmd);
+	return __ethtool_get_ksettings(vlan->real_dev, cmd);
 }
 
 static void vlan_ethtool_get_drvinfo(struct net_device *dev,
@@ -768,7 +768,7 @@ static void vlan_dev_netpoll_cleanup(struct net_device *dev)
 #endif /* CONFIG_NET_POLL_CONTROLLER */
 
 static const struct ethtool_ops vlan_ethtool_ops = {
-	.get_settings	        = vlan_ethtool_get_settings,
+	.get_ksettings	        = vlan_ethtool_get_ksettings,
 	.get_drvinfo	        = vlan_ethtool_get_drvinfo,
 	.get_link		= ethtool_op_get_link,
 	.get_ts_info		= vlan_ethtool_get_ts_info,
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related

* [PATCH net-next v2 13/17] net: bridge: use __ethtool_get_ksettings
From: David Decotigny @ 2015-02-04 19:53 UTC (permalink / raw)
  To: David S. Miller, Ben Hutchings, Amir Vadai, linux-kernel, netdev,
	linux-api, linux-mips, fcoe-devel
  Cc: Eric Dumazet, Eugenia Emantayev, Or Gerlitz, Ido Shamay,
	Joe Perches, Saeed Mahameed, Govindarajulu Varadarajan,
	Venkata Duvvuru, Jeff Kirsher, Eyal Perry, Pravin B Shelar,
	Ed Swierk, Upinder Malhi, Robert Love, James E.J. Bottomley,
	David Decotigny
In-Reply-To: <1423079621-1374-1-git-send-email-ddecotig@gmail.com>

From: David Decotigny <decot@googlers.com>

Signed-off-by: David Decotigny <decot@googlers.com>
---
 net/bridge/br_if.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index b087d27..57282a5 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -35,10 +35,10 @@
  */
 static int port_cost(struct net_device *dev)
 {
-	struct ethtool_cmd ecmd;
+	struct ethtool_ksettings ecmd;
 
-	if (!__ethtool_get_settings(dev, &ecmd)) {
-		switch (ethtool_cmd_speed(&ecmd)) {
+	if (!__ethtool_get_ksettings(dev, &ecmd)) {
+		switch (ecmd.parent.speed) {
 		case SPEED_10000:
 			return 2;
 		case SPEED_1000:
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related

* [PATCH net-next v2 14/17] net: core: use __ethtool_get_ksettings
From: David Decotigny @ 2015-02-04 19:53 UTC (permalink / raw)
  To: David S. Miller, Ben Hutchings, Amir Vadai,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-mips-6z/3iImG2C8G8FEW9MqTrA,
	fcoe-devel-s9riP+hp16TNLxjTenLetw
  Cc: Eric Dumazet, Eugenia Emantayev, Or Gerlitz, Ido Shamay,
	Joe Perches, Saeed Mahameed, Govindarajulu Varadarajan,
	Venkata Duvvuru, Jeff Kirsher, Eyal Perry, Pravin B Shelar,
	Ed Swierk, Upinder Malhi, Robert Love, James E.J. Bottomley,
	David Decotigny
In-Reply-To: <1423079621-1374-1-git-send-email-ddecotig-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

From: David Decotigny <decot-Ypc/8FJVVoBWk0Htik3J/w@public.gmane.org>

Signed-off-by: David Decotigny <decot-Ypc/8FJVVoBWk0Htik3J/w@public.gmane.org>
---
 net/core/net-sysfs.c   | 15 +++++++++------
 net/packet/af_packet.c | 11 +++++------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 9993412..f54ae21 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -191,9 +191,10 @@ static ssize_t speed_show(struct device *dev,
 		return restart_syscall();
 
 	if (netif_running(netdev)) {
-		struct ethtool_cmd cmd;
-		if (!__ethtool_get_settings(netdev, &cmd))
-			ret = sprintf(buf, fmt_udec, ethtool_cmd_speed(&cmd));
+		struct ethtool_ksettings cmd;
+
+		if (!__ethtool_get_ksettings(netdev, &cmd))
+			ret = sprintf(buf, fmt_udec, cmd.parent.speed);
 	}
 	rtnl_unlock();
 	return ret;
@@ -210,10 +211,12 @@ static ssize_t duplex_show(struct device *dev,
 		return restart_syscall();
 
 	if (netif_running(netdev)) {
-		struct ethtool_cmd cmd;
-		if (!__ethtool_get_settings(netdev, &cmd)) {
+		struct ethtool_ksettings cmd;
+
+		if (!__ethtool_get_ksettings(netdev, &cmd)) {
 			const char *duplex;
-			switch (cmd.duplex) {
+
+			switch (cmd.parent.duplex) {
 			case DUPLEX_HALF:
 				duplex = "half";
 				break;
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 9c28cec..fe2fcbe 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -554,9 +554,8 @@ static int prb_calc_retire_blk_tmo(struct packet_sock *po,
 {
 	struct net_device *dev;
 	unsigned int mbits = 0, msec = 0, div = 0, tmo = 0;
-	struct ethtool_cmd ecmd;
+	struct ethtool_ksettings ecmd;
 	int err;
-	u32 speed;
 
 	rtnl_lock();
 	dev = __dev_get_by_index(sock_net(&po->sk), po->ifindex);
@@ -564,19 +563,19 @@ static int prb_calc_retire_blk_tmo(struct packet_sock *po,
 		rtnl_unlock();
 		return DEFAULT_PRB_RETIRE_TOV;
 	}
-	err = __ethtool_get_settings(dev, &ecmd);
-	speed = ethtool_cmd_speed(&ecmd);
+	err = __ethtool_get_ksettings(dev, &ecmd);
 	rtnl_unlock();
 	if (!err) {
 		/*
 		 * If the link speed is so slow you don't really
 		 * need to worry about perf anyways
 		 */
-		if (speed < SPEED_1000 || speed == SPEED_UNKNOWN) {
+		if (ecmd.parent.speed < SPEED_1000 ||
+		    ecmd.parent.speed == SPEED_UNKNOWN) {
 			return DEFAULT_PRB_RETIRE_TOV;
 		} else {
 			msec = 1;
-			div = speed / 1000;
+			div = ecmd.parent.speed / 1000;
 		}
 	}
 
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related

* [PATCH net-next v2 15/17] net: ethtool: remove unused __ethtool_get_settings
From: David Decotigny @ 2015-02-04 19:53 UTC (permalink / raw)
  To: David S. Miller, Ben Hutchings, Amir Vadai,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-mips-6z/3iImG2C8G8FEW9MqTrA,
	fcoe-devel-s9riP+hp16TNLxjTenLetw
  Cc: Eric Dumazet, Eugenia Emantayev, Or Gerlitz, Ido Shamay,
	Joe Perches, Saeed Mahameed, Govindarajulu Varadarajan,
	Venkata Duvvuru, Jeff Kirsher, Eyal Perry, Pravin B Shelar,
	Ed Swierk, Upinder Malhi, Robert Love, James E.J. Bottomley,
	David Decotigny
In-Reply-To: <1423079621-1374-1-git-send-email-ddecotig-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

From: David Decotigny <decot-Ypc/8FJVVoBWk0Htik3J/w@public.gmane.org>

replaced by __ethtool_get_ksettings.

Signed-off-by: David Decotigny <decot-Ypc/8FJVVoBWk0Htik3J/w@public.gmane.org>
---
 include/linux/ethtool.h |  4 ----
 net/core/ethtool.c      | 49 ++++++++++++++-----------------------------------
 2 files changed, 14 insertions(+), 39 deletions(-)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 49881b6..d9c8fab 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -160,10 +160,6 @@ __ethtool_add_link_modes(ethtool_link_mode_mask_t *dst,
 extern int __ethtool_get_ksettings(struct net_device *dev,
 				   struct ethtool_ksettings *ksettings);
 
-/* DEPRECATED, use __ethtool_get_ksettings */
-extern int __ethtool_get_settings(struct net_device *dev,
-				  struct ethtool_cmd *cmd);
-
 /**
  * struct ethtool_ops - optional netdev operations
  * @get_settings: DEPRECATED, use %get_ksettings/%set_ksettings
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 11cb1c9..2b432ef 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -487,15 +487,16 @@ int __ethtool_get_ksettings(struct net_device *dev,
 		return dev->ethtool_ops->get_ksettings(dev, ksettings);
 	}
 
-	/* TODO: remove what follows when ethtool_ops::get_settings
-	 * disappears internally
-	 */
-
 	/* driver doesn't support %ethtool_ksettings API. revert to
 	 * legacy %ethtool_cmd API, unless it's not supported either.
 	 * TODO: remove when ethtool_ops::get_settings disappears internally
 	 */
-	err = __ethtool_get_settings(dev, &cmd);
+	if (!dev->ethtool_ops->get_settings)
+		return -EOPNOTSUPP;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.cmd = ETHTOOL_GSET;
+	err = dev->ethtool_ops->get_settings(dev, &cmd);
 	if (err < 0)
 		return err;
 
@@ -711,30 +712,6 @@ static int ethtool_set_ksettings(struct net_device *dev, void __user *useraddr)
 	return dev->ethtool_ops->set_ksettings(dev, &ksettings);
 }
 
-/* Internal kernel helper to query a device ethtool_cmd settings.
- *
- * Note about transition to ethtool_settings API: We do not need (or
- * want) this function to support "dev" instances that implement the
- * ethtool_settings API as we will update the drivers calling this
- * function to call __ethtool_get_ksettings instead, before the first
- * drivers implement ethtool_ops::get_ksettings.
- *
- * TODO 1: at least make this function static when no driver is using it
- * TODO 2: remove when ethtool_ops::get_settings disappears internally
- */
-int __ethtool_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
-{
-	ASSERT_RTNL();
-
-	if (!dev->ethtool_ops->get_settings)
-		return -EOPNOTSUPP;
-
-	memset(cmd, 0, sizeof(struct ethtool_cmd));
-	cmd->cmd = ETHTOOL_GSET;
-	return dev->ethtool_ops->get_settings(dev, cmd);
-}
-EXPORT_SYMBOL(__ethtool_get_settings);
-
 /* Query device for its ethtool_cmd settings.
  *
  * Backward compatibility note: for compatibility with legacy ethtool,
@@ -776,16 +753,18 @@ static int ethtool_get_settings(struct net_device *dev, void __user *useraddr)
 		/* send a sensible cmd tag back to user */
 		cmd.cmd = ETHTOOL_GSET;
 	} else {
-		int err;
-		/* TODO: return -EOPNOTSUPP when
-		 * ethtool_ops::get_settings disappears internally
-		 */
-
 		/* driver doesn't support %ethtool_ksettings
 		 * API. revert to legacy %ethtool_cmd API, unless it's
 		 * not supported either.
 		 */
-		err = __ethtool_get_settings(dev, &cmd);
+		int err;
+
+		if (!dev->ethtool_ops->get_settings)
+			return -EOPNOTSUPP;
+
+		memset(&cmd, 0, sizeof(cmd));
+		cmd.cmd = ETHTOOL_GSET;
+		err = dev->ethtool_ops->get_settings(dev, &cmd);
 		if (err < 0)
 			return err;
 	}
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related

* [PATCH net-next v2 16/17] net: mlx4: identify predicate for debug messages
From: David Decotigny @ 2015-02-04 19:53 UTC (permalink / raw)
  To: David S. Miller, Ben Hutchings, Amir Vadai, linux-kernel, netdev,
	linux-api, linux-mips, fcoe-devel
  Cc: Eric Dumazet, Eugenia Emantayev, Or Gerlitz, Ido Shamay,
	Joe Perches, Saeed Mahameed, Govindarajulu Varadarajan,
	Venkata Duvvuru, Jeff Kirsher, Eyal Perry, Pravin B Shelar,
	Ed Swierk, Upinder Malhi, Robert Love, James E.J. Bottomley,
	David Decotigny
In-Reply-To: <1423079621-1374-1-git-send-email-ddecotig@gmail.com>

From: David Decotigny <decot@googlers.com>

Signed-off-by: David Decotigny <decot@googlers.com>
---
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 944a112..96bf728 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -869,9 +869,11 @@ __printf(3, 4)
 void en_print(const char *level, const struct mlx4_en_priv *priv,
 	      const char *format, ...);
 
+#define en_dbg_enabled(mlevel, priv)		\
+	(NETIF_MSG_##mlevel & (priv)->msg_enable)
 #define en_dbg(mlevel, priv, format, ...)				\
 do {									\
-	if (NETIF_MSG_##mlevel & (priv)->msg_enable)			\
+	if (en_dbg_enabled(mlevel, priv))				\
 		en_print(KERN_DEBUG, priv, format, ##__VA_ARGS__);	\
 } while (0)
 #define en_warn(priv, format, ...)					\
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related

* [PATCH net-next v2 17/17] net: mlx4: use new ETHTOOL_G/SSETTINGS API
From: David Decotigny @ 2015-02-04 19:53 UTC (permalink / raw)
  To: David S. Miller, Ben Hutchings, Amir Vadai,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-mips-6z/3iImG2C8G8FEW9MqTrA,
	fcoe-devel-s9riP+hp16TNLxjTenLetw
  Cc: Eric Dumazet, Eugenia Emantayev, Or Gerlitz, Ido Shamay,
	Joe Perches, Saeed Mahameed, Govindarajulu Varadarajan,
	Venkata Duvvuru, Jeff Kirsher, Eyal Perry, Pravin B Shelar,
	Ed Swierk, Upinder Malhi, Robert Love, James E.J. Bottomley,
	David Decotigny
In-Reply-To: <1423079621-1374-1-git-send-email-ddecotig-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

From: David Decotigny <decot-Ypc/8FJVVoBWk0Htik3J/w@public.gmane.org>

Signed-off-by: David Decotigny <decot-Ypc/8FJVVoBWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 328 ++++++++++++------------
 drivers/net/ethernet/mellanox/mlx4/en_main.c    |   1 +
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h    |   1 +
 3 files changed, 162 insertions(+), 168 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index a7b58ba..1437695 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -388,34 +388,30 @@ static u32 mlx4_en_autoneg_get(struct net_device *dev)
 	return autoneg;
 }
 
-static u32 ptys_get_supported_port(struct mlx4_ptys_reg *ptys_reg)
+static void ptys2ethtool_update_supported_port(ethtool_link_mode_mask_t *mask,
+					       struct mlx4_ptys_reg *ptys_reg)
 {
 	u32 eth_proto = be32_to_cpu(ptys_reg->eth_proto_cap);
 
 	if (eth_proto & (MLX4_PROT_MASK(MLX4_10GBASE_T)
 			 | MLX4_PROT_MASK(MLX4_1000BASE_T)
 			 | MLX4_PROT_MASK(MLX4_100BASE_TX))) {
-			return SUPPORTED_TP;
-	}
-
-	if (eth_proto & (MLX4_PROT_MASK(MLX4_10GBASE_CR)
+		ethtool_add_link_modes(mask, ETHTOOL_LINK_MODE_TP_BIT);
+	} else if (eth_proto & (MLX4_PROT_MASK(MLX4_10GBASE_CR)
 			 | MLX4_PROT_MASK(MLX4_10GBASE_SR)
 			 | MLX4_PROT_MASK(MLX4_56GBASE_SR4)
 			 | MLX4_PROT_MASK(MLX4_40GBASE_CR4)
 			 | MLX4_PROT_MASK(MLX4_40GBASE_SR4)
 			 | MLX4_PROT_MASK(MLX4_1000BASE_CX_SGMII))) {
-			return SUPPORTED_FIBRE;
-	}
-
-	if (eth_proto & (MLX4_PROT_MASK(MLX4_56GBASE_KR4)
+		ethtool_add_link_modes(mask, ETHTOOL_LINK_MODE_FIBRE_BIT);
+	} else if (eth_proto & (MLX4_PROT_MASK(MLX4_56GBASE_KR4)
 			 | MLX4_PROT_MASK(MLX4_40GBASE_KR4)
 			 | MLX4_PROT_MASK(MLX4_20GBASE_KR2)
 			 | MLX4_PROT_MASK(MLX4_10GBASE_KR)
 			 | MLX4_PROT_MASK(MLX4_10GBASE_KX4)
 			 | MLX4_PROT_MASK(MLX4_1000BASE_KX))) {
-			return SUPPORTED_Backplane;
+		ethtool_add_link_modes(mask, ETHTOOL_LINK_MODE_Backplane_BIT);
 	}
-	return 0;
 }
 
 static u32 ptys_get_active_port(struct mlx4_ptys_reg *ptys_reg)
@@ -461,122 +457,91 @@ static u32 ptys_get_active_port(struct mlx4_ptys_reg *ptys_reg)
 enum ethtool_report {
 	SUPPORTED = 0,
 	ADVERTISED = 1,
-	SPEED = 2
 };
 
+struct ptys2ethtool_config {
+	ethtool_link_mode_mask_t link_modes[2];  /* SUPPORTED/ADVERTISED */
+	u32 speed;
+};
+
+#define MLX4_BUILD_PTYS2ETHTOOL_CONFIG(reg_, speed_, ...)		\
+	({								\
+		struct ptys2ethtool_config *cfg;			\
+		cfg = &ptys2ethtool_map[reg_];				\
+		cfg->speed = speed_;					\
+		ethtool_build_link_mode(&cfg->link_modes[SUPPORTED],	\
+					__VA_ARGS__);			\
+		ethtool_build_link_mode(&cfg->link_modes[ADVERTISED],	\
+					__VA_ARGS__);			\
+	})
+
 /* Translates mlx4 link mode to equivalent ethtool Link modes/speed */
-static u32 ptys2ethtool_map[MLX4_LINK_MODES_SZ][3] = {
-	[MLX4_100BASE_TX] = {
-		SUPPORTED_100baseT_Full,
-		ADVERTISED_100baseT_Full,
-		SPEED_100
-		},
-
-	[MLX4_1000BASE_T] = {
-		SUPPORTED_1000baseT_Full,
-		ADVERTISED_1000baseT_Full,
-		SPEED_1000
-		},
-	[MLX4_1000BASE_CX_SGMII] = {
-		SUPPORTED_1000baseKX_Full,
-		ADVERTISED_1000baseKX_Full,
-		SPEED_1000
-		},
-	[MLX4_1000BASE_KX] = {
-		SUPPORTED_1000baseKX_Full,
-		ADVERTISED_1000baseKX_Full,
-		SPEED_1000
-		},
-
-	[MLX4_10GBASE_T] = {
-		SUPPORTED_10000baseT_Full,
-		ADVERTISED_10000baseT_Full,
-		SPEED_10000
-		},
-	[MLX4_10GBASE_CX4] = {
-		SUPPORTED_10000baseKX4_Full,
-		ADVERTISED_10000baseKX4_Full,
-		SPEED_10000
-		},
-	[MLX4_10GBASE_KX4] = {
-		SUPPORTED_10000baseKX4_Full,
-		ADVERTISED_10000baseKX4_Full,
-		SPEED_10000
-		},
-	[MLX4_10GBASE_KR] = {
-		SUPPORTED_10000baseKR_Full,
-		ADVERTISED_10000baseKR_Full,
-		SPEED_10000
-		},
-	[MLX4_10GBASE_CR] = {
-		SUPPORTED_10000baseKR_Full,
-		ADVERTISED_10000baseKR_Full,
-		SPEED_10000
-		},
-	[MLX4_10GBASE_SR] = {
-		SUPPORTED_10000baseKR_Full,
-		ADVERTISED_10000baseKR_Full,
-		SPEED_10000
-		},
-
-	[MLX4_20GBASE_KR2] = {
-		SUPPORTED_20000baseMLD2_Full | SUPPORTED_20000baseKR2_Full,
-		ADVERTISED_20000baseMLD2_Full | ADVERTISED_20000baseKR2_Full,
-		SPEED_20000
-		},
-
-	[MLX4_40GBASE_CR4] = {
-		SUPPORTED_40000baseCR4_Full,
-		ADVERTISED_40000baseCR4_Full,
-		SPEED_40000
-		},
-	[MLX4_40GBASE_KR4] = {
-		SUPPORTED_40000baseKR4_Full,
-		ADVERTISED_40000baseKR4_Full,
-		SPEED_40000
-		},
-	[MLX4_40GBASE_SR4] = {
-		SUPPORTED_40000baseSR4_Full,
-		ADVERTISED_40000baseSR4_Full,
-		SPEED_40000
-		},
-
-	[MLX4_56GBASE_KR4] = {
-		SUPPORTED_56000baseKR4_Full,
-		ADVERTISED_56000baseKR4_Full,
-		SPEED_56000
-		},
-	[MLX4_56GBASE_CR4] = {
-		SUPPORTED_56000baseCR4_Full,
-		ADVERTISED_56000baseCR4_Full,
-		SPEED_56000
-		},
-	[MLX4_56GBASE_SR4] = {
-		SUPPORTED_56000baseSR4_Full,
-		ADVERTISED_56000baseSR4_Full,
-		SPEED_56000
-		},
+static struct ptys2ethtool_config ptys2ethtool_map[MLX4_LINK_MODES_SZ];
+
+void __init mlx4_en_init_ptys2ethtool_map(void)
+{
+	MLX4_BUILD_PTYS2ETHTOOL_CONFIG(MLX4_100BASE_TX, SPEED_100,
+				       ETHTOOL_LINK_MODE_100baseT_Full_BIT);
+	MLX4_BUILD_PTYS2ETHTOOL_CONFIG(MLX4_1000BASE_T, SPEED_1000,
+				       ETHTOOL_LINK_MODE_1000baseT_Full_BIT);
+	MLX4_BUILD_PTYS2ETHTOOL_CONFIG(MLX4_1000BASE_CX_SGMII, SPEED_1000,
+				       ETHTOOL_LINK_MODE_1000baseKX_Full_BIT);
+	MLX4_BUILD_PTYS2ETHTOOL_CONFIG(MLX4_1000BASE_KX, SPEED_1000,
+				       ETHTOOL_LINK_MODE_1000baseKX_Full_BIT);
+	MLX4_BUILD_PTYS2ETHTOOL_CONFIG(MLX4_10GBASE_T, SPEED_10000,
+				       ETHTOOL_LINK_MODE_10000baseT_Full_BIT);
+	MLX4_BUILD_PTYS2ETHTOOL_CONFIG(MLX4_10GBASE_CX4, SPEED_10000,
+				       ETHTOOL_LINK_MODE_10000baseKX4_Full_BIT);
+	MLX4_BUILD_PTYS2ETHTOOL_CONFIG(MLX4_10GBASE_KX4, SPEED_10000,
+				       ETHTOOL_LINK_MODE_10000baseKX4_Full_BIT);
+	MLX4_BUILD_PTYS2ETHTOOL_CONFIG(MLX4_10GBASE_KR, SPEED_10000,
+				       ETHTOOL_LINK_MODE_10000baseKR_Full_BIT);
+	MLX4_BUILD_PTYS2ETHTOOL_CONFIG(MLX4_10GBASE_CR, SPEED_10000,
+				       ETHTOOL_LINK_MODE_10000baseKR_Full_BIT);
+	MLX4_BUILD_PTYS2ETHTOOL_CONFIG(MLX4_10GBASE_SR, SPEED_10000,
+				       ETHTOOL_LINK_MODE_10000baseKR_Full_BIT);
+	MLX4_BUILD_PTYS2ETHTOOL_CONFIG(MLX4_20GBASE_KR2, SPEED_20000,
+				       ETHTOOL_LINK_MODE_20000baseMLD2_Full_BIT,
+				       ETHTOOL_LINK_MODE_20000baseKR2_Full_BIT);
+	MLX4_BUILD_PTYS2ETHTOOL_CONFIG(MLX4_40GBASE_CR4, SPEED_40000,
+				       ETHTOOL_LINK_MODE_40000baseCR4_Full_BIT);
+	MLX4_BUILD_PTYS2ETHTOOL_CONFIG(MLX4_40GBASE_KR4, SPEED_40000,
+				       ETHTOOL_LINK_MODE_40000baseKR4_Full_BIT);
+	MLX4_BUILD_PTYS2ETHTOOL_CONFIG(MLX4_40GBASE_SR4, SPEED_40000,
+				       ETHTOOL_LINK_MODE_40000baseSR4_Full_BIT);
+	MLX4_BUILD_PTYS2ETHTOOL_CONFIG(MLX4_56GBASE_KR4, SPEED_56000,
+				       ETHTOOL_LINK_MODE_56000baseKR4_Full_BIT);
+	MLX4_BUILD_PTYS2ETHTOOL_CONFIG(MLX4_56GBASE_CR4, SPEED_56000,
+				       ETHTOOL_LINK_MODE_56000baseCR4_Full_BIT);
+	MLX4_BUILD_PTYS2ETHTOOL_CONFIG(MLX4_56GBASE_SR4, SPEED_56000,
+				       ETHTOOL_LINK_MODE_56000baseSR4_Full_BIT);
 };
 
-static u32 ptys2ethtool_link_modes(u32 eth_proto, enum ethtool_report report)
+static void ptys2ethtool_update_link_modes(ethtool_link_mode_mask_t *link_modes,
+					   u32 eth_proto,
+					   enum ethtool_report report)
 {
 	int i;
-	u32 link_modes = 0;
-
 	for (i = 0; i < MLX4_LINK_MODES_SZ; i++) {
 		if (eth_proto & MLX4_PROT_MASK(i))
-			link_modes |= ptys2ethtool_map[i][report];
+			bitmap_or(link_modes->mask,
+				  link_modes->mask,
+				  ptys2ethtool_map[i].link_modes[report].mask,
+				  __ETHTOOL_LINK_MODE_MASK_NBITS);
 	}
-	return link_modes;
 }
 
-static u32 ethtool2ptys_link_modes(u32 link_modes, enum ethtool_report report)
+static u32 ethtool2ptys_link_modes(const ethtool_link_mode_mask_t *link_modes,
+				   enum ethtool_report report)
 {
 	int i;
 	u32 ptys_modes = 0;
 
 	for (i = 0; i < MLX4_LINK_MODES_SZ; i++) {
-		if (ptys2ethtool_map[i][report] & link_modes)
+		if (bitmap_intersects(
+			    ptys2ethtool_map[i].link_modes[report].mask,
+			    link_modes->mask,
+			    __ETHTOOL_LINK_MODE_MASK_NBITS))
 			ptys_modes |= 1 << i;
 	}
 	return ptys_modes;
@@ -589,14 +554,14 @@ static u32 speed2ptys_link_modes(u32 speed)
 	u32 ptys_modes = 0;
 
 	for (i = 0; i < MLX4_LINK_MODES_SZ; i++) {
-		if (ptys2ethtool_map[i][SPEED] == speed)
+		if (ptys2ethtool_map[i].speed == speed)
 			ptys_modes |= 1 << i;
 	}
 	return ptys_modes;
 }
 
-static int ethtool_get_ptys_settings(struct net_device *dev,
-				     struct ethtool_cmd *cmd)
+static int ethtool_get_ptys_ksettings(struct net_device *dev,
+				      struct ethtool_ksettings *ksettings)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	struct mlx4_ptys_reg ptys_reg;
@@ -624,79 +589,94 @@ static int ethtool_get_ptys_settings(struct net_device *dev,
 	en_dbg(DRV, priv, "ptys_reg.eth_proto_lp_adv %x\n",
 	       be32_to_cpu(ptys_reg.eth_proto_lp_adv));
 
-	cmd->supported = 0;
-	cmd->advertising = 0;
+	/* reset supported/advertising masks */
+	ethtool_build_link_mode(&ksettings->link_modes.supported);
+	ethtool_build_link_mode(&ksettings->link_modes.advertising);
 
-	cmd->supported |= ptys_get_supported_port(&ptys_reg);
+	ptys2ethtool_update_supported_port(&ksettings->link_modes.supported,
+					   &ptys_reg);
 
 	eth_proto = be32_to_cpu(ptys_reg.eth_proto_cap);
-	cmd->supported |= ptys2ethtool_link_modes(eth_proto, SUPPORTED);
+	ptys2ethtool_update_link_modes(&ksettings->link_modes.supported,
+				       eth_proto, SUPPORTED);
 
 	eth_proto = be32_to_cpu(ptys_reg.eth_proto_admin);
-	cmd->advertising |= ptys2ethtool_link_modes(eth_proto, ADVERTISED);
+	ptys2ethtool_update_link_modes(&ksettings->link_modes.advertising,
+				       eth_proto, ADVERTISED);
 
-	cmd->supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
-	cmd->advertising |= (priv->prof->tx_pause) ? ADVERTISED_Pause : 0;
+	ethtool_add_link_modes(&ksettings->link_modes.supported,
+			       ETHTOOL_LINK_MODE_Pause_BIT,
+			       ETHTOOL_LINK_MODE_Pause_BIT);
 
-	cmd->advertising |= (priv->prof->tx_pause ^ priv->prof->rx_pause) ?
-		ADVERTISED_Asym_Pause : 0;
+	if (priv->prof->tx_pause)
+		ethtool_add_link_modes(&ksettings->link_modes.advertising,
+				       ETHTOOL_LINK_MODE_Pause_BIT);
+	if (priv->prof->tx_pause ^ priv->prof->rx_pause)
+		ethtool_add_link_modes(&ksettings->link_modes.advertising,
+				       ETHTOOL_LINK_MODE_Asym_Pause_BIT);
 
-	cmd->port = ptys_get_active_port(&ptys_reg);
-	cmd->transceiver = (SUPPORTED_TP & cmd->supported) ?
-		XCVR_EXTERNAL : XCVR_INTERNAL;
+	ksettings->parent.port = ptys_get_active_port(&ptys_reg);
 
 	if (mlx4_en_autoneg_get(dev)) {
-		cmd->supported |= SUPPORTED_Autoneg;
-		cmd->advertising |= ADVERTISED_Autoneg;
+		ethtool_add_link_modes(&ksettings->link_modes.supported,
+				       ETHTOOL_LINK_MODE_Autoneg_BIT);
+		ethtool_add_link_modes(&ksettings->link_modes.advertising,
+				       ETHTOOL_LINK_MODE_Autoneg_BIT);
 	}
 
-	cmd->autoneg = (priv->port_state.flags & MLX4_EN_PORT_ANC) ?
+	ksettings->parent.autoneg
+		= (priv->port_state.flags & MLX4_EN_PORT_ANC) ?
 		AUTONEG_ENABLE : AUTONEG_DISABLE;
 
 	eth_proto = be32_to_cpu(ptys_reg.eth_proto_lp_adv);
-	cmd->lp_advertising = ptys2ethtool_link_modes(eth_proto, ADVERTISED);
 
-	cmd->lp_advertising |= (priv->port_state.flags & MLX4_EN_PORT_ANC) ?
-			ADVERTISED_Autoneg : 0;
+	ethtool_build_link_mode(&ksettings->link_modes.lp_advertising);
+	ptys2ethtool_update_link_modes(&ksettings->link_modes.lp_advertising,
+				       eth_proto, ADVERTISED);
+	if (priv->port_state.flags & MLX4_EN_PORT_ANC)
+		ethtool_add_link_modes(&ksettings->link_modes.lp_advertising,
+				       ETHTOOL_LINK_MODE_Autoneg_BIT);
 
-	cmd->phy_address = 0;
-	cmd->mdio_support = 0;
-	cmd->maxtxpkt = 0;
-	cmd->maxrxpkt = 0;
-	cmd->eth_tp_mdix = ETH_TP_MDI_INVALID;
-	cmd->eth_tp_mdix_ctrl = ETH_TP_MDI_AUTO;
+	ksettings->parent.phy_address = 0;
+	ksettings->parent.mdio_support = 0;
+	ksettings->parent.eth_tp_mdix = ETH_TP_MDI_INVALID;
+	ksettings->parent.eth_tp_mdix_ctrl = ETH_TP_MDI_AUTO;
 
 	return ret;
 }
 
-static void ethtool_get_default_settings(struct net_device *dev,
-					 struct ethtool_cmd *cmd)
+static void ethtool_get_default_ksettings(struct net_device *dev,
+					  struct ethtool_ksettings *ksettings)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	int trans_type;
 
-	cmd->autoneg = AUTONEG_DISABLE;
-	cmd->supported = SUPPORTED_10000baseT_Full;
-	cmd->advertising = ADVERTISED_10000baseT_Full;
+	ksettings->parent.autoneg = AUTONEG_DISABLE;
+	ethtool_build_link_mode(&ksettings->link_modes.supported,
+				ETHTOOL_LINK_MODE_10000baseT_Full_BIT);
+	ethtool_build_link_mode(&ksettings->link_modes.advertising,
+				ETHTOOL_LINK_MODE_10000baseT_Full_BIT);
 	trans_type = priv->port_state.transceiver;
 
 	if (trans_type > 0 && trans_type <= 0xC) {
-		cmd->port = PORT_FIBRE;
-		cmd->transceiver = XCVR_EXTERNAL;
-		cmd->supported |= SUPPORTED_FIBRE;
-		cmd->advertising |= ADVERTISED_FIBRE;
+		ksettings->parent.port = PORT_FIBRE;
+		ethtool_add_link_modes(&ksettings->link_modes.supported,
+				       ETHTOOL_LINK_MODE_FIBRE_BIT);
+		ethtool_add_link_modes(&ksettings->link_modes.advertising,
+				       ETHTOOL_LINK_MODE_FIBRE_BIT);
 	} else if (trans_type == 0x80 || trans_type == 0) {
-		cmd->port = PORT_TP;
-		cmd->transceiver = XCVR_INTERNAL;
-		cmd->supported |= SUPPORTED_TP;
-		cmd->advertising |= ADVERTISED_TP;
+		ksettings->parent.port = PORT_TP;
+		ethtool_add_link_modes(&ksettings->link_modes.supported,
+				       ETHTOOL_LINK_MODE_TP_BIT);
+		ethtool_add_link_modes(&ksettings->link_modes.advertising,
+				       ETHTOOL_LINK_MODE_TP_BIT);
 	} else  {
-		cmd->port = -1;
-		cmd->transceiver = -1;
+		ksettings->parent.port = -1;
 	}
 }
 
-static int mlx4_en_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+static int mlx4_en_get_ksettings(struct net_device *dev,
+				 struct ethtool_ksettings *ksettings)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	int ret = -EINVAL;
@@ -709,16 +689,16 @@ static int mlx4_en_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
 	       priv->port_state.flags & MLX4_EN_PORT_ANE);
 
 	if (priv->mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_ETH_PROT_CTRL)
-		ret = ethtool_get_ptys_settings(dev, cmd);
+		ret = ethtool_get_ptys_ksettings(dev, ksettings);
 	if (ret) /* ETH PROT CRTL is not supported or PTYS CMD failed */
-		ethtool_get_default_settings(dev, cmd);
+		ethtool_get_default_ksettings(dev, ksettings);
 
 	if (netif_carrier_ok(dev)) {
-		ethtool_cmd_speed_set(cmd, priv->port_state.link_speed);
-		cmd->duplex = DUPLEX_FULL;
+		ksettings->parent.speed = priv->port_state.link_speed;
+		ksettings->parent.duplex = DUPLEX_FULL;
 	} else {
-		ethtool_cmd_speed_set(cmd, SPEED_UNKNOWN);
-		cmd->duplex = DUPLEX_UNKNOWN;
+		ksettings->parent.speed = SPEED_UNKNOWN;
+		ksettings->parent.duplex = DUPLEX_UNKNOWN;
 	}
 	return 0;
 }
@@ -742,21 +722,33 @@ static __be32 speed_set_ptys_admin(struct mlx4_en_priv *priv, u32 speed,
 	return proto_admin;
 }
 
-static int mlx4_en_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+static int mlx4_en_set_ksettings(struct net_device *dev,
+				 const struct ethtool_ksettings *ksettings)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	struct mlx4_ptys_reg ptys_reg;
 	__be32 proto_admin;
 	int ret;
 
-	u32 ptys_adv = ethtool2ptys_link_modes(cmd->advertising, ADVERTISED);
-	int speed = ethtool_cmd_speed(cmd);
+	u32 ptys_adv = ethtool2ptys_link_modes(
+		&ksettings->link_modes.advertising, ADVERTISED);
+	const int speed = ksettings->parent.speed;
+
+	if (en_dbg_enabled(DRV, priv)) {
+		char dbg_link_mode_buff[64];
 
-	en_dbg(DRV, priv, "Set Speed=%d adv=0x%x autoneg=%d duplex=%d\n",
-	       speed, cmd->advertising, cmd->autoneg, cmd->duplex);
+		bitmap_scnprintf(dbg_link_mode_buff,
+				 sizeof(dbg_link_mode_buff),
+				 ksettings->link_modes.supported.mask,
+				 __ETHTOOL_LINK_MODE_MASK_NBITS);
+		en_dbg(DRV, priv,
+		       "Set Speed=%d adv={%s} autoneg=%d duplex=%d\n",
+		       speed, dbg_link_mode_buff,
+		       ksettings->parent.autoneg, ksettings->parent.duplex);
+	}
 
 	if (!(priv->mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_ETH_PROT_CTRL) ||
-	    (cmd->duplex == DUPLEX_HALF))
+	    (ksettings->parent.duplex == DUPLEX_HALF))
 		return -EINVAL;
 
 	memset(&ptys_reg, 0, sizeof(ptys_reg));
@@ -770,7 +762,7 @@ static int mlx4_en_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
 		return 0;
 	}
 
-	proto_admin = cmd->autoneg == AUTONEG_ENABLE ?
+	proto_admin = ksettings->parent.autoneg == AUTONEG_ENABLE ?
 		cpu_to_be32(ptys_adv) :
 		speed_set_ptys_admin(priv, speed,
 				     ptys_reg.eth_proto_cap);
@@ -1820,8 +1812,8 @@ static int mlx4_en_get_module_eeprom(struct net_device *dev,
 
 const struct ethtool_ops mlx4_en_ethtool_ops = {
 	.get_drvinfo = mlx4_en_get_drvinfo,
-	.get_settings = mlx4_en_get_settings,
-	.set_settings = mlx4_en_set_settings,
+	.get_ksettings = mlx4_en_get_ksettings,
+	.set_ksettings = mlx4_en_set_ksettings,
 	.get_link = ethtool_op_get_link,
 	.get_strings = mlx4_en_get_strings,
 	.get_sset_count = mlx4_en_get_sset_count,
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_main.c b/drivers/net/ethernet/mellanox/mlx4/en_main.c
index c643d2b..b435349 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_main.c
@@ -348,6 +348,7 @@ static void mlx4_en_verify_params(void)
 static int __init mlx4_en_init(void)
 {
 	mlx4_en_verify_params();
+	mlx4_en_init_ptys2ethtool_map();
 
 	return mlx4_register_interface(&mlx4_en_interface);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 96bf728..609fc3e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -749,6 +749,7 @@ static inline bool mlx4_en_cq_busy_polling(struct mlx4_en_cq *cq)
 
 #define MLX4_EN_WOL_DO_MODIFY (1ULL << 63)
 
+void mlx4_en_init_ptys2ethtool_map(void);
 void mlx4_en_update_loopback_state(struct net_device *dev,
 				   netdev_features_t features);
 
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related

* Re: [PATCH 1/5] WIP: Add syscall unlinkat_s (currently x86* only)
From: Alexander Holler @ 2015-02-04 19:56 UTC (permalink / raw)
  To: Theodore Ts'o, Lukáš Czerner, Michael Kerrisk,
	Al Viro, Linux-Fsdevel, Linux Kernel, Linux API
In-Reply-To: <20150204193316.GH2509@thunk.org>

Am 04.02.2015 um 20:33 schrieb Theodore Ts'o:

> And indeed, people who do have salaries paid by companies who care
> about this general problem in actual products have been working on
> addressing it using encryption, such that when the user is removed
> from the device, the key is blasted.  More importantly, when the user
> is not logged in, the key isn't even *available* on the device.  So it
> solves more problems than the one that you are concerned about, and in
> general maintainers prefer solutions that solve multiple problems,
> because that minimizes the number of one-time hacks and partial/toy
> solutions which turn into long-term maintainance headaches.  (After
> all, if you insist on having a partial/toy solution merged, that turns
> into an unfunded mandate which the maintainers effectively have to
> support for free, forever.)

It's just another layer above and an rather ugly workaround which ends 
up in having to manage keys and doesn't solve the real problem. Besides 
that it's much more complicated especially in kind of kernel sources to 
manage.

> You've rejected encryption as a proposed solution as not meeting your
> requirements (which if I understand your objections, can be summarized
> as "encryption is too hard").  This is fine, but if you want someone
> *else* to implement your partial toy solution which is less secure,
> then you will either need to pay someone to do it or do it yourself.

I haven't rejected it. I'm using that myself since around 10 years, 
because of the impossibility to really delete files when using Linux.

>>> Wrong. I don't want my partial solution to be part of the official kernel. I
>>> don't care. I offered it for other users because I'm aware that has become
>>> almost impossible for normal people to get something into the kernel without
>>> spending an unbelievable amount of time most people can't afford to spend.
>
> So you expect other users to just apply your patches and use an
> unofficial system call number that might get reassigned to some other
> user later on?

People do such all the time because the mainline kernel is otherwise 
unusable on many boards.

Besides that, I don't expect that anyone uses my patches.

As said multiple times before, they are an offer and were primarily 
meant to show a possible simple solution for many use cases. They 
already work with inside some, of course maybe uncomfortable, limits and 
don't do any worse. just better.

Alexander Holler

^ permalink raw reply

* Re: [PATCH RFC v2 0/7] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1
From: Andy Lutomirski @ 2015-02-04 21:38 UTC (permalink / raw)
  To: Fam Zheng
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Alexander Viro, Andrew Morton, Kees Cook, David Herrmann,
	Alexei Starovoitov, Miklos Szeredi, David Drysdale, Oleg Nesterov,
	David S. Miller, Vivek Goyal, Mike Frysinger, Theodore Ts'o,
	Heiko Carstens, Rasmus Villemoes, Rashika Kheria, Hugh Dickins,
	Mathieu Desnoyers, Peter Zijlstra <pe>
In-Reply-To: <1423046213-7043-1-git-send-email-famz-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Wed, Feb 4, 2015 at 2:36 AM, Fam Zheng <famz-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 2) epoll_pwait1
> ---------------
>
> NAME
>        epoll_pwait1 - wait for an I/O event on an epoll file descriptor
>
> SYNOPSIS
>
>        #include <sys/epoll.h>
>
>        int epoll_pwait1(int epfd, int flags,
>                         struct epoll_event *events,
>                         int maxevents,
>                         struct timespec *timeout,
>                         struct sigargs *sig);
>
> DESCRIPTION
>
>        The epoll_pwait1 system call differs from epoll_pwait only in parameter
>        types. The first difference is timeout, a pointer to timespec structure
>        which allows nanosecond presicion; the second difference, which should
>        probably be wrapper by glibc and only expose a sigset_t pointer as in
>        pselect6.
>
>        If timeout is NULL, it's treated as if 0 is specified in epoll_pwait
>        (return immediately). Otherwise it's converted to nanosecond scalar,
>        again, with the same convention as epoll_pwait's timeout.

Is the timeout absolute or relative?

I'd kind of like the ability to set timeouts on multiple clocks at the
same time, but I can live without that.

--Andy

>
> RETURN VALUE
>
>        The same as said in epoll_pwait(2).
>
> ERRORS
>
>        The same as said in man epoll_pwait(2), plus:
>
>        EINVAL flags is not zero.
>
>
> CONFORMING TO
>
>        epoll_pwait1() is Linux-specific.
>
> SEE ALSO
>
>        epoll_create(2), epoll_ctl(2), epoll_wait(2), epoll_pwait(2), epoll(7)
>
> Fam Zheng (7):
>   epoll: Extract epoll_wait_do and epoll_pwait_do
>   epoll: Specify clockid explicitly
>   epoll: Extract ep_ctl_do
>   epoll: Add implementation for epoll_ctl_batch
>   x86: Hook up epoll_ctl_batch syscall
>   epoll: Add implementation for epoll_pwait1
>   x86: Hook up epoll_pwait1 syscall
>
>  arch/x86/syscalls/syscall_32.tbl |   2 +
>  arch/x86/syscalls/syscall_64.tbl |   2 +
>  fs/eventpoll.c                   | 241 ++++++++++++++++++++++++++-------------
>  include/linux/syscalls.h         |   9 ++
>  include/uapi/linux/eventpoll.h   |  11 ++
>  5 files changed, 186 insertions(+), 79 deletions(-)
>
> --
> 1.9.3
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply

* [PATCH] coresight-stm: adding driver for CoreSight STM component
From: mathieu.poirier @ 2015-02-04 22:22 UTC (permalink / raw)
  To: corbet, pratikp, al.grant, liviu.dudau, kaixu.xia,
	jeenu.viswambharan, gregkh
  Cc: linux-arm-kernel, linux-api, linux-kernel, linux-doc

From: Pratik Patel <pratikp@codeaurora.org>

This driver adds support for the STM CoreSight IP block,
allowing any system compoment (HW or SW) to log and
aggregate messages via a single entity.

The STM exposes an application defined number of channels
called stimulus port.  Configuration is done using entries
in sysfs and channels made available to userspace via devfs.

Signed-off-by: Pratik Patel <pratikp@codeaurora.org>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 .../ABI/testing/sysfs-bus-coresight-devices-stm    |   62 ++
 Documentation/trace/coresight.txt                  |   88 +-
 drivers/coresight/Kconfig                          |   10 +
 drivers/coresight/Makefile                         |    1 +
 drivers/coresight/coresight-stm.c                  | 1090 ++++++++++++++++++++
 include/linux/coresight-stm.h                      |   35 +
 include/uapi/linux/coresight-stm.h                 |   23 +
 7 files changed, 1307 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
 create mode 100644 drivers/coresight/coresight-stm.c
 create mode 100644 include/linux/coresight-stm.h
 create mode 100644 include/uapi/linux/coresight-stm.h

diff --git a/Documentation/ABI/testing/sysfs-bus-coresight-devices-stm b/Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
new file mode 100644
index 000000000000..3ddb676831ab
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
@@ -0,0 +1,62 @@
+What:		/sys/bus/coresight/devices/<memory_map>.stm/enable_source
+Date:		February 2015
+KernelVersion:	3.20
+Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
+Description:	(RW) Enable/disable tracing on this specific trace macrocell.
+		Enabling the trace macrocell implies it has been configured
+		properly and a sink has been identidifed for it.  The path
+		of coresight components linking the source to the sink is
+		configured and managed automatically by the coresight framework.
+
+What:		/sys/bus/coresight/devices/<memory_map>.stm/entities
+Date:		February 2015
+KernelVersion:	3.20
+Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
+Description:	(RW) Controls which entities have been allowed to use the
+		stimulus ports, regarless of the channel they were assigned.
+		Entity definition can be found in
+		include/uapi/linux/coresight-stm32.h
+
+What:		/sys/bus/coresight/devices/<memory_map>.stm/hwevent_enable
+Date:		February 2015
+KernelVersion:	3.20
+Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
+Description:	(RW) Provides access to the HW event enable register, used in
+		conjunction with HW event bank select register.
+
+What:		/sys/bus/coresight/devices/<memory_map>.stm/hwevent_select
+Date:		February 2015
+KernelVersion:	3.20
+Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
+Description:	(RW) Gives access to the HW event block select register
+		(STMHEBSR) in order to configure up to 256 channels.  Used in
+		conjunction with "hwevent_enable" register as described above.
+
+What:		/sys/bus/coresight/devices/<memory_map>.stm/port_enable
+Date:		February 2015
+KernelVersion:	3.20
+Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
+Description:	(RW) Provides access to the stimlus port enable register
+		(STMSPER).  Used in conjunction with "port_select" described
+		below.
+
+What:		/sys/bus/coresight/devices/<memory_map>.stm/port_select
+Date:		February 2015
+KernelVersion:	3.20
+Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
+Description:	(RW) Used to determine which bank of stimulus port bit in
+		register STMSPER (see above) apply to.
+
+What:		/sys/bus/coresight/devices/<memory_map>.stm/status
+Date:		February 2015
+KernelVersion:	3.20
+Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
+Description:	(R) List various control and status registers.  The specific
+		layout and content is driver specific.
+
+What:		/sys/bus/coresight/devices/<memory_map>.stm/traceid
+Date:		February 2015
+KernelVersion:	3.20
+Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
+Description:	(RW) Holds the trace ID that will appear in the trace stream
+		coming from this trace entity.
diff --git a/Documentation/trace/coresight.txt b/Documentation/trace/coresight.txt
index 02361552a3ea..a041477698d9 100644
--- a/Documentation/trace/coresight.txt
+++ b/Documentation/trace/coresight.txt
@@ -190,8 +190,8 @@ expected to be accessed and controlled using those entries.
 Last but not least, "struct module *owner" is expected to be set to reflect
 the information carried in "THIS_MODULE".
 
-How to use
-----------
+How to use the tracer modules
+-----------------------------
 
 Before trace collection can start, a coresight sink needs to be identify.
 There is no limit on the amount of sinks (nor sources) that can be enabled at
@@ -297,3 +297,87 @@ Info                                    Tracing enabled
 Instruction     13570831        0x8026B584      E28DD00C        false   ADD      sp,sp,#0xc
 Instruction     0       0x8026B588      E8BD8000        true    LDM      sp!,{pc}
 Timestamp                                       Timestamp: 17107041535
+
+How to use the STM module
+-------------------------
+
+Using the System Trace Macrocell module is the same as the tracers - the only
+difference is that components (entities) are driving the trace capture rather
+than the program flow through the code.
+
+As with any other CoreSight component specifics about the STM tracer can be
+found in sysfs, with more information on each entry being found in [1]:
+
+root@genericarmv8:~# ls /sys/bus/coresight/devices/20100000.stm
+enable_source   hwevent_select  power           traceid
+entities        port_enable     status          uevent
+hwevent_enable  port_select     subsystem
+root@genericarmv8:~#
+
+Like any other source a sink needs to be identified and the STM enabled before
+being used:
+
+root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/20010000.etf/enable_sink
+root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/20100000.stm/enable_source
+
+From there user space applications can request and use channels using the devfs
+interface provided for that purpose:
+
+root@genericarmv8:~# ls -l /dev/20100000.stm
+crw-------    1 root     root       10,  61 Jan  3 18:11 /dev/20100000.stm
+root@genericarmv8:~#
+
+The following sample program provides an example of the supported operations:
+
+#include <stdio.h>
+#include <fcntl.h>
+#include <string.h>
+#include <linux/coresight-stm.h>
+
+#define BUFSIZE	20
+
+int main(int argc, char *argv[])
+{
+	int fd, n;
+	unsigned long options;
+	char buf[BUFSIZE];
+	char data[BUFSIZE] = "this is a test";
+
+	fd = open (argv[1], O_RDWR, 0);
+
+	if (n == -1) {
+		printf("can't open %s\n", argv[1]);
+		return 0;
+	}
+
+	n = read(fd, buf, BUFSIZE);
+	printf("channel_id: %d\n", atoi(buf));
+
+	options = STM_OPTION_TIMESTAMPED;
+	ioctl(fd, STM_IOCTL_SET_OPTIONS, &options);
+	options = 0;
+	ioctl(fd, STM_IOCTL_GET_OPTIONS, &options);
+	printf("options: 0x%x\n", options);
+
+	write(fd, data, strlen(data));
+
+	close(fd);
+
+	return 0;
+}
+
+When opening the devfs entry the first available channel is reserved for the
+requesting application.  That channel will remain the same until close() is
+called where it will go back to the channel pool.  From there calling open()
+again may or may _not_ yield the same channel number.
+
+From user space applications can determine what channel they've been given by
+issueing a read() on the file descriptor returned by open().  An ioctl() call is
+provided to set the channel options and the write() method will inject logging
+information in the channel.  There is no limit on the amount of channels an
+application can reserve, granted they use a different file descriptor for each
+one.
+
+If no more channels are available value of returned channel ID is '-1'.
+
+[1]. Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
diff --git a/drivers/coresight/Kconfig b/drivers/coresight/Kconfig
index fc1f1ae7a49d..08806cc7d737 100644
--- a/drivers/coresight/Kconfig
+++ b/drivers/coresight/Kconfig
@@ -58,4 +58,14 @@ config CORESIGHT_SOURCE_ETM3X
 	  which allows tracing the instructions that a processor is executing
 	  This is primarily useful for instruction level tracing.  Depending
 	  the ETM version data tracing may also be available.
+
+config CORESIGHT_STM
+	bool "CoreSight System Trace Macrocell driver"
+	depends on (ARM && !(CPU_32v4 || CPU_32v4T)) || ARM64 || (64BIT && COMPILE_TEST)
+	select CORESIGHT_LINKS_AND_SINKS
+	help
+	  This driver provides support for hardware assisted software
+	  instrumentation based tracing. This is primarily used for
+	  logging useful software events or data coming from various entities
+	  in the system, possibly running different OSs
 endif
diff --git a/drivers/coresight/Makefile b/drivers/coresight/Makefile
index 4b4bec890ef5..7005b48a33ed 100644
--- a/drivers/coresight/Makefile
+++ b/drivers/coresight/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_CORESIGHT_SINK_ETBV10) += coresight-etb10.o
 obj-$(CONFIG_CORESIGHT_LINKS_AND_SINKS) += coresight-funnel.o \
 					   coresight-replicator.o
 obj-$(CONFIG_CORESIGHT_SOURCE_ETM3X) += coresight-etm3x.o coresight-etm-cp14.o
+obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o
diff --git a/drivers/coresight/coresight-stm.c b/drivers/coresight/coresight-stm.c
new file mode 100644
index 000000000000..e59b0fe01d87
--- /dev/null
+++ b/drivers/coresight/coresight-stm.c
@@ -0,0 +1,1090 @@
+/* Copyright (c) 2014-2015, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/device.h>
+#include <linux/io.h>
+#include <linux/err.h>
+#include <linux/fs.h>
+#include <linux/miscdevice.h>
+#include <linux/uaccess.h>
+#include <linux/slab.h>
+#include <linux/delay.h>
+#include <linux/clk.h>
+#include <linux/bitmap.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/coresight.h>
+#include <linux/coresight-stm.h>
+#include <linux/amba/bus.h>
+#include <asm/unaligned.h>
+
+#include "coresight-priv.h"
+
+#define STMDMASTARTR			0xc04
+#define STMDMASTOPR			0xc08
+#define STMDMASTATR			0xc0c
+#define STMDMACTLR			0xc10
+#define STMDMAIDR			0xcfc
+#define STMHEER				0xd00
+#define STMHETER			0xd20
+#define STMHEBSR			0xd60
+#define STMHEMCR			0xd64
+#define STMHEMASTR			0xdf4
+#define STMHEFEAT1R			0xdf8
+#define STMHEIDR			0xdfc
+#define STMSPER				0xe00
+#define STMSPTER			0xe20
+#define STMPRIVMASKR			0xe40
+#define STMSPSCR			0xe60
+#define STMSPMSCR			0xe64
+#define STMSPOVERRIDER			0xe68
+#define STMSPMOVERRIDER			0xe6c
+#define STMSPTRIGCSR			0xe70
+#define STMTCSR				0xe80
+#define STMTSSTIMR			0xe84
+#define STMTSFREQR			0xe8c
+#define STMSYNCR			0xe90
+#define STMAUXCR			0xe94
+#define STMSPFEAT1R			0xea0
+#define STMSPFEAT2R			0xea4
+#define STMSPFEAT3R			0xea8
+#define STMITTRIGGER			0xee8
+#define STMITATBDATA0			0xeec
+#define STMITATBCTR2			0xef0
+#define STMITATBID			0xef4
+#define STMITATBCTR0			0xef8
+
+#define STM_32_CHANNEL			32
+#define BYTES_PER_CHANNEL		256
+#define STM_TRACE_BUF_SIZE		4096
+
+/* Register bit definition */
+#define STMTCSR_BUSY_BIT		23
+
+enum stm_pkt_type {
+	STM_PKT_TYPE_DATA	= 0x98,
+	STM_PKT_TYPE_FLAG	= 0xE8,
+	STM_PKT_TYPE_TRIG	= 0xF8,
+};
+
+enum {
+	STM_OPTION_MARKED	= 0x10,
+};
+
+#define stm_channel_addr(drvdata, ch)	(drvdata->chs.base +	\
+					(ch * BYTES_PER_CHANNEL))
+#define stm_channel_off(type, opts)	(type & ~opts)
+
+#ifndef CONFIG_64BIT
+static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
+{
+	asm volatile("strd %1, %0"
+		     : "+Qo" (*(volatile u64 __force *)addr)
+		     : "r" (val));
+}
+
+static inline u64 __raw_readq(const volatile void __iomem *addr)
+{
+	u64 val;
+
+	asm volatile("ldrd %1, %0"
+		     : "+Qo" (*(volatile u64 __force *)addr),
+		       "=r" (val));
+	return val;
+}
+
+#undef readq_relaxed
+#define readq_relaxed(c) ({ u64 __r = le64_to_cpu((__force __le64) \
+					__raw_readq(c)); __r; })
+#undef writeq_relaxed
+#define writeq_relaxed(v, c)	__raw_writeq((__force u64) cpu_to_le64(v), c)
+#endif
+
+static int boot_nr_channel;
+
+module_param_named(
+	boot_nr_channel, boot_nr_channel, int, S_IRUGO
+);
+
+/**
+ * struct channel_space - central management entity for extended ports
+ * @base:		memory mapped base address where channels start.
+ * @bitmap:		tally of which channel is being used.
+ */
+struct channel_space {
+	void __iomem		*base;
+	unsigned long		*bitmap;
+};
+
+/**
+ * struct stm_node - aggregation of channel information for userspace access
+ * @channel_id:		the channel number associated to this file descriptor.
+ * @options:		options for this channel - none, timestamped,
+i			guaranteed.
+ * @drvdata:		STM driver specifics.
+ */
+struct stm_node {
+	int			channel_id;
+	u32			options;
+	struct stm_drvdata	*drvdata;
+};
+
+/**
+ * struct stm_drvdata - specifics associated to an STM component
+ * @ base:		memory mapped base address for this component.
+ * @dev:		the device entity associated to this component.
+ * @csdev:		component vitals needed by the framework.
+ * @miscdev:		specifics to handle "/dev/xyz.stm" entry.
+ * @clk:		the clock this component is associated to.
+ * @spinlock:		only one at a time pls.
+ * @chs:		the channels accociated to this STM.
+ * @enable:		this STM is being used.
+ * @entities:		set of entities allowed to access the STM ports.
+ * @traceid:		value of the current ID for this component.
+ * @write_64bit:	whether this STM supports 64 bit access.
+ * @stmsper:		settings for register STMSPER.
+ * @stmspscr:		settings for register STMSPSCR.
+ * @numsp:		the total number of stimulus port support by this STM.
+ * @stmheer:		settings for register STMHEER.
+ * @stmheter:		settings for register STMHETER.
+ * @stmhebsr:		settings for register STMHEBSR.
+ */
+struct stm_drvdata {
+	void __iomem		*base;
+	struct device		*dev;
+	struct coresight_device	*csdev;
+	struct miscdevice	miscdev;
+	struct clk		*clk;
+	spinlock_t		spinlock;
+	struct channel_space	chs;
+	bool			enable;
+	DECLARE_BITMAP(entities, STM_ENTITY_MAX);
+	u8			traceid;
+	u32			write_64bit;
+	u32			stmsper;
+	u32			stmspscr;
+	u32			numsp;
+	u32			stmheer;
+	u32			stmheter;
+	u32			stmhebsr;
+};
+
+static struct stm_drvdata *stmdrvdata;
+
+static void stm_hwevent_enable_hw(struct stm_drvdata *drvdata)
+{
+	CS_UNLOCK(drvdata->base);
+
+	writel_relaxed(drvdata->stmhebsr, drvdata->base + STMHEBSR);
+	writel_relaxed(drvdata->stmheter, drvdata->base + STMHETER);
+	writel_relaxed(drvdata->stmheer, drvdata->base + STMHEER);
+	writel_relaxed(0x01 |	/* Enable HW event tracing */
+		       0x04,	/* Error detection on event tracing */
+		       drvdata->base + STMHEMCR);
+
+	CS_LOCK(drvdata->base);
+}
+
+static void stm_port_enable_hw(struct stm_drvdata *drvdata)
+{
+	CS_UNLOCK(drvdata->base);
+	/* ATB trigger enable on direct writes to TRIG locations */
+	writel_relaxed(0x10,
+		       drvdata->base + STMSPTRIGCSR);
+	writel_relaxed(drvdata->stmspscr, drvdata->base + STMSPSCR);
+	writel_relaxed(drvdata->stmsper, drvdata->base + STMSPER);
+
+	CS_LOCK(drvdata->base);
+}
+
+static void stm_enable_hw(struct stm_drvdata *drvdata)
+{
+	if (drvdata->stmheer)
+		stm_hwevent_enable_hw(drvdata);
+
+	stm_port_enable_hw(drvdata);
+
+	CS_UNLOCK(drvdata->base);
+
+	/* 4096 byte between synchronisation packets */
+	writel_relaxed(0xFFF, drvdata->base + STMSYNCR);
+	writel_relaxed((drvdata->traceid << 16 | /* trace id */
+			0x02 |			 /* timestamp enable */
+			0x01),			 /* global STM enable */
+			drvdata->base + STMTCSR);
+
+	CS_LOCK(drvdata->base);
+}
+
+static int stm_enable(struct coresight_device *csdev)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+	int ret;
+
+	ret = clk_prepare_enable(drvdata->clk);
+	if (ret)
+		return ret;
+
+	spin_lock(&drvdata->spinlock);
+	stm_enable_hw(drvdata);
+	drvdata->enable = true;
+	spin_unlock(&drvdata->spinlock);
+
+	dev_info(drvdata->dev, "STM tracing enabled\n");
+	return 0;
+}
+
+static void stm_hwevent_disable_hw(struct stm_drvdata *drvdata)
+{
+	CS_UNLOCK(drvdata->base);
+
+	writel_relaxed(0x0, drvdata->base + STMHEMCR);
+	writel_relaxed(0x0, drvdata->base + STMHEER);
+	writel_relaxed(0x0, drvdata->base + STMHETER);
+
+	CS_LOCK(drvdata->base);
+}
+
+static void stm_port_disable_hw(struct stm_drvdata *drvdata)
+{
+	CS_UNLOCK(drvdata->base);
+
+	writel_relaxed(0x0, drvdata->base + STMSPER);
+	writel_relaxed(0x0, drvdata->base + STMSPTRIGCSR);
+
+	CS_LOCK(drvdata->base);
+}
+
+static void stm_disable_hw(struct stm_drvdata *drvdata)
+{
+	u32 val;
+
+	CS_UNLOCK(drvdata->base);
+
+	val = readl_relaxed(drvdata->base + STMTCSR);
+	val &= ~0x1; /* clear global STM enable [0] */
+	writel_relaxed(val, drvdata->base + STMTCSR);
+
+	CS_LOCK(drvdata->base);
+
+	stm_port_disable_hw(drvdata);
+	if (drvdata->stmheer)
+		stm_hwevent_disable_hw(drvdata);
+}
+
+static void stm_disable(struct coresight_device *csdev)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+
+	spin_lock(&drvdata->spinlock);
+	stm_disable_hw(drvdata);
+	drvdata->enable = false;
+	spin_unlock(&drvdata->spinlock);
+
+	/* Wait until the engine has completely stopped */
+	coresight_timeout(drvdata, STMTCSR, STMTCSR_BUSY_BIT, 0);
+
+	clk_disable_unprepare(drvdata->clk);
+
+	dev_info(drvdata->dev, "STM tracing disabled\n");
+}
+
+static int stm_trace_id(struct coresight_device *csdev)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+
+	return drvdata->traceid;
+}
+
+static const struct coresight_ops_source stm_source_ops = {
+	.trace_id	= stm_trace_id,
+	.enable		= stm_enable,
+	.disable	= stm_disable,
+};
+
+static const struct coresight_ops stm_cs_ops = {
+	.source_ops	= &stm_source_ops,
+};
+
+static int stm_channel_alloc(u32 off)
+{
+	struct stm_drvdata *drvdata = stmdrvdata;
+	int ch = -1;
+
+	do {
+		ch = find_next_zero_bit(drvdata->chs.bitmap,
+					drvdata->numsp, off);
+	} while ((ch < drvdata->numsp) &&
+		 test_and_set_bit(ch, drvdata->chs.bitmap));
+
+	return ch;
+}
+
+static void stm_channel_free(u32 ch)
+{
+	struct stm_drvdata *drvdata = stmdrvdata;
+
+	clear_bit(ch, drvdata->chs.bitmap);
+}
+
+static int stm_send_64bit(void *addr, const void *data, u32 size)
+{
+	u64 prepad = 0;
+	u64 postpad = 0;
+	char *pad;
+	u8 off, endoff;
+	u32 len = size;
+
+	off = (unsigned long)data & 0x7;
+
+	if (off) {
+		endoff = 8 - off;
+		pad = (char *)&prepad;
+		pad += off;
+
+		while (endoff && size) {
+			*pad++ = *(char *)data++;
+			endoff--;
+			size--;
+		}
+		writeq_relaxed(prepad, addr);
+	}
+
+	/* now we are 64bit aligned */
+	while (size >= 8) {
+		writeq_relaxed(*(u64 *)data, addr);
+		data += 8;
+		size -= 8;
+	}
+
+	endoff = 0;
+
+	if (size) {
+		endoff = 8 - (u8)size;
+		pad = (char *)&postpad;
+
+		while (size) {
+			*pad++ = *(char *)data++;
+			size--;
+		}
+		writeq_relaxed(postpad, addr);
+	}
+
+	return len + off + endoff;
+}
+
+static int stm_trace_data_64bit(unsigned long ch_addr, u32 options,
+				const void *data, u32 size)
+{
+	void *addr;
+
+	options &= ~STM_OPTION_TIMESTAMPED;
+	addr = (void *)(ch_addr | stm_channel_off(STM_PKT_TYPE_DATA, options));
+
+	return stm_send_64bit(addr, data, size);
+}
+
+static int stm_send(void *addr, const void *data, u32 size)
+{
+	u32 len = size;
+
+	if (((unsigned long)data & 0x1) && (size >= 1)) {
+		writeb_relaxed(*(u8 *)data, addr);
+		data++;
+		size--;
+	}
+	if (((unsigned long)data & 0x2) && (size >= 2)) {
+		writew_relaxed(*(u16 *)data, addr);
+		data += 2;
+		size -= 2;
+	}
+
+	/* now we are 32bit aligned */
+	while (size >= 4) {
+		writel_relaxed(*(u32 *)data, addr);
+		data += 4;
+		size -= 4;
+	}
+
+	if (size >= 2) {
+		writew_relaxed(*(u16 *)data, addr);
+		data += 2;
+		size -= 2;
+	}
+	if (size >= 1) {
+		writeb_relaxed(*(u8 *)data, addr);
+		data++;
+		size--;
+	}
+
+	return len;
+}
+
+static int stm_trace_data(unsigned long ch_addr, u32 options,
+			  const void *data, u32 size)
+{
+	void *addr;
+
+	options &= ~STM_OPTION_TIMESTAMPED;
+	addr = (void *)(ch_addr | stm_channel_off(STM_PKT_TYPE_DATA, options));
+
+	return stm_send(addr, data, size);
+}
+
+static inline int stm_trace_hw(u32 options, u32 channel, u8 entity_id,
+			       const void *data, u32 size)
+{
+	int len = 0;
+	unsigned long ch_addr;
+	struct stm_drvdata *drvdata = stmdrvdata;
+
+
+	/* get the channel address */
+	ch_addr = (unsigned long)stm_channel_addr(drvdata, channel);
+
+	if (drvdata->write_64bit)
+		len = stm_trace_data_64bit(ch_addr, options, data, size);
+	else
+		/* send the payload data */
+		len = stm_trace_data(ch_addr, options, data, size);
+
+	return len;
+}
+
+/**
+ * stm_trace - trace the binary or string data through STM
+ * @options: tracing options - guaranteed, timestamped, etc
+ * @entity_id: entity representing the trace data
+ * @data: pointer to binary or string data buffer
+ * @size: size of data to send
+ *
+ * Packetizes the data as the payload to an OST packet and sends it over STM
+ *
+ * CONTEXT:
+ * Can be called from any context.
+ *
+ * RETURNS:
+ * number of bytes transferred over STM
+ */
+int stm_trace(u32 options, int channel_id,
+	      u8 entity_id, const void *data, u32 size)
+{
+	struct stm_drvdata *drvdata = stmdrvdata;
+
+	if (channel_id < 0)
+		return 0;
+
+	/* we don't support sizes more than 24bits (0 to 23) */
+	if (!(drvdata && drvdata->enable &&
+	      test_bit(entity_id, drvdata->entities) && size &&
+	      (size < 0x1000000)))
+		return 0;
+
+	return stm_trace_hw(options, (u32)channel_id,
+			    entity_id, data, size);
+}
+EXPORT_SYMBOL(stm_trace);
+
+static int stm_open(struct inode *inode, struct file *file)
+{
+	struct stm_node *node;
+	struct stm_drvdata *drvdata = container_of(file->private_data,
+						   struct stm_drvdata, miscdev);
+
+	node = kmalloc(sizeof(struct stm_node), GFP_KERNEL);
+	if (!node)
+		return -ENOMEM;
+
+	node->drvdata = drvdata;
+	node->options = STM_OPTION_TIMESTAMPED;
+	node->channel_id = stm_channel_alloc(0);
+	if (node->channel_id < 0)
+		return -ENOMEM;
+
+	file->private_data = node;
+	return 0;
+}
+
+static int stm_release(struct inode *inode, struct file *file)
+{
+	struct stm_node *node = file->private_data;
+
+	/* we are done, free the channel */
+	if (node->channel_id >= 0)
+		stm_channel_free((u32)node->channel_id);
+	file->private_data = NULL;
+	kfree(node);
+	return 0;
+}
+
+static ssize_t stm_read(struct file *file, char __user *data,
+			size_t size, loff_t *ppos)
+{
+	char buf[20];
+	struct stm_node *node = file->private_data;
+
+	snprintf(buf, sizeof(buf), "%d", node->channel_id);
+	return simple_read_from_buffer(data, size, ppos,
+				       buf, strlen(buf));
+}
+
+static ssize_t stm_write(struct file *file, const char __user *data,
+			 size_t size, loff_t *ppos)
+{
+	char *buf;
+	struct stm_node *node = file->private_data;
+	struct stm_drvdata *drvdata = node->drvdata;
+
+	if (node->channel_id < 0)
+		return -EINVAL;
+
+	if (!drvdata->enable || !size)
+		return -EINVAL;
+
+	if (size > STM_TRACE_BUF_SIZE)
+		size = STM_TRACE_BUF_SIZE;
+
+	buf = kmalloc(size, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	if (copy_from_user(buf, data, size)) {
+		kfree(buf);
+		dev_dbg(drvdata->dev, "%s: copy_from_user failed\n", __func__);
+		return -EFAULT;
+	}
+
+	if (!test_bit(STM_ENTITY_TRACE_USPACE, drvdata->entities)) {
+		kfree(buf);
+		return size;
+	}
+
+	stm_trace_hw(node->options, (u32)node->channel_id,
+		     STM_ENTITY_TRACE_USPACE, buf, size);
+
+	kfree(buf);
+
+	return size;
+}
+
+static long stm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	u32 options;
+	struct stm_node *node = file->private_data;
+
+	switch (cmd) {
+	case STM_IOCTL_SET_OPTIONS:
+		if (copy_from_user(&options, (void __user *)arg, sizeof(u32)))
+			return -EFAULT;
+
+		options &= (STM_OPTION_TIMESTAMPED | STM_OPTION_GUARANTEED);
+		node->options = options;
+		break;
+	case STM_IOCTL_GET_OPTIONS:
+		options = node->options;
+		if (copy_to_user((void __user *)arg, &options, sizeof(options)))
+			return -EFAULT;
+		break;
+	default:
+		return -EINVAL;
+	};
+
+	return 0;
+}
+
+static const struct file_operations stm_fops = {
+	.owner		= THIS_MODULE,
+	.open		= stm_open,
+	.write		= stm_write,
+	.read		= stm_read,
+	.llseek		= no_llseek,
+	.unlocked_ioctl	= stm_ioctl,
+	.release	= stm_release,
+};
+
+static ssize_t hwevent_enable_show(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+	unsigned long val = drvdata->stmheer;
+
+	return scnprintf(buf, PAGE_SIZE, "%#lx\n", val);
+}
+
+static ssize_t hwevent_enable_store(struct device *dev,
+				    struct device_attribute *attr,
+				    const char *buf, size_t size)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+	unsigned long val;
+	int ret = 0;
+
+	ret = kstrtoul(buf, 16, &val);
+	if (ret)
+		return -EINVAL;
+
+	drvdata->stmheer = val;
+	/* HW event enable and trigger go hand in hand */
+	drvdata->stmheter = val;
+
+	return size;
+}
+static DEVICE_ATTR_RW(hwevent_enable);
+
+static ssize_t hwevent_select_show(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+	unsigned long val = drvdata->stmhebsr;
+
+	return scnprintf(buf, PAGE_SIZE, "%#lx\n", val);
+}
+
+static ssize_t hwevent_select_store(struct device *dev,
+				    struct device_attribute *attr,
+				    const char *buf, size_t size)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+	unsigned long val;
+	int ret = 0;
+
+	ret = kstrtoul(buf, 16, &val);
+	if (ret)
+		return -EINVAL;
+
+	drvdata->stmhebsr = val;
+
+	return size;
+}
+static DEVICE_ATTR_RW(hwevent_select);
+
+static ssize_t port_select_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+	unsigned long val;
+
+	if (!drvdata->enable) {
+		val = drvdata->stmspscr;
+	} else {
+		spin_lock(&drvdata->spinlock);
+		val = readl_relaxed(drvdata->base + STMSPSCR);
+		spin_unlock(&drvdata->spinlock);
+	}
+
+	return scnprintf(buf, PAGE_SIZE, "%#lx\n", val);
+}
+
+static ssize_t port_select_store(struct device *dev,
+				 struct device_attribute *attr,
+				 const char *buf, size_t size)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+	unsigned long val, stmsper;
+	int ret = 0;
+
+	ret = kstrtoul(buf, 16, &val);
+	if (ret)
+		return ret;
+
+	spin_lock(&drvdata->spinlock);
+	drvdata->stmspscr = val;
+
+	if (drvdata->enable) {
+		CS_UNLOCK(drvdata->base);
+		/* Process as per ARM's TRM recommendation */
+		stmsper = readl_relaxed(drvdata->base + STMSPER);
+		writel_relaxed(0x0, drvdata->base + STMSPER);
+		writel_relaxed(drvdata->stmspscr, drvdata->base + STMSPSCR);
+		writel_relaxed(stmsper, drvdata->base + STMSPER);
+		CS_LOCK(drvdata->base);
+	}
+	spin_unlock(&drvdata->spinlock);
+
+	return size;
+}
+static DEVICE_ATTR_RW(port_select);
+
+static ssize_t port_enable_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+	unsigned long val;
+
+	if (!drvdata->enable) {
+		val = drvdata->stmsper;
+	} else {
+		spin_lock(&drvdata->spinlock);
+		val = readl_relaxed(drvdata->base + STMSPER);
+		spin_unlock(&drvdata->spinlock);
+	}
+
+	return scnprintf(buf, PAGE_SIZE, "%#lx\n", val);
+}
+
+static ssize_t port_enable_store(struct device *dev,
+				 struct device_attribute *attr,
+				 const char *buf, size_t size)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+	unsigned long val;
+	int ret = 0;
+
+	ret = kstrtoul(buf, 16, &val);
+	if (ret)
+		return ret;
+
+	spin_lock(&drvdata->spinlock);
+	drvdata->stmsper = val;
+
+	if (drvdata->enable) {
+		CS_UNLOCK(drvdata->base);
+		writel_relaxed(drvdata->stmsper, drvdata->base + STMSPER);
+		CS_LOCK(drvdata->base);
+	}
+	spin_unlock(&drvdata->spinlock);
+
+	return size;
+}
+static DEVICE_ATTR_RW(port_enable);
+
+static ssize_t entities_show(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+	ssize_t len;
+
+	len = bitmap_scnprintf(buf, PAGE_SIZE, drvdata->entities,
+			       STM_ENTITY_MAX);
+
+	if (PAGE_SIZE - len < 2)
+		len = -EINVAL;
+	else
+		len += scnprintf(buf + len, 2, "\n");
+
+	return len;
+}
+
+static ssize_t entities_store(struct device *dev,
+			      struct device_attribute *attr,
+			      const char *buf, size_t size)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+	unsigned long val1, val2;
+
+	if (sscanf(buf, "%lx %lx", &val1, &val2) != 2)
+		return -EINVAL;
+
+	if (val1 >= STM_ENTITY_MAX)
+		return -EINVAL;
+
+	if (val2)
+		__set_bit(val1, drvdata->entities);
+	else
+		__clear_bit(val1, drvdata->entities);
+
+	return size;
+}
+static DEVICE_ATTR_RW(entities);
+
+static ssize_t status_show(struct device *dev,
+			   struct device_attribute *attr, char *buf)
+{
+	int ret;
+	unsigned long flags;
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+
+	ret = clk_prepare_enable(drvdata->clk);
+	if (ret)
+		return ret;
+
+	spin_lock_irqsave(&drvdata->spinlock, flags);
+
+	CS_UNLOCK(drvdata->base);
+	ret = sprintf(buf,
+		      "STMTCSR:\t0x%08x\n"
+		      "STMTSFREQR:\t0x%08x\n"
+		      "STMTSYNCR:\t0x%08x\n"
+		      "STMSPER:\t0x%08x\n"
+		      "STMSPTER:\t0x%08x\n"
+		      "STMPRIVMASKR:\t0x%08x\n"
+		      "STMSPSCR:\t0x%08x\n"
+		      "STMSPMSCR:\t0x%08x\n"
+		      "STMFEAT1R:\t0x%08x\n"
+		      "STMFEAT2R:\t0x%08x\n"
+		      "STMFEAT3R:\t0x%08x\n"
+		      "STMDEVID:\t0x%08x\n",
+		      readl_relaxed(drvdata->base + STMTCSR),
+		      readl_relaxed(drvdata->base + STMTSFREQR),
+		      readl_relaxed(drvdata->base + STMSYNCR),
+		      readl_relaxed(drvdata->base + STMSPER),
+		      readl_relaxed(drvdata->base + STMSPTER),
+		      readl_relaxed(drvdata->base + STMPRIVMASKR),
+		      readl_relaxed(drvdata->base + STMSPSCR),
+		      readl_relaxed(drvdata->base + STMSPMSCR),
+		      readl_relaxed(drvdata->base + STMSPFEAT1R),
+		      readl_relaxed(drvdata->base + STMSPFEAT2R),
+		      readl_relaxed(drvdata->base + STMSPFEAT3R),
+		      readl_relaxed(drvdata->base + CORESIGHT_DEVID));
+
+	CS_LOCK(drvdata->base);
+	spin_unlock_irqrestore(&drvdata->spinlock, flags);
+	clk_disable_unprepare(drvdata->clk);
+
+	return ret;
+}
+static DEVICE_ATTR_RO(status);
+
+static ssize_t traceid_show(struct device *dev,
+			    struct device_attribute *attr, char *buf)
+{
+	unsigned long val;
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+
+	val = drvdata->traceid;
+	return sprintf(buf, "%#lx\n", val);
+}
+
+static ssize_t traceid_store(struct device *dev,
+			     struct device_attribute *attr,
+			     const char *buf, size_t size)
+{
+	int ret;
+	unsigned long val;
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+
+	ret = kstrtoul(buf, 16, &val);
+	if (ret)
+		return ret;
+
+	/* traceid field is 7bit wide on STM32 */
+	drvdata->traceid = val & 0x7f;
+	return size;
+}
+static DEVICE_ATTR_RW(traceid);
+
+static struct attribute *coresight_stm_attrs[] = {
+	&dev_attr_hwevent_enable.attr,
+	&dev_attr_hwevent_select.attr,
+	&dev_attr_port_enable.attr,
+	&dev_attr_port_select.attr,
+	&dev_attr_entities.attr,
+	&dev_attr_status.attr,
+	&dev_attr_traceid.attr,
+	NULL,
+};
+ATTRIBUTE_GROUPS(coresight_stm);
+
+static int stm_get_resource_byname(struct device_node *np,
+				   char *ch_base, struct resource *res)
+{
+	const char *name = NULL;
+	int index = 0, found = 0;
+
+	while (!of_property_read_string_index(np, "reg-names", index, &name)) {
+		if (strcmp(ch_base, name)) {
+			index++;
+			continue;
+		}
+
+		/* We have a match and @index is where it's at */
+		found = 1;
+		break;
+	}
+
+	if (!found)
+		return -EINVAL;
+
+	return of_address_to_resource(np, index, res);
+}
+
+static u32 stm_fundamental_data_size(struct stm_drvdata *drvdata)
+{
+	u32 stmspfeat2r;
+
+	stmspfeat2r = readl_relaxed(drvdata->base + STMSPFEAT2R);
+	return BMVAL(stmspfeat2r, 12, 15);
+}
+
+static u32 stm_num_stimulus_port(struct stm_drvdata *drvdata)
+{
+	u32 numsp;
+
+	numsp = readl_relaxed(drvdata->base + CORESIGHT_DEVID);
+	/*
+	 * NUMPS in STMDEVID is 17 bit long and if equal to 0x0,
+	 * 32 stimulus ports are supported.
+	 */
+	numsp &= 0x1ffff;
+	if (!numsp)
+		numsp = STM_32_CHANNEL;
+	return numsp;
+}
+
+static void stm_init_default_data(struct stm_drvdata *drvdata)
+{
+	/* Don't use port selection */
+	drvdata->stmspscr = 0x0;
+	/*
+	 * Enable all channel regardless of their number.  When port
+	 * selection isn't used (see above) STMSPER applies to all
+	 * 32 channel group available, hence setting all 32 bits to 1
+	 */
+	drvdata->stmsper = ~0x0;
+
+	/*
+	 * Select arbitrary value to start with.  If there is a conflict
+	 * with other tracers the framework will deal with it.
+	 */
+	drvdata->traceid = 0x20;
+
+	bitmap_zero(drvdata->entities, STM_ENTITY_MAX);
+}
+
+static int stm_probe(struct amba_device *adev, const struct amba_id *id)
+{
+	int ret;
+	void __iomem *base;
+	unsigned long *bitmap;
+	struct device *dev = &adev->dev;
+	struct coresight_platform_data *pdata = NULL;
+	struct stm_drvdata *drvdata;
+	struct resource *res = &adev->res;
+	struct resource ch_res;
+	size_t res_size, bitmap_size;
+	struct coresight_desc *desc;
+	struct device_node *np = adev->dev.of_node;
+
+	if (np) {
+		pdata = of_get_coresight_platform_data(dev, np);
+		if (IS_ERR(pdata))
+			return PTR_ERR(pdata);
+		adev->dev.platform_data = pdata;
+	}
+	drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
+	if (!drvdata)
+		return -ENOMEM;
+
+	/* Store the driver data pointer for use in exported functions */
+	stmdrvdata = drvdata;
+	drvdata->dev = &adev->dev;
+	dev_set_drvdata(dev, drvdata);
+
+	base = devm_ioremap_resource(dev, res);
+	if (IS_ERR(base))
+		return PTR_ERR(base);
+	drvdata->base = base;
+
+	ret = stm_get_resource_byname(np, "stm-channel-base", &ch_res);
+	if (ret)
+		return ret;
+
+	base = devm_ioremap_resource(dev, &ch_res);
+	if (IS_ERR(base))
+		return PTR_ERR(base);
+	drvdata->chs.base = base;
+
+	ret = clk_prepare_enable(drvdata->clk);
+	if (ret)
+		return ret;
+
+	drvdata->write_64bit = stm_fundamental_data_size(drvdata);
+
+	if (boot_nr_channel) {
+		drvdata->numsp = boot_nr_channel;
+		res_size = min((resource_size_t)(boot_nr_channel *
+				  BYTES_PER_CHANNEL), resource_size(res));
+		bitmap_size = boot_nr_channel * sizeof(long);
+	} else {
+		drvdata->numsp = stm_num_stimulus_port(drvdata);
+		res_size = min((resource_size_t)(drvdata->numsp *
+				 BYTES_PER_CHANNEL), resource_size(res));
+		bitmap_size = drvdata->numsp * sizeof(long);
+	}
+
+	clk_disable_unprepare(drvdata->clk);
+
+	bitmap = devm_kzalloc(dev, bitmap_size, GFP_KERNEL);
+	if (!bitmap)
+		return -ENOMEM;
+	drvdata->chs.bitmap = bitmap;
+
+	spin_lock_init(&drvdata->spinlock);
+
+	drvdata->clk = adev->pclk;
+
+	stm_init_default_data(drvdata);
+
+	desc = devm_kzalloc(dev, sizeof(*desc), GFP_KERNEL);
+	if (!desc)
+		return -ENOMEM;
+
+	desc->type = CORESIGHT_DEV_TYPE_SOURCE;
+	desc->subtype.source_subtype = CORESIGHT_DEV_SUBTYPE_SOURCE_SOFTWARE;
+	desc->ops = &stm_cs_ops;
+	desc->pdata = pdata;
+	desc->dev = dev;
+	desc->groups = coresight_stm_groups;
+	drvdata->csdev = coresight_register(desc);
+	if (IS_ERR(drvdata->csdev))
+		return PTR_ERR(drvdata->csdev);
+
+	drvdata->miscdev.name = pdata->name;
+	drvdata->miscdev.minor = MISC_DYNAMIC_MINOR;
+	drvdata->miscdev.fops = &stm_fops;
+	ret = misc_register(&drvdata->miscdev);
+	if (ret)
+		goto err;
+
+	dev_info(drvdata->dev, "STM initialized\n");
+
+	return 0;
+err:
+	coresight_unregister(drvdata->csdev);
+	return ret;
+}
+
+static int stm_remove(struct amba_device *adev)
+{
+	struct stm_drvdata *drvdata = amba_get_drvdata(adev);
+
+	misc_deregister(&drvdata->miscdev);
+	coresight_unregister(drvdata->csdev);
+	return 0;
+}
+
+static struct amba_id stm_ids[] = {
+	{
+		.id     = 0x0003b962,
+		.mask   = 0x0003ffff,
+	},
+	{ 0, 0},
+};
+
+static struct amba_driver stm_driver = {
+	.drv = {
+		.name   = "coresight-stm",
+		.owner	= THIS_MODULE,
+	},
+	.probe          = stm_probe,
+	.remove         = stm_remove,
+	.id_table	= stm_ids,
+};
+
+module_amba_driver(stm_driver);
+
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("CoreSight System Trace Macrocell driver");
diff --git a/include/linux/coresight-stm.h b/include/linux/coresight-stm.h
new file mode 100644
index 000000000000..98d57eb85adf
--- /dev/null
+++ b/include/linux/coresight-stm.h
@@ -0,0 +1,35 @@
+#ifndef __LINUX_CORESIGHT_STM_H_
+#define __LINUX_CORESIGHT_STM_H_
+
+#include <uapi/linux/coresight-stm.h>
+
+#define stm_log_inv(entity_id, ch_id, data, size)		\
+	stm_trace(STM_OPTION_NONE, ch_id, entity_id, data, size)
+
+#define stm_log_inv_ts(entity_id, ch_id, data, size)		\
+	stm_trace(STM_OPTION_TIMESTAMPED, ch_id, entity_id,	\
+		  data, size)
+
+#define stm_log_gtd(entity_id, ch_id, data, size)		\
+	stm_trace(STM_OPTION_GUARANTEED, ch_id, entity_id,	\
+		  data, size)
+
+#define stm_log_gtd_ts(entity_id, ch_id, data, size)		\
+	stm_trace(STM_OPTION_GUARANTEED | STM_OPTION_TIMESTAMPED,	\
+		  ch_id, entity_id, data, size)
+
+#define stm_log(entity_id, ch_id, data, size)				\
+	stm_log_inv_ts(entity_id, ch_id, data, size)
+
+#ifdef CONFIG_CORESIGHT_STM
+extern int stm_trace(u32 options, int channel_id,
+		     u8 entity_id, const void *data, u32 size);
+#else
+static inline int stm_trace(u32 options, int channel_id,
+			    u8 entity_id, const void *data, u32 size)
+{
+	return 0;
+}
+#endif
+
+#endif
diff --git a/include/uapi/linux/coresight-stm.h b/include/uapi/linux/coresight-stm.h
new file mode 100644
index 000000000000..d39231d50127
--- /dev/null
+++ b/include/uapi/linux/coresight-stm.h
@@ -0,0 +1,23 @@
+#ifndef __UAPI_CORESIGHT_STM_H_
+#define __UAPI_CORESIGHT_STM_H_
+
+enum {
+	STM_ENTITY_NONE			= 0x00,
+	STM_ENTITY_TRACE_PRINTK		= 0x01,
+	STM_ENTITY_TRACE_USPACE		= 0x02,
+	STM_ENTITY_MAX			= 0xFF,
+};
+
+enum {
+	STM_IOCTL_NONE			= 0x00,
+	STM_IOCTL_SET_OPTIONS		= 0x01,
+	STM_IOCTL_GET_OPTIONS		= 0x10,
+};
+
+enum {
+	STM_OPTION_NONE			= 0x0,
+	STM_OPTION_TIMESTAMPED		= 0x08,
+	STM_OPTION_GUARANTEED		= 0x80,
+};
+
+#endif
-- 
1.9.1


^ permalink raw reply related

* Re: [PATCH 01/13] kdbus: add documentation
From: Andy Lutomirski @ 2015-02-04 23:03 UTC (permalink / raw)
  To: Daniel Mack
  Cc: Arnd Bergmann, Ted Ts'o, Michael Kerrisk, Linux API,
	One Thousand Gnomes, Austin S Hemmelgarn, Tom Gundersen,
	Greg Kroah-Hartman, linux-kernel, Eric W. Biederman,
	David Herrmann, Djalal Harouni, Johannes Stezenbach,
	Christoph Hellwig
In-Reply-To: <54D09E6B.2020903-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>

On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org> wrote:
> Hi Andy,
>
> On 02/02/2015 09:12 PM, Andy Lutomirski wrote:
>> On Feb 2, 2015 1:34 AM, "Daniel Mack" <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org> wrote:
>
>>> That's right, but again - if an application wants to gather this kind of
>>> information about tasks it interacts with, it can do so today by looking
>>> at /proc or similar sources. Desktop machines do exactly that already,
>>> and the kernel code executed in such cases very much resembles that in
>>> metadata.c, and is certainly not cheaper. kdbus just makes such
>>> information more accessible when requested. Which information is
>>> collected is defined by bit-masks on both the sender and the receiver
>>> connection, and most applications will effectively only use a very
>>> limited set by default if they go through one of the more high-level
>>> libraries.
>>
>> I should rephrase a bit.  Kdbus doesn't require use of send-time
>> metadata.  It does, however, strongly encourage it, and it sounds like
>
> On the kernel level, kdbus just *offers* that, just like sockets offer
> SO_PASSCRED. On the userland level, kdbus helps applications get that
> information race-free, easier and faster than they would otherwise.
>
>> systemd and other major users will use send-time metadata.  Once that
>> happens, it's ABI (even if it's purely in userspace), and changing it
>> is asking for security holes to pop up.  So you'll be mostly stuck
>> with it.
>
> We know we can't break the ABI. At most, we could deprecate item types
> and introduce new ones, but we want to avoid that by all means of
> course. However, I fail to see how that is related to send time
> metadata, or even to kdbus in general, as all ABIs have to be kept stable.
>
>> Do you have some simple benchmark code you can share?  I'd like to
>> play with it a bit.
>
> Sure, it's part of the self-test suite. Call it with "-t benchmark" to
> run the benchmark as isolated test with verbose output. The code for
> that lives in test-benchmark.c.

I see "latencies" of around 20 microseconds with lockdep and context
tracking off.  For example:

stats  (UNIX): 226730 packets processed, latency (nsecs) min/max/avg
 3845 //   34828 //    4069
stats (KDBUS): 37103 packets processed, latency (nsecs) min/max/avg
19123 //   99660 //   20696

This is IMO not very good.  With memfds off:

stats  (UNIX): 226061 packets processed, latency (nsecs) min/max/avg
 3885 //   32019 //    4079
stats (KDBUS): 83284 packets processed, latency (nsecs) min/max/avg
10525 //   42578 //   10932

With memfds off and the payload set to 8 bytes:

stats (KDBUS): 77669 packets processed, latency (nsecs) min/max/avg
9963 //   64325 //   11645
stats  (UNIX): 253695 packets processed, latency (nsecs) min/max/avg
 2986 //   56094 //    3565

Am I missing something here?  This is slow enough that a lightweight
userspace dbus daemon should be able to outperform kdbus, or at least
come very close.

It would be kind of nice to know how long just the send call takes, too.

--Andy

^ permalink raw reply

* Re: [PATCH 01/13] kdbus: add documentation
From: David Herrmann @ 2015-02-05  0:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Daniel Mack, Arnd Bergmann, Ted Ts'o, Michael Kerrisk,
	Linux API, One Thousand Gnomes, Austin S Hemmelgarn,
	Tom Gundersen, Greg Kroah-Hartman, linux-kernel,
	Eric W. Biederman, Djalal Harouni, Johannes Stezenbach,
	Christoph Hellwig
In-Reply-To: <CALCETrXVDWJOvOTeTwCqrF6sn8e8XQfZSUbgg6hRnv+h-eX1Uw@mail.gmail.com>

Hi

On Thu, Feb 5, 2015 at 12:03 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> I see "latencies" of around 20 microseconds with lockdep and context
> tracking off.  For example:

Without metadata nor memfd transmission, I get 2.5us for kdbus, 1.5us
for UDS (8k payload). With 8-byte payloads, I get 2.2us and 1.2us. I
suspect you enabled metadata transmission, which I think is not a fair
comparison.

A few notes on that:

* kdbus is a bus layer. We don't intend to replace UDS, but improve
dbus. Comparing roundtrip times with UDS is tempting, but in no way
fair. To the very least, a bus layer has to perform peer-lookup, which
UDS does not have to do. Imo, 2.5us vs. 1.5us is already pretty nice.
Compare this to ~77us for dbus1 without marshaling.

* We have not optimized kdbus code-paths for speed, yet. Our main
concerns are algorithmic challenges, and we believe they've been
improved considerably with kdbus. I have constantly measured kdbus
performance with 'perf' and flame-graphs, and there're a lot of
possible optimizations (especially on locking). However, I think this
can be done afterwards just fine. Neither API nor ioctl overhead has
shown up in my measurements. If anyone has counter evidence, please
let us know. But I'm a bit reluctant to change our API solely based on
performance guesses.

* We're about 50% slower than UDS on 1-byte transmissions. With 32k
we're on-par. How can a lightweight user-space daemon even get close
to that?

* Broadcast performance is a completely different story. SEND gets
around 30% faster compared to kdbus unicasts (as most of the
control-paths are only taken once per message, instead of once per
destination).

* test-benchmark.c does performance tests in a single process. If the
bus-layer is implemented in user-space, you need to account for
context-switches and task wakeups. My UDS and pipe round-trip latency
tests got around 3x slower if done cross processes (3.7us instead of
1.2us). With a user-space daemon, those slow-downs are taken two times
more often for each roundtrip.

* Process time is accounted on the sender, instead of a shared process
(dbus-daemon). Broadcasts will thus no longer consume time-slices of
dbus-daemon, but only the sender's.


With kdbus, we implement a bus-layer. This is our only target! If your
target environment does not require a bus, then don't use kdbus. We
don't intend to replace UDS. On a bus-layer, we need peer-discovery,
policy-handling, destination-lookups, broadcast-management and more.
Pipes/UDS do not provide any of this.
I cannot see how any other existing bus-implementation comes even
close to kdbus, performance-wise. If someone does, please let us know!

Thanks
David

^ permalink raw reply

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
From: Minchan Kim @ 2015-02-05  1:07 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Vlastimil Babka, Kirill A. Shutemov, Dave Hansen, Mel Gorman,
	linux-mm@kvack.org, Andrew Morton, lkml, Linux API, linux-man,
	Hugh Dickins
In-Reply-To: <CAKgNAkhNbHQX7RukSsSe3bMqY11f493rYbDpTOA2jH7vsziNww@mail.gmail.com>

Hello,

On Wed, Feb 04, 2015 at 08:24:27PM +0100, Michael Kerrisk (man-pages) wrote:
> On 4 February 2015 at 18:02, Vlastimil Babka <vbabka@suse.cz> wrote:
> > On 02/04/2015 03:00 PM, Michael Kerrisk (man-pages) wrote:
> >>
> >> Hello Vlastimil,
> >>
> >> On 4 February 2015 at 14:46, Vlastimil Babka <vbabka@suse.cz> wrote:
> >>>>>
> >>>>> - that covers mlocking ok, not sure if the rest fits the "shared pages"
> >>>>> case
> >>>>> though. I dont see any check for other kinds of shared pages in the
> >>>>> code.
> >>>>
> >>>>
> >>>> Agreed. "shared" here seems confused. I've removed it. And I've
> >>>> added mention of "Huge TLB pages" for this error.
> >>>
> >>>
> >>> Thanks.
> >>
> >>
> >> I also added those cases for MADV_REMOVE, BTW.
> >
> >
> > Right. There's also the following for MADV_REMOVE that needs updating:
> >
> > "Currently, only shmfs/tmpfs supports this; other filesystems return with
> > the error ENOSYS."
> >
> > - it's not just shmem/tmpfs anymore. It should be best to refer to
> > fallocate(2) option FALLOC_FL_PUNCH_HOLE which seems to be (more) up to
> > date.
> >
> > - AFAICS it doesn't return ENOSYS but EOPNOTSUPP. Also neither error code is
> > listed in the ERRORS section.
> 
> Yup, I recently added that as well, based on a patch from Jan Chaloupka.
> 
> >>>>>>> - The word "will result" did sound as a guarantee at least to me. So
> >>>>>>> here it
> >>>>>>> could be changed to "may result (unless the advice is ignored)"?
> >>>>>>
> >>>>>> It's too late to fix documentation. Applications already depends on
> >>>>>> the
> >>>>>> beheviour.
> >>>>>
> >>>>> Right, so as long as they check for EINVAL, it should be safe. It
> >>>>> appears
> >>>>> that
> >>>>> jemalloc does.
> >>>>
> >>>> So, first a brief question: in the cases where the call does not error
> >>>> out,
> >>>> are we agreed that in the current implementation, MADV_DONTNEED will
> >>>> always result in zero-filled pages when the region is faulted back in
> >>>> (when we consider pages that are not backed by a file)?
> >>>
> >>> I'd agree at this point.
> >>
> >> Thanks for the confirmation.
> >>
> >>> Also we should probably mention anonymously shared pages (shmem). I think
> >>> they behave the same as file here.
> >>
> >> You mean tmpfs here, right? (I don't keep all of the synonyms straight.)
> >
> > shmem is tmpfs (that by itself would fit under "files" just fine), but also
> > sys V segments created by shmget(2) and also mappings created by mmap with
> > MAP_SHARED | MAP_ANONYMOUS. I'm not sure if there's a single manpage to
> > refer to the full list.
> 
> So, how about this text:
> 
>               After a successful MADV_DONTNEED operation, the seman‐
>               tics  of  memory  access  in  the specified region are
>               changed: subsequent accesses of  pages  in  the  range
>               will  succeed,  but will result in either reloading of
>               the memory contents from the  underlying  mapped  file
>               (for  shared file mappings, shared anonymous mappings,
>               and shmem-based techniques such  as  System  V  shared
>               memory  segments)  or  zero-fill-on-demand  pages  for
>               anonymous private mappings.

Hmm, I'd like to clarify.

Whether it was intention or not, some of userspace developers thought
about that syscall drop pages instantly if was no-error return so that
they will see more free pages(ie, rss for the process will be decreased)
with keeping the VMA. Can we rely on it?

And we should make error section, too.
"locked" covers mlock(2) and you said you will add hugetlb. Then,
VM_PFNMAP? In that case, it fails. How can we say about VM_PFNMAP?
special mapping for some drivers?

One more thing, "The kernel is free to ignore the advice".
It conflicts "This call does not influence the semantics of the
application (except in the case of MADV_DONTNEED)" so
is it okay we can believe "The kernel is free to ingmore the advise
except MADV_DONTNEED"?

Thanks.
-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH RFC v2 0/7] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1
From: Fam Zheng @ 2015-02-05  1:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Alexander Viro, Andrew Morton, Kees Cook, David Herrmann,
	Alexei Starovoitov, Miklos Szeredi, David Drysdale, Oleg Nesterov,
	David S. Miller, Vivek Goyal, Mike Frysinger, Theodore Ts'o,
	Heiko Carstens, Rasmus Villemoes, Rashika Kheria, Hugh Dickins,
	Mathieu Desnoyers, Peter Zijlstra <pe>
In-Reply-To: <CALCETrV731w37pxM0fzFZ3HHkn=9uoJX0-Rj6XqOf+hCRfnx8w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Wed, 02/04 13:38, Andy Lutomirski wrote:
> On Wed, Feb 4, 2015 at 2:36 AM, Fam Zheng <famz-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > 2) epoll_pwait1
> > ---------------
> >
> > NAME
> >        epoll_pwait1 - wait for an I/O event on an epoll file descriptor
> >
> > SYNOPSIS
> >
> >        #include <sys/epoll.h>
> >
> >        int epoll_pwait1(int epfd, int flags,
> >                         struct epoll_event *events,
> >                         int maxevents,
> >                         struct timespec *timeout,
> >                         struct sigargs *sig);
> >
> > DESCRIPTION
> >
> >        The epoll_pwait1 system call differs from epoll_pwait only in parameter
> >        types. The first difference is timeout, a pointer to timespec structure
> >        which allows nanosecond presicion; the second difference, which should
> >        probably be wrapper by glibc and only expose a sigset_t pointer as in
> >        pselect6.
> >
> >        If timeout is NULL, it's treated as if 0 is specified in epoll_pwait
> >        (return immediately). Otherwise it's converted to nanosecond scalar,
> >        again, with the same convention as epoll_pwait's timeout.
> 
> Is the timeout absolute or relative?

Relative. Will document it. We can add a first flag for absolute timeout later.

Thanks.

Fam
> 
> I'd kind of like the ability to set timeouts on multiple clocks at the
> same time, but I can live without that.

Please see my reply to Michael.

Fam

^ permalink raw reply

* Re: [PATCH RFC v2 0/7] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1
From: Fam Zheng @ 2015-02-05  1:52 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86-DgEjT+Ai2ygdnm+yROfE0A, Alexander Viro,
	Andrew Morton, Kees Cook, Andy Lutomirski, David Herrmann,
	Alexei Starovoitov, Miklos Szeredi, David Drysdale, Oleg Nesterov,
	David S. Miller, Vivek Goyal, Mike Frysinger, Theodore Ts'o,
	Heiko Carstens, Rasmus Villemoes, Rashika Kheria, Hugh Dickins,
	Mathieu Desnoyers, Peter Zijlstra <peter>
In-Reply-To: <54D2142B.8090105-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Wed, 02/04 13:44, Michael Kerrisk (man-pages) wrote:
> Hello Fam Zheng,
> 
> On 02/04/2015 11:36 AM, Fam Zheng wrote:
> > Changes from v1 (https://lkml.org/lkml/2015/1/20/189):
> > 
> >   - As discussed in previous thread [1], split the call to epoll_ctl_batch and
> >     epoll_pwait. [Michael]
> > 
> >   - Fix memory leaks. [Omar]
> > 
> >   - Add a short comment about the ignored copy_to_user failure. [Omar]
> > 
> >   - Cover letter rewritten.
> > 
> > This adds two new system calls as described below. I mentioned glibc wrapping
> > of sigarg in epoll_pwait1 description, so don't read it as end user man page
> > yet.
> 
> Fair enough. But I think it would be helpful to say a little more already.
> See my comment below.
> 
> > One caveat is the possible failure of the last copy_to_user in epoll_ctl_batch,
> > which returns per command error code. Ideas to improve that are welcome!
> >
> > 1) epoll_ctl_batch
> > ------------------
> > 
> > NAME
> >        epoll_ctl_batch - modify an epoll descriptor in batch
> > 
> > SYNOPSIS
> > 
> >        #include <sys/epoll.h>
> > 
> >        int epoll_ctl_batch(int epfd, int flags,
> >                            int ncmds, struct epoll_ctl_cmd *cmds);
> > 
> > DESCRIPTION
> > 
> >        The system call performs a batch of epoll_ctl operations. It allows
> >        efficient update of events on this epoll descriptor.
> > 
> >        Flags is reserved and must be 0.
> > 
> >        Each operation is specified as an element in the cmds array, defined as:
> > 
> >            struct epoll_ctl_cmd {
> > 
> >                   /* Reserved flags for future extension, must be 0. */
> >                   int flags;
> > 
> >                   /* The same as epoll_ctl() op parameter. */
> >                   int op;
> > 
> >                   /* The same as epoll_ctl() fd parameter. */
> >                   int fd;
> > 
> >                   /* The same as the "events" field in struct epoll_event. */
> >                   uint32_t events;
> > 
> >                   /* The same as the "data" field in struct epoll_event. */
> >                   uint64_t data;
> > 
> >                   /* Output field, will be set to the return code after this
> >                    * command is executed by kernel */
> >                   int error_hint;
> 
> Why 'error_hint', rather than just stay 'status' or 'result'? I mean
> is it really a "hint"? Also, it can be 0 meaning "success" (no error).

Because the convention is weak here: the copy_to_user, to assign values to this
field in user provided array, could (very unlikely) fail, at which point the
commands are already executed so there is no return. This corner case
inconsistency makes it hard to be a contract.

> 
> >            };
> > 
> >        Commands are executed in their order in cmds, and if one of them failed,
> >        the rest after it are not tried.
> > 
> >        Not that this call isn't atomic in terms of updating the epoll
> >        descriptor, which means a second epoll_ctl or epoll_ctl_batch call
> >        during the first epoll_ctl_batch may make the operation sequence
> >        interleaved. However, each single epoll_ctl_cmd operation has the same
> >        semantics as a epoll_ctl call.
> 
> (Good to include that warning!)
> 
> > RETURN VALUE
> > 
> >        If one or more of the parameters are incorrect, -1 is returned and errno
> >        is set appropriately. Otherwise, the number of succeeded commands is
> >        returned.
> > 
> >        Each error_hint field may be set to the error code or 0, depending on
> >        the result of the command. If there is some error in returning the error
> >        to user, it may also be unchanged, even though the command may actually
> >        be executed. In this case, it's still ensured that the i-th (i = ret)
> >        command is the failed command.
> 
> Sorry -- I'm not clear here. Can you say some more here? What sort
> of error might there be when "returning the error to the user"?

As explained above. See also the last comment of Omar:

https://lkml.org/lkml/2015/1/21/103

<...>

> > DESCRIPTION
> > 
> >        The epoll_pwait1 system call differs from epoll_pwait only in parameter
> >        types. The first difference is timeout, a pointer to timespec structure
> >        which allows nanosecond presicion; the second difference, which should
> >        probably be wrapper by glibc and only expose a sigset_t pointer as in
> >        pselect6.
> 
> Here I think it still helps to explain that 'struct sigags' is a sigset_t* +
> the size of the pointed-to set.

Yes, will add.

> 
> >        If timeout is NULL, it's treated as if 0 is specified in epoll_pwait
> 
> The convention I would expect here is that NULL means infinite timeout,
> like select(), and timeout == {0,0} would get the "return immediately"
> behavior. Why did you choose your convention? And, how does one otherwise 
> request an infinite timeout?

I'm wrong. I should follow the conventions of pselect and ppoll. Thanks for
catching that.

> 
> >        (return immediately). Otherwise it's converted to nanosecond scalar,
> >        again, with the same convention as epoll_pwait's timeout.
> > 
> > RETURN VALUE
> > 
> >        The same as said in epoll_pwait(2).
> > 
> > ERRORS
> > 
> >        The same as said in man epoll_pwait(2), plus:
> > 
> >        EINVAL flags is not zero.
> > 
> > 
> > CONFORMING TO
> > 
> >        epoll_pwait1() is Linux-specific.
> > 
> > SEE ALSO
> > 
> >        epoll_create(2), epoll_ctl(2), epoll_wait(2), epoll_pwait(2), epoll(7)
> 
> In your earlier patch set, there was the ability to select the clock
> used for timeouts. Why did this go away? I'm not sure whether we need
> that functionality or not, but it would be good to know why you
> dropped it this time.
> 

No room for another "int clockid". Options:

0) Don't have it.

1) "Or" into lower bits of flags.

2) Encode into flags bits.

3) Squash "int clockid" in the last argument (currently "sig" but need to name
it "spec", again).

Which one do you prefer?

Fam

^ permalink raw reply

* RE: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask
From: Weiny, Ira @ 2015-02-05  2:54 UTC (permalink / raw)
  To: Jason Gunthorpe, Yann Droneaud
  Cc: Sagi Grimberg, Shachar Raindel, Eli Cohen, Haggai Eran,
	Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20150129211256.GA22099-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

> 
> On Thu, Jan 29, 2015 at 09:50:38PM +0100, Yann Droneaud wrote:
> 
> > Anyway, I recognize that uverb way of abusing write() syscall is
> > borderline (at best) regarding other Linux subsystems and Unix
> > paradigm in general. But it's not enough to screw it more.
> 
> Then we must return the correct output size explicitly in the struct.

I was thinking this very same thing as I read through this thread.

I too would like to avoid the use of comp_masks if at all possible.  The query call seems to be a verb where the structure size is all you really need to know the set of values returned.

As Jason says, other calls may require the comp_mask where 0 is not sufficient to indicate "missing".

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC] Implement ambient capability set.
From: Michael Kerrisk @ 2015-02-05  7:20 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Serge E. Hallyn, Andrew G. Morgan, Andy Lutomirski, Serge Hallyn,
	Serge Hallyn, Jonathan Corbet, Aaron Jones, Ted Ts'o,
	LSM List, lkml, Andrew Morton, Linux API
In-Reply-To: <alpine.DEB.2.11.1502041242380.17411@gentwo.org>

[CC += linux-api@vger.kernel.org]

Christoph,

Since this is a kernel-user-space API change, please CC linux-api@.
The kernel source file Documentation/SubmitChecklist notes that all
Linux kernel patches that change userspace interfaces should be CCed
to linux-api@vger.kernel.org, so that the various parties who are
interested in API changes are informed. For further information, see
https://www.kernel.org/doc/man-pages/linux-api-ml.html

Thanks,

Michael


On Wed, Feb 4, 2015 at 7:49 PM, Christoph Lameter <cl@linux.com> wrote:
> An attempt to implement this. Probably missing some fine points:
>
> Subject: [capabilities] Implement ambient capability set.
>
> DRAFT -- untested -- DRAFT
>
> Implement an ambient capabilty set to allow capabilties
> to be inherited with unix semantics used also for other
> attributes.
>
> Implements PR_CAP_AMBIENT. The second argument to prctl
> is a the capability number and the third the desired state.
> 0 for off. Otherwise on.
>
> Serge:
> A new capability set, pA, is empty by default.  You can
> add bits to it using prctl if ns_capable(CAP_SETPCAP) and
> all the new bits are in your pE.  Once set, they stay until
> they are removed using prctl.  At exec, pA' = pA, and
> fI |= pA (after reading fI from disk but before
> calculating pI').
>
> Since the ambient caps "stay on" cap_inheritable does not
> really matter anymore. Simply set the permitted caps when
> the ambient cap is set.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Index: linux/security/commoncap.c
> ===================================================================
> --- linux.orig/security/commoncap.c     2015-02-04 09:44:25.000000000 -0600
> +++ linux/security/commoncap.c  2015-02-04 12:48:44.100471600 -0600
> @@ -353,7 +353,7 @@ static inline int bprm_caps_from_vfs_cap
>                 /*
>                  * pP' = (X & fP) | (pI & fI)
>                  */
> -               new->cap_permitted.cap[i] =
> +               new->cap_permitted.cap[i] = current_cred()->cap_ambient.cap[i] |
>                         (new->cap_bset.cap[i] & permitted) |
>                         (new->cap_inheritable.cap[i] & inheritable);
>
> @@ -577,6 +577,7 @@ skip:
>         }
>
>         new->securebits &= ~issecure_mask(SECURE_KEEP_CAPS);
> +       new->cap_ambient = old->cap_ambient;
>         return 0;
>  }
>
> @@ -933,6 +934,20 @@ int cap_task_prctl(int option, unsigned
>                         new->securebits &= ~issecure_mask(SECURE_KEEP_CAPS);
>                 return commit_creds(new);
>
> +       case PR_CAP_AMBIENT:
> +               if (!ns_capable(current_user_ns(), CAP_SETPCAP))
> +                       return -EPERM;
> +
> +               if (!cap_valid(arg2))
> +                       return -EINVAL;
> +
> +               new =prepare_creds();
> +               if (arg3 == 0)
> +                       cap_lower(new->cap_ambient, arg2);
> +               else
> +                       cap_raise(new->cap_ambient, arg2);
> +               return commit_creds(new);
> +
>         default:
>                 /* No functionality available - continue with default */
>                 return -ENOSYS;
> Index: linux/include/linux/cred.h
> ===================================================================
> --- linux.orig/include/linux/cred.h     2015-02-04 09:39:46.000000000 -0600
> +++ linux/include/linux/cred.h  2015-02-04 12:32:43.719846530 -0600
> @@ -122,6 +122,7 @@ struct cred {
>         kernel_cap_t    cap_permitted;  /* caps we're permitted */
>         kernel_cap_t    cap_effective;  /* caps we can actually use */
>         kernel_cap_t    cap_bset;       /* capability bounding set */
> +       kernel_cap_t    cap_ambient;    /* Ambient capability set */
>  #ifdef CONFIG_KEYS
>         unsigned char   jit_keyring;    /* default keyring to attach requested
>                                          * keys to */
> Index: linux/include/uapi/linux/prctl.h
> ===================================================================
> --- linux.orig/include/uapi/linux/prctl.h       2014-12-12 10:27:49.332800377 -0600
> +++ linux/include/uapi/linux/prctl.h    2015-02-04 12:39:10.651205059 -0600
> @@ -185,4 +185,7 @@ struct prctl_mm_map {
>  #define PR_MPX_ENABLE_MANAGEMENT  43
>  #define PR_MPX_DISABLE_MANAGEMENT 44
>
> +/* Control the ambient capability set */
> +#define PR_CAP_AMBIENT 45
> +
>  #endif /* _LINUX_PRCTL_H */
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



-- 
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/

^ permalink raw reply

* Re: [PATCH RFC v2 0/7] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1
From: Michael Kerrisk (man-pages) @ 2015-02-05  7:44 UTC (permalink / raw)
  To: Fam Zheng
  Cc: mtk.manpages, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Alexander Viro, Andrew Morton, Kees Cook,
	Andy Lutomirski, David Herrmann, Alexei Starovoitov,
	Miklos Szeredi, David Drysdale, Oleg Nesterov, David S. Miller,
	Vivek Goyal, Mike Frysinger, Theodore Ts'o, Heiko Carstens,
	Rasmus Villemoes, Rashika Kheria, Hugh Dickins, Mathieu Desnoyers
In-Reply-To: <20150205015251.GA23497@ad.nay.redhat.com>

Hello Fam Zheng,

On 02/05/2015 02:52 AM, Fam Zheng wrote:
> On Wed, 02/04 13:44, Michael Kerrisk (man-pages) wrote:
>> Hello Fam Zheng,
>>
>> On 02/04/2015 11:36 AM, Fam Zheng wrote:
>>> Changes from v1 (https://lkml.org/lkml/2015/1/20/189):
>>>
>>>   - As discussed in previous thread [1], split the call to epoll_ctl_batch and
>>>     epoll_pwait. [Michael]
>>>
>>>   - Fix memory leaks. [Omar]
>>>
>>>   - Add a short comment about the ignored copy_to_user failure. [Omar]
>>>
>>>   - Cover letter rewritten.
>>>
>>> This adds two new system calls as described below. I mentioned glibc wrapping
>>> of sigarg in epoll_pwait1 description, so don't read it as end user man page
>>> yet.
>>
>> Fair enough. But I think it would be helpful to say a little more already.
>> See my comment below.
>>
>>> One caveat is the possible failure of the last copy_to_user in epoll_ctl_batch,
>>> which returns per command error code. Ideas to improve that are welcome!
>>>
>>> 1) epoll_ctl_batch
>>> ------------------
>>>
>>> NAME
>>>        epoll_ctl_batch - modify an epoll descriptor in batch
>>>
>>> SYNOPSIS
>>>
>>>        #include <sys/epoll.h>
>>>
>>>        int epoll_ctl_batch(int epfd, int flags,
>>>                            int ncmds, struct epoll_ctl_cmd *cmds);
>>>
>>> DESCRIPTION
>>>
>>>        The system call performs a batch of epoll_ctl operations. It allows
>>>        efficient update of events on this epoll descriptor.
>>>
>>>        Flags is reserved and must be 0.
>>>
>>>        Each operation is specified as an element in the cmds array, defined as:
>>>
>>>            struct epoll_ctl_cmd {
>>>
>>>                   /* Reserved flags for future extension, must be 0. */
>>>                   int flags;
>>>
>>>                   /* The same as epoll_ctl() op parameter. */
>>>                   int op;
>>>
>>>                   /* The same as epoll_ctl() fd parameter. */
>>>                   int fd;
>>>
>>>                   /* The same as the "events" field in struct epoll_event. */
>>>                   uint32_t events;
>>>
>>>                   /* The same as the "data" field in struct epoll_event. */
>>>                   uint64_t data;
>>>
>>>                   /* Output field, will be set to the return code after this
>>>                    * command is executed by kernel */
>>>                   int error_hint;
>>
>> Why 'error_hint', rather than just stay 'status' or 'result'? I mean
>> is it really a "hint"? Also, it can be 0 meaning "success" (no error).
> 
> Because the convention is weak here: the copy_to_user, to assign values to this
> field in user provided array, could (very unlikely) fail, at which point the
> commands are already executed so there is no return. This corner case
> inconsistency makes it hard to be a contract.

I still think it's a poor name. Yes, it could unlikely fail.
Still, 'status' or 'result_status' or something like that is better.

>>>            };
>>>
>>>        Commands are executed in their order in cmds, and if one of them failed,
>>>        the rest after it are not tried.
>>>
>>>        Not that this call isn't atomic in terms of updating the epoll
>>>        descriptor, which means a second epoll_ctl or epoll_ctl_batch call
>>>        during the first epoll_ctl_batch may make the operation sequence
>>>        interleaved. However, each single epoll_ctl_cmd operation has the same
>>>        semantics as a epoll_ctl call.
>>
>> (Good to include that warning!)
>>
>>> RETURN VALUE
>>>
>>>        If one or more of the parameters are incorrect, -1 is returned and errno
>>>        is set appropriately. Otherwise, the number of succeeded commands is
>>>        returned.
>>>
>>>        Each error_hint field may be set to the error code or 0, depending on
>>>        the result of the command. If there is some error in returning the error
>>>        to user, it may also be unchanged, even though the command may actually
>>>        be executed. In this case, it's still ensured that the i-th (i = ret)
>>>        command is the failed command.
>>
>> Sorry -- I'm not clear here. Can you say some more here? What sort
>> of error might there be when "returning the error to the user"?
> 
> As explained above. See also the last comment of Omar:
> 
> https://lkml.org/lkml/2015/1/21/103

Okay -- but could you please actually explain this in more detail in the 
man page. Then others do not need to ask the same question.

> <...>
> 
>>> DESCRIPTION
>>>
>>>        The epoll_pwait1 system call differs from epoll_pwait only in parameter
>>>        types. The first difference is timeout, a pointer to timespec structure
>>>        which allows nanosecond presicion; the second difference, which should
>>>        probably be wrapper by glibc and only expose a sigset_t pointer as in
>>>        pselect6.
>>
>> Here I think it still helps to explain that 'struct sigags' is a sigset_t* +
>> the size of the pointed-to set.
> 
> Yes, will add.
> 
>>
>>>        If timeout is NULL, it's treated as if 0 is specified in epoll_pwait
>>
>> The convention I would expect here is that NULL means infinite timeout,
>> like select(), and timeout == {0,0} would get the "return immediately"
>> behavior. Why did you choose your convention? And, how does one otherwise 
>> request an infinite timeout?
> 
> I'm wrong. I should follow the conventions of pselect and ppoll. Thanks for
> catching that.

Good.

>>
>>>        (return immediately). Otherwise it's converted to nanosecond scalar,
>>>        again, with the same convention as epoll_pwait's timeout.
>>>
>>> RETURN VALUE
>>>
>>>        The same as said in epoll_pwait(2).
>>>
>>> ERRORS
>>>
>>>        The same as said in man epoll_pwait(2), plus:
>>>
>>>        EINVAL flags is not zero.
>>>
>>>
>>> CONFORMING TO
>>>
>>>        epoll_pwait1() is Linux-specific.
>>>
>>> SEE ALSO
>>>
>>>        epoll_create(2), epoll_ctl(2), epoll_wait(2), epoll_pwait(2), epoll(7)
>>
>> In your earlier patch set, there was the ability to select the clock
>> used for timeouts. Why did this go away? I'm not sure whether we need
>> that functionality or not, but it would be good to know why you
>> dropped it this time.
>>
> 
> No room for another "int clockid". Options:

Ahh yes. I overlooked that. But again, if your commit message or
the man page said something about why you dropped this argument,
then we would not run around in circles having the same 
conversations repeatedly.

I keep loving this recent quote from akpm:
http://lwn.net/Articles/616226/

> 0) Don't have it.
> 
> 1) "Or" into lower bits of flags.
> 
> 2) Encode into flags bits.
> 
> 3) Squash "int clockid" in the last argument (currently "sig" but need to name
> it "spec", again).

Well, it's up to you. In https://lkml.org/lkml/2015/1/20/189
you initially proposed to have the clockid. Do you need it,
or not? I am agnostic about it. If you do decide you want it,
then I think (3) is the best option.

I'm very pleased that you expand this man page as you go.
But, for the next iteration, I think it would be better to make
it even more complete--it really is helpful to have documentation
as a starting point to discuss the API, and the better the doc,
the better the discussion.

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply

* Re: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask
From: Haggai Eran @ 2015-02-05  8:26 UTC (permalink / raw)
  To: Weiny, Ira, Jason Gunthorpe, Yann Droneaud
  Cc: Sagi Grimberg, Shachar Raindel, Eli Cohen, Roland Dreier,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <2807E5FD2F6FDA4886F6618EAC48510E0CC12C30-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>

On 05/02/2015 04:54, Weiny, Ira wrote:
>>
>> On Thu, Jan 29, 2015 at 09:50:38PM +0100, Yann Droneaud wrote:
>>
>>> Anyway, I recognize that uverb way of abusing write() syscall is
>>> borderline (at best) regarding other Linux subsystems and Unix
>>> paradigm in general. But it's not enough to screw it more.
>>
>> Then we must return the correct output size explicitly in the struct.
> 
> I was thinking this very same thing as I read through this thread.
> 
> I too would like to avoid the use of comp_masks if at all possible.  The query call seems to be a verb where the structure size is all you really need to know the set of values returned.
> 
> As Jason says, other calls may require the comp_mask where 0 is not sufficient to indicate "missing".

Would it be okay to return it in the ib_uverbs_cmd_hdr.out_words? That
would further abuse the write() syscall by writing to the input buffer.
However, the only other alternative I see is to add it explicitly to
every uverb response struct.

Haggai

^ permalink raw reply

* Re: [PATCH RFC v2 0/7] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1
From: Fam Zheng @ 2015-02-05  9:01 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86-DgEjT+Ai2ygdnm+yROfE0A, Alexander Viro,
	Andrew Morton, Kees Cook, Andy Lutomirski, David Herrmann,
	Alexei Starovoitov, Miklos Szeredi, David Drysdale, Oleg Nesterov,
	David S. Miller, Vivek Goyal, Mike Frysinger, Theodore Ts'o,
	Heiko Carstens, Rasmus Villemoes, Rashika Kheria, Hugh Dickins,
	Mathieu Desnoyers, Peter Zijlstra <peter>
In-Reply-To: <54D31F6F.4030301-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Thu, 02/05 08:44, Michael Kerrisk (man-pages) wrote:
> Hello Fam Zheng,
> 
> On 02/05/2015 02:52 AM, Fam Zheng wrote:
> > On Wed, 02/04 13:44, Michael Kerrisk (man-pages) wrote:
> >> Hello Fam Zheng,
> >>
> >> On 02/04/2015 11:36 AM, Fam Zheng wrote:
> >>> Changes from v1 (https://lkml.org/lkml/2015/1/20/189):
> >>>
> >>>   - As discussed in previous thread [1], split the call to epoll_ctl_batch and
> >>>     epoll_pwait. [Michael]
> >>>
> >>>   - Fix memory leaks. [Omar]
> >>>
> >>>   - Add a short comment about the ignored copy_to_user failure. [Omar]
> >>>
> >>>   - Cover letter rewritten.
> >>>
> >>> This adds two new system calls as described below. I mentioned glibc wrapping
> >>> of sigarg in epoll_pwait1 description, so don't read it as end user man page
> >>> yet.
> >>
> >> Fair enough. But I think it would be helpful to say a little more already.
> >> See my comment below.
> >>
> >>> One caveat is the possible failure of the last copy_to_user in epoll_ctl_batch,
> >>> which returns per command error code. Ideas to improve that are welcome!
> >>>
> >>> 1) epoll_ctl_batch
> >>> ------------------
> >>>
> >>> NAME
> >>>        epoll_ctl_batch - modify an epoll descriptor in batch
> >>>
> >>> SYNOPSIS
> >>>
> >>>        #include <sys/epoll.h>
> >>>
> >>>        int epoll_ctl_batch(int epfd, int flags,
> >>>                            int ncmds, struct epoll_ctl_cmd *cmds);
> >>>
> >>> DESCRIPTION
> >>>
> >>>        The system call performs a batch of epoll_ctl operations. It allows
> >>>        efficient update of events on this epoll descriptor.
> >>>
> >>>        Flags is reserved and must be 0.
> >>>
> >>>        Each operation is specified as an element in the cmds array, defined as:
> >>>
> >>>            struct epoll_ctl_cmd {
> >>>
> >>>                   /* Reserved flags for future extension, must be 0. */
> >>>                   int flags;
> >>>
> >>>                   /* The same as epoll_ctl() op parameter. */
> >>>                   int op;
> >>>
> >>>                   /* The same as epoll_ctl() fd parameter. */
> >>>                   int fd;
> >>>
> >>>                   /* The same as the "events" field in struct epoll_event. */
> >>>                   uint32_t events;
> >>>
> >>>                   /* The same as the "data" field in struct epoll_event. */
> >>>                   uint64_t data;
> >>>
> >>>                   /* Output field, will be set to the return code after this
> >>>                    * command is executed by kernel */
> >>>                   int error_hint;
> >>
> >> Why 'error_hint', rather than just stay 'status' or 'result'? I mean
> >> is it really a "hint"? Also, it can be 0 meaning "success" (no error).
> > 
> > Because the convention is weak here: the copy_to_user, to assign values to this
> > field in user provided array, could (very unlikely) fail, at which point the
> > commands are already executed so there is no return. This corner case
> > inconsistency makes it hard to be a contract.
> 
> I still think it's a poor name. Yes, it could unlikely fail.
> Still, 'status' or 'result_status' or something like that is better.
> 
> >>>            };
> >>>
> >>>        Commands are executed in their order in cmds, and if one of them failed,
> >>>        the rest after it are not tried.
> >>>
> >>>        Not that this call isn't atomic in terms of updating the epoll
> >>>        descriptor, which means a second epoll_ctl or epoll_ctl_batch call
> >>>        during the first epoll_ctl_batch may make the operation sequence
> >>>        interleaved. However, each single epoll_ctl_cmd operation has the same
> >>>        semantics as a epoll_ctl call.
> >>
> >> (Good to include that warning!)
> >>
> >>> RETURN VALUE
> >>>
> >>>        If one or more of the parameters are incorrect, -1 is returned and errno
> >>>        is set appropriately. Otherwise, the number of succeeded commands is
> >>>        returned.
> >>>
> >>>        Each error_hint field may be set to the error code or 0, depending on
> >>>        the result of the command. If there is some error in returning the error
> >>>        to user, it may also be unchanged, even though the command may actually
> >>>        be executed. In this case, it's still ensured that the i-th (i = ret)
> >>>        command is the failed command.
> >>
> >> Sorry -- I'm not clear here. Can you say some more here? What sort
> >> of error might there be when "returning the error to the user"?
> > 
> > As explained above. See also the last comment of Omar:
> > 
> > https://lkml.org/lkml/2015/1/21/103
> 
> Okay -- but could you please actually explain this in more detail in the 
> man page. Then others do not need to ask the same question.

OK, I'll name it 'status' and add more details in next revision.

> 
> > <...>
> > 
> >>> DESCRIPTION
> >>>
> >>>        The epoll_pwait1 system call differs from epoll_pwait only in parameter
> >>>        types. The first difference is timeout, a pointer to timespec structure
> >>>        which allows nanosecond presicion; the second difference, which should
> >>>        probably be wrapper by glibc and only expose a sigset_t pointer as in
> >>>        pselect6.
> >>
> >> Here I think it still helps to explain that 'struct sigags' is a sigset_t* +
> >> the size of the pointed-to set.
> > 
> > Yes, will add.
> > 
> >>
> >>>        If timeout is NULL, it's treated as if 0 is specified in epoll_pwait
> >>
> >> The convention I would expect here is that NULL means infinite timeout,
> >> like select(), and timeout == {0,0} would get the "return immediately"
> >> behavior. Why did you choose your convention? And, how does one otherwise 
> >> request an infinite timeout?
> > 
> > I'm wrong. I should follow the conventions of pselect and ppoll. Thanks for
> > catching that.
> 
> Good.
> 
> >>
> >>>        (return immediately). Otherwise it's converted to nanosecond scalar,
> >>>        again, with the same convention as epoll_pwait's timeout.
> >>>
> >>> RETURN VALUE
> >>>
> >>>        The same as said in epoll_pwait(2).
> >>>
> >>> ERRORS
> >>>
> >>>        The same as said in man epoll_pwait(2), plus:
> >>>
> >>>        EINVAL flags is not zero.
> >>>
> >>>
> >>> CONFORMING TO
> >>>
> >>>        epoll_pwait1() is Linux-specific.
> >>>
> >>> SEE ALSO
> >>>
> >>>        epoll_create(2), epoll_ctl(2), epoll_wait(2), epoll_pwait(2), epoll(7)
> >>
> >> In your earlier patch set, there was the ability to select the clock
> >> used for timeouts. Why did this go away? I'm not sure whether we need
> >> that functionality or not, but it would be good to know why you
> >> dropped it this time.
> >>
> > 
> > No room for another "int clockid". Options:
> 
> Ahh yes. I overlooked that. But again, if your commit message or
> the man page said something about why you dropped this argument,
> then we would not run around in circles having the same 
> conversations repeatedly.
> 
> I keep loving this recent quote from akpm:
> http://lwn.net/Articles/616226/

Good point!

> 
> > 0) Don't have it.
> > 
> > 1) "Or" into lower bits of flags.
> > 
> > 2) Encode into flags bits.
> > 
> > 3) Squash "int clockid" in the last argument (currently "sig" but need to name
> > it "spec", again).
> 
> Well, it's up to you. In https://lkml.org/lkml/2015/1/20/189
> you initially proposed to have the clockid. Do you need it,
> or not? I am agnostic about it. If you do decide you want it,
> then I think (3) is the best option.

Then let's do it this way. clockid would be useful because different
applications care about different clocks.

> 
> I'm very pleased that you expand this man page as you go.
> But, for the next iteration, I think it would be better to make
> it even more complete--it really is helpful to have documentation
> as a starting point to discuss the API, and the better the doc,
> the better the discussion.
> 

Sure, I'll take your points. Thanks for the advice!

Fam

^ permalink raw reply

* Re: [PATCH] coresight-stm: adding driver for CoreSight STM component
From: Paul Bolle @ 2015-02-05  9:26 UTC (permalink / raw)
  To: mathieu.poirier-QSEj5FYQhm4dnm+yROfE0A
  Cc: corbet-T1hC0tSOHrs, pratikp-sgV2jX0FEOL9JmXXK+q4OQ,
	al.grant-5wv7dgnIgG8, liviu.dudau-5wv7dgnIgG8,
	kaixu.xia-QSEj5FYQhm4dnm+yROfE0A, jeenu.viswambharan-5wv7dgnIgG8,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-doc-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1423088550-15780-1-git-send-email-mathieu.poirier-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>

On Wed, 2015-02-04 at 15:22 -0700, mathieu.poirier-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org wrote:
> From: Pratik Patel <pratikp-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
> 
> This driver adds support for the STM CoreSight IP block,
> allowing any system compoment (HW or SW) to log and
> aggregate messages via a single entity.
> 
> The STM exposes an application defined number of channels
> called stimulus port.  Configuration is done using entries
> in sysfs and channels made available to userspace via devfs.
> 
> Signed-off-by: Pratik Patel <pratikp-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
> Signed-off-by: Mathieu Poirier <mathieu.poirier-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>[...]
> +/**
> + * struct stm_node - aggregation of channel information for userspace access
> + * @channel_id:		the channel number associated to this file descriptor.
> + * @options:		options for this channel - none, timestamped,
> +i			guaranteed.

Did you perhaps use vim?

> + * @drvdata:		STM driver specifics.
> + */
> +struct stm_node {
> +	int			channel_id;
> +	u32			options;
> +	struct stm_drvdata	*drvdata;
> +};


Paul Bolle

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox