Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net] bpf, doc: update bpf maintainers entry
From: David Miller @ 2017-04-21 19:25 UTC (permalink / raw)
  To: daniel; +Cc: ast, netdev
In-Reply-To: <509f15e49a4546260785ab72cab8dcc153251e66.1492701796.git.daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Thu, 20 Apr 2017 17:27:58 +0200

> Add various related files that have been missing under
> BPF entry covering essential parts of its infrastructure
> and also add myself as co-maintainer.
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Alexei Starovoitov <ast@kernel.org>

Applied, thanks Daniel.

^ permalink raw reply

* Re: macvlan: Fix device ref leak when purging bc_queue
From: Mahesh Bandewar (महेश बंडेवार) @ 2017-04-21 19:23 UTC (permalink / raw)
  To: Joe.Ghalam; +Cc: herbert, David Miller, Clifford.Wichmann, linux-netdev
In-Reply-To: <1492785652578.53801@Dell.com>

On Fri, Apr 21, 2017 at 7:40 AM,  <Joe.Ghalam@dell.com> wrote:
> That's not true. macvlan_dellink() unregisters the queue, and macvlan_process_broadcast() will never get called. Please note that I'm not speculating. I have traced enabled on the dev_put and dev_hold, and I'm reporting a real, reproducible issue.
> Her is a sequence of calls logged, when the issue happens. macvlan_process_broadcast() never happens.
>
> Apr 19 04:35:39 OS10 kernel: e101-001-0.v257: dev_put 16 dst_destroy
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_hold 15 dev_get_by_index
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_put 16 do_ip_setsockopt
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_hold 15 dst_alloc
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_put 16 dst_destroy
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_hold 15 macvlan_broadcast_enqueue
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_put 16 macvlan_process_broadcast
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_hold 15 macvlan_broadcast_enqueue
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_put 16 macvlan_process_broadcast
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_hold 15 macvlan_broadcast_enqueue    <---insert
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_put 16 neigh_destroy
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_put 15 neigh_destroy
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_put 14 neigh_destroy
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_put 13 dst_destroy
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_put 12 dst_destroy
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_put 11 __netdev_adjacent_dev_remove <--- macvlan_dellink()
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_put 10 __netdev_adjacent_dev_remove <--- macvlan_dellink()
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_put 9 neigh_parms_release
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_put 8 neigh_parms_release
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_put 7 in6_dev_finish_destroy
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_put 6 rx_queue_release
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_put 5 netdev_queue_release
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_put 4 rollback_registered_many
> Apr 19 04:35:41 OS10 kernel: e101-001-0.v257: dev_put 3 free_fib_info_rcu
>
May be the system is busy and snapshot is too small, and eventually
process_broadcast() should get called. Deleting a slave does nothing
about cancelling the work-queue so it would happen eventually.

The change that Herbert proposed is correct. When packets are enqueued
for processing later a dev reference is taken and it's removed when
it's processed when it gets scheduled. The backlog is per port so it
makes sense to remove reference(s) before purging the queue prior to
deleting the port.

> ________________________________________
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Sent: Thursday, April 20, 2017 9:40 PM
> To: Ghalam, Joe
> Cc: davem@davemloft.net; Wichmann, Clifford; netdev@vger.kernel.org
> Subject: Re: macvlan: Fix device ref leak when purging bc_queue
>
> On Thu, Apr 20, 2017 at 04:09:56PM +0000, Joe.Ghalam@dell.com wrote:
>> I agree with this change, but the same purge would be needed for the macvlan_dellink() call also.
>
> I don't think that's necessary because as long as the master
> device is still around it will continue to process the broadcast
> queue, thus removing any reference counts held.
>
> It's only when the queue is purged that we run into trouble.
>
> Cheers,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH net-next v6 3/3] net sched actions: add time filter for action dumping
From: Jamal Hadi Salim @ 2017-04-21 15:13 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, xiyou.wangcong, eric.dumazet, netdev
In-Reply-To: <20170421131347.GD1874@nanopsycho.orion>

On 17-04-21 09:13 AM, Jiri Pirko wrote:
> Fri, Apr 21, 2017 at 12:55:32PM CEST, jhs@mojatatu.com wrote:
>> From: Jamal Hadi Salim <jhs@mojatatu.com>
>>

>> @@ -1000,6 +1008,7 @@ static int tcf_action_add(struct net *net, struct nlattr *nla,
>>
>> static const struct nla_policy tcaa_policy[TCA_ROOT_MAX + 1] = {
>> 	[TCA_ROOT_FLAGS]      = { .type = NLA_U32 },
>> +	[TCA_ROOT_TIME_DELTA]      = { .type = NLA_U32 },
>
> Align these assignments please.
>

Will do next update.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH v2] net: arc_emac: switch to phy_start()/phy_stop()
From: David Miller @ 2017-04-21 19:24 UTC (permalink / raw)
  To: al.kochet; +Cc: tremyfr, peter.chen, weiyongjun1, netdev, linux-kernel
In-Reply-To: <1492694974-31023-1-git-send-email-al.kochet@gmail.com>

From: Alexander Kochetkov <al.kochet@gmail.com>
Date: Thu, 20 Apr 2017 16:29:34 +0300

> Currently driver use phy_start_aneg() in arc_emac_open() to bring
> up PHY. But phy_start() function is more appropriate for this purposes.
> Besides that it call phy_start_aneg() as part of PHY startup sequence
> it also can correctly bring up PHY from error and suspended states.
> So the patch replace phy_start_aneg() to phy_start().
> 
> Also the patch add call to phy_stop() to arc_emac_stop() to allow
> the PHY device to be fully suspended when the interface is unused.
> 
> Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>

Applied to net-next, thanks.

^ permalink raw reply

* [PATCH net-next v2 5/5] nfp: remove the refresh of all ports optimization
From: Jakub Kicinski @ 2017-04-21 19:23 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, kubakici, Jakub Kicinski
In-Reply-To: <20170421192310.193875-1-jakub.kicinski@netronome.com>

The code refreshing the eth port state was trying to update state
of all ports of the card.  Unfortunately to safely walk the port
list we would have to hold the port lock, which we can't due to
lock ordering constraints against rtnl.

Make the per-port sync refresh and async refresh of all ports
completely separate routines.

Fixes: 172f638c93dd ("nfp: add port state refresh")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h       |  3 +-
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   | 13 +++--
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c  | 67 +++++++++++++++-------
 3 files changed, 58 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 8302a2d688da..8f20fdef0754 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -819,7 +819,8 @@ struct nfp_net_dp *nfp_net_clone_dp(struct nfp_net *nn);
 int nfp_net_ring_reconfig(struct nfp_net *nn, struct nfp_net_dp *new);
 
 bool nfp_net_link_changed_read_clear(struct nfp_net *nn);
-void nfp_net_refresh_port_config(struct nfp_net *nn);
+int nfp_net_refresh_eth_port(struct nfp_net *nn);
+void nfp_net_refresh_port_table(struct nfp_net *nn);
 
 #ifdef CONFIG_NFP_DEBUG
 void nfp_net_debugfs_create(void);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index 3328041ec290..6e27d1281425 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -211,10 +211,15 @@ nfp_net_get_link_ksettings(struct net_device *netdev,
 		return 0;
 
 	/* Use link speed from ETH table if available, otherwise try the BAR */
-	if (nn->eth_port && nfp_net_link_changed_read_clear(nn))
-		nfp_net_refresh_port_config(nn);
-	/* Separate if - on FW error the port could've disappeared from table */
 	if (nn->eth_port) {
+		int err;
+
+		if (nfp_net_link_changed_read_clear(nn)) {
+			err = nfp_net_refresh_eth_port(nn);
+			if (err)
+				return err;
+		}
+
 		cmd->base.port = nn->eth_port->port_type;
 		cmd->base.speed = nn->eth_port->speed;
 		cmd->base.duplex = DUPLEX_FULL;
@@ -273,7 +278,7 @@ nfp_net_set_link_ksettings(struct net_device *netdev,
 	if (err > 0)
 		return 0; /* no change */
 
-	nfp_net_refresh_port_config(nn);
+	nfp_net_refresh_port_table(nn);
 
 	return err;
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index 4c6863a072d3..8cb87cbe1120 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -176,13 +176,13 @@ nfp_net_get_mac_addr(struct nfp_net *nn, struct nfp_cpp *cpp, unsigned int id)
 }
 
 static struct nfp_eth_table_port *
-nfp_net_find_port(struct nfp_pf *pf, unsigned int id)
+nfp_net_find_port(struct nfp_eth_table *eth_tbl, unsigned int id)
 {
 	int i;
 
-	for (i = 0; pf->eth_tbl && i < pf->eth_tbl->count; i++)
-		if (pf->eth_tbl->ports[i].eth_index == id)
-			return &pf->eth_tbl->ports[i];
+	for (i = 0; eth_tbl && i < eth_tbl->count; i++)
+		if (eth_tbl->ports[i].eth_index == id)
+			return &eth_tbl->ports[i];
 
 	return NULL;
 }
@@ -367,7 +367,7 @@ nfp_net_pf_alloc_netdevs(struct nfp_pf *pf, void __iomem *ctrl_bar,
 		prev_tx_base = tgt_tx_base;
 		prev_rx_base = tgt_rx_base;
 
-		eth_port = nfp_net_find_port(pf, i);
+		eth_port = nfp_net_find_port(pf->eth_tbl, i);
 		if (eth_port && eth_port->override_changed) {
 			nfp_warn(pf->cpp, "Config changed for port #%d, reboot required before port will be operational\n", i);
 		} else {
@@ -485,6 +485,7 @@ static void nfp_net_refresh_netdevs(struct work_struct *work)
 {
 	struct nfp_pf *pf = container_of(work, struct nfp_pf,
 					 port_refresh_work);
+	struct nfp_eth_table *eth_table;
 	struct nfp_net *nn, *next;
 
 	mutex_lock(&pf->port_lock);
@@ -493,6 +494,27 @@ static void nfp_net_refresh_netdevs(struct work_struct *work)
 	if (list_empty(&pf->ports))
 		goto out;
 
+	list_for_each_entry(nn, &pf->ports, port_list)
+		nfp_net_link_changed_read_clear(nn);
+
+	eth_table = nfp_eth_read_ports(pf->cpp);
+	if (!eth_table) {
+		nfp_err(pf->cpp, "Error refreshing port config!\n");
+		goto out;
+	}
+
+	rtnl_lock();
+	list_for_each_entry(nn, &pf->ports, port_list) {
+		if (!nn->eth_port)
+			continue;
+		nn->eth_port = nfp_net_find_port(eth_table,
+						 nn->eth_port->eth_index);
+	}
+	rtnl_unlock();
+
+	kfree(pf->eth_tbl);
+	pf->eth_tbl = eth_table;
+
 	list_for_each_entry_safe(nn, next, &pf->ports, port_list) {
 		if (!nn->eth_port) {
 			nfp_warn(pf->cpp, "Warning: port not present after reconfig\n");
@@ -517,31 +539,36 @@ static void nfp_net_refresh_netdevs(struct work_struct *work)
 	mutex_unlock(&pf->port_lock);
 }
 
-void nfp_net_refresh_port_config(struct nfp_net *nn)
+void nfp_net_refresh_port_table(struct nfp_net *nn)
 {
 	struct nfp_pf *pf = pci_get_drvdata(nn->pdev);
-	struct nfp_eth_table *old_table;
 
-	ASSERT_RTNL();
+	schedule_work(&pf->port_refresh_work);
+}
 
-	old_table = pf->eth_tbl;
+int nfp_net_refresh_eth_port(struct nfp_net *nn)
+{
+	struct nfp_eth_table_port *eth_port;
+	struct nfp_eth_table *eth_table;
 
-	list_for_each_entry(nn, &pf->ports, port_list)
-		nfp_net_link_changed_read_clear(nn);
+	eth_table = nfp_eth_read_ports(nn->cpp);
+	if (!eth_table) {
+		nn_err(nn, "Error refreshing port state table!\n");
+		return -EIO;
+	}
 
-	pf->eth_tbl = nfp_eth_read_ports(pf->cpp);
-	if (!pf->eth_tbl) {
-		pf->eth_tbl = old_table;
-		nfp_err(pf->cpp, "Error refreshing port config!\n");
-		return;
+	eth_port = nfp_net_find_port(eth_table, nn->eth_port->eth_index);
+	if (!eth_port) {
+		nn_err(nn, "Error finding state of the port!\n");
+		kfree(eth_table);
+		return -EIO;
 	}
 
-	list_for_each_entry(nn, &pf->ports, port_list)
-		nn->eth_port = nfp_net_find_port(pf, nn->eth_port->eth_index);
+	memcpy(nn->eth_port, eth_port, sizeof(*eth_port));
 
-	kfree(old_table);
+	kfree(eth_table);
 
-	schedule_work(&pf->port_refresh_work);
+	return 0;
 }
 
 /*
-- 
2.11.0

^ permalink raw reply related

* [PATCH net-next v2 4/5] nfp: fix free list buffer size reporting
From: Jakub Kicinski @ 2017-04-21 19:23 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, kubakici, Jakub Kicinski
In-Reply-To: <20170421192310.193875-1-jakub.kicinski@netronome.com>

XDP headroom should not be included in free list buffer size.

Fixes: 6fe0c3b43804 ("nfp: add support for xdp_adjust_head()")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 7b70944573fa..ea0efa54916f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -2166,7 +2166,7 @@ nfp_net_tx_ring_hw_cfg_write(struct nfp_net *nn,
  */
 static int nfp_net_set_config_and_enable(struct nfp_net *nn)
 {
-	u32 new_ctrl, update = 0;
+	u32 bufsz, new_ctrl, update = 0;
 	unsigned int r;
 	int err;
 
@@ -2200,8 +2200,9 @@ static int nfp_net_set_config_and_enable(struct nfp_net *nn)
 	nfp_net_write_mac_addr(nn);
 
 	nn_writel(nn, NFP_NET_CFG_MTU, nn->dp.netdev->mtu);
-	nn_writel(nn, NFP_NET_CFG_FLBUFSZ,
-		  nn->dp.fl_bufsz - NFP_NET_RX_BUF_NON_DATA);
+
+	bufsz = nn->dp.fl_bufsz - nn->dp.rx_dma_off - NFP_NET_RX_BUF_NON_DATA;
+	nn_writel(nn, NFP_NET_CFG_FLBUFSZ, bufsz);
 
 	/* Enable device */
 	new_ctrl |= NFP_NET_CFG_CTRL_ENABLE;
-- 
2.11.0

^ permalink raw reply related

* [PATCH net-next v2 3/5] nfp: add NSP routine to get static information
From: Jakub Kicinski @ 2017-04-21 19:23 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, kubakici, David Brunecz, Jakub Kicinski
In-Reply-To: <20170421192310.193875-1-jakub.kicinski@netronome.com>

From: David Brunecz <david.brunecz@netronome.com>

Retrieve identifying information from the NSP.  For now it only
contains versions of firmware subcomponents.

Signed-off-by: David Brunecz <david.brunecz@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/Makefile        |  1 +
 drivers/net/ethernet/netronome/nfp/nfp_main.c      |  7 ++
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h   |  1 +
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c   |  7 ++
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h   | 24 ++++++
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp_cmds.c  | 89 ++++++++++++++++++++++
 6 files changed, 129 insertions(+)
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_cmds.c

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile b/drivers/net/ethernet/netronome/nfp/Makefile
index 4a5d13ef92a4..4b15f0f496aa 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -9,6 +9,7 @@ nfp-objs := \
 	    nfpcore/nfp_mutex.o \
 	    nfpcore/nfp_nffw.o \
 	    nfpcore/nfp_nsp.o \
+	    nfpcore/nfp_nsp_cmds.o \
 	    nfpcore/nfp_nsp_eth.o \
 	    nfpcore/nfp_resource.o \
 	    nfpcore/nfp_rtsym.o \
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c
index bea2a1a6c211..dde35dae35c5 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
@@ -253,6 +253,7 @@ nfp_fw_load(struct pci_dev *pdev, struct nfp_pf *pf, struct nfp_nsp *nsp)
 
 static int nfp_nsp_init(struct pci_dev *pdev, struct nfp_pf *pf)
 {
+	struct nfp_nsp_identify *nspi;
 	struct nfp_nsp *nsp;
 	int err;
 
@@ -269,6 +270,12 @@ static int nfp_nsp_init(struct pci_dev *pdev, struct nfp_pf *pf)
 
 	pf->eth_tbl = __nfp_eth_read_ports(pf->cpp, nsp);
 
+	nspi = __nfp_nsp_identify(nsp);
+	if (nspi) {
+		dev_info(&pdev->dev, "BSP: %s\n", nspi->version);
+		kfree(nspi);
+	}
+
 	err = nfp_fw_load(pdev, pf, nsp);
 	if (err < 0) {
 		kfree(pf->eth_tbl);
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h
index 8afef7593f13..4df2ce261b3f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h
@@ -63,6 +63,7 @@ void nfp_nsp_config_clear_state(struct nfp_nsp *state);
 int nfp_nsp_read_eth_table(struct nfp_nsp *state, void *buf, unsigned int size);
 int nfp_nsp_write_eth_table(struct nfp_nsp *state,
 			    const void *buf, unsigned int size);
+int nfp_nsp_read_identify(struct nfp_nsp *state, void *buf, unsigned int size);
 
 /* Implemented in nfp_resource.c */
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
index 4635f42e15b0..61797c98f5fe 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
@@ -93,6 +93,7 @@ enum nfp_nsp_cmd {
 	SPCODE_FW_LOAD		= 6, /* Load fw from buffer, len in option */
 	SPCODE_ETH_RESCAN	= 7, /* Rescan ETHs, write ETH_TABLE to buf */
 	SPCODE_ETH_CONTROL	= 8, /* Update media config from buffer */
+	SPCODE_NSP_IDENTIFY	= 13, /* Read NSP version */
 
 	__MAX_SPCODE,
 };
@@ -493,3 +494,9 @@ int nfp_nsp_write_eth_table(struct nfp_nsp *state,
 	return nfp_nsp_command_buf(state, SPCODE_ETH_CONTROL, size, buf, size,
 				   NULL, 0);
 }
+
+int nfp_nsp_read_identify(struct nfp_nsp *state, void *buf, unsigned int size)
+{
+	return nfp_nsp_command_buf(state, SPCODE_NSP_IDENTIFY, size, NULL, 0,
+				   buf, size);
+}
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
index 7d34ff145fd7..36b21e4dc56d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
@@ -147,4 +147,28 @@ int __nfp_eth_set_aneg(struct nfp_nsp *nsp, enum nfp_eth_aneg mode);
 int __nfp_eth_set_speed(struct nfp_nsp *nsp, unsigned int speed);
 int __nfp_eth_set_split(struct nfp_nsp *nsp, unsigned int lanes);
 
+/**
+ * struct nfp_nsp_identify - NSP static information
+ * @version:      opaque version string
+ * @flags:        version flags
+ * @br_primary:   branch id of primary bootloader
+ * @br_secondary: branch id of secondary bootloader
+ * @br_nsp:       branch id of NSP
+ * @primary:      version of primarary bootloader
+ * @secondary:    version id of secondary bootloader
+ * @nsp:          version id of NSP
+ */
+struct nfp_nsp_identify {
+	char version[40];
+	u8 flags;
+	u8 br_primary;
+	u8 br_secondary;
+	u8 br_nsp;
+	u16 primary;
+	u16 secondary;
+	u16 nsp;
+};
+
+struct nfp_nsp_identify *__nfp_nsp_identify(struct nfp_nsp *nsp);
+
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_cmds.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_cmds.c
new file mode 100644
index 000000000000..e7a263de3731
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_cmds.c
@@ -0,0 +1,89 @@
+/*
+ * Copyright (C) 2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      1. Redistributions of source code must retain the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer.
+ *
+ *      2. Redistributions in binary form must reproduce the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer in the documentation and/or other materials
+ *         provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/kernel.h>
+#include <linux/slab.h>
+
+#include "nfp.h"
+#include "nfp_nsp.h"
+
+struct nsp_identify {
+	u8 version[40];
+	u8 flags;
+	u8 br_primary;
+	u8 br_secondary;
+	u8 br_nsp;
+	__le16 primary;
+	__le16 secondary;
+	__le16 nsp;
+	__le16 reserved;
+};
+
+struct nfp_nsp_identify *__nfp_nsp_identify(struct nfp_nsp *nsp)
+{
+	struct nfp_nsp_identify *nspi = NULL;
+	struct nsp_identify *ni;
+	int ret;
+
+	if (nfp_nsp_get_abi_ver_minor(nsp) < 15)
+		return NULL;
+
+	ni = kzalloc(sizeof(*ni), GFP_KERNEL);
+	if (!ni)
+		return NULL;
+
+	ret = nfp_nsp_read_identify(nsp, ni, sizeof(*ni));
+	if (ret < 0) {
+		nfp_err(nfp_nsp_cpp(nsp), "reading bsp version failed %d\n",
+			ret);
+		goto exit_free;
+	}
+
+	nspi = kzalloc(sizeof(*nspi), GFP_KERNEL);
+	if (!nspi)
+		goto exit_free;
+
+	memcpy(nspi->version, ni->version, sizeof(nspi->version));
+	nspi->version[sizeof(nspi->version) - 1] = '\0';
+	nspi->flags = ni->flags;
+	nspi->br_primary = ni->br_primary;
+	nspi->br_secondary = ni->br_secondary;
+	nspi->br_nsp = ni->br_nsp;
+	nspi->primary = le16_to_cpu(ni->primary);
+	nspi->secondary = le16_to_cpu(ni->secondary);
+	nspi->nsp = le16_to_cpu(ni->nsp);
+
+exit_free:
+	kfree(ni);
+	return nspi;
+}
-- 
2.11.0

^ permalink raw reply related

* [PATCH net-next v2 2/5] nfp: parse metadata prepend before XDP runs
From: Jakub Kicinski @ 2017-04-21 19:23 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, kubakici, Jakub Kicinski
In-Reply-To: <20170421192310.193875-1-jakub.kicinski@netronome.com>

Calling memcpy to shift metadata out of the way for XDP to run
seems like an overkill.  The most common metadata contents are
8 bytes containing type and flow hash.  Simply parse the metadata
before we run XDP.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h       |  6 ++
 .../net/ethernet/netronome/nfp/nfp_net_common.c    | 67 +++++++++++-----------
 2 files changed, 40 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 052db9208fbb..8302a2d688da 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -284,6 +284,12 @@ struct nfp_net_rx_desc {
 
 #define NFP_NET_META_FIELD_MASK GENMASK(NFP_NET_META_FIELD_SIZE - 1, 0)
 
+struct nfp_meta_parsed {
+	u32 hash_type;
+	u32 hash;
+	u32 mark;
+};
+
 struct nfp_net_rx_hash {
 	__be32 hash_type;
 	__be32 hash;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 71e5568a43ff..7b70944573fa 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1403,8 +1403,9 @@ static void nfp_net_rx_csum(struct nfp_net_dp *dp,
 	}
 }
 
-static void nfp_net_set_hash(struct net_device *netdev, struct sk_buff *skb,
-			     unsigned int type, __be32 *hash)
+static void
+nfp_net_set_hash(struct net_device *netdev, struct nfp_meta_parsed *meta,
+		 unsigned int type, __be32 *hash)
 {
 	if (!(netdev->features & NETIF_F_RXHASH))
 		return;
@@ -1413,16 +1414,18 @@ static void nfp_net_set_hash(struct net_device *netdev, struct sk_buff *skb,
 	case NFP_NET_RSS_IPV4:
 	case NFP_NET_RSS_IPV6:
 	case NFP_NET_RSS_IPV6_EX:
-		skb_set_hash(skb, get_unaligned_be32(hash), PKT_HASH_TYPE_L3);
+		meta->hash_type = PKT_HASH_TYPE_L3;
 		break;
 	default:
-		skb_set_hash(skb, get_unaligned_be32(hash), PKT_HASH_TYPE_L4);
+		meta->hash_type = PKT_HASH_TYPE_L4;
 		break;
 	}
+
+	meta->hash = get_unaligned_be32(hash);
 }
 
 static void
-nfp_net_set_hash_desc(struct net_device *netdev, struct sk_buff *skb,
+nfp_net_set_hash_desc(struct net_device *netdev, struct nfp_meta_parsed *meta,
 		      void *data, struct nfp_net_rx_desc *rxd)
 {
 	struct nfp_net_rx_hash *rx_hash = data;
@@ -1430,12 +1433,12 @@ nfp_net_set_hash_desc(struct net_device *netdev, struct sk_buff *skb,
 	if (!(rxd->rxd.flags & PCIE_DESC_RX_RSS))
 		return;
 
-	nfp_net_set_hash(netdev, skb, get_unaligned_be32(&rx_hash->hash_type),
+	nfp_net_set_hash(netdev, meta, get_unaligned_be32(&rx_hash->hash_type),
 			 &rx_hash->hash);
 }
 
 static void *
-nfp_net_parse_meta(struct net_device *netdev, struct sk_buff *skb,
+nfp_net_parse_meta(struct net_device *netdev, struct nfp_meta_parsed *meta,
 		   void *data, int meta_len)
 {
 	u32 meta_info;
@@ -1447,13 +1450,13 @@ nfp_net_parse_meta(struct net_device *netdev, struct sk_buff *skb,
 		switch (meta_info & NFP_NET_META_FIELD_MASK) {
 		case NFP_NET_META_HASH:
 			meta_info >>= NFP_NET_META_FIELD_SIZE;
-			nfp_net_set_hash(netdev, skb,
+			nfp_net_set_hash(netdev, meta,
 					 meta_info & NFP_NET_META_FIELD_MASK,
 					 (__be32 *)data);
 			data += 4;
 			break;
 		case NFP_NET_META_MARK:
-			skb->mark = get_unaligned_be32(data);
+			meta->mark = get_unaligned_be32(data);
 			data += 4;
 			break;
 		default:
@@ -1588,12 +1591,11 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 
 	while (pkts_polled < budget) {
 		unsigned int meta_len, data_len, meta_off, pkt_len, pkt_off;
-		u8 meta_prepend[NFP_NET_MAX_PREPEND];
 		struct nfp_net_rx_buf *rxbuf;
 		struct nfp_net_rx_desc *rxd;
+		struct nfp_meta_parsed meta;
 		dma_addr_t new_dma_addr;
 		void *new_frag;
-		u8 *meta;
 
 		idx = rx_ring->rd_p & (rx_ring->cnt - 1);
 
@@ -1606,6 +1608,8 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 		 */
 		dma_rmb();
 
+		memset(&meta, 0, sizeof(meta));
+
 		rx_ring->rd_p++;
 		pkts_polled++;
 
@@ -1639,9 +1643,6 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 		r_vec->rx_bytes += pkt_len;
 		u64_stats_update_end(&r_vec->rx_sync);
 
-		/* Pointer to start of metadata */
-		meta = rxbuf->frag + meta_off;
-
 		if (unlikely(meta_len > NFP_NET_MAX_PREPEND ||
 			     (dp->rx_offset && meta_len > dp->rx_offset))) {
 			nn_dp_warn(dp, "oversized RX packet metadata %u\n",
@@ -1653,6 +1654,23 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 		nfp_net_dma_sync_cpu_rx(dp, rxbuf->dma_addr + meta_off,
 					data_len);
 
+		if (!dp->chained_metadata_format) {
+			nfp_net_set_hash_desc(dp->netdev, &meta,
+					      rxbuf->frag + meta_off, rxd);
+		} else if (meta_len) {
+			void *end;
+
+			end = nfp_net_parse_meta(dp->netdev, &meta,
+						 rxbuf->frag + meta_off,
+						 meta_len);
+			if (unlikely(end != rxbuf->frag + pkt_off)) {
+				nn_dp_warn(dp, "invalid RX packet metadata\n");
+				nfp_net_rx_drop(dp, r_vec, rx_ring, rxbuf,
+						NULL);
+				continue;
+			}
+		}
+
 		if (xdp_prog && !(rxd->rxd.flags & PCIE_DESC_RX_BPF &&
 				  dp->bpf_offload_xdp)) {
 			unsigned int dma_off;
@@ -1661,12 +1679,6 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 
 			hard_start = rxbuf->frag + NFP_NET_RX_BUF_HEADROOM;
 
-			/* Move prepend out of the way */
-			if (xdp_prog->xdp_adjust_head) {
-				memcpy(meta_prepend, meta, meta_len);
-				meta = meta_prepend;
-			}
-
 			act = nfp_net_run_xdp(xdp_prog, rxbuf->frag, hard_start,
 					      &pkt_off, &pkt_len);
 			switch (act) {
@@ -1710,19 +1722,8 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 		skb_reserve(skb, pkt_off);
 		skb_put(skb, pkt_len);
 
-		if (!dp->chained_metadata_format) {
-			nfp_net_set_hash_desc(dp->netdev, skb, meta, rxd);
-		} else if (meta_len) {
-			void *end;
-
-			end = nfp_net_parse_meta(dp->netdev, skb, meta,
-						 meta_len);
-			if (unlikely(end != meta + meta_len)) {
-				nn_dp_warn(dp, "invalid RX packet metadata\n");
-				nfp_net_rx_drop(dp, r_vec, rx_ring, NULL, skb);
-				continue;
-			}
-		}
+		skb->mark = meta.mark;
+		skb_set_hash(skb, meta.hash, meta.hash_type);
 
 		skb_record_rx_queue(skb, rx_ring->idx);
 		skb->protocol = eth_type_trans(skb, dp->netdev);
-- 
2.11.0

^ permalink raw reply related

* [PATCH net-next v2 1/5] nfp: make use of the DMA_ATTR_SKIP_CPU_SYNC attribute
From: Jakub Kicinski @ 2017-04-21 19:23 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, kubakici, Jakub Kicinski
In-Reply-To: <20170421192310.193875-1-jakub.kicinski@netronome.com>

DMA unmap may destroy changes CPU made to the buffer.  To make XDP
run correctly on non-x86 platforms we should use the
DMA_ATTR_SKIP_CPU_SYNC attribute.

Thanks to using the attribute we can now push the sync operation to the
common code path from XDP handler.

A little bit of variable name reshuffling is required to bring the
code back to readable state.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c    | 54 ++++++++++++++--------
 1 file changed, 36 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index e2197160e4dc..71e5568a43ff 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -87,16 +87,30 @@ void nfp_net_get_fw_version(struct nfp_net_fw_version *fw_ver,
 
 static dma_addr_t nfp_net_dma_map_rx(struct nfp_net_dp *dp, void *frag)
 {
-	return dma_map_single(dp->dev, frag + NFP_NET_RX_BUF_HEADROOM,
-			      dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA,
-			      dp->rx_dma_dir);
+	return dma_map_single_attrs(dp->dev, frag + NFP_NET_RX_BUF_HEADROOM,
+				    dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA,
+				    dp->rx_dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
+}
+
+static void nfp_net_dma_sync_dev_rx(struct nfp_net_dp *dp, dma_addr_t dma_addr)
+{
+	dma_sync_single_for_device(dp->dev, dma_addr,
+				   dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA,
+				   dp->rx_dma_dir);
 }
 
 static void nfp_net_dma_unmap_rx(struct nfp_net_dp *dp, dma_addr_t dma_addr)
 {
-	dma_unmap_single(dp->dev, dma_addr,
-			 dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA,
-			 dp->rx_dma_dir);
+	dma_unmap_single_attrs(dp->dev, dma_addr,
+			       dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA,
+			       dp->rx_dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
+}
+
+static void nfp_net_dma_sync_cpu_rx(struct nfp_net_dp *dp, dma_addr_t dma_addr,
+				    unsigned int len)
+{
+	dma_sync_single_for_cpu(dp->dev, dma_addr - NFP_NET_RX_BUF_HEADROOM,
+				len, dp->rx_dma_dir);
 }
 
 /* Firmware reconfig
@@ -1167,6 +1181,8 @@ nfp_net_rx_alloc_one(struct nfp_net_dp *dp, struct nfp_net_rx_ring *rx_ring,
 		return NULL;
 	}
 
+	nfp_net_dma_sync_dev_rx(dp, *dma_addr);
+
 	return frag;
 }
 
@@ -1190,6 +1206,8 @@ static void *nfp_net_napi_alloc_one(struct nfp_net_dp *dp, dma_addr_t *dma_addr)
 		return NULL;
 	}
 
+	nfp_net_dma_sync_dev_rx(dp, *dma_addr);
+
 	return frag;
 }
 
@@ -1569,7 +1587,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 	tx_ring = r_vec->xdp_ring;
 
 	while (pkts_polled < budget) {
-		unsigned int meta_len, data_len, data_off, pkt_len;
+		unsigned int meta_len, data_len, meta_off, pkt_len, pkt_off;
 		u8 meta_prepend[NFP_NET_MAX_PREPEND];
 		struct nfp_net_rx_buf *rxbuf;
 		struct nfp_net_rx_desc *rxd;
@@ -1608,11 +1626,12 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 		data_len = le16_to_cpu(rxd->rxd.data_len);
 		pkt_len = data_len - meta_len;
 
+		pkt_off = NFP_NET_RX_BUF_HEADROOM + dp->rx_dma_off;
 		if (dp->rx_offset == NFP_NET_CFG_RX_OFFSET_DYNAMIC)
-			data_off = NFP_NET_RX_BUF_HEADROOM + meta_len;
+			pkt_off += meta_len;
 		else
-			data_off = NFP_NET_RX_BUF_HEADROOM + dp->rx_offset;
-		data_off += dp->rx_dma_off;
+			pkt_off += dp->rx_offset;
+		meta_off = pkt_off - meta_len;
 
 		/* Stats update */
 		u64_stats_update_begin(&r_vec->rx_sync);
@@ -1621,7 +1640,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 		u64_stats_update_end(&r_vec->rx_sync);
 
 		/* Pointer to start of metadata */
-		meta = rxbuf->frag + data_off - meta_len;
+		meta = rxbuf->frag + meta_off;
 
 		if (unlikely(meta_len > NFP_NET_MAX_PREPEND ||
 			     (dp->rx_offset && meta_len > dp->rx_offset))) {
@@ -1631,6 +1650,9 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 			continue;
 		}
 
+		nfp_net_dma_sync_cpu_rx(dp, rxbuf->dma_addr + meta_off,
+					data_len);
+
 		if (xdp_prog && !(rxd->rxd.flags & PCIE_DESC_RX_BPF &&
 				  dp->bpf_offload_xdp)) {
 			unsigned int dma_off;
@@ -1638,10 +1660,6 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 			int act;
 
 			hard_start = rxbuf->frag + NFP_NET_RX_BUF_HEADROOM;
-			dma_off = data_off - NFP_NET_RX_BUF_HEADROOM;
-			dma_sync_single_for_cpu(dp->dev, rxbuf->dma_addr,
-						dma_off + pkt_len,
-						DMA_BIDIRECTIONAL);
 
 			/* Move prepend out of the way */
 			if (xdp_prog->xdp_adjust_head) {
@@ -1650,12 +1668,12 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 			}
 
 			act = nfp_net_run_xdp(xdp_prog, rxbuf->frag, hard_start,
-					      &data_off, &pkt_len);
+					      &pkt_off, &pkt_len);
 			switch (act) {
 			case XDP_PASS:
 				break;
 			case XDP_TX:
-				dma_off = data_off - NFP_NET_RX_BUF_HEADROOM;
+				dma_off = pkt_off - NFP_NET_RX_BUF_HEADROOM;
 				if (unlikely(!nfp_net_tx_xdp_buf(dp, rx_ring,
 								 tx_ring, rxbuf,
 								 dma_off,
@@ -1689,7 +1707,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 
 		nfp_net_rx_give_one(dp, rx_ring, new_frag, new_dma_addr);
 
-		skb_reserve(skb, data_off);
+		skb_reserve(skb, pkt_off);
 		skb_put(skb, pkt_len);
 
 		if (!dp->chained_metadata_format) {
-- 
2.11.0

^ permalink raw reply related

* [PATCH net-next v2 0/5] nfp: DMA flags, adjust head and fixes
From: Jakub Kicinski @ 2017-04-21 19:23 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, kubakici, Jakub Kicinski

Hi!

This series takes advantage of Alex's DMA_ATTR_SKIP_CPU_SYNC to make 
XDP packet modifications "correct" from DMA API point of view.  It 
also allows us to parse the metadata before we run XDP at no additional
DMA sync cost.  That way we can get rid of the metadata memcpy, and 
remove the last upstream user of bpf_prog->xdp_adjust_head.

David's patch adds a way to read capabilities from the management
firmware.

There are also two net-next fixes.  Patch 4 which fixes what seems to
be a result of a botched rebase on my part.  Patch 5 corrects locking
when state of ethernet ports is being refreshed.

---
v2: sync rx buffers before giving them to the card (Alex)

David Brunecz (1):
  nfp: add NSP routine to get static information

Jakub Kicinski (4):
  nfp: make use of the DMA_ATTR_SKIP_CPU_SYNC flag
  nfp: parse metadata prepend before XDP runs
  nfp: fix free list buffer size reporting
  nfp: remove the refresh of all ports optimization

 drivers/net/ethernet/netronome/nfp/Makefile        |   1 +
 drivers/net/ethernet/netronome/nfp/nfp_main.c      |   7 ++
 drivers/net/ethernet/netronome/nfp/nfp_net.h       |   9 +-
 .../net/ethernet/netronome/nfp/nfp_net_common.c    | 126 ++++++++++++---------
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |  13 ++-
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c  |  67 +++++++----
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h   |   1 +
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c   |   7 ++
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h   |  24 ++++
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp_cmds.c  |  89 +++++++++++++++
 10 files changed, 266 insertions(+), 78 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_cmds.c

-- 
2.11.0

^ permalink raw reply

* [PATCH net-next 2/3] packet: add PACKET_FANOUT_FLAG_UNIQUEID to assign new fanout group id.
From: Willem de Bruijn @ 2017-04-21 14:56 UTC (permalink / raw)
  To: netdev; +Cc: maloneykernel, davem, Mike Maloney
In-Reply-To: <20170421145612.106869-1-willemdebruijn.kernel@gmail.com>

From: Mike Maloney <maloney@google.com>

Fanout uses a per net global namespace. A process that intends to create
a new fanout group can accidentally join an existing group. It is not
possible to detect this.

Add socket option PACKET_FANOUT_FLAG_UNIQUEID.  When specified the
supplied fanout group id must be set to 0, and the kernel chooses an id
that is not already in use.  This is an ephemeral flag so that
other sockets can be added to this group using setsockopt, but NOT
specifying this flag.  The current getsockopt(..., PACKET_FANOUT, ...)
can be used to retrieve the new group id.

We assume that there are not a lot of fanout groups and that this is not
a high frequency call.

The method assigns ids starting at zero and increases until it finds an
unused id.  It keeps track of the last assigned id, and uses it as a
starting point to find new ids.

Signed-off-by: Mike Maloney <maloney@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>

---
 include/uapi/linux/if_packet.h |  1 +
 net/packet/af_packet.c         | 44 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
index 9e7edfd8141e..4df96a7dd4fa 100644
--- a/include/uapi/linux/if_packet.h
+++ b/include/uapi/linux/if_packet.h
@@ -66,6 +66,7 @@ struct sockaddr_ll {
 #define PACKET_FANOUT_CBPF		6
 #define PACKET_FANOUT_EBPF		7
 #define PACKET_FANOUT_FLAG_ROLLOVER	0x1000
+#define PACKET_FANOUT_FLAG_UNIQUEID	0x2000
 #define PACKET_FANOUT_FLAG_DEFRAG	0x8000
 
 struct tpacket_stats {
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 8489beff5c25..94052f42058b 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1496,6 +1496,7 @@ static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev,
 DEFINE_MUTEX(fanout_mutex);
 EXPORT_SYMBOL_GPL(fanout_mutex);
 static LIST_HEAD(fanout_list);
+static u16 fanout_next_id;
 
 static void __fanout_link(struct sock *sk, struct packet_sock *po)
 {
@@ -1629,6 +1630,36 @@ static void fanout_release_data(struct packet_fanout *f)
 	};
 }
 
+static bool __fanout_id_is_free(struct sock *sk, u16 candidate_id)
+{
+	struct packet_fanout *f;
+
+	list_for_each_entry(f, &fanout_list, list) {
+		if (f->id == candidate_id &&
+		    read_pnet(&f->net) == sock_net(sk)) {
+			return false;
+		}
+	}
+	return true;
+}
+
+static bool fanout_find_new_id(struct sock *sk, u16 *new_id)
+{
+	u16 id = fanout_next_id;
+
+	do {
+		if (__fanout_id_is_free(sk, id)) {
+			*new_id = id;
+			fanout_next_id = id + 1;
+			return true;
+		}
+
+		id++;
+	} while (id != fanout_next_id);
+
+	return false;
+}
+
 static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
 {
 	struct packet_rollover *rollover = NULL;
@@ -1676,6 +1707,19 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
 		po->rollover = rollover;
 	}
 
+	if (type_flags & PACKET_FANOUT_FLAG_UNIQUEID) {
+		if (id != 0) {
+			err = -EINVAL;
+			goto out;
+		}
+		if (!fanout_find_new_id(sk, &id)) {
+			err = -ENOMEM;
+			goto out;
+		}
+		/* ephemeral flag for the first socket in the group: drop it */
+		flags &= ~(PACKET_FANOUT_FLAG_UNIQUEID >> 8);
+	}
+
 	match = NULL;
 	list_for_each_entry(f, &fanout_list, list) {
 		if (f->id == id &&
-- 
2.12.2.816.g2cccc81164-goog

^ permalink raw reply related

* Re: [PATCH net-next v6 2/3] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Jamal Hadi Salim @ 2017-04-21 15:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jiri Pirko, davem, xiyou.wangcong, netdev
In-Reply-To: <1492788031.6453.14.camel@edumazet-glaptop3.roam.corp.google.com>

On 17-04-21 11:20 AM, Eric Dumazet wrote:
> On Fri, 2017-04-21 at 11:12 -0400, Jamal Hadi Salim wrote:
>> On 17-04-21 09:12 AM, Jiri Pirko wrote:
>>> Fri, Apr 21, 2017 at 12:55:31PM CEST, jhs@mojatatu.com wrote:
>>>> From: Jamal Hadi Salim <jhs@mojatatu.com>
>>

>> Jiri, there is a balance between extensibility and performance.
>> It is senseless to use a TLV just so i can set a 0/1(true/false).
>
> You assume that the (user space) did sensible things.
>

If it didnt it is a bug. Seriously.
Look: If this was the case a lot of things would break in the
kernel. We have bit flags everywhere. We add new ones frequently
to these bitmaps.

> Sometimes they do not, and sets some bits to 1 while they should not.
>

That is a bug.  You cant blame the compiler.

> If old kernel just ignored theses bits, application just ran fine and
> was _qualified_.
>
> Now customers might use these _working_ applications.
>
> Then, Jamal comes and change the kernel to give a meaning to these bits.
>

As happens frequently, not just Jamal - Eric also;->

> Now the customer is running the new kernel and the old application
> breaks horribly.
>
> Who is at fault ? Jamal of course, not the application authors that
> might be out of business, and could not have any test that could have
> spot the (future) issue.
>
> Please Jamal, can we stop this for good ?

Eric: Your are speaking in generalities and you starting premise is
wrong. Anyone setting random bits in a netlink bitmask that has not
been defined is creating bug. We have many examples of how netlink
bitmasks are being used and constantly extended. Please take a look.

If i was the first person starting this today, then yes you will be
making a lot of sense.
For the pads - the arguement that malloc-ing the datastructure may put
random values in the pads was a reasonable arguement. But this is not.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH net-next 0/2] ip_tunnel: Allow policy-based routing through tunnels
From: David Miller @ 2017-04-21 17:21 UTC (permalink / raw)
  To: kraigatgoog; +Cc: netdev
In-Reply-To: <20170419163054.31173-1-kraigatgoog@gmail.com>

From: Craig Gallek <kraigatgoog@gmail.com>
Date: Wed, 19 Apr 2017 12:30:52 -0400

> iproute2 changes to follow.  Example usage:
>   ip link add gre-test type gre local 10.0.0.1 remote 10.0.0.2 fwmark 0x4
>   ip -detail link show gre-test
>   ...
>   ip link set gre-test type gre fwmark 0

Looks good, series applied, thanks.

^ permalink raw reply

* Re: pull request (net-next): ipsec-next 2017-04-20
From: David Miller @ 2017-04-21 19:14 UTC (permalink / raw)
  To: steffen.klassert; +Cc: herbert, netdev
In-Reply-To: <1492678515-14347-1-git-send-email-steffen.klassert@secunet.com>

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Thu, 20 Apr 2017 10:54:59 +0200

> This adds the basic infrastructure for IPsec hardware
> offloading, it creates a configuration API and adjusts
> the packet path.
 ...

Looks great, pulled, thanks Steffen.

^ permalink raw reply

* Re: [PATCH net] ipv6: sr: fix double free of skb after handling invalid SRH
From: David Miller @ 2017-04-21 17:17 UTC (permalink / raw)
  To: david.lebrun; +Cc: netdev, dan.carpenter
In-Reply-To: <20170419141019.8193-1-david.lebrun@uclouvain.be>

From: David Lebrun <david.lebrun@uclouvain.be>
Date: Wed, 19 Apr 2017 16:10:19 +0200

> The icmpv6_param_prob() function already does a kfree_skb(),
> this patch removes the duplicate one.
> 
> Fixes: 1ababeba4a21f3dba3da3523c670b207fb2feb62 ("ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)")
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Cc: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH net-next v6 2/3] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Eric Dumazet @ 2017-04-21 15:20 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: Jiri Pirko, davem, xiyou.wangcong, netdev
In-Reply-To: <1dea2c80-9c86-353e-6a74-f5b86070df04@mojatatu.com>

On Fri, 2017-04-21 at 11:12 -0400, Jamal Hadi Salim wrote:
> On 17-04-21 09:12 AM, Jiri Pirko wrote:
> > Fri, Apr 21, 2017 at 12:55:31PM CEST, jhs@mojatatu.com wrote:
> >> From: Jamal Hadi Salim <jhs@mojatatu.com>
> 
> >> +#define TCA_FLAG_LARGE_DUMP_ON		(1 << 0)
> >
> > This is u32 "flags" that could not be extended for other use in future.
> > I'm missing the point. Also, you don't check the rest of the bits for 0
> > as requested by DaveM.
> >
> > As far as this is unextendable, please have this as u8 with values 0 and 1
> > as I originally suggested.
> >
> > I don't understand why are we running in circles about this...
> >
> 
> If i have a 32 bit space of which i am using one bit.
> The sender (user space) zeroes the bits except the one they are 
> interested in. The kernel checks the bits they are interested in.
> Future - we add one more bit and the same philosophy applies.
> Older kernels dont see this bit but they dont have the feature
> to begin with. So where is the lack of extensibility?
> 
> Jiri, there is a balance between extensibility and performance.
> It is senseless to use a TLV just so i can set a 0/1(true/false).

You assume that the (user space) did sensible things.

Sometimes they do not, and sets some bits to 1 while they should not.

If old kernel just ignored theses bits, application just ran fine and
was _qualified_. 

Now customers might use these _working_ applications.

Then, Jamal comes and change the kernel to give a meaning to these bits.

Now the customer is running the new kernel and the old application
breaks horribly.

Who is at fault ? Jamal of course, not the application authors that
might be out of business, and could not have any test that could have
spot the (future) issue.

Please Jamal, can we stop this for good ?

^ permalink raw reply

* Re: [PATCH net-next v2 2/5] virtio-net: transmit napi
From: Willem de Bruijn @ 2017-04-21 14:50 UTC (permalink / raw)
  To: Jason Wang
  Cc: Network Development, Michael S. Tsirkin, virtualization,
	David Miller, Willem de Bruijn
In-Reply-To: <118c5f96-6151-c012-7a31-16d24a835878@redhat.com>

>>> Maybe I was wrong, but according to Michael's comment it looks like he
>>> want
>>> check affinity_hint_set just for speculative tx polling on rx napi
>>> instead
>>> of disabling it at all.
>>>
>>> And I'm not convinced this is really needed, driver only provide affinity
>>> hint instead of affinity, so it's not guaranteed that tx and rx interrupt
>>> are in the same vcpus.
>>
>> You're right. I made the restriction broader than the request, to really
>> err
>> on the side of caution for the initial merge of napi tx. And enabling
>> the optimization is always a win over keeping it off, even without irq
>> affinity.
>>
>> The cycle cost is significant without affinity regardless of whether the
>> optimization is used.
>
>
> Yes, I noticed this in the past too.
>
>> Though this is not limited to napi-tx, it is more
>> pronounced in that mode than without napi.
>>
>> 1x TCP_RR for affinity configuration {process, rx_irq, tx_irq}:
>>
>> upstream:
>>
>> 1,1,1: 28985 Mbps, 278 Gcyc
>> 1,0,2: 30067 Mbps, 402 Gcyc
>>
>> napi tx:
>>
>> 1,1,1: 34492 Mbps, 269 Gcyc
>> 1,0,2: 36527 Mbps, 537 Gcyc (!)
>> 1,0,1: 36269 Mbps, 394 Gcyc
>> 1,0,0: 34674 Mbps, 402 Gcyc
>>
>> This is a particularly strong example. It is also representative
>> of most RR tests. It is less pronounced in other streaming tests.
>> 10x TCP_RR, for instance:
>>
>> upstream:
>>
>> 1,1,1: 42267 Mbps, 301 Gcyc
>> 1,0,2: 40663 Mbps, 445 Gcyc
>>
>> napi tx:
>>
>> 1,1,1: 42420 Mbps, 303 Gcyc
>> 1,0,2:  42267 Mbps, 431 Gcyc
>>
>> These numbers were obtained with the virtqueue_enable_cb_delayed
>> optimization after xmit_skb, btw. It turns out that moving that before
>> increases 1x TCP_RR further to ~39 Gbps, at the cost of reducing
>> 100x TCP_RR a bit.
>
>
> I see, so I think we can leave the affinity hint optimization/check for
> future investigation:
>
> - to avoid endless optimization (e.g we may want to share a single
> vector/napi for tx/rx queue pairs in the future) for this series.
> - tx napi is disabled by default which means we can do optimization on top.

Okay. I'll drop the vi->affinity_hint_set from the patch set for now.

^ permalink raw reply

* Re: [PATCH net] MAINTAINERS: Add new IPsec offloading files.
From: David Miller @ 2017-04-21 19:10 UTC (permalink / raw)
  To: steffen.klassert; +Cc: netdev
In-Reply-To: <20170420083632.GG2649@secunet.com>

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Thu, 20 Apr 2017 10:36:32 +0200

> This adds two new files to IPsec maintenance scope:
> 
> net/ipv4/esp4_offload.c
> net/ipv6/ip6_offload.c
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

Applied, thanks.

^ permalink raw reply

* [PATCH net-next 3/3] selftests/net: add tests for PACKET_FANOUT_FLAG_UNIQUEID
From: Willem de Bruijn @ 2017-04-21 14:56 UTC (permalink / raw)
  To: netdev; +Cc: maloneykernel, davem, Mike Maloney
In-Reply-To: <20170421145612.106869-1-willemdebruijn.kernel@gmail.com>

From: Mike Maloney <maloney@google.com>

Create two groups with PACKET_FANOUT_FLAG_UNIQUEID, add a socket to one.
Ensure that the groups can only be joined if all options are consistent
with the original except for this flag.

Signed-off-by: Mike Maloney <maloney@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>

---
 tools/testing/selftests/net/psock_fanout.c | 95 ++++++++++++++++++++++++++----
 1 file changed, 84 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/net/psock_fanout.c b/tools/testing/selftests/net/psock_fanout.c
index b475d87d3aa3..1c46a21a9b5e 100644
--- a/tools/testing/selftests/net/psock_fanout.c
+++ b/tools/testing/selftests/net/psock_fanout.c
@@ -71,7 +71,7 @@
 
 /* Open a socket in a given fanout mode.
  * @return -1 if mode is bad, a valid socket otherwise */
-static int sock_fanout_open(uint16_t typeflags)
+static int sock_fanout_open(uint16_t typeflags, uint16_t group_id)
 {
 	int fd, val;
 
@@ -81,8 +81,7 @@ static int sock_fanout_open(uint16_t typeflags)
 		exit(1);
 	}
 
-	/* fanout group ID is always 0: tests whether old groups are deleted */
-	val = ((int) typeflags) << 16;
+	val = (((int) typeflags) << 16) | group_id;
 	if (setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &val, sizeof(val))) {
 		if (close(fd)) {
 			perror("close packet");
@@ -95,6 +94,20 @@ static int sock_fanout_open(uint16_t typeflags)
 	return fd;
 }
 
+static void sock_fanout_getopts(int fd, uint16_t *typeflags, uint16_t *group_id)
+{
+	int sockopt;
+	socklen_t sockopt_len = sizeof(sockopt);
+
+	if (getsockopt(fd, SOL_PACKET, PACKET_FANOUT,
+		       &sockopt, &sockopt_len)) {
+		perror("failed to getsockopt");
+		exit(1);
+	}
+	*typeflags = sockopt >> 16;
+	*group_id = sockopt & 0xfffff;
+}
+
 static void sock_fanout_set_ebpf(int fd)
 {
 	const int len_off = __builtin_offsetof(struct __sk_buff, len);
@@ -210,7 +223,7 @@ static void test_control_single(void)
 	fprintf(stderr, "test: control single socket\n");
 
 	if (sock_fanout_open(PACKET_FANOUT_ROLLOVER |
-			       PACKET_FANOUT_FLAG_ROLLOVER) != -1) {
+			       PACKET_FANOUT_FLAG_ROLLOVER, 0) != -1) {
 		fprintf(stderr, "ERROR: opened socket with dual rollover\n");
 		exit(1);
 	}
@@ -223,26 +236,26 @@ static void test_control_group(void)
 
 	fprintf(stderr, "test: control multiple sockets\n");
 
-	fds[0] = sock_fanout_open(PACKET_FANOUT_HASH);
+	fds[0] = sock_fanout_open(PACKET_FANOUT_HASH, 0);
 	if (fds[0] == -1) {
 		fprintf(stderr, "ERROR: failed to open HASH socket\n");
 		exit(1);
 	}
 	if (sock_fanout_open(PACKET_FANOUT_HASH |
-			       PACKET_FANOUT_FLAG_DEFRAG) != -1) {
+			       PACKET_FANOUT_FLAG_DEFRAG, 0) != -1) {
 		fprintf(stderr, "ERROR: joined group with wrong flag defrag\n");
 		exit(1);
 	}
 	if (sock_fanout_open(PACKET_FANOUT_HASH |
-			       PACKET_FANOUT_FLAG_ROLLOVER) != -1) {
+			       PACKET_FANOUT_FLAG_ROLLOVER, 0) != -1) {
 		fprintf(stderr, "ERROR: joined group with wrong flag ro\n");
 		exit(1);
 	}
-	if (sock_fanout_open(PACKET_FANOUT_CPU) != -1) {
+	if (sock_fanout_open(PACKET_FANOUT_CPU, 0) != -1) {
 		fprintf(stderr, "ERROR: joined group with wrong mode\n");
 		exit(1);
 	}
-	fds[1] = sock_fanout_open(PACKET_FANOUT_HASH);
+	fds[1] = sock_fanout_open(PACKET_FANOUT_HASH, 0);
 	if (fds[1] == -1) {
 		fprintf(stderr, "ERROR: failed to join group\n");
 		exit(1);
@@ -253,6 +266,61 @@ static void test_control_group(void)
 	}
 }
 
+/* Test creating a unique fanout group ids */
+static void test_unique_fanout_group_ids(void)
+{
+	int fds[3];
+	uint16_t typeflags, first_group_id, second_group_id;
+
+	fprintf(stderr, "test: unique ids\n");
+
+	fds[0] = sock_fanout_open(PACKET_FANOUT_HASH |
+				  PACKET_FANOUT_FLAG_UNIQUEID, 0);
+	if (fds[0] == -1) {
+		fprintf(stderr, "ERROR: failed to create a unique id group.\n");
+		exit(1);
+	}
+
+	sock_fanout_getopts(fds[0], &typeflags, &first_group_id);
+	if (typeflags != PACKET_FANOUT_HASH) {
+		fprintf(stderr, "ERROR: unexpected typeflags %x\n", typeflags);
+		exit(1);
+	}
+
+	if (sock_fanout_open(PACKET_FANOUT_CPU, first_group_id)) {
+		fprintf(stderr, "ERROR: joined group with wrong type.\n");
+		exit(1);
+	}
+
+	fds[1] = sock_fanout_open(PACKET_FANOUT_HASH, first_group_id);
+	if (fds[1] == -1) {
+		fprintf(stderr,
+			"ERROR: failed to join previously created group.\n");
+		exit(1);
+	}
+
+	fds[2] = sock_fanout_open(PACKET_FANOUT_HASH |
+				  PACKET_FANOUT_FLAG_UNIQUEID, 0);
+	if (fds[2] == -1) {
+		fprintf(stderr,
+			"ERROR: failed to create a second unique id group.\n");
+		exit(1);
+	}
+
+	sock_fanout_getopts(fds[2], &typeflags, &second_group_id);
+	if (sock_fanout_open(PACKET_FANOUT_HASH | PACKET_FANOUT_FLAG_UNIQUEID,
+			     second_group_id) != -1) {
+		fprintf(stderr,
+			"ERROR: specified a group id when requesting unique id\n");
+		exit(1);
+	}
+
+	if (close(fds[0]) || close(fds[1]) || close(fds[2])) {
+		fprintf(stderr, "ERROR: closing sockets\n");
+		exit(1);
+	}
+}
+
 static int test_datapath(uint16_t typeflags, int port_off,
 			 const int expect1[], const int expect2[])
 {
@@ -263,8 +331,8 @@ static int test_datapath(uint16_t typeflags, int port_off,
 
 	fprintf(stderr, "test: datapath 0x%hx\n", typeflags);
 
-	fds[0] = sock_fanout_open(typeflags);
-	fds[1] = sock_fanout_open(typeflags);
+	fds[0] = sock_fanout_open(typeflags, 0);
+	fds[1] = sock_fanout_open(typeflags, 0);
 	if (fds[0] == -1 || fds[1] == -1) {
 		fprintf(stderr, "ERROR: failed open\n");
 		exit(1);
@@ -331,10 +399,12 @@ int main(int argc, char **argv)
 	const int expect_cpu0[2][2]	= { { 20, 0 },  { 20, 0 } };
 	const int expect_cpu1[2][2]	= { { 0, 20 },  { 0, 20 } };
 	const int expect_bpf[2][2]	= { { 15, 5 },  { 15, 20 } };
+	const int expect_uniqueid[2][2] = { { 20, 20},  { 20, 20 } };
 	int port_off = 2, tries = 5, ret;
 
 	test_control_single();
 	test_control_group();
+	test_unique_fanout_group_ids();
 
 	/* find a set of ports that do not collide onto the same socket */
 	ret = test_datapath(PACKET_FANOUT_HASH, port_off,
@@ -365,6 +435,9 @@ int main(int argc, char **argv)
 		ret |= test_datapath(PACKET_FANOUT_CPU, port_off,
 				     expect_cpu1[0], expect_cpu1[1]);
 
+	ret |= test_datapath(PACKET_FANOUT_FLAG_UNIQUEID, port_off,
+			     expect_uniqueid[0], expect_uniqueid[1]);
+
 	if (ret)
 		return 1;
 
-- 
2.12.2.816.g2cccc81164-goog

^ permalink raw reply related

* Re: [PATCH net-next] Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning
From: David Miller @ 2017-04-21 17:22 UTC (permalink / raw)
  To: tedheadster; +Cc: netdev
In-Reply-To: <1492619830-7561-1-git-send-email-tedheadster@gmail.com>

From: Matthew Whitehead <tedheadster@gmail.com>
Date: Wed, 19 Apr 2017 12:37:10 -0400

> Constants used for tuning are generally a bad idea, especially as hardware
> changes over time. Replace the constant 2 jiffies with sysctl variable
> netdev_budget_usecs to enable sysadmins to tune the softirq processing.
> Also document the variable.
> 
> For example, a very fast machine might tune this to 1000 microseconds,
> while my regression testing 486DX-25 needs it to be 4000 microseconds on
> a nearly idle network to prevent time_squeeze from being incremented.
> 
> Version 2: changed jiffies to microseconds for predictable units.
> 
> Signed-off-by: Matthew Whitehead <tedheadster@gmail.com>

Applied, thanks.

^ permalink raw reply

* new documentation: IP_TRANSPARENT, is it correct?
From: bert hubert @ 2017-04-21 18:59 UTC (permalink / raw)
  To: netdev

Hi everyone,

10 years after lartc.org I decided to document a little bit more of Linux
networking, and I hope I got it right. This email asks for your help in
making sure.

Recently I attempted to use IP_TRANSPARENT as outlined in
https://www.kernel.org/doc/Documentation/networking/tproxy.txt but I could
not figure out how it really worked from there (although I could copy paste
my way to some working code).  The web also mostly offered little in the way
of (correct) explanation.

I think I have it figured out by now, but I'm sure there are nuances I have
missed. I'm especially interested in understanding _exactly_ what the
IP_TRANSPARENT socket option does, because it appears somewhat arbitrary
right now:

"The IP_TRANSPARENT socket option enables:

* Binding to addresses that are not (usually) considered local
* Receiving connections and packets from iptables TPROXY redirected sessions"

https://ds9a.nl/tproxy/tproxy.md.html has somewhat prettified Markdown that
requires Javascript, plain Markdown is on
https://github.com/ahupowerdns/tproxydoc/blob/master/tproxy.md

If you could give this a read and a comment on things I got wrong, that
would be most appreciated. Pointers to other relevant documentation are also
very welcome.

Thanks!

	Bert
	

^ permalink raw reply

* Re: [PATCH RFC] sparc64: eBPF JIT
From: David Miller @ 2017-04-21 19:02 UTC (permalink / raw)
  To: alexei.starovoitov; +Cc: daniel, sparclinux, netdev, ast
In-Reply-To: <20170421184935.GA86324@ast-mbp.thefacebook.com>

From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Date: Fri, 21 Apr 2017 11:49:37 -0700

> On Fri, Apr 21, 2017 at 12:46:40PM -0400, David Miller wrote:
>> From: Daniel Borkmann <daniel@iogearbox.net>
>> Date: Mon, 17 Apr 2017 20:44:35 +0200
>> 
>> > There is samples/bpf/sockex3_kern.c, which exercises it. To
>> > run it, it would be (clang/llvm needed due to BPF backend not
>> > available in gcc):
>> > 
>> > # cd samples/bpf
>> > # make
>> > # ./sockex3
>> > IP     src.port -> dst.port               bytes      packets
>> > 127.0.0.1.12865 -> 127.0.0.1.49711          148            2
>> > 127.0.0.1.49711 -> 127.0.0.1.12865          108            2
>> > [...]
>> > 
>> > Inside parse_eth_proto(), it will do tail calls based on the
>> > eth protocol. Over time, we'll move such C based tests over to
>> > tools/testing/selftests/bpf/.
>> 
>> Ok, after a lot of work setting up an LLVM/CLANG environment and other
>> things, 
> 
> was it painful because of sparc environment?

Yes, primarily this was the reason.

It wasn't just a matter of building llvm/clang, I had to install a
whole new version of gcc because of llvm/clang build requirements.

Also, it isn't clear what is supposed to happen when the system
include files are older than the one in the kernel tree.  I thought it
should just work and the selftests and samples would pick up the ones
in the tree properly.

> redhat folks have been back and forth on adding bpf support to gcc.
> The backend itself was fully functional before it was abandoned.
> Last time we discussed it the lack of integrated asm in gcc was the main blocker.
> Can we bend gcc rules and let bpf backend emit custom binary and/or elf?

I don't see why you can't just do a proper GNU as/ld port.

Frankly I would love to write foo.s BFD assembler files just like for
any other target and them feed them to binutils.

And then once you have BFD support for BPF ELF you get lots of tooling
and you can start going for gdb support.

^ permalink raw reply

* [PATCH net-next 1/3] selftests/net: cleanup unused parameter in psock_fanout
From: Willem de Bruijn @ 2017-04-21 14:56 UTC (permalink / raw)
  To: netdev; +Cc: maloneykernel, davem, Mike Maloney
In-Reply-To: <20170421145612.106869-1-willemdebruijn.kernel@gmail.com>

From: Mike Maloney <maloney@google.com>

sock_fanout_open no longer sets the size of packet_socket ring, so stop
passing the parameter.

Signed-off-by: Mike Maloney <maloney@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>

---
 tools/testing/selftests/net/psock_fanout.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/net/psock_fanout.c b/tools/testing/selftests/net/psock_fanout.c
index 412459369686..b475d87d3aa3 100644
--- a/tools/testing/selftests/net/psock_fanout.c
+++ b/tools/testing/selftests/net/psock_fanout.c
@@ -71,7 +71,7 @@
 
 /* Open a socket in a given fanout mode.
  * @return -1 if mode is bad, a valid socket otherwise */
-static int sock_fanout_open(uint16_t typeflags, int num_packets)
+static int sock_fanout_open(uint16_t typeflags)
 {
 	int fd, val;
 
@@ -210,7 +210,7 @@ static void test_control_single(void)
 	fprintf(stderr, "test: control single socket\n");
 
 	if (sock_fanout_open(PACKET_FANOUT_ROLLOVER |
-			       PACKET_FANOUT_FLAG_ROLLOVER, 0) != -1) {
+			       PACKET_FANOUT_FLAG_ROLLOVER) != -1) {
 		fprintf(stderr, "ERROR: opened socket with dual rollover\n");
 		exit(1);
 	}
@@ -223,26 +223,26 @@ static void test_control_group(void)
 
 	fprintf(stderr, "test: control multiple sockets\n");
 
-	fds[0] = sock_fanout_open(PACKET_FANOUT_HASH, 20);
+	fds[0] = sock_fanout_open(PACKET_FANOUT_HASH);
 	if (fds[0] == -1) {
 		fprintf(stderr, "ERROR: failed to open HASH socket\n");
 		exit(1);
 	}
 	if (sock_fanout_open(PACKET_FANOUT_HASH |
-			       PACKET_FANOUT_FLAG_DEFRAG, 10) != -1) {
+			       PACKET_FANOUT_FLAG_DEFRAG) != -1) {
 		fprintf(stderr, "ERROR: joined group with wrong flag defrag\n");
 		exit(1);
 	}
 	if (sock_fanout_open(PACKET_FANOUT_HASH |
-			       PACKET_FANOUT_FLAG_ROLLOVER, 10) != -1) {
+			       PACKET_FANOUT_FLAG_ROLLOVER) != -1) {
 		fprintf(stderr, "ERROR: joined group with wrong flag ro\n");
 		exit(1);
 	}
-	if (sock_fanout_open(PACKET_FANOUT_CPU, 10) != -1) {
+	if (sock_fanout_open(PACKET_FANOUT_CPU) != -1) {
 		fprintf(stderr, "ERROR: joined group with wrong mode\n");
 		exit(1);
 	}
-	fds[1] = sock_fanout_open(PACKET_FANOUT_HASH, 20);
+	fds[1] = sock_fanout_open(PACKET_FANOUT_HASH);
 	if (fds[1] == -1) {
 		fprintf(stderr, "ERROR: failed to join group\n");
 		exit(1);
@@ -263,8 +263,8 @@ static int test_datapath(uint16_t typeflags, int port_off,
 
 	fprintf(stderr, "test: datapath 0x%hx\n", typeflags);
 
-	fds[0] = sock_fanout_open(typeflags, 20);
-	fds[1] = sock_fanout_open(typeflags, 20);
+	fds[0] = sock_fanout_open(typeflags);
+	fds[1] = sock_fanout_open(typeflags);
 	if (fds[0] == -1 || fds[1] == -1) {
 		fprintf(stderr, "ERROR: failed open\n");
 		exit(1);
-- 
2.12.2.816.g2cccc81164-goog

^ permalink raw reply related

* Re: [PATCH 2/2] ipv6: don't deliver packets with zero length to raw sockets
From: David Miller @ 2017-04-21 14:53 UTC (permalink / raw)
  To: jbainbri; +Cc: sd, kuznet, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <CAOTh05AEQe=QMoStwdXckQ7pPW_ZuDAAj5W78rNqSR-6hNZGSQ@mail.gmail.com>

From: Jamie Bainbridge <jbainbri@redhat.com>
Date: Fri, 21 Apr 2017 21:18:00 +1000

> I cannot see the use in delivering a skb with zero bytes after the
> network header to a raw socket.

Then it cannot be used to look at zero length UDP packets, which are
completely legal and used.

So we must deliver it.

^ permalink raw reply

* Re: net/core: BUG in unregister_netdevice_many
From: Cong Wang @ 2017-04-21 18:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrey Konovalov, Eric Dumazet, David S. Miller, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML,
	Alexander Duyck, David Ahern, Daniel Borkmann, tcharding,
	Jiri Pirko, stephen hemminger, Dmitry Vyukov, Kostya Serebryany,
	syzkaller
In-Reply-To: <CA+55aFwg=OtMWmU153uM27Fwk5eVv5ZBGBA9phQqVqn3nNAy6Q@mail.gmail.com>

On Fri, Apr 21, 2017 at 10:25 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Apr 21, 2017 at 5:48 AM, Andrey Konovalov <andreyknvl@google.com> wrote:
>>
>> I've got the following error report while fuzzing the kernel with syzkaller.
>>
>> ------------[ cut here ]------------
>> kernel BUG at net/core/dev.c:6813!
>
> Another useless BUG_ON() that

I think we are double-unregister'ing the pim6reg device,
we probably need something like:

commit 7dc00c82cbb0119cf4663f65bbaa2cc55f961db2
Author: Wang Chen <wangchen@cn.fujitsu.com>
Date:   Mon Jul 14 20:56:34 2008 -0700

    ipv4: Fix ipmr unregister device oops

    An oops happens during device unregister.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox