Netdev List
* [PATCH bpf-next v2] xsk: include XDP meta data in AF_XDP frames
From: Björn Töpel @ 2018-08-30 13:12 UTC (permalink / raw)
  To: bjorn.topel, magnus.karlsson, magnus.karlsson, ast, daniel,
	netdev
  Cc: Björn Töpel, brouer
In-Reply-To: <20180830080900.16350-1-bjorn.topel@gmail.com>

From: Björn Töpel <bjorn.topel@intel.com>

Previously, the AF_XDP (XDP_DRV/XDP_SKB copy-mode) ingress logic did
not include XDP meta data in the data buffers copied out to the user
application.

In this commit, we check if meta data is available, and if so, it is
prepended to the frame.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/xdp/xsk.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 4e937cd7c17d..569048e299df 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -55,20 +55,30 @@ EXPORT_SYMBOL(xsk_umem_discard_addr);
 
 static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 {
-	void *buffer;
+	void *to_buf, *from_buf;
+	u32 metalen;
 	u64 addr;
 	int err;
 
 	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
-	    len > xs->umem->chunk_size_nohr) {
+	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		xs->rx_dropped++;
 		return -ENOSPC;
 	}
 
 	addr += xs->umem->headroom;
 
-	buffer = xdp_umem_get_data(xs->umem, addr);
-	memcpy(buffer, xdp->data, len);
+	if (unlikely(xdp_data_meta_unsupported(xdp))) {
+		from_buf = xdp->data;
+		metalen = 0;
+	} else {
+		from_buf = xdp->data_meta;
+		metalen = xdp->data - xdp->data_meta;
+	}
+
+	to_buf = xdp_umem_get_data(xs->umem, addr);
+	memcpy(to_buf, from_buf, len + metalen);
+	addr += metalen;
 	err = xskq_produce_batch_desc(xs->rx, addr, len);
 	if (!err) {
 		xskq_discard_addr(xs->umem->fq);
@@ -111,6 +121,7 @@ void xsk_flush(struct xdp_sock *xs)
 
 int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
+	u32 metalen = xdp->data - xdp->data_meta;
 	u32 len = xdp->data_end - xdp->data;
 	void *buffer;
 	u64 addr;
@@ -120,7 +131,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 		return -EINVAL;
 
 	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
-	    len > xs->umem->chunk_size_nohr) {
+	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		xs->rx_dropped++;
 		return -ENOSPC;
 	}
@@ -128,7 +139,8 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 	addr += xs->umem->headroom;
 
 	buffer = xdp_umem_get_data(xs->umem, addr);
-	memcpy(buffer, xdp->data, len);
+	memcpy(buffer, xdp->data_meta, len + metalen);
+	addr += metalen;
 	err = xskq_produce_batch_desc(xs->rx, addr, len);
 	if (!err) {
 		xskq_discard_addr(xs->umem->fq);
-- 
2.17.1


* Re: [PATCH 1/2] xfrm6: call kfree_skb when skb is toobig
From: Sabrina Dubroca @ 2018-08-30 13:23 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: netdev, yoshfuji, kuznet, davem, herbert, steffen.klassert,
	eyal.birger
In-Reply-To: <20180830125817.4567-1-cascardo@canonical.com>

2018-08-30, 09:58:16 -0300, Thadeu Lima de Souza Cascardo wrote:
> After commit d6990976af7c5d8f55903bfb4289b6fb030bf754 ("vti6: fix PMTU caching
> and reporting on xmit"), some too big skbs might be potentially passed down to
> __xfrm6_output, causing it to fail to transmit but not free the skb, causing a
> leak of skb, and consequentially a leak of dst references.
> 
> After running pmtu.sh, that shows as failure to unregister devices in a namespace:
> 
> [  311.397671] unregister_netdevice: waiting for veth_b to become free. Usage count = 1
> 
> The fix is to call kfree_skb in case of transmit failures.
> 
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>

I was about to post the same patch. Arguably, the commit introducing
this bug is the one that added those "return -EMSGSIZE" to
__xfrm6_output without freeing.

Either way, it's missing a Fixes: tag, which should be one of those,
or both:

Fixes: d6990976af7c ("vti6: fix PMTU caching and reporting on xmit")
Fixes: dd767856a36e ("xfrm6: Don't call icmpv6_send on local error")

-- 
Sabrina


* Re: [PATCH bpf-next v2] xsk: include XDP meta data in AF_XDP frames
From: Daniel Borkmann @ 2018-08-30 13:27 UTC (permalink / raw)
  To: Björn Töpel, magnus.karlsson, magnus.karlsson, ast,
	netdev
  Cc: Björn Töpel, brouer
In-Reply-To: <20180830131248.21061-1-bjorn.topel@gmail.com>

On 08/30/2018 03:12 PM, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> Previously, the AF_XDP (XDP_DRV/XDP_SKB copy-mode) ingress logic did
> not include XDP meta data in the data buffers copied out to the user
> application.
> 
> In this commit, we check if meta data is available, and if so, it is
> prepended to the frame.
> 
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>

Applied to bpf-next, thanks Björn!
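
For readers following the thread, the copy-mode layout described in the
commit message can be modeled in userspace. This is only a sketch, not
kernel code: struct frame_view and copy_with_meta() are hypothetical
names, a NULL data_meta stands in for the kernel's
xdp_data_meta_unsupported() check, and the size check is simplified
compared with the one in the diff. The returned offset plays the role of
the "addr += metalen" step.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the kernel's struct xdp_buff. */
struct frame_view {
	const uint8_t *data_meta; /* start of metadata, NULL if unsupported */
	const uint8_t *data;      /* start of packet data */
	const uint8_t *data_end;  /* one past the last packet byte */
};

/*
 * Copy metadata (if any) immediately followed by the packet into chunk.
 * Returns the offset in chunk where the packet data starts, or -1 if
 * metadata plus packet do not fit. *len is set to the packet length only.
 */
static long copy_with_meta(uint8_t *chunk, size_t chunk_size,
			   const struct frame_view *f, size_t *len)
{
	const uint8_t *from = f->data_meta ? f->data_meta : f->data;
	size_t metalen = (size_t)(f->data - from);
	size_t pktlen = (size_t)(f->data_end - f->data);

	if (metalen + pktlen > chunk_size)
		return -1;

	/* One copy covers both regions because they are contiguous. */
	memcpy(chunk, from, metalen + pktlen);
	*len = pktlen;
	return (long)metalen;
}
```

In this model the consumer finds the metadata in the bytes immediately
before the returned offset, while the reported length still covers the
packet only.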


* Re: [PATCH net-next 2/3] net: nixge: Add support for having nixge as subdevice
From: Andrew Lunn @ 2018-08-30 17:42 UTC (permalink / raw)
  To: Moritz Fischer
  Cc: David S. Miller, Kees Cook, Florian Fainelli,
	Linux Kernel Mailing List, netdev, Alex Williams
In-Reply-To: <CAAtXAHceYgPx4A7F14AYEQs9RFo2hP6JTkApHa_GtbxVcdiQNA@mail.gmail.com>

On Thu, Aug 30, 2018 at 09:39:39AM -0700, Moritz Fischer wrote:
> Hi Andrew,
> 
> On Wed, Aug 29, 2018 at 8:11 PM, Andrew Lunn <andrew@lunn.ch> wrote:
> 
> > Could you tell us more about the parent device. I'm guessing PCIe.  Is
> > it x86 so no device tree? Are there cases where it does not have a PHY
> > connected? What is connected instead? SFP? A switch? Can there be
> > multiple PHYs on the MDIO bus?
> 
> The device is part of a larger FPGA design. One use case that I was trying
> to support with this patch is PCIe with x86 (hopefully on it's own PF...)
> Since the whole design isn't completely done, these are the use cases I
> see upcoming and current:
> 
> ARM(64):
> a) DT: PHY over MDIO (current use case), fixed-link with GPIO (coming)
> b) DT: SFP (potentially coming)
> 
> x86:
> a) no PHY (coming)-> fixed-link with GPIO
> b) SFP (potentially), PHY over MDIO (potentially)

Hi Moritz

For SFP, you need to convert this driver to use phylink. That will
also help you with fixed-link, since phylink will handle probing that
for you.

But this brings its own problems. phylink and sfp currently have no
support for platform devices. The SFP driver needs to know which i2c
bus to use, and which GPIOs are connected to the SFP. phylink parses
the device tree to find out if there is an SFP device, or a fixed
link, etc. I don't know of any conceptual reason why platform data
would not work, it just needs implementing.

There also do not appear to be any in-kernel users of the device
tree binding. That gives you some flexibility in that you could think
about making non-backwards compatible changes in the binding. I would
definitely want to move PHYs into an mdio subnode.

I'm not aware of any x86 drivers using fixed link. What they generally
do is register the mdio bus using mdiobus_register() and then use
phy_find_first() to get a PHY. This works O.K. for PCs, laptops, and
PCIe cards where there is only one PHY on the bus. What you might be
able to do is always register the MDIO bus, and if you fail to find a
PHY, instantiate a fixed-link and use that instead.
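
The fallback order in the paragraph above can be sketched as a small
userspace model. Nothing here is a kernel API: choose_link(),
struct link_choice and MODEL_MAX_ADDR are hypothetical names, and the
boolean array merely stands in for "a PHY answered at this MDIO
address". It only illustrates the decision: take the first PHY found on
the bus, otherwise fall back to a fixed link.

```c
#include <stdbool.h>
#include <stddef.h>

/* Number of addresses scanned in this model (an MDIO bus has 32). */
#define MODEL_MAX_ADDR 32

enum link_kind { LINK_PHY, LINK_FIXED };

struct link_choice {
	enum link_kind kind;
	int phy_addr; /* -1 when no PHY was found */
};

/* Pick the first PHY present on the bus, else fall back to fixed link. */
static struct link_choice choose_link(const bool present[MODEL_MAX_ADDR])
{
	struct link_choice c = { LINK_FIXED, -1 };

	for (int addr = 0; addr < MODEL_MAX_ADDR; addr++) {
		if (present[addr]) {
			c.kind = LINK_PHY;
			c.phy_addr = addr;
			break;
		}
	}
	return c;
}
```

A real driver would still have to register the bus first and handle
registration errors; the model leaves that out.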

Reality is, all the core work in this area has been pushed by the
embedded world, which is ARM and device tree. For Intel and Server
style networking, drivers tend to either ignore the Linux core code
and write their own PHY and MDIO bus drivers, or it is all done in
firmware.

     Andrew


* Re: [PATCH net-next 1/3] net: nixge: Add support for fixed-link subnodes
From: kbuild test robot @ 2018-08-30 17:44 UTC (permalink / raw)
  To: Moritz Fischer
  Cc: kbuild-all, davem, keescook, f.fainelli, linux-kernel, netdev,
	alex.williams, Moritz Fischer
In-Reply-To: <20180830004046.9417-2-mdf@kernel.org>


Hi Moritz,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Moritz-Fischer/nixge-fixed-link-support/20180830-150857
config: i386-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

Note: the linux-review/Moritz-Fischer/nixge-fixed-link-support/20180830-150857 HEAD 300a42d41dc76f270bff67d414dc7fc127d3f17c builds fine.
      It only hurts bisectability.

All errors (new ones prefixed by >>):

   drivers/net/ethernet/ni/nixge.c: In function 'nixge_mdio_setup':
   drivers/net/ethernet/ni/nixge.c:1221:6: warning: unused variable 'err' [-Wunused-variable]
     int err;
         ^~~
   drivers/net/ethernet/ni/nixge.c: In function 'nixge_remove':
>> drivers/net/ethernet/ni/nixge.c:1366:6: error: 'np' undeclared (first use in this function); did you mean 'up'?
     if (np && of_phy_is_fixed_link(np))
         ^~
         up
   drivers/net/ethernet/ni/nixge.c:1366:6: note: each undeclared identifier is reported only once for each function it appears in

vim +1366 drivers/net/ethernet/ni/nixge.c

  1217	
  1218	static int nixge_mdio_setup(struct nixge_priv *priv, struct device_node *np)
  1219	{
  1220		struct mii_bus *bus;
> 1221		int err;
  1222	
  1223		bus = devm_mdiobus_alloc(priv->dev);
  1224		if (!bus)
  1225			return -ENOMEM;
  1226	
  1227		snprintf(bus->id, MII_BUS_ID_SIZE, "%s-mii", dev_name(priv->dev));
  1228		bus->priv = priv;
  1229		bus->name = "nixge_mii_bus";
  1230		bus->read = nixge_mdio_read;
  1231		bus->write = nixge_mdio_write;
  1232		bus->parent = priv->dev;
  1233	
  1234		priv->mii_bus = bus;
  1235	
  1236		return of_mdiobus_register(bus, np);
  1237	}
  1238	
  1239	static void *nixge_get_nvmem_address(struct device *dev)
  1240	{
  1241		struct nvmem_cell *cell;
  1242		size_t cell_size;
  1243		char *mac;
  1244	
  1245		cell = nvmem_cell_get(dev, "address");
  1246		if (IS_ERR(cell))
  1247			return NULL;
  1248	
  1249		mac = nvmem_cell_read(cell, &cell_size);
  1250		nvmem_cell_put(cell);
  1251	
  1252		return mac;
  1253	}
  1254	
  1255	static int nixge_probe(struct platform_device *pdev)
  1256	{
  1257		struct nixge_priv *priv;
  1258		struct net_device *ndev;
  1259		struct resource *dmares;
  1260		struct device_node *np;
  1261		const u8 *mac_addr;
  1262		int err;
  1263	
  1264		ndev = alloc_etherdev(sizeof(*priv));
  1265		if (!ndev)
  1266			return -ENOMEM;
  1267	
  1268		np = pdev->dev.of_node;
  1269	
  1270		platform_set_drvdata(pdev, ndev);
  1271		SET_NETDEV_DEV(ndev, &pdev->dev);
  1272	
  1273		ndev->features = NETIF_F_SG;
  1274		ndev->netdev_ops = &nixge_netdev_ops;
  1275		ndev->ethtool_ops = &nixge_ethtool_ops;
  1276	
  1277		/* MTU range: 64 - 9000 */
  1278		ndev->min_mtu = 64;
  1279		ndev->max_mtu = NIXGE_JUMBO_MTU;
  1280	
  1281		mac_addr = nixge_get_nvmem_address(&pdev->dev);
  1282		if (mac_addr && is_valid_ether_addr(mac_addr)) {
  1283			ether_addr_copy(ndev->dev_addr, mac_addr);
  1284			kfree(mac_addr);
  1285		} else {
  1286			eth_hw_addr_random(ndev);
  1287		}
  1288	
  1289		priv = netdev_priv(ndev);
  1290		priv->ndev = ndev;
  1291		priv->dev = &pdev->dev;
  1292	
  1293		netif_napi_add(ndev, &priv->napi, nixge_poll, NAPI_POLL_WEIGHT);
  1294	
  1295		dmares = platform_get_resource(pdev, IORESOURCE_MEM, 0);
  1296		priv->dma_regs = devm_ioremap_resource(&pdev->dev, dmares);
  1297		if (IS_ERR(priv->dma_regs)) {
  1298			netdev_err(ndev, "failed to map dma regs\n");
  1299			return PTR_ERR(priv->dma_regs);
  1300		}
  1301		priv->ctrl_regs = priv->dma_regs + NIXGE_REG_CTRL_OFFSET;
  1302		__nixge_hw_set_mac_address(ndev);
  1303	
  1304		priv->tx_irq = platform_get_irq_byname(pdev, "tx");
  1305		if (priv->tx_irq < 0) {
  1306			netdev_err(ndev, "could not find 'tx' irq");
  1307			return priv->tx_irq;
  1308		}
  1309	
  1310		priv->rx_irq = platform_get_irq_byname(pdev, "rx");
  1311		if (priv->rx_irq < 0) {
  1312			netdev_err(ndev, "could not find 'rx' irq");
  1313			return priv->rx_irq;
  1314		}
  1315	
  1316		priv->coalesce_count_rx = XAXIDMA_DFT_RX_THRESHOLD;
  1317		priv->coalesce_count_tx = XAXIDMA_DFT_TX_THRESHOLD;
  1318	
  1319		if (np) {
  1320			err = nixge_of_get_phy(priv, np);
  1321			if (err)
  1322				goto free_netdev;
  1323		}
  1324	
  1325		/* only if it's not a fixed link, do we care about MDIO at all */
  1326		if (priv->phy_node && !of_phy_is_fixed_link(np)) {
  1327			err = nixge_mdio_setup(priv, np);
  1328			if (err) {
  1329				dev_err(&pdev->dev, "error registering mdio bus");
  1330				goto free_phy;
  1331			}
  1332		}
  1333	
  1334		err = register_netdev(priv->ndev);
  1335		if (err) {
  1336			netdev_err(ndev, "register_netdev() error (%i)\n", err);
  1337			goto unregister_mdio;
  1338		}
  1339	
  1340		return 0;
  1341	
  1342	unregister_mdio:
  1343		if (priv->mii_bus)
  1344			mdiobus_unregister(priv->mii_bus);
  1345	free_phy:
  1346		if (np && of_phy_is_fixed_link(np)) {
  1347			of_phy_deregister_fixed_link(np);
  1348			of_node_put(np);
  1349		}
  1350	free_netdev:
  1351		free_netdev(ndev);
  1352	
  1353		return err;
  1354	}
  1355	
  1356	static int nixge_remove(struct platform_device *pdev)
  1357	{
  1358		struct net_device *ndev = platform_get_drvdata(pdev);
  1359		struct nixge_priv *priv = netdev_priv(ndev);
  1360	
  1361		unregister_netdev(ndev);
  1362	
  1363		if (priv->mii_bus)
  1364			mdiobus_unregister(priv->mii_bus);
  1365	
> 1366		if (np && of_phy_is_fixed_link(np))
  1367			of_phy_deregister_fixed_link(np);
  1368	
  1369		free_netdev(ndev);
  1370	
  1371		return 0;
  1372	}
  1373	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 65144 bytes --]


* Re: [PATCH net-next 1/3] net: nixge: Add support for fixed-link subnodes
From: Andrew Lunn @ 2018-08-30 17:54 UTC (permalink / raw)
  To: Moritz Fischer
  Cc: David S. Miller, Kees Cook, Florian Fainelli,
	Linux Kernel Mailing List, netdev, Alex Williams
In-Reply-To: <CAAtXAHe_fX9_XExvFj20a0L-=b0Z1gX410=wnR=P2DjO-hEByA@mail.gmail.com>

> > The hardware has MDIO, but you don't have a PHY connected on it, and
> > use fixed link.
> 
> Since it's an FPGA design in that case we'd probably build the hardware without
> MDIO to save resources.

You can save resources, but is it worth the complexity elsewhere,
like in the software?

> > It is important you have the mdio subnode, with PHYs and switches as
> > children. The driver currently gets this wrong, it uses
> > pdev->dev.of_node.
> 
> Oh, whoops.

Yes, and I also missed it. I generally review all new network drivers
and look at their MDIO and PHY code.

> Any good examples of drivers doing it right? Is the one going with
> the DT snippet above a good example?

That comes from the Freescale fec_main.c. It only supports DT, and
always uses of_mdiobus_register. You need to be a bit more flexible
for when you don't have DT. I'm not sure there are good examples of
this, since they either don't need this flexibility, or they get it
wrong :-(

      Andrew


* [PATCH bpf-next 0/3] xsk: misc code cleanup
From: Magnus Karlsson @ 2018-08-30 13:56 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev

This patch set cleans up two code style issues with the xsk zero-copy
code. The resulting code is smaller and simpler.

Patch 1: Removes a potential compiler warning reported by the Intel
         0-DAY kernel test infrastructure.
Patches 2-3: Removes the xdp_umem_props structure. At some point, it
             was used to break a dependency, but the members are these
             days much better off in the xdp_umem since the dependency
             does not exist anymore.

I based this patch set on bpf-next commit 234dbe3dc1db ("Merge branch
'verifier-liveness-simplification'")

Thanks: Magnus

Magnus Karlsson (3):
  i40e: fix possible compiler warning in xsk TX path
  xsk: get rid of useless struct xdp_umem_props
  i40e: adapt driver to new xdp_umem structure

 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 10 ++++------
 include/net/xdp_sock.h                     |  8 ++------
 net/xdp/xdp_umem.c                         |  4 ++--
 net/xdp/xdp_umem_props.h                   | 14 --------------
 net/xdp/xsk.c                              | 10 ++++++----
 net/xdp/xsk_queue.c                        |  5 +++--
 net/xdp/xsk_queue.h                        | 13 +++++++------
 7 files changed, 24 insertions(+), 40 deletions(-)
 delete mode 100644 net/xdp/xdp_umem_props.h


* [PATCH bpf-next 1/3] i40e: fix possible compiler warning in xsk TX path
From: Magnus Karlsson @ 2018-08-30 13:56 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev
In-Reply-To: <1535637403-14549-1-git-send-email-magnus.karlsson@intel.com>

With certain gcc versions, it was possible to get the warning
"'tx_desc' may be used uninitialized in this function" for
i40e_xmit_zc. The variable could not actually be used uninitialized;
however, this commit simplifies the code path so that the warning is
no longer emitted.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 94947a8..41ca7e1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -668,9 +668,8 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
  **/
 static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
 {
-	unsigned int total_packets = 0;
+	struct i40e_tx_desc *tx_desc = NULL;
 	struct i40e_tx_buffer *tx_bi;
-	struct i40e_tx_desc *tx_desc;
 	bool work_done = true;
 	dma_addr_t dma;
 	u32 len;
@@ -697,14 +696,13 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
 			build_ctob(I40E_TX_DESC_CMD_ICRC
 				   | I40E_TX_DESC_CMD_EOP,
 				   0, len, 0);
-		total_packets++;
 
 		xdp_ring->next_to_use++;
 		if (xdp_ring->next_to_use == xdp_ring->count)
 			xdp_ring->next_to_use = 0;
 	}
 
-	if (total_packets > 0) {
+	if (tx_desc) {
 		/* Request an interrupt for the last frame and bump tail ptr. */
 		tx_desc->cmd_type_offset_bsz |= (I40E_TX_DESC_CMD_RS <<
 						 I40E_TXD_QW1_CMD_SHIFT);
-- 
2.7.4
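
The idea behind this patch, replacing a separate packet counter with a
pointer that starts out NULL, can be sketched in plain C. This is a
simplified model under stated assumptions, not the driver code:
struct desc, DESC_FLAG_REPORT and fill_ring() are hypothetical names.
The pointer doubles as the "did we queue anything" flag, so the compiler
can see it is always initialized before the final use.

```c
#include <stddef.h>

/* Hypothetical descriptor with a single flags word. */
struct desc {
	unsigned int flags;
};

#define DESC_FLAG_REPORT 0x1u

/*
 * Fill up to budget descriptors and mark only the last one filled.
 * Returns the number of descriptors filled.
 */
static size_t fill_ring(struct desc *ring, size_t n, size_t budget)
{
	struct desc *last = NULL; /* NULL means nothing was queued */
	size_t sent = 0;

	while (sent < budget && sent < n) {
		last = &ring[sent];
		last->flags = 0;
		sent++;
	}

	/* The pointer itself tells us whether the loop ran at all. */
	if (last)
		last->flags |= DESC_FLAG_REPORT;

	return sent;
}
```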


* [PATCH bpf-next 2/3] xsk: get rid of useless struct xdp_umem_props
From: Magnus Karlsson @ 2018-08-30 13:56 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev
In-Reply-To: <1535637403-14549-1-git-send-email-magnus.karlsson@intel.com>

This commit gets rid of the structure xdp_umem_props. It was there to
be able to break a dependency at one point, but this is no longer
needed. The values in the struct are instead stored directly in the
xdp_umem structure. This simplifies the xsk code as well as af_xdp
zero-copy drivers and as a bonus gets rid of one internal header file.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 include/net/xdp_sock.h   |  8 ++------
 net/xdp/xdp_umem.c       |  4 ++--
 net/xdp/xdp_umem_props.h | 14 --------------
 net/xdp/xsk.c            | 10 ++++++----
 net/xdp/xsk_queue.c      |  5 +++--
 net/xdp/xsk_queue.h      | 13 +++++++------
 6 files changed, 20 insertions(+), 34 deletions(-)
 delete mode 100644 net/xdp/xdp_umem_props.h

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 56994ad..932ca0d 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -16,11 +16,6 @@
 struct net_device;
 struct xsk_queue;
 
-struct xdp_umem_props {
-	u64 chunk_mask;
-	u64 size;
-};
-
 struct xdp_umem_page {
 	void *addr;
 	dma_addr_t dma;
@@ -30,7 +25,8 @@ struct xdp_umem {
 	struct xsk_queue *fq;
 	struct xsk_queue *cq;
 	struct xdp_umem_page *pages;
-	struct xdp_umem_props props;
+	u64 chunk_mask;
+	u64 size;
 	u32 headroom;
 	u32 chunk_size_nohr;
 	struct user_struct *user;
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index bfe2dbe..2471614 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -314,8 +314,8 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 
 	umem->pid = get_task_pid(current, PIDTYPE_PID);
 	umem->address = (unsigned long)addr;
-	umem->props.chunk_mask = ~((u64)chunk_size - 1);
-	umem->props.size = size;
+	umem->chunk_mask = ~((u64)chunk_size - 1);
+	umem->size = size;
 	umem->headroom = headroom;
 	umem->chunk_size_nohr = chunk_size - headroom;
 	umem->npgs = size / PAGE_SIZE;
diff --git a/net/xdp/xdp_umem_props.h b/net/xdp/xdp_umem_props.h
deleted file mode 100644
index 40eab10..0000000
--- a/net/xdp/xdp_umem_props.h
+++ /dev/null
@@ -1,14 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/* XDP user-space packet buffer
- * Copyright(c) 2018 Intel Corporation.
- */
-
-#ifndef XDP_UMEM_PROPS_H_
-#define XDP_UMEM_PROPS_H_
-
-struct xdp_umem_props {
-	u64 chunk_mask;
-	u64 size;
-};
-
-#endif /* XDP_UMEM_PROPS_H_ */
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 4e937cd..acbe5b5 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -458,8 +458,10 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		goto out_unlock;
 	} else {
 		/* This xsk has its own umem. */
-		xskq_set_umem(xs->umem->fq, &xs->umem->props);
-		xskq_set_umem(xs->umem->cq, &xs->umem->props);
+		xskq_set_umem(xs->umem->fq, xs->umem->size,
+			      xs->umem->chunk_mask);
+		xskq_set_umem(xs->umem->cq, xs->umem->size,
+			      xs->umem->chunk_mask);
 
 		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
 		if (err)
@@ -469,8 +471,8 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 	xs->dev = dev;
 	xs->zc = xs->umem->zc;
 	xs->queue_id = qid;
-	xskq_set_umem(xs->rx, &xs->umem->props);
-	xskq_set_umem(xs->tx, &xs->umem->props);
+	xskq_set_umem(xs->rx, xs->umem->size, xs->umem->chunk_mask);
+	xskq_set_umem(xs->tx, xs->umem->size, xs->umem->chunk_mask);
 	xdp_add_sk_umem(xs->umem, xs);
 
 out_unlock:
diff --git a/net/xdp/xsk_queue.c b/net/xdp/xsk_queue.c
index 6c32e92..2dc1384d 100644
--- a/net/xdp/xsk_queue.c
+++ b/net/xdp/xsk_queue.c
@@ -7,12 +7,13 @@
 
 #include "xsk_queue.h"
 
-void xskq_set_umem(struct xsk_queue *q, struct xdp_umem_props *umem_props)
+void xskq_set_umem(struct xsk_queue *q, u64 size, u64 chunk_mask)
 {
 	if (!q)
 		return;
 
-	q->umem_props = *umem_props;
+	q->size = size;
+	q->chunk_mask = chunk_mask;
 }
 
 static u32 xskq_umem_get_ring_size(struct xsk_queue *q)
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 8a64b15..82252cc 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -31,7 +31,8 @@ struct xdp_umem_ring {
 };
 
 struct xsk_queue {
-	struct xdp_umem_props umem_props;
+	u64 chunk_mask;
+	u64 size;
 	u32 ring_mask;
 	u32 nentries;
 	u32 prod_head;
@@ -78,7 +79,7 @@ static inline u32 xskq_nb_free(struct xsk_queue *q, u32 producer, u32 dcnt)
 
 static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 {
-	if (addr >= q->umem_props.size) {
+	if (addr >= q->size) {
 		q->invalid_descs++;
 		return false;
 	}
@@ -92,7 +93,7 @@ static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr)
 		struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
-		*addr = READ_ONCE(ring->desc[idx]) & q->umem_props.chunk_mask;
+		*addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
 		if (xskq_is_valid_addr(q, *addr))
 			return addr;
 
@@ -173,8 +174,8 @@ static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
 	if (!xskq_is_valid_addr(q, d->addr))
 		return false;
 
-	if (((d->addr + d->len) & q->umem_props.chunk_mask) !=
-	    (d->addr & q->umem_props.chunk_mask)) {
+	if (((d->addr + d->len) & q->chunk_mask) !=
+	    (d->addr & q->chunk_mask)) {
 		q->invalid_descs++;
 		return false;
 	}
@@ -253,7 +254,7 @@ static inline bool xskq_empty_desc(struct xsk_queue *q)
 	return xskq_nb_free(q, q->prod_tail, q->nentries) == q->nentries;
 }
 
-void xskq_set_umem(struct xsk_queue *q, struct xdp_umem_props *umem_props);
+void xskq_set_umem(struct xsk_queue *q, u64 size, u64 chunk_mask);
 struct xsk_queue *xskq_create(u32 nentries, bool umem_queue);
 void xskq_destroy(struct xsk_queue *q_ops);
 
-- 
2.7.4
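
To make the role of the two relocated fields concrete, here is a small
userspace model of the mask arithmetic visible in the diff. It is a
sketch only: struct umem_model, umem_model_init() and desc_is_valid()
are hypothetical names, and chunk_size is assumed to be a power of two,
which the mask computation requires. The validity test mirrors the
comparison in xskq_is_valid_desc(): the start and end addresses must
mask to the same chunk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields formerly in xdp_umem_props. */
struct umem_model {
	uint64_t chunk_mask;
	uint64_t size;
};

/* chunk_size is assumed to be a power of two. */
static struct umem_model umem_model_init(uint64_t size, uint64_t chunk_size)
{
	struct umem_model u = { ~(chunk_size - 1), size };

	return u;
}

/* A descriptor is valid if it starts inside the umem and stays in one chunk. */
static bool desc_is_valid(const struct umem_model *u, uint64_t addr,
			  uint32_t len)
{
	if (addr >= u->size)
		return false;

	return ((addr + len) & u->chunk_mask) == (addr & u->chunk_mask);
}
```

With this comparison, a descriptor whose end address lands exactly on
the start of the next chunk is rejected as well.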


* [PATCH bpf-next 3/3] i40e: adapt driver to new xdp_umem structure
From: Magnus Karlsson @ 2018-08-30 13:56 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev
In-Reply-To: <1535637403-14549-1-git-send-email-magnus.karlsson@intel.com>

The struct xdp_umem_props was removed in the xsk code and this commit
adapts the i40e af_xdp zero-copy driver code to the new xdp_umem
structure.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 41ca7e1..2ebfc78 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -442,7 +442,7 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
 				    struct i40e_rx_buffer *old_bi)
 {
 	struct i40e_rx_buffer *new_bi = &rx_ring->rx_bi[rx_ring->next_to_alloc];
-	unsigned long mask = (unsigned long)rx_ring->xsk_umem->props.chunk_mask;
+	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
 	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
 	u16 nta = rx_ring->next_to_alloc;
 
@@ -477,7 +477,7 @@ void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
 
 	rx_ring = container_of(alloc, struct i40e_ring, zca);
 	hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
-	mask = rx_ring->xsk_umem->props.chunk_mask;
+	mask = rx_ring->xsk_umem->chunk_mask;
 
 	nta = rx_ring->next_to_alloc;
 	bi = &rx_ring->rx_bi[nta];
-- 
2.7.4


* [PATCH net 1/2] selftests: pmtu: maximum MTU for vti4 is 2^16-1-20
From: Sabrina Dubroca @ 2018-08-30 14:01 UTC (permalink / raw)
  To: netdev; +Cc: Sabrina Dubroca, Stefano Brivio, Nicolas Dichtel

Since commit 82612de1c98e ("ip_tunnel: restore binding to ifaces with a
large mtu"), the maximum MTU for vti4 is based on IP_MAX_MTU instead of
the mysterious constant 0xFFF8.  This makes this selftest fail.

Fixes: 82612de1c98e ("ip_tunnel: restore binding to ifaces with a large mtu")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Stefano Brivio <sbrivio@redhat.com>
---
 tools/testing/selftests/net/pmtu.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index f8cc38afffa2..0ecf2609b9a4 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -334,7 +334,7 @@ test_pmtu_vti4_link_add_mtu() {
 	fail=0
 
 	min=68
-	max=$((65528 - 20))
+	max=$((65535 - 20))
 	# Check invalid values first
 	for v in $((min - 1)) $((max + 1)); do
 		${ns_a} ip link add vti4_a mtu ${v} type vti local ${veth4_a_addr} remote ${veth4_b_addr} key 10 2>/dev/null
-- 
2.18.0


* [PATCH net 2/2] selftests: pmtu: detect correct binary to ping ipv6 addresses
From: Sabrina Dubroca @ 2018-08-30 14:01 UTC (permalink / raw)
  To: netdev; +Cc: Sabrina Dubroca, Stefano Brivio
In-Reply-To: <1e62875c4c72b38b17f6c73f9654696b14fb3166.1535636302.git.sd@queasysnail.net>

Some systems don't have the ping6 binary anymore, and use ping for
everything. Detect the absence of ping6 and try to use ping instead.

Fixes: d1f1b9cbf34c ("selftests: net: Introduce first PMTU test")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Stefano Brivio <sbrivio@redhat.com>
---
 tools/testing/selftests/net/pmtu.sh | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 0ecf2609b9a4..cc2798a0a2d7 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -46,6 +46,9 @@
 # Kselftest framework requirement - SKIP code is 4.
 ksft_skip=4
 
+# Some systems don't have a ping6 binary anymore
+which ping6 > /dev/null 2>&1 && ping6=$(which ping6) || ping6=$(which ping)
+
 tests="
 	pmtu_vti6_exception		vti6: PMTU exceptions
 	pmtu_vti4_exception		vti4: PMTU exceptions
@@ -274,7 +277,7 @@ test_pmtu_vti6_exception() {
 	mtu "${ns_b}" veth_b 4000
 	mtu "${ns_a}" vti6_a 5000
 	mtu "${ns_b}" vti6_b 5000
-	${ns_a} ping6 -q -i 0.1 -w 2 -s 60000 ${vti6_b_addr} > /dev/null
+	${ns_a} ${ping6} -q -i 0.1 -w 2 -s 60000 ${vti6_b_addr} > /dev/null
 
 	# Check that exception was created
 	if [ "$(route_get_dst_pmtu_from_exception "${ns_a}" ${vti6_b_addr})" = "" ]; then
-- 
2.18.0


* [PATCH net-next] net: stmmac: Add CBS support in XGMAC2
From: Jose Abreu @ 2018-08-30 14:09 UTC (permalink / raw)
  To: netdev
  Cc: Jose Abreu, David S. Miller, Joao Pinto, Giuseppe Cavallaro,
	Alexandre Torgue

XGMAC2 uses the same CBS mechanism as GMAC5; only the register offsets
change. Let's use the same TC callbacks and implement the .config_cbs
callback in the XGMAC2 core.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h      | 12 ++++++++++++
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c | 19 ++++++++++++++++++-
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c  | 19 +++++++++++++++++++
 drivers/net/ethernet/stmicro/stmmac/hwif.c          |  2 +-
 4 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
index 0a80fa25afe3..d6bb953685fa 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
@@ -119,11 +119,23 @@
 #define XGMAC_MTL_TXQ_OPMODE(x)		(0x00001100 + (0x80 * (x)))
 #define XGMAC_TQS			GENMASK(25, 16)
 #define XGMAC_TQS_SHIFT			16
+#define XGMAC_Q2TCMAP			GENMASK(10, 8)
+#define XGMAC_Q2TCMAP_SHIFT		8
 #define XGMAC_TTC			GENMASK(6, 4)
 #define XGMAC_TTC_SHIFT			4
 #define XGMAC_TXQEN			GENMASK(3, 2)
 #define XGMAC_TXQEN_SHIFT		2
 #define XGMAC_TSF			BIT(1)
+#define XGMAC_MTL_TCx_ETS_CONTROL(x)	(0x00001110 + (0x80 * (x)))
+#define XGMAC_MTL_TCx_QUANTUM_WEIGHT(x)	(0x00001118 + (0x80 * (x)))
+#define XGMAC_MTL_TCx_SENDSLOPE(x)	(0x0000111c + (0x80 * (x)))
+#define XGMAC_MTL_TCx_HICREDIT(x)	(0x00001120 + (0x80 * (x)))
+#define XGMAC_MTL_TCx_LOCREDIT(x)	(0x00001124 + (0x80 * (x)))
+#define XGMAC_CC			BIT(3)
+#define XGMAC_TSA			GENMASK(1, 0)
+#define XGMAC_SP			(0x0 << 0)
+#define XGMAC_CBS			(0x1 << 0)
+#define XGMAC_ETS			(0x2 << 0)
 #define XGMAC_MTL_RXQ_OPMODE(x)		(0x00001140 + (0x80 * (x)))
 #define XGMAC_RQS			GENMASK(25, 16)
 #define XGMAC_RQS_SHIFT			16
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
index d182f82f7b58..64b8cb88ea45 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
@@ -177,6 +177,23 @@ static void dwxgmac2_map_mtl_to_dma(struct mac_device_info *hw, u32 queue,
 	writel(value, ioaddr + reg);
 }
 
+static void dwxgmac2_config_cbs(struct mac_device_info *hw,
+				u32 send_slope, u32 idle_slope,
+				u32 high_credit, u32 low_credit, u32 queue)
+{
+	void __iomem *ioaddr = hw->pcsr;
+	u32 value;
+
+	writel(send_slope, ioaddr + XGMAC_MTL_TCx_SENDSLOPE(queue));
+	writel(idle_slope, ioaddr + XGMAC_MTL_TCx_QUANTUM_WEIGHT(queue));
+	writel(high_credit, ioaddr + XGMAC_MTL_TCx_HICREDIT(queue));
+	writel(low_credit, ioaddr + XGMAC_MTL_TCx_LOCREDIT(queue));
+
+	value = readl(ioaddr + XGMAC_MTL_TCx_ETS_CONTROL(queue));
+	value |= XGMAC_CC | XGMAC_CBS;
+	writel(value, ioaddr + XGMAC_MTL_TCx_ETS_CONTROL(queue));
+}
+
 static int dwxgmac2_host_irq_status(struct mac_device_info *hw,
 				    struct stmmac_extra_stats *x)
 {
@@ -316,7 +333,7 @@ const struct stmmac_ops dwxgmac210_ops = {
 	.prog_mtl_tx_algorithms = dwxgmac2_prog_mtl_tx_algorithms,
 	.set_mtl_tx_queue_weight = NULL,
 	.map_mtl_to_dma = dwxgmac2_map_mtl_to_dma,
-	.config_cbs = NULL,
+	.config_cbs = dwxgmac2_config_cbs,
 	.dump_regs = NULL,
 	.host_irq_status = dwxgmac2_host_irq_status,
 	.host_mtl_irq_status = dwxgmac2_host_mtl_irq_status,
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c
index 20909036e002..6c5092e7771c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c
@@ -182,6 +182,9 @@ static void dwxgmac2_dma_tx_mode(void __iomem *ioaddr, int mode,
 			value |= 0x7 << XGMAC_TTC_SHIFT;
 	}
 
+	/* Use static TC to Queue mapping */
+	value |= (channel << XGMAC_Q2TCMAP_SHIFT) & XGMAC_Q2TCMAP;
+
 	value &= ~XGMAC_TXQEN;
 	if (qmode != MTL_QUEUE_AVB)
 		value |= 0x2 << XGMAC_TXQEN_SHIFT;
@@ -374,6 +377,21 @@ static void dwxgmac2_enable_tso(void __iomem *ioaddr, bool en, u32 chan)
 	writel(value, ioaddr + XGMAC_DMA_CH_TX_CONTROL(chan));
 }
 
+static void dwxgmac2_qmode(void __iomem *ioaddr, u32 channel, u8 qmode)
+{
+	u32 value = readl(ioaddr + XGMAC_MTL_TXQ_OPMODE(channel));
+
+	value &= ~XGMAC_TXQEN;
+	if (qmode != MTL_QUEUE_AVB) {
+		value |= 0x2 << XGMAC_TXQEN_SHIFT;
+		writel(0, ioaddr + XGMAC_MTL_TCx_ETS_CONTROL(channel));
+	} else {
+		value |= 0x1 << XGMAC_TXQEN_SHIFT;
+	}
+
+	writel(value, ioaddr +  XGMAC_MTL_TXQ_OPMODE(channel));
+}
+
 static void dwxgmac2_set_bfsize(void __iomem *ioaddr, int bfsize, u32 chan)
 {
 	u32 value;
@@ -407,5 +425,6 @@ const struct stmmac_dma_ops dwxgmac210_dma_ops = {
 	.set_rx_tail_ptr = dwxgmac2_set_rx_tail_ptr,
 	.set_tx_tail_ptr = dwxgmac2_set_tx_tail_ptr,
 	.enable_tso = dwxgmac2_enable_tso,
+	.qmode = dwxgmac2_qmode,
 	.set_bfsize = dwxgmac2_set_bfsize,
 };
diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.c b/drivers/net/ethernet/stmicro/stmmac/hwif.c
index 357309a6d6a5..d9a34a4d08b3 100644
--- a/drivers/net/ethernet/stmicro/stmmac/hwif.c
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.c
@@ -201,7 +201,7 @@ static const struct stmmac_hwif_entry {
 		.mac = &dwxgmac210_ops,
 		.hwtimestamp = &stmmac_ptp,
 		.mode = NULL,
-		.tc = NULL,
+		.tc = &dwmac510_tc_ops,
 		.setup = dwxgmac2_setup,
 		.quirks = NULL,
 	},
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH v4 0/3] KASLR feature to randomize each loadable module
From: Edgecombe, Rick P @ 2018-08-30 18:24 UTC (permalink / raw)
  To: alexei.starovoitov@gmail.com
  Cc: linux-kernel@vger.kernel.org, daniel@iogearbox.net,
	jannh@google.com, keescook@chromium.org, arjan@linux.intel.com,
	tglx@linutronix.de, linux-mm@kvack.org, x86@kernel.org,
	kristen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
	kernel-hardening@lists.openwall.com, Hansen, Dave,
	netdev@vger.kernel.org
In-Reply-To: <20180830022703.xxl5eolthinicgwp@ast-mbp>

On Wed, 2018-08-29 at 19:27 -0700, Alexei Starovoitov wrote:
> On Wed, Aug 29, 2018 at 03:59:36PM -0700, Rick Edgecombe wrote:
> > Changes for V3:
> >  - Code cleanup based on internal feedback. (thanks to Dave Hansen and
> >    Andriy Shevchenko)
> >  - Slight refactor of existing algorithm to more cleanly live alongside
> >    new one.
> >  - BPF synthetic benchmark
> I don't see this benchmark in this patch set.
> Could you prepare it as a test in tools/testing/selftests/bpf/ ?
> so we can double check what is being tested and run it regularly
> like we do for all other tests in there.
Sure.

There were two benchmarks I had run with BPF in mind. One was timing the
module_alloc function in different scenarios, to make sure there were no
slowdowns for insertions.

The other was to check if the fragmentation caused any measurable runtime
performance:
"For runtime performance, a synthetic benchmark was run that does 5000000 BPF
JIT invocations each, from varying numbers of parallel processes, while the
kernel compiles sharing the same CPU to stand in for the cache impact of a real
workload. The seccomp filter invocations were just Jann Horn's seccomp filtering
test from this thread http://openwall.com/lists/kernel-hardening/2018/07/18/2,
except non-real time priority. The kernel was configured with KPTI and
retpoline, and pcid was disabled. There wasn't any significant difference
between the new and the old."

From what I know about the bpf kselftest, the first one would probably be a
better fit. Not sure if the second one would fit, with the kernel compile
sharing the same CPU, a special config, and a huge number of processes being
spawned... I can try to add a micro-benchmark instead if that sounds good.

Rick

^ permalink raw reply

* Re: [PATCH 00/15] soc: octeontx2: Add RVU admin function driver
From: Sunil Kovvuri @ 2018-08-30 18:31 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Arnd Bergmann, LKML, olof, LAKML, linux-soc, Sunil Goutham,
	Linux Netdev List, David S. Miller
In-Reply-To: <20180830132658.GA27566@lunn.ch>

On Thu, Aug 30, 2018 at 6:57 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > > > My feeling overall is that we need a review from the network driver
> > > > folks more than the arm-soc team etc, and that maybe the driver
> > > > as a whole should go into drivers/net/ethernet.
> > >
> > > This driver doesn't handle any network IO, and moreover it has to
> > > handle configuration requests from the crypto driver as well. There
> > > will be separate network and crypto drivers which will be upstreamed
> > > into drivers/net/ethernet and drivers/crypto. And in future silicon
> > > there will be different types of functional blocks which will be
> > > added into this resource virtualization unit (RVU). Hence I thought
> > > this driver is not the right fit for drivers/net/ethernet.
>
> Hi Sunil
>
> Do you have a git branch for everything? I would like to look at the
> actual Ethernet driver, and the full API this driver exports to other
> drivers.

Hi Andrew,

I have pushed all patches into a github repo for your reference.
These are not the final patches; I still need to make some minor changes
before submitting (I mean other than the ones I already submitted).

AF driver & CGX drivers:
https://github.com/sunilkovvuri/rvu_drivers/tree/master/drivers/soc/marvell/octeontx2

Ethernet drivers PF and VF
https://github.com/sunilkovvuri/rvu_drivers/tree/master/drivers/net/ethernet/marvell/octeontx2

Info exported/shared by the AF driver with other drivers:
# Mailbox communication APIs, message IDs, message structs, etc.
   https://github.com/sunilkovvuri/rvu_drivers/blob/master/drivers/soc/marvell/octeontx2/mbox.c
   https://github.com/sunilkovvuri/rvu_drivers/blob/master/drivers/soc/marvell/octeontx2/mbox.h
# Other structs and APIs
    https://github.com/sunilkovvuri/rvu_drivers/blob/master/drivers/soc/marvell/octeontx2/common.h
    https://github.com/sunilkovvuri/rvu_drivers/blob/master/drivers/soc/marvell/octeontx2/rvu_struct.h

>
> I think the real question here is, do you have the split between this
> driver and the actual device drivers in the right place? For me, link
> up/down detection should be in the Ethernet driver, since it is not
> shared with the crypto driver.
>

As mentioned in the patch '[PATCH 13/15] soc: octeontx2: Add support
for CGX link management', link up/down is detected by firmware. The
firmware triggers an IRQ to the CGX driver, which then takes the new
status and sends an update to the ethernet driver via mailbox
communication.

Ethernet driver detects the link change and does the necessary stuff.
https://github.com/sunilkovvuri/rvu_drivers/blob/master/drivers/net/ethernet/marvell/octeontx2/otx2_pf.c#L116


>        Thanks
>         Andrew

^ permalink raw reply

* Re: [PATCH 04/15] soc: octeontx2: Add mailbox support infra
From: Sunil Kovvuri @ 2018-08-30 18:36 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: LKML, olof, LAKML, linux-soc, Aleksey Makarov, Sunil Goutham,
	Lukasz Bartosik, Linux Netdev List, David S. Miller
In-Reply-To: <CAK8P3a1W2D5YWEC-VDkvQ8y3W8257g6X7B8_At7g8f9iE73a8w@mail.gmail.com>

On Thu, Aug 30, 2018 at 7:27 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Aug 28, 2018 at 3:23 PM Sunil Kovvuri <sunil.kovvuri@gmail.com> wrote:
> >
> > On Tue, Aug 28, 2018 at 6:22 PM Arnd Bergmann <arnd@arndb.de> wrote:
> > >
> > > On Tue, Aug 28, 2018 at 2:48 PM Sunil Kovvuri <sunil.kovvuri@gmail.com> wrote:
> > > >
> > > > On Tue, Aug 28, 2018 at 5:33 PM Arnd Bergmann <arnd@arndb.de> wrote:
> > > > >
> > > > > On Tue, Aug 28, 2018 at 12:57 PM <sunil.kovvuri@gmail.com> wrote:
> > > > > >
> > > > > > From: Aleksey Makarov <amakarov@marvell.com>
> > > > > >
> > > > > > This patch adds mailbox support infrastructure APIs.
> > > > > > Each RVU device has a dedicated 64KB mailbox region
> > > > > > shared with it's peer for communication. RVU AF has
> > > > > > a separate mailbox region shared with each of RVU PFs
> > > > > > and a RVU PF has a separate region shared with each of
> > > > > > it's VF.
> > > > > >
> > > > > > These set of APIs are used by this driver (RVU AF) and
> > > > > > other RVU PF/VF drivers eg netdev, crypto e.t.c.
> > > > > >
> > > > > > Signed-off-by: Aleksey Makarov <amakarov@marvell.com>
> > > > > > Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
> > > > > > Signed-off-by: Lukasz Bartosik <lbartosik@marvell.com>
> > > > >
> > > > > Why does this driver not use the drivers/mailbox/ infrastructure?
> > > > >
> > > > This is a common administrative software driver which will be handling requests
> > > > from kernel drivers and as well as drivers in userspace applications.
> > > > We had to keep mailbox communication infrastructure same across all usages.
> > >
> > > Can you explain more about the usage of userspace applications
> > > and what interface you plan to use into the kernel?
> >
> > Any PCI device here, irrespective of the domain (kernel or userspace)
> > it is in, uses a common mailbox communication scheme, which is:
> > # Write a mailbox msg (format is agreed between all parties) into the
> >    memory region shared between the AF and other PF/VFs, and trigger
> >    an interrupt to the admin function.
> > # The admin function processes the msg, puts the reply in the same
> >    memory region, and triggers an IRQ to the requesting device. If the
> >    device has a driver instance in the kernel then it uses the IRQ;
> >    userspace applications poll the IRQ status bit.
>
> Ok, so the mailbox here is a communication mechanism between
> two device drivers that may run on the same kernel, or in different
> instances (user space, virtual machine, ...), but each driver
> only talks to the mailbox visible in its own device, right?

Yes.

>
> What is the purpose of the exported interface then? Is this
> just an abstraction so each of the drivers can talk to its own
> mailbox using a set of common helper functions?
>
>       Arnd

Yes, that's correct.

In kernel there will be a minimum of 3 drivers which will use this
mailbox communication.
So instead of duplicating APIs and structures in every driver, we
thought of adding them
in this AF driver and export them to ethernet and crypto drivers.
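
To picture the request/reply flow discussed earlier in this thread (write a
message into the shared region, signal the AF, AF replies in place, requester
takes an IRQ or polls a status bit), here is a toy userspace model. It is only
a sketch: every name in it (sketch_mbox, sketch_send, ...) is made up for
illustration and none of it is taken from the actual mbox.c/mbox.h, and the
two flags merely stand in for the real interrupts.

```c
#include <stdint.h>

/* Hypothetical message layout; the real format is defined in mbox.h. */
struct sketch_msg {
	uint16_t id;		/* message ID agreed between both sides */
	uint16_t rc;		/* return code filled in by the AF */
	uint32_t payload;
};

/* Hypothetical one-slot mailbox standing in for the shared region. */
struct sketch_mbox {
	struct sketch_msg slot;	/* shared memory, one message deep */
	int irq_to_af;		/* stand-in for the interrupt to the AF */
	int irq_to_dev;		/* stand-in for the reply IRQ / status bit */
};

/* Requester: write the message, then signal the admin function. */
static void sketch_send(struct sketch_mbox *mb, uint16_t id, uint32_t payload)
{
	mb->slot.id = id;
	mb->slot.rc = 0;
	mb->slot.payload = payload;
	mb->irq_to_dev = 0;
	mb->irq_to_af = 1;
}

/* AF: process the message in place and signal the requesting device. */
static void sketch_af_handle(struct sketch_mbox *mb)
{
	if (!mb->irq_to_af)
		return;
	mb->irq_to_af = 0;
	mb->slot.payload += 1;	/* placeholder for real processing */
	mb->slot.rc = 0;
	mb->irq_to_dev = 1;
}

/* Requester, polling variant: returns 1 and copies the reply when ready. */
static int sketch_poll_reply(struct sketch_mbox *mb, struct sketch_msg *out)
{
	if (!mb->irq_to_dev)
		return 0;
	*out = mb->slot;
	mb->irq_to_dev = 0;
	return 1;
}
```

A kernel-side requester would act on the reply from its IRQ handler instead of
calling the polling helper; the shared helpers are the same either way, which
is the reason for exporting them from one place.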

Thanks,
Sunil.

^ permalink raw reply

* Re: [PATCH net-next 4/5] ipv6: enable IFA_IF_NETNSID for RTM_GETADDR
From: kbuild test robot @ 2018-08-30 18:41 UTC (permalink / raw)
  To: Christian Brauner
  Cc: kbuild-all, netdev, linux-kernel, davem, kuznet, yoshfuji,
	pombredanne, kstewart, gregkh, dsahern, fw, ktkhai, lucien.xin,
	jakub.kicinski, jbenc, nicolas.dichtel, Christian Brauner
In-Reply-To: <20180828231859.29758-5-christian@brauner.io>

[-- Attachment #1: Type: text/plain, Size: 707 bytes --]

Hi Christian,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Christian-Brauner/rtnetlink-add-IFA_IF_NETNSID-for-RTM_GETADDR/20180830-194411
config: x86_64-randconfig-s1-08302022 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

>> ERROR: "rtnl_get_net_ns_capable" [net/ipv6/ipv6.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 23841 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
From: Christian Brauner @ 2018-08-30 14:45 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: netdev, linux-kernel, davem, kuznet, yoshfuji, pombredanne,
	kstewart, gregkh, dsahern, fw, lucien.xin, jakub.kicinski, jbenc,
	nicolas.dichtel
In-Reply-To: <81379a4f-7149-10ff-2453-886314d0b0c4@virtuozzo.com>

On Thu, Aug 30, 2018 at 11:49:31AM +0300, Kirill Tkhai wrote:
> On 29.08.2018 21:13, Christian Brauner wrote:
> > Hi Kirill,
> > 
> > Thanks for the question!
> > 
> > On Wed, Aug 29, 2018 at 11:30:37AM +0300, Kirill Tkhai wrote:
> >> Hi, Christian,
> >>
> >> On 29.08.2018 02:18, Christian Brauner wrote:
> >>> From: Christian Brauner <christian@brauner.io>
> >>>
> >>> Hey,
> >>>
> >>> A while back we introduced and enabled IFLA_IF_NETNSID in
> >>> RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led
> >>> to significant performance increases since it allows userspace to avoid
> >>> taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the
> >>> interfaces from the netns associated with the netns_fd. Especially when a
> >>> lot of network namespaces are in use, using setns() becomes increasingly
> >>> problematic when performance matters.
> >>
> >> could you please give a real example, when setns()+socket(AF_NETLINK) cause
> >> problems with the performance? You should do this only once on application
> >> startup, and then you have created netlink sockets in any net namespaces you
> >> need. What is the problem here?
> > 
> > So we have a daemon (LXD) that is often running thousands of containers.
> > When users issue a lxc list request against the daemon it returns a list
> > of all containers including all of the interfaces and addresses for each
> > container. To retrieve those addresses we currently rely on setns() +
> > getifaddrs() for each of those containers. That has horrible
> > performance.
> 
> Could you please provide some numbers showing that setns()
> introduces a significant performance decrease in the application?

Sure, might take a few days++ though since I'm traveling.

> 
> I worry about all this just because of netlink interface is
> already overloaded, while this patch makes it even less modular.

Introducing the IFA_IF_NETNSID property will not make the netlink
interface less modular. It is a clean, RTM_*ADDR-request specific
property using network namespace identifiers which we discussed in prior
patches are the way to go forward.

You can already get interfaces via GETLINK from a network namespace
other than the one you reside in (which we enabled just a few months
back), but you can't do the same for GETADDR. Those two are almost
always used together. When you want to get the links you usually also
want to get the addresses associated with them right after.
In a prior discussion we agreed that network namespace identifiers are
the way to go forward but that any other property, i.e. PIDs and fds,
should never be ported into other parts of the codebase and that is
indeed something I agree with.
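
To make the userspace side concrete, below is a rough sketch of how an
RTM_GETADDR dump request carrying a netns id attribute could be laid out. It
only builds the message in a buffer and does not open a socket. The helper
name sketch_build_getaddr is made up, the attribute type is passed in by the
caller because its numeric value comes from the uapi patch in this series,
and the 32-bit signed payload is an assumption modelled on IFLA_IF_NETNSID.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/*
 * Fill buf with an RTM_GETADDR dump request followed by one attribute
 * holding the target netns id. buf must be suitably aligned for
 * struct nlmsghdr. netnsid_attr is the attribute type (IFA_IF_NETNSID
 * in this series). Returns the total message length, or 0 if buf is
 * too small.
 */
static size_t sketch_build_getaddr(void *buf, size_t buflen, uint32_t seq,
				   unsigned short netnsid_attr,
				   int32_t netnsid)
{
	struct nlmsghdr *nlh = buf;
	struct ifaddrmsg *ifa;
	struct rtattr *rta;
	size_t len = NLMSG_SPACE(sizeof(*ifa)) + RTA_SPACE(sizeof(netnsid));

	if (buflen < len)
		return 0;
	memset(buf, 0, len);

	nlh->nlmsg_len = len;
	nlh->nlmsg_type = RTM_GETADDR;
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	nlh->nlmsg_seq = seq;

	ifa = NLMSG_DATA(nlh);
	ifa->ifa_family = AF_UNSPEC;	/* ask for all address families */

	/* The attribute starts right after the aligned ifaddrmsg. */
	rta = (struct rtattr *)((char *)buf + NLMSG_SPACE(sizeof(*ifa)));
	rta->rta_type = netnsid_attr;
	rta->rta_len = RTA_LENGTH(sizeof(netnsid));
	memcpy(RTA_DATA(rta), &netnsid, sizeof(netnsid));

	return len;
}
```

The resulting buffer would be sent on an ordinary NETLINK_ROUTE socket opened
in the caller's own namespace, so no setns() and no per-container socket is
needed.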

> In case of one day we finally reach rtnl unscalability trap,
> every common interface like this may be a decisive nail in
> a coffin lid of possibility to overwrite everything.
> 
> > The problem with what you're proposing is that the daemon would need to
> > cache a socket file descriptor for each container which is something
> > that we unfortunately cannot do since we can't excessively cache file
> > descriptors because we can easily hit the open file limit. We also
> > refrain from caching file descriptors for a long time for security
> > reasons.
> > 
> > For the case where users just request a list of the interfaces we
> > can already use RTM_GETLINK + IFLA_IF_NETNSID which has way better
> > performance. But we can't do the same with RTM_GETADDR requests which
> > was an oversight on my part when I wrote the original patchset for the
> > RTM_*LINK requests. This just rectifies this and aligns RTM_GETLINK +
> > RTM_GETADDR.
> > Based on this patchset I have written a userspace POC that is basically
> > a netns namespace aware getifaddr() or - as I like to call it -
> > netns_getifaddr().
> > 
> >>
> >>> Usually, RTM_GETLINK requests are followed by RTM_GETADDR requests (cf.
> >>> getifaddrs() style functions and friends). But currently, RTM_GETADDR
> >>> requests do not support a similar property like IFLA_IF_NETNSID for
> >>> RTM_*LINK requests.
> >>> This is problematic since userspace can retrieve interfaces from another
> >>> network namespace by sending an IFLA_IF_NETNSID property along with an
> >>> RTM_GETLINK request but is still forced to use the legacy setns() style of
> >>> retrieving interfaces in RTM_GETADDR requests.
> >>>
> >>> The goal of this series is to make it possible to perform RTM_GETADDR
> >>> requests on different network namespaces. To this end a new IFA_IF_NETNSID
> >>> property for RTM_*ADDR requests is introduced. It can be used to send a
> >>> network namespace identifier along in RTM_*ADDR requests.  The network
> >>> namespace identifier will be used to retrieve the target network namespace
> >>> in which the request is supposed to be fulfilled.  This aligns the behavior
> >>> of RTM_*ADDR requests with the behavior of RTM_*LINK requests.
> >>>
> >>> Security:
> >>> - The caller must have assigned a valid network namespace identifier for
> >>>   the target network namespace.
> >>> - The caller must have CAP_NET_ADMIN in the owning user namespace of the
> >>>   target network namespace.
> >>>
> >>> Thanks!
> >>> Christian
> >>>
> >>> [1]: commit 7973bfd8758d ("rtnetlink: remove check for IFLA_IF_NETNSID")
> >>> [2]: commit 5bb8ed075428 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK")
> >>> [3]: commit b61ad68a9fe8 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_DELLINK")
> >>> [4]: commit c310bfcb6e1b ("rtnetlink: enable IFLA_IF_NETNSID for RTM_SETLINK")
> >>> [5]: commit 7c4f63ba8243 ("rtnetlink: enable IFLA_IF_NETNSID in do_setlink()")
> >>>
> >>> Christian Brauner (5):
> >>>   rtnetlink: add rtnl_get_net_ns_capable()
> >>>   if_addr: add IFA_IF_NETNSID
> >>>   ipv4: enable IFA_IF_NETNSID for RTM_GETADDR
> >>>   ipv6: enable IFA_IF_NETNSID for RTM_GETADDR
> >>>   rtnetlink: move type calculation out of loop
> >>>
> >>>  include/net/rtnetlink.h      |  1 +
> >>>  include/uapi/linux/if_addr.h |  1 +
> >>>  net/core/rtnetlink.c         | 15 +++++---
> >>>  net/ipv4/devinet.c           | 38 +++++++++++++++-----
> >>>  net/ipv6/addrconf.c          | 70 ++++++++++++++++++++++++++++--------
> >>>  5 files changed, 97 insertions(+), 28 deletions(-)
> >>>

^ permalink raw reply

* [PATCH net-next] net/sched: fix type of htb statistics
From: Florent Fourcot @ 2018-08-30 14:39 UTC (permalink / raw)
  To: netdev; +Cc: Florent Fourcot

tokens and ctokens are defined as s64 in the htb_class structure,
and clamped to a 32-bit value during netlink dumps:

cl->xstats.tokens = clamp_t(s64, PSCHED_NS2TICKS(cl->tokens),
                            INT_MIN, INT_MAX);

Defining them as u32 works since userspace (tc) prints them as
signed int, but a correct definition from the beginning is probably
better.

At the same time, the 'giants' structure member has been unused for
years, so update the comment to mark it unused.

Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
---
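
For reviewers, a small standalone sketch of the signedness issue described
above. This is userspace code written only for illustration: the helper names
are made up, and clamp_s64_to_int is a plain stand-in for the kernel's
clamp_t(s64, ..., INT_MIN, INT_MAX) expression.

```c
#include <limits.h>
#include <stdint.h>

/* Userspace stand-in for clamp_t(s64, v, INT_MIN, INT_MAX). */
static int64_t clamp_s64_to_int(int64_t v)
{
	if (v < INT_MIN)
		return INT_MIN;
	if (v > INT_MAX)
		return INT_MAX;
	return v;
}

/* Old layout: the clamped value lands in an unsigned 32-bit field. */
static uint32_t store_as_u32(int64_t tokens)
{
	return (uint32_t)clamp_s64_to_int(tokens);
}

/* New layout: the field type matches the signed range of the clamp. */
static int32_t store_as_s32(int64_t tokens)
{
	return (int32_t)clamp_s64_to_int(tokens);
}
```

With a negative token count, store_as_u32() yields a large positive number
that only looks right again once the reader casts it back to a signed int
(which is what tc effectively does), whereas store_as_s32() carries the sign
in the type itself. The in-memory layout is the same in both cases.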
 include/uapi/linux/pkt_sched.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 8975fd1a1421..e9b7244ac381 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -395,9 +395,9 @@ enum {
 struct tc_htb_xstats {
 	__u32 lends;
 	__u32 borrows;
-	__u32 giants;	/* too big packets (rate will not be accurate) */
-	__u32 tokens;
-	__u32 ctokens;
+	__u32 giants;	/* unused since 'Make HTB scheduler work with TSO.' */
+	__s32 tokens;
+	__s32 ctokens;
 };
 
 /* HFSC section */
-- 
2.11.0

^ permalink raw reply related

* [PATCH iproute2] tc/htb: remove unused variable
From: Florent Fourcot @ 2018-08-30 14:38 UTC (permalink / raw)
  To: netdev; +Cc: Florent Fourcot

Since the introduction of the htb module, this variable has never been used.

Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
---
 tc/q_htb.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/tc/q_htb.c b/tc/q_htb.c
index b93d31d4..c8b2941d 100644
--- a/tc/q_htb.c
+++ b/tc/q_htb.c
@@ -109,7 +109,6 @@ static int htb_parse_opt(struct qdisc_util *qu, int argc,
 
 static int htb_parse_class_opt(struct qdisc_util *qu, int argc, char **argv, struct nlmsghdr *n, const char *dev)
 {
-	int ok = 0;
 	struct tc_htb_opt opt = {};
 	__u32 rtab[256], ctab[256];
 	unsigned buffer = 0, cbuffer = 0;
@@ -127,7 +126,6 @@ static int htb_parse_class_opt(struct qdisc_util *qu, int argc, char **argv, str
 			if (get_u32(&opt.prio, *argv, 10)) {
 				explain1("prio"); return -1;
 			}
-			ok++;
 		} else if (matches(*argv, "mtu") == 0) {
 			NEXT_ARG();
 			if (get_u32(&mtu, *argv, 10)) {
@@ -161,7 +159,6 @@ static int htb_parse_class_opt(struct qdisc_util *qu, int argc, char **argv, str
 				explain1("buffer");
 				return -1;
 			}
-			ok++;
 		} else if (matches(*argv, "cburst") == 0 ||
 			   strcmp(*argv, "cbuffer") == 0 ||
 			   strcmp(*argv, "cmaxburst") == 0) {
@@ -170,7 +167,6 @@ static int htb_parse_class_opt(struct qdisc_util *qu, int argc, char **argv, str
 				explain1("cbuffer");
 				return -1;
 			}
-			ok++;
 		} else if (strcmp(*argv, "ceil") == 0) {
 			NEXT_ARG();
 			if (ceil64) {
@@ -186,7 +182,6 @@ static int htb_parse_class_opt(struct qdisc_util *qu, int argc, char **argv, str
 				explain1("ceil");
 				return -1;
 			}
-			ok++;
 		} else if (strcmp(*argv, "rate") == 0) {
 			NEXT_ARG();
 			if (rate64) {
@@ -202,7 +197,6 @@ static int htb_parse_class_opt(struct qdisc_util *qu, int argc, char **argv, str
 				explain1("rate");
 				return -1;
 			}
-			ok++;
 		} else if (strcmp(*argv, "help") == 0) {
 			explain();
 			return -1;
@@ -214,9 +208,6 @@ static int htb_parse_class_opt(struct qdisc_util *qu, int argc, char **argv, str
 		argc--; argv++;
 	}
 
-	/*	if (!ok)
-		return 0;*/
-
 	if (!rate64) {
 		fprintf(stderr, "\"rate\" is required.\n");
 		return -1;
-- 
2.11.0

^ permalink raw reply related

* [PATCH] i40e: mark expected switch fall-through
From: Gustavo A. R. Silva @ 2018-08-30 18:50 UTC (permalink / raw)
  To: Jeff Kirsher, David S. Miller
  Cc: intel-wired-lan, netdev, linux-kernel, Gustavo A. R. Silva

In preparation for enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 1473099 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
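
For anyone not familiar with the pattern, here is a reduced userspace sketch
of the control flow being annotated. The enum values, counters and function
name are invented for illustration and only mirror the shape of the switch in
i40e_run_xdp_zc(); the counters stand in for the warning and trace calls.

```c
/* Hypothetical stand-ins for the XDP actions and the driver result codes. */
enum sketch_action { ACT_PASS, ACT_ABORTED, ACT_DROP, ACT_UNKNOWN = 99 };
enum sketch_result { RES_PASS, RES_CONSUMED };

static int sketch_warnings;	/* stands in for the invalid-action warning */
static int sketch_exceptions;	/* stands in for the exception tracepoint */

static enum sketch_result sketch_run(enum sketch_action act)
{
	enum sketch_result result = RES_PASS;

	switch (act) {
	case ACT_PASS:
		break;
	default:
		sketch_warnings++;
		/* fall through */
	case ACT_ABORTED:
		sketch_exceptions++;
		/* fall through -- handle aborts by dropping packet */
	case ACT_DROP:
		result = RES_CONSUMED;
		break;
	}

	return result;
}
```

An unknown action is warned about, then counted as an exception, then dropped;
an aborted one skips only the warning. The comments document that both drops
through the case labels are intentional.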
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 94947a8..a6f50f6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -300,9 +300,10 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 		break;
 	default:
 		bpf_warn_invalid_xdp_action(act);
+		/* fall through */
 	case XDP_ABORTED:
 		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
-		/* fallthrough -- handle aborts by dropping packet */
+		/* fall through -- handle aborts by dropping packet */
 	case XDP_DROP:
 		result = I40E_XDP_CONSUMED;
 		break;
-- 
2.7.4

^ permalink raw reply related

* Re: [iproute PATCH] iprule: Fix for incorrect space between dst and prefix
From: Stephen Hemminger @ 2018-08-30 14:51 UTC (permalink / raw)
  To: Phil Sutter; +Cc: netdev
In-Reply-To: <20180829135257.4545-1-phil@nwl.cc>

On Wed, 29 Aug 2018 15:52:57 +0200
Phil Sutter <phil@nwl.cc> wrote:

> This was added by accident when introducing JSON support.
> 
> Fixes: 0dd4ccc56c0e3 ("iprule: add json support")
> Signed-off-by: Phil Sutter <phil@nwl.cc>


Applied, thanks.

^ permalink raw reply

* Re: [Patch iproute2 v2] ss: add UNIX_DIAG_VFS and UNIX_DIAG_ICONS for unix sockets
From: Stephen Hemminger @ 2018-08-30 14:55 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev
In-Reply-To: <20180829170927.16900-1-xiyou.wangcong@gmail.com>

On Wed, 29 Aug 2018 10:09:27 -0700
Cong Wang <xiyou.wangcong@gmail.com> wrote:

> UNIX_DIAG_VFS and UNIX_DIAG_ICONS are never used by ss,
> make them available in ss -e output.
> 
> Cc: Stephen Hemminger <stephen@networkplumber.org>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied, thanks

^ permalink raw reply

* Re: [PATCHv3 iproute2 0/2] clang + misc changes
From: Stephen Hemminger @ 2018-08-30 14:59 UTC (permalink / raw)
  To: Mahesh Bandewar; +Cc: netdev, Mahesh Bandewar
In-Reply-To: <20180823010130.147780-1-mahesh@bandewar.net>

On Wed, 22 Aug 2018 18:01:30 -0700
Mahesh Bandewar <mahesh@bandewar.net> wrote:

> From: Mahesh Bandewar <maheshb@google.com>
> 
> The primary theme is to make clang compile the iproute2 package without
> warnings. Along with this there are two other misc patches in the series.
> 
> First patch uses the preferred_family when operating with maddr feature.
> Prior to this patch, it would always open an AF_INET socket irrespective
> of the family that is preferred via command-line. 
> 
> Second patch mostly adds format attributes to make the clang compiler
> happy and not throw the warning messages.
> 
> Mahesh Bandewar (2):
>   ipmaddr: use preferred_family when given
>   iproute: make clang happy with iproute2 package
> 
>  include/json_writer.h |  3 +--
>  ip/iplink_can.c       | 19 ++++++++++++-------
>  ip/ipmaddr.c          | 13 ++++++++++++-
>  lib/color.c           |  1 +
>  lib/json_print.c      |  1 +
>  lib/json_writer.c     | 15 +--------------
>  misc/ss.c             |  3 ++-
>  tc/m_ematch.c         |  1 +
>  tc/m_ematch.h         |  1 +
>  9 files changed, 32 insertions(+), 25 deletions(-)
> 
> -- 
> 2.18.0.1017.ga543ac7ca45-goog
> 

Applied, thanks

^ permalink raw reply

* Re: [PATCH iproute2] tc/htb: remove unused variable
From: Stephen Hemminger @ 2018-08-30 15:01 UTC (permalink / raw)
  To: Florent Fourcot; +Cc: netdev
In-Reply-To: <20180830143854.24928-1-florent.fourcot@wifirst.fr>

On Thu, 30 Aug 2018 16:38:54 +0200
Florent Fourcot <florent.fourcot@wifirst.fr> wrote:

> Since introduction of htb module, this variable has never been used.
> 
> Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>

Looks good. Applied

^ permalink raw reply

