Netdev List


* Re: [PATCH bpf-next] selftests/bpf: use localhost in tcp_{server,client}.py
From: Daniel Borkmann @ 2019-02-04 22:16 UTC (permalink / raw)
  To: Stanislav Fomichev, netdev; +Cc: davem, ast
In-Reply-To: <20190204184319.177504-1-sdf@google.com>

On 02/04/2019 07:43 PM, Stanislav Fomichev wrote:
> Bind and connect to localhost. There is no reason for this test to
> use non-localhost interface. This lets us run this test in a network
> namespace.
> 
> Signed-off-by: Stanislav Fomichev <sdf@google.com>

Applied, thanks!


* Re: [PATCH v1] net: dsa: qca8k: implement DT-based ports <-> phy translation
From: Florian Fainelli @ 2019-02-04 22:11 UTC (permalink / raw)
  To: Christian Lamparter, netdev; +Cc: Vivien Didelot, Andrew Lunn
In-Reply-To: <20190204213555.26054-1-chunkeey@gmail.com>

On 2/4/19 1:35 PM, Christian Lamparter wrote:
> The QCA8337 enumerates 5 PHYs on the MDC/MDIO access: PHY0-PHY4.
> According to the System Block Diagram in Section 1.2 of the
> QCA8337's datasheet, these PHYs are internally connected
> to the MACs of PORT 1 - PORT 5. However, neither qca8k's slave
> mdio access functions qca8k_phy_read()/qca8k_phy_write()
> nor the dsa framework is set up for that.
> 
> This version of the patch uses the existing phy-handle
> properties of each specified DSA Port in the DT to map
> each PORT/MAC to its exposed PHY on the MDIO bus. This
> is supported by the current binding document qca8k.txt
> as well.

I don't think you should have to do any of this translation, because you
can do a couple of things with DSA/Device Tree:

- you can omit the phy-handle property altogether, in which case the
core DSA layer assumes that the PHY is part of the switch's internal
MDIO bus, which is implicitly created by dsa_slave_mii_bus_create()

- you can specify a phy-handle property, and then the PHY device tree
node can be placed pretty much anywhere in Device Tree, including on a
separate MDIO bus Device Tree node which is "external" to the switch

In either case, the PHY device's MDIO bus parent and its address are
taken care of by drivers/of/of_mdio.c. You can look at mv88e6xxx for how
it deals with its internal vs. external MDIO bus controller; that
driver is used on a wide variety of configurations.

> 
> Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
> ---
>  drivers/net/dsa/qca8k.c | 35 +++++++++++++++++++++++++++++++++--
>  1 file changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
> index a4b6cda38016..6558b7ed855d 100644
> --- a/drivers/net/dsa/qca8k.c
> +++ b/drivers/net/dsa/qca8k.c
> @@ -11,6 +11,7 @@
>  #include <linux/netdevice.h>
>  #include <net/dsa.h>
>  #include <linux/of_net.h>
> +#include <linux/of_mdio.h>
>  #include <linux/of_platform.h>
>  #include <linux/if_bridge.h>
>  #include <linux/mdio.h>
> @@ -612,20 +613,50 @@ qca8k_adjust_link(struct dsa_switch *ds, int port, struct phy_device *phy)
>  	qca8k_port_set_status(priv, port, 1);
>  }
>  
> +static int
> +qca8k_to_real_phy(struct dsa_switch *ds, int phy)
> +{
> +	struct device_node *phy_dn, *port_dn;
> +	int id;
> +
> +	if (phy >= ds->num_ports)
> +		return -EINVAL;
> +
> +	port_dn = ds->ports[phy].dn;
> +	if (!port_dn)
> +		return -EINVAL;
> +
> +	phy_dn = of_parse_phandle(port_dn, "phy-handle", 0);
> +	if (!phy_dn)
> +		return phy;
> +
> +	id = of_mdio_parse_addr(ds->dev, phy_dn);
> +	of_node_put(phy_dn);
> +	return id;
> +}
> +
>  static int
>  qca8k_phy_read(struct dsa_switch *ds, int phy, int regnum)
>  {
>  	struct qca8k_priv *priv = (struct qca8k_priv *)ds->priv;
> +	int realphy = qca8k_to_real_phy(ds, phy);
> +
> +	if (realphy < 0)
> +		return realphy;
>  
> -	return mdiobus_read(priv->bus, phy, regnum);
> +	return mdiobus_read(priv->bus, realphy, regnum);
>  }
>  
>  static int
>  qca8k_phy_write(struct dsa_switch *ds, int phy, int regnum, u16 val)
>  {
>  	struct qca8k_priv *priv = (struct qca8k_priv *)ds->priv;
> +	int realphy = qca8k_to_real_phy(ds, phy);
> +
> +	if (realphy < 0)
> +		return realphy;
>  
> -	return mdiobus_write(priv->bus, phy, regnum, val);
> +	return mdiobus_write(priv->bus, realphy, regnum, val);
>  }
>  
>  static void
> 


-- 
Florian
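
The two Device Tree cases described in the reply above can be modelled
in a few lines of user-space C. This is only a sketch under stated
assumptions: struct phy_node_model, struct port_model and
resolve_phy_addr() are hypothetical names invented for illustration and
are not kernel APIs. It assumes that, without a phy-handle, the PHY
address on the switch-internal bus equals the port index.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a PHY node: its "reg" value on an MDIO bus. */
struct phy_node_model {
	int reg;            /* PHY address on the bus it sits on */
	bool external_bus;  /* true if the node is on a separate MDIO bus */
};

/* Hypothetical model of a DSA port node. */
struct port_model {
	int index;                         /* port number */
	const struct phy_node_model *phy;  /* phy-handle target, NULL if absent */
};

/*
 * Return the MDIO address that would be used for the port's PHY.
 * No phy-handle: the address is assumed to equal the port index on the
 * switch-internal bus. With a phy-handle: the address comes from the
 * referenced PHY node, wherever that node lives.
 */
static int resolve_phy_addr(const struct port_model *port, bool *external_bus)
{
	if (!port->phy) {
		*external_bus = false;
		return port->index;
	}

	*external_bus = port->phy->external_bus;
	return port->phy->reg;
}
```

With a port at index 1 whose phy-handle points at a node with reg 0,
the sketch returns address 0, which matches the PORT 1 <-> PHY0 layout
described in the patch; a port at index 2 without a phy-handle resolves
to address 2.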


* [PATCH v2] net: dsa: mv88e6xxx: Revise irq setup ordering
From: John David Anglin @ 2019-02-04 21:59 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev
In-Reply-To: <53b49df8-53ed-704f-9197-230b18d83090@bell.net>

This change fixes a race condition in the setup of hardware irqs and the
code enabling PHY link detection in the mv88e6xxx driver.

This race was observed on the espressobin board where the GPIO interrupt
controller only supports edge interrupts.  If the INTn output pin goes low
before the GPIO interrupt is enabled, PHY link interrupts are not detected.

With this change, we
1) force INTn high by clearing all interrupt enables in global 1 control 1,
2) set up the hardware irq, and then
3) perform the remaining common setup.

This simplifies the setup and allows some unnecessary code to be removed.

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index b2a0e59b6252..9f5c416a3223 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -374,10 +374,29 @@ static void mv88e6xxx_g1_irq_free(struct mv88e6xxx_chip *chip)
     mutex_unlock(&chip->reg_lock);
 }
 
+static int mv88e6xxx_g1_irq_setup_masks(struct mv88e6xxx_chip *chip)
+{
+    int err;
+    u16 reg;
+
+    /* The INTn output must be high when hardware interrupts are setup.
+       The EEPROM done interrupt enable is set on reset, so clear all
+       interrupt enable bits to ensure INTn is not driven low */
+    err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &reg);
+    if (err)
+        return err;
+    reg &= ~GENMASK(chip->info->g1_irqs, 0);
+    err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, reg);
+    if (err)
+        return err;
+
+    /* Reading the interrupt status clears (most of) them */
+    return mv88e6xxx_g1_read(chip, MV88E6XXX_G1_STS, &reg);
+}
+
 static int mv88e6xxx_g1_irq_setup_common(struct mv88e6xxx_chip *chip)
 {
-    int err, irq, virq;
-    u16 reg, mask;
+    int irq;
 
     chip->g1_irq.nirqs = chip->info->g1_irqs;
     chip->g1_irq.domain = irq_domain_add_simple(
@@ -392,43 +411,14 @@ static int mv88e6xxx_g1_irq_setup_common(struct mv88e6xxx_chip *chip)
     chip->g1_irq.chip = mv88e6xxx_g1_irq_chip;
     chip->g1_irq.masked = ~0;
 
-    err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &mask);
-    if (err)
-        goto out_mapping;
-
-    mask &= ~GENMASK(chip->g1_irq.nirqs, 0);
-
-    err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, mask);
-    if (err)
-        goto out_disable;
-
-    /* Reading the interrupt status clears (most of) them */
-    err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_STS, &reg);
-    if (err)
-        goto out_disable;
-
     return 0;
-
-out_disable:
-    mask &= ~GENMASK(chip->g1_irq.nirqs, 0);
-    mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, mask);
-
-out_mapping:
-    for (irq = 0; irq < 16; irq++) {
-        virq = irq_find_mapping(chip->g1_irq.domain, irq);
-        irq_dispose_mapping(virq);
-    }
-
-    irq_domain_remove(chip->g1_irq.domain);
-
-    return err;
 }
 
 static int mv88e6xxx_g1_irq_setup(struct mv88e6xxx_chip *chip)
 {
     int err;
 
-    err = mv88e6xxx_g1_irq_setup_common(chip);
+    err = mv88e6xxx_g1_irq_setup_masks(chip);
     if (err)
         return err;
 
@@ -437,9 +427,9 @@ static int mv88e6xxx_g1_irq_setup(struct mv88e6xxx_chip *chip)
                    IRQF_ONESHOT | IRQF_SHARED,
                    dev_name(chip->dev), chip);
     if (err)
-        mv88e6xxx_g1_irq_free_common(chip);
+        return err;
 
-    return err;
+    return mv88e6xxx_g1_irq_setup_common(chip);
 }
 
 static void mv88e6xxx_irq_poll(struct kthread_work *work)
@@ -457,6 +447,10 @@ static int mv88e6xxx_irq_poll_setup(struct mv88e6xxx_chip *chip)
 {
     int err;
 
+    err = mv88e6xxx_g1_irq_setup_masks(chip);
+    if (err)
+        return err;
+
     err = mv88e6xxx_g1_irq_setup_common(chip);
     if (err)
         return err;

Signed-off-by: John David Anglin <dave.anglin@bell.net>

-- 
John David Anglin  dave.anglin@bell.net
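
The race described in the commit message can be illustrated with a
small user-space model. This is a sketch, not driver code: struct
edge_irq_model, intn_set(), old_ordering() and new_ordering() are
hypothetical names, and the model assumes an input that latches only
falling edges seen while the interrupt is enabled.

```c
#include <stdbool.h>

/* Hypothetical model of an edge-only GPIO interrupt input. */
struct edge_irq_model {
	bool enabled;  /* has the GPIO interrupt been set up? */
	bool level;    /* current INTn level, true = high (inactive) */
	int latched;   /* falling edges seen while enabled */
};

/* Drive INTn to a new level; only high -> low while enabled is latched. */
static void intn_set(struct edge_irq_model *m, bool level)
{
	if (m->enabled && m->level && !level)
		m->latched++;
	m->level = level;
}

/* Old ordering: INTn may already be low when the irq is enabled. */
static int old_ordering(void)
{
	struct edge_irq_model m = { .enabled = false, .level = true, .latched = 0 };

	intn_set(&m, false);  /* e.g. an enabled source asserts after reset */
	m.enabled = true;     /* irq enabled too late, the edge is gone */
	intn_set(&m, false);  /* a later event cannot create a new edge */
	return m.latched;
}

/* New ordering: force INTn high, enable the irq, then enable sources. */
static int new_ordering(void)
{
	struct edge_irq_model m = { .enabled = false, .level = false, .latched = 0 };

	intn_set(&m, true);   /* step 1: clear all interrupt enables */
	m.enabled = true;     /* step 2: set up the hardware irq */
	intn_set(&m, false);  /* step 3: an event now produces a falling edge */
	return m.latched;
}
```

In this model old_ordering() latches no edge while new_ordering()
latches one, which is the behaviour the reordering is meant to
guarantee on edge-only controllers.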



* [PATCH v2] net: emac: remove IBM_EMAC_RX_SKB_HEADROOM
From: Christian Lamparter @ 2019-02-04 21:58 UTC (permalink / raw)
  To: netdev; +Cc: David S . Miller

The EMAC driver had a custom IBM_EMAC_RX_SKB_HEADROOM
Kconfig option that reserved additional skb headroom for RX.
This patch removes the option and migrates the code
to use napi_alloc_skb() and netdev_alloc_skb_ip_align()
in its place.

Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
---
 drivers/net/ethernet/ibm/emac/Kconfig | 12 -----
 drivers/net/ethernet/ibm/emac/core.c  | 64 ++++++++++++++++++---------
 drivers/net/ethernet/ibm/emac/core.h  | 10 ++---
 3 files changed, 47 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ethernet/ibm/emac/Kconfig b/drivers/net/ethernet/ibm/emac/Kconfig
index 90d49191beb3..eacf7e141fdc 100644
--- a/drivers/net/ethernet/ibm/emac/Kconfig
+++ b/drivers/net/ethernet/ibm/emac/Kconfig
@@ -28,18 +28,6 @@ config IBM_EMAC_RX_COPY_THRESHOLD
 	depends on IBM_EMAC
 	default "256"
 
-config IBM_EMAC_RX_SKB_HEADROOM
-	int "Additional RX skb headroom (bytes)"
-	depends on IBM_EMAC
-	default "0"
-	help
-	  Additional receive skb headroom. Note, that driver
-	  will always reserve at least 2 bytes to make IP header
-	  aligned, so usually there is no need to add any additional
-	  headroom.
-
-	  If unsure, set to 0.
-
 config IBM_EMAC_DEBUG
 	bool "Debugging"
 	depends on IBM_EMAC
diff --git a/drivers/net/ethernet/ibm/emac/core.c b/drivers/net/ethernet/ibm/emac/core.c
index 209255495bc9..5fc5fa37d305 100644
--- a/drivers/net/ethernet/ibm/emac/core.c
+++ b/drivers/net/ethernet/ibm/emac/core.c
@@ -1071,7 +1071,9 @@ static int emac_resize_rx_ring(struct emac_instance *dev, int new_mtu)
 
 	/* Second pass, allocate new skbs */
 	for (i = 0; i < NUM_RX_BUFF; ++i) {
-		struct sk_buff *skb = alloc_skb(rx_skb_size, GFP_ATOMIC);
+		struct sk_buff *skb;
+
+		skb = netdev_alloc_skb_ip_align(dev->ndev, rx_skb_size);
 		if (!skb) {
 			ret = -ENOMEM;
 			goto oom;
@@ -1080,10 +1082,10 @@ static int emac_resize_rx_ring(struct emac_instance *dev, int new_mtu)
 		BUG_ON(!dev->rx_skb[i]);
 		dev_kfree_skb(dev->rx_skb[i]);
 
-		skb_reserve(skb, EMAC_RX_SKB_HEADROOM + 2);
 		dev->rx_desc[i].data_ptr =
-		    dma_map_single(&dev->ofdev->dev, skb->data - 2, rx_sync_size,
-				   DMA_FROM_DEVICE) + 2;
+		    dma_map_single(&dev->ofdev->dev, skb->data - NET_IP_ALIGN,
+				   rx_sync_size, DMA_FROM_DEVICE)
+				   + NET_IP_ALIGN;
 		dev->rx_skb[i] = skb;
 	}
  skip:
@@ -1174,20 +1176,18 @@ static void emac_clean_rx_ring(struct emac_instance *dev)
 	}
 }
 
-static inline int emac_alloc_rx_skb(struct emac_instance *dev, int slot,
-				    gfp_t flags)
+static inline int
+__emac_prepare_rx_skb(struct sk_buff *skb, struct emac_instance *dev, int slot)
 {
-	struct sk_buff *skb = alloc_skb(dev->rx_skb_size, flags);
 	if (unlikely(!skb))
 		return -ENOMEM;
 
 	dev->rx_skb[slot] = skb;
 	dev->rx_desc[slot].data_len = 0;
 
-	skb_reserve(skb, EMAC_RX_SKB_HEADROOM + 2);
 	dev->rx_desc[slot].data_ptr =
-	    dma_map_single(&dev->ofdev->dev, skb->data - 2, dev->rx_sync_size,
-			   DMA_FROM_DEVICE) + 2;
+	    dma_map_single(&dev->ofdev->dev, skb->data - NET_IP_ALIGN,
+			   dev->rx_sync_size, DMA_FROM_DEVICE) + NET_IP_ALIGN;
 	wmb();
 	dev->rx_desc[slot].ctrl = MAL_RX_CTRL_EMPTY |
 	    (slot == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0);
@@ -1195,6 +1195,27 @@ static inline int emac_alloc_rx_skb(struct emac_instance *dev, int slot,
 	return 0;
 }
 
+static inline int
+emac_alloc_rx_skb(struct emac_instance *dev, int slot)
+{
+	struct sk_buff *skb;
+
+	skb = __netdev_alloc_skb_ip_align(dev->ndev, dev->rx_skb_size,
+					  GFP_KERNEL);
+
+	return __emac_prepare_rx_skb(skb, dev, slot);
+}
+
+static inline int
+emac_alloc_rx_skb_napi(struct emac_instance *dev, int slot)
+{
+	struct sk_buff *skb;
+
+	skb = napi_alloc_skb(&dev->mal->napi, dev->rx_skb_size);
+
+	return __emac_prepare_rx_skb(skb, dev, slot);
+}
+
 static void emac_print_link_status(struct emac_instance *dev)
 {
 	if (netif_carrier_ok(dev->ndev))
@@ -1225,7 +1246,7 @@ static int emac_open(struct net_device *ndev)
 
 	/* Allocate RX ring */
 	for (i = 0; i < NUM_RX_BUFF; ++i)
-		if (emac_alloc_rx_skb(dev, i, GFP_KERNEL)) {
+		if (emac_alloc_rx_skb(dev, i)) {
 			printk(KERN_ERR "%s: failed to allocate RX ring\n",
 			       ndev->name);
 			goto oom;
@@ -1660,8 +1681,9 @@ static inline void emac_recycle_rx_skb(struct emac_instance *dev, int slot,
 	DBG2(dev, "recycle %d %d" NL, slot, len);
 
 	if (len)
-		dma_map_single(&dev->ofdev->dev, skb->data - 2,
-			       EMAC_DMA_ALIGN(len + 2), DMA_FROM_DEVICE);
+		dma_map_single(&dev->ofdev->dev, skb->data - NET_IP_ALIGN,
+			       SKB_DATA_ALIGN(len + NET_IP_ALIGN),
+			       DMA_FROM_DEVICE);
 
 	dev->rx_desc[slot].data_len = 0;
 	wmb();
@@ -1713,7 +1735,7 @@ static inline int emac_rx_sg_append(struct emac_instance *dev, int slot)
 		int len = dev->rx_desc[slot].data_len;
 		int tot_len = dev->rx_sg_skb->len + len;
 
-		if (unlikely(tot_len + 2 > dev->rx_skb_size)) {
+		if (unlikely(tot_len + NET_IP_ALIGN > dev->rx_skb_size)) {
 			++dev->estats.rx_dropped_mtu;
 			dev_kfree_skb(dev->rx_sg_skb);
 			dev->rx_sg_skb = NULL;
@@ -1769,16 +1791,18 @@ static int emac_poll_rx(void *param, int budget)
 		}
 
 		if (len && len < EMAC_RX_COPY_THRESH) {
-			struct sk_buff *copy_skb =
-			    alloc_skb(len + EMAC_RX_SKB_HEADROOM + 2, GFP_ATOMIC);
+			struct sk_buff *copy_skb;
+
+			copy_skb = napi_alloc_skb(&dev->mal->napi, len);
 			if (unlikely(!copy_skb))
 				goto oom;
 
-			skb_reserve(copy_skb, EMAC_RX_SKB_HEADROOM + 2);
-			memcpy(copy_skb->data - 2, skb->data - 2, len + 2);
+			memcpy(copy_skb->data - NET_IP_ALIGN,
+			       skb->data - NET_IP_ALIGN,
+			       len + NET_IP_ALIGN);
 			emac_recycle_rx_skb(dev, slot, len);
 			skb = copy_skb;
-		} else if (unlikely(emac_alloc_rx_skb(dev, slot, GFP_ATOMIC)))
+		} else if (unlikely(emac_alloc_rx_skb_napi(dev, slot)))
 			goto oom;
 
 		skb_put(skb, len);
@@ -1799,7 +1823,7 @@ static int emac_poll_rx(void *param, int budget)
 	sg:
 		if (ctrl & MAL_RX_CTRL_FIRST) {
 			BUG_ON(dev->rx_sg_skb);
-			if (unlikely(emac_alloc_rx_skb(dev, slot, GFP_ATOMIC))) {
+			if (unlikely(emac_alloc_rx_skb_napi(dev, slot))) {
 				DBG(dev, "rx OOM %d" NL, slot);
 				++dev->estats.rx_dropped_oom;
 				emac_recycle_rx_skb(dev, slot, 0);
diff --git a/drivers/net/ethernet/ibm/emac/core.h b/drivers/net/ethernet/ibm/emac/core.h
index 84caa4a3fc52..187689cd8212 100644
--- a/drivers/net/ethernet/ibm/emac/core.h
+++ b/drivers/net/ethernet/ibm/emac/core.h
@@ -68,22 +68,18 @@ static inline int emac_rx_size(int mtu)
 		return mal_rx_size(ETH_DATA_LEN + EMAC_MTU_OVERHEAD);
 }
 
-#define EMAC_DMA_ALIGN(x)		ALIGN((x), dma_get_cache_alignment())
-
-#define EMAC_RX_SKB_HEADROOM		\
-	EMAC_DMA_ALIGN(CONFIG_IBM_EMAC_RX_SKB_HEADROOM)
-
 /* Size of RX skb for the given MTU */
 static inline int emac_rx_skb_size(int mtu)
 {
 	int size = max(mtu + EMAC_MTU_OVERHEAD, emac_rx_size(mtu));
-	return EMAC_DMA_ALIGN(size + 2) + EMAC_RX_SKB_HEADROOM;
+
+	return SKB_DATA_ALIGN(size + NET_IP_ALIGN) + NET_SKB_PAD;
 }
 
 /* RX DMA sync size */
 static inline int emac_rx_sync_size(int mtu)
 {
-	return EMAC_DMA_ALIGN(emac_rx_size(mtu) + 2);
+	return SKB_DATA_ALIGN(emac_rx_size(mtu) + NET_IP_ALIGN);
 }
 
 /* Driver statistcs is split into two parts to make it more cache friendly:
-- 
2.20.1
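
The "+ 2" that this patch replaces with NET_IP_ALIGN exists because an
Ethernet header is 14 bytes long, so reserving 2 bytes in front of it
places the IP header on a 4-byte boundary. The arithmetic can be
sketched as follows. The SKETCH_* constants and the helper names are
hypothetical; in particular the value 2 for NET_IP_ALIGN and the
32-byte cache line are assumptions, since both depend on the
architecture.

```c
#include <stddef.h>

#define SKETCH_ETH_HLEN   14u  /* Ethernet header length in bytes */
#define SKETCH_IP_ALIGN    2u  /* assumed value of NET_IP_ALIGN */
#define SKETCH_CACHE_LINE 32u  /* assumed cache line size, power of two */

/* Round x up to the next multiple of a (a must be a power of two). */
static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Offset of the IP header from the aligned start of the buffer. */
static size_t ip_header_offset(size_t reserve)
{
	return reserve + SKETCH_ETH_HLEN;
}

/* Data area size, in the spirit of SKB_DATA_ALIGN(size + NET_IP_ALIGN). */
static size_t rx_data_size(size_t size)
{
	return align_up(size + SKETCH_IP_ALIGN, SKETCH_CACHE_LINE);
}
```

Without the reserve, ip_header_offset(0) is 14, which is not a multiple
of 4; with ip_header_offset(SKETCH_IP_ALIGN) the IP header starts at
offset 16.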



* Re: [PATCH rdma-next 00/12] Add SRQ and XRC support for ODP MRs
From: Jason Gunthorpe @ 2019-02-04 21:53 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Moni Shoua,
	Saeed Mahameed, linux-netdev
In-Reply-To: <20190203105430.GF3634@mtr-leonro.mtl.com>

On Sun, Feb 03, 2019 at 12:54:30PM +0200, Leon Romanovsky wrote:
> On Thu, Jan 31, 2019 at 04:27:39PM -0700, Jason Gunthorpe wrote:
> > On Tue, Jan 22, 2019 at 08:48:39AM +0200, Leon Romanovsky wrote:
> > > From: Leon Romanovsky <leonro@mellanox.com>
> > >
> > > Hi,
> > >
> > > This series extends ODP to work with SRQ and XRC. Being both per-operation
> > > (e.g. RDMA write, RDMA read and atomic) and per-transport (e.g. RC, UD and XRC),
> > > we extend IB/core and mlx5 driver to provide needed information to user space.
> > >
> > > Thanks
> > >
> > > Moni Shoua (12):
> > >   IB/mlx5: Fix locking SRQ object in ODP event
> > >   IB/core: Allocate bit for SRQ ODP support
> > >   IB/uverbs: Expose XRC ODP device capabilities
> > >   IB/mlx5: Remove useless check in ODP handler
> > >   IB/mlx5: Clean mlx5_ib_mr_responder_pfault_handler() signature
> > >   IB/mlx5: Add XRC initiator ODP support
> > >   IB/mlx5: Let read user wqe also from SRQ buffer
> > >   IB/mlx5: Add ODP SRQ support
> > >   IB/mlx5: Advertise SRQ ODP support for supported transports
> >
> > I applied these patches to for-next
> >
> > >   net/mlx5: Add XRC transport to ODP device capabilities layout
> > >   IB/mlx5: Advertise XRC ODP support
> > >   net/mlx5: Set ODP SRQ support in firmware
> >
> > This might need some re-organizing - the last patch could be split
> > (possibly merge with the first) so the header changes can go to the
> > shared branch, but the handle_hca_cap_odp() stuff must only be applied
> > to the rdma tree.
> >
> > I'm fine either way, if you don't want to split it send a commit ID
> > for the first patch on mlx5-next.
> 
> I applied the following two patches,
> 
> 46861e3e88be net/mlx5: Set ODP SRQ support in firmware
> dda7a817f287 net/mlx5: Add XRC transport to ODP device capabilities layout

Okay, done..

Thanks,
Jason


* Re: [PATCH 1/3 net-next] net: phy: aquantia: improve setting speed and duplex in aqr_read_status
From: Heiner Kallweit @ 2019-02-04 21:45 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, David Miller, Nikita Yushchenko,
	netdev@vger.kernel.org
In-Reply-To: <20190204212832.GB3397@lunn.ch>

On 04.02.2019 22:28, Andrew Lunn wrote:
> On Mon, Feb 04, 2019 at 10:03:21PM +0100, Heiner Kallweit wrote:
>> Add support for speeds 10Mbps, 5Gbps, and 10Gbps. In addition don't
>> hardcode duplex but read it from the chip.
> 
> Hi Heiner
> 
> The marvell10g does this differently. It gets the local and link
> partner advertised link modes and from that works out what the PHY is
> doing. If auto-neg is not being used, it then reads the link speed
> from the PMA.
> 
Right, it's the same mechanism we use in genphy_read_status() for
clause 22.

> The question is, should the Aquantia PHY do the same, or should it
> look at vendor registers? Apart from getting the 1G advertisement, all
> the Marvell code uses generic registers. So we should be able to move
> most of it into phy-c45 and reuse it. That is what I would prefer.
> 
I'd like to use standard registers wherever possible. This patch is
meant as a quick win to improve what we already do in aqr_read_status.
Once we have a generic c45 read_status function, we should switch to it.
However, I assume we will still have to read information such as the
interface mode from vendor registers.

>      Andrew
> 
Heiner
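
The approach discussed above, working out the link speed from the local
and link partner advertisements, could look roughly like this in
user-space C. It is a sketch only: the SK_ADV_* bits and
resolve_speed() are hypothetical and do not correspond to kernel link
mode definitions, and duplex resolution is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical advertisement bits, one per speed. */
enum {
	SK_ADV_10    = 1u << 0,
	SK_ADV_100   = 1u << 1,
	SK_ADV_1000  = 1u << 2,
	SK_ADV_2500  = 1u << 3,
	SK_ADV_5000  = 1u << 4,
	SK_ADV_10000 = 1u << 5,
};

/*
 * Return the highest speed in Mbps advertised by both sides,
 * or -1 if the two advertisements have nothing in common.
 */
static int resolve_speed(uint32_t local_adv, uint32_t lp_adv)
{
	/* Ordered from most to least preferred. */
	static const struct {
		uint32_t bit;
		int mbps;
	} prio[] = {
		{ SK_ADV_10000, 10000 },
		{ SK_ADV_5000,   5000 },
		{ SK_ADV_2500,   2500 },
		{ SK_ADV_1000,   1000 },
		{ SK_ADV_100,     100 },
		{ SK_ADV_10,       10 },
	};
	uint32_t common = local_adv & lp_adv;
	size_t i;

	for (i = 0; i < sizeof(prio) / sizeof(prio[0]); i++)
		if (common & prio[i].bit)
			return prio[i].mbps;

	return -1;
}
```

A 10G-capable local PHY facing a partner that only advertises 1000 and
100 resolves to 1000 here. The fallback of reading the speed from the
PMA when auto-negotiation is off is not modelled.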


* Re: [PATCH] net: phylink: dsa: mv88e6xxx: Revise irq setup ordering
From: John David Anglin @ 2019-02-04 21:38 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev
In-Reply-To: <20190204201905.GA2488@lunn.ch>

On 2019-02-04 3:19 p.m., Andrew Lunn wrote:
> The IRQ core would do this if it was needed.
>
> How many other irq thread work functions can you point to which do
> something similar?
This is the comment for handle_edge_irq:

/**
 *    handle_edge_irq - edge type IRQ handler
 *    @desc:    the interrupt description structure for this irq
 *
 *    Interrupt occures on the falling and/or rising edge of a hardware
 *    signal. The occurrence is latched into the irq controller hardware
 *    and must be acked in order to be reenabled. After the ack another
 *    interrupt can happen on the same source even before the first one
 *    is handled by the associated event handler. If this happens it
 *    might be necessary to disable (mask) the interrupt depending on the
 *    controller hardware. This requires to reenable the interrupt inside
 *    of the loop which handles the interrupts which have arrived while
 *    the handler was running. If all pending interrupts are handled, the
 *    loop is left.
 */

As can be seen, the above comment suggests that it may be necessary to
disable (mask) the interrupt, as I proposed.

I see no evidence in the Marvell functional specification for the
88E6341 that it sequences interrupts from the various sources, although
it might be that device interrupts are sequenced so INTn rises and falls.
I haven't seen any ports fail to link without the hunk on espressobin,
but it is hard to stress test the code.

Disabling and re-enabling interrupts in the global control register does
not affect their status.  Thus, at worst, the hunk adds a bit of
unnecessary code.  It could be skipped if we knew we were using level
interrupts.

Dave

-- 
John David Anglin  dave.anglin@bell.net
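
The effect of masking and re-enabling the interrupt enables can be
sketched with a small model. Everything here is hypothetical:
struct intn_model and its helpers are not driver code, and the model
assumes that INTn is driven low whenever any enabled status bit is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the switch interrupt output. */
struct intn_model {
	uint16_t status;    /* pending interrupt sources */
	uint16_t enable;    /* interrupt enable bits */
	int falling_edges;  /* high -> low transitions seen on INTn */
};

/* Assumption: INTn is low while any enabled source is pending. */
static bool intn_is_low(const struct intn_model *m)
{
	return (m->status & m->enable) != 0;
}

/* Write the enable bits and count a falling edge if INTn goes low. */
static void write_enable(struct intn_model *m, uint16_t enable)
{
	bool was_low = intn_is_low(m);

	m->enable = enable;
	if (!was_low && intn_is_low(m))
		m->falling_edges++;
}

/* Mask everything, then restore; return the number of new edges. */
static int mask_and_restore(struct intn_model *m)
{
	uint16_t saved = m->enable;
	int before = m->falling_edges;

	write_enable(m, 0);      /* INTn is forced high */
	write_enable(m, saved);  /* still-pending sources pull it low again */
	return m->falling_edges - before;
}
```

If a source is still pending when the enables are restored, the model
produces one fresh falling edge and leaves the status bits untouched;
with nothing pending, the sequence produces no edge at all.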


* [PATCH v1] net: dsa: qca8k: implement DT-based ports <-> phy translation
From: Christian Lamparter @ 2019-02-04 21:35 UTC (permalink / raw)
  To: netdev; +Cc: Florian Fainelli, Vivien Didelot, Andrew Lunn

The QCA8337 enumerates 5 PHYs on the MDC/MDIO access: PHY0-PHY4.
According to the System Block Diagram in Section 1.2 of the
QCA8337's datasheet, these PHYs are internally connected
to the MACs of PORT 1 - PORT 5. However, neither qca8k's slave
mdio access functions qca8k_phy_read()/qca8k_phy_write()
nor the dsa framework is set up for that.

This version of the patch uses the existing phy-handle
properties of each specified DSA Port in the DT to map
each PORT/MAC to its exposed PHY on the MDIO bus. This
is supported by the current binding document qca8k.txt
as well.

Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
---
 drivers/net/dsa/qca8k.c | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
index a4b6cda38016..6558b7ed855d 100644
--- a/drivers/net/dsa/qca8k.c
+++ b/drivers/net/dsa/qca8k.c
@@ -11,6 +11,7 @@
 #include <linux/netdevice.h>
 #include <net/dsa.h>
 #include <linux/of_net.h>
+#include <linux/of_mdio.h>
 #include <linux/of_platform.h>
 #include <linux/if_bridge.h>
 #include <linux/mdio.h>
@@ -612,20 +613,50 @@ qca8k_adjust_link(struct dsa_switch *ds, int port, struct phy_device *phy)
 	qca8k_port_set_status(priv, port, 1);
 }
 
+static int
+qca8k_to_real_phy(struct dsa_switch *ds, int phy)
+{
+	struct device_node *phy_dn, *port_dn;
+	int id;
+
+	if (phy >= ds->num_ports)
+		return -EINVAL;
+
+	port_dn = ds->ports[phy].dn;
+	if (!port_dn)
+		return -EINVAL;
+
+	phy_dn = of_parse_phandle(port_dn, "phy-handle", 0);
+	if (!phy_dn)
+		return phy;
+
+	id = of_mdio_parse_addr(ds->dev, phy_dn);
+	of_node_put(phy_dn);
+	return id;
+}
+
 static int
 qca8k_phy_read(struct dsa_switch *ds, int phy, int regnum)
 {
 	struct qca8k_priv *priv = (struct qca8k_priv *)ds->priv;
+	int realphy = qca8k_to_real_phy(ds, phy);
+
+	if (realphy < 0)
+		return realphy;
 
-	return mdiobus_read(priv->bus, phy, regnum);
+	return mdiobus_read(priv->bus, realphy, regnum);
 }
 
 static int
 qca8k_phy_write(struct dsa_switch *ds, int phy, int regnum, u16 val)
 {
 	struct qca8k_priv *priv = (struct qca8k_priv *)ds->priv;
+	int realphy = qca8k_to_real_phy(ds, phy);
+
+	if (realphy < 0)
+		return realphy;
 
-	return mdiobus_write(priv->bus, phy, regnum, val);
+	return mdiobus_write(priv->bus, realphy, regnum, val);
 }
 
 static void
-- 
2.20.1



* Re: [PATCH v2 0/7] sh_eth: implement simple RX checksum offload
From: David Miller @ 2019-02-04 21:31 UTC (permalink / raw)
  To: sergei.shtylyov; +Cc: netdev, linux-renesas-soc, linux-sh
In-Reply-To: <a21deed1-35dc-f1be-6c7e-7061ebe4b56c@cogentembedded.com>

From: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Date: Mon, 4 Feb 2019 21:01:25 +0300

> Here's a set of 7 patches against DaveM's 'net-next.git' repo. I'm implementing
> the simple RX checksum offload (like was done for the 'ravb' driver by Simon
> Horman); it has been only tested on the R8A7740 and R8A77980 SoCs, the other
> SoCs should just work (according to their manuals)...

Series applied, thanks.

There was a "tha" --> "the" typo in one of your commit messages which I
fixed up.


* Re: [PATCH 3/3 net-next] net: phy: aquantia: use FIELD_GET for getting speed in aqr_read_status
From: Andrew Lunn @ 2019-02-04 21:31 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Florian Fainelli, David Miller, Nikita Yushchenko,
	netdev@vger.kernel.org
In-Reply-To: <f3554610-2a8b-dec8-daad-fd592404dc29@gmail.com>

On Mon, Feb 04, 2019 at 10:09:06PM +0100, Heiner Kallweit wrote:
> Change getting the speed to use FIELD_GET() too to be in line with the
> rest of the code.
> 
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew


* Re: [PATCH 1/3 net-next] net: phy: aquantia: improve setting speed and duplex in aqr_read_status
From: Andrew Lunn @ 2019-02-04 21:28 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Florian Fainelli, David Miller, Nikita Yushchenko,
	netdev@vger.kernel.org
In-Reply-To: <d1f1160c-ebea-0e7b-4d73-a27ebbd5c199@gmail.com>

On Mon, Feb 04, 2019 at 10:03:21PM +0100, Heiner Kallweit wrote:
> Add support for speeds 10Mbps, 5Gbps, and 10Gbps. In addition don't
> hardcode duplex but read it from the chip.

Hi Heiner

The marvell10g does this differently. It gets the local and link
partner advertised link modes and from that works out what the PHY is
doing. If auto-neg is not being used, it then reads the link speed
from the PMA.

The question is, should the Aquantia PHY do the same, or should it
look at vendor registers? Apart from getting the 1G advertisement, all
the Marvell code uses generic registers. So we should be able to move
most of it into phy-c45 and reuse it. That is what I would prefer.

     Andrew


* Re: [PATCH mlx5-next 12/12] net/mlx5: Set ODP SRQ support in firmware
From: Jason Gunthorpe @ 2019-02-04 21:23 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Moni Shoua,
	Saeed Mahameed, linux-netdev
In-Reply-To: <20190203090311.GD3634@mtr-leonro.mtl.com>


On Sun, Feb 03, 2019 at 11:03:11AM +0200, Leon Romanovsky wrote:
> On Thu, Jan 31, 2019 at 04:28:44PM -0700, Jason Gunthorpe wrote:
> > On Tue, Jan 22, 2019 at 08:48:51AM +0200, Leon Romanovsky wrote:
> > > From: Moni Shoua <monis@mellanox.com>
> > >
> > > To avoid compatibility issues with older kernels the firmware doesn't
> > > allow SRQ to work with ODP unless kernel asks for it.
> > >
> > > Signed-off-by: Moni Shoua <monis@mellanox.com>
> > > Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> > > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > >  .../net/ethernet/mellanox/mlx5/core/main.c    | 53 +++++++++++++++++++
> > >  include/linux/mlx5/device.h                   |  3 ++
> > >  include/linux/mlx5/mlx5_ifc.h                 |  1 +
> > >  3 files changed, 57 insertions(+)
> > >
> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > > index be81b319b0dc..b3a76df0cf6c 100644
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > > @@ -459,6 +459,53 @@ static int handle_hca_cap_atomic(struct mlx5_core_dev *dev)
> > >  	return err;
> > >  }
> > >
> > > +static int handle_hca_cap_odp(struct mlx5_core_dev *dev)
> > > +{
> > > +	void *set_ctx;
> > > +	void *set_hca_cap;
> > > +	int set_sz = MLX5_ST_SZ_BYTES(set_hca_cap_in);
> > > +	int err;
> > > +
> > > +	if (!MLX5_CAP_GEN(dev, pg))
> > > +		return 0;
> >
> > Should a
> >
> >     if (IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING))
> >              return 0;
> >
> > Be here?
> 
> We had similar discussion in mlx5_ib main.c, but here we are talking
> about mlx5_core code, which from my point of view should represent the
> real HW capabilities without relation to kernel compilation mode.

This switch is to tell the FW that the mlx5_ib module supports the new
protocol - so having it in core code at all is really weird. I assume
there is some startup sequence reason?

Since the modularity is already wrecked it seems like an odd
reason not to add the if..

Jason


* [PATCH 3/3 net-next] net: phy: aquantia: use FIELD_GET for getting speed in aqr_read_status
From: Heiner Kallweit @ 2019-02-04 21:09 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller
  Cc: Nikita Yushchenko, netdev@vger.kernel.org
In-Reply-To: <8e41695d-a23e-adad-ae3d-66a46d1ab077@gmail.com>

Use FIELD_GET() to extract the speed as well, in line with the rest of
the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/aquantia.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/phy/aquantia.c b/drivers/net/phy/aquantia.c
index 7dfcbae4f..d8332b128 100644
--- a/drivers/net/phy/aquantia.c
+++ b/drivers/net/phy/aquantia.c
@@ -21,13 +21,13 @@
 #define PHY_ID_AQR405	0x03a1b4b0
 
 #define MDIO_AN_TX_VEND_STATUS1			0xc800
-#define MDIO_AN_TX_VEND_STATUS1_10BASET		(0x0 << 1)
-#define MDIO_AN_TX_VEND_STATUS1_100BASETX	(0x1 << 1)
-#define MDIO_AN_TX_VEND_STATUS1_1000BASET	(0x2 << 1)
-#define MDIO_AN_TX_VEND_STATUS1_10GBASET	(0x3 << 1)
-#define MDIO_AN_TX_VEND_STATUS1_2500BASET	(0x4 << 1)
-#define MDIO_AN_TX_VEND_STATUS1_5000BASET	(0x5 << 1)
-#define MDIO_AN_TX_VEND_STATUS1_RATE_MASK	(0x7 << 1)
+#define MDIO_AN_TX_VEND_STATUS1_RATE_MASK	GENMASK(3, 1)
+#define MDIO_AN_TX_VEND_STATUS1_10BASET		0
+#define MDIO_AN_TX_VEND_STATUS1_100BASETX	1
+#define MDIO_AN_TX_VEND_STATUS1_1000BASET	2
+#define MDIO_AN_TX_VEND_STATUS1_10GBASET	3
+#define MDIO_AN_TX_VEND_STATUS1_2500BASET	4
+#define MDIO_AN_TX_VEND_STATUS1_5000BASET	5
 #define MDIO_AN_TX_VEND_STATUS1_FULL_DUPLEX	BIT(0)
 
 #define MDIO_AN_TX_VEND_INT_STATUS2		0xcc01
@@ -148,7 +148,7 @@ static int aqr_read_status(struct phy_device *phydev)
 	mdelay(10);
 	reg = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_AN_TX_VEND_STATUS1);
 
-	switch (reg & MDIO_AN_TX_VEND_STATUS1_RATE_MASK) {
+	switch (FIELD_GET(MDIO_AN_TX_VEND_STATUS1_RATE_MASK, reg)) {
 	case MDIO_AN_TX_VEND_STATUS1_10GBASET:
 		phydev->speed = SPEED_10000;
 		break;
-- 
2.20.1
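
What the switch to FIELD_GET() changes can be shown with a user-space
approximation. SK_GENMASK() and sk_field_get() below are hypothetical
stand-ins written for this sketch; they are not the kernel's GENMASK()
and FIELD_GET() macros, which work at compile time. The point is that
the field value is shifted down before comparison, so the case labels
can be plain numbers instead of pre-shifted constants.

```c
#include <stdint.h>

/* Stand-in for GENMASK(h, l) on 16-bit registers: bits h..l set. */
#define SK_GENMASK(h, l) \
	((uint16_t)(((1u << ((h) - (l) + 1)) - 1u) << (l)))

/* Stand-in for FIELD_GET(): mask the register, shift the field down. */
static uint16_t sk_field_get(uint16_t mask, uint16_t reg)
{
	unsigned int shift = 0;

	/* Find the position of the lowest set bit of the mask. */
	while (mask && !((mask >> shift) & 1u))
		shift++;

	return (uint16_t)((reg & mask) >> shift);
}
```

With the rate mask covering bits 3..1, a register value of 0x0007
yields the field value 3, which the patch defines as 10GBASE-T, and
0x000b yields 5, i.e. 5000BASE-T. Bit 0, the full duplex flag, does not
influence the result.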




* [PATCH 2/3 net-next] net: phy: aquantia: set interface mode in aqr_read_status
From: Heiner Kallweit @ 2019-02-04 21:07 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller
  Cc: Nikita Yushchenko, netdev@vger.kernel.org
In-Reply-To: <8e41695d-a23e-adad-ae3d-66a46d1ab077@gmail.com>

Extend aqr_read_status to set the interface mode properly.

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/aquantia.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/drivers/net/phy/aquantia.c b/drivers/net/phy/aquantia.c
index 51ae3feea..281873c59 100644
--- a/drivers/net/phy/aquantia.c
+++ b/drivers/net/phy/aquantia.c
@@ -11,6 +11,7 @@
 #include <linux/module.h>
 #include <linux/delay.h>
 #include <linux/phy.h>
+#include <linux/bitfield.h>
 
 #define PHY_ID_AQ1202	0x03a1b445
 #define PHY_ID_AQ2104	0x03a1b460
@@ -34,6 +35,21 @@
 #define MDIO_AN_TX_VEND_INT_MASK2		0xd401
 #define MDIO_AN_TX_VEND_INT_MASK2_LINK		BIT(0)
 
+/* PHY XS System Interface Connection Status */
+#define MDIO_XS_SYSIF_STATUS			0xe812
+#define MDIO_XS_SYSIF_MODE_MASK			GENMASK(7, 3)
+#define MDIO_XS_SYSIF_MODE_BACKPLANE_KR		0
+#define MDIO_XS_SYSIF_MODE_BACKPLANE_KX		1
+#define MDIO_XS_SYSIF_MODE_XFI			2
+#define MDIO_XS_SYSIF_MODE_USXGMII		3
+#define MDIO_XS_SYSIF_MODE_XAUI			4
+#define MDIO_XS_SYSIF_MODE_XAUI_PAUSE		5
+#define MDIO_XS_SYSIF_MODE_SGMII		6
+#define MDIO_XS_SYSIF_MODE_RXAUI		7
+#define MDIO_XS_SYSIF_MODE_MAC			8
+#define MDIO_XS_SYSIF_MODE_OFF			9
+#define MDIO_XS_SYSIF_MODE_OCSGMII		10
+
 /* Vendor specific 1, MDIO_MMD_VEND1 */
 #define VEND1_GLOBAL_INT_STD_STATUS		0xfc00
 #define VEND1_GLOBAL_INT_VEND_STATUS		0xfc01
@@ -158,6 +174,27 @@ static int aqr_read_status(struct phy_device *phydev)
 
 	phydev->duplex = !!(reg & MDIO_AN_TX_VEND_STATUS1_FULL_DUPLEX);
 
+	reg = phy_read_mmd(phydev, MDIO_MMD_PHYXS, MDIO_XS_SYSIF_STATUS);
+
+	switch (FIELD_GET(MDIO_XS_SYSIF_MODE_MASK, reg)) {
+	case MDIO_XS_SYSIF_MODE_BACKPLANE_KR:
+		phydev->interface = PHY_INTERFACE_MODE_10GKR;
+		break;
+	case MDIO_XS_SYSIF_MODE_SGMII:
+		phydev->interface = PHY_INTERFACE_MODE_SGMII;
+		break;
+	case MDIO_XS_SYSIF_MODE_XAUI:
+	case MDIO_XS_SYSIF_MODE_XAUI_PAUSE:
+		phydev->interface = PHY_INTERFACE_MODE_XAUI;
+		break;
+	case MDIO_XS_SYSIF_MODE_RXAUI:
+		phydev->interface = PHY_INTERFACE_MODE_RXAUI;
+		break;
+	default:
+		phydev->interface = PHY_INTERFACE_MODE_NA;
+		break;
+	}
+
 	return 0;
 }
 
-- 
2.20.1
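
As a rough userspace sketch of the mapping added above: the mode field sits
in bits 7..3 of the status register, and only five of the eleven defined
mode codes are translated, with everything else reported as "not
applicable". The sk_* names and the enum are hypothetical and exist only for
this illustration; the mode codes are copied from the defines in the patch.

```c
#include <stdint.h>

#define SK_SYSIF_MODE_SHIFT 3
#define SK_SYSIF_MODE_MASK  (0x1fu << SK_SYSIF_MODE_SHIFT)  /* bits 7..3 */

/* Hypothetical stand-in for the PHY_INTERFACE_MODE_* values used above. */
enum sk_iface {
	SK_IFACE_NA,
	SK_IFACE_10GKR,
	SK_IFACE_SGMII,
	SK_IFACE_XAUI,
	SK_IFACE_RXAUI,
};

/* Translate a raw system interface status value into an interface mode. */
static enum sk_iface sk_decode_sysif(uint16_t reg)
{
	switch ((reg & SK_SYSIF_MODE_MASK) >> SK_SYSIF_MODE_SHIFT) {
	case 0:			/* BACKPLANE_KR */
		return SK_IFACE_10GKR;
	case 6:			/* SGMII */
		return SK_IFACE_SGMII;
	case 4:			/* XAUI */
	case 5:			/* XAUI_PAUSE */
		return SK_IFACE_XAUI;
	case 7:			/* RXAUI */
		return SK_IFACE_RXAUI;
	default:		/* KX, XFI, USXGMII, MAC, OFF, OCSGMII, reserved */
		return SK_IFACE_NA;
	}
}
```

Note that bits outside 7..3 are ignored, so a value such as 0x3f still
decodes as RXAUI.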



^ permalink raw reply related

* [PATCH 1/3 net-next] net: phy: aquantia: improve setting speed and duplex in aqr_read_status
From: Heiner Kallweit @ 2019-02-04 21:03 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller
  Cc: Nikita Yushchenko, netdev@vger.kernel.org
In-Reply-To: <8e41695d-a23e-adad-ae3d-66a46d1ab077@gmail.com>

Add support for speeds 10Mbps, 5Gbps, and 10Gbps. In addition don't
hardcode duplex but read it from the chip.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/aquantia.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/aquantia.c b/drivers/net/phy/aquantia.c
index 482004efa..51ae3feea 100644
--- a/drivers/net/phy/aquantia.c
+++ b/drivers/net/phy/aquantia.c
@@ -133,6 +133,12 @@ static int aqr_read_status(struct phy_device *phydev)
 	reg = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_AN_TX_VEND_STATUS1);
 
 	switch (reg & MDIO_AN_TX_VEND_STATUS1_RATE_MASK) {
+	case MDIO_AN_TX_VEND_STATUS1_10GBASET:
+		phydev->speed = SPEED_10000;
+		break;
+	case MDIO_AN_TX_VEND_STATUS1_5000BASET:
+		phydev->speed = SPEED_5000;
+		break;
 	case MDIO_AN_TX_VEND_STATUS1_2500BASET:
 		phydev->speed = SPEED_2500;
 		break;
@@ -142,11 +148,15 @@ static int aqr_read_status(struct phy_device *phydev)
 	case MDIO_AN_TX_VEND_STATUS1_100BASETX:
 		phydev->speed = SPEED_100;
 		break;
+	case MDIO_AN_TX_VEND_STATUS1_10BASET:
+		phydev->speed = SPEED_10;
+		break;
 	default:
-		phydev->speed = SPEED_10000;
+		phydev->speed = SPEED_UNKNOWN;
 		break;
 	}
-	phydev->duplex = DUPLEX_FULL;
+
+	phydev->duplex = !!(reg & MDIO_AN_TX_VEND_STATUS1_FULL_DUPLEX);
 
 	return 0;
 }
-- 
2.20.1



^ permalink raw reply related

* [PATCH 0/3 net-next] net: phy: aquantia: extend aqr_read_status
From: Heiner Kallweit @ 2019-02-04 21:02 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller
  Cc: Nikita Yushchenko, netdev@vger.kernel.org

Extend aqr_read_status to read more parameters from the chip.

Heiner Kallweit (3):
  net: phy: aquantia: improve setting speed and duplex in aqr_read_status
  net: phy: aquantia: set interface mode in aqr_read_status
  net: phy: aquantia: use FIELD_GET for getting speed in aqr_read_status

 drivers/net/phy/aquantia.c | 66 ++++++++++++++++++++++++++++++++------
 1 file changed, 56 insertions(+), 10 deletions(-)

-- 
2.20.1

^ permalink raw reply

* Re: [PATCH] bpf: test_maps: Avoid possible out of bound access
From: Daniel Borkmann @ 2019-02-04 20:57 UTC (permalink / raw)
  To: Breno Leitao, netdev; +Cc: ast
In-Reply-To: <1549297631-27789-1-git-send-email-leitao@debian.org>

On 02/04/2019 05:27 PM, Breno Leitao wrote:
> When compiling test_maps selftest with GCC-8, it warns that an array might
> be indexed with a negative value, which could cause a negative out of bound
> access, depending on parameters of the function. This is the GCC-8 warning:
> 
> 	gcc -Wall -O2 -I../../../include/uapi -I../../../lib -I../../../lib/bpf -I../../../../include/generated -DHAVE_GENHDR -I../../../include    test_maps.c /home/breno/Devel/linux/tools/testing/selftests/bpf/libbpf.a -lcap -lelf -lrt -lpthread -o /home/breno/Devel/linux/tools/testing/selftests/bpf/test_maps
> 	In file included from test_maps.c:16:
> 	test_maps.c: In function ‘run_all_tests’:
> 	test_maps.c:1079:10: warning: array subscript -1 is below array bounds of ‘pid_t[<Ube20> + 1]’ [-Warray-bounds]
> 	   assert(waitpid(pid[i], &status, 0) == pid[i]);
> 		  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
> 	test_maps.c:1059:6: warning: array subscript -1 is below array bounds of ‘pid_t[<Ube20> + 1]’ [-Warray-bounds]
> 	   pid[i] = fork();
> 	   ~~~^~~
> 
> This patch simply guarantees that the tasks variable is unsigned, thus, it
> could never be a negative number, hence avoiding an out of bound access
> warning.
> 
> Signed-off-by: Breno Leitao <leitao@debian.org>

Thanks for the patch, small comment below:

>  tools/testing/selftests/bpf/test_maps.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
> index e2b9eee37187..1714e26f4a72 100644
> --- a/tools/testing/selftests/bpf/test_maps.c
> +++ b/tools/testing/selftests/bpf/test_maps.c
> @@ -641,7 +641,7 @@ static void test_stackmap(int task, void *data)
>  #define SOCKMAP_PARSE_PROG "./sockmap_parse_prog.o"
>  #define SOCKMAP_VERDICT_PROG "./sockmap_verdict_prog.o"
>  #define SOCKMAP_TCP_MSG_PROG "./sockmap_tcp_msg_prog.o"
> -static void test_sockmap(int tasks, void *data)
> +static void test_sockmap(unsigned int tasks, void *data)

There are a couple more test_*() functions that need to be converted if we
change the type to unsigned:

tools/testing/selftests/bpf/test_maps.c:48:static void test_hashmap(int task, void *data)
tools/testing/selftests/bpf/test_maps.c:138:static void test_hashmap_sizes(int task, void *data)
tools/testing/selftests/bpf/test_maps.c:158:static void test_hashmap_percpu(int task, void *data)
tools/testing/selftests/bpf/test_maps.c:285:static void test_hashmap_walk(int task, void *data)
tools/testing/selftests/bpf/test_maps.c:356:static void test_arraymap(int task, void *data)
tools/testing/selftests/bpf/test_maps.c:411:static void test_arraymap_percpu(int task, void *data)
tools/testing/selftests/bpf/test_maps.c:507:static void test_devmap(int task, void *data)
tools/testing/selftests/bpf/test_maps.c:522:static void test_queuemap(int task, void *data)
tools/testing/selftests/bpf/test_maps.c:580:static void test_stackmap(int task, void *data)
tools/testing/selftests/bpf/test_maps.c:645:static void test_sockmap(int tasks, void *data)

>  {
>  	struct bpf_map *bpf_map_rx, *bpf_map_tx, *bpf_map_msg, *bpf_map_break;
>  	int map_fd_msg = 0, map_fd_rx = 0, map_fd_tx = 0, map_fd_break;
> @@ -1261,7 +1261,7 @@ static void test_map_large(void)
>  	printf("Fork %d tasks to '" #FN "'\n", N); \
>  	__run_parallel(N, FN, DATA)
>  
> -static void __run_parallel(int tasks, void (*fn)(int task, void *data),
> +static void __run_parallel(unsigned int tasks, void (*fn)(int task, void *data),

The func arg above would also need to be converted to unsigned so that
we don't introduce a type mismatch.

Thanks,
Daniel
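
To illustrate the point about keeping the types consistent, here is a
minimal standalone sketch, not the selftest code itself. The names
sk_task_fn, sk_run_parallel and sk_noop_task are hypothetical. The runner
and every callback share one unsigned signature, so the index into the pid
array can never be negative and no function pointer cast is needed.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* One callback type shared by the runner and all test functions. */
typedef void (*sk_task_fn)(unsigned int task, void *data);

/*
 * Fork `tasks` children, run fn(i, data) in child i, and return how many
 * children exited with status 0.
 */
static unsigned int sk_run_parallel(unsigned int tasks, sk_task_fn fn,
				    void *data)
{
	pid_t *pid = calloc(tasks ? tasks : 1, sizeof(*pid));
	unsigned int i, ok = 0;
	int status;

	if (!pid)
		return 0;

	for (i = 0; i < tasks; i++) {
		pid[i] = fork();
		if (pid[i] == 0) {
			fn(i, data);
			_exit(0);
		}
		if (pid[i] < 0) {
			/* fork failed: only wait for children already started */
			tasks = i;
			break;
		}
	}

	for (i = 0; i < tasks; i++) {
		if (waitpid(pid[i], &status, 0) == pid[i] &&
		    WIFEXITED(status) && WEXITSTATUS(status) == 0)
			ok++;
	}

	free(pid);
	return ok;
}

/* Example callback with the matching unsigned signature. */
static void sk_noop_task(unsigned int task, void *data)
{
	(void)task;
	(void)data;
}
```

A call such as sk_run_parallel(3, sk_noop_task, NULL) forks three children
and should report three clean exits.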

^ permalink raw reply

* Re: [PATCH bpf-next 0/2] tools/bpf: expose several libbpf API functions
From: Alexei Starovoitov @ 2019-02-04 20:51 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Network Development, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team
In-Reply-To: <20190204190057.3965903-1-yhs@fb.com>

On Mon, Feb 4, 2019 at 12:27 PM Yonghong Song <yhs@fb.com> wrote:
>
> This patch set exposed a few functions in libbpf.
> All these newly added API functions are helpful for
> JIT based bpf compilation where .BTF and .BTF.ext
> are available as in-memory data blobs.
>
> Patch #1 exposed several btf_ext__* API functions which
> are used to handle .BTF.ext ELF sections.
> Patch #2 refactored the function bpf_map_find_btf_info()
> and exposed API function btf__get_map_kv_tids() to
> retrieve the map key/value type id's generated by
> bpf program through BPF_ANNOTATE_KV_PAIR macro.

Applied to bpf-next. Thanks!

^ permalink raw reply

* Re: [PATCH bpf-next 1/3] bpf, riscv: add BPF JIT for RV64G
From: Björn Töpel @ 2019-02-04 20:27 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: linux-riscv, ast, Netdev, Palmer Dabbelt, Christoph Hellwig
In-Reply-To: <88cdae60-494e-6294-b2c1-10b9cbeb95ac@iogearbox.net>

On Mon, 4 Feb 2019 at 21:06, Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 02/03/2019 12:51 PM, bjorn.topel@gmail.com wrote:
> > From: Björn Töpel <bjorn.topel@gmail.com>
> >
> > This commit adds BPF JIT for RV64G.
> >
> > The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar
> > to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64).
> >
> > At the moment the RISC-V Linux port does not support HAVE_KPROBES,
> > which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests
> > involving BPF_PROG_TYPE_TRACEPOINT passes.
> >
> > Further, the implementation does not support "far branching" (>4KiB).
> >
> > The implementation passes all the test_bpf.ko tests:
> >   test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed]
> >
> > All the tail_call tests in the selftest/bpf/test_verifier program
> > passes.
> >
> > All tests where done on QEMU (QEMU emulator version 3.1.50
> > (v3.1.0-688-g8ae951fbc106)).
> >
> > Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
>
> Some minor comments:
>
> Looks like all the BPF_JMP32 instructions are missing. Would probably
> make sense to include these into the initial merge as well unless there
> is some good reason not to; presumably the test_verifier parts with
> BPF_JMP32 haven't been tried out?
>

Yes indeed. My bad, I didn't realize that Jiong's patches were in the
tree! BPF_JMP32 should definitely be in the initial merge.

> [...]
> > +
> > +enum {
> > +     RV_CTX_F_SEEN_TAIL_CALL =       0,
> > +     RV_CTX_F_SEEN_CALL =            RV_REG_RA,
> > +     RV_CTX_F_SEEN_S1 =              RV_REG_S1,
> > +     RV_CTX_F_SEEN_S2 =              RV_REG_S2,
> > +     RV_CTX_F_SEEN_S3 =              RV_REG_S3,
> > +     RV_CTX_F_SEEN_S4 =              RV_REG_S4,
> > +     RV_CTX_F_SEEN_S5 =              RV_REG_S5,
> > +     RV_CTX_F_SEEN_S6 =              RV_REG_S6,
> > +};
> > +
> > +struct rv_jit_context {
> > +     struct bpf_prog *prog;
> > +     u32 *insns; /* RV insns */
> > +     int ninsns;
> > +     int epilogue_offset;
> > +     int *offset; /* BPF to RV */
> > +     unsigned long flags;
> > +     int stack_size;
> > +};
> > +
> > +struct rv_jit_data {
> > +     struct bpf_binary_header *header;
> > +     u8 *image;
> > +     struct rv_jit_context ctx;
> > +};
> > +
> > +static u8 bpf_to_rv_reg(int bpf_reg, struct rv_jit_context *ctx)
> > +{
> > +     u8 reg = regmap[bpf_reg];
> > +
> > +     switch (reg) {
> > +     case RV_CTX_F_SEEN_S1:
> > +     case RV_CTX_F_SEEN_S2:
> > +     case RV_CTX_F_SEEN_S3:
> > +     case RV_CTX_F_SEEN_S4:
> > +     case RV_CTX_F_SEEN_S5:
> > +     case RV_CTX_F_SEEN_S6:
> > +             __set_bit(reg, &ctx->flags);
> > +     }
> > +     return reg;
> > +};
> > +
> > +static bool seen_reg(int reg, struct rv_jit_context *ctx)
> > +{
> > +     switch (reg) {
> > +     case RV_CTX_F_SEEN_CALL:
> > +     case RV_CTX_F_SEEN_S1:
> > +     case RV_CTX_F_SEEN_S2:
> > +     case RV_CTX_F_SEEN_S3:
> > +     case RV_CTX_F_SEEN_S4:
> > +     case RV_CTX_F_SEEN_S5:
> > +     case RV_CTX_F_SEEN_S6:
> > +             return test_bit(reg, &ctx->flags);
> > +     }
> > +     return false;
> > +}
> > +
> > +static void mark_call(struct rv_jit_context *ctx)
> > +{
> > +     __set_bit(RV_CTX_F_SEEN_CALL, &ctx->flags);
> > +}
> > +
> > +static bool seen_call(struct rv_jit_context *ctx)
> > +{
> > +     return seen_reg(RV_REG_RA, ctx);
> > +}
>
> Just nit: probably might be more obvious to remove this asymmetry in
> seen_reg() and do __set_bit()/test_bit() for RV_CTX_F_SEEN_CALL similar
> like below.
>

Yeah, let's do that.
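
One way the symmetric variant could look, as a self-contained userspace
sketch: this is not the code from v2. The sk_* names are hypothetical, plain
bit operations stand in for __set_bit()/test_bit(), and the register numbers
are assumed to follow the RISC-V ABI numbering (ra = x1, a6 = x16,
s6 = x22).

```c
#include <stdbool.h>

/* Assumed register numbers, following the RISC-V ABI naming. */
enum {
	SK_REG_RA = 1,
	SK_REG_A6 = 16,
	SK_REG_S6 = 22,
};

/* Flag bit positions: bit 0 for tail calls, otherwise the register number. */
enum {
	SK_SEEN_TAIL_CALL = 0,
	SK_SEEN_CALL      = SK_REG_RA,
	SK_SEEN_S6        = SK_REG_S6,
};

struct sk_jit_ctx {
	unsigned long flags;
};

static void sk_set_flag(struct sk_jit_ctx *ctx, int bit)
{
	ctx->flags |= 1UL << bit;
}

static bool sk_test_flag(const struct sk_jit_ctx *ctx, int bit)
{
	return (ctx->flags >> bit) & 1UL;
}

/* mark/seen pairs now use the same flag directly, with no detour. */
static void sk_mark_call(struct sk_jit_ctx *ctx)
{
	sk_set_flag(ctx, SK_SEEN_CALL);
}

static bool sk_seen_call(const struct sk_jit_ctx *ctx)
{
	return sk_test_flag(ctx, SK_SEEN_CALL);
}

static void sk_mark_tail_call(struct sk_jit_ctx *ctx)
{
	sk_set_flag(ctx, SK_SEEN_TAIL_CALL);
}

static bool sk_seen_tail_call(const struct sk_jit_ctx *ctx)
{
	return sk_test_flag(ctx, SK_SEEN_TAIL_CALL);
}

static bool sk_seen_reg(const struct sk_jit_ctx *ctx, int reg)
{
	return sk_test_flag(ctx, reg);
}

/* Pick the tail call counter register: callee-saved s6 if calls were seen. */
static int sk_tail_call_reg(struct sk_jit_ctx *ctx)
{
	sk_mark_tail_call(ctx);

	if (sk_seen_call(ctx)) {
		sk_set_flag(ctx, SK_SEEN_S6);
		return SK_REG_S6;
	}
	return SK_REG_A6;
}
```
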

> > +static void mark_tail_call(struct rv_jit_context *ctx)
> > +{
> > +     __set_bit(RV_CTX_F_SEEN_TAIL_CALL, &ctx->flags);
> > +}
> > +
> > +static bool seen_tail_call(struct rv_jit_context *ctx)
> > +{
> > +     return test_bit(RV_CTX_F_SEEN_TAIL_CALL, &ctx->flags);
> > +}
> > +
> > +static u8 rv_tail_call_reg(struct rv_jit_context *ctx)
> > +{
> > +     mark_tail_call(ctx);
> > +
> > +     if (seen_call(ctx)) {
> > +             __set_bit(RV_CTX_F_SEEN_S6, &ctx->flags);
> > +             return RV_REG_S6;
> > +     }
> > +     return RV_REG_A6;
> > +}
> > +
> > +static void emit(const u32 insn, struct rv_jit_context *ctx)
> > +{
> > +     if (ctx->insns)
> > +             ctx->insns[ctx->ninsns] = insn;
> > +
> > +     ctx->ninsns++;
> > +}
> > +
> > +static u32 rv_r_insn(u8 funct7, u8 rs2, u8 rs1, u8 funct3, u8 rd, u8 opcode)
> > +{
> [...]
> > +     /* Allocate image, now that we know the size. */
> > +     image_size = sizeof(u32) * ctx->ninsns;
> > +     jit_data->header = bpf_jit_binary_alloc(image_size, &jit_data->image,
> > +                                             sizeof(u32),
> > +                                             bpf_fill_ill_insns);
> > +     if (!jit_data->header) {
> > +             prog = orig_prog;
> > +             goto out_offset;
> > +     }
> > +
> > +     /* Second, real pass, that acutally emits the image. */
> > +     ctx->insns = (u32 *)jit_data->image;
> > +skip_init_ctx:
> > +     ctx->ninsns = 0;
> > +
> > +     build_prologue(ctx);
> > +     if (build_body(ctx, extra_pass)) {
> > +             bpf_jit_binary_free(jit_data->header);
> > +             prog = orig_prog;
> > +             goto out_offset;
> > +     }
> > +     build_epilogue(ctx);
> > +
> > +     if (bpf_jit_enable > 1)
> > +             bpf_jit_dump(prog->len, image_size, 2, ctx->insns);
> > +
> > +     prog->bpf_func = (void *)ctx->insns;
> > +     prog->jited = 1;
> > +     prog->jited_len = image_size;
> > +
> > +     bpf_flush_icache(jit_data->header, (u8 *)ctx->insns + ctx->ninsns);
>
> Shouldn't this be '(u32 *)ctx->insns + ctx->ninsns' to cover the range?
>

Yikes! Indeed so, I'll make sure this is corrected!

Thanks for the comments!


Björn

> > +
> > +     if (!prog->is_func || extra_pass) {
> > +out_offset:
> > +             kfree(ctx->offset);
> > +             kfree(jit_data);
> > +             prog->aux->jit_data = NULL;
> > +     }
> > +out:
> > +     if (tmp_blinded)
> > +             bpf_jit_prog_release_other(prog, prog == orig_prog ?
> > +                                        tmp : orig_prog);
> > +     return prog;
> > +}
> >
>
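
The flush range issue discussed above comes down to C pointer arithmetic:
adding ninsns to a u8 pointer advances by ninsns bytes, while adding it to a
u32 pointer advances by ninsns instructions of four bytes each. A small
userspace sketch with hypothetical sk_* names, measuring from the start of
the instruction buffer rather than from the binary header:

```c
#include <stddef.h>
#include <stdint.h>

/* Demo buffer standing in for the JIT image; contents are irrelevant. */
static const uint32_t sk_demo_image[8];

/* Bytes covered when the end pointer is computed as (u8 *)insns + ninsns. */
static size_t sk_range_bytes_u8(const uint32_t *insns, size_t ninsns)
{
	const uint8_t *start = (const uint8_t *)insns;
	const uint8_t *end = (const uint8_t *)insns + ninsns;

	return (size_t)(end - start);
}

/* Bytes covered when the end pointer is computed as (u32 *)insns + ninsns. */
static size_t sk_range_bytes_u32(const uint32_t *insns, size_t ninsns)
{
	const uint8_t *start = (const uint8_t *)insns;
	const uint8_t *end = (const uint8_t *)(insns + ninsns);

	return (size_t)(end - start);
}
```

For eight instructions the first form covers 8 bytes and the second covers
32, so the u8 variant would leave three quarters of the image out of the
flushed range.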

^ permalink raw reply

* Re: [PATCH bpf-next] bpf: support SO_DEBUG in bpf_setsockopt()
From: Daniel Borkmann @ 2019-02-04 20:23 UTC (permalink / raw)
  To: Alexei Starovoitov, Yafang Shao; +Cc: kafai, brakmo, ast, netdev, shaoyafang
In-Reply-To: <20190204173517.6o5v7yd5yn7pxjzi@ast-mbp.dhcp.thefacebook.com>

On 02/04/2019 06:35 PM, Alexei Starovoitov wrote:
> On Sun, Feb 03, 2019 at 04:15:07PM +0800, Yafang Shao wrote:
>> Then we can enable/disable socket debugging without modifying user code.
>> That is more convenient for debugging.
>>
>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>> ---
>>  include/net/sock.h | 8 ++++++++
>>  net/core/filter.c  | 3 +++
>>  net/core/sock.c    | 8 --------
>>  3 files changed, 11 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/net/sock.h b/include/net/sock.h
>> index 2b229f7..8decee9 100644
>> --- a/include/net/sock.h
>> +++ b/include/net/sock.h
>> @@ -1935,6 +1935,14 @@ static inline void sock_confirm_neigh(struct sk_buff *skb, struct neighbour *n)
>>  	}
>>  }
>>  
>> +static inline void sock_valbool_flag(struct sock *sk, int bit, int valbool)
>> +{
>> +	if (valbool)
>> +		sock_set_flag(sk, bit);
>> +	else
>> +		sock_reset_flag(sk, bit);
>> +}
>> +
>>  bool sk_mc_loop(struct sock *sk);
>>  
>>  static inline bool sk_can_gso(const struct sock *sk)
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 3a49f68..ce5da57 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -4111,6 +4111,9 @@ static unsigned long bpf_xdp_copy(void *dst_buff, const void *src_buff,
>>  
>>  		/* Only some socketops are supported */
>>  		switch (optname) {
>> +		case SO_DEBUG:
>> +			sock_valbool_flag(sk, SOCK_DBG, val);
>> +			break;
> 
> I'm missing the point here.
> This flag has any effect only when SOCK_DEBUGGING is set.
> But it is off in distros.
> Since it's for custom debug kernel only why bother with
> setting the flag via bpf prog?

+1, this seems like an ancient debugging interface. At the last netconf
there was a proposal [0] for a tcp_stats(sk, TCP_MIB_...) API for MIB
counters, so that they could be traced via BPF on a per-socket basis, for
example. It might be worthwhile to work in that direction instead, and
potentially get rid of the SOCK_DEBUG() statements and convert them (where
appropriate) to such an interface. Thoughts?

  [0] page 14, http://vger.kernel.org/netconf2018_files/BrendanGregg_netconf2018.pdf

^ permalink raw reply

* Re: [PATCH] net: phylink: dsa: mv88e6xxx: Revise irq setup ordering
From: Andrew Lunn @ 2019-02-04 20:19 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev
In-Reply-To: <2d8c0eff-00cd-31c7-9906-89ff9d3c7dd4@bell.net>

> Can you be more specific regarding what you think is wrong with this hunk?

Hi David

The IRQ core would do this if it was needed.

How many other irq thread work functions can you point to which do
something similar?

	  Andrew

^ permalink raw reply

* Re: [PATCH bpf-next 3/3] bpf, doc: add RISC-V to filter.txt
From: Björn Töpel @ 2019-02-04 20:16 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: linux-riscv, ast, Netdev, Palmer Dabbelt, Christoph Hellwig
In-Reply-To: <d033e68a-b520-c25d-e951-761693c16fe7@iogearbox.net>

On Mon, 4 Feb 2019 at 21:09, Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 02/03/2019 12:51 PM, bjorn.topel@gmail.com wrote:
> > From: Björn Töpel <bjorn.topel@gmail.com>
> >
> > Update Documentation/networking/filter.txt to mention RISC-V.
> >
> > Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
>
> Nit: in Documentation/sysctl/net.txt under bpf_jit_enable there is also
> a concrete list of eBPF/cBPF JITs, would be good to add riscv64 there as
> well.
>

Thanks for pointing this out! I'll do a v2.


Björn

> >  Documentation/networking/filter.txt | 16 +++++++++-------
> >  1 file changed, 9 insertions(+), 7 deletions(-)
> >
> > diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
> > index 01603bc2eff1..b5e060edfc38 100644
> > --- a/Documentation/networking/filter.txt
> > +++ b/Documentation/networking/filter.txt
> > @@ -464,10 +464,11 @@ breakpoints: 0 1
> >  JIT compiler
> >  ------------
> >
> > -The Linux kernel has a built-in BPF JIT compiler for x86_64, SPARC, PowerPC,
> > -ARM, ARM64, MIPS and s390 and can be enabled through CONFIG_BPF_JIT. The JIT
> > -compiler is transparently invoked for each attached filter from user space
> > -or for internal kernel users if it has been previously enabled by root:
> > +The Linux kernel has a built-in BPF JIT compiler for x86_64, SPARC,
> > +PowerPC, ARM, ARM64, MIPS, RISC-V and s390 and can be enabled through
> > +CONFIG_BPF_JIT. The JIT compiler is transparently invoked for each
> > +attached filter from user space or for internal kernel users if it has
> > +been previously enabled by root:
> >
> >    echo 1 > /proc/sys/net/core/bpf_jit_enable
> >
> > @@ -603,9 +604,10 @@ got from bpf_prog_create(), and 'ctx' the given context (e.g.
> >  skb pointer). All constraints and restrictions from bpf_check_classic() apply
> >  before a conversion to the new layout is being done behind the scenes!
> >
> > -Currently, the classic BPF format is being used for JITing on most 32-bit
> > -architectures, whereas x86-64, aarch64, s390x, powerpc64, sparc64, arm32 perform
> > -JIT compilation from eBPF instruction set.
> > +Currently, the classic BPF format is being used for JITing on most
> > +32-bit architectures, whereas x86-64, aarch64, s390x, powerpc64,
> > +sparc64, arm32, riscv (RV64G) perform JIT compilation from eBPF
> > +instruction set.
> >
> >  Some core changes of the new internal format:
> >
> >
>

^ permalink raw reply

* Re: [PATCH bpf-next 3/3] bpf, doc: add RISC-V to filter.txt
From: Daniel Borkmann @ 2019-02-04 20:09 UTC (permalink / raw)
  To: bjorn.topel, linux-riscv, ast, netdev; +Cc: palmer, hch
In-Reply-To: <20190203115132.8766-4-bjorn.topel@gmail.com>

On 02/03/2019 12:51 PM, bjorn.topel@gmail.com wrote:
> From: Björn Töpel <bjorn.topel@gmail.com>
> 
> Update Documentation/networking/filter.txt to mention RISC-V.
> 
> Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>

Nit: in Documentation/sysctl/net.txt under bpf_jit_enable there is also
a concrete list of eBPF/cBPF JITs, would be good to add riscv64 there as
well.

>  Documentation/networking/filter.txt | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
> index 01603bc2eff1..b5e060edfc38 100644
> --- a/Documentation/networking/filter.txt
> +++ b/Documentation/networking/filter.txt
> @@ -464,10 +464,11 @@ breakpoints: 0 1
>  JIT compiler
>  ------------
>  
> -The Linux kernel has a built-in BPF JIT compiler for x86_64, SPARC, PowerPC,
> -ARM, ARM64, MIPS and s390 and can be enabled through CONFIG_BPF_JIT. The JIT
> -compiler is transparently invoked for each attached filter from user space
> -or for internal kernel users if it has been previously enabled by root:
> +The Linux kernel has a built-in BPF JIT compiler for x86_64, SPARC,
> +PowerPC, ARM, ARM64, MIPS, RISC-V and s390 and can be enabled through
> +CONFIG_BPF_JIT. The JIT compiler is transparently invoked for each
> +attached filter from user space or for internal kernel users if it has
> +been previously enabled by root:
>  
>    echo 1 > /proc/sys/net/core/bpf_jit_enable
>  
> @@ -603,9 +604,10 @@ got from bpf_prog_create(), and 'ctx' the given context (e.g.
>  skb pointer). All constraints and restrictions from bpf_check_classic() apply
>  before a conversion to the new layout is being done behind the scenes!
>  
> -Currently, the classic BPF format is being used for JITing on most 32-bit
> -architectures, whereas x86-64, aarch64, s390x, powerpc64, sparc64, arm32 perform
> -JIT compilation from eBPF instruction set.
> +Currently, the classic BPF format is being used for JITing on most
> +32-bit architectures, whereas x86-64, aarch64, s390x, powerpc64,
> +sparc64, arm32, riscv (RV64G) perform JIT compilation from eBPF
> +instruction set.
>  
>  Some core changes of the new internal format:
>  
> 


^ permalink raw reply

* Re: [ovs-dev] [PATCH net-next V2 1/1] openvswitch: Declare ovs key structures using macros
From: David Miller @ 2019-02-04 20:09 UTC (permalink / raw)
  To: gvrose8192; +Cc: elibr, pshelar, dev, netdev, simon.horman
In-Reply-To: <0bbab6ba-5892-e0c3-8290-6d844414ca2e@gmail.com>

From: Gregory Rose <gvrose8192@gmail.com>
Date: Mon, 4 Feb 2019 11:41:29 -0800

> 
> On 2/3/2019 1:12 AM, Eli Britstein wrote:
>> Declare ovs key structures using macros as a pre-step towards to
>> enable retrieving fields information, as a work done in proposed
>> commit in the OVS tree https://patchwork.ozlabs.org/patch/1023406/
>> ("odp-util: Do not rewrite fields with the same values as matched"),
>> with no functional change.
>>
>> Signed-off-by: Eli Britstein <elibr@mellanox.com>
>> Reviewed-by: Roi Dayan <roid@mellanox.com>
> 
> Obscuring the structures with these macros is awful.  I'm opposed but
> I see it has already been
> accepted upstream so I guess that's that.

I am personally in no way obligated to apply this patch to my tree
just because "upstream" did, and I absolutely have no plans to do so
at this point.

This patch is absolutely awful.

^ permalink raw reply

* Re: [ovs-dev] [PATCH net-next V2 1/1] openvswitch: Declare ovs key structures using macros
From: David Miller @ 2019-02-04 20:07 UTC (permalink / raw)
  To: yihung.wei; +Cc: elibr, pshelar, dev, netdev, simon.horman
In-Reply-To: <CAG1aQhJVzbYnYNb-mWH4s6+23jOnDz9UVYTPtBmNkhKAfowrQg@mail.gmail.com>

From: Yi-Hung Wei <yihung.wei@gmail.com>
Date: Mon, 4 Feb 2019 10:47:18 -0800

> For example, to see how 'struct ovs_key_ipv6' is defined, now we need
> to trace how OVS_KEY_IPV6_FIELDS is defined, and how OVS_KEY_FIELD_ARR
> and OVS_KEY_FIELD are defined.  I think it makes the header file
> more complicated.

I completely agree.

Unless this is totally unavoidable, I do not want to apply a patch
which makes reading and auditing the networking code more difficult.

^ permalink raw reply
