Netdev List

* Re: [PATCH v2] net: dsa: mv88e6xxx: Revise irq setup ordering
From: Andrew Lunn @ 2019-02-04 23:14 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev
In-Reply-To: <824d011b-3692-69c3-5e2c-58e950a80abf@bell.net>

On Mon, Feb 04, 2019 at 04:59:13PM -0500, John David Anglin wrote:
> This change fixes a race condition in the setup of hardware irqs and the
> code enabling PHY link detection in the mv88e6xxx driver.
> 
> This race was observed on the espressobin board where the GPIO interrupt
> controller only supports edge interrupts.  If the INTn output pin goes low
> before the GPIO interrupt is enabled, PHY link interrupts are not detected.
> 
> With this change, we
> 1) force INTn high by clearing all interrupt enables in global 1 control 1,
> 2) setup the hardware irq, and then
> 3) perform the remaining common setup.
> 
> This simplifies the setup and allows some unnecessary code to be removed.

Hi Dave

I took a closer look now. I don't actually see why the current code is
wrong.

mv88e6xxx_g1_irq_setup() calls mv88e6xxx_g1_irq_setup_common() and
then registers the interrupt handler.

mv88e6xxx_g1_irq_setup_common() does what you want, it masks all
interrupts in the hardware and clears any pending interrupts which can
be cleared.

The change you made is actually dangerous. As soon as you request the
interrupt, it is live, it can fire, and call
mv88e6xxx_g1_irq_thread_work(). That needs the irq domain. But the
change you made defers creating the domain until after the
interrupt is registered. So we can dereference a NULL pointer in the
interrupt handler.
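
The ordering hazard described above can be sketched in user space. This is
only a model under stated assumptions: struct sketch_chip, struct
sketch_domain and the sketch_* functions are hypothetical stand-ins, not the
mv88e6xxx driver's types or the kernel IRQ API, and the "interrupt" is
simulated by calling the handler directly inside the window.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver state; not the real kernel types. */
struct sketch_domain {
	int nirqs;
};

struct sketch_chip {
	struct sketch_domain *domain;	/* NULL until the domain is created */
	int handler_registered;
};

/*
 * Models the threaded handler. The real handler would use the domain
 * unconditionally; here we return -1 where it would have dereferenced
 * a NULL pointer, and 0 when the domain was available.
 */
static int sketch_irq_thread(const struct sketch_chip *chip)
{
	assert(chip != NULL);
	if (chip->domain == NULL)
		return -1;
	return chip->domain->nirqs > 0 ? 0 : -1;
}

/* Safe order: create the domain first, then make the handler live. */
static void sketch_setup_domain_first(struct sketch_chip *chip,
				      struct sketch_domain *domain)
{
	chip->domain = domain;
	chip->handler_registered = 1;
}

/*
 * Unsafe order: the handler is live before the domain exists. The call
 * to sketch_irq_thread() simulates an interrupt firing in that window.
 */
static int sketch_setup_handler_first(struct sketch_chip *chip,
				      struct sketch_domain *domain)
{
	int ret;

	chip->handler_registered = 1;
	ret = sketch_irq_thread(chip);	/* fires before the domain is set */
	chip->domain = domain;
	return ret;
}
```

With the domain created first, a handler invocation finds a valid pointer;
with the handler registered first, an interrupt arriving in the window sees
NULL.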

	  Andrew

* Re: [PATCH 1/3 net-next] net: phy: aquantia: improve setting speed and duplex in aqr_read_status
From: Heiner Kallweit @ 2019-02-04 23:06 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, David Miller, Nikita Yushchenko,
	netdev@vger.kernel.org
In-Reply-To: <20190204222348.GD3397@lunn.ch>

On 04.02.2019 23:23, Andrew Lunn wrote:
>> I'd like to use standard registers wherever possible. This patch is
>> meant as a quick win to improve what we do already in aqr_read_status.
>> Once we have a generic c45 read_status function we should switch to it.
> 
> Hi Heiner
> 
> I don't see much point in adding code which we know we are soon going
> to replace. Just replace it.
> 
OK, let me have a closer look at the other patches you sent me.
To test them I need to get my DTU running first. And I need to check
what happens if certain standard registers don't report what they
should and how to deal with this. E.g. the AQCS109, according to the
datasheet, reports in the speed ability register that it is 10G-capable,
which it is not.
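
One possible way to deal with a register that over-reports is a per-chip
mask applied after the read. The sketch below is only an illustration: the
quirk structure, the function name and the choice of bit 0 as the "10G
capable" flag are assumptions made for the example, not code from the
aquantia driver or phylib.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout for this sketch: bit 0 of the word means "10G capable". */
#define SKETCH_SPEED_ABILITY_10G 0x0001u

/* Hypothetical per-chip quirk: ability bits the chip reports but lacks. */
struct sketch_phy_quirk {
	uint16_t speed_ability_clear;
};

/* Return the reported speed-ability word with the bogus bits removed. */
static uint16_t sketch_fixup_speed_ability(uint16_t reported,
					   const struct sketch_phy_quirk *q)
{
	assert(q != NULL);
	return reported & (uint16_t)~q->speed_ability_clear;
}
```

A chip without quirks would use an all-zero mask, so the reported value
passes through unchanged.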

>> However I assume that information like interface mode we still have
>> to read from vendor registers.
> 
> For the Aquantia PHY, yes. It appears the Marvell PHY does not have
> any registers which indicate this, so it uses heuristics based on the
> link speed.
> 
>     Andrew
> 
Heiner

* Re: [PATCH] net: phylink: dsa: mv88e6xxx: Revise irq setup ordering
From: Andrew Lunn @ 2019-02-04 22:47 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev
In-Reply-To: <c536ef45-0d5a-5d7e-d9f2-4ff030a34eb9@bell.net>

On Mon, Feb 04, 2019 at 04:38:53PM -0500, John David Anglin wrote:
> On 2019-02-04 3:19 p.m., Andrew Lunn wrote:
> > The IRQ core would do this if it was needed.
> >
> > How many other irq thread work functions can you point to which do
> > something similar?
> This is the comment for handle_edge_irq:
> 
> /**
>  *    handle_edge_irq - edge type IRQ handler
>  *    @desc:    the interrupt description structure for this irq
>  *
>  *    Interrupt occures on the falling and/or rising edge of a hardware
>  *    signal. The occurrence is latched into the irq controller hardware
>  *    and must be acked in order to be reenabled. After the ack another
>  *    interrupt can happen on the same source even before the first one
>  *    is handled by the associated event handler. If this happens it
>  *    might be necessary to disable (mask) the interrupt depending on the
>  *    controller hardware. This requires to reenable the interrupt inside
>  *    of the loop which handles the interrupts which have arrived while
>  *    the handler was running. If all pending interrupts are handled, the
>  *    loop is left.
>  */
> 
> As can be seen, the above comment suggests that it may be necessary to
> disable (mask) the interrupt, as I proposed.

Hi Dave

This comment describes what handle_edge_irq() itself does. Read
the code. It does not say anything about what the handling thread
function should do.

	 Andrew

* Re: [PATCH bpf-next 1/6] bpf: Add a bpf_sock pointer to __sk_buff and a bpf_sk_fullsock helper
From: Daniel Borkmann @ 2019-02-04 22:33 UTC (permalink / raw)
  To: Martin KaFai Lau, netdev; +Cc: Alexei Starovoitov, kernel-team, Lawrence Brakmo
In-Reply-To: <20190201070356.4148323-1-kafai@fb.com>

Hi Martin,

On 02/01/2019 08:03 AM, Martin KaFai Lau wrote:
> In the kernel, it is common to check "skb->sk && sk_fullsock(skb->sk)"
> before accessing the fields in sock.  For example, in __netdev_pick_tx:
> 
> static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb,
> 			    struct net_device *sb_dev)
> {
> 	/* ... */
> 
> 	struct sock *sk = skb->sk;
> 
> 		if (queue_index != new_index && sk &&
> 		    sk_fullsock(sk) &&
> 		    rcu_access_pointer(sk->sk_dst_cache))
> 			sk_tx_queue_set(sk, new_index);
> 
> 	/* ... */
> 
> 	return queue_index;
> }
> 
> This patch adds a "struct bpf_sock *sk" pointer to the "struct __sk_buff"
> where a few of the convert_ctx_access() in filter.c have already been
> accessing the skb->sk sock_common's fields,
> e.g. sock_ops_convert_ctx_access().
> 
> "__sk_buff->sk" is a PTR_TO_SOCK_COMMON_OR_NULL in the verifier.
> Some of the fields in "bpf_sock" will not be directly
> accessible through the "__sk_buff->sk" pointer.  It is limited
> by the new "bpf_sock_common_is_valid_access()".
> e.g. The existing "type", "protocol", "mark" and "priority" in bpf_sock
>      are not allowed.
> 
> The newly added "struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)"
> can be used to get a sk with all accessible fields in "bpf_sock".
> This helper is added to both cg_skb and sched_(cls|act).
> 
> int cg_skb_foo(struct __sk_buff *skb) {
> 	struct bpf_sock *sk;
> 	__u32 family;
> 
> 	sk = skb->sk;
> 	if (!sk)
> 		return 1;
> 
> 	sk = bpf_sk_fullsock(sk);
> 	if (!sk)
> 		return 1;
> 
> 	if (sk->family != AF_INET6 || sk->protocol != IPPROTO_TCP)
> 		return 1;
> 
> 	/* some_traffic_shaping(); */
> 
> 	return 1;
> }
> 
> (1) The sk is read only
> 
> (2) There is no new "struct bpf_sock_common" introduced.
> 
> (3) Future kernel sock's members could be added to bpf_sock only
>     instead of repeatedly adding at multiple places like currently
>     in bpf_sock_ops_md, bpf_sock_addr_md, sk_reuseport_md...etc.
> 
> (4) After "sk = skb->sk", the reg holding sk is in type
>     PTR_TO_SOCK_COMMON_OR_NULL.
> 
> (5) After bpf_sk_fullsock(), the return type will be in type
>     PTR_TO_SOCKET_OR_NULL which is the same as the return type of
>     bpf_sk_lookup_xxx().
> 
>     However, bpf_sk_fullsock() does not take refcnt.  The
>     acquire_reference_state() is only depending on the return type now.
>     To avoid it, a new is_acquire_function() is checked before calling
>     acquire_reference_state().

Bit unfortunate that a helper like bpf_sk_fullsock() would be needed, after
all this is more of an implementation detail which we would expose here to
the developer.

Is there a specific reason why fetching skb->sk couldn't already be of the
type PTR_TO_SOCKET_OR_NULL such that the bpf_sk_fullsock() step wouldn't be
needed and most logic we have today could already be reused (modulo refcnt
avoidance)?

In particular, do you need the skb->sk without the full-sk part somewhere
(e.g. in tw socks)? Why not do something like sk_to_full_sk() inside the
helper or even better as BPF ctx rewrite upon skb->sk to fetch the full sk
parent where you could also access remaining bpf_sock fields?

This could then also be plugged into bpf_tcp_sock() given this needs to be
full sk anyway.

Thanks,
Daniel

* Re: [PATCH v1] net: dsa: qca8k: implement DT-based ports <-> phy translation
From: Andrew Lunn @ 2019-02-04 22:26 UTC (permalink / raw)
  To: Christian Lamparter; +Cc: netdev, Florian Fainelli, Vivien Didelot
In-Reply-To: <20190204213555.26054-1-chunkeey@gmail.com>

On Mon, Feb 04, 2019 at 10:35:55PM +0100, Christian Lamparter wrote:
> The QCA8337 enumerates 5 PHYs on the MDC/MDIO access: PHY0-PHY4.
> Based on the System Block Diagram in Section 1.2 of the
> QCA8337's datasheet, these PHYs are internally connected
> to MACs of PORT 1 - PORT 5.

Hi Christian

Is the off-by-one the problem here?
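
If the mismatch is purely an off-by-one, the mapping from the quoted
description (PHY0-PHY4 wired to PORT 1 - PORT 5) could be sketched as a
fixed offset. The names below are hypothetical and this is not code from
qca8k.c; ports outside 1..5 are treated as having no internal PHY in this
sketch.

```c
#include <assert.h>

/* PHY0-PHY4, per the quoted patch description. */
#define SKETCH_QCA8K_NUM_PHYS 5

/*
 * Hypothetical helper: map a switch port number (1..5) to the MDIO
 * address of its internal PHY (0..4). Returns -1 for any other port.
 */
static int sketch_port_to_phy_addr(int port)
{
	if (port < 1 || port > SKETCH_QCA8K_NUM_PHYS)
		return -1;
	return port - 1;
}

/* Small self-check showing the expected mapping at the boundaries. */
static void sketch_port_map_selfcheck(void)
{
	assert(sketch_port_to_phy_addr(1) == 0);
	assert(sketch_port_to_phy_addr(5) == 4);
	assert(sketch_port_to_phy_addr(0) == -1);
}
```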

Thanks
	Andrew

* Re: [PATCH 1/3 net-next] net: phy: aquantia: improve setting speed and duplex in aqr_read_status
From: Andrew Lunn @ 2019-02-04 22:23 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Florian Fainelli, David Miller, Nikita Yushchenko,
	netdev@vger.kernel.org
In-Reply-To: <d6ceeb7c-7502-7332-ce6e-b95b4d9b2be1@gmail.com>

> I'd like to use standard registers wherever possible. This patch is
> meant as a quick win to improve what we do already in aqr_read_status.
> Once we have a generic c45 read_status function we should switch to it.

Hi Heiner

I don't see much point in adding code which we know we are soon going
to replace. Just replace it.

> However I assume that information like interface mode we still have
> to read from vendor registers.

For the Aquantia PHY, yes. It appears the Marvell PHY does not have
any registers which indicate this, so it uses heuristics based on the
link speed.

    Andrew

* Re: [PATCH bpf-next] selftests/bpf: use localhost in tcp_{server,client}.py
From: Daniel Borkmann @ 2019-02-04 22:16 UTC (permalink / raw)
  To: Stanislav Fomichev, netdev; +Cc: davem, ast
In-Reply-To: <20190204184319.177504-1-sdf@google.com>

On 02/04/2019 07:43 PM, Stanislav Fomichev wrote:
> Bind and connect to localhost. There is no reason for this test to
> use non-localhost interface. This lets us run this test in a network
> namespace.
> 
> Signed-off-by: Stanislav Fomichev <sdf@google.com>

Applied, thanks!

* Re: [PATCH v1] net: dsa: qca8k: implement DT-based ports <-> phy translation
From: Florian Fainelli @ 2019-02-04 22:11 UTC (permalink / raw)
  To: Christian Lamparter, netdev; +Cc: Vivien Didelot, Andrew Lunn
In-Reply-To: <20190204213555.26054-1-chunkeey@gmail.com>

On 2/4/19 1:35 PM, Christian Lamparter wrote:
> The QCA8337 enumerates 5 PHYs on the MDC/MDIO access: PHY0-PHY4.
> Based on the System Block Diagram in Section 1.2 of the
> QCA8337's datasheet, these PHYs are internally connected
> to MACs of PORT 1 - PORT 5. However, neither qca8k's slave
> mdio access functions qca8k_phy_read()/qca8k_phy_write()
> nor the dsa framework is set up for that.
> 
> This version of the patch uses the existing phy-handle
> properties of each specified DSA Port in the DT to map
> each PORT/MAC to its exposed PHY on the MDIO bus. This
> is supported by the current binding document qca8k.txt
> as well.

I don't think you should have to do any of this translation, because you
can do a couple of things with DSA/Device Tree:

- you can omit the phy-handle property entirely, in which case the
core DSA layer assumes that the PHY is part of the switch's internal
MDIO bus, which is implicitly created by dsa_slave_mii_bus_create()

- you can specify a phy-handle property and then the PHY device tree
node can be placed pretty much anywhere in Device Tree, including on a
separate MDIO bus Device Tree node which is "external" to the switch

In either case, the PHY device's MDIO bus parent and its address are
taken care of by drivers/of/of_mdio.c. You can look at mv88e6xxx for how
it deals with its internal vs. external MDIO bus controller; that
driver is used on a wide variety of configurations.

> 
> Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
> ---
>  drivers/net/dsa/qca8k.c | 35 +++++++++++++++++++++++++++++++++--
>  1 file changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
> index a4b6cda38016..6558b7ed855d 100644
> --- a/drivers/net/dsa/qca8k.c
> +++ b/drivers/net/dsa/qca8k.c
> @@ -11,6 +11,7 @@
>  #include <linux/netdevice.h>
>  #include <net/dsa.h>
>  #include <linux/of_net.h>
> +#include <linux/of_mdio.h>
>  #include <linux/of_platform.h>
>  #include <linux/if_bridge.h>
>  #include <linux/mdio.h>
> @@ -612,20 +613,50 @@ qca8k_adjust_link(struct dsa_switch *ds, int port, struct phy_device *phy)
>  	qca8k_port_set_status(priv, port, 1);
>  }
>  
> +static int
> +qca8k_to_real_phy(struct dsa_switch *ds, int phy)
> +{
> +	struct device_node *phy_dn, *port_dn;
> +	int id;
> +
> +	if (phy >= ds->num_ports)
> +		return -EINVAL;
> +
> +	port_dn = ds->ports[phy].dn;
> +	if (!port_dn)
> +		return -EINVAL;
> +
> +	phy_dn = of_parse_phandle(port_dn, "phy-handle", 0);
> +	if (!phy_dn)
> +		return phy;
> +
> +	id = of_mdio_parse_addr(ds->dev, phy_dn);
> +	of_node_put(phy_dn);
> +	return id;
> +}
> +
>  static int
>  qca8k_phy_read(struct dsa_switch *ds, int phy, int regnum)
>  {
>  	struct qca8k_priv *priv = (struct qca8k_priv *)ds->priv;
> +	int realphy = qca8k_to_real_phy(ds, phy);
> +
> +	if (realphy < 0)
> +		return realphy;
>  
> -	return mdiobus_read(priv->bus, phy, regnum);
> +	return mdiobus_read(priv->bus, realphy, regnum);
>  }
>  
>  static int
>  qca8k_phy_write(struct dsa_switch *ds, int phy, int regnum, u16 val)
>  {
>  	struct qca8k_priv *priv = (struct qca8k_priv *)ds->priv;
> +	int realphy = qca8k_to_real_phy(ds, phy);
> +
> +	if (realphy < 0)
> +		return realphy;
>  
> -	return mdiobus_write(priv->bus, phy, regnum, val);
> +	return mdiobus_write(priv->bus, realphy, regnum, val);
>  }
>  
>  static void
> 


-- 
Florian

* [PATCH v2] net: dsa: mv88e6xxx: Revise irq setup ordering
From: John David Anglin @ 2019-02-04 21:59 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev
In-Reply-To: <53b49df8-53ed-704f-9197-230b18d83090@bell.net>

This change fixes a race condition in the setup of hardware irqs and the
code enabling PHY link detection in the mv88e6xxx driver.

This race was observed on the espressobin board where the GPIO interrupt
controller only supports edge interrupts.  If the INTn output pin goes low
before the GPIO interrupt is enabled, PHY link interrupts are not detected.

With this change, we
1) force INTn high by clearing all interrupt enables in global 1 control 1,
2) setup the hardware irq, and then
3) perform the remaining common setup.

This simplifies the setup and allows some unnecessary code to be removed.

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index b2a0e59b6252..9f5c416a3223 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -374,10 +374,29 @@ static void mv88e6xxx_g1_irq_free(struct mv88e6xxx_chip *chip)
     mutex_unlock(&chip->reg_lock);
 }
 
+static int mv88e6xxx_g1_irq_setup_masks(struct mv88e6xxx_chip *chip)
+{
+    int err;
+    u16 reg;
+
+    /* The INTn output must be high when hardware interrupts are setup.
+       The EEPROM done interrupt enable is set on reset, so clear all
+       interrupt enable bits to ensure INTn is not driven low */
+    err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &reg);
+    if (err)
+        return err;
+    reg &= ~GENMASK(chip->info->g1_irqs, 0);
+    err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, reg);
+    if (err)
+        return err;
+
+    /* Reading the interrupt status clears (most of) them */
+    return mv88e6xxx_g1_read(chip, MV88E6XXX_G1_STS, &reg);
+}
+
 static int mv88e6xxx_g1_irq_setup_common(struct mv88e6xxx_chip *chip)
 {
-    int err, irq, virq;
-    u16 reg, mask;
+    int irq;
 
     chip->g1_irq.nirqs = chip->info->g1_irqs;
     chip->g1_irq.domain = irq_domain_add_simple(
@@ -392,43 +411,14 @@ static int mv88e6xxx_g1_irq_setup_common(struct mv88e6xxx_chip *chip)
     chip->g1_irq.chip = mv88e6xxx_g1_irq_chip;
     chip->g1_irq.masked = ~0;
 
-    err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &mask);
-    if (err)
-        goto out_mapping;
-
-    mask &= ~GENMASK(chip->g1_irq.nirqs, 0);
-
-    err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, mask);
-    if (err)
-        goto out_disable;
-
-    /* Reading the interrupt status clears (most of) them */
-    err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_STS, &reg);
-    if (err)
-        goto out_disable;
-
     return 0;
-
-out_disable:
-    mask &= ~GENMASK(chip->g1_irq.nirqs, 0);
-    mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, mask);
-
-out_mapping:
-    for (irq = 0; irq < 16; irq++) {
-        virq = irq_find_mapping(chip->g1_irq.domain, irq);
-        irq_dispose_mapping(virq);
-    }
-
-    irq_domain_remove(chip->g1_irq.domain);
-
-    return err;
 }
 
 static int mv88e6xxx_g1_irq_setup(struct mv88e6xxx_chip *chip)
 {
     int err;
 
-    err = mv88e6xxx_g1_irq_setup_common(chip);
+    err = mv88e6xxx_g1_irq_setup_masks(chip);
     if (err)
         return err;
 
@@ -437,9 +427,9 @@ static int mv88e6xxx_g1_irq_setup(struct mv88e6xxx_chip *chip)
                    IRQF_ONESHOT | IRQF_SHARED,
                    dev_name(chip->dev), chip);
     if (err)
-        mv88e6xxx_g1_irq_free_common(chip);
+        return err;
 
-    return err;
+    return mv88e6xxx_g1_irq_setup_common(chip);
 }
 
 static void mv88e6xxx_irq_poll(struct kthread_work *work)
@@ -457,6 +447,10 @@ static int mv88e6xxx_irq_poll_setup(struct mv88e6xxx_chip *chip)
 {
     int err;
 
+    err = mv88e6xxx_g1_irq_setup_masks(chip);
+    if (err)
+        return err;
+
     err = mv88e6xxx_g1_irq_setup_common(chip);
     if (err)
         return err;

Signed-off-by: John David Anglin <dave.anglin@bell.net>

-- 
John David Anglin  dave.anglin@bell.net


* [PATCH v2] net: emac: remove IBM_EMAC_RX_SKB_HEADROOM
From: Christian Lamparter @ 2019-02-04 21:58 UTC (permalink / raw)
  To: netdev; +Cc: David S . Miller

The EMAC driver had a custom IBM_EMAC_RX_SKB_HEADROOM
Kconfig option that reserved additional skb headroom for RX.
This patch removes the option and migrates the code
to use napi_alloc_skb() and netdev_alloc_skb_ip_align()
in its place.

Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
---
 drivers/net/ethernet/ibm/emac/Kconfig | 12 -----
 drivers/net/ethernet/ibm/emac/core.c  | 64 ++++++++++++++++++---------
 drivers/net/ethernet/ibm/emac/core.h  | 10 ++---
 3 files changed, 47 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ethernet/ibm/emac/Kconfig b/drivers/net/ethernet/ibm/emac/Kconfig
index 90d49191beb3..eacf7e141fdc 100644
--- a/drivers/net/ethernet/ibm/emac/Kconfig
+++ b/drivers/net/ethernet/ibm/emac/Kconfig
@@ -28,18 +28,6 @@ config IBM_EMAC_RX_COPY_THRESHOLD
 	depends on IBM_EMAC
 	default "256"
 
-config IBM_EMAC_RX_SKB_HEADROOM
-	int "Additional RX skb headroom (bytes)"
-	depends on IBM_EMAC
-	default "0"
-	help
-	  Additional receive skb headroom. Note, that driver
-	  will always reserve at least 2 bytes to make IP header
-	  aligned, so usually there is no need to add any additional
-	  headroom.
-
-	  If unsure, set to 0.
-
 config IBM_EMAC_DEBUG
 	bool "Debugging"
 	depends on IBM_EMAC
diff --git a/drivers/net/ethernet/ibm/emac/core.c b/drivers/net/ethernet/ibm/emac/core.c
index 209255495bc9..5fc5fa37d305 100644
--- a/drivers/net/ethernet/ibm/emac/core.c
+++ b/drivers/net/ethernet/ibm/emac/core.c
@@ -1071,7 +1071,9 @@ static int emac_resize_rx_ring(struct emac_instance *dev, int new_mtu)
 
 	/* Second pass, allocate new skbs */
 	for (i = 0; i < NUM_RX_BUFF; ++i) {
-		struct sk_buff *skb = alloc_skb(rx_skb_size, GFP_ATOMIC);
+		struct sk_buff *skb;
+
+		skb = netdev_alloc_skb_ip_align(dev->ndev, rx_skb_size);
 		if (!skb) {
 			ret = -ENOMEM;
 			goto oom;
@@ -1080,10 +1082,10 @@ static int emac_resize_rx_ring(struct emac_instance *dev, int new_mtu)
 		BUG_ON(!dev->rx_skb[i]);
 		dev_kfree_skb(dev->rx_skb[i]);
 
-		skb_reserve(skb, EMAC_RX_SKB_HEADROOM + 2);
 		dev->rx_desc[i].data_ptr =
-		    dma_map_single(&dev->ofdev->dev, skb->data - 2, rx_sync_size,
-				   DMA_FROM_DEVICE) + 2;
+		    dma_map_single(&dev->ofdev->dev, skb->data - NET_IP_ALIGN,
+				   rx_sync_size, DMA_FROM_DEVICE)
+				   + NET_IP_ALIGN;
 		dev->rx_skb[i] = skb;
 	}
  skip:
@@ -1174,20 +1176,18 @@ static void emac_clean_rx_ring(struct emac_instance *dev)
 	}
 }
 
-static inline int emac_alloc_rx_skb(struct emac_instance *dev, int slot,
-				    gfp_t flags)
+static inline int
+__emac_prepare_rx_skb(struct sk_buff *skb, struct emac_instance *dev, int slot)
 {
-	struct sk_buff *skb = alloc_skb(dev->rx_skb_size, flags);
 	if (unlikely(!skb))
 		return -ENOMEM;
 
 	dev->rx_skb[slot] = skb;
 	dev->rx_desc[slot].data_len = 0;
 
-	skb_reserve(skb, EMAC_RX_SKB_HEADROOM + 2);
 	dev->rx_desc[slot].data_ptr =
-	    dma_map_single(&dev->ofdev->dev, skb->data - 2, dev->rx_sync_size,
-			   DMA_FROM_DEVICE) + 2;
+	    dma_map_single(&dev->ofdev->dev, skb->data - NET_IP_ALIGN,
+			   dev->rx_sync_size, DMA_FROM_DEVICE) + NET_IP_ALIGN;
 	wmb();
 	dev->rx_desc[slot].ctrl = MAL_RX_CTRL_EMPTY |
 	    (slot == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0);
@@ -1195,6 +1195,27 @@ static inline int emac_alloc_rx_skb(struct emac_instance *dev, int slot,
 	return 0;
 }
 
+static inline int
+emac_alloc_rx_skb(struct emac_instance *dev, int slot)
+{
+	struct sk_buff *skb;
+
+	skb = __netdev_alloc_skb_ip_align(dev->ndev, dev->rx_skb_size,
+					  GFP_KERNEL);
+
+	return __emac_prepare_rx_skb(skb, dev, slot);
+}
+
+static inline int
+emac_alloc_rx_skb_napi(struct emac_instance *dev, int slot)
+{
+	struct sk_buff *skb;
+
+	skb = napi_alloc_skb(&dev->mal->napi, dev->rx_skb_size);
+
+	return __emac_prepare_rx_skb(skb, dev, slot);
+}
+
 static void emac_print_link_status(struct emac_instance *dev)
 {
 	if (netif_carrier_ok(dev->ndev))
@@ -1225,7 +1246,7 @@ static int emac_open(struct net_device *ndev)
 
 	/* Allocate RX ring */
 	for (i = 0; i < NUM_RX_BUFF; ++i)
-		if (emac_alloc_rx_skb(dev, i, GFP_KERNEL)) {
+		if (emac_alloc_rx_skb(dev, i)) {
 			printk(KERN_ERR "%s: failed to allocate RX ring\n",
 			       ndev->name);
 			goto oom;
@@ -1660,8 +1681,9 @@ static inline void emac_recycle_rx_skb(struct emac_instance *dev, int slot,
 	DBG2(dev, "recycle %d %d" NL, slot, len);
 
 	if (len)
-		dma_map_single(&dev->ofdev->dev, skb->data - 2,
-			       EMAC_DMA_ALIGN(len + 2), DMA_FROM_DEVICE);
+		dma_map_single(&dev->ofdev->dev, skb->data - NET_IP_ALIGN,
+			       SKB_DATA_ALIGN(len + NET_IP_ALIGN),
+			       DMA_FROM_DEVICE);
 
 	dev->rx_desc[slot].data_len = 0;
 	wmb();
@@ -1713,7 +1735,7 @@ static inline int emac_rx_sg_append(struct emac_instance *dev, int slot)
 		int len = dev->rx_desc[slot].data_len;
 		int tot_len = dev->rx_sg_skb->len + len;
 
-		if (unlikely(tot_len + 2 > dev->rx_skb_size)) {
+		if (unlikely(tot_len + NET_IP_ALIGN > dev->rx_skb_size)) {
 			++dev->estats.rx_dropped_mtu;
 			dev_kfree_skb(dev->rx_sg_skb);
 			dev->rx_sg_skb = NULL;
@@ -1769,16 +1791,18 @@ static int emac_poll_rx(void *param, int budget)
 		}
 
 		if (len && len < EMAC_RX_COPY_THRESH) {
-			struct sk_buff *copy_skb =
-			    alloc_skb(len + EMAC_RX_SKB_HEADROOM + 2, GFP_ATOMIC);
+			struct sk_buff *copy_skb;
+
+			copy_skb = napi_alloc_skb(&dev->mal->napi, len);
 			if (unlikely(!copy_skb))
 				goto oom;
 
-			skb_reserve(copy_skb, EMAC_RX_SKB_HEADROOM + 2);
-			memcpy(copy_skb->data - 2, skb->data - 2, len + 2);
+			memcpy(copy_skb->data - NET_IP_ALIGN,
+			       skb->data - NET_IP_ALIGN,
+			       len + NET_IP_ALIGN);
 			emac_recycle_rx_skb(dev, slot, len);
 			skb = copy_skb;
-		} else if (unlikely(emac_alloc_rx_skb(dev, slot, GFP_ATOMIC)))
+		} else if (unlikely(emac_alloc_rx_skb_napi(dev, slot)))
 			goto oom;
 
 		skb_put(skb, len);
@@ -1799,7 +1823,7 @@ static int emac_poll_rx(void *param, int budget)
 	sg:
 		if (ctrl & MAL_RX_CTRL_FIRST) {
 			BUG_ON(dev->rx_sg_skb);
-			if (unlikely(emac_alloc_rx_skb(dev, slot, GFP_ATOMIC))) {
+			if (unlikely(emac_alloc_rx_skb_napi(dev, slot))) {
 				DBG(dev, "rx OOM %d" NL, slot);
 				++dev->estats.rx_dropped_oom;
 				emac_recycle_rx_skb(dev, slot, 0);
diff --git a/drivers/net/ethernet/ibm/emac/core.h b/drivers/net/ethernet/ibm/emac/core.h
index 84caa4a3fc52..187689cd8212 100644
--- a/drivers/net/ethernet/ibm/emac/core.h
+++ b/drivers/net/ethernet/ibm/emac/core.h
@@ -68,22 +68,18 @@ static inline int emac_rx_size(int mtu)
 		return mal_rx_size(ETH_DATA_LEN + EMAC_MTU_OVERHEAD);
 }
 
-#define EMAC_DMA_ALIGN(x)		ALIGN((x), dma_get_cache_alignment())
-
-#define EMAC_RX_SKB_HEADROOM		\
-	EMAC_DMA_ALIGN(CONFIG_IBM_EMAC_RX_SKB_HEADROOM)
-
 /* Size of RX skb for the given MTU */
 static inline int emac_rx_skb_size(int mtu)
 {
 	int size = max(mtu + EMAC_MTU_OVERHEAD, emac_rx_size(mtu));
-	return EMAC_DMA_ALIGN(size + 2) + EMAC_RX_SKB_HEADROOM;
+
+	return SKB_DATA_ALIGN(size + NET_IP_ALIGN) + NET_SKB_PAD;
 }
 
 /* RX DMA sync size */
 static inline int emac_rx_sync_size(int mtu)
 {
-	return EMAC_DMA_ALIGN(emac_rx_size(mtu) + 2);
+	return SKB_DATA_ALIGN(emac_rx_size(mtu) + NET_IP_ALIGN);
 }
 
 /* Driver statistcs is split into two parts to make it more cache friendly:
-- 
2.20.1


* Re: [PATCH rdma-next 00/12] Add SRQ and XRC support for ODP MRs
From: Jason Gunthorpe @ 2019-02-04 21:53 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Moni Shoua,
	Saeed Mahameed, linux-netdev
In-Reply-To: <20190203105430.GF3634@mtr-leonro.mtl.com>

On Sun, Feb 03, 2019 at 12:54:30PM +0200, Leon Romanovsky wrote:
> On Thu, Jan 31, 2019 at 04:27:39PM -0700, Jason Gunthorpe wrote:
> > On Tue, Jan 22, 2019 at 08:48:39AM +0200, Leon Romanovsky wrote:
> > > From: Leon Romanovsky <leonro@mellanox.com>
> > >
> > > Hi,
> > >
> > > This series extends ODP to work with SRQ and XRC. Being both per-operation
> > > (e.g. RDMA write, RDMA read and atomic) and per-transport (e.g. RC, UD and XRC),
> > > we extend IB/core and mlx5 driver to provide needed information to user space.
> > >
> > > Thanks
> > >
> > > Moni Shoua (12):
> > >   IB/mlx5: Fix locking SRQ object in ODP event
> > >   IB/core: Allocate bit for SRQ ODP support
> > >   IB/uverbs: Expose XRC ODP device capabilities
> > >   IB/mlx5: Remove useless check in ODP handler
> > >   IB/mlx5: Clean mlx5_ib_mr_responder_pfault_handler() signature
> > >   IB/mlx5: Add XRC initiator ODP support
> > >   IB/mlx5: Let read user wqe also from SRQ buffer
> > >   IB/mlx5: Add ODP SRQ support
> > >   IB/mlx5: Advertise SRQ ODP support for supported transports
> >
> > I applied these patches to for-next
> >
> > >   net/mlx5: Add XRC transport to ODP device capabilities layout
> > >   IB/mlx5: Advertise XRC ODP support
> > >   net/mlx5: Set ODP SRQ support in firmware
> >
> > This might need some re-organizing - the last patch could be split
> > (possibly merge with the first) so the header changes can go to the
> > shared branch, but the handle_hca_cap_odp() stuff must only be applied
> > to the rdma tree.
> >
> > I'm fine either way, if you don't want to split it send a commit ID
> > for the first patch on mlx5-next.
> 
> I applied the following two patches:
> 
> 46861e3e88be net/mlx5: Set ODP SRQ support in firmware
> dda7a817f287 net/mlx5: Add XRC transport to ODP device capabilities layout

Okay, done..

Thanks,
Jason

* Re: [PATCH 1/3 net-next] net: phy: aquantia: improve setting speed and duplex in aqr_read_status
From: Heiner Kallweit @ 2019-02-04 21:45 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, David Miller, Nikita Yushchenko,
	netdev@vger.kernel.org
In-Reply-To: <20190204212832.GB3397@lunn.ch>

On 04.02.2019 22:28, Andrew Lunn wrote:
> On Mon, Feb 04, 2019 at 10:03:21PM +0100, Heiner Kallweit wrote:
>> Add support for speeds 10Mbps, 5Gbps, and 10Gbps. In addition don't
>> hardcode duplex but read it from the chip.
> 
> Hi Heiner
> 
> The marvell10g does this differently. It gets the local and link
> partner advertised link modes and from that works out what the PHY is
> doing. If auto-neg is not being used, it then reads the link speed
> from the PMA.
> 
Right, it's the same mechanism we use in genphy_read_status() for
clause 22.

> The question is, should the Aquantia PHY do the same, or should it
> look at vendor registers? Apart from getting the 1G advertisement, all
> the Marvell code uses generic registers. So we should be able to move
> most of it into phy-c45 and reuse it. That is what i would prefer.
> 
I'd like to use standard registers wherever possible. This patch is
meant as a quick win to improve what we do already in aqr_read_status.
Once we have a generic c45 read_status function we should switch to it.
However I assume that information like interface mode we still have
to read from vendor registers.

>      Andrew
> 
Heiner

* Re: [PATCH] net: phylink: dsa: mv88e6xxx: Revise irq setup ordering
From: John David Anglin @ 2019-02-04 21:38 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev
In-Reply-To: <20190204201905.GA2488@lunn.ch>

On 2019-02-04 3:19 p.m., Andrew Lunn wrote:
> The IRQ core would do this if it was needed.
>
> How many other irq thread work functions can you point to which do
> something similar?
This is the comment for handle_edge_irq:

/**
 *    handle_edge_irq - edge type IRQ handler
 *    @desc:    the interrupt description structure for this irq
 *
 *    Interrupt occures on the falling and/or rising edge of a hardware
 *    signal. The occurrence is latched into the irq controller hardware
 *    and must be acked in order to be reenabled. After the ack another
 *    interrupt can happen on the same source even before the first one
 *    is handled by the associated event handler. If this happens it
 *    might be necessary to disable (mask) the interrupt depending on the
 *    controller hardware. This requires to reenable the interrupt inside
 *    of the loop which handles the interrupts which have arrived while
 *    the handler was running. If all pending interrupts are handled, the
 *    loop is left.
 */

As can be seen, the above comment suggests that it may be necessary to
disable (mask) the interrupt, as I proposed.

I see no evidence from the Marvell functional specifications for the
88E6341 that it sequences interrupts from the various sources although
it might be that device interrupts are sequenced so INTn rises and
falls.  I haven't seen any ports fail to link without the hunk on
espressobin but it is hard to stress test the code.

Disabling and re-enabling interrupts in the global control register does
not affect their status.  Thus, at worst, the hunk adds a bit of
unnecessary code.  It could be skipped if we knew we were using level
interrupts.
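
The effect of masking and unmasking on an edge-only controller can be
sketched with a small model. It rests on an assumption made for the
example: INTn is active low and is driven low whenever any pending status
bit is also enabled. The struct and function names are hypothetical and do
not come from the mv88e6xxx driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the switch's interrupt status and enable words. */
struct sketch_switch {
	uint16_t status;	/* pending sources; untouched by masking */
	uint16_t enable;	/* interrupt enable bits */
};

/* Assumed INTn behaviour: 0 (low) if an enabled source is pending. */
static int sketch_intn_level(const struct sketch_switch *sw)
{
	assert(sw != NULL);
	return (sw->status & sw->enable) ? 0 : 1;
}

/*
 * Clear all enables, then restore them. Returns 1 if INTn went high and
 * then low again, i.e. a fresh falling edge was produced for an
 * edge-triggered controller to latch; returns 0 otherwise.
 */
static int sketch_mask_unmask_makes_edge(struct sketch_switch *sw)
{
	uint16_t saved = sw->enable;
	int level_masked, level_unmasked;

	sw->enable = 0;
	level_masked = sketch_intn_level(sw);
	sw->enable = saved;
	level_unmasked = sketch_intn_level(sw);

	return level_masked == 1 && level_unmasked == 0;
}
```

In this model the status word is never modified, so a source that is still
pending after the handler runs yields a new falling edge, while an idle
device yields none.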

Dave

-- 
John David Anglin  dave.anglin@bell.net

* [PATCH v1] net: dsa: qca8k: implement DT-based ports <-> phy translation
From: Christian Lamparter @ 2019-02-04 21:35 UTC (permalink / raw)
  To: netdev; +Cc: Florian Fainelli, Vivien Didelot, Andrew Lunn

The QCA8337 enumerates 5 PHYs on the MDC/MDIO access: PHY0-PHY4.
Based on the System Block Diagram in Section 1.2 of the
QCA8337's datasheet, these PHYs are internally connected
to MACs of PORT 1 - PORT 5. However, neither qca8k's slave
mdio access functions qca8k_phy_read()/qca8k_phy_write()
nor the dsa framework is set up for that.

This version of the patch uses the existing phy-handle
properties of each specified DSA Port in the DT to map
each PORT/MAC to its exposed PHY on the MDIO bus. This
is supported by the current binding document qca8k.txt
as well.

Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
---
 drivers/net/dsa/qca8k.c | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
index a4b6cda38016..6558b7ed855d 100644
--- a/drivers/net/dsa/qca8k.c
+++ b/drivers/net/dsa/qca8k.c
@@ -11,6 +11,7 @@
 #include <linux/netdevice.h>
 #include <net/dsa.h>
 #include <linux/of_net.h>
+#include <linux/of_mdio.h>
 #include <linux/of_platform.h>
 #include <linux/if_bridge.h>
 #include <linux/mdio.h>
@@ -612,20 +613,50 @@ qca8k_adjust_link(struct dsa_switch *ds, int port, struct phy_device *phy)
 	qca8k_port_set_status(priv, port, 1);
 }
 
+static int
+qca8k_to_real_phy(struct dsa_switch *ds, int phy)
+{
+	struct device_node *phy_dn, *port_dn;
+	int id;
+
+	if (phy >= ds->num_ports)
+		return -EINVAL;
+
+	port_dn = ds->ports[phy].dn;
+	if (!port_dn)
+		return -EINVAL;
+
+	phy_dn = of_parse_phandle(port_dn, "phy-handle", 0);
+	if (!phy_dn)
+		return phy;
+
+	id = of_mdio_parse_addr(ds->dev, phy_dn);
+	of_node_put(phy_dn);
+	return id;
+}
+
 static int
 qca8k_phy_read(struct dsa_switch *ds, int phy, int regnum)
 {
 	struct qca8k_priv *priv = (struct qca8k_priv *)ds->priv;
+	int realphy = qca8k_to_real_phy(ds, phy);
+
+	if (realphy < 0)
+		return realphy;
 
-	return mdiobus_read(priv->bus, phy, regnum);
+	return mdiobus_read(priv->bus, realphy, regnum);
 }
 
 static int
 qca8k_phy_write(struct dsa_switch *ds, int phy, int regnum, u16 val)
 {
 	struct qca8k_priv *priv = (struct qca8k_priv *)ds->priv;
+	int realphy = qca8k_to_real_phy(ds, phy);
+
+	if (realphy < 0)
+		return realphy;
 
-	return mdiobus_write(priv->bus, phy, regnum, val);
+	return mdiobus_write(priv->bus, realphy, regnum, val);
 }
 
 static void
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v2 0/7] sh_eth: implement simple RX checksum offload
From: David Miller @ 2019-02-04 21:31 UTC (permalink / raw)
  To: sergei.shtylyov; +Cc: netdev, linux-renesas-soc, linux-sh
In-Reply-To: <a21deed1-35dc-f1be-6c7e-7061ebe4b56c@cogentembedded.com>

From: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Date: Mon, 4 Feb 2019 21:01:25 +0300

> Here's a set of 7 patches against DaveM's 'net-next.git' repo. I'm implementing
> the simple RX checksum offload (like was done for the 'ravb' driver by Simon
> Horman); it has been only tested on the R8A7740 and R8A77980 SoCs, the other
> SoCs should just work (according to their manuals)...

Series applied, thanks.

There was a "tha" --> "the" typo in one of your commit messages which I
fixed up.

^ permalink raw reply

* Re: [PATCH 3/3 net-next] net: phy: aquantia: use FIELD_GET for getting speed in aqr_read_status
From: Andrew Lunn @ 2019-02-04 21:31 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Florian Fainelli, David Miller, Nikita Yushchenko,
	netdev@vger.kernel.org
In-Reply-To: <f3554610-2a8b-dec8-daad-fd592404dc29@gmail.com>

On Mon, Feb 04, 2019 at 10:09:06PM +0100, Heiner Kallweit wrote:
> Change getting the speed to use FIELD_GET() too to be in line with the
> rest of the code.
> 
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH 1/3 net-next] net: phy: aquantia: improve setting speed and duplex in aqr_read_status
From: Andrew Lunn @ 2019-02-04 21:28 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Florian Fainelli, David Miller, Nikita Yushchenko,
	netdev@vger.kernel.org
In-Reply-To: <d1f1160c-ebea-0e7b-4d73-a27ebbd5c199@gmail.com>

On Mon, Feb 04, 2019 at 10:03:21PM +0100, Heiner Kallweit wrote:
> Add support for speeds 10Mbps, 5Gbps, and 10Gbps. In addition don't
> hardcode duplex but read it from the chip.

Hi Heiner

The marvell10g does this differently. It gets the local and link
partner advertised link modes and from that works out what the PHY is
doing. If auto-neg is not being used, it then reads the link speed
from the PMA.

The question is, should the Aquantia PHY do the same, or should it
look at vendor registers? Apart from getting the 1G advertisement, all
the Marvell code uses generic registers. So we should be able to move
most of it into phy-c45 and reuse it. That is what I would prefer.
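
As a rough sketch of the "work it out from the advertisements" idea:
intersect the local and link partner masks and take the fastest common
mode. The bit assignments and names below are hypothetical, chosen for
illustration; they are not the kernel's ethtool link mode bits or the
marvell10g code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical link-mode bits; not the kernel's ethtool bit numbers. */
#define SK_100FULL	(1 << 0)
#define SK_1000FULL	(1 << 1)
#define SK_2500FULL	(1 << 2)
#define SK_5000FULL	(1 << 3)
#define SK_10GFULL	(1 << 4)

/* Returns the speed in Mbps of the fastest mode both ends advertise,
 * or -1 if there is no common mode. */
static int sketch_resolve_speed(uint32_t local_adv, uint32_t lp_adv)
{
	static const struct { uint32_t bit; int mbps; } modes[] = {
		{ SK_10GFULL, 10000 }, { SK_5000FULL, 5000 },
		{ SK_2500FULL, 2500 }, { SK_1000FULL, 1000 },
		{ SK_100FULL, 100 },
	};
	uint32_t common = local_adv & lp_adv;
	unsigned int i;

	for (i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
		/* The table must stay sorted fastest-first. */
		assert(i == 0 || modes[i - 1].mbps > modes[i].mbps);
		if (common & modes[i].bit)
			return modes[i].mbps;
	}
	return -1;
}
```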

     Andrew

^ permalink raw reply

* Re: [PATCH mlx5-next 12/12] net/mlx5: Set ODP SRQ support in firmware
From: Jason Gunthorpe @ 2019-02-04 21:23 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Moni Shoua,
	Saeed Mahameed, linux-netdev
In-Reply-To: <20190203090311.GD3634@mtr-leonro.mtl.com>


On Sun, Feb 03, 2019 at 11:03:11AM +0200, Leon Romanovsky wrote:
> On Thu, Jan 31, 2019 at 04:28:44PM -0700, Jason Gunthorpe wrote:
> > On Tue, Jan 22, 2019 at 08:48:51AM +0200, Leon Romanovsky wrote:
> > > From: Moni Shoua <monis@mellanox.com>
> > >
> > > To avoid compatibility issue with older kernels the firmware doesn't
> > > allow SRQ to work with ODP unless kernel asks for it.
> > >
> > > Signed-off-by: Moni Shoua <monis@mellanox.com>
> > > Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> > > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > >  .../net/ethernet/mellanox/mlx5/core/main.c    | 53 +++++++++++++++++++
> > >  include/linux/mlx5/device.h                   |  3 ++
> > >  include/linux/mlx5/mlx5_ifc.h                 |  1 +
> > >  3 files changed, 57 insertions(+)
> > >
> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > > index be81b319b0dc..b3a76df0cf6c 100644
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > > @@ -459,6 +459,53 @@ static int handle_hca_cap_atomic(struct mlx5_core_dev *dev)
> > >  	return err;
> > >  }
> > >
> > > +static int handle_hca_cap_odp(struct mlx5_core_dev *dev)
> > > +{
> > > +	void *set_ctx;
> > > +	void *set_hca_cap;
> > > +	int set_sz = MLX5_ST_SZ_BYTES(set_hca_cap_in);
> > > +	int err;
> > > +
> > > +	if (!MLX5_CAP_GEN(dev, pg))
> > > +		return 0;
> >
> > Should a
> >
> >     if (IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING))
> >              return 0;
> >
> > Be here?
> 
> We had similar discussion in mlx5_ib main.c, but here we are talking
> about mlx5_core code, which from my point of view should represent the
> real HW capabilities without relation to kernel compilation mode.

This switch is to tell the FW that the mlx5_ib module supports the new
protocol - so having it in core code at all is really weird. I assume
there is some startup sequence reason?

Since the modularity is already wrecked it seems like an odd
reason not to add the if..

Jason

^ permalink raw reply

* [PATCH 3/3 net-next] net: phy: aquantia: use FIELD_GET for getting speed in aqr_read_status
From: Heiner Kallweit @ 2019-02-04 21:09 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller
  Cc: Nikita Yushchenko, netdev@vger.kernel.org
In-Reply-To: <8e41695d-a23e-adad-ae3d-66a46d1ab077@gmail.com>

Change getting the speed to use FIELD_GET() too to be in line with the
rest of the code.
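
As a user-space illustration of why the rate constants lose their
"<< 1": a FIELD_GET()-style helper masks the field and shifts it down,
so the case labels compare against unshifted values. The macro and
helper below are simplified stand-ins (assumptions for this sketch),
not the definitions from <linux/bitfield.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for GENMASK(); valid for 0 <= l <= h <= 31. */
#define SKETCH_GENMASK(h, l) \
	((~UINT32_C(0) >> (31 - (h))) & (~UINT32_C(0) << (l)))

/* Simplified stand-in for FIELD_GET(): extract and right-align a field. */
static inline uint32_t sketch_field_get(uint32_t mask, uint32_t reg)
{
	assert(mask != 0);
	/* (mask & -mask) isolates the lowest set bit of the mask, so the
	 * division shifts the masked value down to bit 0. */
	return (reg & mask) / (mask & -mask);
}

#define RATE_MASK	SKETCH_GENMASK(3, 1)
#define RATE_10GBASET	3	/* unshifted field value, as in the patch */

/* Returns 1 if the rate field of a raw register value is 10GBASE-T. */
static int sketch_is_10g(uint32_t reg)
{
	return sketch_field_get(RATE_MASK, reg) == RATE_10GBASET;
}
```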

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/aquantia.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/phy/aquantia.c b/drivers/net/phy/aquantia.c
index 7dfcbae4f..d8332b128 100644
--- a/drivers/net/phy/aquantia.c
+++ b/drivers/net/phy/aquantia.c
@@ -21,13 +21,13 @@
 #define PHY_ID_AQR405	0x03a1b4b0
 
 #define MDIO_AN_TX_VEND_STATUS1			0xc800
-#define MDIO_AN_TX_VEND_STATUS1_10BASET		(0x0 << 1)
-#define MDIO_AN_TX_VEND_STATUS1_100BASETX	(0x1 << 1)
-#define MDIO_AN_TX_VEND_STATUS1_1000BASET	(0x2 << 1)
-#define MDIO_AN_TX_VEND_STATUS1_10GBASET	(0x3 << 1)
-#define MDIO_AN_TX_VEND_STATUS1_2500BASET	(0x4 << 1)
-#define MDIO_AN_TX_VEND_STATUS1_5000BASET	(0x5 << 1)
-#define MDIO_AN_TX_VEND_STATUS1_RATE_MASK	(0x7 << 1)
+#define MDIO_AN_TX_VEND_STATUS1_RATE_MASK	GENMASK(3, 1)
+#define MDIO_AN_TX_VEND_STATUS1_10BASET		0
+#define MDIO_AN_TX_VEND_STATUS1_100BASETX	1
+#define MDIO_AN_TX_VEND_STATUS1_1000BASET	2
+#define MDIO_AN_TX_VEND_STATUS1_10GBASET	3
+#define MDIO_AN_TX_VEND_STATUS1_2500BASET	4
+#define MDIO_AN_TX_VEND_STATUS1_5000BASET	5
 #define MDIO_AN_TX_VEND_STATUS1_FULL_DUPLEX	BIT(0)
 
 #define MDIO_AN_TX_VEND_INT_STATUS2		0xcc01
@@ -148,7 +148,7 @@ static int aqr_read_status(struct phy_device *phydev)
 	mdelay(10);
 	reg = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_AN_TX_VEND_STATUS1);
 
-	switch (reg & MDIO_AN_TX_VEND_STATUS1_RATE_MASK) {
+	switch (FIELD_GET(MDIO_AN_TX_VEND_STATUS1_RATE_MASK, reg)) {
 	case MDIO_AN_TX_VEND_STATUS1_10GBASET:
 		phydev->speed = SPEED_10000;
 		break;
-- 
2.20.1



^ permalink raw reply related

* [PATCH 2/3 net-next] net: phy: aquantia: set interface mode in aqr_read_status
From: Heiner Kallweit @ 2019-02-04 21:07 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller
  Cc: Nikita Yushchenko, netdev@vger.kernel.org
In-Reply-To: <8e41695d-a23e-adad-ae3d-66a46d1ab077@gmail.com>

Extend aqr_read_status to set the interface mode properly.

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/aquantia.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/drivers/net/phy/aquantia.c b/drivers/net/phy/aquantia.c
index 51ae3feea..281873c59 100644
--- a/drivers/net/phy/aquantia.c
+++ b/drivers/net/phy/aquantia.c
@@ -11,6 +11,7 @@
 #include <linux/module.h>
 #include <linux/delay.h>
 #include <linux/phy.h>
+#include <linux/bitfield.h>
 
 #define PHY_ID_AQ1202	0x03a1b445
 #define PHY_ID_AQ2104	0x03a1b460
@@ -34,6 +35,21 @@
 #define MDIO_AN_TX_VEND_INT_MASK2		0xd401
 #define MDIO_AN_TX_VEND_INT_MASK2_LINK		BIT(0)
 
+/* PHY XS System Interface Connection Status */
+#define MDIO_XS_SYSIF_STATUS			0xe812
+#define MDIO_XS_SYSIF_MODE_MASK			GENMASK(7, 3)
+#define MDIO_XS_SYSIF_MODE_BACKPLANE_KR		0
+#define MDIO_XS_SYSIF_MODE_BACKPLANE_KX		1
+#define MDIO_XS_SYSIF_MODE_XFI			2
+#define MDIO_XS_SYSIF_MODE_USXGMII		3
+#define MDIO_XS_SYSIF_MODE_XAUI			4
+#define MDIO_XS_SYSIF_MODE_XAUI_PAUSE		5
+#define MDIO_XS_SYSIF_MODE_SGMII		6
+#define MDIO_XS_SYSIF_MODE_RXAUI		7
+#define MDIO_XS_SYSIF_MODE_MAC			8
+#define MDIO_XS_SYSIF_MODE_OFF			9
+#define MDIO_XS_SYSIF_MODE_OCSGMII		10
+
 /* Vendor specific 1, MDIO_MMD_VEND1 */
 #define VEND1_GLOBAL_INT_STD_STATUS		0xfc00
 #define VEND1_GLOBAL_INT_VEND_STATUS		0xfc01
@@ -158,6 +174,27 @@ static int aqr_read_status(struct phy_device *phydev)
 
 	phydev->duplex = !!(reg & MDIO_AN_TX_VEND_STATUS1_FULL_DUPLEX);
 
+	reg = phy_read_mmd(phydev, MDIO_MMD_PHYXS, MDIO_XS_SYSIF_STATUS);
+
+	switch (FIELD_GET(MDIO_XS_SYSIF_MODE_MASK, reg)) {
+	case MDIO_XS_SYSIF_MODE_BACKPLANE_KR:
+		phydev->interface = PHY_INTERFACE_MODE_10GKR;
+		break;
+	case MDIO_XS_SYSIF_MODE_SGMII:
+		phydev->interface = PHY_INTERFACE_MODE_SGMII;
+		break;
+	case MDIO_XS_SYSIF_MODE_XAUI:
+	case MDIO_XS_SYSIF_MODE_XAUI_PAUSE:
+		phydev->interface = PHY_INTERFACE_MODE_XAUI;
+		break;
+	case MDIO_XS_SYSIF_MODE_RXAUI:
+		phydev->interface = PHY_INTERFACE_MODE_RXAUI;
+		break;
+	default:
+		phydev->interface = PHY_INTERFACE_MODE_NA;
+		break;
+	}
+
 	return 0;
 }
 
-- 
2.20.1



^ permalink raw reply related

* [PATCH 1/3 net-next] net: phy: aquantia: improve setting speed and duplex in aqr_read_status
From: Heiner Kallweit @ 2019-02-04 21:03 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller
  Cc: Nikita Yushchenko, netdev@vger.kernel.org
In-Reply-To: <8e41695d-a23e-adad-ae3d-66a46d1ab077@gmail.com>

Add support for speeds 10Mbps, 5Gbps, and 10Gbps. In addition don't
hardcode duplex but read it from the chip.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/aquantia.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/aquantia.c b/drivers/net/phy/aquantia.c
index 482004efa..51ae3feea 100644
--- a/drivers/net/phy/aquantia.c
+++ b/drivers/net/phy/aquantia.c
@@ -133,6 +133,12 @@ static int aqr_read_status(struct phy_device *phydev)
 	reg = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_AN_TX_VEND_STATUS1);
 
 	switch (reg & MDIO_AN_TX_VEND_STATUS1_RATE_MASK) {
+	case MDIO_AN_TX_VEND_STATUS1_10GBASET:
+		phydev->speed = SPEED_10000;
+		break;
+	case MDIO_AN_TX_VEND_STATUS1_5000BASET:
+		phydev->speed = SPEED_5000;
+		break;
 	case MDIO_AN_TX_VEND_STATUS1_2500BASET:
 		phydev->speed = SPEED_2500;
 		break;
@@ -142,11 +148,15 @@ static int aqr_read_status(struct phy_device *phydev)
 	case MDIO_AN_TX_VEND_STATUS1_100BASETX:
 		phydev->speed = SPEED_100;
 		break;
+	case MDIO_AN_TX_VEND_STATUS1_10BASET:
+		phydev->speed = SPEED_10;
+		break;
 	default:
-		phydev->speed = SPEED_10000;
+		phydev->speed = SPEED_UNKNOWN;
 		break;
 	}
-	phydev->duplex = DUPLEX_FULL;
+
+	phydev->duplex = !!(reg & MDIO_AN_TX_VEND_STATUS1_FULL_DUPLEX);
 
 	return 0;
 }
-- 
2.20.1



^ permalink raw reply related

* [PATCH 0/3 net-next] net: phy: aquantia: extend aqr_read_status
From: Heiner Kallweit @ 2019-02-04 21:02 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller
  Cc: Nikita Yushchenko, netdev@vger.kernel.org

Extend aqr_read_status to read more parameters from the chip.

Heiner Kallweit (3):
  net: phy: aquantia: improve setting speed and duplex in aqr_read_status
  net: phy: aquantia: set interface mode in aqr_read_status
  net: phy: aquantia: use FIELD_GET for getting speed in aqr_read_status

 drivers/net/phy/aquantia.c | 66 ++++++++++++++++++++++++++++++++------
 1 file changed, 56 insertions(+), 10 deletions(-)

-- 
2.20.1

^ permalink raw reply

* Re: [PATCH] bpf: test_maps: Avoid possible out of bound access
From: Daniel Borkmann @ 2019-02-04 20:57 UTC (permalink / raw)
  To: Breno Leitao, netdev; +Cc: ast
In-Reply-To: <1549297631-27789-1-git-send-email-leitao@debian.org>

On 02/04/2019 05:27 PM, Breno Leitao wrote:
> When compiling test_maps selftest with GCC-8, it warns that an array might
> be indexed with a negative value, which could cause a negative out of bound
> access, depending on parameters of the function. This is the GCC-8 warning:
> 
> 	gcc -Wall -O2 -I../../../include/uapi -I../../../lib -I../../../lib/bpf -I../../../../include/generated -DHAVE_GENHDR -I../../../include    test_maps.c /home/breno/Devel/linux/tools/testing/selftests/bpf/libbpf.a -lcap -lelf -lrt -lpthread -o /home/breno/Devel/linux/tools/testing/selftests/bpf/test_maps
> 	In file included from test_maps.c:16:
> 	test_maps.c: In function ‘run_all_tests’:
> 	test_maps.c:1079:10: warning: array subscript -1 is below array bounds of ‘pid_t[<Ube20> + 1]’ [-Warray-bounds]
> 	   assert(waitpid(pid[i], &status, 0) == pid[i]);
> 		  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
> 	test_maps.c:1059:6: warning: array subscript -1 is below array bounds of ‘pid_t[<Ube20> + 1]’ [-Warray-bounds]
> 	   pid[i] = fork();
> 	   ~~~^~~
> 
> This patch simply guarantees that the tasks variable is unsigned, thus, it
> could never be a negative number, hence avoiding an out of bound access
> warning.
> 
> Signed-off-by: Breno Leitao <leitao@debian.org>

Thanks for the patch, small comment below:

>  tools/testing/selftests/bpf/test_maps.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
> index e2b9eee37187..1714e26f4a72 100644
> --- a/tools/testing/selftests/bpf/test_maps.c
> +++ b/tools/testing/selftests/bpf/test_maps.c
> @@ -641,7 +641,7 @@ static void test_stackmap(int task, void *data)
>  #define SOCKMAP_PARSE_PROG "./sockmap_parse_prog.o"
>  #define SOCKMAP_VERDICT_PROG "./sockmap_verdict_prog.o"
>  #define SOCKMAP_TCP_MSG_PROG "./sockmap_tcp_msg_prog.o"
> -static void test_sockmap(int tasks, void *data)
> +static void test_sockmap(unsigned int tasks, void *data)

There are a couple more test_*() functions that need to be converted if we do
the change to unsigned:

tools/testing/selftests/bpf/test_maps.c:48:static void test_hashmap(int task, void *data)
tools/testing/selftests/bpf/test_maps.c:138:static void test_hashmap_sizes(int task, void *data)
tools/testing/selftests/bpf/test_maps.c:158:static void test_hashmap_percpu(int task, void *data)
tools/testing/selftests/bpf/test_maps.c:285:static void test_hashmap_walk(int task, void *data)
tools/testing/selftests/bpf/test_maps.c:356:static void test_arraymap(int task, void *data)
tools/testing/selftests/bpf/test_maps.c:411:static void test_arraymap_percpu(int task, void *data)
tools/testing/selftests/bpf/test_maps.c:507:static void test_devmap(int task, void *data)
tools/testing/selftests/bpf/test_maps.c:522:static void test_queuemap(int task, void *data)
tools/testing/selftests/bpf/test_maps.c:580:static void test_stackmap(int task, void *data)
tools/testing/selftests/bpf/test_maps.c:645:static void test_sockmap(int tasks, void *data)

>  {
>  	struct bpf_map *bpf_map_rx, *bpf_map_tx, *bpf_map_msg, *bpf_map_break;
>  	int map_fd_msg = 0, map_fd_rx = 0, map_fd_tx = 0, map_fd_break;
> @@ -1261,7 +1261,7 @@ static void test_map_large(void)
>  	printf("Fork %d tasks to '" #FN "'\n", N); \
>  	__run_parallel(N, FN, DATA)
>  
> -static void __run_parallel(int tasks, void (*fn)(int task, void *data),
> +static void __run_parallel(unsigned int tasks, void (*fn)(int task, void *data),

This would also need conversion to unsigned for the func arg above so that
the types don't mismatch.
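
Roughly what a consistent signature could look like, as a sequential
user-space sketch. The names are hypothetical, and the real
__run_parallel() forks one child per task rather than looping in place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical worker type: 'task' has the same type as the loop index. */
typedef void (*sketch_test_fn)(unsigned int task, void *data);

/* Sequential stand-in for __run_parallel(); no fork() in this sketch. */
static void sketch_run_all(unsigned int tasks, sketch_test_fn fn, void *data)
{
	unsigned int i;

	assert(fn != NULL);
	for (i = 0; i < tasks; i++)
		fn(i, data);
}

/* Example worker: counts invocations and sums task ids through 'data'. */
static void sketch_count(unsigned int task, void *data)
{
	unsigned int *acc = data;

	acc[0] += 1;
	acc[1] += task;
}
```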

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH bpf-next 0/2] tools/bpf: expose several libbpf API functions
From: Alexei Starovoitov @ 2019-02-04 20:51 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Network Development, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team
In-Reply-To: <20190204190057.3965903-1-yhs@fb.com>

On Mon, Feb 4, 2019 at 12:27 PM Yonghong Song <yhs@fb.com> wrote:
>
> This patch set exposed a few functions in libbpf.
> All these newly added API functions are helpful for
> JIT based bpf compilation where .BTF and .BTF.ext
> are available as in-memory data blobs.
>
> Patch #1 exposed several btf_ext__* API functions which
> are used to handle .BTF.ext ELF sections.
> Patch #2 refactored the function bpf_map_find_btf_info()
> and exposed API function btf__get_map_kv_tids() to
> retrieve the map key/value type id's generated by
> bpf program through BPF_ANNOTATE_KV_PAIR macro.

Applied to bpf-next. Thanks!

^ permalink raw reply

* Re: [PATCH bpf-next 1/3] bpf, riscv: add BPF JIT for RV64G
From: Björn Töpel @ 2019-02-04 20:27 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: linux-riscv, ast, Netdev, Palmer Dabbelt, Christoph Hellwig
In-Reply-To: <88cdae60-494e-6294-b2c1-10b9cbeb95ac@iogearbox.net>

On Mon, 4 Feb 2019 at 21:06, Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 02/03/2019 12:51 PM, bjorn.topel@gmail.com wrote:
> > From: Björn Töpel <bjorn.topel@gmail.com>
> >
> > This commit adds BPF JIT for RV64G.
> >
> > The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar
> > to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64).
> >
> > At the moment the RISC-V Linux port does not support HAVE_KPROBES,
> > which means that CONFIG_BPF_EVENTS is not supported. Thus, no tests
> > involving BPF_PROG_TYPE_TRACEPOINT pass.
> >
> > Further, the implementation does not support "far branching" (>4KiB).
> >
> > The implementation passes all the test_bpf.ko tests:
> >   test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed]
> >
> > All the tail_call tests in the selftest/bpf/test_verifier program
> > pass.
> >
> > All tests were done on QEMU (QEMU emulator version 3.1.50
> > (v3.1.0-688-g8ae951fbc106)).
> >
> > Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
>
> Some minor comments:
>
> Looks like all the BPF_JMP32 instructions are missing. Would probably
> make sense to include these into the initial merge as well unless there
> is some good reason not to; presumably the test_verifier parts with
> BPF_JMP32 haven't been tried out?
>

Yes indeed. My bad, I didn't realize that Jiong's patches were in the
tree! BPF_JMP32 should definitely be in the initial merge.

> [...]
> > +
> > +enum {
> > +     RV_CTX_F_SEEN_TAIL_CALL =       0,
> > +     RV_CTX_F_SEEN_CALL =            RV_REG_RA,
> > +     RV_CTX_F_SEEN_S1 =              RV_REG_S1,
> > +     RV_CTX_F_SEEN_S2 =              RV_REG_S2,
> > +     RV_CTX_F_SEEN_S3 =              RV_REG_S3,
> > +     RV_CTX_F_SEEN_S4 =              RV_REG_S4,
> > +     RV_CTX_F_SEEN_S5 =              RV_REG_S5,
> > +     RV_CTX_F_SEEN_S6 =              RV_REG_S6,
> > +};
> > +
> > +struct rv_jit_context {
> > +     struct bpf_prog *prog;
> > +     u32 *insns; /* RV insns */
> > +     int ninsns;
> > +     int epilogue_offset;
> > +     int *offset; /* BPF to RV */
> > +     unsigned long flags;
> > +     int stack_size;
> > +};
> > +
> > +struct rv_jit_data {
> > +     struct bpf_binary_header *header;
> > +     u8 *image;
> > +     struct rv_jit_context ctx;
> > +};
> > +
> > +static u8 bpf_to_rv_reg(int bpf_reg, struct rv_jit_context *ctx)
> > +{
> > +     u8 reg = regmap[bpf_reg];
> > +
> > +     switch (reg) {
> > +     case RV_CTX_F_SEEN_S1:
> > +     case RV_CTX_F_SEEN_S2:
> > +     case RV_CTX_F_SEEN_S3:
> > +     case RV_CTX_F_SEEN_S4:
> > +     case RV_CTX_F_SEEN_S5:
> > +     case RV_CTX_F_SEEN_S6:
> > +             __set_bit(reg, &ctx->flags);
> > +     }
> > +     return reg;
> > +};
> > +
> > +static bool seen_reg(int reg, struct rv_jit_context *ctx)
> > +{
> > +     switch (reg) {
> > +     case RV_CTX_F_SEEN_CALL:
> > +     case RV_CTX_F_SEEN_S1:
> > +     case RV_CTX_F_SEEN_S2:
> > +     case RV_CTX_F_SEEN_S3:
> > +     case RV_CTX_F_SEEN_S4:
> > +     case RV_CTX_F_SEEN_S5:
> > +     case RV_CTX_F_SEEN_S6:
> > +             return test_bit(reg, &ctx->flags);
> > +     }
> > +     return false;
> > +}
> > +
> > +static void mark_call(struct rv_jit_context *ctx)
> > +{
> > +     __set_bit(RV_CTX_F_SEEN_CALL, &ctx->flags);
> > +}
> > +
> > +static bool seen_call(struct rv_jit_context *ctx)
> > +{
> > +     return seen_reg(RV_REG_RA, ctx);
> > +}
>
> Just a nit: it might be more obvious to remove this asymmetry in
> seen_reg() and do __set_bit()/test_bit() for RV_CTX_F_SEEN_CALL, similar
> to the code below.
>

Yeah, let's do that.

> > +static void mark_tail_call(struct rv_jit_context *ctx)
> > +{
> > +     __set_bit(RV_CTX_F_SEEN_TAIL_CALL, &ctx->flags);
> > +}
> > +
> > +static bool seen_tail_call(struct rv_jit_context *ctx)
> > +{
> > +     return test_bit(RV_CTX_F_SEEN_TAIL_CALL, &ctx->flags);
> > +}
> > +
> > +static u8 rv_tail_call_reg(struct rv_jit_context *ctx)
> > +{
> > +     mark_tail_call(ctx);
> > +
> > +     if (seen_call(ctx)) {
> > +             __set_bit(RV_CTX_F_SEEN_S6, &ctx->flags);
> > +             return RV_REG_S6;
> > +     }
> > +     return RV_REG_A6;
> > +}
> > +
> > +static void emit(const u32 insn, struct rv_jit_context *ctx)
> > +{
> > +     if (ctx->insns)
> > +             ctx->insns[ctx->ninsns] = insn;
> > +
> > +     ctx->ninsns++;
> > +}
> > +
> > +static u32 rv_r_insn(u8 funct7, u8 rs2, u8 rs1, u8 funct3, u8 rd, u8 opcode)
> > +{
> [...]
> > +     /* Allocate image, now that we know the size. */
> > +     image_size = sizeof(u32) * ctx->ninsns;
> > +     jit_data->header = bpf_jit_binary_alloc(image_size, &jit_data->image,
> > +                                             sizeof(u32),
> > +                                             bpf_fill_ill_insns);
> > +     if (!jit_data->header) {
> > +             prog = orig_prog;
> > +             goto out_offset;
> > +     }
> > +
> > +     /* Second, real pass, that actually emits the image. */
> > +     ctx->insns = (u32 *)jit_data->image;
> > +skip_init_ctx:
> > +     ctx->ninsns = 0;
> > +
> > +     build_prologue(ctx);
> > +     if (build_body(ctx, extra_pass)) {
> > +             bpf_jit_binary_free(jit_data->header);
> > +             prog = orig_prog;
> > +             goto out_offset;
> > +     }
> > +     build_epilogue(ctx);
> > +
> > +     if (bpf_jit_enable > 1)
> > +             bpf_jit_dump(prog->len, image_size, 2, ctx->insns);
> > +
> > +     prog->bpf_func = (void *)ctx->insns;
> > +     prog->jited = 1;
> > +     prog->jited_len = image_size;
> > +
> > +     bpf_flush_icache(jit_data->header, (u8 *)ctx->insns + ctx->ninsns);
>
> Shouldn't this be '(u32 *)ctx->insns + ctx->ninsns' to cover the range?
>

Yikes! Indeed so, I'll make sure this is corrected!
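
To make the range problem concrete: ctx->ninsns counts 32-bit
instructions, so adding it to a u8 pointer advances only ninsns bytes,
a quarter of the image. A small user-space sketch (the helper names are
illustrative and not part of the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* End address when the instruction count is added to a byte pointer. */
static const void *sketch_end_as_u8(const uint32_t *insns, int ninsns)
{
	return (const uint8_t *)insns + ninsns;
}

/* End address when the count is added to a u32 pointer (scaled by 4). */
static const void *sketch_end_as_u32(const uint32_t *insns, int ninsns)
{
	return insns + ninsns;
}

/* Number of bytes covered from the start of the buffer up to 'end'. */
static size_t sketch_covered(const uint32_t *insns, const void *end)
{
	assert((const uint8_t *)end >= (const uint8_t *)insns);
	return (size_t)((const uint8_t *)end - (const uint8_t *)insns);
}
```

With 8 instructions, the byte-pointer variant covers 8 bytes while the
u32 variant covers the full 32.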

Thanks for the comments!


Björn

> > +
> > +     if (!prog->is_func || extra_pass) {
> > +out_offset:
> > +             kfree(ctx->offset);
> > +             kfree(jit_data);
> > +             prog->aux->jit_data = NULL;
> > +     }
> > +out:
> > +     if (tmp_blinded)
> > +             bpf_jit_prog_release_other(prog, prog == orig_prog ?
> > +                                        tmp : orig_prog);
> > +     return prog;
> > +}
> >
>

^ permalink raw reply
