Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net] ixgbe: fix double clean of tx descriptors with xdp
From: Alexander Duyck @ 2019-08-20 15:35 UTC (permalink / raw)
  To: Ilya Maximets
  Cc: Netdev, LKML, bpf, David S. Miller, Björn Töpel,
	Magnus Karlsson, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Jeff Kirsher, intel-wired-lan, Eelco Chaudron,
	William Tu
In-Reply-To: <20190820151611.10727-1-i.maximets@samsung.com>

On Tue, Aug 20, 2019 at 8:18 AM Ilya Maximets <i.maximets@samsung.com> wrote:
>
> Tx code doesn't clear the descriptor status after cleaning.
> So, if the budget is larger than number of used elems in a ring, some
> descriptors will be accounted twice and xsk_umem_complete_tx will move
> prod_tail far beyond the prod_head breaking the comletion queue ring.
>
> Fix that by limiting the number of descriptors to clean by the number
> of used descriptors in the tx ring.
>
> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support")
> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>

I'm not sure this is the best way to go. My preference would be to
have something in the ring that would prevent us from racing which I
don't think this really addresses. I am pretty sure this code is safe
on x86 but I would be worried about weak ordered systems such as
PowerPC.

It might make sense to look at adding the eop_desc logic like we have
in the regular path with a proper barrier before we write it and after
we read it. So for example we could hold of on writing the bytecount
value until the end of an iteration and call smp_wmb before we write
it. Then on the cleanup we could read it and if it is non-zero we take
an smp_rmb before proceeding further to process the Tx descriptor and
clearing the value. Otherwise this code is going to just keep popping
up with issues.

> ---
>
> Not tested yet because of lack of available hardware.
> So, testing is very welcome.
>
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h      | 10 ++++++++++
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 12 +-----------
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  |  6 ++++--
>  3 files changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> index 39e73ad60352..0befcef46e80 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> @@ -512,6 +512,16 @@ static inline u16 ixgbe_desc_unused(struct ixgbe_ring *ring)
>         return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1;
>  }
>
> +static inline u64 ixgbe_desc_used(struct ixgbe_ring *ring)
> +{
> +       unsigned int head, tail;
> +
> +       head = ring->next_to_clean;
> +       tail = ring->next_to_use;
> +
> +       return ((head <= tail) ? tail : tail + ring->count) - head;
> +}
> +
>  #define IXGBE_RX_DESC(R, i)        \
>         (&(((union ixgbe_adv_rx_desc *)((R)->desc))[i]))
>  #define IXGBE_TX_DESC(R, i)        \
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 7882148abb43..d417237857d8 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -1012,21 +1012,11 @@ static u64 ixgbe_get_tx_completed(struct ixgbe_ring *ring)
>         return ring->stats.packets;
>  }
>
> -static u64 ixgbe_get_tx_pending(struct ixgbe_ring *ring)
> -{
> -       unsigned int head, tail;
> -
> -       head = ring->next_to_clean;
> -       tail = ring->next_to_use;
> -
> -       return ((head <= tail) ? tail : tail + ring->count) - head;
> -}
> -
>  static inline bool ixgbe_check_tx_hang(struct ixgbe_ring *tx_ring)
>  {
>         u32 tx_done = ixgbe_get_tx_completed(tx_ring);
>         u32 tx_done_old = tx_ring->tx_stats.tx_done_old;
> -       u32 tx_pending = ixgbe_get_tx_pending(tx_ring);
> +       u32 tx_pending = ixgbe_desc_used(tx_ring);
>
>         clear_check_for_tx_hang(tx_ring);
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> index 6b609553329f..7702efed356a 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> @@ -637,6 +637,7 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
>         u32 i = tx_ring->next_to_clean, xsk_frames = 0;
>         unsigned int budget = q_vector->tx.work_limit;
>         struct xdp_umem *umem = tx_ring->xsk_umem;
> +       u32 used_descs = ixgbe_desc_used(tx_ring);
>         union ixgbe_adv_tx_desc *tx_desc;
>         struct ixgbe_tx_buffer *tx_bi;
>         bool xmit_done;
> @@ -645,7 +646,7 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
>         tx_desc = IXGBE_TX_DESC(tx_ring, i);
>         i -= tx_ring->count;
>
> -       do {
> +       while (likely(budget && used_descs)) {
>                 if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
>                         break;
>
> @@ -673,7 +674,8 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
>
>                 /* update budget accounting */
>                 budget--;
> -       } while (likely(budget));
> +               used_descs--;
> +       }
>
>         i += tx_ring->count;
>         tx_ring->next_to_clean = i;
> --
> 2.17.1
>

^ permalink raw reply

* Re: [PATCH 3/6] net: stmmac: sun8i: Use devm_regulator_get for PHY regulator
From: Andrew Lunn @ 2019-08-20 15:39 UTC (permalink / raw)
  To: megous
  Cc: David S. Miller, Rob Herring, Mark Rutland, Maxime Ripard,
	Chen-Yu Tsai, Giuseppe Cavallaro, Alexandre Torgue, Jose Abreu,
	Maxime Coquelin, devicetree, netdev, linux-kernel, linux-stm32,
	linux-arm-kernel
In-Reply-To: <20190820145343.29108-4-megous@megous.com>

On Tue, Aug 20, 2019 at 04:53:40PM +0200, megous@megous.com wrote:
> From: Ondrej Jirman <megous@megous.com>
> 
> Use devm_regulator_get instead of devm_regulator_get_optional and rely
> on dummy supply. This avoids NULL checks before regulator_enable/disable
> calls.

Hi Ondrej

What do you mean by a dummy supply? I'm just trying to make sure you
are not breaking backwards compatibility.

     Thanks
	Andrew

^ permalink raw reply

* Re: [PATCH net-next v3 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts
From: Miroslav Lichvar @ 2019-08-20 15:40 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Hubert Feurstein, netdev, lkml, Richard Cochran, Florian Fainelli,
	Heiner Kallweit, Vladimir Oltean, David S. Miller
In-Reply-To: <20190820152306.GJ29991@lunn.ch>

On Tue, Aug 20, 2019 at 05:23:06PM +0200, Andrew Lunn wrote:
> > - take a second "post" system timestamp after the completion
> 
> For this hardware, completion is an interrupt, which has a lot of
> jitter on it. But this hardware is odd, in that it uses an
> interrupt. Every other MDIO bus controller uses polled IO, with an
> mdelay(10) or similar between each poll. So the jitter is going to be
> much larger.

I think a large jitter is ok in this case. We just need to timestamp
something that we know for sure happened after the PHC timestamp. It
should have no impact on the offset and its stability, just the
reported delay. A test with phc2sys should be able to confirm that.
phc2sys selects the measurement with the shortest delay, which has
least uncertainty. I'd say that applies to both interrupt and polling.

If it is difficult to specify the minimum interrupt delay, I'd still
prefer an overly pessimistic interval assuming a zero delay.

-- 
Miroslav Lichvar

^ permalink raw reply

* Re: [v3] ocelot_ace: fix action of trap
From: Andrew Lunn @ 2019-08-20 15:45 UTC (permalink / raw)
  To: Allan W . Nielsen
  Cc: Yangbo Lu, netdev, David S . Miller, Alexandre Belloni,
	Microchip Linux Driver Support
In-Reply-To: <20190820070518.mypyahquun6t4yjf@lx-anielsen.microsemi.net>

On Tue, Aug 20, 2019 at 09:05:20AM +0200, Allan W . Nielsen wrote:
> Hi,
> 
> This is fixing a bug introduced in b596229448dd2

Hi Allan

You should express that as:

Fixes: b596229448dd ("net: mscc: ocelot: Add support for tcam")

       Andrew

^ permalink raw reply

* Re: [PATCH 3/6] net: stmmac: sun8i: Use devm_regulator_get for PHY regulator
From: Ondřej Jirman @ 2019-08-20 15:47 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David S. Miller, Rob Herring, Mark Rutland, Maxime Ripard,
	Chen-Yu Tsai, Giuseppe Cavallaro, Alexandre Torgue, Jose Abreu,
	Maxime Coquelin, devicetree, netdev, linux-kernel, linux-stm32,
	linux-arm-kernel
In-Reply-To: <20190820153939.GL29991@lunn.ch>

Hi Andrew,

On Tue, Aug 20, 2019 at 05:39:39PM +0200, Andrew Lunn wrote:
> On Tue, Aug 20, 2019 at 04:53:40PM +0200, megous@megous.com wrote:
> > From: Ondrej Jirman <megous@megous.com>
> > 
> > Use devm_regulator_get instead of devm_regulator_get_optional and rely
> > on dummy supply. This avoids NULL checks before regulator_enable/disable
> > calls.
> 
> Hi Ondrej
> 
> What do you mean by a dummy supply? I'm just trying to make sure you
> are not breaking backwards compatibility.

Sorry, I mean dummy regulator. See:

https://elixir.bootlin.com/linux/latest/source/drivers/regulator/core.c#L1874

On systems that use DT (i.e. have_full_constraints() == true), when the
regulator is not found (ENODEV, not specified in DT), regulator_get will return
a fake dummy regulator that can be enabled/disabled, but doesn't do anything
real.

This can be used to avoid NULL checks and make the code simpler.

regards,
	Ondrej

>      Thanks
> 	Andrew

^ permalink raw reply

* Re: [PATCH 3/6] net: stmmac: sun8i: Use devm_regulator_get for PHY regulator
From: Ondřej Jirman @ 2019-08-20 15:56 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David S. Miller, Rob Herring, Mark Rutland, Maxime Ripard,
	Chen-Yu Tsai, Giuseppe Cavallaro, Alexandre Torgue, Jose Abreu,
	Maxime Coquelin, devicetree, netdev, linux-kernel, linux-stm32,
	linux-arm-kernel
In-Reply-To: <20190820153939.GL29991@lunn.ch>

On Tue, Aug 20, 2019 at 05:39:39PM +0200, Andrew Lunn wrote:
> On Tue, Aug 20, 2019 at 04:53:40PM +0200, megous@megous.com wrote:
> > From: Ondrej Jirman <megous@megous.com>
> > 
> > Use devm_regulator_get instead of devm_regulator_get_optional and rely
> > on dummy supply. This avoids NULL checks before regulator_enable/disable
> > calls.
> 
> Hi Ondrej
> 
> What do you mean by a dummy supply? I'm just trying to make sure you
> are not breaking backwards compatibility.

I have tested it on Orange Pi PC 2, that uses only phy-supply, but not
phy-io-supply, and the kernel now prints:

[    1.410137] dwmac-sun8i 1c30000.ethernet: 1c30000.ethernet supply phy-io not found, using dummy regulator

I have also tested it on Orange Pi PC, that doesn't use external phy, and
instead of:

[    1.081378] dwmac-sun8i 1c30000.ethernet: No regulator found

The kernel now prints:

[    1.112752] dwmac-sun8i 1c30000.ethernet: 1c30000.ethernet supply phy not found, using dummy regulator
[    1.112814] dwmac-sun8i 1c30000.ethernet: 1c30000.ethernet supply phy-io not found, using dummy regulator

Ethernet works in both cases, so that should cover all existing combinations. :)

regards,
	Ondrej


>      Thanks
> 	Andrew

^ permalink raw reply

* Re: [PATCH spi for-5.4 0/5] Deterministic SPI latency with NXP DSPI driver
From: Vladimir Oltean @ 2019-08-20 15:57 UTC (permalink / raw)
  To: Mark Brown, Hubert Feurstein, mlichvar, Richard Cochran,
	Andrew Lunn, Florian Fainelli
  Cc: linux-spi, netdev
In-Reply-To: <20190818182600.3047-1-olteanv@gmail.com>

On Sun, 18 Aug 2019 at 21:26, Vladimir Oltean <olteanv@gmail.com> wrote:
>
> This patchset proposes an interface from the SPI subsystem for
> software timestamping SPI transfers. There is a default implementation
> provided in the core, as well as a mechanism for SPI slave drivers to
> check which byte was in fact timestamped post-facto. The patchset also
> adds the first user of this interface (the NXP DSPI driver in TCFQ mode).
>
> The interface is somewhat similar to Hubert Feurstein's proposal for the
> MDIO subsystem: https://lkml.org/lkml/2019/8/16/638
>
> Original cover letter below. Also provided at the end some results with
> an extra test (J - phc2sys using the timestamps taken by the SPI core).
>
> ===========================================================
>
> Continuing the discussion created by Hubert Feurstein around the
> mv88e6xxx driver for MDIO-controlled switches
> (https://lkml.org/lkml/2019/8/2/1364), this patchset takes a similar
> approach for the NXP LS1021A-TSN board, which has a SPI-controlled DSA
> switch (SJA1105).
>
> The patchset is motivated by some experiments done with a logic
> analyzer, trying to understand the source of latency (and especially of
> the jitter). SJA1105 SPI messages for reading the PTP clock are 12 bytes
> in length: 4 for the SPI header and 8 for the timestamp. When looking at
> the messages with a scope, there's jitter basically everywhere: between
> bits of a frame and between frames in a transfer. The inter-bit jitter
> is hardware and impacts us to a lesser extend (is smaller and caused by
> the PVT stability of the oscillators, PLLs, etc). We will focus on the
> latency between consecutive SPI frames within a 12-byte transfer.
>
> As a preface, revisions of the DSPI controller IP are integrated in many
> Freescale/NXP devices. As a result, the driver has 3 modes of operation:
> - TCFQ (Transfer Complete Flag mode): The controller signals software
>   that data has been sent/received after each individual word.
> - EOQ (End of Queue mode): The driver can implement batching by making
>   use of the controller's 4-word deep FIFO.
> - DMA (Direct Memory Access mode): The SPI controller's FIFO is no
>   longer in direct interaction with the driver, but is used to trigger
>   the RX and TX channels of the eDMA module on the SoC.
>
> In LS1021A, the driver works in the least efficient mode of the 3
> (TCFQ). There is a well-known errata that the DSPI controller is broken
> in conjunction with the eDMA module. As for the EOQ mode, I have tried
> unsuccessfully for a few days to make use of the 4 entry FIFO, and the
> hardware simply fails to reliably acknowledge the transmission when the
> FIFO gets full. So it looks like we're stuck with the TCFQ mode.
>
> The problem with phc2sys on the LS1021A-TSN board is that in order for
> the gettime64() call to complete on the sja1105, the system has to
> service 12 IRQs. Intuitively that is excessive and is the main source of
> jitter, but let's not get ahead of ourselves.
>
> An outline of the experiments that were done (unless otherwise
> mentioned, all of these ran for 120 seconds):
>
> A. First I have measured the (poor) performance of phc2sys under current
>    conditions. (DSPI driver in IRQ mode, no PTP system timestamping)
>
>    offset: min -53310 max 16107 mean -1737.18 std dev 11444.3
>    delay: min 163680 max 237360 mean 201149 std dev 22446.6
>    lost servo lock 1 times
>
> B. I switched the .gettime64 callback to .gettimex64, snapshotting the
>    PTP system timestamp within the sja1105 driver.
>
>    offset: min -48923 max 64217 mean -904.137 std dev 17358.1
>    delay: min 149600 max 203840 mean 169045 std dev 17993.3
>    lost servo lock 8 times
>
> C. I patched "struct spi_transfer" to contain the PTP system timestamp,
>    and from the sja1105 driver, I passed this structure to be
>    snapshotted by the SPI controller's driver (spi-fsl-dspi). This is
>    the "transfer-level" snapshot.
>
>    offset: min -64979 max 38979 mean -416.197 std dev 15367.9
>    delay: min 125120 max 168320 mean 150286 std dev 17675.3
>    lost servo lock 10 times
>
> D. I changed the placement of the transfer snapshotting within the DSPI
>    driver, from "transfer-level" to "byte-level".
>
>    offset: min -9021 max 7149 mean -0.418803 std dev 3529.81
>    delay: min 7840 max 23920 mean 14493.7 std dev 5982.17
>    lost servo lock 0 times
>
> E. I moved the DSPI driver to poll mode. I went back to collecting the
>    PTP system timestamps from the sja1105 driver (same as B).
>
>    offset: min -4199 max 46643 mean 418.214 std dev 4554.01
>    delay: min 84000 max 194000 mean 99463.2 std dev 12936.5
>    lost servo lock 1 times
>
> F. Transfer-level snapshotting in the DSPI driver (same as C), but in
>    poll mode.
>
>    offset: min -24244 max 1115 mean -230.478 std dev 2297.28
>    delay: min 69440 max 119040 mean 70312.9 std dev 8065.34
>    lost servo lock 1 times
>
> G. Byte-level snapshotting (same as D) but in poll mode.
>
>    offset: min -314 max 288 mean -2.48718 std dev 118.045
>    delay: min 4880 max 6000 mean 5118.63 std dev 507.258
>    lost servo lock 0 times
>
>    This seemed suspiciously good to me, so I let it run for longer
>    (58 minutes):
>
>    offset: min -26251 max 16416 mean -21.8672 std dev 863.416
>    delay: min 4720 max 57280 mean 5182.49 std dev 1607.19
>    lost servo lock 3 times
>
> H. Transfer-level snapshotting (same as F), but with IRQs disabled.
>    This ran for 86 minutes.
>
>    offset: min -1927 max 1843 mean -0.209203 std dev 529.398
>    delay: min 85440 max 93680 mean 88245 std dev 1454.71
>    lost servo lock 0 times
>
> I. Byte-level snapshotting (same as G), but with IRQs disabled.
>    This ran for 102 minutes.
>
>    offset: min -378 max 381 mean -0.0083089 std dev 101.495
>    delay: min 4720 max 5920 mean 5129.38 std dev 154.899
>    lost servo lock 0 times
>
> J. Default snapshotting taken by the SPI core, with the DSPI driver
>    running in poll mode, IRQs enabled. This ran for 274 minutes.
>
>    offset: min -42568 max 44576 mean 2.91646 std dev 947.467
>    delay: min 58480 max 171040 mean 80750.7 std dev 2001.61
>    lost servo lock 3 times
>
> As a result, this patchset proposes the implementation of scenario I.
> The others were done through temporary patches which are not presented
> here due to the difficulty of presenting a coherent git history without
> resorting to reverts etc. The gist of each experiment should be clear
> though.
>
> The raw data is available for dissection at
> https://drive.google.com/open?id=1r9raU9ZeqOqkqts6Lb-ISf5ubLDLP3wk.
> The logic analyzer captures can be opened with a free-as-in-beer program
> provided by Saleae: https://www.saleae.com/downloads/.
>
> In the capture data one can find the MOSI, SCK SPI signals, as well as a
> debug GPIO which was toggled at the same time as the PTP system
> timestamp was taken, to give the viewer an impression of what the
> software is capturing compared to the actual timing of the SPI transfer.
>
> Attached are also some close-up screenshots of transfers where there is
> a clear and huge delay in-between frames of the same 12-byte SPI
> transfer. As it turns out, these were all caused by the CPU getting
> interrupted by some other IRQ. Approaches H and I are the only ones that
> get rid of these glitches. In theory, the byte-level snapshotting should
> be less vulnerable to an IRQ interrupting the SPI transfer (because the
> time window is much smaller) but as the 58 minutes experiment shows, it
> is not immune.
>
> Vladimir Oltean (5):
>   spi: Use an abbreviated pointer to ctlr->cur_msg in
>     __spi_pump_messages
>   spi: Add a PTP system timestamp to the transfer structure
>   spi: spi-fsl-dspi: Use poll mode in case the platform IRQ is missing
>   spi: spi-fsl-dspi: Implement the PTP system timestamping for TCFQ mode
>   spi: spi-fsl-dspi: Disable interrupts and preemption during poll mode
>     transfer
>
>  drivers/spi/spi-fsl-dspi.c | 117 +++++++++++++++++++++++++++++++------
>  drivers/spi/spi.c          |  85 +++++++++++++++++++++++----
>  include/linux/spi/spi.h    |  38 ++++++++++++
>  3 files changed, 210 insertions(+), 30 deletions(-)
>
> --
> 2.17.1
>

To the PTP/DSA people copied,

It's possible that I'm going to get a lot of hate for saying this, but
I think we're all missing the forest for the trees with these
ptp_system_timestamp patches.
I'm going to start off with 4 truisms:
- The best software timestamp is worse than a hardware timestamp
- DSA switches are switches that are connected to their host over Ethernet
- Ethernet has support for hardware timestamping
- The mess of taking precise hardware timestamps is well hidden from the kernel

You might see where I'm going.
Maybe this is all really a DSA-specific problem, and should be treated
as such (kept contained).
The claimed goal is to synchronize the DSA switches' time with phc2sys
to/from something else. The problem is that there is latency for
reading the PHC on a DSA device that is a discrete chip.
What if all we need is just a mini-"phc2sys-over-Ethernet" that runs
on a kernel thread internally to DSA? We say that DSA switches are
"slave" to the "master" netdevice - their PTP clock can be the same.
I am fairly confident that the sja1105 at least can be configured in
hardware to work in this mode. One just needs to enable the CPU port
in its own reachability matrix. None of the switch ports is really a
"CPU port" hardware speaking.
- TX timestamps are taken by installing a management route with a
specified port destination. That destination can be the port the frame
came from (the CPU port) if the above condition is met.
- RX timestamps are taken by the switch for frames matching one of 2
MAC filters. Then a short Ethernet frame containing metadata (RX
timestamp) is created and sent to the CPU port. If I enable RX
timestamping on the CPU port, then every management frame sent from
Linux will also generate an RX timestamp (as well as a TX timestamp,
but that is irrelevant).
I believe something similar should be possible on other hardware as well.

The kernel thread can loop back an Ethernet frame and use the 4
collected timestamps to calculate offset and delay.
The only question is how to manage the sync direction (DSA switch to
master, or vice-versa).
It is assumed that the master netdevice supports hardware timestamping
and has a PHC with lower access jitter. That might be a more common
thing than an MDIO or SPI controller with polled I/O and software
timestamping implemented.

Looking forward to comments. If I'm wrong and we do need to extend the
SPI and MDIO subsystems, then I'd better be wrong in any way we look
at the problem, because the alternative is rather intrusive and
tedious to do right (not to mention, not very reusable).

Thanks,
-Vladimir

^ permalink raw reply

* Re: [PATCH 3/6] net: stmmac: sun8i: Use devm_regulator_get for PHY regulator
From: Andrew Lunn @ 2019-08-20 15:57 UTC (permalink / raw)
  To: David S. Miller, Rob Herring, Mark Rutland, Maxime Ripard,
	Chen-Yu Tsai, Giuseppe Cavallaro, Alexandre Torgue, Jose Abreu,
	Maxime Coquelin, devicetree, netdev, linux-kernel, linux-stm32,
	linux-arm-kernel
In-Reply-To: <20190820154714.2rt4ctovil5ol3u2@core.my.home>

On Tue, Aug 20, 2019 at 05:47:14PM +0200, Ondřej Jirman wrote:
> Hi Andrew,
> 
> On Tue, Aug 20, 2019 at 05:39:39PM +0200, Andrew Lunn wrote:
> > On Tue, Aug 20, 2019 at 04:53:40PM +0200, megous@megous.com wrote:
> > > From: Ondrej Jirman <megous@megous.com>
> > > 
> > > Use devm_regulator_get instead of devm_regulator_get_optional and rely
> > > on dummy supply. This avoids NULL checks before regulator_enable/disable
> > > calls.
> > 
> > Hi Ondrej
> > 
> > What do you mean by a dummy supply? I'm just trying to make sure you
> > are not breaking backwards compatibility.
> 
> Sorry, I mean dummy regulator. See:
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/regulator/core.c#L1874
> 
> On systems that use DT (i.e. have_full_constraints() == true), when the
> regulator is not found (ENODEV, not specified in DT), regulator_get will return
> a fake dummy regulator that can be enabled/disabled, but doesn't do anything
> real.

Hi Ondrej

But we also gain a new warning:

	dev_warn(dev,
		 "%s supply %s not found, using dummy regulator\n",
	         devname, id);

This regulator is clearly optional, so there should not be a warning.

Maybe you can add a new get_type, OPTIONAL_GET, which does not issue
the warning, but does give back a dummy regulator.

Thanks
	Andrew

^ permalink raw reply

* Re: [PATCH net] ixgbe: fix double clean of tx descriptors with xdp
From: Ilya Maximets @ 2019-08-20 15:58 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Netdev, LKML, bpf, David S. Miller, Björn Töpel,
	Magnus Karlsson, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Jeff Kirsher, intel-wired-lan, Eelco Chaudron,
	William Tu
In-Reply-To: <CAKgT0Udn0D0_f=SOH2wpBRWV_u4rb1Qe2h7gguXnRNzJ_VkRzg@mail.gmail.com>

On 20.08.2019 18:35, Alexander Duyck wrote:
> On Tue, Aug 20, 2019 at 8:18 AM Ilya Maximets <i.maximets@samsung.com> wrote:
>>
>> Tx code doesn't clear the descriptor status after cleaning.
>> So, if the budget is larger than number of used elems in a ring, some
>> descriptors will be accounted twice and xsk_umem_complete_tx will move
>> prod_tail far beyond the prod_head breaking the comletion queue ring.
>>
>> Fix that by limiting the number of descriptors to clean by the number
>> of used descriptors in the tx ring.
>>
>> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support")
>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> 
> I'm not sure this is the best way to go. My preference would be to
> have something in the ring that would prevent us from racing which I
> don't think this really addresses. I am pretty sure this code is safe
> on x86 but I would be worried about weak ordered systems such as
> PowerPC.
> 
> It might make sense to look at adding the eop_desc logic like we have
> in the regular path with a proper barrier before we write it and after
> we read it. So for example we could hold of on writing the bytecount
> value until the end of an iteration and call smp_wmb before we write
> it. Then on the cleanup we could read it and if it is non-zero we take
> an smp_rmb before proceeding further to process the Tx descriptor and
> clearing the value. Otherwise this code is going to just keep popping
> up with issues.

But, unlike regular case, xdp zero-copy xmit and clean for particular
tx ring always happens in the same NAPI context and even on the same
CPU core.

I saw the 'eop_desc' manipulations in regular case and yes, we could
use 'next_to_watch' field just as a flag of descriptor existence,
but it seems unnecessarily complicated. Am I missing something?

> 
>> ---
>>
>> Not tested yet because of lack of available hardware.
>> So, testing is very welcome.
>>
>>  drivers/net/ethernet/intel/ixgbe/ixgbe.h      | 10 ++++++++++
>>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 12 +-----------
>>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  |  6 ++++--
>>  3 files changed, 15 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
>> index 39e73ad60352..0befcef46e80 100644
>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
>> @@ -512,6 +512,16 @@ static inline u16 ixgbe_desc_unused(struct ixgbe_ring *ring)
>>         return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1;
>>  }
>>
>> +static inline u64 ixgbe_desc_used(struct ixgbe_ring *ring)
>> +{
>> +       unsigned int head, tail;
>> +
>> +       head = ring->next_to_clean;
>> +       tail = ring->next_to_use;
>> +
>> +       return ((head <= tail) ? tail : tail + ring->count) - head;
>> +}
>> +
>>  #define IXGBE_RX_DESC(R, i)        \
>>         (&(((union ixgbe_adv_rx_desc *)((R)->desc))[i]))
>>  #define IXGBE_TX_DESC(R, i)        \
>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>> index 7882148abb43..d417237857d8 100644
>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>> @@ -1012,21 +1012,11 @@ static u64 ixgbe_get_tx_completed(struct ixgbe_ring *ring)
>>         return ring->stats.packets;
>>  }
>>
>> -static u64 ixgbe_get_tx_pending(struct ixgbe_ring *ring)
>> -{
>> -       unsigned int head, tail;
>> -
>> -       head = ring->next_to_clean;
>> -       tail = ring->next_to_use;
>> -
>> -       return ((head <= tail) ? tail : tail + ring->count) - head;
>> -}
>> -
>>  static inline bool ixgbe_check_tx_hang(struct ixgbe_ring *tx_ring)
>>  {
>>         u32 tx_done = ixgbe_get_tx_completed(tx_ring);
>>         u32 tx_done_old = tx_ring->tx_stats.tx_done_old;
>> -       u32 tx_pending = ixgbe_get_tx_pending(tx_ring);
>> +       u32 tx_pending = ixgbe_desc_used(tx_ring);
>>
>>         clear_check_for_tx_hang(tx_ring);
>>
>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>> index 6b609553329f..7702efed356a 100644
>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>> @@ -637,6 +637,7 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
>>         u32 i = tx_ring->next_to_clean, xsk_frames = 0;
>>         unsigned int budget = q_vector->tx.work_limit;
>>         struct xdp_umem *umem = tx_ring->xsk_umem;
>> +       u32 used_descs = ixgbe_desc_used(tx_ring);
>>         union ixgbe_adv_tx_desc *tx_desc;
>>         struct ixgbe_tx_buffer *tx_bi;
>>         bool xmit_done;
>> @@ -645,7 +646,7 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
>>         tx_desc = IXGBE_TX_DESC(tx_ring, i);
>>         i -= tx_ring->count;
>>
>> -       do {
>> +       while (likely(budget && used_descs)) {
>>                 if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
>>                         break;
>>
>> @@ -673,7 +674,8 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
>>
>>                 /* update budget accounting */
>>                 budget--;
>> -       } while (likely(budget));
>> +               used_descs--;
>> +       }
>>
>>         i += tx_ring->count;
>>         tx_ring->next_to_clean = i;
>> --
>> 2.17.1
>>
> 
> 

^ permalink raw reply

* Re: [PATCH net-next 1/2] net: flow_offload: mangle 128-bit packet field with one action
From: Edward Cree @ 2019-08-20 16:00 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, davem, netdev, jakub.kicinski, jiri, vladbu
In-Reply-To: <20190820144453.ckme6oj2c4hmofhu@salvia>

On 20/08/2019 15:44, Pablo Neira Ayuso wrote:
> It looks to me this limitation is coming from tc pedit.
>
> Four actions to mangle an IPv6 address consume more memory when making
> the translation, and if you expect a lot of rules.
Your change means that now every pedit uses four hw entries, even if it
 was only meant to be a 32-bit mangle.  Host memory used to keep track of
 the pedit actions is cheap, hw entries in pedit tables are not.  Nor is
 driver implementation complexity.
NAK.

^ permalink raw reply

* Re: [PATCH 1/1] netfilter: nf_tables: fib: Drop IPV6 packages if IPv6 is disabled on boot
From: Leonardo Bras @ 2019-08-20 16:15 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netfilter-devel, coreteam, netdev, linux-kernel,
	Pablo Neira Ayuso, Jozsef Kadlecsik, David S. Miller
In-Reply-To: <20190820053607.GL2588@breakpoint.cc>

[-- Attachment #1: Type: text/plain, Size: 725 bytes --]

On Tue, 2019-08-20 at 07:36 +0200, Florian Westphal wrote:
> Wouldn't fib_netdev.c have the same problem?
Probably, but I haven't hit this issue yet.

> If so, might be better to place this test in both
> nft_fib6_eval_type and nft_fib6_eval.
I think that is possible, and not very hard to do.

But in my humble viewpoint, it looks like it's nft_fib_inet_eval() and
nft_fib_netdev_eval() have the responsibility to choose a valid
protocol or drop the package. 
I am not sure if it would be a good move to transfer this
responsibility to nft_fib6_eval_type() and nft_fib6_eval(), so I would
rather add the same test to nft_fib_netdev_eval().

Does it make sense?

Thanks for the feedback!

Leonardo Bras


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 1/6] dt-bindings: net: sun8i-a83t-emac: Add phy-supply property
From: Rob Herring @ 2019-08-20 16:18 UTC (permalink / raw)
  To: Ondrej Jirman
  Cc: David S. Miller, Mark Rutland, Maxime Ripard, Chen-Yu Tsai,
	Giuseppe Cavallaro, Alexandre Torgue, Jose Abreu, Maxime Coquelin,
	netdev, devicetree,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	linux-kernel@vger.kernel.org, linux-stm32
In-Reply-To: <20190820145343.29108-2-megous@megous.com>

On Tue, Aug 20, 2019 at 9:53 AM <megous@megous.com> wrote:
>
> From: Ondrej Jirman <megous@megous.com>
>
> This is already supported by the driver, but is missing from the
> bindings.

Really, the supply for the phy should be in the phy's node...

>
> Signed-off-by: Ondrej Jirman <megous@megous.com>
> ---
>  .../devicetree/bindings/net/allwinner,sun8i-a83t-emac.yaml    | 4 ++++
>  1 file changed, 4 insertions(+)

Reviewed-by: Rob Herring <robh@kernel.org>

^ permalink raw reply

* Re: [PATCH 3/6] net: stmmac: sun8i: Use devm_regulator_get for PHY regulator
From: Ondřej Jirman @ 2019-08-20 16:20 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David S. Miller, Rob Herring, Mark Rutland, Maxime Ripard,
	Chen-Yu Tsai, Giuseppe Cavallaro, Alexandre Torgue, Jose Abreu,
	Maxime Coquelin, devicetree, netdev, linux-kernel, linux-stm32,
	linux-arm-kernel
In-Reply-To: <20190820155744.GN29991@lunn.ch>

Hi,

On Tue, Aug 20, 2019 at 05:57:44PM +0200, Andrew Lunn wrote:
> On Tue, Aug 20, 2019 at 05:47:14PM +0200, Ondřej Jirman wrote:
> > Hi Andrew,
> > 
> > On Tue, Aug 20, 2019 at 05:39:39PM +0200, Andrew Lunn wrote:
> > > On Tue, Aug 20, 2019 at 04:53:40PM +0200, megous@megous.com wrote:
> > > > From: Ondrej Jirman <megous@megous.com>
> > > > 
> > > > Use devm_regulator_get instead of devm_regulator_get_optional and rely
> > > > on dummy supply. This avoids NULL checks before regulator_enable/disable
> > > > calls.
> > > 
> > > Hi Ondrej
> > > 
> > > What do you mean by a dummy supply? I'm just trying to make sure you
> > > are not breaking backwards compatibility.
> > 
> > Sorry, I mean dummy regulator. See:
> > 
> > https://elixir.bootlin.com/linux/latest/source/drivers/regulator/core.c#L1874
> > 
> > On systems that use DT (i.e. have_full_constraints() == true), when the
> > regulator is not found (ENODEV, not specified in DT), regulator_get will return
> > a fake dummy regulator that can be enabled/disabled, but doesn't do anything
> > real.
> 
> Hi Ondrej
> 
> But we also gain a new warning:
> 
> 	dev_warn(dev,
> 		 "%s supply %s not found, using dummy regulator\n",
> 	         devname, id);
> 
> This regulator is clearly optional, so there should not be a warning.
> 
> Maybe you can add a new get_type, OPTIONAL_GET, which does not issue
> the warning, but does give back a dummy regulator.

We already had a info message. See my other e-mail with the dmesg output.

IMO, that warning is useful during development, and more informative than the
previous one.

regards,
	o.

> Thanks
> 	Andrew
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH 2/6] dt-bindings: net: sun8i-a83t-emac: Add phy-io-supply property
From: Rob Herring @ 2019-08-20 16:20 UTC (permalink / raw)
  To: Ondrej Jirman
  Cc: David S. Miller, Mark Rutland, Maxime Ripard, Chen-Yu Tsai,
	Giuseppe Cavallaro, Alexandre Torgue, Jose Abreu, Maxime Coquelin,
	netdev, devicetree,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	linux-kernel@vger.kernel.org, linux-stm32
In-Reply-To: <20190820145343.29108-3-megous@megous.com>

On Tue, Aug 20, 2019 at 9:53 AM <megous@megous.com> wrote:
>
> From: Ondrej Jirman <megous@megous.com>
>
> Some PHYs require separate power supply for I/O pins in some modes
> of operation. Add phy-io-supply property, to allow enabling this
> power supply.

Perhaps since this is new, such phys should have *-supply in their nodes.

>
> Signed-off-by: Ondrej Jirman <megous@megous.com>
> ---
>  .../devicetree/bindings/net/allwinner,sun8i-a83t-emac.yaml    | 4 ++++
>  1 file changed, 4 insertions(+)

^ permalink raw reply

* RE: [PATCH] net/ncsi: add control packet payload to NC-SI commands from netlink
From: Justin.Lee1 @ 2019-08-20 16:29 UTC (permalink / raw)
  To: benwei; +Cc: netdev, openbmc, linux-kernel, sam, davem, dkodihal
In-Reply-To: <CH2PR15MB3686A4CEF8FA3B567078B4A1A3A80@CH2PR15MB3686.namprd15.prod.outlook.com>

Hi Ben, 

> Hi Justin, 
> 
> > Hi Ben,
> >
> > I have similar fix locally with different approach as the command handler may have some expectation for those byes.
> > We can use NCSI_PKT_CMD_OEM handler as it only copies data based on the payload length.
> 
> Great! Yes I was thinking the same, we just need some way to take data payload sent from netlink message and sent it over NC-SI.
> 
> >
> > diff --git a/net/ncsi/ncsi-cmd.c b/net/ncsi/ncsi-cmd.c index 5c3fad8..3b01f65 100644
> > --- a/net/ncsi/ncsi-cmd.c
> > +++ b/net/ncsi/ncsi-cmd.c
> > @@ -309,14 +309,19 @@ static struct ncsi_request *ncsi_alloc_command(struct ncsi_cmd_arg *nca)
> >  
> >  int ncsi_xmit_cmd(struct ncsi_cmd_arg *nca)  {
> > + struct ncsi_cmd_handler *nch = NULL;
> >         struct ncsi_request *nr;
> > + unsigned char type;
> >         struct ethhdr *eh;
> > -   struct ncsi_cmd_handler *nch = NULL;
> >         int i, ret;
> >  
> > + if (nca->req_flags == NCSI_REQ_FLAG_NETLINK_DRIVEN)
> > +         type = NCSI_PKT_CMD_OEM;
> > + else
> > +         type = nca->type;
> >         /* Search for the handler */
> >         for (i = 0; i < ARRAY_SIZE(ncsi_cmd_handlers); i++) {
> > -           if (ncsi_cmd_handlers[i].type == nca->type) {
> > +         if (ncsi_cmd_handlers[i].type == type) {
> >                         if (ncsi_cmd_handlers[i].handler)
> >                                 nch = &ncsi_cmd_handlers[i];
> >                         else
> >
> 
> So in this case NCSI_PKT_CMD_OEM would be the default handler for all NC-SI command over netlink  (standard and OEM), correct?
Yes, that is correct. The handler for NCSI_PKT_CMD_OEM command is generic.

> Should we rename this to something like NCSI_PKT_CMD_GENERIC for clarity perhaps?  Do you plan to upstream this patch?  
NCSI_PKT_CMD_OEM is a real command type and it is defined by the NC-SI specific. 
We can add comments to indicate that we use the generic command handler from NCSI_PKT_CMD_OEM command.

Does the change work for you? If so, I will prepare the patch.

> 
> 
> Also do you have local patch to support NCSI_PKT_CMD_PLDM and the PLDM over NC-SI commands defined here (https://www.dmtf.org/sites/default/files/NC-SI_1.2_PLDM_Support_over_RBT_Commands_Proposal.pdf)?
> If not I can send my local changes - but I think we can use the same NCSI_PKT_CMD_OEM handler to transport PLDM payload over NC-SI.
> What do you think?
No, I don't have any change currently to support these commands. It should be very similar to NCSI_PKT_CMD_OEM handler with some minor modification.

> 
> (CC Deepak as I think once this is in place we can use pldmtool to send basic PLDM payloads over NC-SI)
> 
> Regards,
> -Ben

Thanks,
Justin


^ permalink raw reply

* [PATCH v1] ocfs2/dlm: Move BITS_TO_BYTES() to bitops.h for wider use
From: Andy Shevchenko @ 2019-08-20 16:31 UTC (permalink / raw)
  To: Mark Fasheh, Joel Becker, Joseph Qi, ocfs2-devel, Ariel Elior,
	Sudarsana Kalluru, GR-everest-linux-l2, David S. Miller, netdev,
	Colin Ian King
  Cc: Andy Shevchenko

There are users already and will be more of BITS_TO_BYTES() macro.
Move it to bitops.h for wider use.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_init.h | 1 -
 fs/ocfs2/dlm/dlmcommon.h                         | 4 ----
 include/linux/bitops.h                           | 1 +
 3 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_init.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_init.h
index 066765fbef06..0a59a09ef82f 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_init.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_init.h
@@ -296,7 +296,6 @@ static inline void bnx2x_dcb_config_qm(struct bnx2x *bp, enum cos_mode mode,
  *    possible, the driver should only write the valid vnics into the internal
  *    ram according to the appropriate port mode.
  */
-#define BITS_TO_BYTES(x) ((x)/8)
 
 /* CMNG constants, as derived from system spec calculations */
 
diff --git a/fs/ocfs2/dlm/dlmcommon.h b/fs/ocfs2/dlm/dlmcommon.h
index aaf24548b02a..0463dce65bb2 100644
--- a/fs/ocfs2/dlm/dlmcommon.h
+++ b/fs/ocfs2/dlm/dlmcommon.h
@@ -688,10 +688,6 @@ struct dlm_begin_reco
 	__be32 pad2;
 };
 
-
-#define BITS_PER_BYTE 8
-#define BITS_TO_BYTES(bits) (((bits)+BITS_PER_BYTE-1)/BITS_PER_BYTE)
-
 struct dlm_query_join_request
 {
 	u8 node_idx;
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index cf074bce3eb3..79d80f5ddf7b 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -5,6 +5,7 @@
 #include <linux/bits.h>
 
 #define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
+#define BITS_TO_BYTES(nr)	DIV_ROUND_UP(nr, BITS_PER_BYTE)
 #define BITS_TO_LONGS(nr)	DIV_ROUND_UP(nr, BITS_PER_TYPE(long))
 
 extern unsigned int __sw_hweight8(unsigned int w);
-- 
2.23.0.rc1


^ permalink raw reply related

* Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Cornelia Huck @ 2019-08-20 16:31 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Christophe de Dinechin, Alex Williamson, Jiri Pirko,
	David S . Miller, Kirti Wankhede, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, cjia, netdev@vger.kernel.org
In-Reply-To: <AM0PR05MB4866EBB51F7019F2E3D9918CD1AB0@AM0PR05MB4866.eurprd05.prod.outlook.com>

On Tue, 20 Aug 2019 11:25:05 +0000
Parav Pandit <parav@mellanox.com> wrote:

> > -----Original Message-----
> > From: Christophe de Dinechin <christophe.de.dinechin@gmail.com>
> > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > 
> > 
> > Parav Pandit writes:
> >   
> > > + Dave.
> > >
> > > Hi Jiri, Dave, Alex, Kirti, Cornelia,
> > >
> > > Please provide your feedback on it, how shall we proceed?
> > >
> > > Hence, I would like to discuss below options.
> > >
> > > Option-1: mdev index
> > > Introduce an optional mdev index/handle as u32 during mdev create time.
> > > User passes mdev index/handle as input.
> > >
> > > phys_port_name=mIndex=m%u
> > > mdev_index will be available in sysfs as mdev attribute for udev to name the  
> > mdev's netdev.  
> > >
> > > example mdev create command:
> > > UUID=$(uuidgen)
> > > echo $UUID index=10 >
> > > /sys/class/net/ens2f0/mdev_supported_types/mlx5_core_mdev/create
> > > example netdevs:
> > > repnetdev=ens2f0_m10	/*ens2f0 is parent PF's netdevice */
> > > mdev_netdev=enm10
> > >
> > > Pros:
> > > 1. mdevctl and any other existing tools are unaffected.
> > > 2. netdev stack, ovs and other switching platforms are unaffected.
> > > 3. achieves unique phys_port_name for representor netdev 4. achieves
> > > unique mdev eth netdev name for the mdev using udev/systemd extension.
> > > 5. Aligns well with mdev and netdev subsystem and similar to existing sriov  
> > bdf's.  
> > >
> > > Option-2: shorter mdev name
> > > Extend mdev to have shorter mdev device name in addition to UUID.
> > > such as 'foo', 'bar'.
> > > Mdev will continue to have UUID.

I fail to understand how 'uses uuid' and 'allow shorter device name'
are supposed to play together?

> > > phys_port_name=mdev_name
> > >
> > > Pros:
> > > 1. All same as option-1, except mdevctl needs upgrade for newer usage.
> > > It is common practice to upgrade iproute2 package along with the kernel.
> > > Similar practice to be done with mdevctl.
> > > 2. Newer users of mdevctl who wants to work with non_UUID names, will use  
> > newer mdevctl/tools.  
> > > Cons:
> > > 1. Dual naming scheme of mdev might affect some of the existing tools.
> > > It's unclear how/if it actually affects.
> > > mdevctl [2] is very recently developed and can be enhanced for dual naming  
> > scheme.  

The main problem is not tools we know about (i.e. mdevctl), but those we
don't know about.

IOW, this (and the IFNAMESIZ change, which seems even worse) are the
options I would not want at all.

> > >
> > > Option-3: mdev uuid alias
> > > Instead of shorter mdev name or mdev index, have alpha-numeric name  
> > alias.  
> > > Alias is an optional mdev sysfs attribute such as 'foo', 'bar'.
> > > example mdev create command:
> > > UUID=$(uuidgen)
> > > echo $UUID alias=foo >
> > > /sys/class/net/ens2f0/mdev_supported_types/mlx5_core_mdev/create
> > > example netdevs:
> > > examle netdevs:
> > > repnetdev = ens2f0_mfoo
> > > mdev_netdev=enmfoo
> > >
> > > Pros:
> > > 1. All same as option-1.
> > > 2. Doesn't affect existing mdev naming scheme.
> > > Cons:
> > > 1. Index scheme of option-1 is better which can number large number of  
> > mdevs with fewer characters, simplifying the management tool.
> > 
> > I believe that Alex pointed out another "Cons" to all three options, which is that
> > it forces user-space to resolve potential race conditions when creating an index
> > or short name or alias.
> >   
> This race condition exists for at least two subsystems that I know of, i.e. netdev and rdma.
> If a device with a given name exists, subsystem returns error.
> When user space gets error code EEXIST, and it can picks up different identifier(s).

If you decouple device creation and setting the alias/index, you make
the issue visible and thus much more manageable.

> 
> > Also, what happens if `index=10` is not provided on the command-line?
> > Does that make the device unusable for your purpose?  
> Yes, it is unusable to an extent.
> Currently we have DEVLINK_PORT_FLAVOUR_PCI_VF in include/uapi/linux/devlink.h
> Similar to it, we need to have DEVLINK_PORT_FLAVOUR_MDEV for mdev eswitch ports.
> This port flavour needs to generate phys_port_name(). This should be user parameter driven.
> Because representor netdevice name is generated based on this parameter.

I'm also unsure how the extra parameter is supposed to work; writing it
to the create attribute does not sound right.

mdevctl supports setting additional parameters on an already created
device (see the examples provided for vfio-ap), so going that route
would actually work out of the box from the tooling side.

What you would need is some kind of synchronization/locking to make
sure that you only link up to the other device after the extra
attribute has been set and that you don't allow to change it as long as
it is associated with the other side. I do not know enough about the
actual devices to suggest something here; if you need userspace
cooperation, maybe uevents would be an option.

^ permalink raw reply

* Re: [PATCH 2/6] dt-bindings: net: sun8i-a83t-emac: Add phy-io-supply property
From: Ondřej Jirman @ 2019-08-20 16:34 UTC (permalink / raw)
  To: Rob Herring
  Cc: David S. Miller, Mark Rutland, Maxime Ripard, Chen-Yu Tsai,
	Giuseppe Cavallaro, Alexandre Torgue, Jose Abreu, Maxime Coquelin,
	netdev, devicetree,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	linux-kernel@vger.kernel.org, linux-stm32
In-Reply-To: <CAL_JsqLHeA6A_+ZgmCzC42Y6yJrEq6+D3vKn8ETh2D7LJ+1_-g@mail.gmail.com>

On Tue, Aug 20, 2019 at 11:20:22AM -0500, Rob Herring wrote:
> On Tue, Aug 20, 2019 at 9:53 AM <megous@megous.com> wrote:
> >
> > From: Ondrej Jirman <megous@megous.com>
> >
> > Some PHYs require separate power supply for I/O pins in some modes
> > of operation. Add phy-io-supply property, to allow enabling this
> > power supply.
> 
> Perhaps since this is new, such phys should have *-supply in their nodes.

Yes, I just don't understand, since external ethernet phys are so common,
and they require power, how there's no fairly generic mechanism for this
already in the PHY subsystem, or somewhere?

It looks like other ethernet mac drivers also implement supplies on phys
on the EMAC nodes. Just grep phy-supply through dt-bindings/net.

Historical reasons, or am I missing something? It almost seems like I must
be missing something, since putting these properties to phy nodes
seems so obvious.

thank you and regards,
	Ondrej

> >
> > Signed-off-by: Ondrej Jirman <megous@megous.com>
> > ---
> >  .../devicetree/bindings/net/allwinner,sun8i-a83t-emac.yaml    | 4 ++++
> >  1 file changed, 4 insertions(+)

^ permalink raw reply

* Re: [PATCH mlx5-next 0/5] Mellanox, Updates for mlx5-next branch 2019-08-15
From: Doug Ledford @ 2019-08-20 16:44 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org
In-Reply-To: <20190815194543.14369-1-saeedm@mellanox.com>

[-- Attachment #1: Type: text/plain, Size: 760 bytes --]

On Thu, 2019-08-15 at 19:46 +0000, Saeed Mahameed wrote:
> Hi All,
> 
> This series includes misc updates for mlx5-next shared branch.
> 
> mlx5 HW spec and bits updates:
> 1) Aya exposes IP-in-IP capability in mlx5_core.
> 2) Maxim exposes lag tx port affinity capabilities.
> 3) Moshe adds VNIC_ENV internal rq counter bits.
> 
> Misc updates:
> 4) Saeed, two compiler warnings cleanups
> 
> In case of no objection this series will be applied to mlx5-next
> branch
> and sent later as pull request to both rdma-next and net-next
> branches.
> 
> Thanks,
> Saeed.

Series looks fine to me.

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [RFC PATCH net-next 03/11] spi: Add a PTP system timestamp to the transfer structure
From: Mark Brown @ 2019-08-20 16:49 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Hubert Feurstein, mlichvar, Richard Cochran, Andrew Lunn,
	Florian Fainelli, linux-spi, netdev
In-Reply-To: <CA+h21hr653oqOPxoJKWkP9ZhTywNR8EBjWV7U9LHwPRz=PJXsw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1256 bytes --]

On Tue, Aug 20, 2019 at 04:48:39PM +0300, Vladimir Oltean wrote:
> On Tue, 20 Aug 2019 at 15:55, Mark Brown <broonie@kernel.org> wrote:
> > On Fri, Aug 16, 2019 at 05:05:53PM +0300, Vladimir Oltean wrote:

> > > Maybe the SPI master driver should just report what sort of
> > > snapshotting capability it can offer, ranging from none (default
> > > unless otherwise specified), to transfer-level (DMA style) or
> > > byte-level.

> > That does then have the consequence that the majority of controllers
> > aren't going to be usable with the API which isn't great.

> Can we continue this discussion on this thread:
> https://www.spinics.net/lists/netdev/msg593772.html
> The whole point there is that if there's nothing that the driver can
> do, the SPI core will take the timestamps and record their (bad)
> precision.

I'm not on that thread.

> > I'm not 100% clear what the problem you're trying to solve is, or if
> > it's a sensible problem to try to solve for that matter.

> The problem can simply be summarized as: you're trying to read a clock
> over SPI, but there's so much timing jitter in you doing that, that
> you have a high degree of uncertainty in the actual precision of the
> readout you took.

That doesn't seem like a great idea...

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH bpf-next v2 0/5] bpf: list BTF objects loaded on system
From: Alexei Starovoitov @ 2019-08-20 16:55 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Network Development,
	oss-drivers
In-Reply-To: <20190820093154.14042-1-quentin.monnet@netronome.com>

On Tue, Aug 20, 2019 at 2:32 AM Quentin Monnet
<quentin.monnet@netronome.com> wrote:
>
> Hi,
> This set adds a new command BPF_BTF_GET_NEXT_ID to the bpf() system call,
> adds the relevant API function in libbpf, and uses it in bpftool to list
> all BTF objects loaded on the system (and to dump the ids of maps and
> programs associated with them, if any).
>
> The main motivation of listing BTF objects is introspection and debugging
> purposes. By getting BPF program and map information, it should already be
> possible to list all BTF objects associated to at least one map or one
> program. But there may be unattached BTF objects, held by a file descriptor
> from a user space process only, and we may want to list them too.
>
> As a side note, it also turned useful for examining the BTF objects
> attached to offloaded programs, which would not show in program information
> because the BTF id is not copied when retrieving such info. A fix is in
> progress on that side.
>
> v2:
> - Rebase patch with new libbpf function on top of Andrii's changes
>   regarding libbpf versioning.

Applied. Thanks

^ permalink raw reply

* Re: [PATCH net-next v3 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts
From: Hubert Feurstein @ 2019-08-20 16:56 UTC (permalink / raw)
  To: Miroslav Lichvar
  Cc: Andrew Lunn, netdev, lkml, Richard Cochran, Florian Fainelli,
	Heiner Kallweit, Vladimir Oltean, David S. Miller
In-Reply-To: <20190820154005.GM891@localhost>

Am Di., 20. Aug. 2019 um 17:40 Uhr schrieb Miroslav Lichvar
<mlichvar@redhat.com>:
>
> On Tue, Aug 20, 2019 at 05:23:06PM +0200, Andrew Lunn wrote:
> > > - take a second "post" system timestamp after the completion
> >
> > For this hardware, completion is an interrupt, which has a lot of
> > jitter on it. But this hardware is odd, in that it uses an
> > interrupt. Every other MDIO bus controller uses polled IO, with an
> > mdelay(10) or similar between each poll. So the jitter is going to be
> > much larger.
>
> I think a large jitter is ok in this case. We just need to timestamp
> something that we know for sure happened after the PHC timestamp. It
> should have no impact on the offset and its stability, just the
> reported delay. A test with phc2sys should be able to confirm that.
> phc2sys selects the measurement with the shortest delay, which has
> least uncertainty. I'd say that applies to both interrupt and polling.
>
> If it is difficult to specify the minimum interrupt delay, I'd still
> prefer an overly pessimistic interval assuming a zero delay.
>
Currently I do not see the benefit from this. The original intention was to
compensate for the remaining offset as good as possible. The current code
of phc2sys uses the delay only for the filtering of the measurement record
with the shortest delay and for reporting and statistics. Why not simple shift
the timestamps with the offset to the point where we expect the PHC timestamp
to be captured, and we have a very good result compared to where we came
from.

Hubert

^ permalink raw reply

* Re: [PATCH 2/6] dt-bindings: net: sun8i-a83t-emac: Add phy-io-supply property
From: Rob Herring @ 2019-08-20 16:57 UTC (permalink / raw)
  To: Rob Herring, David S. Miller, Mark Rutland, Maxime Ripard,
	Chen-Yu Tsai, Giuseppe Cavallaro, Alexandre Torgue, Jose Abreu,
	Maxime Coquelin, netdev, devicetree,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	linux-kernel@vger.kernel.org, linux-stm32
In-Reply-To: <20190820163433.sr4lvjxmmhjtbtcb@core.my.home>

On Tue, Aug 20, 2019 at 11:34 AM Ondřej Jirman <megous@megous.com> wrote:
>
> On Tue, Aug 20, 2019 at 11:20:22AM -0500, Rob Herring wrote:
> > On Tue, Aug 20, 2019 at 9:53 AM <megous@megous.com> wrote:
> > >
> > > From: Ondrej Jirman <megous@megous.com>
> > >
> > > Some PHYs require separate power supply for I/O pins in some modes
> > > of operation. Add phy-io-supply property, to allow enabling this
> > > power supply.
> >
> > Perhaps since this is new, such phys should have *-supply in their nodes.
>
> Yes, I just don't understand, since external ethernet phys are so common,
> and they require power, how there's no fairly generic mechanism for this
> already in the PHY subsystem, or somewhere?

Because generic mechanisms for this don't work. For example, what
happens when the 2 supplies need to be turned on in a certain order
and with certain timings? And then add in reset or control lines into
the mix... You can see in the bindings we already have some of that.

> It looks like other ethernet mac drivers also implement supplies on phys
> on the EMAC nodes. Just grep phy-supply through dt-bindings/net.
>
> Historical reasons, or am I missing something? It almost seems like I must
> be missing something, since putting these properties to phy nodes
> seems so obvious.

Things get added one by one and one new property isn't that
controversial. We've generally learned the lesson and avoid this
pattern now, but ethernet phys are one of the older bindings.

Rob

^ permalink raw reply

* Re: [PATCH spi for-5.4 0/5] Deterministic SPI latency with NXP DSPI driver
From: Andrew Lunn @ 2019-08-20 16:57 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Mark Brown, Hubert Feurstein, mlichvar, Richard Cochran,
	Florian Fainelli, linux-spi, netdev
In-Reply-To: <CA+h21hr4UcoJK7upNJjG0ibtX7CkF=akxVdrb--1AJn6-z=sUQ@mail.gmail.com>

> - Ethernet has support for hardware timestamping

True, but not all drivers support it.

Marvell switches are often combined with Marvell MACs. None of the
Marvell MAC drivers, mv643xx, mvneta, mvpp2 or octeontx2 support
hardware timestamping. My guess is, the hardware probably supports it,
but nobody has taken the time to writing the driver code. FEC does
have PTP support, but what does it look like in general? Are Marvell
drives the exception, or the norm?

What we have been talking about in this thread, adding timestamp calls
at various places, is simple, compared to adding driver code to a
number of MAC drivers. We can get a lot of 'bang for our buck' with
time stamps, it is easy to copy to other drivers, you don't need a
good knowledge of the hardware, datasheet, etc.

So if we can get this accepted, we should try to do it. You can always
come back later and add your full hardware solution.

   Andrew

^ permalink raw reply

* Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Alex Williamson @ 2019-08-20 17:19 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jiri Pirko, David S . Miller, Kirti Wankhede, Cornelia Huck,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	cjia@nvidia.com, netdev@vger.kernel.org
In-Reply-To: <AM0PR05MB48668B6221E477A873688CDBD1AB0@AM0PR05MB4866.eurprd05.prod.outlook.com>

On Tue, 20 Aug 2019 08:58:02 +0000
Parav Pandit <parav@mellanox.com> wrote:

> + Dave.
> 
> Hi Jiri, Dave, Alex, Kirti, Cornelia,
> 
> Please provide your feedback on it, how shall we proceed?
> 
> Short summary of requirements.
> For a given mdev (mediated device [1]), there is one representor
> netdevice and devlink port in switchdev mode (similar to SR-IOV VF),
> And there is one netdevice for the actual mdev when mdev is probed.
> 
> (a) representor netdev and devlink port should be able derive
> phys_port_name(). So that representor netdev name can be built
> deterministically across reboots.
> 
> (b) for mdev's netdevice, mdev's device should have an attribute.
> This attribute can be used by udev rules/systemd or something else to
> rename netdev name deterministically.
> 
> (c) IFNAMSIZ of 16 bytes is too small to fit whole UUID.
> A simple grep IFNAMSIZ in stack hints hundreds of users of IFNAMSIZ
> in drivers, uapi, netlink, boot config area and more. Changing
> IFNAMSIZ for a mdev bus doesn't really look reasonable option to me.

How many characters do we really have to work with?  Your examples
below prepend various characters, ex. option-1 results in ens2f0_m10 or
enm10.  Do the extra 8 or 3 characters in these count against IFNAMSIZ?

> Hence, I would like to discuss below options.
> 
> Option-1: mdev index
> Introduce an optional mdev index/handle as u32 during mdev create
> time. User passes mdev index/handle as input.
> 
> phys_port_name=mIndex=m%u
> mdev_index will be available in sysfs as mdev attribute for udev to
> name the mdev's netdev.
> 
> example mdev create command:
> UUID=$(uuidgen)
> echo $UUID index=10
> > /sys/class/net/ens2f0/mdev_supported_types/mlx5_core_mdev/create

Nit, IIRC previous discussions of additional parameters used comma
separators, ex. echo $UUID,index=10 >...

> > example netdevs:
> repnetdev=ens2f0_m10	/*ens2f0 is parent PF's netdevice */

Is the parent really relevant in the name?  Tools like mdevctl are
meant to provide persistence, creating the same mdev devices on the
same parent, but that's simply the easiest policy decision.  We can
also imagine that multiple parent devices might support a specified
mdev type and policies factoring in proximity, load-balancing, power
consumption, etc might be weighed such that we really don't want to
promote userspace creating dependencies on the parent association.

> mdev_netdev=enm10
> 
> Pros:
> 1. mdevctl and any other existing tools are unaffected.
> 2. netdev stack, ovs and other switching platforms are unaffected.
> 3. achieves unique phys_port_name for representor netdev
> 4. achieves unique mdev eth netdev name for the mdev using
> udev/systemd extension. 5. Aligns well with mdev and netdev subsystem
> and similar to existing sriov bdf's.

A user provided index seems strange to me.  It's not really an index,
just a user specified instance number.  Presumably you have the user
providing this because if it really were an index, then the value
depends on the creation order and persistence is lost.  Now the user
needs to both avoid uuid collision as well as "index" number
collision.  The uuid namespace is large enough to mostly ignore this,
but this is not.  This seems like a burden.

> Option-2: shorter mdev name
> Extend mdev to have shorter mdev device name in addition to UUID.
> such as 'foo', 'bar'.
> Mdev will continue to have UUID.
> phys_port_name=mdev_name
> 
> Pros:
> 1. All same as option-1, except mdevctl needs upgrade for newer usage.
> It is common practice to upgrade iproute2 package along with the
> kernel. Similar practice to be done with mdevctl.
> 2. Newer users of mdevctl who wants to work with non_UUID names, will
> use newer mdevctl/tools. Cons:
> 1. Dual naming scheme of mdev might affect some of the existing tools.
> It's unclear how/if it actually affects.
> mdevctl [2] is very recently developed and can be enhanced for dual
> naming scheme.

I think we've already nak'ed this one, the device namespace becomes
meaningless if the name becomes just a string where a uuid might be an
example string.  mdevs are named by uuid.
 
> Option-3: mdev uuid alias
> Instead of shorter mdev name or mdev index, have alpha-numeric name
> alias. Alias is an optional mdev sysfs attribute such as 'foo', 'bar'.
> example mdev create command:
> UUID=$(uuidgen)
> echo $UUID alias=foo
> > /sys/class/net/ens2f0/mdev_supported_types/mlx5_core_mdev/create
> > example netdevs:
> examle netdevs:
> repnetdev = ens2f0_mfoo
> mdev_netdev=enmfoo
> 
> Pros:
> 1. All same as option-1.
> 2. Doesn't affect existing mdev naming scheme.
> Cons:
> 1. Index scheme of option-1 is better which can number large number
> of mdevs with fewer characters, simplifying the management tool.

No better than option-1, simply a larger secondary namespace, but still
requires the user to come up with two independent names for the device.

> Option-4: extend IFNAMESZ to be 64 bytes Extended IFNAMESZ from 16 to
> 64 bytes phys_port_name=mdev_UUID_string mdev_netdev_name=enmUUID
> 
> Pros:
> 1. Doesn't require mdev extension
> Cons:
> 1. netdev stack, driver, uapi, user space, boot config wide changes
> 2. Possible user space extensions who assumed name size being 16
> characters 3. Single device type demands namesize change for all
> netdev types

What about an alias based on the uuid?  For example, we use 160-bit
sha1s daily with git (uuids are only 128-bit), but we generally don't
reference git commits with the full 20 character string.  Generally 12
characters is recommended to avoid ambiguity.  Could mdev automatically
create an abbreviated sha1 alias for the device?  If so, how many
characters should we use and what do we do on collision?  The colliding
device could add enough alias characters to disambiguate (we likely
couldn't re-alias the existing device to disambiguate, but I'm not sure
it matters, userspace has sysfs to associate aliases).  Ex.

UUID=$(uuidgen)
ALIAS=$(echo $UUID | sha1sum | colrm 13)

Since there seems to be some prefix overhead, as I ask about above in
how many characters we actually have to work with in IFNAMESZ, maybe we
start with 8 characters (matching your "index" namespace) and expand as
necessary for disambiguation.  If we can eliminate overhead in
IFNAMESZ, let's start with 12.  Thanks,

Alex

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox