Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 2/6] dt-bindings: net: sun8i-a83t-emac: Add phy-io-supply property
From: Rob Herring @ 2019-08-20 16:20 UTC (permalink / raw)
  To: Ondrej Jirman
  Cc: David S. Miller, Mark Rutland, Maxime Ripard, Chen-Yu Tsai,
	Giuseppe Cavallaro, Alexandre Torgue, Jose Abreu, Maxime Coquelin,
	netdev, devicetree,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	linux-kernel@vger.kernel.org, linux-stm32
In-Reply-To: <20190820145343.29108-3-megous@megous.com>

On Tue, Aug 20, 2019 at 9:53 AM <megous@megous.com> wrote:
>
> From: Ondrej Jirman <megous@megous.com>
>
> Some PHYs require separate power supply for I/O pins in some modes
> of operation. Add phy-io-supply property, to allow enabling this
> power supply.

Perhaps since this is new, such phys should have *-supply in their nodes.

>
> Signed-off-by: Ondrej Jirman <megous@megous.com>
> ---
>  .../devicetree/bindings/net/allwinner,sun8i-a83t-emac.yaml    | 4 ++++
>  1 file changed, 4 insertions(+)

^ permalink raw reply

* Re: [PATCH 3/6] net: stmmac: sun8i: Use devm_regulator_get for PHY regulator
From: Ondřej Jirman @ 2019-08-20 16:20 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David S. Miller, Rob Herring, Mark Rutland, Maxime Ripard,
	Chen-Yu Tsai, Giuseppe Cavallaro, Alexandre Torgue, Jose Abreu,
	Maxime Coquelin, devicetree, netdev, linux-kernel, linux-stm32,
	linux-arm-kernel
In-Reply-To: <20190820155744.GN29991@lunn.ch>

Hi,

On Tue, Aug 20, 2019 at 05:57:44PM +0200, Andrew Lunn wrote:
> On Tue, Aug 20, 2019 at 05:47:14PM +0200, Ondřej Jirman wrote:
> > Hi Andrew,
> > 
> > On Tue, Aug 20, 2019 at 05:39:39PM +0200, Andrew Lunn wrote:
> > > On Tue, Aug 20, 2019 at 04:53:40PM +0200, megous@megous.com wrote:
> > > > From: Ondrej Jirman <megous@megous.com>
> > > > 
> > > > Use devm_regulator_get instead of devm_regulator_get_optional and rely
> > > > on dummy supply. This avoids NULL checks before regulator_enable/disable
> > > > calls.
> > > 
> > > Hi Ondrej
> > > 
> > > What do you mean by a dummy supply? I'm just trying to make sure you
> > > are not breaking backwards compatibility.
> > 
> > Sorry, I mean dummy regulator. See:
> > 
> > https://elixir.bootlin.com/linux/latest/source/drivers/regulator/core.c#L1874
> > 
> > On systems that use DT (i.e. have_full_constraints() == true), when the
> > regulator is not found (ENODEV, not specified in DT), regulator_get will return
> > a fake dummy regulator that can be enabled/disabled, but doesn't do anything
> > real.
> 
> Hi Ondrej
> 
> But we also gain a new warning:
> 
> 	dev_warn(dev,
> 		 "%s supply %s not found, using dummy regulator\n",
> 	         devname, id);
> 
> This regulator is clearly optional, so there should not be a warning.
> 
> Maybe you can add a new get_type, OPTIONAL_GET, which does not issue
> the warning, but does give back a dummy regulator.

We already had a info message. See my other e-mail with the dmesg output.

IMO, that warning is useful during development, and more informative than the
previous one.

regards,
	o.

> Thanks
> 	Andrew
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH 1/6] dt-bindings: net: sun8i-a83t-emac: Add phy-supply property
From: Rob Herring @ 2019-08-20 16:18 UTC (permalink / raw)
  To: Ondrej Jirman
  Cc: David S. Miller, Mark Rutland, Maxime Ripard, Chen-Yu Tsai,
	Giuseppe Cavallaro, Alexandre Torgue, Jose Abreu, Maxime Coquelin,
	netdev, devicetree,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	linux-kernel@vger.kernel.org, linux-stm32
In-Reply-To: <20190820145343.29108-2-megous@megous.com>

On Tue, Aug 20, 2019 at 9:53 AM <megous@megous.com> wrote:
>
> From: Ondrej Jirman <megous@megous.com>
>
> This is already supported by the driver, but is missing from the
> bindings.

Really, the supply for the phy should be in the phy's node...

>
> Signed-off-by: Ondrej Jirman <megous@megous.com>
> ---
>  .../devicetree/bindings/net/allwinner,sun8i-a83t-emac.yaml    | 4 ++++
>  1 file changed, 4 insertions(+)

Reviewed-by: Rob Herring <robh@kernel.org>

^ permalink raw reply

* Re: [PATCH 1/1] netfilter: nf_tables: fib: Drop IPV6 packages if IPv6 is disabled on boot
From: Leonardo Bras @ 2019-08-20 16:15 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netfilter-devel, coreteam, netdev, linux-kernel,
	Pablo Neira Ayuso, Jozsef Kadlecsik, David S. Miller
In-Reply-To: <20190820053607.GL2588@breakpoint.cc>

[-- Attachment #1: Type: text/plain, Size: 725 bytes --]

On Tue, 2019-08-20 at 07:36 +0200, Florian Westphal wrote:
> Wouldn't fib_netdev.c have the same problem?
Probably, but I haven't hit this issue yet.

> If so, might be better to place this test in both
> nft_fib6_eval_type and nft_fib6_eval.
I think that is possible, and not very hard to do.

But in my humble viewpoint, it looks like it's nft_fib_inet_eval() and
nft_fib_netdev_eval() have the responsibility to choose a valid
protocol or drop the package. 
I am not sure if it would be a good move to transfer this
responsibility to nft_fib6_eval_type() and nft_fib6_eval(), so I would
rather add the same test to nft_fib_netdev_eval().

Does it make sense?

Thanks for the feedback!

Leonardo Bras


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 1/2] net: flow_offload: mangle 128-bit packet field with one action
From: Edward Cree @ 2019-08-20 16:00 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, davem, netdev, jakub.kicinski, jiri, vladbu
In-Reply-To: <20190820144453.ckme6oj2c4hmofhu@salvia>

On 20/08/2019 15:44, Pablo Neira Ayuso wrote:
> It looks to me this limitation is coming from tc pedit.
>
> Four actions to mangle an IPv6 address consume more memory when making
> the translation, and if you expect a lot of rules.
Your change means that now every pedit uses four hw entries, even if it
 was only meant to be a 32-bit mangle.  Host memory used to keep track of
 the pedit actions is cheap, hw entries in pedit tables are not.  Nor is
 driver implementation complexity.
NAK.

^ permalink raw reply

* Re: [PATCH net] ixgbe: fix double clean of tx descriptors with xdp
From: Ilya Maximets @ 2019-08-20 15:58 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Netdev, LKML, bpf, David S. Miller, Björn Töpel,
	Magnus Karlsson, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Jeff Kirsher, intel-wired-lan, Eelco Chaudron,
	William Tu
In-Reply-To: <CAKgT0Udn0D0_f=SOH2wpBRWV_u4rb1Qe2h7gguXnRNzJ_VkRzg@mail.gmail.com>

On 20.08.2019 18:35, Alexander Duyck wrote:
> On Tue, Aug 20, 2019 at 8:18 AM Ilya Maximets <i.maximets@samsung.com> wrote:
>>
>> Tx code doesn't clear the descriptor status after cleaning.
>> So, if the budget is larger than number of used elems in a ring, some
>> descriptors will be accounted twice and xsk_umem_complete_tx will move
>> prod_tail far beyond the prod_head breaking the comletion queue ring.
>>
>> Fix that by limiting the number of descriptors to clean by the number
>> of used descriptors in the tx ring.
>>
>> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support")
>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> 
> I'm not sure this is the best way to go. My preference would be to
> have something in the ring that would prevent us from racing which I
> don't think this really addresses. I am pretty sure this code is safe
> on x86 but I would be worried about weak ordered systems such as
> PowerPC.
> 
> It might make sense to look at adding the eop_desc logic like we have
> in the regular path with a proper barrier before we write it and after
> we read it. So for example we could hold of on writing the bytecount
> value until the end of an iteration and call smp_wmb before we write
> it. Then on the cleanup we could read it and if it is non-zero we take
> an smp_rmb before proceeding further to process the Tx descriptor and
> clearing the value. Otherwise this code is going to just keep popping
> up with issues.

But, unlike regular case, xdp zero-copy xmit and clean for particular
tx ring always happens in the same NAPI context and even on the same
CPU core.

I saw the 'eop_desc' manipulations in regular case and yes, we could
use 'next_to_watch' field just as a flag of descriptor existence,
but it seems unnecessarily complicated. Am I missing something?

> 
>> ---
>>
>> Not tested yet because of lack of available hardware.
>> So, testing is very welcome.
>>
>>  drivers/net/ethernet/intel/ixgbe/ixgbe.h      | 10 ++++++++++
>>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 12 +-----------
>>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  |  6 ++++--
>>  3 files changed, 15 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
>> index 39e73ad60352..0befcef46e80 100644
>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
>> @@ -512,6 +512,16 @@ static inline u16 ixgbe_desc_unused(struct ixgbe_ring *ring)
>>         return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1;
>>  }
>>
>> +static inline u64 ixgbe_desc_used(struct ixgbe_ring *ring)
>> +{
>> +       unsigned int head, tail;
>> +
>> +       head = ring->next_to_clean;
>> +       tail = ring->next_to_use;
>> +
>> +       return ((head <= tail) ? tail : tail + ring->count) - head;
>> +}
>> +
>>  #define IXGBE_RX_DESC(R, i)        \
>>         (&(((union ixgbe_adv_rx_desc *)((R)->desc))[i]))
>>  #define IXGBE_TX_DESC(R, i)        \
>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>> index 7882148abb43..d417237857d8 100644
>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>> @@ -1012,21 +1012,11 @@ static u64 ixgbe_get_tx_completed(struct ixgbe_ring *ring)
>>         return ring->stats.packets;
>>  }
>>
>> -static u64 ixgbe_get_tx_pending(struct ixgbe_ring *ring)
>> -{
>> -       unsigned int head, tail;
>> -
>> -       head = ring->next_to_clean;
>> -       tail = ring->next_to_use;
>> -
>> -       return ((head <= tail) ? tail : tail + ring->count) - head;
>> -}
>> -
>>  static inline bool ixgbe_check_tx_hang(struct ixgbe_ring *tx_ring)
>>  {
>>         u32 tx_done = ixgbe_get_tx_completed(tx_ring);
>>         u32 tx_done_old = tx_ring->tx_stats.tx_done_old;
>> -       u32 tx_pending = ixgbe_get_tx_pending(tx_ring);
>> +       u32 tx_pending = ixgbe_desc_used(tx_ring);
>>
>>         clear_check_for_tx_hang(tx_ring);
>>
>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>> index 6b609553329f..7702efed356a 100644
>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>> @@ -637,6 +637,7 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
>>         u32 i = tx_ring->next_to_clean, xsk_frames = 0;
>>         unsigned int budget = q_vector->tx.work_limit;
>>         struct xdp_umem *umem = tx_ring->xsk_umem;
>> +       u32 used_descs = ixgbe_desc_used(tx_ring);
>>         union ixgbe_adv_tx_desc *tx_desc;
>>         struct ixgbe_tx_buffer *tx_bi;
>>         bool xmit_done;
>> @@ -645,7 +646,7 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
>>         tx_desc = IXGBE_TX_DESC(tx_ring, i);
>>         i -= tx_ring->count;
>>
>> -       do {
>> +       while (likely(budget && used_descs)) {
>>                 if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
>>                         break;
>>
>> @@ -673,7 +674,8 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
>>
>>                 /* update budget accounting */
>>                 budget--;
>> -       } while (likely(budget));
>> +               used_descs--;
>> +       }
>>
>>         i += tx_ring->count;
>>         tx_ring->next_to_clean = i;
>> --
>> 2.17.1
>>
> 
> 

^ permalink raw reply

* Re: [PATCH 3/6] net: stmmac: sun8i: Use devm_regulator_get for PHY regulator
From: Andrew Lunn @ 2019-08-20 15:57 UTC (permalink / raw)
  To: David S. Miller, Rob Herring, Mark Rutland, Maxime Ripard,
	Chen-Yu Tsai, Giuseppe Cavallaro, Alexandre Torgue, Jose Abreu,
	Maxime Coquelin, devicetree, netdev, linux-kernel, linux-stm32,
	linux-arm-kernel
In-Reply-To: <20190820154714.2rt4ctovil5ol3u2@core.my.home>

On Tue, Aug 20, 2019 at 05:47:14PM +0200, Ondřej Jirman wrote:
> Hi Andrew,
> 
> On Tue, Aug 20, 2019 at 05:39:39PM +0200, Andrew Lunn wrote:
> > On Tue, Aug 20, 2019 at 04:53:40PM +0200, megous@megous.com wrote:
> > > From: Ondrej Jirman <megous@megous.com>
> > > 
> > > Use devm_regulator_get instead of devm_regulator_get_optional and rely
> > > on dummy supply. This avoids NULL checks before regulator_enable/disable
> > > calls.
> > 
> > Hi Ondrej
> > 
> > What do you mean by a dummy supply? I'm just trying to make sure you
> > are not breaking backwards compatibility.
> 
> Sorry, I mean dummy regulator. See:
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/regulator/core.c#L1874
> 
> On systems that use DT (i.e. have_full_constraints() == true), when the
> regulator is not found (ENODEV, not specified in DT), regulator_get will return
> a fake dummy regulator that can be enabled/disabled, but doesn't do anything
> real.

Hi Ondrej

But we also gain a new warning:

	dev_warn(dev,
		 "%s supply %s not found, using dummy regulator\n",
	         devname, id);

This regulator is clearly optional, so there should not be a warning.

Maybe you can add a new get_type, OPTIONAL_GET, which does not issue
the warning, but does give back a dummy regulator.

Thanks
	Andrew

^ permalink raw reply

* Re: [PATCH spi for-5.4 0/5] Deterministic SPI latency with NXP DSPI driver
From: Vladimir Oltean @ 2019-08-20 15:57 UTC (permalink / raw)
  To: Mark Brown, Hubert Feurstein, mlichvar, Richard Cochran,
	Andrew Lunn, Florian Fainelli
  Cc: linux-spi, netdev
In-Reply-To: <20190818182600.3047-1-olteanv@gmail.com>

On Sun, 18 Aug 2019 at 21:26, Vladimir Oltean <olteanv@gmail.com> wrote:
>
> This patchset proposes an interface from the SPI subsystem for
> software timestamping SPI transfers. There is a default implementation
> provided in the core, as well as a mechanism for SPI slave drivers to
> check which byte was in fact timestamped post-facto. The patchset also
> adds the first user of this interface (the NXP DSPI driver in TCFQ mode).
>
> The interface is somewhat similar to Hubert Feurstein's proposal for the
> MDIO subsystem: https://lkml.org/lkml/2019/8/16/638
>
> Original cover letter below. Also provided at the end some results with
> an extra test (J - phc2sys using the timestamps taken by the SPI core).
>
> ===========================================================
>
> Continuing the discussion created by Hubert Feurstein around the
> mv88e6xxx driver for MDIO-controlled switches
> (https://lkml.org/lkml/2019/8/2/1364), this patchset takes a similar
> approach for the NXP LS1021A-TSN board, which has a SPI-controlled DSA
> switch (SJA1105).
>
> The patchset is motivated by some experiments done with a logic
> analyzer, trying to understand the source of latency (and especially of
> the jitter). SJA1105 SPI messages for reading the PTP clock are 12 bytes
> in length: 4 for the SPI header and 8 for the timestamp. When looking at
> the messages with a scope, there's jitter basically everywhere: between
> bits of a frame and between frames in a transfer. The inter-bit jitter
> is hardware and impacts us to a lesser extend (is smaller and caused by
> the PVT stability of the oscillators, PLLs, etc). We will focus on the
> latency between consecutive SPI frames within a 12-byte transfer.
>
> As a preface, revisions of the DSPI controller IP are integrated in many
> Freescale/NXP devices. As a result, the driver has 3 modes of operation:
> - TCFQ (Transfer Complete Flag mode): The controller signals software
>   that data has been sent/received after each individual word.
> - EOQ (End of Queue mode): The driver can implement batching by making
>   use of the controller's 4-word deep FIFO.
> - DMA (Direct Memory Access mode): The SPI controller's FIFO is no
>   longer in direct interaction with the driver, but is used to trigger
>   the RX and TX channels of the eDMA module on the SoC.
>
> In LS1021A, the driver works in the least efficient mode of the 3
> (TCFQ). There is a well-known errata that the DSPI controller is broken
> in conjunction with the eDMA module. As for the EOQ mode, I have tried
> unsuccessfully for a few days to make use of the 4 entry FIFO, and the
> hardware simply fails to reliably acknowledge the transmission when the
> FIFO gets full. So it looks like we're stuck with the TCFQ mode.
>
> The problem with phc2sys on the LS1021A-TSN board is that in order for
> the gettime64() call to complete on the sja1105, the system has to
> service 12 IRQs. Intuitively that is excessive and is the main source of
> jitter, but let's not get ahead of ourselves.
>
> An outline of the experiments that were done (unless otherwise
> mentioned, all of these ran for 120 seconds):
>
> A. First I have measured the (poor) performance of phc2sys under current
>    conditions. (DSPI driver in IRQ mode, no PTP system timestamping)
>
>    offset: min -53310 max 16107 mean -1737.18 std dev 11444.3
>    delay: min 163680 max 237360 mean 201149 std dev 22446.6
>    lost servo lock 1 times
>
> B. I switched the .gettime64 callback to .gettimex64, snapshotting the
>    PTP system timestamp within the sja1105 driver.
>
>    offset: min -48923 max 64217 mean -904.137 std dev 17358.1
>    delay: min 149600 max 203840 mean 169045 std dev 17993.3
>    lost servo lock 8 times
>
> C. I patched "struct spi_transfer" to contain the PTP system timestamp,
>    and from the sja1105 driver, I passed this structure to be
>    snapshotted by the SPI controller's driver (spi-fsl-dspi). This is
>    the "transfer-level" snapshot.
>
>    offset: min -64979 max 38979 mean -416.197 std dev 15367.9
>    delay: min 125120 max 168320 mean 150286 std dev 17675.3
>    lost servo lock 10 times
>
> D. I changed the placement of the transfer snapshotting within the DSPI
>    driver, from "transfer-level" to "byte-level".
>
>    offset: min -9021 max 7149 mean -0.418803 std dev 3529.81
>    delay: min 7840 max 23920 mean 14493.7 std dev 5982.17
>    lost servo lock 0 times
>
> E. I moved the DSPI driver to poll mode. I went back to collecting the
>    PTP system timestamps from the sja1105 driver (same as B).
>
>    offset: min -4199 max 46643 mean 418.214 std dev 4554.01
>    delay: min 84000 max 194000 mean 99463.2 std dev 12936.5
>    lost servo lock 1 times
>
> F. Transfer-level snapshotting in the DSPI driver (same as C), but in
>    poll mode.
>
>    offset: min -24244 max 1115 mean -230.478 std dev 2297.28
>    delay: min 69440 max 119040 mean 70312.9 std dev 8065.34
>    lost servo lock 1 times
>
> G. Byte-level snapshotting (same as D) but in poll mode.
>
>    offset: min -314 max 288 mean -2.48718 std dev 118.045
>    delay: min 4880 max 6000 mean 5118.63 std dev 507.258
>    lost servo lock 0 times
>
>    This seemed suspiciously good to me, so I let it run for longer
>    (58 minutes):
>
>    offset: min -26251 max 16416 mean -21.8672 std dev 863.416
>    delay: min 4720 max 57280 mean 5182.49 std dev 1607.19
>    lost servo lock 3 times
>
> H. Transfer-level snapshotting (same as F), but with IRQs disabled.
>    This ran for 86 minutes.
>
>    offset: min -1927 max 1843 mean -0.209203 std dev 529.398
>    delay: min 85440 max 93680 mean 88245 std dev 1454.71
>    lost servo lock 0 times
>
> I. Byte-level snapshotting (same as G), but with IRQs disabled.
>    This ran for 102 minutes.
>
>    offset: min -378 max 381 mean -0.0083089 std dev 101.495
>    delay: min 4720 max 5920 mean 5129.38 std dev 154.899
>    lost servo lock 0 times
>
> J. Default snapshotting taken by the SPI core, with the DSPI driver
>    running in poll mode, IRQs enabled. This ran for 274 minutes.
>
>    offset: min -42568 max 44576 mean 2.91646 std dev 947.467
>    delay: min 58480 max 171040 mean 80750.7 std dev 2001.61
>    lost servo lock 3 times
>
> As a result, this patchset proposes the implementation of scenario I.
> The others were done through temporary patches which are not presented
> here due to the difficulty of presenting a coherent git history without
> resorting to reverts etc. The gist of each experiment should be clear
> though.
>
> The raw data is available for dissection at
> https://drive.google.com/open?id=1r9raU9ZeqOqkqts6Lb-ISf5ubLDLP3wk.
> The logic analyzer captures can be opened with a free-as-in-beer program
> provided by Saleae: https://www.saleae.com/downloads/.
>
> In the capture data one can find the MOSI, SCK SPI signals, as well as a
> debug GPIO which was toggled at the same time as the PTP system
> timestamp was taken, to give the viewer an impression of what the
> software is capturing compared to the actual timing of the SPI transfer.
>
> Attached are also some close-up screenshots of transfers where there is
> a clear and huge delay in-between frames of the same 12-byte SPI
> transfer. As it turns out, these were all caused by the CPU getting
> interrupted by some other IRQ. Approaches H and I are the only ones that
> get rid of these glitches. In theory, the byte-level snapshotting should
> be less vulnerable to an IRQ interrupting the SPI transfer (because the
> time window is much smaller) but as the 58 minutes experiment shows, it
> is not immune.
>
> Vladimir Oltean (5):
>   spi: Use an abbreviated pointer to ctlr->cur_msg in
>     __spi_pump_messages
>   spi: Add a PTP system timestamp to the transfer structure
>   spi: spi-fsl-dspi: Use poll mode in case the platform IRQ is missing
>   spi: spi-fsl-dspi: Implement the PTP system timestamping for TCFQ mode
>   spi: spi-fsl-dspi: Disable interrupts and preemption during poll mode
>     transfer
>
>  drivers/spi/spi-fsl-dspi.c | 117 +++++++++++++++++++++++++++++++------
>  drivers/spi/spi.c          |  85 +++++++++++++++++++++++----
>  include/linux/spi/spi.h    |  38 ++++++++++++
>  3 files changed, 210 insertions(+), 30 deletions(-)
>
> --
> 2.17.1
>

To the PTP/DSA people copied,

It's possible that I'm going to get a lot of hate for saying this, but
I think we're all missing the forest for the trees with these
ptp_system_timestamp patches.
I'm going to start off with 4 truisms:
- The best software timestamp is worse than a hardware timestamp
- DSA switches are switches that are connected to their host over Ethernet
- Ethernet has support for hardware timestamping
- The mess of taking precise hardware timestamps is well hidden from the kernel

You might see where I'm going.
Maybe this is all really a DSA-specific problem, and should be treated
as such (kept contained).
The claimed goal is to synchronize the DSA switches' time with phc2sys
to/from something else. The problem is that there is latency for
reading the PHC on a DSA device that is a discrete chip.
What if all we need is just a mini-"phc2sys-over-Ethernet" that runs
on a kernel thread internally to DSA? We say that DSA switches are
"slave" to the "master" netdevice - their PTP clock can be the same.
I am fairly confident that the sja1105 at least can be configured in
hardware to work in this mode. One just needs to enable the CPU port
in its own reachability matrix. None of the switch ports is really a
"CPU port" hardware speaking.
- TX timestamps are taken by installing a management route with a
specified port destination. That destination can be the port the frame
came from (the CPU port) if the above condition is met.
- RX timestamps are taken by the switch for frames matching one of 2
MAC filters. Then a short Ethernet frame containing metadata (RX
timestamp) is created and sent to the CPU port. If I enable RX
timestamping on the CPU port, then every management frame sent from
Linux will also generate an RX timestamp (as well as a TX timestamp,
but that is irrelevant).
I believe something similar should be possible on other hardware as well.

The kernel thread can loop back an Ethernet frame and use the 4
collected timestamps to calculate offset and delay.
The only question is how to manage the sync direction (DSA switch to
master, or vice-versa).
It is assumed that the master netdevice supports hardware timestamping
and has a PHC with lower access jitter. That might be a more common
thing than an MDIO or SPI controller with polled I/O and software
timestamping implemented.

Looking forward to comments. If I'm wrong and we do need to extend the
SPI and MDIO subsystems, then I'd better be wrong in any way we look
at the problem, because the alternative is rather intrusive and
tedious to do right (not to mention, not very reusable).

Thanks,
-Vladimir

^ permalink raw reply

* Re: [PATCH 3/6] net: stmmac: sun8i: Use devm_regulator_get for PHY regulator
From: Ondřej Jirman @ 2019-08-20 15:56 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David S. Miller, Rob Herring, Mark Rutland, Maxime Ripard,
	Chen-Yu Tsai, Giuseppe Cavallaro, Alexandre Torgue, Jose Abreu,
	Maxime Coquelin, devicetree, netdev, linux-kernel, linux-stm32,
	linux-arm-kernel
In-Reply-To: <20190820153939.GL29991@lunn.ch>

On Tue, Aug 20, 2019 at 05:39:39PM +0200, Andrew Lunn wrote:
> On Tue, Aug 20, 2019 at 04:53:40PM +0200, megous@megous.com wrote:
> > From: Ondrej Jirman <megous@megous.com>
> > 
> > Use devm_regulator_get instead of devm_regulator_get_optional and rely
> > on dummy supply. This avoids NULL checks before regulator_enable/disable
> > calls.
> 
> Hi Ondrej
> 
> What do you mean by a dummy supply? I'm just trying to make sure you
> are not breaking backwards compatibility.

I have tested it on Orange Pi PC 2, that uses only phy-supply, but not
phy-io-supply, and the kernel now prints:

[    1.410137] dwmac-sun8i 1c30000.ethernet: 1c30000.ethernet supply phy-io not found, using dummy regulator

I have also tested it on Orange Pi PC, that doesn't use external phy, and
instead of:

[    1.081378] dwmac-sun8i 1c30000.ethernet: No regulator found

The kernel now prints:

[    1.112752] dwmac-sun8i 1c30000.ethernet: 1c30000.ethernet supply phy not found, using dummy regulator
[    1.112814] dwmac-sun8i 1c30000.ethernet: 1c30000.ethernet supply phy-io not found, using dummy regulator

Ethernet works in both cases, so that should cover all existing combinations. :)

regards,
	Ondrej


>      Thanks
> 	Andrew

^ permalink raw reply

* Re: [PATCH 3/6] net: stmmac: sun8i: Use devm_regulator_get for PHY regulator
From: Ondřej Jirman @ 2019-08-20 15:47 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David S. Miller, Rob Herring, Mark Rutland, Maxime Ripard,
	Chen-Yu Tsai, Giuseppe Cavallaro, Alexandre Torgue, Jose Abreu,
	Maxime Coquelin, devicetree, netdev, linux-kernel, linux-stm32,
	linux-arm-kernel
In-Reply-To: <20190820153939.GL29991@lunn.ch>

Hi Andrew,

On Tue, Aug 20, 2019 at 05:39:39PM +0200, Andrew Lunn wrote:
> On Tue, Aug 20, 2019 at 04:53:40PM +0200, megous@megous.com wrote:
> > From: Ondrej Jirman <megous@megous.com>
> > 
> > Use devm_regulator_get instead of devm_regulator_get_optional and rely
> > on dummy supply. This avoids NULL checks before regulator_enable/disable
> > calls.
> 
> Hi Ondrej
> 
> What do you mean by a dummy supply? I'm just trying to make sure you
> are not breaking backwards compatibility.

Sorry, I mean dummy regulator. See:

https://elixir.bootlin.com/linux/latest/source/drivers/regulator/core.c#L1874

On systems that use DT (i.e. have_full_constraints() == true), when the
regulator is not found (ENODEV, not specified in DT), regulator_get will return
a fake dummy regulator that can be enabled/disabled, but doesn't do anything
real.

This can be used to avoid NULL checks and make the code simpler.

regards,
	Ondrej

>      Thanks
> 	Andrew

^ permalink raw reply

* Re: [v3] ocelot_ace: fix action of trap
From: Andrew Lunn @ 2019-08-20 15:45 UTC (permalink / raw)
  To: Allan W . Nielsen
  Cc: Yangbo Lu, netdev, David S . Miller, Alexandre Belloni,
	Microchip Linux Driver Support
In-Reply-To: <20190820070518.mypyahquun6t4yjf@lx-anielsen.microsemi.net>

On Tue, Aug 20, 2019 at 09:05:20AM +0200, Allan W . Nielsen wrote:
> Hi,
> 
> This is fixing a bug introduced in b596229448dd2

Hi Allan

You should express that as:

Fixes: b596229448dd ("net: mscc: ocelot: Add support for tcam")

       Andrew

^ permalink raw reply

* Re: [PATCH net-next v3 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts
From: Miroslav Lichvar @ 2019-08-20 15:40 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Hubert Feurstein, netdev, lkml, Richard Cochran, Florian Fainelli,
	Heiner Kallweit, Vladimir Oltean, David S. Miller
In-Reply-To: <20190820152306.GJ29991@lunn.ch>

On Tue, Aug 20, 2019 at 05:23:06PM +0200, Andrew Lunn wrote:
> > - take a second "post" system timestamp after the completion
> 
> For this hardware, completion is an interrupt, which has a lot of
> jitter on it. But this hardware is odd, in that it uses an
> interrupt. Every other MDIO bus controller uses polled IO, with an
> mdelay(10) or similar between each poll. So the jitter is going to be
> much larger.

I think a large jitter is ok in this case. We just need to timestamp
something that we know for sure happened after the PHC timestamp. It
should have no impact on the offset and its stability, just the
reported delay. A test with phc2sys should be able to confirm that.
phc2sys selects the measurement with the shortest delay, which has
least uncertainty. I'd say that applies to both interrupt and polling.

If it is difficult to specify the minimum interrupt delay, I'd still
prefer an overly pessimistic interval assuming a zero delay.

-- 
Miroslav Lichvar

^ permalink raw reply

* Re: [PATCH 3/6] net: stmmac: sun8i: Use devm_regulator_get for PHY regulator
From: Andrew Lunn @ 2019-08-20 15:39 UTC (permalink / raw)
  To: megous
  Cc: David S. Miller, Rob Herring, Mark Rutland, Maxime Ripard,
	Chen-Yu Tsai, Giuseppe Cavallaro, Alexandre Torgue, Jose Abreu,
	Maxime Coquelin, devicetree, netdev, linux-kernel, linux-stm32,
	linux-arm-kernel
In-Reply-To: <20190820145343.29108-4-megous@megous.com>

On Tue, Aug 20, 2019 at 04:53:40PM +0200, megous@megous.com wrote:
> From: Ondrej Jirman <megous@megous.com>
> 
> Use devm_regulator_get instead of devm_regulator_get_optional and rely
> on dummy supply. This avoids NULL checks before regulator_enable/disable
> calls.

Hi Ondrej

What do you mean by a dummy supply? I'm just trying to make sure you
are not breaking backwards compatibility.

     Thanks
	Andrew

^ permalink raw reply

* Re: [PATCH net] ixgbe: fix double clean of tx descriptors with xdp
From: Alexander Duyck @ 2019-08-20 15:35 UTC (permalink / raw)
  To: Ilya Maximets
  Cc: Netdev, LKML, bpf, David S. Miller, Björn Töpel,
	Magnus Karlsson, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Jeff Kirsher, intel-wired-lan, Eelco Chaudron,
	William Tu
In-Reply-To: <20190820151611.10727-1-i.maximets@samsung.com>

On Tue, Aug 20, 2019 at 8:18 AM Ilya Maximets <i.maximets@samsung.com> wrote:
>
> Tx code doesn't clear the descriptor status after cleaning.
> So, if the budget is larger than number of used elems in a ring, some
> descriptors will be accounted twice and xsk_umem_complete_tx will move
> prod_tail far beyond the prod_head breaking the comletion queue ring.
>
> Fix that by limiting the number of descriptors to clean by the number
> of used descriptors in the tx ring.
>
> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support")
> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>

I'm not sure this is the best way to go. My preference would be to
have something in the ring that would prevent us from racing which I
don't think this really addresses. I am pretty sure this code is safe
on x86 but I would be worried about weak ordered systems such as
PowerPC.

It might make sense to look at adding the eop_desc logic like we have
in the regular path with a proper barrier before we write it and after
we read it. So for example we could hold of on writing the bytecount
value until the end of an iteration and call smp_wmb before we write
it. Then on the cleanup we could read it and if it is non-zero we take
an smp_rmb before proceeding further to process the Tx descriptor and
clearing the value. Otherwise this code is going to just keep popping
up with issues.

> ---
>
> Not tested yet because of lack of available hardware.
> So, testing is very welcome.
>
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h      | 10 ++++++++++
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 12 +-----------
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  |  6 ++++--
>  3 files changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> index 39e73ad60352..0befcef46e80 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> @@ -512,6 +512,16 @@ static inline u16 ixgbe_desc_unused(struct ixgbe_ring *ring)
>         return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1;
>  }
>
> +static inline u64 ixgbe_desc_used(struct ixgbe_ring *ring)
> +{
> +       unsigned int head, tail;
> +
> +       head = ring->next_to_clean;
> +       tail = ring->next_to_use;
> +
> +       return ((head <= tail) ? tail : tail + ring->count) - head;
> +}
> +
>  #define IXGBE_RX_DESC(R, i)        \
>         (&(((union ixgbe_adv_rx_desc *)((R)->desc))[i]))
>  #define IXGBE_TX_DESC(R, i)        \
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 7882148abb43..d417237857d8 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -1012,21 +1012,11 @@ static u64 ixgbe_get_tx_completed(struct ixgbe_ring *ring)
>         return ring->stats.packets;
>  }
>
> -static u64 ixgbe_get_tx_pending(struct ixgbe_ring *ring)
> -{
> -       unsigned int head, tail;
> -
> -       head = ring->next_to_clean;
> -       tail = ring->next_to_use;
> -
> -       return ((head <= tail) ? tail : tail + ring->count) - head;
> -}
> -
>  static inline bool ixgbe_check_tx_hang(struct ixgbe_ring *tx_ring)
>  {
>         u32 tx_done = ixgbe_get_tx_completed(tx_ring);
>         u32 tx_done_old = tx_ring->tx_stats.tx_done_old;
> -       u32 tx_pending = ixgbe_get_tx_pending(tx_ring);
> +       u32 tx_pending = ixgbe_desc_used(tx_ring);
>
>         clear_check_for_tx_hang(tx_ring);
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> index 6b609553329f..7702efed356a 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> @@ -637,6 +637,7 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
>         u32 i = tx_ring->next_to_clean, xsk_frames = 0;
>         unsigned int budget = q_vector->tx.work_limit;
>         struct xdp_umem *umem = tx_ring->xsk_umem;
> +       u32 used_descs = ixgbe_desc_used(tx_ring);
>         union ixgbe_adv_tx_desc *tx_desc;
>         struct ixgbe_tx_buffer *tx_bi;
>         bool xmit_done;
> @@ -645,7 +646,7 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
>         tx_desc = IXGBE_TX_DESC(tx_ring, i);
>         i -= tx_ring->count;
>
> -       do {
> +       while (likely(budget && used_descs)) {
>                 if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
>                         break;
>
> @@ -673,7 +674,8 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
>
>                 /* update budget accounting */
>                 budget--;
> -       } while (likely(budget));
> +               used_descs--;
> +       }
>
>         i += tx_ring->count;
>         tx_ring->next_to_clean = i;
> --
> 2.17.1
>

^ permalink raw reply

* Re: [PATCH bpf-next] xsk: proper socket state check in xsk_poll
From: Björn Töpel @ 2019-08-20 15:29 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: syzbot+c82697e3043781e08802, Alexei Starovoitov, Netdev,
	Björn Töpel, bpf, David Miller, Jesper Dangaard Brouer,
	Jakub Kicinski, John Fastabend, Jonathan Lemon, Martin KaFai Lau,
	LKML, Karlsson, Magnus, Song Liu, syzkaller-bugs, Xdp,
	Yonghong Song, hdanton
In-Reply-To: <beef16bb-a09b-40f1-7dd0-c323b4b89b17@iogearbox.net>

On Tue, 20 Aug 2019 at 16:30, Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 8/20/19 12:04 PM, Björn Töpel wrote:
> > From: Björn Töpel <bjorn.topel@intel.com>
> >
> > The poll() implementation for AF_XDP sockets did not perform the
> > proper state checks, prior accessing the socket umem. This patch fixes
> > that by performing a xsk_is_bound() check.
> >
> > Suggested-by: Hillf Danton <hdanton@sina.com>
> > Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
> > Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
> > Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> > ---
> >   net/xdp/xsk.c | 14 ++++++++++++--
> >   1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index ee4428a892fa..08bed5e92af4 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -356,13 +356,20 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
> >       return err;
> >   }
> >
> > +static bool xsk_is_bound(struct xdp_sock *xs)
> > +{
> > +     struct net_device *dev = READ_ONCE(xs->dev);
> > +
> > +     return dev && xs->state == XSK_BOUND;
> > +}
> > +
> >   static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> >   {
> >       bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
> >       struct sock *sk = sock->sk;
> >       struct xdp_sock *xs = xdp_sk(sk);
> >
> > -     if (unlikely(!xs->dev))
> > +     if (unlikely(!xsk_is_bound(xs)))
> >               return -ENXIO;
> >       if (unlikely(!(xs->dev->flags & IFF_UP)))
> >               return -ENETDOWN;
> > @@ -383,6 +390,9 @@ static unsigned int xsk_poll(struct file *file, struct socket *sock,
> >       struct net_device *dev = xs->dev;
> >       struct xdp_umem *umem = xs->umem;
> >
> > +     if (unlikely(!xsk_is_bound(xs)))
> > +             return mask;
> > +
> >       if (umem->need_wakeup)
> >               dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
> >                                               umem->need_wakeup);
> > @@ -417,7 +427,7 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
> >   {
> >       struct net_device *dev = xs->dev;
> >
> > -     if (!dev || xs->state != XSK_BOUND)
> > +     if (!xsk_is_bound(xs))
> >               return;
>
> I think I'm a bit confused by your READ_ONCE() usage. ;-/ I can see why you're
> using it in xsk_is_bound() above, but then at the same time all the other callbacks
> like xsk_poll() or xsk_unbind_dev() above have a struct net_device *dev = xs->dev
> right before the test. Could you elaborate?
>

Yes, now I'm confused as well! Digging deeper... I believe there are a
couple of places in xsk.c that do not have
READ_ONCE/WRITE_ONCE-correctness. Various xdp_sock members are read
lock-less outside the control plane mutex (mutex member of struct
xdp_sock). This needs some re-work. I'll look into using the newly
introduced state member (with corresponding read/write barriers) for
this.

I'll cook some patch(es) that address this, but first it sounds like I
need to reread [1] two, or three times. At least. ;-)


Thanks,
Björn


[1] https://lwn.net/Articles/793253/


> Thanks,
> Daniel

^ permalink raw reply

* Re: [PATCH 1/1] net: rds: add service level support in rds-info
From: Doug Ledford @ 2019-08-20 15:28 UTC (permalink / raw)
  To: Zhu Yanjun, davem, netdev, linux-rdma, rds-devel
In-Reply-To: <1566262341-18165-1-git-send-email-yanjun.zhu@oracle.com>

[-- Attachment #1: Type: text/plain, Size: 2195 bytes --]

On Mon, 2019-08-19 at 20:52 -0400, Zhu Yanjun wrote:
> diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h
> index fd6b5f6..cba368e 100644
> --- a/include/uapi/linux/rds.h
> +++ b/include/uapi/linux/rds.h
> @@ -250,6 +250,7 @@ struct rds_info_rdma_connection {
>         __u32           rdma_mr_max;
>         __u32           rdma_mr_size;
>         __u8            tos;
> +       __u8            sl;
>         __u32           cache_allocs;
>  };
>  
> @@ -265,6 +266,7 @@ struct rds6_info_rdma_connection {
>         __u32           rdma_mr_max;
>         __u32           rdma_mr_size;
>         __u8            tos;
> +       __u8            sl;
>         __u32           cache_allocs;
>  };
>  

This is a user space API break (as was the prior patch mentioned
below)...

> The commit fe3475af3bdf ("net: rds: add per rds connection cache
> statistics") adds cache_allocs in struct rds_info_rdma_connection
> as below:
> struct rds_info_rdma_connection {
> ...
>         __u32           rdma_mr_max;
>         __u32           rdma_mr_size;
>         __u8            tos;
>         __u32           cache_allocs;
>  };
> The peer struct in rds-tools of struct rds_info_rdma_connection is as
> below:
> struct rds_info_rdma_connection {
> ...
>         uint32_t        rdma_mr_max;
>         uint32_t        rdma_mr_size;
>         uint8_t         tos;
>         uint8_t         sl;
>         uint32_t        cache_allocs;
> };

Why are the user space rds tools not using the kernel provided abi
files?

In order to know if this ABI breakage is safe, we need to know what
versions of rds-tools are out in the wild and have their own headers
that we need to match up with.  Are there any versions of rds-tools that
actually use the kernel provided headers?  Are there any other users of
uapi/linux/rds.h besides rds-tools?

Once the kernel and rds-tools package are in sync, rds-tools needs to be
modified to use the kernel header and proper ABI maintenance needs to be
started.

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v3 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts
From: Andrew Lunn @ 2019-08-20 15:23 UTC (permalink / raw)
  To: Miroslav Lichvar
  Cc: Hubert Feurstein, netdev, lkml, Richard Cochran, Florian Fainelli,
	Heiner Kallweit, Vladimir Oltean, David S. Miller
In-Reply-To: <20190820142537.GL891@localhost>

> - take a second "post" system timestamp after the completion

For this hardware, completion is an interrupt, which has a lot of
jitter on it. But this hardware is odd, in that it uses an
interrupt. Every other MDIO bus controller uses polled IO, with an
mdelay(10) or similar between each poll. So the jitter is going to be
much larger.

Even though the FEC is special with its interrupt completion, i would
like to see the solution being reasonably generic so that others can
copy it into other MDIO bus drivers. That is what is nice about taking
the time stamp around the write which triggers the bus transaction. It
is independent of interrupt or polled, and should mean about the same
thing for different vendors hardware.

      Andrew

^ permalink raw reply

* [PATCH] net: Fix detection for IPv4 duplicate address.
From: Dongxu Liu @ 2019-08-20 15:19 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji, netdev, linux-kernel

The network sends an ARP REQUEST packet to determine
whether there is a host with the same IP.
The source IP address of the packet is 0.
However, Windows may also send the source IP address
to determine, then the source IP address is equal to
the destination IP address.

Signed-off-by: Dongxu Liu <liudongxu3@huawei.com>
---
 net/ipv4/arp.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 05eb42f..944f8e8 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -800,8 +800,11 @@ static int arp_process(struct net *net, struct sock *sk, struct sk_buff *skb)
 			    iptunnel_metadata_reply(skb_metadata_dst(skb),
 						    GFP_ATOMIC);
 
-	/* Special case: IPv4 duplicate address detection packet (RFC2131) */
-	if (sip == 0) {
+/* Special case: IPv4 duplicate address detection packet (RFC2131).
+ * Linux usually sends zero to detect duplication, and windows may
+ * send a same ip (not zero, sip equal to tip) to do this detection.
+ */
+	if (sip == 0 || sip == tip) {
 		if (arp->ar_op == htons(ARPOP_REQUEST) &&
 		    inet_addr_type_dev_table(net, dev, tip) == RTN_LOCAL &&
 		    !arp_ignore(in_dev, sip, tip))
-- 
2.12.3



^ permalink raw reply related

* [PATCH net] ixgbe: fix double clean of tx descriptors with xdp
From: Ilya Maximets @ 2019-08-20 15:16 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, bpf, David S. Miller, Björn Töpel,
	Magnus Karlsson, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Jeff Kirsher, intel-wired-lan, Eelco Chaudron,
	William Tu, Ilya Maximets
In-Reply-To: <CGME20190820151644eucas1p179d6d1da42bb6be0aad8f58ac46624ce@eucas1p1.samsung.com>

Tx code doesn't clear the descriptor status after cleaning.
So, if the budget is larger than number of used elems in a ring, some
descriptors will be accounted twice and xsk_umem_complete_tx will move
prod_tail far beyond the prod_head breaking the comletion queue ring.

Fix that by limiting the number of descriptors to clean by the number
of used descriptors in the tx ring.

Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
---

Not tested yet because of lack of available hardware.
So, testing is very welcome.

 drivers/net/ethernet/intel/ixgbe/ixgbe.h      | 10 ++++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 12 +-----------
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  |  6 ++++--
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 39e73ad60352..0befcef46e80 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -512,6 +512,16 @@ static inline u16 ixgbe_desc_unused(struct ixgbe_ring *ring)
 	return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1;
 }
 
+static inline u64 ixgbe_desc_used(struct ixgbe_ring *ring)
+{
+	unsigned int head, tail;
+
+	head = ring->next_to_clean;
+	tail = ring->next_to_use;
+
+	return ((head <= tail) ? tail : tail + ring->count) - head;
+}
+
 #define IXGBE_RX_DESC(R, i)	    \
 	(&(((union ixgbe_adv_rx_desc *)((R)->desc))[i]))
 #define IXGBE_TX_DESC(R, i)	    \
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 7882148abb43..d417237857d8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1012,21 +1012,11 @@ static u64 ixgbe_get_tx_completed(struct ixgbe_ring *ring)
 	return ring->stats.packets;
 }
 
-static u64 ixgbe_get_tx_pending(struct ixgbe_ring *ring)
-{
-	unsigned int head, tail;
-
-	head = ring->next_to_clean;
-	tail = ring->next_to_use;
-
-	return ((head <= tail) ? tail : tail + ring->count) - head;
-}
-
 static inline bool ixgbe_check_tx_hang(struct ixgbe_ring *tx_ring)
 {
 	u32 tx_done = ixgbe_get_tx_completed(tx_ring);
 	u32 tx_done_old = tx_ring->tx_stats.tx_done_old;
-	u32 tx_pending = ixgbe_get_tx_pending(tx_ring);
+	u32 tx_pending = ixgbe_desc_used(tx_ring);
 
 	clear_check_for_tx_hang(tx_ring);
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 6b609553329f..7702efed356a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -637,6 +637,7 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
 	u32 i = tx_ring->next_to_clean, xsk_frames = 0;
 	unsigned int budget = q_vector->tx.work_limit;
 	struct xdp_umem *umem = tx_ring->xsk_umem;
+	u32 used_descs = ixgbe_desc_used(tx_ring);
 	union ixgbe_adv_tx_desc *tx_desc;
 	struct ixgbe_tx_buffer *tx_bi;
 	bool xmit_done;
@@ -645,7 +646,7 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
 	tx_desc = IXGBE_TX_DESC(tx_ring, i);
 	i -= tx_ring->count;
 
-	do {
+	while (likely(budget && used_descs)) {
 		if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
 			break;
 
@@ -673,7 +674,8 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
 
 		/* update budget accounting */
 		budget--;
-	} while (likely(budget));
+		used_descs--;
+	}
 
 	i += tx_ring->count;
 	tx_ring->next_to_clean = i;
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH bpf 1/1] xdp: unpin xdp umem pages in error path
From: Daniel Borkmann @ 2019-08-20 15:12 UTC (permalink / raw)
  To: Ivan Khoronzhuk, bjorn.topel
  Cc: magnus.karlsson, jonathan.lemon, davem, ast, jakub.kicinski, hawk,
	john.fastabend, netdev, bpf, linux-kernel
In-Reply-To: <20190815205635.6536-1-ivan.khoronzhuk@linaro.org>

On 8/15/19 10:56 PM, Ivan Khoronzhuk wrote:
> Fix mem leak caused by missed unpin routine for umem pages.
> Fixes: 8aef7340ae9695 ("commit xsk: introduce xdp_umem_page")
> 
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>

Applied & fixed up 'Fixes:' tag, thanks.

^ permalink raw reply

* Re: [PATCH] test_bpf: Fix a new clang warning about xor-ing two numbers
From: Daniel Borkmann @ 2019-08-20 15:11 UTC (permalink / raw)
  To: Nathan Chancellor, Alexei Starovoitov
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, netdev, bpf,
	linux-kernel, clang-built-linux
In-Reply-To: <20190819043419.68223-1-natechancellor@gmail.com>

On 8/19/19 6:34 AM, Nathan Chancellor wrote:
> r369217 in clang added a new warning about potential misuse of the xor
> operator as an exponentiation operator:
> 
> ../lib/test_bpf.c:870:13: warning: result of '10 ^ 300' is 294; did you
> mean '1e300'? [-Wxor-used-as-pow]
>                  { { 4, 10 ^ 300 }, { 20, 10 ^ 300 } },
>                         ~~~^~~~~
>                         1e300
> ../lib/test_bpf.c:870:13: note: replace expression with '0xA ^ 300' to
> silence this warning
> ../lib/test_bpf.c:870:31: warning: result of '10 ^ 300' is 294; did you
> mean '1e300'? [-Wxor-used-as-pow]
>                  { { 4, 10 ^ 300 }, { 20, 10 ^ 300 } },
>                                           ~~~^~~~~
>                                           1e300
> ../lib/test_bpf.c:870:31: note: replace expression with '0xA ^ 300' to
> silence this warning
> 
> The commit link for this new warning has some good logic behind wanting
> to add it but this instance appears to be a false positive. Adopt its
> suggestion to silence the warning but not change the code. According to
> the differential review link in the clang commit, GCC may eventually
> adopt this warning as well.
> 
> Link: https://github.com/ClangBuiltLinux/linux/issues/643
> Link: https://github.com/llvm/llvm-project/commit/920890e26812f808a74c60ebc14cc636dac661c1
> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>

Applied, thanks!

^ permalink raw reply

* Re: [PATCH] bpfilter/verifier: add include guard to tnum.h
From: Daniel Borkmann @ 2019-08-20 15:10 UTC (permalink / raw)
  To: Masahiro Yamada, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, netdev, bpf
  Cc: linux-kernel
In-Reply-To: <20190819161035.21826-1-yamada.masahiro@socionext.com>

On 8/19/19 6:10 PM, Masahiro Yamada wrote:
> Add a header include guard just in case.
> 
> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>

Applied, thanks!

^ permalink raw reply

* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
From: Carlos Antonio Neira Bustos @ 2019-08-20 15:10 UTC (permalink / raw)
  To: Yonghong Song
  Cc: netdev@vger.kernel.org, ebiederm@xmission.com, brouer@redhat.com,
	bpf@vger.kernel.org
In-Reply-To: <445a4535-b8cc-b6bc-717b-a5736030533a@fb.com>

Hi Yonghong,

Thanks for taking the time to review this.


> > + *
> > + *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
> > + *		or tgid of the current task.
> > + *
> > + *		**-ECHILD** if /proc/self/ns/pid does not exists.
> > + *
> > + *		**-ENOTDIR** if /proc/self/ns does not exists.
> 
> Let us remove ECHILD and ENOTDIR and replace it with ENOENT as I
> described below.
> 
> Please *do verify* what happens when namespaces or pid_ns are not
> configured.
>


I have tested kernel configurations without namespace support and with 
namespace support but without pid namespaces, the helper returns -EINVAL
on both cases, now it should return -ENOENT.


> > +struct bpf_pidns_info {
> > +	__u32 dev;
> 
> Please add a comment for dev for how device major and minor number are 
> derived. User space gets device major and minor number, they need to
> compare to the corresponding major/minor numbers returned by this helper.
> 
> > +	__u32 nsid;
> > +	__u32 tgid;
> > +	__u32 pid;
> > +};
>

What do you think of this comment ?

struct bpf_pidns_info {
	__u32 dev;	/* major/minor numbers from /proc/self/ns/pid.
			 * User space gets device major and minor numbers from
			 * the same device that need to be compared against the
			 * major/minor numbers returned by this helper.
			 */
	__u32 nsid;
	__u32 tgid;
	__u32 pid;
};

> 
> Please put an empty line. As a general rule for readability,
> put an empty line if control flow is interrupted, e.g., by
> "return", "break" or "continue". At least this is what
> I saw most in bpf mailing list.
>
I'll fix it in version 10.

> > +	len = strlen(pidns_path) + 1;
> > +	memcpy((char *)tmp->name, pidns_path, len);
> > +	tmp->uptr = NULL;
> > +	tmp->aname = NULL;
> > +	tmp->refcnt = 1;
> > +	ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
> Adding below to free kmem cache memory
> 	kmem_cache_free(names_cachep, fname);
> 

I think we don't need to call kmem_cache_free as filename_lookup
calls putname that calls kmem_cache_free. 


Thanks a lot for your help.

Bests

> In the above, we checked task_active_pid_ns().
> If not returning NULL, we have a valid pid ns. So the above
> filename_lookup should not go wrong. We can still keep
> the error checking though.
> 
> > +	if (ret) {
> > +		memset((void *)pidns_info, 0, (size_t) size);
> > +		return ret;
> 
>

I think we could get rid of this.


On Tue, Aug 13, 2019 at 10:35:42PM +0000, Yonghong Song wrote:
> 
> 
> On 8/13/19 11:47 AM, Carlos Neira wrote:
> > From: Carlos <cneirabustos@gmail.com>
> > 
> > New bpf helper bpf_get_current_pidns_info.
> > This helper obtains the active namespace from current and returns
> > pid, tgid, device and namespace id as seen from that namespace,
> > allowing to instrument a process inside a container.
> > 
> > Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> > ---
> >   fs/internal.h            |  2 --
> >   fs/namei.c               |  1 -
> >   include/linux/bpf.h      |  1 +
> >   include/linux/namei.h    |  4 +++
> >   include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
> >   kernel/bpf/core.c        |  1 +
> >   kernel/bpf/helpers.c     | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
> >   kernel/trace/bpf_trace.c |  2 ++
> >   8 files changed, 102 insertions(+), 4 deletions(-)
> 
> I prefer to break this into two patches to reduce
> the potential merging conflicts:
>    patch 1: fs/internal.h, fs/namei.c, include/linux/namei.h
>    patch 2: rest of changes
> patch 1 is simply a preparing patches to make filename_lookup
> available later.
> 
> > 
> > diff --git a/fs/internal.h b/fs/internal.h
> > index 315fcd8d237c..6647e15dd419 100644
> > --- a/fs/internal.h
> > +++ b/fs/internal.h
> > @@ -59,8 +59,6 @@ extern int finish_clean_context(struct fs_context *fc);
> >   /*
> >    * namei.c
> >    */
> > -extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> > -			   struct path *path, struct path *root);
> >   extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
> >   extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
> >   			   const char *, unsigned int, struct path *);
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 209c51a5226c..a89fc72a4a10 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -19,7 +19,6 @@
> >   #include <linux/export.h>
> >   #include <linux/kernel.h>
> >   #include <linux/slab.h>
> > -#include <linux/fs.h>
> >   #include <linux/namei.h>
> >   #include <linux/pagemap.h>
> >   #include <linux/fsnotify.h>
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index f9a506147c8a..e4adf5e05afd 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
> >   extern const struct bpf_func_proto bpf_strtol_proto;
> >   extern const struct bpf_func_proto bpf_strtoul_proto;
> >   extern const struct bpf_func_proto bpf_tcp_sock_proto;
> > +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
> >   
> >   /* Shared helpers among cBPF and eBPF. */
> >   void bpf_user_rnd_init_once(void);
> > diff --git a/include/linux/namei.h b/include/linux/namei.h
> > index 9138b4471dbf..b45c8b6f7cb4 100644
> > --- a/include/linux/namei.h
> > +++ b/include/linux/namei.h
> > @@ -6,6 +6,7 @@
> >   #include <linux/path.h>
> >   #include <linux/fcntl.h>
> >   #include <linux/errno.h>
> > +#include <linux/fs.h>
> >   
> >   enum { MAX_NESTED_LINKS = 8 };
> >   
> > @@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct dentry *);
> >   
> >   extern void nd_jump_link(struct path *path);
> >   
> > +extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> > +			   struct path *path, struct path *root);
> > +
> >   static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
> >   {
> >   	((char *) name)[min(len, maxlen)] = '\0';
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 4393bd4b2419..db241857ec15 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -2741,6 +2741,28 @@ union bpf_attr {
> >    *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
> >    *
> >    *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
> > + *
> > + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
> 
> size_of_pidns => size.
> 
> > + *	Description
> > + *		Copies into *pidns* pid, namespace id and tgid as seen by the
> Copies => Copy.
> Maybe something like below:
> Get tgid, pid and namespace id as seen by the current namespace, and 
> device major/minor numbers from device /proc/self/ns/pid. Such
> information is stored in *pidns* of size *size*.
> 
> > + *		current namespace and also device from /proc/self/ns/pid.
> > + *		*size_of_pidns* must be the size of *pidns*
> > + *
> > + *		This helper is used when pid filtering is needed inside a
> > + *		container as bpf_get_current_tgid() helper returns always the
> 
> returns always => always returns.
> 
> > + *		pid id as seen by the root namespace.
> > + *	Return
> > + *		0 on success
> > + *
> > + *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
> > + *		or tgid of the current task.
> > + *
> > + *		**-ECHILD** if /proc/self/ns/pid does not exists.
> > + *
> > + *		**-ENOTDIR** if /proc/self/ns does not exists.
> 
> Let us remove ECHILD and ENOTDIR and replace it with ENOENT as I
> described below.
> 
> Please *do verify* what happens when namespaces or pid_ns are not
> configured.
> 
> > + *
> > + *		**-ENOMEM**  if allocation fails.
> 
> helper internal allocation fails.
> 
> > + *
> >    */
> >   #define __BPF_FUNC_MAPPER(FN)		\
> >   	FN(unspec),			\
> > @@ -2853,7 +2875,8 @@ union bpf_attr {
> >   	FN(sk_storage_get),		\
> >   	FN(sk_storage_delete),		\
> >   	FN(send_signal),		\
> > -	FN(tcp_gen_syncookie),
> > +	FN(tcp_gen_syncookie),		\
> > +	FN(get_current_pidns_info),
> >   
> >   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> >    * function eBPF program intends to call
> > @@ -3604,4 +3627,10 @@ struct bpf_sockopt {
> >   	__s32	retval;
> >   };
> >   
> > +struct bpf_pidns_info {
> > +	__u32 dev;
> 
> Please add a comment for dev for how device major and minor number are 
> derived. User space gets device major and minor number, they need to
> compare to the corresponding major/minor numbers returned by this helper.
> 
> > +	__u32 nsid;
> > +	__u32 tgid;
> > +	__u32 pid;
> > +};
> >   #endif /* _UAPI__LINUX_BPF_H__ */
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 8191a7db2777..3159f2a0188c 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
> >   const struct bpf_func_proto bpf_get_current_comm_proto __weak;
> >   const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
> >   const struct bpf_func_proto bpf_get_local_storage_proto __weak;
> > +const struct bpf_func_proto bpf_get_current_pidns_info __weak;
> >   
> >   const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
> >   {
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 5e28718928ca..41fbf1f28a48 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -11,6 +11,12 @@
> >   #include <linux/uidgid.h>
> >   #include <linux/filter.h>
> >   #include <linux/ctype.h>
> > +#include <linux/pid_namespace.h>
> > +#include <linux/major.h>
> > +#include <linux/stat.h>
> > +#include <linux/namei.h>
> > +#include <linux/version.h>
> > +
> >   
> >   #include "../../lib/kstrtox.h"
> >   
> > @@ -312,6 +318,64 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
> >   	preempt_enable();
> >   }
> >   
> > +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
> > +	 size)
> > +{
> > +	const char *pidns_path = "/proc/self/ns/pid";
> > +	struct pid_namespace *pidns = NULL;
> > +	struct filename *tmp = NULL;
> 
> tmp => fname
> 
> > +	struct inode *inode;
> > +	struct path kp;
> > +	pid_t tgid = 0;
> > +	pid_t pid = 0;
> > +	int ret;
> > +	int len;
> > +
> > +	if (unlikely(size != sizeof(struct bpf_pidns_info)))
> > +		return -EINVAL;
> 
> Please put an empty line. As a general rule for readability,
> put an empty line if control flow is interrupted, e.g., by
> "return", "break" or "continue". At least this is what
> I saw most in bpf mailing list.
> 
> > +	pidns = task_active_pid_ns(current);
> > +	if (unlikely(!pidns))
> > +		goto clear;
> 
> An empty line. Also, there is nothing to clear.
> I prefer an error code -ENOENT.
> 
> You can set
> 	ret = -EINVAL;
> here
> 
> > +	pidns_info->nsid =  pidns->ns.inum;
> > +	pid = task_pid_nr_ns(current, pidns);
> > +	if (unlikely(!pid))
> > +		goto clear;
> 
> An empty line.
> 
> > +	tgid = task_tgid_nr_ns(current, pidns);
> > +	if (unlikely(!tgid))
> > +		goto clear;
> 
> An empty line.
> 
> > +	pidns_info->tgid = (u32) tgid;
> > +	pidns_info->pid = (u32) pid;
> 
> Different functionality, an empty line.
> 
> > +	tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
> > +	if (unlikely(!tmp)) {
> > +		memset((void *)pidns_info, 0, (size_t) size);
> > +		return -ENOMEM;
> 
> ret = -ENOMEM;
> goto clear;
> 
> > +	}
> 
> An empty line.
> 
> > +	len = strlen(pidns_path) + 1;
> > +	memcpy((char *)tmp->name, pidns_path, len);
> > +	tmp->uptr = NULL;
> > +	tmp->aname = NULL;
> > +	tmp->refcnt = 1;
> > +	ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
> Adding below to free kmem cache memory
> 	kmem_cache_free(names_cachep, fname);
> 
> In the above, we checked task_active_pid_ns().
> If not returning NULL, we have a valid pid ns. So the above
> filename_lookup should not go wrong. We can still keep
> the error checking though.
> 
> > +	if (ret) {
> > +		memset((void *)pidns_info, 0, (size_t) size);
> > +		return ret;
> 
> goto clear;
> 
> > +	}
> 
> An empty line.
> 
> > +	inode = d_backing_inode(kp.dentry);
> > +	pidns_info->dev = inode->i_sb->s_dev;
> > +	return 0;
> 
> An empty line.
> 
> > +clear > +	memset((void *)pidns_info, 0, (size_t) size);
> > +	return -EINVAL;
> > +}
> > +
> > +const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
> > +	.func		= bpf_get_current_pidns_info,
> > +	.gpl_only	= false,
> > +	.ret_type	= RET_INTEGER,
> > +	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
> > +	.arg2_type	= ARG_CONST_SIZE,
> > +};
> > +
> >   #ifdef CONFIG_CGROUPS
> >   BPF_CALL_0(bpf_get_current_cgroup_id)
> >   {
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index ca1255d14576..5e1dc22765a5 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> >   #endif
> >   	case BPF_FUNC_send_signal:
> >   		return &bpf_send_signal_proto;
> > +	case BPF_FUNC_get_current_pidns_info:
> > +		return &bpf_get_current_pidns_info_proto;
> >   	default:
> >   		return NULL;
> >   	}
> > 

^ permalink raw reply

* Re: [PATCH bpf-next v2] bpf: add BTF ids in procfs for file descriptors to BTF objects
From: Daniel Borkmann @ 2019-08-20 15:10 UTC (permalink / raw)
  To: Quentin Monnet, Alexei Starovoitov; +Cc: bpf, netdev, oss-drivers
In-Reply-To: <20190820135346.7593-1-quentin.monnet@netronome.com>

On 8/20/19 3:53 PM, Quentin Monnet wrote:
> Implement the show_fdinfo hook for BTF FDs file operations, and make it
> print the id of the BTF object. This allows for a quick retrieval of the
> BTF id from its FD; or it can help understanding what type of object
> (BTF) the file descriptor points to.
> 
> v2:
> - Do not expose data_size, only btf_id, in FD info.
> 
> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Applied, thanks!

^ permalink raw reply

* Re: [PATCH -next] bpf: Use PTR_ERR_OR_ZERO in xsk_map_inc()
From: Daniel Borkmann @ 2019-08-20 15:09 UTC (permalink / raw)
  To: YueHaibing, bjorn.topel, magnus.karlsson, jonathan.lemon, ast,
	kafai, songliubraving, yhs, john.fastabend
  Cc: netdev, bpf, kernel-janitors
In-Reply-To: <20190820013652.147041-1-yuehaibing@huawei.com>

On 8/20/19 3:36 AM, YueHaibing wrote:
> Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
> 
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Applied, thanks!

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox