Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: XDP performance regression due to CONFIG_RETPOLINE Spectre V2
From: Christoph Hellwig @ 2018-04-12 14:56 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: xdp-newbies@vger.kernel.org, netdev@vger.kernel.org,
	Christoph Hellwig, David Woodhouse, William Tu,
	Björn Töpel, Karlsson, Magnus, Alexander Duyck,
	Arnaldo Carvalho de Melo
In-Reply-To: <20180412145123.GA7048@lst.de>

On Thu, Apr 12, 2018 at 04:51:23PM +0200, Christoph Hellwig wrote:
> On Thu, Apr 12, 2018 at 03:50:29PM +0200, Jesper Dangaard Brouer wrote:
> > ---------------
> > Implement support for keeping the DMA mapping through the XDP return
> > call, to remove RX map/unmap calls.  Implement bulking for XDP
> > ndo_xdp_xmit and XDP return frame API.  Bulking allows to perform DMA
> > bulking via scatter-gatter DMA calls, XDP TX need it for DMA
> > map+unmap. The driver RX DMA-sync (to CPU) per packet calls are harder
> > to mitigate (via bulk technique). Ask DMA maintainer for a common
> > case direct call for swiotlb DMA sync call ;-)
> 
> Why do you even end up in swiotlb code?  Once you bounce buffer your
> performance is toast anyway..

I guess that is because x86 selects it as the default as soon as
we have more than 4G memory. That should be solveable fairly easily
with the per-device dma ops, though.

^ permalink raw reply

* Re: [PATCH 5/5] arm64: allwinner: a64: add SRAM controller device tree node
From: Maxime Ripard @ 2018-04-12 14:58 UTC (permalink / raw)
  To: Icenowy Zheng
  Cc: Rob Herring, Chen-Yu Tsai, Giuseppe Cavallaro, Corentin Labbe,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-sunxi-/JYPxA39Uh5TLH3MbocFFw
In-Reply-To: <20180411141641.14675-6-icenowy-h8G6r0blFSE@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1919 bytes --]

Hi,

On Wed, Apr 11, 2018 at 10:16:41PM +0800, Icenowy Zheng wrote:
> Allwinner A64 has a SRAM controller, and in the device tree currently
> we have a syscon node to enable EMAC driver to access the EMAC clock
> register. As SRAM controller driver can now export regmap for this
> register, replace the syscon node to the SRAM controller device node,
> and let EMAC driver to acquire its EMAC clock regmap.
> 
> Signed-off-by: Icenowy Zheng <icenowy-h8G6r0blFSE@public.gmane.org>
> ---
>  arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi | 23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
> index 1b2ef28c42bd..1c37659d9d41 100644
> --- a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
> +++ b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
> @@ -168,10 +168,25 @@
>  		#size-cells = <1>;
>  		ranges;
>  
> -		syscon: syscon@1c00000 {
> -			compatible = "allwinner,sun50i-a64-system-controller",
> -				"syscon";
> +		sram_controller: sram-controller@1c00000 {
> +			compatible = "allwinner,sun50i-a64-sram-controller";
>  			reg = <0x01c00000 0x1000>;
> +			#address-cells = <1>;
> +			#size-cells = <1>;
> +			ranges;
> +
> +			sram_c: sram@18000 {
> +				compatible = "mmio-sram";
> +				reg = <0x00018000 0x28000>;
> +				#address-cells = <1>;
> +				#size-cells = <1>;
> +				ranges = <0 0x00018000 0x28000>;
> +
> +				de2_sram: sram-section@0 {
> +					compatible = "allwinner,sun50i-a64-sram-c";
> +					reg = <0x0000 0x28000>;
> +				};
> +			};

That doesn't look related at all to what's being discussed here, so
you'd rather add it as part of your DE2-enablement serie (or amend
your commit log to say why this is important to do it in this patch).

Maxime

-- 
Maxime Ripard, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* Re: [RFC v3 net-next 13/18] net/sched: Introduce the TBS Qdisc
From: Richard Cochran @ 2018-04-12 15:03 UTC (permalink / raw)
  To: Jesus Sanchez-Palencia
  Cc: Thomas Gleixner, netdev, jhs, xiyou.wangcong, jiri,
	vinicius.gomes, anna-maria, henrik, John Stultz, levi.pearson,
	edumazet, willemb, mlichvar
In-Reply-To: <3d1c27a2-7a21-ab80-e92b-47b415b70548@intel.com>

On Wed, Apr 11, 2018 at 04:38:44PM -0700, Jesus Sanchez-Palencia wrote:
> Just breaking this down a bit, yes, TAI is the network time base, and the NICs
> PTP clock use that because PTP is (commonly) based on TAI. After the PHCs have
> been synchronized over the network (e.g. with ptp4l), my understanding is that
> if applications want to use the clockid_t CLOCK_TAI as a network clock reference
> it's required that something (i.e. phc2sys) is synchronizing the PHCs and the
> system clock, and also that something calls adjtime to apply the TAI vs UTC
> offset to CLOCK_TAI.

Yes.  I haven't seen any distro that sets the TAI-UTC offset after
boot, nor are there any user space tools for this.  The kernel is
ready, though.

> I was thinking about the full offload use-cases, thus when no scheduling is
> happening inside the qdiscs. Applications could just read the time from the PHC
> clocks directly without having to rely on any of the above. On this case,
> userspace would use DYNAMIC_CLOCK just to flag that this is the case, but I must
> admit it's not clear to me how common of a use-case that is, or even if it makes
> sense.

1588 allows only two timescales, TAI and ARB-itrary.  Although it
doesn't make too much sense to use ARB, still people will do strange
things.  Probably some people use UTC.  I am not advocating supporting
alternate timescales, just pointing out the possibility.

Thanks,
Richard

^ permalink raw reply

* Re: [PATCH 3/5] net: stmmac: dwmac-sun8i: Allow getting syscon regmap from device
From: Icenowy Zheng @ 2018-04-12 15:11 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Rob Herring, Chen-Yu Tsai, Giuseppe Cavallaro, Corentin Labbe,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-sunxi-/JYPxA39Uh5TLH3MbocFFw
In-Reply-To: <20180412145628.iaaeue2imiijwx5d@flea>



于 2018年4月12日 GMT+08:00 下午10:56:28, Maxime Ripard <maxime.ripard-LDxbnhwyfcJBDgjK7y7TUQ@public.gmane.org> 写到:
>On Wed, Apr 11, 2018 at 10:16:39PM +0800, Icenowy Zheng wrote:
>> From: Chen-Yu Tsai <wens-jdAy2FN1RRM@public.gmane.org>
>> 
>> On the Allwinner R40 SoC, the "GMAC clock" register is in the CCU
>> address space; on the A64 SoC this register is in the SRAM controller
>> address space, and with a different offset.
>> 
>> To access the register from another device and hide the internal
>> difference between the device, let it register a regmap named
>> "emac-clock". We can then get the device from the phandle, and
>> retrieve the regmap with dev_get_regmap(); in this situation the
>> regmap_field will be set up to access the only register in the
>regmap.
>> 
>> Signed-off-by: Chen-Yu Tsai <wens-jdAy2FN1RRM@public.gmane.org>
>> [Icenowy: change to use regmaps with single register, change commit
>>  message]
>> Signed-off-by: Icenowy Zheng <icenowy-h8G6r0blFSE@public.gmane.org>
>> ---
>>  drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 48
>++++++++++++++++++++++-
>>  1 file changed, 46 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
>b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
>> index 1037f6c78bca..b61210c0d415 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
>> @@ -85,6 +85,13 @@ const struct reg_field old_syscon_reg_field = {
>>  	.msb = 31,
>>  };
>>  
>> +/* Specially exported regmap which contains only EMAC register */
>> +const struct reg_field single_reg_field = {
>> +	.reg = 0,
>> +	.lsb = 0,
>> +	.msb = 31,
>> +};
>> +
>
>I'm not sure this would be wise. If we ever need some other register
>exported through the regmap, will have to change all the calling sites
>everywhere in the kernel, which will be a pain and will break
>bisectability.

In this situation the register can be exported as another
 regmap. Currently the code will access a regmap with name
"emac-clock" for this register.

>
>Chen-Yu's (or was it yours?) initial solution with a custom writeable
>hook only allowing a single register seemed like a better one.

But I remember you mentioned that you want it to hide the
difference inside the device.

>
>Maxime

-- 
You received this message because you are subscribed to the Google Groups "linux-sunxi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to linux-sunxi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply

* Re: [PATCH net 2/2] sfc: limit ARFS workitems in flight per channel
From: David Miller @ 2018-04-12 15:11 UTC (permalink / raw)
  To: ecree; +Cc: linux-net-drivers, netdev
In-Reply-To: <d3bd9e31-5247-8032-8f35-c85d3728c32e@solarflare.com>

From: Edward Cree <ecree@solarflare.com>
Date: Thu, 12 Apr 2018 15:02:50 +0100

> A misconfigured system (e.g. with all interrupts affinitised to all CPUs)
>  may produce a storm of ARFS steering events.  With the existing sfc ARFS
>  implementation, that could create a backlog of workitems that grinds the
>  system to a halt.  To prevent this, limit the number of workitems that
>  may be in flight for a given SFC device to 8 (EFX_RPS_MAX_IN_FLIGHT), and
>  return EBUSY from our ndo_rx_flow_steer method if the limit is reached.
> Given this limit, also store the workitems in an array of slots within the
>  struct efx_nic, rather than dynamically allocating for each request.
> 
> Signed-off-by: Edward Cree <ecree@solarflare.com>

I don't think this behavior is all that great.

If you really have to queue up these operations because they take a long
time, I think it is better to enter a synchronous mode and sleep once
you hit this in-flight limit of 8.

Either that or make the expiration work smarter when it has lots of events
to process.

^ permalink raw reply

* Re: [PATCH 2/4] lan78xx: Read initial EEE setting from Device Tree
From: Phil Elwell @ 2018-04-12 15:17 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Woojung Huh, Microchip Linux Driver Support, Rob Herring,
	Mark Rutland, David S. Miller, Mauro Carvalho Chehab,
	Greg Kroah-Hartman, Linus Walleij, Andrew Morton, Randy Dunlap,
	netdev, devicetree, linux-kernel, linux-usb
In-Reply-To: <20180412141630.GM28963@lunn.ch>

Andrew,

On 12/04/2018 15:16, Andrew Lunn wrote:
> On Thu, Apr 12, 2018 at 02:55:34PM +0100, Phil Elwell wrote:
>> Add two new Device Tree properties:
>> * microchip,eee-enabled  - a boolean to enable EEE
>> * microchip,tx-lpi-timer - time in microseconds to wait after TX goes
>>                            idle before entering the low power state
>>                            (default 600)
> 
> Hi Phil
> 
> This looks wrong.
> 
> What should happen is that the MAC driver calls phy_init_eee() to find
> out if the PHY supports EEE. There should be no need to look in device
> tree.

If the driver should be calling phy_init_eee to initialise EEE operation then I'm fine
with that (although I notice that the TI cpsw calls phy_ethtool_set_eee but I don't see
it calling phy_init_eee). However, it sounds like I need to keep my DT toggle of the
EEE enablement and parameters downstream.

Phil

^ permalink raw reply

* Re: [RFC v3 net-next 13/18] net/sched: Introduce the TBS Qdisc
From: Miroslav Lichvar @ 2018-04-12 15:19 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Jesus Sanchez-Palencia, Thomas Gleixner, netdev, jhs,
	xiyou.wangcong, jiri, vinicius.gomes, anna-maria, henrik,
	John Stultz, levi.pearson, edumazet, willemb
In-Reply-To: <20180412150349.3dgkgqx4ged6t4ng@localhost>

On Thu, Apr 12, 2018 at 08:03:49AM -0700, Richard Cochran wrote:
> On Wed, Apr 11, 2018 at 04:38:44PM -0700, Jesus Sanchez-Palencia wrote:
> > Just breaking this down a bit, yes, TAI is the network time base, and the NICs
> > PTP clock use that because PTP is (commonly) based on TAI. After the PHCs have
> > been synchronized over the network (e.g. with ptp4l), my understanding is that
> > if applications want to use the clockid_t CLOCK_TAI as a network clock reference
> > it's required that something (i.e. phc2sys) is synchronizing the PHCs and the
> > system clock, and also that something calls adjtime to apply the TAI vs UTC
> > offset to CLOCK_TAI.
> 
> Yes.  I haven't seen any distro that sets the TAI-UTC offset after
> boot, nor are there any user space tools for this.  The kernel is
> ready, though.

FWIW, the default NTP configuration in Fedora sets the kernel TAI-UTC
offset.

> > I was thinking about the full offload use-cases, thus when no scheduling is
> > happening inside the qdiscs. Applications could just read the time from the PHC
> > clocks directly without having to rely on any of the above. On this case,
> > userspace would use DYNAMIC_CLOCK just to flag that this is the case, but I must
> > admit it's not clear to me how common of a use-case that is, or even if it makes
> > sense.
> 
> 1588 allows only two timescales, TAI and ARB-itrary.  Although it
> doesn't make too much sense to use ARB, still people will do strange
> things.  Probably some people use UTC.  I am not advocating supporting
> alternate timescales, just pointing out the possibility.

There is also the possibility that the NIC clock is not synchronized
to anything. For synchronization of the system clock it's easier to
leave it free running and only track its phase/frequency offset to
allow conversion between the PHC and system time.

-- 
Miroslav Lichvar

^ permalink raw reply

* RE: [RFC PATCH v2 03/14] xsk: add umem fill queue support and mmap
From: Karlsson, Magnus @ 2018-04-12 15:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Björn Töpel, Duyck, Alexander H,
	alexander.duyck@gmail.com, john.fastabend@gmail.com, ast@fb.com,
	brouer@redhat.com, willemdebruijn.kernel@gmail.com,
	daniel@iogearbox.net, netdev@vger.kernel.org,
	michael.lundkvist@ericsson.com, Brandeburg, Jesse,
	Singhai, Anjali, Zhang, Qi Z, ravineet.singh@ericsson.com
In-Reply-To: <20180412170110-mutt-send-email-mst@kernel.org>



> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> Sent: Thursday, April 12, 2018 4:05 PM
> To: Karlsson, Magnus <magnus.karlsson@intel.com>
> Cc: Björn Töpel <bjorn.topel@gmail.com>; Duyck, Alexander H
> <alexander.h.duyck@intel.com>; alexander.duyck@gmail.com;
> john.fastabend@gmail.com; ast@fb.com; brouer@redhat.com;
> willemdebruijn.kernel@gmail.com; daniel@iogearbox.net;
> netdev@vger.kernel.org; michael.lundkvist@ericsson.com; Brandeburg,
> Jesse <jesse.brandeburg@intel.com>; Singhai, Anjali
> <anjali.singhai@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> ravineet.singh@ericsson.com
> Subject: Re: [RFC PATCH v2 03/14] xsk: add umem fill queue support and
> mmap
> 
> On Thu, Apr 12, 2018 at 07:38:25AM +0000, Karlsson, Magnus wrote:
> > I think you are definitely right in that there are ways in which we
> > can improve performance here. That said, the current queue performs
> > slightly better than the previous one we had that was more or less a
> > copy of one of your first virtio 1.1 proposals from little over a year
> > ago. It had bidirectional queues and a valid flag in the descriptor
> > itself. The reason we abandoned this was not poor performance (it was
> > good), but a need to go to unidirectional queues. Maybe I should have
> > only changed that aspect and kept the valid flag.
> 
> Is there a summary about unidirectional queues anywhere?  I'm curious to
> know whether there are any lessons here to be learned for virtio or ptr_ring.

I did a quick hack in which I used your ptr_ring for the fill queue instead of
our head/tail based one. In the corner cases (usually empty or usually full), there
is basically no difference. But for the case when the queue is always half full,
the ptr_ring implementation boosts the performance from 5.6 to 5.7 Mpps 
(as there is no cache line bouncing in this case) 
on my system (slower than Björn's that was used for the numbers in the RFC).

So I think this should be implemented properly so we can get some real numbers.
Especially since 0.1 Mpps with copies will likely become much more with zero-copy
as we are really chasing cycles there. We will get back a better evaluation in a few
days.

Thanks: Magnus

> --
> MST

^ permalink raw reply

* RE: [PATCH 3/4] lan78xx: Read LED modes from Device Tree
From: Woojung.Huh @ 2018-04-12 15:21 UTC (permalink / raw)
  To: andrew, phil
  Cc: UNGLinuxDriver, robh+dt, mark.rutland, davem, mchehab, gregkh,
	linus.walleij, akpm, rdunlap, netdev, devicetree, linux-kernel,
	linux-usb
In-Reply-To: <20180412143655.GQ28963@lunn.ch>

> > @@ -2097,6 +2098,25 @@ static int lan78xx_phy_init(struct lan78xx_net *dev)
> >  		(void)lan78xx_set_eee(dev->net, &edata);
> >  	}
> >
> > +	if (!of_property_read_u32_array(dev->udev->dev.of_node,
> > +					"microchip,led-modes",
> > +					led_modes, ARRAY_SIZE(led_modes))) {
> > +		u32 reg;
> > +		int i;
> > +
> > +		reg = phy_read(phydev, 0x1d);
> > +		for (i = 0; i < ARRAY_SIZE(led_modes); i++) {
> > +			reg &= ~(0xf << (i * 4));
> > +			reg |= (led_modes[i] & 0xf) << (i * 4);
> > +		}
> > +		(void)phy_write(phydev, 0x1d, reg);
> 
> Poking PHY registers directly from the MAC driver is not always a good
> idea. This MAC driver does that in a few places :-(
Agree but, some are for workaround unfortunately.

> What do we know about the PHY? It is built into the device or is it
> external? If it is external, how do you know the LED register is at
> 0x1d?
This register is not defined in include/linux/microchipphy.h. :(
Also agree that there parts should be applied to internal PHY only.

^ permalink raw reply

* Re: Re: [PATCH 3/5] net: stmmac: dwmac-sun8i: Allow getting syscon regmap from device
From: Chen-Yu Tsai @ 2018-04-12 15:23 UTC (permalink / raw)
  To: Icenowy Zheng
  Cc: Maxime Ripard, Rob Herring, Giuseppe Cavallaro, Corentin Labbe,
	netdev, devicetree, linux-arm-kernel, linux-kernel, linux-sunxi
In-Reply-To: <A027702A-F3F7-4896-B2B8-E024DA521661-h8G6r0blFSE@public.gmane.org>

On Thu, Apr 12, 2018 at 11:11 PM, Icenowy Zheng <icenowy-h8G6r0blFSE@public.gmane.org> wrote:
>
>
> 于 2018年4月12日 GMT+08:00 下午10:56:28, Maxime Ripard <maxime.ripard-LDxbnhwyfcJBDgjK7y7TUQ@public.gmane.org> 写到:
>>On Wed, Apr 11, 2018 at 10:16:39PM +0800, Icenowy Zheng wrote:
>>> From: Chen-Yu Tsai <wens-jdAy2FN1RRM@public.gmane.org>
>>>
>>> On the Allwinner R40 SoC, the "GMAC clock" register is in the CCU
>>> address space; on the A64 SoC this register is in the SRAM controller
>>> address space, and with a different offset.
>>>
>>> To access the register from another device and hide the internal
>>> difference between the device, let it register a regmap named
>>> "emac-clock". We can then get the device from the phandle, and
>>> retrieve the regmap with dev_get_regmap(); in this situation the
>>> regmap_field will be set up to access the only register in the
>>regmap.
>>>
>>> Signed-off-by: Chen-Yu Tsai <wens-jdAy2FN1RRM@public.gmane.org>
>>> [Icenowy: change to use regmaps with single register, change commit
>>>  message]
>>> Signed-off-by: Icenowy Zheng <icenowy-h8G6r0blFSE@public.gmane.org>
>>> ---
>>>  drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 48
>>++++++++++++++++++++++-
>>>  1 file changed, 46 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
>>b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
>>> index 1037f6c78bca..b61210c0d415 100644
>>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
>>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
>>> @@ -85,6 +85,13 @@ const struct reg_field old_syscon_reg_field = {
>>>      .msb = 31,
>>>  };
>>>
>>> +/* Specially exported regmap which contains only EMAC register */
>>> +const struct reg_field single_reg_field = {
>>> +    .reg = 0,
>>> +    .lsb = 0,
>>> +    .msb = 31,
>>> +};
>>> +
>>
>>I'm not sure this would be wise. If we ever need some other register
>>exported through the regmap, will have to change all the calling sites
>>everywhere in the kernel, which will be a pain and will break
>>bisectability.
>
> In this situation the register can be exported as another
>  regmap. Currently the code will access a regmap with name
> "emac-clock" for this register.
>
>>
>>Chen-Yu's (or was it yours?) initial solution with a custom writeable
>>hook only allowing a single register seemed like a better one.
>
> But I remember you mentioned that you want it to hide the
> difference inside the device.

Hi,

The idea is that a device can export multiple regmaps. This one,
the one named "gmac" (in my soon to come v2) or "emac-clock" here,
is but one of many possible regmaps, and it only exports the register
needed by the GMAC/EMAC. IMHO it is highly unlikely the same piece of
hardware would need a second register from the same device. A more
likely situation would be it needs another register from a different
device, like on the H6 where the "internal PHY" is not so internal
from a system bus point of view.

ChenYu

-- 
You received this message because you are subscribed to the Google Groups "linux-sunxi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to linux-sunxi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply

* Re: [PATCH net 2/2] sfc: limit ARFS workitems in flight per channel
From: Edward Cree @ 2018-04-12 15:24 UTC (permalink / raw)
  To: David Miller; +Cc: linux-net-drivers, netdev
In-Reply-To: <20180412.111132.159310781108534969.davem@davemloft.net>

On 12/04/18 16:11, David Miller wrote:
> From: Edward Cree <ecree@solarflare.com>
> Date: Thu, 12 Apr 2018 15:02:50 +0100
>
>> A misconfigured system (e.g. with all interrupts affinitised to all CPUs)
>>  may produce a storm of ARFS steering events.  With the existing sfc ARFS
>>  implementation, that could create a backlog of workitems that grinds the
>>  system to a halt.  To prevent this, limit the number of workitems that
>>  may be in flight for a given SFC device to 8 (EFX_RPS_MAX_IN_FLIGHT), and
>>  return EBUSY from our ndo_rx_flow_steer method if the limit is reached.
>> Given this limit, also store the workitems in an array of slots within the
>>  struct efx_nic, rather than dynamically allocating for each request.
>>
>> Signed-off-by: Edward Cree <ecree@solarflare.com>
> I don't think this behavior is all that great.
>
> If you really have to queue up these operations because they take a long
> time, I think it is better to enter a synchronous mode and sleep once
> you hit this in-flight limit of 8.
I don't think we can sleep at this point, ndo_rx_flow_steer is called from
 the RX path (netif_receive_skb_internal() -> get_rps_cpu() ->
 set_rps_cpu()).

> Either that or make the expiration work smarter when it has lots of events
> to process.
I'm afraid I don't understand what you mean here.
This code is not handling expiration of old ARFS filters, it's inserting
 new ones.

-Ed

^ permalink raw reply

* Re: [PATCH 08/15] ASoC: pxa: remove the dmaengine compat need
From: Mark Brown @ 2018-04-12 15:26 UTC (permalink / raw)
  To: Robert Jarzmik
  Cc: Daniel Mack, Haojian Zhuang, Bartlomiej Zolnierkiewicz, Tejun Heo,
	Vinod Koul, Mauro Carvalho Chehab, Ulf Hansson, Ezequiel Garcia,
	Boris Brezillon, David Woodhouse, Brian Norris, Marek Vasut,
	Richard Weinberger, Cyrille Pitchen, Nicolas Pitre, Samuel Ortiz,
	Greg Kroah-Hartman, Jaroslav Kysela, Takashi Iwai
In-Reply-To: <20180402142656.26815-9-robert.jarzmik@free.fr>

[-- Attachment #1: Type: text/plain, Size: 348 bytes --]

On Mon, Apr 02, 2018 at 04:26:49PM +0200, Robert Jarzmik wrote:
> As the pxa architecture switched towards the dmaengine slave map, the
> old compatibility mechanism to acquire the dma requestor line number and
> priority are not needed anymore.

Acked-by: Mark Brown <broonie@kernel.org>

If there's no dependency I'm happy to take this for 4.18.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: XDP performance regression due to CONFIG_RETPOLINE Spectre V2
From: Jesper Dangaard Brouer @ 2018-04-12 15:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: xdp-newbies@vger.kernel.org, netdev@vger.kernel.org,
	David Woodhouse, William Tu, Björn Töpel,
	Karlsson, Magnus, Alexander Duyck, Arnaldo Carvalho de Melo,
	brouer
In-Reply-To: <20180412145653.GA7172@lst.de>

On Thu, 12 Apr 2018 16:56:53 +0200 Christoph Hellwig <hch@lst.de> wrote:

> On Thu, Apr 12, 2018 at 04:51:23PM +0200, Christoph Hellwig wrote:
> > On Thu, Apr 12, 2018 at 03:50:29PM +0200, Jesper Dangaard Brouer wrote:  
> > > ---------------
> > > Implement support for keeping the DMA mapping through the XDP return
> > > call, to remove RX map/unmap calls.  Implement bulking for XDP
> > > ndo_xdp_xmit and XDP return frame API.  Bulking allows to perform DMA
> > > bulking via scatter-gatter DMA calls, XDP TX need it for DMA
> > > map+unmap. The driver RX DMA-sync (to CPU) per packet calls are harder
> > > to mitigate (via bulk technique). Ask DMA maintainer for a common
> > > case direct call for swiotlb DMA sync call ;-)  
> > 
> > Why do you even end up in swiotlb code?  Once you bounce buffer your
> > performance is toast anyway..  
> 
> I guess that is because x86 selects it as the default as soon as
> we have more than 4G memory. 

I were also confused why I ended up using SWIOTLB (SoftWare IO-TLB),
that might explain it. And I'm not hitting the bounce-buffer case.

How do I control which DMA engine I use? (So, I can play a little)

> That should be solveable fairly easily with the per-device dma ops,
> though.

I didn't understand this part.

I wanted to ask your opinion, on a hackish idea I have...
Which is howto detect, if I can reuse the RX-DMA map address, for TX-DMA
operation on another device (still/only calling sync_single_for_device).

With XDP_REDIRECT we are redirecting between net_device's. Usually
we keep the RX-DMA mapping as we recycle the page. On the redirect to
TX-device (via ndo_xdp_xmit) we do a new DMA map+unmap for TX.  The
question is how to avoid this mapping(?).  In some cases, with some DMA
engines (or lack of) I guess the DMA address is actually the same as
the RX-DMA mapping dma_addr_t already known, right?  For those cases,
would it be possible to just (re)use that address for TX?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH net 2/2] sfc: limit ARFS workitems in flight per channel
From: David Miller @ 2018-04-12 15:33 UTC (permalink / raw)
  To: ecree; +Cc: linux-net-drivers, netdev
In-Reply-To: <c007fb70-9908-1692-2939-5a2258d1d789@solarflare.com>

From: Edward Cree <ecree@solarflare.com>
Date: Thu, 12 Apr 2018 16:24:46 +0100

> This code is not handling expiration of old ARFS filters, it's inserting
>  new ones.

Then simply make the work process a queue, and add entries to the queue
here if the work is already scheduled.

Is there a reason why that wouldn't work?

^ permalink raw reply

* Re: [PATCH net-next] Per interface IPv4 stats (CONFIG_IP_IFSTATS_TABLE)
From: Stephen Suryaputra @ 2018-04-12 15:58 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev
In-Reply-To: <alpine.LFD.2.20.1804112208190.2618@ja.home.ssi.bg>

Thanks for the feedbacks. Please see the detail below:

On Wed, Apr 11, 2018 at 3:37 PM, Julian Anastasov <ja@ssi.bg> wrote:
[snip]
>> -     __IP_INC_STATS(net, IPSTATS_MIB_INHDRERRORS);
>> +     __IP_INC_STATS(net, skb_dst(skb)->dev, IPSTATS_MIB_INHDRERRORS);
>
>         May be skb->dev if we want to account it to the
> input device.
>
Yes. I'm about to make change it but see the next one.

[snip]
>> diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
>> index 4527921..32bd3af 100644
>> --- a/net/netfilter/ipvs/ip_vs_xmit.c
>> +++ b/net/netfilter/ipvs/ip_vs_xmit.c
>> @@ -286,7 +286,7 @@ static inline bool decrement_ttl(struct netns_ipvs *ipvs,
>>       {
>>               if (ip_hdr(skb)->ttl <= 1) {
>>                       /* Tell the sender its packet died... */
>> -                     __IP_INC_STATS(net, IPSTATS_MIB_INHDRERRORS);
>> +                     __IP_INC_STATS(net, skb_dst(skb)->dev, IPSTATS_MIB_INHDRERRORS);
>
>         At this point, skb_dst(skb) can be:
>
> - input route at LOCAL_IN => dst->dev is "lo", skb->dev = input_device
> - output route at LOCAL_OUT => dst->dev is output_device, skb->dev = NULL
>
>         We should see this error on LOCAL_IN but better to be
> safe: use 'skb->dev ? : skb_dst(skb)->dev' instead of just
> 'skb_dst(skb)->dev'.
>
This follows v6 implementation in the same function:

#ifdef CONFIG_IP_VS_IPV6
    if (skb_af == AF_INET6) {
        struct dst_entry *dst = skb_dst(skb);

        /* check and decrement ttl */
        if (ipv6_hdr(skb)->hop_limit <= 1) {
            /* Force OUTPUT device used as source address */
            skb->dev = dst->dev;
            icmpv6_send(skb, ICMPV6_TIME_EXCEED,
                    ICMPV6_EXC_HOPLIMIT, 0);
            __IP6_INC_STATS(net, ip6_dst_idev(dst),
                    IPSTATS_MIB_INHDRERRORS);

            return false;
        }

        /* don't propagate ttl change to cloned packets */
        if (!skb_make_writable(skb, sizeof(struct ipv6hdr)))
            return false;

        ipv6_hdr(skb)->hop_limit--;
    } else
#endif

[snip]
>
>         The patch probably has other errors, for example,
> using rt->dst.dev (lo) when rt->dst.error != 0 in ip_error,
> may be 'dev' should be used instead...

Same also here. Examples are ip6_forward and ip6_pkt_drop.

I think it's better be counted in the input device for them also. Thoughts?

Regards,
Stephen.

^ permalink raw reply

* Re: [PATCH net] tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets
From: Yuchung Cheng @ 2018-04-12 16:02 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S . Miller, netdev, Eric Dumazet
In-Reply-To: <20180411213628.194344-1-edumazet@google.com>

On Wed, Apr 11, 2018 at 2:36 PM, Eric Dumazet <edumazet@google.com> wrote:
>
> syzbot/KMSAN reported an uninit-value in tcp_parse_options() [1]
>
> I believe this was caused by a TCP_MD5SIG being set on live
> flow.
>
> This is highly unexpected, since TCP option space is limited.
>
> For instance, presence of TCP MD5 option automatically disables
> TCP TimeStamp option at SYN/SYNACK time, which we can not do
> once flow has been established.
>
> Really, adding/deleting an MD5 key only makes sense on sockets
> in CLOSE or LISTEN state.
>
> [1]
> BUG: KMSAN: uninit-value in tcp_parse_options+0xd74/0x1a30 net/ipv4/tcp_input.c:3720
> CPU: 1 PID: 6177 Comm: syzkaller192004 Not tainted 4.16.0+ #83
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x185/0x1d0 lib/dump_stack.c:53
>  kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
>  __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
>  tcp_parse_options+0xd74/0x1a30 net/ipv4/tcp_input.c:3720
>  tcp_fast_parse_options net/ipv4/tcp_input.c:3858 [inline]
>  tcp_validate_incoming+0x4f1/0x2790 net/ipv4/tcp_input.c:5184
>  tcp_rcv_established+0xf60/0x2bb0 net/ipv4/tcp_input.c:5453
>  tcp_v4_do_rcv+0x6cd/0xd90 net/ipv4/tcp_ipv4.c:1469
>  sk_backlog_rcv include/net/sock.h:908 [inline]
>  __release_sock+0x2d6/0x680 net/core/sock.c:2271
>  release_sock+0x97/0x2a0 net/core/sock.c:2786
>  tcp_sendmsg+0xd6/0x100 net/ipv4/tcp.c:1464
>  inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764
>  sock_sendmsg_nosec net/socket.c:630 [inline]
>  sock_sendmsg net/socket.c:640 [inline]
>  SYSC_sendto+0x6c3/0x7e0 net/socket.c:1747
>  SyS_sendto+0x8a/0xb0 net/socket.c:1715
>  do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> RIP: 0033:0x448fe9
> RSP: 002b:00007fd472c64d38 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 00000000006e5a30 RCX: 0000000000448fe9
> RDX: 000000000000029f RSI: 0000000020a88f88 RDI: 0000000000000004
> RBP: 00000000006e5a34 R08: 0000000020e68000 R09: 0000000000000010
> R10: 00000000200007fd R11: 0000000000000216 R12: 0000000000000000
> R13: 00007fff074899ef R14: 00007fd472c659c0 R15: 0000000000000009
>
> Uninit was created at:
>  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
>  kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
>  kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
>  kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
>  slab_post_alloc_hook mm/slab.h:445 [inline]
>  slab_alloc_node mm/slub.c:2737 [inline]
>  __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
>  __kmalloc_reserve net/core/skbuff.c:138 [inline]
>  __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
>  alloc_skb include/linux/skbuff.h:984 [inline]
>  tcp_send_ack+0x18c/0x910 net/ipv4/tcp_output.c:3624
>  __tcp_ack_snd_check net/ipv4/tcp_input.c:5040 [inline]
>  tcp_ack_snd_check net/ipv4/tcp_input.c:5053 [inline]
>  tcp_rcv_established+0x2103/0x2bb0 net/ipv4/tcp_input.c:5469
>  tcp_v4_do_rcv+0x6cd/0xd90 net/ipv4/tcp_ipv4.c:1469
>  sk_backlog_rcv include/net/sock.h:908 [inline]
>  __release_sock+0x2d6/0x680 net/core/sock.c:2271
>  release_sock+0x97/0x2a0 net/core/sock.c:2786
>  tcp_sendmsg+0xd6/0x100 net/ipv4/tcp.c:1464
>  inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764
>  sock_sendmsg_nosec net/socket.c:630 [inline]
>  sock_sendmsg net/socket.c:640 [inline]
>  SYSC_sendto+0x6c3/0x7e0 net/socket.c:1747
>  SyS_sendto+0x8a/0xb0 net/socket.c:1715
>  do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>
> Fixes: cfb6eeb4c860 ("[TCP]: MD5 Signature Option (RFC2385) support.")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: syzbot <syzkaller@googlegroups.com>
> ---
Acked-by: Yuchung Cheng <ycheng@google.com>

Thanks for the fix!
>  net/ipv4/tcp.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index bccc4c2700870b8c7ff592a6bd27acebd9bc6471..4fa3f812b9ff8954a9b6a018c648ff12ab995721 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2813,8 +2813,10 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
>  #ifdef CONFIG_TCP_MD5SIG
>         case TCP_MD5SIG:
>         case TCP_MD5SIG_EXT:
> -               /* Read the IP->Key mappings from userspace */
> -               err = tp->af_specific->md5_parse(sk, optname, optval, optlen);
> +               if ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN))
> +                       err = tp->af_specific->md5_parse(sk, optname, optval, optlen);
> +               else
> +                       err = -EINVAL;
>                 break;
>  #endif
>         case TCP_USER_TIMEOUT:
> --
> 2.17.0.484.g0c8726318c-goog
>

^ permalink raw reply

* Re: TCP one-by-one acking - RFC interpretation question
From: Yuchung Cheng @ 2018-04-12 16:20 UTC (permalink / raw)
  To: Michal Kubecek
  Cc: netdev, Eric Dumazet, Neal Cardwell, Kenneth Klette Jonassen
In-Reply-To: <20180411120655.aib34ye5xnch4zw3@unicorn.suse.cz>

On Wed, Apr 11, 2018 at 5:06 AM, Michal Kubecek <mkubecek@suse.cz> wrote:
> On Wed, Apr 11, 2018 at 12:58:37PM +0200, Michal Kubecek wrote:
>> There is something else I don't understand, though. In the case of
>> acking previously sacked and never retransmitted segment,
>> tcp_clean_rtx_queue() calculates the parameters for tcp_ack_update_rtt()
>> using
>>
>>         if (sack->first_sackt.v64) {
>>                 sack_rtt_us = skb_mstamp_us_delta(&now,
>> &sack->first_sackt);
>>                 ca_rtt_us = skb_mstamp_us_delta(&now,
>> &sack->last_sackt);
>>         }
>>
>> (in 4.4; mainline code replaces &now with tp->tcp_mstamp). If I read the
>> code correctly, both sack->first_sackt and sack->last_sackt contain
>> timestamps of initial segment transmission. This would mean we use the
>> time difference between the initial transmission and now, i.e. including
>> the RTO of the lost packet).
>>
>> IMHO we should take the actual round trip time instead, i.e. the
>> difference between the original transmission and the time the packet
>> sacked (first time). It seems we have been doing this before commit
>> 31231a8a8730 ("tcp: improve RTT from SACK for CC").
>
> Sorry for the noise, this was my misunderstanding, the first_sackt and
> last_sackt values are only taken from segments newly sacked by ack
> received right now, not those which were already sacked before.
>
> The actual problem and unrealistic RTT measurements come from another
> RFC violation I didn't mention before: the NAS doesn't follow RFC 2018
> section 4 rule for ordering of SACK blocks. Rather than sending SACK
> blocks three most recently received out-of-order blocks, it simply sends
> first three ordered by sequence numbers. In the earlier example (odd
> packets were received, even lost)
>
>        ACK             SAK             SAK             SAK
>     +-------+-------+-------+-------+-------+-------+-------+-------+-------+
>     |   1   |   2   |   3   |   4   |   5   |   6   |   7   |   8   |   9   |
>     +-------+-------+-------+-------+-------+-------+-------+-------+-------+
>   34273   35701   37129   38557   39985   41413   42841   44269   45697   47125
>
> it responds to retransmitted segment 2 by
>
>   1. ACK 37129, SACK 37129-38557 39985-41413 42841-44269
>   2. ACK 38557, SACK 39985-41413 42841-44269 45697-47125
>
> This new SACK block 45697-47125 has not been retransmitted and as it
> wasn't sacked before, it is considered newly sacked. Therefore it gets
> processed and its deemed RTT (time since its original transmit time)
> "poisons" the RTT calculation, leading to RTO spiraling up.
>
> Thus if we want to work around the NAS behaviour, we would need to
> recognize such new SACK block as "not really new" and ignore it for
> first_sackt/last_sackt. I'm not sure if it's possible without
> misinterpreting actually delayed out of order packets. Of course, it is
> not clear if it's worth the effort to work around so severely broken TCP
> implementations (two obvious RFC violations, even if we don't count the
> one-by-one acking).
Right. Not much we (sender) can do if the receiver is not reporting
the delivery status correctly. This also negatively impacts TCP
congestion control (Cubic, Reno, BBR, CDG etc) because we've changed
it to increase/decrease cwnd based on both inorder and out-of-order
delivery.

We're close to publish our internal packetdrill tests. Hopefully they
can be used to test these poor implementations.

>
> Michal Kubecek

^ permalink raw reply

* Re: [PATCH v2] net: dsa: b53: Using sleep-able operations in b53_switch_reset_gpio
From: Florian Fainelli @ 2018-04-12 16:32 UTC (permalink / raw)
  To: Jia-Ju Bai, andrew, vivien.didelot, preid; +Cc: netdev, linux-kernel
In-Reply-To: <1523497702-7200-1-git-send-email-baijiaju1990@gmail.com>



On 04/11/2018 06:48 PM, Jia-Ju Bai wrote:
> b53_switch_reset_gpio() is never called in atomic context.
> 
> The call chain ending up at b53_switch_reset_gpio() is:
> [1] b53_switch_reset_gpio() <- b53_switch_reset() <-
>     b53_reset_switch() <- b53_setup()
> 
> b53_switch_reset_gpio() is set as ".setup" in struct dsa_switch_ops.
> This function is not called in atomic context.
> 
> Despite never getting called from atomic context, b53_switch_reset_gpio()
> calls non-sleep operations mdelay() and gpio_set_value().
> They are not necessary and can be replaced with msleep() 
> and gpio_set_value_cansleep().
> 
> This is found by a static analysis tool named DCNS written by myself.
> And I also manually check it.
> 
> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>

Acked-by: Florian Fainelli <f.fainelli@gmail.com>

-- 
Florian

^ permalink raw reply

* Re: v6/sit tunnels and VRFs
From: Jeff Barnhill @ 2018-04-12 16:54 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev
In-Reply-To: <abb3291a-222a-7eff-724c-9d06fea9f02d@gmail.com>

Hi David,

In the slides referenced, you recommend adding an "unreachable
default" route to the end of each VRF route table.  In my testing (for
v4) this results in a change to fib lookup failures such that instead
of ENETUNREACH being returned, EHOSTUNREACH is returned since the fib
finds the unreachable route, versus failing to find a route
altogether.

Have the implications of this been considered?  I don't see a
clean/easy way to achieve the old behavior without affecting non-VRF
routing (eg. remove the unreachable route and delete the non-VRF
rules).  I'm guessing that programmatically, it may not make much
difference, ie. lookup fails, but for debugging or to a user looking
at it, the difference matters.  Do you (or anyone else) have any
thoughts on this?

Thanks,
Jeff

On Sun, Oct 29, 2017 at 11:48 AM, David Ahern <dsahern@gmail.com> wrote:
> On 10/27/17 8:43 PM, Jeff Barnhill wrote:
>> ping v4 loopback...
>>
>> jeff@VM2:~$ ip route list vrf myvrf
>> 127.0.0.0/8 dev myvrf proto kernel scope link src 127.0.0.1
>> 192.168.200.0/24 via 192.168.210.3 dev enp0s8
>> 192.168.210.0/24 dev enp0s8 proto kernel scope link src 192.168.210.2
>>
>> Lookups shown in perf script were for table 255.  Is it necessary to
>> put the l3mdev table first?  If I re-order the tables, it starts
>> working:
>
> Yes, we advise moving the local table down to avoid false hits (e.g.,
> duplicate addresses like this between the default VRF and another VRF).
>
> I covered that and a few other things at OSS 2017. Latest VRF slides for
> users:
>   http://schd.ws/hosted_files/ossna2017/fe/vrf-tutorial-oss.pdf

^ permalink raw reply

* Re: [PATCH 08/15] ASoC: pxa: remove the dmaengine compat need
From: Robert Jarzmik @ 2018-04-12 16:55 UTC (permalink / raw)
  To: Mark Brown
  Cc: Ulf Hansson, alsa-devel, Jaroslav Kysela, linux-ide, netdev,
	linux-mtd, devel, Boris Brezillon, Vinod Koul, Richard Weinberger,
	linu, Takashi Iwai, Marek Vasut, Ezequiel Garcia, Samuel Ortiz,
	Arnd Bergmann, Bartlomiej Zolnierkiewicz, Haojian Zhuang,
	dmaengine, Mauro Carvalho Chehab, linux-arm-kernel, Nicolas Pitre,
	Greg Kroah-Hartman, linux-mmc, Liam Girdwood <lgirdwood
In-Reply-To: <20180412152632.GG9929@sirena.org.uk>

Mark Brown <broonie@kernel.org> writes:

> On Mon, Apr 02, 2018 at 04:26:49PM +0200, Robert Jarzmik wrote:
>> As the pxa architecture switched towards the dmaengine slave map, the
>> old compatibility mechanism to acquire the dma requestor line number and
>> priority are not needed anymore.
>
> Acked-by: Mark Brown <broonie@kernel.org>
>
> If there's no dependency I'm happy to take this for 4.18.
Thanks for the ack.

The patches 1 and 2 are the dependency here, so I'd rather push it through my
tree once the review is complete.

Cheers.

-- 
Robert

^ permalink raw reply

* [PATCH] NFC: fix attrs checks in netlink interface
From: Andrey Konovalov @ 2018-04-12 16:56 UTC (permalink / raw)
  To: Samuel Ortiz, David S . Miller, linux-wireless, netdev,
	linux-kernel
  Cc: Dmitry Vyukov, Kostya Serebryany, Andrey Konovalov

nfc_genl_deactivate_target() relies on the NFC_ATTR_TARGET_INDEX
attribute being present, but doesn't check whether it is actually
provided by the user. Same goes for nfc_genl_fw_download() and
NFC_ATTR_FIRMWARE_NAME.

This patch adds appropriate checks.

Found with syzkaller.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 net/nfc/netlink.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/nfc/netlink.c b/net/nfc/netlink.c
index f018eafc2a0d..58adfb0c90f6 100644
--- a/net/nfc/netlink.c
+++ b/net/nfc/netlink.c
@@ -936,7 +936,8 @@ static int nfc_genl_deactivate_target(struct sk_buff *skb,
 	u32 device_idx, target_idx;
 	int rc;
 
-	if (!info->attrs[NFC_ATTR_DEVICE_INDEX])
+	if (!info->attrs[NFC_ATTR_DEVICE_INDEX] ||
+	    !info->attrs[NFC_ATTR_TARGET_INDEX])
 		return -EINVAL;
 
 	device_idx = nla_get_u32(info->attrs[NFC_ATTR_DEVICE_INDEX]);
@@ -1245,7 +1246,8 @@ static int nfc_genl_fw_download(struct sk_buff *skb, struct genl_info *info)
 	u32 idx;
 	char firmware_name[NFC_FIRMWARE_NAME_MAXSIZE + 1];
 
-	if (!info->attrs[NFC_ATTR_DEVICE_INDEX])
+	if (!info->attrs[NFC_ATTR_DEVICE_INDEX] ||
+	    !info->attrs[NFC_ATTR_FIRMWARE_NAME])
 		return -EINVAL;
 
 	idx = nla_get_u32(info->attrs[NFC_ATTR_DEVICE_INDEX]);
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related

* Re: [Cluster-devel] [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek
From: Bob Peterson @ 2018-04-12 17:00 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: cluster-devel, Herbert Xu, netdev, linux-kernel, NeilBrown,
	Thomas Graf, Tom Herbert
In-Reply-To: <20180329120612.6104-1-agruenba@redhat.com>

----- Original Message -----
> Here's a second version of the patch (now a patch set) to eliminate
> rhashtable_walk_peek in gfs2.
> 
> The first patch introduces lockref_put_not_zero, the inverse of
> lockref_get_not_zero.
> 
> The second patch eliminates rhashtable_walk_peek in gfs2.  In
> gfs2_glock_iter_next, the new lockref function from patch one is used to
> drop a lockref count as long as the count doesn't drop to zero.  This is
> almost always the case; if there is a risk of dropping the last
> reference, we must defer that to a work queue because dropping the last
> reference may sleep.
> 
> Thanks,
> Andreas
> 
> Andreas Gruenbacher (2):
>   lockref: Add lockref_put_not_zero
>   gfs2: Stop using rhashtable_walk_peek
> 
>  fs/gfs2/glock.c         | 47 ++++++++++++++++++++++++++++-------------------
>  include/linux/lockref.h |  1 +
>  lib/lockref.c           | 28 ++++++++++++++++++++++++++++
>  3 files changed, 57 insertions(+), 19 deletions(-)
> 
> --
> 2.14.3

Hi,

Thanks. These two patches are now pushed to the for-next branch of the linux-gfs2 tree:

https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/?h=for-next&id=450b1f6f56350c630e795f240dc5a77aa8aa2419
https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/?h=for-next&id=3fd5d3ad35dc44aaf0f28d60cc0eb75887bff54d

Regards,

Bob Peterson
Red Hat File Systems

^ permalink raw reply

* Re: SRIOV switchdev mode BoF minutes
From: Samudrala, Sridhar @ 2018-04-12 17:05 UTC (permalink / raw)
  To: Or Gerlitz, David Miller
  Cc: Anjali Singhai Jain, Andy Gospodarek, Michael Chan, Simon Horman,
	Jakub Kicinski, John Fastabend, Saeed Mahameed, Jiri Pirko,
	Rony Efraim, Linux Netdev List
In-Reply-To: <CAJ3xEMgnWxi0Uo5j3bDg4eW2vjeeriT7kuq4fzZmLT0bNrMCkQ@mail.gmail.com>

On 11/12/2017 11:49 AM, Or Gerlitz wrote:
> Hi Dave and all,
>
> During and after the BoF on SRIOV switchdev mode, we came into a
> consensus among the developers from four different HW vendors (CC
> audience) that a correct thing to do would be to disallow any new
> extensions to the legacy mode.
>
> The idea is to put focus on the new mode and not add new UAPIs and
> kernel code which was turned to be a wrong design which does not allow
> for properly offloading a kernel switching SW model to e-switch HW.
>
> We also had a good session the day after regarding alignment for the
> representation model of the uplink (physical port) and PF/s.
>
> The VF representor netdevs  exist for all drivers that support the new
> mode but the representation for the uplink and PF wasn't the same for
> all. The decision was to represent the uplink and PFs vports in the
> same manner done for VFs, using rep netdevs. This alignment would
> provide a more strict and clear view of the kernel model for e-switch
> to users and upper layer control plane SW.
>
I don't see any changes in the Mellanox/other drivers to move to this new model to enable
the uplink and PF port representors, any updates?

It would be really nice to highlight the pros and cons of the old versus the
new model.

We are looking into adding switchdev support for our new 100Gb ice driver and could
use some feedback on the direction we should be taking.

Thanks
Sridhar

^ permalink raw reply

* [RFC net-next] ipv6: send netlink notifications for manually configured addresses
From: Lorenzo Bianconi @ 2018-04-12 17:08 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <cover.1523551743.git.lorenzo.bianconi@redhat.com>

Send a netlink notification when userspace adds a manually configured
address if DAD is enabled and optimistic flag isn't set.
Moreover send RTM_DELADDR notifications for tentative addresses.

Some userspace applications (e.g. NetworkManager) are interested in
addr netlink events albeit the address is still in tentative state,
however events are not sent if DAD process is not completed.
If the address is added and immediately removed userspace listeners
are not notified. This behaviour can be easily reproduced by using
veth interfaces:

$ ip -b - <<EOF
> link add dev vm1 type veth peer name vm2
> link set dev vm1 up
> link set dev vm2 up
> addr add 2001:db8:a:b:1:2:3:4/64 dev vm1
> addr del 2001:db8:a:b:1:2:3:4/64 dev vm1
EOF

This patch reverts the behaviour introduced by the commit f784ad3d79e5
("ipv6: do not send RTM_DELADDR for tentative addresses")

Suggested-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
---
 net/ipv6/addrconf.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 78cef00c9596..dffa38004c13 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2902,6 +2902,11 @@ static int inet6_addr_add(struct net *net, int ifindex,
 					      expires, flags);
 		}
 
+		/* Send a netlink notification if DAD is enabled and
+		 * optimistic flag is not set
+		 */
+		if (!(ifp->flags & (IFA_F_OPTIMISTIC | IFA_F_NODAD)))
+			ipv6_ifa_notify(0, ifp);
 		/*
 		 * Note that section 3.1 of RFC 4429 indicates
 		 * that the Optimistic flag should not be set for
@@ -5029,14 +5034,6 @@ static void inet6_ifa_notify(int event, struct inet6_ifaddr *ifa)
 	struct net *net = dev_net(ifa->idev->dev);
 	int err = -ENOBUFS;
 
-	/* Don't send DELADDR notification for TENTATIVE address,
-	 * since NEWADDR notification is sent only after removing
-	 * TENTATIVE flag, if DAD has not failed.
-	 */
-	if (ifa->flags & IFA_F_TENTATIVE && !(ifa->flags & IFA_F_DADFAILED) &&
-	    event == RTM_DELADDR)
-		return;
-
 	skb = nlmsg_new(inet6_ifaddr_msgsize(), GFP_ATOMIC);
 	if (!skb)
 		goto errout;
-- 
2.14.3

^ permalink raw reply related

* Re: [PATCH net] net: dsa: mv88e6xxx: Fix receive time stamp race condition.
From: Richard Cochran @ 2018-04-12 17:35 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, Andrew Lunn, David Miller, Florian Fainelli,
	Vivien Didelot
In-Reply-To: <20180409141931.kmltk4hridkj5fsk@localhost>

On Mon, Apr 09, 2018 at 07:19:31AM -0700, Richard Cochran wrote:
> Dave, please hold off on this patch.  I am seeing new problems in my
> testing with this applied.  I still need to get to the bottom of
> this.

Looks like the new problems are a HW/board glitch.

The patch is good to go.
 
Thanks,
Richard

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox