Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [bpf-next PATCH 0/3] bpf: improvements to xdp_fwd sample
From: Jesper Dangaard Brouer @ 2019-08-08  9:29 UTC (permalink / raw)
  To: Zvi Effron
  Cc: Xdp, Anton Protopopov, dsahern, Toke Høiland-Jørgensen,
	brouer, netdev@vger.kernel.org
In-Reply-To: <CAC1LvL29KS9CKcXYwR4EHeNo7++i4hYQuXfY5OLtbPFDVUO2mw@mail.gmail.com>

On Wed, 7 Aug 2019 15:09:09 -0700
Zvi Effron <zeffron@riotgames.com> wrote:

> On Wed, Aug 7, 2019 at 6:00 AM Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> >
> > Toke's devmap lookup improvement is first avail in kernel v5.3.
> > Thus, not part of XDP-tutorial yet.
> >  
> I probably missed this in an earlier email, but what are Toke's devmap
> improvements? Performance? Capability?

Toke's devmap and redirect improvements are primarily about usability.

Currently, from BPF-context (kernel-side) you cannot read the contents
of devmap (or cpumap or xskmap(AF_XDP)).  Because for devmap you get
the real pointer to the net_device ifindex, and we cannot allow you to
write/change that from BPF (kernel would likely crash or be inconsistent).

The work-around, is to keep a shadow map, that contains the "config" of
the devmap, which you check/validate against instead.  It is just a pain
to maintain this shadow map.  Toke's change allow you to read devmap
from BPF-context.  Thus, you can avoid this shadow map.

Another improvement from Toke, is that the bpf_redirect_map() helper,
now also check if the redirect index is valid in the map.  If not, then
it returns another value than XDP_REDIRECT.  You can choose the
alternative return value yourself, via "flags" e.g. XDP_PASS.  Thus,
you don't even need to check/validate devmap in your BPF-code, as it is
part of the bpf_redirect_map() call now.

 action = bpf_redirect_map(&map, &index, flags_as_xdp_value) 

The default flags used in most programs today is 0, which maps to
XDP_ABORTED.  This is sort of a small UAPI change, but for the better.
As today, the packet is dropped later, only diagnose/seen via
tracepoint xdp:xdp_redirect_map_err.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH] liquidio: Use pcie_flr() instead of reimplementing it
From: Andrew Murray @ 2019-08-08  9:25 UTC (permalink / raw)
  To: Denis Efremov
  Cc: Bjorn Helgaas, Derek Chickles, Satanand Burla, Felix Manlunas,
	netdev, linux-pci, linux-kernel
In-Reply-To: <20190808045753.5474-1-efremov@linux.com>

On Thu, Aug 08, 2019 at 07:57:53AM +0300, Denis Efremov wrote:
> octeon_mbox_process_cmd() directly writes the PCI_EXP_DEVCTL_BCR_FLR
> bit, which bypasses timing requirements imposed by the PCIe spec.
> This patch fixes the function to use the pcie_flr() interface instead.
> 
> Signed-off-by: Denis Efremov <efremov@linux.com>
> ---
>  drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
> index 021d99cd1665..614d07be7181 100644
> --- a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
> +++ b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
> @@ -260,9 +260,7 @@ static int octeon_mbox_process_cmd(struct octeon_mbox *mbox,
>  		dev_info(&oct->pci_dev->dev,
>  			 "got a request for FLR from VF that owns DPI ring %u\n",
>  			 mbox->q_no);
> -		pcie_capability_set_word(
> -			oct->sriov_info.dpiring_to_vfpcidev_lut[mbox->q_no],
> -			PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_BCR_FLR);
> +		pcie_flr(oct->sriov_info.dpiring_to_vfpcidev_lut[mbox->q_no]);
>  		break;

It's possible for pcie_flr to fail if the device doesn't become ready soon
enough after reset in which case it returns ENOTTY. I think it's OK not to
test the return value here though, as pci_dev_wait will print a warning
anyway, and I'm not sure what you'd do with it anyway.

Reviewed-by: Andrew Murray <andrew.murray@arm.com>

>  
>  	case OCTEON_PF_CHANGED_VF_MACADDR:
> -- 
> 2.21.0
> 

^ permalink raw reply

* Re: Clause 73 and USXGMII
From: Russell King - ARM Linux admin @ 2019-08-08  9:23 UTC (permalink / raw)
  To: Jose Abreu
  Cc: netdev@vger.kernel.org, Andrew Lunn, Florian Fainelli,
	Heiner Kallweit
In-Reply-To: <BN8PR12MB32665E5465A83D5E11C7B5D6D3D70@BN8PR12MB3266.namprd12.prod.outlook.com>

On Thu, Aug 08, 2019 at 09:02:57AM +0000, Jose Abreu wrote:
> From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
> Date: Aug/08/2019, 09:26:26 (UTC+00:00)
> 
> > Hi,
> > 
> > Have you tried enabling debug mode in phylink (add #define DEBUG at the
> > top of the file) ?
> 
> Yes:
> 
> [ With > 2.5G modes removed ]
> # dmesg | grep -i phy
> libphy: stmmac: probed
> stmmaceth 0000:04:00.0 enp4s0: PHY [stmmac-1:00] driver [Synopsys 10G]
> stmmaceth 0000:04:00.0 enp4s0: phy: setting supported 
> 00,00000000,0002e040 advertising 00,00000000,0002e040
> stmmaceth 0000:04:00.0 enp4s0: configuring for phy/usxgmii link mode
> stmmaceth 0000:04:00.0 enp4s0: phylink_mac_config: 
> mode=phy/usxgmii/Unknown/Unknown adv=00,00000000,0002e040 pause=10 
> link=0 an=1
> stmmaceth 0000:04:00.0 enp4s0: phy link down usxgmii/Unknown/Unknown

This shows that the PHY isn't reporting that the link came up.  Did
the PHY negotiate link?  If so, why isn't it reporting that the link
came up?  Maybe something is mis-programming the capability bits in
the PHY?  Maybe disabling the 10G speeds disables everything faster
than 1G?

> [ Without any limit ]
> # dmesg | grep -i phy
> libphy: stmmac: probed
> stmmaceth 0000:04:00.0 enp4s0: PHY [stmmac-1:00] driver [Synopsys 10G]
> stmmaceth 0000:04:00.0 enp4s0: phy: setting supported 
> 00,00000000,000ee040 advertising 00,00000000,000ee040
> stmmaceth 0000:04:00.0 enp4s0: configuring for phy/usxgmii link mode
> stmmaceth 0000:04:00.0 enp4s0: phylink_mac_config: 
> mode=phy/usxgmii/Unknown/Unknown adv=00,00000000,000ee040 pause=10 
> link=0 an=1
> stmmaceth 0000:04:00.0 enp4s0: phy link down usxgmii/Unknown/Unknown
> stmmaceth 0000:04:00.0 enp4s0: phy link up usxgmii/2.5Gbps/Full
> stmmaceth 0000:04:00.0 enp4s0: phylink_mac_config: 
> mode=phy/usxgmii/2.5Gbps/Full adv=00,00000000,00000000 pause=0f link=1 
> an=0
> 
> I'm thinking on whether this can be related with USXGMII. As link is 
> operating in 10G but I configure USXGMII for 2.5G maybe autoneg outcome 
> should always be 10G ?

As I understand USXGMII (which isn't very well, because the spec isn't
available) I believe that it operates in a similar way to SGMII where
data is replicated the appropriate number of times to achieve the link
speed.  So, the USXGMII link always operates at a bit rate equivalent
to 10G, but data is replicated twice for 5G, four times for 2.5G, ten
times for 1G, etc.

I notice that you don't say that you support any copper speeds, which
brings up the question about what the PHY's media is...

> > On Thu, Aug 08, 2019 at 08:17:29AM +0000, Jose Abreu wrote:
> > > ++ PHY Experts
> > > 
> > > From: Jose Abreu <joabreu@synopsys.com>
> > > Date: Aug/07/2019, 16:46:23 (UTC+00:00)
> > > 
> > > > Hello,
> > > > 
> > > > I've some sample code for Clause 73 support using Synopsys based XPCS 
> > > > but I would like to clarify some things that I noticed.
> > > > 
> > > > I'm using USXGMII as interface and a single SERDES that operates at 10G 
> > > > rate but MAC side is working at 2.5G. Maximum available bandwidth is 
> > > > therefore 2.5Gbps.
> > > > 
> > > > So, I configure USXGMII for 2.5G mode and it works but if I try to limit 
> > > > the autoneg abilities to 2.5G max then it never finishes:
> > > > # ethtool enp4s0
> > > > Settings for enp4s0:
> > > > 	Supported ports: [ ]
> > > > 	Supported link modes:   1000baseKX/Full 
> > > > 	                        2500baseX/Full 
> > > > 	Supported pause frame use: Symmetric Receive-only
> > > > 	Supports auto-negotiation: Yes
> > > > 	Supported FEC modes: Not reported
> > > > 	Advertised link modes:  1000baseKX/Full 
> > > > 	                        2500baseX/Full 
> > > > 	Advertised pause frame use: Symmetric Receive-only
> > > > 	Advertised auto-negotiation: Yes
> > > > 	Advertised FEC modes: Not reported
> > > > 	Speed: Unknown!
> > > > 	Duplex: Unknown! (255)
> > > > 	Port: MII
> > > > 	PHYAD: 0
> > > > 	Transceiver: internal
> > > > 	Auto-negotiation: on
> > > > 	Supports Wake-on: ug
> > > > 	Wake-on: d
> > > > 	Current message level: 0x0000003f (63)
> > > > 			       drv probe link timer ifdown ifup
> > > > 	Link detected: no
> > > > 
> > > > When I do not limit autoneg and I say that maximum limit is 10G then I 
> > > > get Link Up and autoneg finishes with this outcome:
> > > > # ethtool enp4s0
> > > > Settings for enp4s0:
> > > > 	Supported ports: [ ]
> > > > 	Supported link modes:   1000baseKX/Full 
> > > > 	                        2500baseX/Full 
> > > > 	                        10000baseKX4/Full 
> > > > 	                        10000baseKR/Full 
> > > > 	Supported pause frame use: Symmetric Receive-only
> > > > 	Supports auto-negotiation: Yes
> > > > 	Supported FEC modes: Not reported
> > > > 	Advertised link modes:  1000baseKX/Full 
> > > > 	                        2500baseX/Full 
> > > > 	                        10000baseKX4/Full 
> > > > 	                        10000baseKR/Full 
> > > > 	Advertised pause frame use: Symmetric Receive-only
> > > > 	Advertised auto-negotiation: Yes
> > > > 	Advertised FEC modes: Not reported
> > > > 	Link partner advertised link modes:  1000baseKX/Full 
> > > > 	                                     2500baseX/Full 
> > > > 	                                     10000baseKX4/Full 
> > > > 	                                     10000baseKR/Full 
> > > > 	Link partner advertised pause frame use: Symmetric Receive-only
> > > > 	Link partner advertised auto-negotiation: Yes
> > > > 	Link partner advertised FEC modes: Not reported
> > > > 	Speed: 2500Mb/s
> > > > 	Duplex: Full
> > > > 	Port: MII <- Never mind this, it's a SW issue
> > > > 	PHYAD: 0
> > > > 	Transceiver: internal
> > > > 	Auto-negotiation: on
> > > > 	Supports Wake-on: ug
> > > > 	Wake-on: d
> > > > 	Current message level: 0x0000003f (63)
> > > > 			       drv probe link timer ifdown ifup
> > > > 	Link detected: yes
> > > > 
> > > > I was expecting that, as MAC side is limited to 2.5G, I should set in 
> > > > phylink the correct capabilities and then outcome of autoneg would only 
> > > > have up to 2.5G modes. Am I wrong ?
> > > > 
> > > > ---
> > > > Thanks,
> > > > Jose Miguel Abreu
> > > 
> > > 
> > > ---
> > > Thanks,
> > > Jose Miguel Abreu
> > > 
> > 
> > -- 
> > RMK's Patch system: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.armlinux.org.uk_developer_patches_&d=DwIBAg&c=DPL6_X_6JkXFx7AXWqB0tg&r=WHDsc6kcWAl4i96Vm5hJ_19IJiuxx_p_Rzo2g-uHDKw&m=1MdSlPrmzsMMCJbbLcDYTNuPq1njfusBRjcRz3UD4Dg&s=_30hwSYkGf9DfyCG48mnh7lXP8iiULXpfAP_6agUJno&e= 
> > FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> > According to speedtest.net: 11.9Mbps down 500kbps up
> 
> 
> ---
> Thanks,
> Jose Miguel Abreu
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply

* Re: [PATCH v2 0/3] arm/arm64: Add support for function error injection
From: Leo Yan @ 2019-08-08  9:09 UTC (permalink / raw)
  To: Will Deacon
  Cc: Russell King, Oleg Nesterov, Catalin Marinas,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Arnd Bergmann, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Naveen N. Rao,
	linux-arm-kernel, linux-kernel, linuxppc-dev, linux-arch, netdev,
	bpf, clang-built-linux, Masami Hiramatsu
In-Reply-To: <20190807160703.pe4jxak7hs7ptvde@willie-the-truck>

On Wed, Aug 07, 2019 at 05:07:03PM +0100, Will Deacon wrote:
> On Tue, Aug 06, 2019 at 06:00:12PM +0800, Leo Yan wrote:
> > This small patch set is to add support for function error injection;
> > this can be used to eanble more advanced debugging feature, e.g.
> > CONFIG_BPF_KPROBE_OVERRIDE.
> > 
> > The patch 01/03 is to consolidate the function definition which can be
> > suared cross architectures, patches 02,03/03 are used for enabling
> > function error injection on arm64 and arm architecture respectively.
> > 
> > I tested on arm64 platform Juno-r2 and one of my laptop with x86
> > architecture with below steps; I don't test for Arm architecture so
> > only pass compilation.
> 
> Thanks. I've queued the first two patches up here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/error-injection

Thank you, Will.

Leo.

^ permalink raw reply

* Re: [PATCH AUTOSEL 4.19 04/42] netfilter: conntrack: always store window size un-scaled
From: Thomas Jarosch @ 2019-08-08  9:02 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-kernel, stable, Florian Westphal, Jakub Jankowski,
	Jozsef Kadlecsik, Pablo Neira Ayuso, netfilter-devel, coreteam,
	netdev
In-Reply-To: <20190802132302.13537-4-sashal@kernel.org>

Hello together,

You wrote on Fri, Aug 02, 2019 at 09:22:24AM -0400:
> From: Florian Westphal <fw@strlen.de>
> 
> [ Upstream commit 959b69ef57db00cb33e9c4777400ae7183ebddd3 ]
> 
> Jakub Jankowski reported following oddity:
> 
> After 3 way handshake completes, timeout of new connection is set to
> max_retrans (300s) instead of established (5 days).
> 
> shortened excerpt from pcap provided:
> 25.070622 IP (flags [DF], proto TCP (6), length 52)
> 10.8.5.4.1025 > 10.8.1.2.80: Flags [S], seq 11, win 64240, [wscale 8]
> 26.070462 IP (flags [DF], proto TCP (6), length 48)
> 10.8.1.2.80 > 10.8.5.4.1025: Flags [S.], seq 82, ack 12, win 65535, [wscale 3]
> 27.070449 IP (flags [DF], proto TCP (6), length 40)
> 10.8.5.4.1025 > 10.8.1.2.80: Flags [.], ack 83, win 512, length 0
> 
> Turns out the last_win is of u16 type, but we store the scaled value:
> 512 << 8 (== 0x20000) becomes 0 window.
> 
> The Fixes tag is not correct, as the bug has existed forever, but
> without that change all that this causes might cause is to mistake a
> window update (to-nonzero-from-zero) for a retransmit.
> 
> Fixes: fbcd253d2448b8 ("netfilter: conntrack: lower timeout to RETRANS seconds if window is 0")
> Reported-by: Jakub Jankowski <shasta@toxcorp.com>
> Tested-by: Jakub Jankowski <shasta@toxcorp.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> Signed-off-by: Sasha Levin <sashal@kernel.org>

Also:
Tested-by: Thomas Jarosch <thomas.jarosch@intra2net.com>

;)

We've hit the issue with the wrong conntrack timeout at two different sites,
long-lived connections to a SAP server over IPSec VPN were constantly dropping.

For us this was a regression after updating from kernel 3.14 to 4.19.
Yesterday I've applied the patch to kernel 4.19.57 and the problem is fixed.

The issue was extra hard to debug as we could just boot the new kernel
for twenty minutes in the evening on these productive systems.

The stable kernel patch from last Friday came right on time. I was just
about the replay the TCP connection with tcpreplay, so this saved
me from another week of debugging. Thanks everyone!

Best regards,
Thomas Jarosch

^ permalink raw reply

* RE: Clause 73 and USXGMII
From: Jose Abreu @ 2019-08-08  9:02 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: netdev@vger.kernel.org, Andrew Lunn, Florian Fainelli,
	Heiner Kallweit
In-Reply-To: <20190808082626.GB5193@shell.armlinux.org.uk>

From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Date: Aug/08/2019, 09:26:26 (UTC+00:00)

> Hi,
> 
> Have you tried enabling debug mode in phylink (add #define DEBUG at the
> top of the file) ?

Yes:

[ With > 2.5G modes removed ]
# dmesg | grep -i phy
libphy: stmmac: probed
stmmaceth 0000:04:00.0 enp4s0: PHY [stmmac-1:00] driver [Synopsys 10G]
stmmaceth 0000:04:00.0 enp4s0: phy: setting supported 
00,00000000,0002e040 advertising 00,00000000,0002e040
stmmaceth 0000:04:00.0 enp4s0: configuring for phy/usxgmii link mode
stmmaceth 0000:04:00.0 enp4s0: phylink_mac_config: 
mode=phy/usxgmii/Unknown/Unknown adv=00,00000000,0002e040 pause=10 
link=0 an=1
stmmaceth 0000:04:00.0 enp4s0: phy link down usxgmii/Unknown/Unknown

[ Without any limit ]
# dmesg | grep -i phy
libphy: stmmac: probed
stmmaceth 0000:04:00.0 enp4s0: PHY [stmmac-1:00] driver [Synopsys 10G]
stmmaceth 0000:04:00.0 enp4s0: phy: setting supported 
00,00000000,000ee040 advertising 00,00000000,000ee040
stmmaceth 0000:04:00.0 enp4s0: configuring for phy/usxgmii link mode
stmmaceth 0000:04:00.0 enp4s0: phylink_mac_config: 
mode=phy/usxgmii/Unknown/Unknown adv=00,00000000,000ee040 pause=10 
link=0 an=1
stmmaceth 0000:04:00.0 enp4s0: phy link down usxgmii/Unknown/Unknown
stmmaceth 0000:04:00.0 enp4s0: phy link up usxgmii/2.5Gbps/Full
stmmaceth 0000:04:00.0 enp4s0: phylink_mac_config: 
mode=phy/usxgmii/2.5Gbps/Full adv=00,00000000,00000000 pause=0f link=1 
an=0

I'm thinking on whether this can be related with USXGMII. As link is 
operating in 10G but I configure USXGMII for 2.5G maybe autoneg outcome 
should always be 10G ?

> On Thu, Aug 08, 2019 at 08:17:29AM +0000, Jose Abreu wrote:
> > ++ PHY Experts
> > 
> > From: Jose Abreu <joabreu@synopsys.com>
> > Date: Aug/07/2019, 16:46:23 (UTC+00:00)
> > 
> > > Hello,
> > > 
> > > I've some sample code for Clause 73 support using Synopsys based XPCS 
> > > but I would like to clarify some things that I noticed.
> > > 
> > > I'm using USXGMII as interface and a single SERDES that operates at 10G 
> > > rate but MAC side is working at 2.5G. Maximum available bandwidth is 
> > > therefore 2.5Gbps.
> > > 
> > > So, I configure USXGMII for 2.5G mode and it works but if I try to limit 
> > > the autoneg abilities to 2.5G max then it never finishes:
> > > # ethtool enp4s0
> > > Settings for enp4s0:
> > > 	Supported ports: [ ]
> > > 	Supported link modes:   1000baseKX/Full 
> > > 	                        2500baseX/Full 
> > > 	Supported pause frame use: Symmetric Receive-only
> > > 	Supports auto-negotiation: Yes
> > > 	Supported FEC modes: Not reported
> > > 	Advertised link modes:  1000baseKX/Full 
> > > 	                        2500baseX/Full 
> > > 	Advertised pause frame use: Symmetric Receive-only
> > > 	Advertised auto-negotiation: Yes
> > > 	Advertised FEC modes: Not reported
> > > 	Speed: Unknown!
> > > 	Duplex: Unknown! (255)
> > > 	Port: MII
> > > 	PHYAD: 0
> > > 	Transceiver: internal
> > > 	Auto-negotiation: on
> > > 	Supports Wake-on: ug
> > > 	Wake-on: d
> > > 	Current message level: 0x0000003f (63)
> > > 			       drv probe link timer ifdown ifup
> > > 	Link detected: no
> > > 
> > > When I do not limit autoneg and I say that maximum limit is 10G then I 
> > > get Link Up and autoneg finishes with this outcome:
> > > # ethtool enp4s0
> > > Settings for enp4s0:
> > > 	Supported ports: [ ]
> > > 	Supported link modes:   1000baseKX/Full 
> > > 	                        2500baseX/Full 
> > > 	                        10000baseKX4/Full 
> > > 	                        10000baseKR/Full 
> > > 	Supported pause frame use: Symmetric Receive-only
> > > 	Supports auto-negotiation: Yes
> > > 	Supported FEC modes: Not reported
> > > 	Advertised link modes:  1000baseKX/Full 
> > > 	                        2500baseX/Full 
> > > 	                        10000baseKX4/Full 
> > > 	                        10000baseKR/Full 
> > > 	Advertised pause frame use: Symmetric Receive-only
> > > 	Advertised auto-negotiation: Yes
> > > 	Advertised FEC modes: Not reported
> > > 	Link partner advertised link modes:  1000baseKX/Full 
> > > 	                                     2500baseX/Full 
> > > 	                                     10000baseKX4/Full 
> > > 	                                     10000baseKR/Full 
> > > 	Link partner advertised pause frame use: Symmetric Receive-only
> > > 	Link partner advertised auto-negotiation: Yes
> > > 	Link partner advertised FEC modes: Not reported
> > > 	Speed: 2500Mb/s
> > > 	Duplex: Full
> > > 	Port: MII <- Never mind this, it's a SW issue
> > > 	PHYAD: 0
> > > 	Transceiver: internal
> > > 	Auto-negotiation: on
> > > 	Supports Wake-on: ug
> > > 	Wake-on: d
> > > 	Current message level: 0x0000003f (63)
> > > 			       drv probe link timer ifdown ifup
> > > 	Link detected: yes
> > > 
> > > I was expecting that, as MAC side is limited to 2.5G, I should set in 
> > > phylink the correct capabilities and then outcome of autoneg would only 
> > > have up to 2.5G modes. Am I wrong ?
> > > 
> > > ---
> > > Thanks,
> > > Jose Miguel Abreu
> > 
> > 
> > ---
> > Thanks,
> > Jose Miguel Abreu
> > 
> 
> -- 
> RMK's Patch system: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.armlinux.org.uk_developer_patches_&d=DwIBAg&c=DPL6_X_6JkXFx7AXWqB0tg&r=WHDsc6kcWAl4i96Vm5hJ_19IJiuxx_p_Rzo2g-uHDKw&m=1MdSlPrmzsMMCJbbLcDYTNuPq1njfusBRjcRz3UD4Dg&s=_30hwSYkGf9DfyCG48mnh7lXP8iiULXpfAP_6agUJno&e= 
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> According to speedtest.net: 11.9Mbps down 500kbps up


---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* RE: [PATCH net-next 5/5] r8152: change rx_frag_head_sz and rx_max_agg_num dynamically
From: Hayes Wang @ 2019-08-08  8:52 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev@vger.kernel.org, nic_swsd, linux-kernel@vger.kernel.org,
	linux-usb@vger.kernel.org
In-Reply-To: <20190806151007.75a8dd2c@cakuba.netronome.com>

Jakub Kicinski [mailto:jakub.kicinski@netronome.com]
> Sent: Wednesday, August 07, 2019 6:10 AM
[...]
> On Tue, 6 Aug 2019 19:18:04 +0800, Hayes Wang wrote:
> > Let rx_frag_head_sz and rx_max_agg_num could be modified dynamically
> > through the sysfs.
> >
> > Signed-off-by: Hayes Wang <hayeswang@realtek.com>
> 
> Please don't expose those via sysfs. Ethtool's copybreak and descriptor
> count should be applicable here, I think.

Excuse me again.
I find the kernel supports the copybreak of Ethtool.
However, I couldn't find a command of Ethtool to use it.
Do I miss something?

Best Regards,
Hayes


^ permalink raw reply

* [PATCH mlx5-next 3/4] IB/mlx5: Add legacy events to DEVX list
From: Leon Romanovsky @ 2019-08-08  8:43 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Edward Srouji, Saeed Mahameed,
	Yishai Hadas, linux-netdev
In-Reply-To: <20190808084358.29517-1-leon@kernel.org>

From: Yishai Hadas <yishaih@mellanox.com>

Add two events that were defined in the device specification but were
not exposed in the driver list.

Post this patch those events can be read over the DEVX events interface
once be reported by the firmware.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Edward Srouji <edwards@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/devx.c | 8 ++++++++
 include/linux/mlx5/device.h       | 9 +++++++++
 2 files changed, 17 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
index fd577ffd7864..3dbdfe0eb5e4 100644
--- a/drivers/infiniband/hw/mlx5/devx.c
+++ b/drivers/infiniband/hw/mlx5/devx.c
@@ -233,6 +233,8 @@ static bool is_legacy_obj_event_num(u16 event_num)
 	case MLX5_EVENT_TYPE_SRQ_CATAS_ERROR:
 	case MLX5_EVENT_TYPE_DCT_DRAINED:
 	case MLX5_EVENT_TYPE_COMP:
+	case MLX5_EVENT_TYPE_DCT_KEY_VIOLATION:
+	case MLX5_EVENT_TYPE_XRQ_ERROR:
 		return true;
 	default:
 		return false;
@@ -315,8 +317,10 @@ static u16 get_event_obj_type(unsigned long event_type, struct mlx5_eqe *eqe)
 	case MLX5_EVENT_TYPE_SRQ_CATAS_ERROR:
 		return eqe->data.qp_srq.type;
 	case MLX5_EVENT_TYPE_CQ_ERROR:
+	case MLX5_EVENT_TYPE_XRQ_ERROR:
 		return 0;
 	case MLX5_EVENT_TYPE_DCT_DRAINED:
+	case MLX5_EVENT_TYPE_DCT_KEY_VIOLATION:
 		return MLX5_EVENT_QUEUE_TYPE_DCT;
 	default:
 		return MLX5_GET(affiliated_event_header, &eqe->data, obj_type);
@@ -2300,7 +2304,11 @@ static u32 devx_get_obj_id_from_event(unsigned long event_type, void *data)
 	case MLX5_EVENT_TYPE_WQ_ACCESS_ERROR:
 		obj_id = be32_to_cpu(eqe->data.qp_srq.qp_srq_n) & 0xffffff;
 		break;
+	case MLX5_EVENT_TYPE_XRQ_ERROR:
+		obj_id = be32_to_cpu(eqe->data.xrq_err.type_xrqn) & 0xffffff;
+		break;
 	case MLX5_EVENT_TYPE_DCT_DRAINED:
+	case MLX5_EVENT_TYPE_DCT_KEY_VIOLATION:
 		obj_id = be32_to_cpu(eqe->data.dct.dctn) & 0xffffff;
 		break;
 	case MLX5_EVENT_TYPE_CQ_ERROR:
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index ce9839c8bc1a..e427af260ebe 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -328,6 +328,7 @@ enum mlx5_event {
 	MLX5_EVENT_TYPE_GPIO_EVENT	   = 0x15,
 	MLX5_EVENT_TYPE_PORT_MODULE_EVENT  = 0x16,
 	MLX5_EVENT_TYPE_TEMP_WARN_EVENT    = 0x17,
+	MLX5_EVENT_TYPE_XRQ_ERROR	   = 0x18,
 	MLX5_EVENT_TYPE_REMOTE_CONFIG	   = 0x19,
 	MLX5_EVENT_TYPE_GENERAL_EVENT	   = 0x22,
 	MLX5_EVENT_TYPE_MONITOR_COUNTER    = 0x24,
@@ -345,6 +346,7 @@ enum mlx5_event {
 	MLX5_EVENT_TYPE_ESW_FUNCTIONS_CHANGED = 0xe,
 
 	MLX5_EVENT_TYPE_DCT_DRAINED        = 0x1c,
+	MLX5_EVENT_TYPE_DCT_KEY_VIOLATION  = 0x1d,
 
 	MLX5_EVENT_TYPE_FPGA_ERROR         = 0x20,
 	MLX5_EVENT_TYPE_FPGA_QP_ERROR      = 0x21,
@@ -584,6 +586,12 @@ struct mlx5_eqe_cq_err {
 	u8	syndrome;
 };
 
+struct mlx5_eqe_xrq_err {
+	__be32	reserved1[5];
+	__be32	type_xrqn;
+	__be32	reserved2;
+};
+
 struct mlx5_eqe_port_state {
 	u8	reserved0[8];
 	u8	port;
@@ -698,6 +706,7 @@ union ev_data {
 	struct mlx5_eqe_pps		pps;
 	struct mlx5_eqe_dct             dct;
 	struct mlx5_eqe_temp_warning	temp_warning;
+	struct mlx5_eqe_xrq_err		xrq_err;
 } __packed;
 
 struct mlx5_eqe {
-- 
2.20.1


^ permalink raw reply related

* [PATCH rdma-next 4/4] IB/mlx5: Expose XRQ legacy commands over the DEVX interface
From: Leon Romanovsky @ 2019-08-08  8:43 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Edward Srouji, Saeed Mahameed,
	Yishai Hadas, linux-netdev
In-Reply-To: <20190808084358.29517-1-leon@kernel.org>

From: Yishai Hadas <yishaih@mellanox.com>

Expose missing XRQ legacy commands over the DEVX interface.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/devx.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
index 3dbdfe0eb5e4..04b4e937c198 100644
--- a/drivers/infiniband/hw/mlx5/devx.c
+++ b/drivers/infiniband/hw/mlx5/devx.c
@@ -546,6 +546,8 @@ static u64 devx_get_obj_id(const void *in)
 		break;
 	case MLX5_CMD_OP_ARM_XRQ:
 	case MLX5_CMD_OP_SET_XRQ_DC_PARAMS_ENTRY:
+	case MLX5_CMD_OP_RELEASE_XRQ_ERROR:
+	case MLX5_CMD_OP_MODIFY_XRQ:
 		obj_id = get_enc_obj_id(MLX5_CMD_OP_CREATE_XRQ,
 					MLX5_GET(arm_xrq_in, in, xrqn));
 		break;
@@ -822,6 +824,8 @@ static bool devx_is_obj_modify_cmd(const void *in)
 	case MLX5_CMD_OP_ARM_DCT_FOR_KEY_VIOLATION:
 	case MLX5_CMD_OP_ARM_XRQ:
 	case MLX5_CMD_OP_SET_XRQ_DC_PARAMS_ENTRY:
+	case MLX5_CMD_OP_RELEASE_XRQ_ERROR:
+	case MLX5_CMD_OP_MODIFY_XRQ:
 		return true;
 	case MLX5_CMD_OP_SET_FLOW_TABLE_ENTRY:
 	{
-- 
2.20.1


^ permalink raw reply related

* [PATCH mlx5-next 2/4] net/mlx5: Add XRQ legacy commands opcodes
From: Leon Romanovsky @ 2019-08-08  8:43 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Edward Srouji, Saeed Mahameed,
	Yishai Hadas, linux-netdev
In-Reply-To: <20190808084358.29517-1-leon@kernel.org>

From: Yishai Hadas <yishaih@mellanox.com>

Add XRQ legacy commands opcodes, will be used via the DEVX interface.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 4 ++++
 include/linux/mlx5/mlx5_ifc.h                 | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 8cdd7e66f8df..53d09620e215 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -446,6 +446,8 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op,
 	case MLX5_CMD_OP_CREATE_UMEM:
 	case MLX5_CMD_OP_DESTROY_UMEM:
 	case MLX5_CMD_OP_ALLOC_MEMIC:
+	case MLX5_CMD_OP_MODIFY_XRQ:
+	case MLX5_CMD_OP_RELEASE_XRQ_ERROR:
 		*status = MLX5_DRIVER_STATUS_ABORTED;
 		*synd = MLX5_DRIVER_SYND;
 		return -EIO;
@@ -637,6 +639,8 @@ const char *mlx5_command_str(int command)
 	MLX5_COMMAND_STR_CASE(DESTROY_UCTX);
 	MLX5_COMMAND_STR_CASE(CREATE_UMEM);
 	MLX5_COMMAND_STR_CASE(DESTROY_UMEM);
+	MLX5_COMMAND_STR_CASE(RELEASE_XRQ_ERROR);
+	MLX5_COMMAND_STR_CASE(MODIFY_XRQ);
 	default: return "unknown command opcode";
 	}
 }
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 82470b670cb7..080bba20b129 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -172,6 +172,8 @@ enum {
 	MLX5_CMD_OP_QUERY_XRQ_DC_PARAMS_ENTRY     = 0x725,
 	MLX5_CMD_OP_SET_XRQ_DC_PARAMS_ENTRY       = 0x726,
 	MLX5_CMD_OP_QUERY_XRQ_ERROR_PARAMS        = 0x727,
+	MLX5_CMD_OP_RELEASE_XRQ_ERROR             = 0x729,
+	MLX5_CMD_OP_MODIFY_XRQ                    = 0x72a,
 	MLX5_CMD_OP_QUERY_ESW_FUNCTIONS           = 0x740,
 	MLX5_CMD_OP_QUERY_VPORT_STATE             = 0x750,
 	MLX5_CMD_OP_MODIFY_VPORT_STATE            = 0x751,
-- 
2.20.1


^ permalink raw reply related

* [PATCH mlx5-next 1/4] net/mlx5: Use debug message instead of warn
From: Leon Romanovsky @ 2019-08-08  8:43 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Edward Srouji, Saeed Mahameed,
	Yishai Hadas, linux-netdev
In-Reply-To: <20190808084358.29517-1-leon@kernel.org>

From: Yishai Hadas <yishaih@mellanox.com>

As QP may be created by DEVX, it may be valid to not find the rsn in
mlx5 core tree, change the level to be debug.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/qp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
index b8ba74de9555..f0f3abe331da 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/qp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
@@ -162,7 +162,7 @@ static int rsc_event_notifier(struct notifier_block *nb,
 
 	common = mlx5_get_rsc(table, rsn);
 	if (!common) {
-		mlx5_core_warn(dev, "Async event for bogus resource 0x%x\n", rsn);
+		mlx5_core_dbg(dev, "Async event for unknown resource 0x%x\n", rsn);
 		return NOTIFY_OK;
 	}
 
-- 
2.20.1


^ permalink raw reply related

* [PATCH rdma-next 0/4] Add XRQ and SRQ support to DEVX interface
From: Leon Romanovsky @ 2019-08-08  8:43 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Edward Srouji, Saeed Mahameed,
	Yishai Hadas, linux-netdev

From: Leon Romanovsky <leonro@mellanox.com>

Hi,

This small series extends DEVX interface with SRQ and XRQ legacy commands.

Thanks

Yishai Hadas (4):
  net/mlx5: Use debug message instead of warn
  net/mlx5: Add XRQ legacy commands opcodes
  IB/mlx5: Add legacy events to DEVX list
  IB/mlx5: Expose XRQ legacy commands over the DEVX interface

 drivers/infiniband/hw/mlx5/devx.c             | 12 ++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c |  4 ++++
 drivers/net/ethernet/mellanox/mlx5/core/qp.c  |  2 +-
 include/linux/mlx5/device.h                   |  9 +++++++++
 include/linux/mlx5/mlx5_ifc.h                 |  2 ++
 5 files changed, 28 insertions(+), 1 deletion(-)

--
2.20.1


^ permalink raw reply

* Re: net: micrel: confusion about phyids used in driver
From: Uwe Kleine-König @ 2019-08-08  8:36 UTC (permalink / raw)
  To: Yuiko.Oshino
  Cc: netdev, andrew, f.fainelli, kernel, hkallweit1, Ravi.Hegde,
	Tristram.Ha
In-Reply-To: <BL0PR11MB3251651EB9BC45DF4282D51D8EF80@BL0PR11MB3251.namprd11.prod.outlook.com>

Hello,

On Tue, Jul 02, 2019 at 08:55:07PM +0000, Yuiko.Oshino@microchip.com wrote:
> >On Fri, May 10, 2019 at 09:22:43AM +0200, Uwe Kleine-König wrote:
> >> On Thu, May 09, 2019 at 11:07:45PM +0200, Andrew Lunn wrote:
> >> > On Thu, May 09, 2019 at 10:55:29PM +0200, Heiner Kallweit wrote:
> >> > > On 09.05.2019 22:29, Uwe Kleine-König wrote:
> >> > > > I have a board here that has a KSZ8051MLL (datasheet:
> >> > > > http://ww1.microchip.com/downloads/en/DeviceDoc/ksz8051mll.pdf, phyid:
> >> > > > 0x0022155x) assembled. The actual phyid is 0x00221556.
> >> > >
> >> > > I think the datasheets are the source of the confusion. If the
> >> > > datasheets for different chips list 0x0022155x as PHYID each, and
> >> > > authors of support for additional chips don't check the existing
> >> > > code, then happens what happened.
> >> > >
> >> > > However it's not a rare exception and not Microchip-specific that
> >> > > sometimes vendors use the same PHYID for different chips.
> >>
> >> From the vendor's POV it is even sensible to reuse the phy IDs iff the
> >> chips are "compatible".
> >>
> >> Assuming that the last nibble of the phy ID actually helps to
> >> distinguish the different (not completely) compatible chips, we need
> >> some more detailed information than available in the data sheets I have.
> >> There is one person in the recipents of this mail with an
> >> @microchip.com address (hint, hint!).
> >
> >can you give some input here or forward to a person who can?
>
> I forward this to the team.

This thread still sits in my inbox waiting for some feedback. Did
something happen on your side?

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply

* Re: [PATCH v2 1/1] net: bridge: use mac_len in bridge forwarding
From: Nikolay Aleksandrov @ 2019-08-08  8:34 UTC (permalink / raw)
  To: Zahari Doychev
  Cc: netdev, bridge, roopa, jhs, dsahern, simon.horman,
	makita.toshiaki, xiyou.wangcong, jiri, ast, johannes,
	alexei.starovoitov
In-Reply-To: <20190808080428.o2eqqfdscl274sr5@tycho>

On 08/08/2019 11:04, Zahari Doychev wrote:
> On Wed, Aug 07, 2019 at 12:17:43PM +0300, Nikolay Aleksandrov wrote:
>> Hi Zahari,
>> On 05/08/2019 18:37, Zahari Doychev wrote:
>>> The bridge code cannot forward packets from various paths that set up the
>>> SKBs in different ways. Some of these packets get corrupted during the
>>> forwarding as not always is just ETH_HLEN pulled at the front. This happens
>>> e.g. when VLAN tags are pushed bu using tc act_vlan on ingress.
>> Overall the patch looks good, I think it shouldn't introduce any regressions
>> at least from the codepaths I was able to inspect, but please include more
>> details in here from the cover letter, in fact you don't need it just add all of
>> the details here so we have them, especially the test setup. Also please provide
>> some details how this patch was tested. It'd be great if you could provide a
>> selftest for it so we can make sure it's considered when doing future changes.
> 
> Hi Nik,
> 
> Thanks for the reply. I will do the suggested corrections and try creating a
> selftest. I assume it should go to the net/forwarding together with the other
> bridge tests as a separate patch.
> 
> Regards,
> Zahari
> 

Hi,
Yes, the selftest should target net-next and go to net/forwarding.

Thanks,
 Nik

>>
>> Thank you,
>>  Nik
>>
>>>
>>> The problem is fixed by using skb->mac_len instead of ETH_HLEN, which makes
>>> sure that the skb headers are correctly restored. This usually does not
>>> change anything, execpt the local bridge transmits which now need to set
>>> the skb->mac_len correctly in br_dev_xmit, as well as the broken case noted
>>> above.
>>>
>>> Signed-off-by: Zahari Doychev <zahari.doychev@linux.com>
>>> ---
>>>  net/bridge/br_device.c  | 3 ++-
>>>  net/bridge/br_forward.c | 4 ++--
>>>  net/bridge/br_vlan.c    | 3 ++-
>>>  3 files changed, 6 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
>>> index 681b72862c16..aeb77ff60311 100644
>>> --- a/net/bridge/br_device.c
>>> +++ b/net/bridge/br_device.c
>>> @@ -55,8 +55,9 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
>>>  	BR_INPUT_SKB_CB(skb)->frag_max_size = 0;
>>>  
>>>  	skb_reset_mac_header(skb);
>>> +	skb_reset_mac_len(skb);
>>>  	eth = eth_hdr(skb);
>>> -	skb_pull(skb, ETH_HLEN);
>>> +	skb_pull(skb, skb->mac_len);
>>>  
>>>  	if (!br_allowed_ingress(br, br_vlan_group_rcu(br), skb, &vid))
>>>  		goto out;
>>> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
>>> index 86637000f275..edb4f3533f05 100644
>>> --- a/net/bridge/br_forward.c
>>> +++ b/net/bridge/br_forward.c
>>> @@ -32,7 +32,7 @@ static inline int should_deliver(const struct net_bridge_port *p,
>>>  
>>>  int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
>>>  {
>>> -	skb_push(skb, ETH_HLEN);
>>> +	skb_push(skb, skb->mac_len);
>>>  	if (!is_skb_forwardable(skb->dev, skb))
>>>  		goto drop;
>>>  
>>> @@ -94,7 +94,7 @@ static void __br_forward(const struct net_bridge_port *to,
>>>  		net = dev_net(indev);
>>>  	} else {
>>>  		if (unlikely(netpoll_tx_running(to->br->dev))) {
>>> -			skb_push(skb, ETH_HLEN);
>>> +			skb_push(skb, skb->mac_len);
>>>  			if (!is_skb_forwardable(skb->dev, skb))
>>>  				kfree_skb(skb);
>>>  			else
>>> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
>>> index 021cc9f66804..88244c9cc653 100644
>>> --- a/net/bridge/br_vlan.c
>>> +++ b/net/bridge/br_vlan.c
>>> @@ -466,13 +466,14 @@ static bool __allowed_ingress(const struct net_bridge *br,
>>>  		/* Tagged frame */
>>>  		if (skb->vlan_proto != br->vlan_proto) {
>>>  			/* Protocol-mismatch, empty out vlan_tci for new tag */
>>> -			skb_push(skb, ETH_HLEN);
>>> +			skb_push(skb, skb->mac_len);
>>>  			skb = vlan_insert_tag_set_proto(skb, skb->vlan_proto,
>>>  							skb_vlan_tag_get(skb));
>>>  			if (unlikely(!skb))
>>>  				return false;
>>>  
>>>  			skb_pull(skb, ETH_HLEN);
>>> +			skb_reset_network_header(skb);
>>>  			skb_reset_mac_len(skb);
>>>  			*vid = 0;
>>>  			tagged = false;
>>>
>>


^ permalink raw reply

* Re: Clause 73 and USXGMII
From: Russell King - ARM Linux admin @ 2019-08-08  8:26 UTC (permalink / raw)
  To: Jose Abreu
  Cc: netdev@vger.kernel.org, Andrew Lunn, Florian Fainelli,
	Heiner Kallweit
In-Reply-To: <BN8PR12MB3266A710111427071814D371D3D70@BN8PR12MB3266.namprd12.prod.outlook.com>

Hi,

Have you tried enabling debug mode in phylink (add #define DEBUG at the
top of the file) ?

Thanks.

On Thu, Aug 08, 2019 at 08:17:29AM +0000, Jose Abreu wrote:
> ++ PHY Experts
> 
> From: Jose Abreu <joabreu@synopsys.com>
> Date: Aug/07/2019, 16:46:23 (UTC+00:00)
> 
> > Hello,
> > 
> > I've some sample code for Clause 73 support using Synopsys based XPCS 
> > but I would like to clarify some things that I noticed.
> > 
> > I'm using USXGMII as interface and a single SERDES that operates at 10G 
> > rate but MAC side is working at 2.5G. Maximum available bandwidth is 
> > therefore 2.5Gbps.
> > 
> > So, I configure USXGMII for 2.5G mode and it works but if I try to limit 
> > the autoneg abilities to 2.5G max then it never finishes:
> > # ethtool enp4s0
> > Settings for enp4s0:
> > 	Supported ports: [ ]
> > 	Supported link modes:   1000baseKX/Full 
> > 	                        2500baseX/Full 
> > 	Supported pause frame use: Symmetric Receive-only
> > 	Supports auto-negotiation: Yes
> > 	Supported FEC modes: Not reported
> > 	Advertised link modes:  1000baseKX/Full 
> > 	                        2500baseX/Full 
> > 	Advertised pause frame use: Symmetric Receive-only
> > 	Advertised auto-negotiation: Yes
> > 	Advertised FEC modes: Not reported
> > 	Speed: Unknown!
> > 	Duplex: Unknown! (255)
> > 	Port: MII
> > 	PHYAD: 0
> > 	Transceiver: internal
> > 	Auto-negotiation: on
> > 	Supports Wake-on: ug
> > 	Wake-on: d
> > 	Current message level: 0x0000003f (63)
> > 			       drv probe link timer ifdown ifup
> > 	Link detected: no
> > 
> > When I do not limit autoneg and I say that maximum limit is 10G then I 
> > get Link Up and autoneg finishes with this outcome:
> > # ethtool enp4s0
> > Settings for enp4s0:
> > 	Supported ports: [ ]
> > 	Supported link modes:   1000baseKX/Full 
> > 	                        2500baseX/Full 
> > 	                        10000baseKX4/Full 
> > 	                        10000baseKR/Full 
> > 	Supported pause frame use: Symmetric Receive-only
> > 	Supports auto-negotiation: Yes
> > 	Supported FEC modes: Not reported
> > 	Advertised link modes:  1000baseKX/Full 
> > 	                        2500baseX/Full 
> > 	                        10000baseKX4/Full 
> > 	                        10000baseKR/Full 
> > 	Advertised pause frame use: Symmetric Receive-only
> > 	Advertised auto-negotiation: Yes
> > 	Advertised FEC modes: Not reported
> > 	Link partner advertised link modes:  1000baseKX/Full 
> > 	                                     2500baseX/Full 
> > 	                                     10000baseKX4/Full 
> > 	                                     10000baseKR/Full 
> > 	Link partner advertised pause frame use: Symmetric Receive-only
> > 	Link partner advertised auto-negotiation: Yes
> > 	Link partner advertised FEC modes: Not reported
> > 	Speed: 2500Mb/s
> > 	Duplex: Full
> > 	Port: MII <- Never mind this, it's a SW issue
> > 	PHYAD: 0
> > 	Transceiver: internal
> > 	Auto-negotiation: on
> > 	Supports Wake-on: ug
> > 	Wake-on: d
> > 	Current message level: 0x0000003f (63)
> > 			       drv probe link timer ifdown ifup
> > 	Link detected: yes
> > 
> > I was expecting that, as MAC side is limited to 2.5G, I should set in 
> > phylink the correct capabilities and then outcome of autoneg would only 
> > have up to 2.5G modes. Am I wrong ?
> > 
> > ---
> > Thanks,
> > Jose Miguel Abreu
> 
> 
> ---
> Thanks,
> Jose Miguel Abreu
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply

* RE: Clause 73 and USXGMII
From: Jose Abreu @ 2019-08-08  8:17 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: Andrew Lunn, Florian Fainelli, Heiner Kallweit, Russell King
In-Reply-To: <BN8PR12MB32662070AAC1C34BA47C0763D3D40@BN8PR12MB3266.namprd12.prod.outlook.com>

++ PHY Experts

From: Jose Abreu <joabreu@synopsys.com>
Date: Aug/07/2019, 16:46:23 (UTC+00:00)

> Hello,
> 
> I've some sample code for Clause 73 support using Synopsys based XPCS 
> but I would like to clarify some things that I noticed.
> 
> I'm using USXGMII as interface and a single SERDES that operates at 10G 
> rate but MAC side is working at 2.5G. Maximum available bandwidth is 
> therefore 2.5Gbps.
> 
> So, I configure USXGMII for 2.5G mode and it works but if I try to limit 
> the autoneg abilities to 2.5G max then it never finishes:
> # ethtool enp4s0
> Settings for enp4s0:
> 	Supported ports: [ ]
> 	Supported link modes:   1000baseKX/Full 
> 	                        2500baseX/Full 
> 	Supported pause frame use: Symmetric Receive-only
> 	Supports auto-negotiation: Yes
> 	Supported FEC modes: Not reported
> 	Advertised link modes:  1000baseKX/Full 
> 	                        2500baseX/Full 
> 	Advertised pause frame use: Symmetric Receive-only
> 	Advertised auto-negotiation: Yes
> 	Advertised FEC modes: Not reported
> 	Speed: Unknown!
> 	Duplex: Unknown! (255)
> 	Port: MII
> 	PHYAD: 0
> 	Transceiver: internal
> 	Auto-negotiation: on
> 	Supports Wake-on: ug
> 	Wake-on: d
> 	Current message level: 0x0000003f (63)
> 			       drv probe link timer ifdown ifup
> 	Link detected: no
> 
> When I do not limit autoneg and I say that maximum limit is 10G then I 
> get Link Up and autoneg finishes with this outcome:
> # ethtool enp4s0
> Settings for enp4s0:
> 	Supported ports: [ ]
> 	Supported link modes:   1000baseKX/Full 
> 	                        2500baseX/Full 
> 	                        10000baseKX4/Full 
> 	                        10000baseKR/Full 
> 	Supported pause frame use: Symmetric Receive-only
> 	Supports auto-negotiation: Yes
> 	Supported FEC modes: Not reported
> 	Advertised link modes:  1000baseKX/Full 
> 	                        2500baseX/Full 
> 	                        10000baseKX4/Full 
> 	                        10000baseKR/Full 
> 	Advertised pause frame use: Symmetric Receive-only
> 	Advertised auto-negotiation: Yes
> 	Advertised FEC modes: Not reported
> 	Link partner advertised link modes:  1000baseKX/Full 
> 	                                     2500baseX/Full 
> 	                                     10000baseKX4/Full 
> 	                                     10000baseKR/Full 
> 	Link partner advertised pause frame use: Symmetric Receive-only
> 	Link partner advertised auto-negotiation: Yes
> 	Link partner advertised FEC modes: Not reported
> 	Speed: 2500Mb/s
> 	Duplex: Full
> 	Port: MII <- Never mind this, it's a SW issue
> 	PHYAD: 0
> 	Transceiver: internal
> 	Auto-negotiation: on
> 	Supports Wake-on: ug
> 	Wake-on: d
> 	Current message level: 0x0000003f (63)
> 			       drv probe link timer ifdown ifup
> 	Link detected: yes
> 
> I was expecting that, as MAC side is limited to 2.5G, I should set in 
> phylink the correct capabilities and then outcome of autoneg would only 
> have up to 2.5G modes. Am I wrong ?
> 
> ---
> Thanks,
> Jose Miguel Abreu


---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* Re: [PATCH v2] Fix non-kerneldoc comment in realtek/rtlwifi/usb.c
From: Larry Finger @ 2019-08-08  8:14 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Valdis Klētnieks, Ping-Ke Shih, David S. Miller,
	linux-wireless, netdev, linux-kernel
In-Reply-To: <877e7odte2.fsf@kamboji.qca.qualcomm.com>

On 8/8/19 2:10 AM, Kalle Valo wrote:
> Larry Finger <Larry.Finger@lwfinger.net> writes:
> 
>> On 8/7/19 8:51 PM, Valdis Klētnieks wrote:
>>> Fix spurious warning message when building with W=1:
>>>
>>>     CC [M]  drivers/net/wireless/realtek/rtlwifi/usb.o
>>> drivers/net/wireless/realtek/rtlwifi/usb.c:243: warning: Cannot understand  * on line 243 - I thought it was a doc line
>>> drivers/net/wireless/realtek/rtlwifi/usb.c:760: warning: Cannot understand  * on line 760 - I thought it was a doc line
>>> drivers/net/wireless/realtek/rtlwifi/usb.c:790: warning: Cannot understand  * on line 790 - I thought it was a doc line
>>>
>>> Clean up the comment format.
>>>
>>> Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
>>>
>>> ---
>>> Changes since v1:  Larry Finger pointed out the patch wasn't checkpatch-clean.
>>>
>>> diff --git a/drivers/net/wireless/realtek/rtlwifi/usb.c b/drivers/net/wireless/realtek/rtlwifi/usb.c
>>> index 34d68dbf4b4c..4b59f3b46b28 100644
>>> --- a/drivers/net/wireless/realtek/rtlwifi/usb.c
>>> +++ b/drivers/net/wireless/realtek/rtlwifi/usb.c
>>> @@ -239,10 +239,7 @@ static void _rtl_usb_io_handler_release(struct ieee80211_hw *hw)
>>>    	mutex_destroy(&rtlpriv->io.bb_mutex);
>>>    }
>>>    -/**
>>> - *
>>> - *	Default aggregation handler. Do nothing and just return the oldest skb.
>>> - */
>>> +/*	Default aggregation handler. Do nothing and just return the oldest skb.  */
>>>    static struct sk_buff *_none_usb_tx_aggregate_hdl(struct ieee80211_hw *hw,
>>>    						  struct sk_buff_head *list)
>>>    {
>>> @@ -756,11 +753,6 @@ static int rtl_usb_start(struct ieee80211_hw *hw)
>>>    	return err;
>>>    }
>>>    -/**
>>> - *
>>> - *
>>> - */
>>> -
>>>    /*=======================  tx =========================================*/
>>>    static void rtl_usb_cleanup(struct ieee80211_hw *hw)
>>>    {
>>> @@ -786,11 +778,7 @@ static void rtl_usb_cleanup(struct ieee80211_hw *hw)
>>>    	usb_kill_anchored_urbs(&rtlusb->tx_submitted);
>>>    }
>>>    -/**
>>> - *
>>> - * We may add some struct into struct rtl_usb later. Do deinit here.
>>> - *
>>> - */
>>> +/* We may add some struct into struct rtl_usb later. Do deinit here.  */
>>>    static void rtl_usb_deinit(struct ieee80211_hw *hw)
>>>    {
>>>    	rtl_usb_cleanup(hw);
>>
>> I missed that the subject line should be "rtwifi: Fix ....". Otherwise it is OK.
> 
> I can fix the subject during commit.

OK. Acked-by: Larry Finger<Larry.Finger@lwfinger.net>

Thanks,

Larry



^ permalink raw reply

* Re: [PATCH v2 1/1] net: bridge: use mac_len in bridge forwarding
From: Zahari Doychev @ 2019-08-08  8:04 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, bridge, roopa, jhs, dsahern, simon.horman,
	makita.toshiaki, xiyou.wangcong, jiri, ast, johannes,
	alexei.starovoitov
In-Reply-To: <48058179-9690-54e2-f60c-c372446bfde9@cumulusnetworks.com>

On Wed, Aug 07, 2019 at 12:17:43PM +0300, Nikolay Aleksandrov wrote:
> Hi Zahari,
> On 05/08/2019 18:37, Zahari Doychev wrote:
> > The bridge code cannot forward packets from various paths that set up the
> > SKBs in different ways. Some of these packets get corrupted during the
> > forwarding as not always is just ETH_HLEN pulled at the front. This happens
> > e.g. when VLAN tags are pushed bu using tc act_vlan on ingress.
> Overall the patch looks good, I think it shouldn't introduce any regressions
> at least from the codepaths I was able to inspect, but please include more
> details in here from the cover letter, in fact you don't need it just add all of
> the details here so we have them, especially the test setup. Also please provide
> some details how this patch was tested. It'd be great if you could provide a
> selftest for it so we can make sure it's considered when doing future changes.

Hi Nik,

Thanks for the reply. I will do the suggested corrections and try creating a
selftest. I assume it should go to the net/forwarding together with the other
bridge tests as a separate patch.

Regards,
Zahari

> 
> Thank you,
>  Nik
> 
> > 
> > The problem is fixed by using skb->mac_len instead of ETH_HLEN, which makes
> > sure that the skb headers are correctly restored. This usually does not
> > change anything, execpt the local bridge transmits which now need to set
> > the skb->mac_len correctly in br_dev_xmit, as well as the broken case noted
> > above.
> > 
> > Signed-off-by: Zahari Doychev <zahari.doychev@linux.com>
> > ---
> >  net/bridge/br_device.c  | 3 ++-
> >  net/bridge/br_forward.c | 4 ++--
> >  net/bridge/br_vlan.c    | 3 ++-
> >  3 files changed, 6 insertions(+), 4 deletions(-)
> > 
> > diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
> > index 681b72862c16..aeb77ff60311 100644
> > --- a/net/bridge/br_device.c
> > +++ b/net/bridge/br_device.c
> > @@ -55,8 +55,9 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
> >  	BR_INPUT_SKB_CB(skb)->frag_max_size = 0;
> >  
> >  	skb_reset_mac_header(skb);
> > +	skb_reset_mac_len(skb);
> >  	eth = eth_hdr(skb);
> > -	skb_pull(skb, ETH_HLEN);
> > +	skb_pull(skb, skb->mac_len);
> >  
> >  	if (!br_allowed_ingress(br, br_vlan_group_rcu(br), skb, &vid))
> >  		goto out;
> > diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
> > index 86637000f275..edb4f3533f05 100644
> > --- a/net/bridge/br_forward.c
> > +++ b/net/bridge/br_forward.c
> > @@ -32,7 +32,7 @@ static inline int should_deliver(const struct net_bridge_port *p,
> >  
> >  int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
> >  {
> > -	skb_push(skb, ETH_HLEN);
> > +	skb_push(skb, skb->mac_len);
> >  	if (!is_skb_forwardable(skb->dev, skb))
> >  		goto drop;
> >  
> > @@ -94,7 +94,7 @@ static void __br_forward(const struct net_bridge_port *to,
> >  		net = dev_net(indev);
> >  	} else {
> >  		if (unlikely(netpoll_tx_running(to->br->dev))) {
> > -			skb_push(skb, ETH_HLEN);
> > +			skb_push(skb, skb->mac_len);
> >  			if (!is_skb_forwardable(skb->dev, skb))
> >  				kfree_skb(skb);
> >  			else
> > diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> > index 021cc9f66804..88244c9cc653 100644
> > --- a/net/bridge/br_vlan.c
> > +++ b/net/bridge/br_vlan.c
> > @@ -466,13 +466,14 @@ static bool __allowed_ingress(const struct net_bridge *br,
> >  		/* Tagged frame */
> >  		if (skb->vlan_proto != br->vlan_proto) {
> >  			/* Protocol-mismatch, empty out vlan_tci for new tag */
> > -			skb_push(skb, ETH_HLEN);
> > +			skb_push(skb, skb->mac_len);
> >  			skb = vlan_insert_tag_set_proto(skb, skb->vlan_proto,
> >  							skb_vlan_tag_get(skb));
> >  			if (unlikely(!skb))
> >  				return false;
> >  
> >  			skb_pull(skb, ETH_HLEN);
> > +			skb_reset_network_header(skb);
> >  			skb_reset_mac_len(skb);
> >  			*vid = 0;
> >  			tagged = false;
> > 
> 

^ permalink raw reply

* Re: [RFC PATCH] kbuild: re-implement detection of CONFIG options leaked to user-space
From: Christoph Hellwig @ 2019-08-08  7:45 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Masahiro Yamada, Linux Kbuild mailing list, Sam Ravnborg,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, bpf, Linux Kernel Mailing List, Networking,
	Amit Pundir, David Howells
In-Reply-To: <CAK8P3a2POcb+AReLKib513i_RTN9kLM_Tun7+G5LOacDuy7gjQ@mail.gmail.com>

On Tue, Aug 06, 2019 at 11:00:19AM +0200, Arnd Bergmann wrote:
> > I was playing with sed yesterday, but the resulted code might be unreadable.
> >
> > Sed scripts tend to be somewhat unreadable.
> > I just wondered which language is appropriate for this?
> > Maybe perl, or what else? I am not good at perl, though.
> 
> I like the sed version, in particular as it seems to do the job and
> I'm not volunteering to write it in anything else.

Did anyone not like sed?  I have to say I do like scripts using sed and
awk because they are fairly readable and avoid dependencies on "big"
scripting language and their optional modules that sooner or later get
pulled in.

> This one is nontrivial, since it defines two incompatible layouts for
> this structure,
> and the fdpic version is currently not usable at all from user space. Also,
> the definition breaks configurations that have both CONFIG_BINFMT_ELF
> and CONFIG_BINFMT_ELF_FDPIC enabled, which has become possible
> with commit 382e67aec6a7 ("ARM: enable elf_fdpic on systems with an MMU").
> 
> The best way forward I see is to duplicate the structure definition, adding
> a new 'struct elf_fdpic_prstatus', and using that in fs/binfmt_elf_fdpic.c.
> The same change is required in include/linux/elfcore-compat.h.

Yeah, this is a mess.  David Howells suggested something similar when
I brought the issue to his attention last time.

^ permalink raw reply

* Re: [v3,1/4] tools: bpftool: add net attach command to attach XDP on interface
From: Daniel T. Lee @ 2019-08-07 22:15 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <20190807134208.6601fad2@cakuba.netronome.com>

On Thu, Aug 8, 2019 at 5:42 AM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Wed,  7 Aug 2019 11:25:06 +0900, Daniel T. Lee wrote:
> > By this commit, using `bpftool net attach`, user can attach XDP prog on
> > interface. New type of enum 'net_attach_type' has been made, as stated at
> > cover-letter, the meaning of 'attach' is, prog will be attached on interface.
> >
> > With 'overwrite' option at argument, attached XDP program could be replaced.
> > Added new helper 'net_parse_dev' to parse the network device at argument.
> >
> > BPF prog will be attached through libbpf 'bpf_set_link_xdp_fd'.
> >
> > Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> > ---
> >  tools/bpf/bpftool/net.c | 141 ++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 130 insertions(+), 11 deletions(-)
> >
> > diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
> > index 67e99c56bc88..c05a3fac5cac 100644
> > --- a/tools/bpf/bpftool/net.c
> > +++ b/tools/bpf/bpftool/net.c
> > @@ -55,6 +55,35 @@ struct bpf_attach_info {
> >       __u32 flow_dissector_id;
> >  };
> >
> > +enum net_attach_type {
> > +     NET_ATTACH_TYPE_XDP,
> > +     NET_ATTACH_TYPE_XDP_GENERIC,
> > +     NET_ATTACH_TYPE_XDP_DRIVER,
> > +     NET_ATTACH_TYPE_XDP_OFFLOAD,
> > +};
> > +
> > +static const char * const attach_type_strings[] = {
> > +     [NET_ATTACH_TYPE_XDP]           = "xdp",
> > +     [NET_ATTACH_TYPE_XDP_GENERIC]   = "xdpgeneric",
> > +     [NET_ATTACH_TYPE_XDP_DRIVER]    = "xdpdrv",
> > +     [NET_ATTACH_TYPE_XDP_OFFLOAD]   = "xdpoffload",
> > +};
> > +
> > +const size_t max_net_attach_type = ARRAY_SIZE(attach_type_strings);
>
> Nit: in practice max_.._type is num_types - 1, so perhaps rename this
> to num_.. or such?
>

I can see at 'map.c', it declares ARRAY_SIZE with '_size' suffix.
         const size_t map_type_name_size = ARRAY_SIZE(map_type_name);

I'll change this variable name 'max_net_attach_type' to 'net_attach_type_size'.

> > +static enum net_attach_type parse_attach_type(const char *str)
> > +{
> > +     enum net_attach_type type;
> > +
> > +     for (type = 0; type < max_net_attach_type; type++) {
> > +             if (attach_type_strings[type] &&
> > +                is_prefix(str, attach_type_strings[type]))
>
>                    ^
> this is misaligned by one space
>
> Please try checkpatch with the --strict option to catch these.
>

I didn't know checkpatch has strict option.
Thanks for letting me know!

> > +                     return type;
> > +     }
> > +
> > +     return max_net_attach_type;
> > +}
> > +
> >  static int dump_link_nlmsg(void *cookie, void *msg, struct nlattr **tb)
> >  {
> >       struct bpf_netdev_t *netinfo = cookie;
> > @@ -223,6 +252,97 @@ static int query_flow_dissector(struct bpf_attach_info *attach_info)
> >       return 0;
> >  }
> >
> > +static int net_parse_dev(int *argc, char ***argv)
> > +{
> > +     int ifindex;
> > +
> > +     if (is_prefix(**argv, "dev")) {
> > +             NEXT_ARGP();
> > +
> > +             ifindex = if_nametoindex(**argv);
> > +             if (!ifindex)
> > +                     p_err("invalid devname %s", **argv);
> > +
> > +             NEXT_ARGP();
> > +     } else {
> > +             p_err("expected 'dev', got: '%s'?", **argv);
> > +             return -1;
> > +     }
> > +
> > +     return ifindex;
> > +}
> > +
> > +static int do_attach_detach_xdp(int progfd, enum net_attach_type attach_type,
> > +                             int ifindex, bool overwrite)
> > +{
> > +     __u32 flags = 0;
> > +
> > +     if (!overwrite)
> > +             flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
> > +     if (attach_type == NET_ATTACH_TYPE_XDP_GENERIC)
> > +             flags |= XDP_FLAGS_SKB_MODE;
> > +     if (attach_type == NET_ATTACH_TYPE_XDP_DRIVER)
> > +             flags |= XDP_FLAGS_DRV_MODE;
> > +     if (attach_type == NET_ATTACH_TYPE_XDP_OFFLOAD)
> > +             flags |= XDP_FLAGS_HW_MODE;
> > +
> > +     return bpf_set_link_xdp_fd(ifindex, progfd, flags);
> > +}
> > +
> > +static int do_attach(int argc, char **argv)
> > +{
> > +     enum net_attach_type attach_type;
> > +     int progfd, ifindex, err = 0;
> > +     bool overwrite = false;
> > +
> > +     /* parse attach args */
> > +     if (!REQ_ARGS(5))
> > +             return -EINVAL;
> > +
> > +     attach_type = parse_attach_type(*argv);
> > +     if (attach_type == max_net_attach_type) {
> > +             p_err("invalid net attach/detach type");
>
> worth adding the type to the error message so that user know which part
> of command line was wrong:
>
>         p_err("invalid net attach/detach type '%s'", *argv);
>

It sounds reasonable.
I'll update the error message.


> > +             return -EINVAL;
> > +     }
> > +
> > +     NEXT_ARG();
>
> nit: the new line should be before NEXT_ARG(), IOV NEXT_ARG() belongs
> to the code which consumed the argument
>

I'm not sure I'm following.
Are you saying that, at here the newline shouldn't be necessary?

> > +     progfd = prog_parse_fd(&argc, &argv);
> > +     if (progfd < 0)
> > +             return -EINVAL;
> > +
> > +     ifindex = net_parse_dev(&argc, &argv);
> > +     if (ifindex < 1) {
> > +             close(progfd);
> > +             return -EINVAL;
> > +     }
> > +
> > +     if (argc) {
> > +             if (is_prefix(*argv, "overwrite")) {
> > +                     overwrite = true;
> > +             } else {
> > +                     p_err("expected 'overwrite', got: '%s'?", *argv);
> > +                     close(progfd);
> > +                     return -EINVAL;
> > +             }
> > +     }
> > +
> > +     /* attach xdp prog */
> > +     if (is_prefix("xdp", attach_type_strings[attach_type]))
>
> I'm still unclear on why this if is needed
>

Just an code structure that shows extensibility for other attachment types.
Well, for now there's no other type than XDP, so it's not necessary.

> > +             err = do_attach_detach_xdp(progfd, attach_type, ifindex,
> > +                                        overwrite);
> > +
> > +     if (err < 0) {
> > +             p_err("interface %s attach failed",
> > +                   attach_type_strings[attach_type]);
>
> Please add the error string, like:
>
>                 p_err("interface %s attach failed: %s",
>                       attach_type_strings[attach_type], strerror(errno));
>
>

Oh. Didn't think of propagate errno to error message.
I'll update it right away.

> > +             return err;
> > +     }
> > +
> > +     if (json_output)
> > +             jsonw_null(json_wtr);
> > +
> > +     return 0;
> > +}
> > +
> >  static int do_show(int argc, char **argv)
> >  {
> >       struct bpf_attach_info attach_info = {};
> > @@ -231,17 +351,10 @@ static int do_show(int argc, char **argv)
> >       unsigned int nl_pid;
> >       char err_buf[256];
> >
> > -     if (argc == 2) {
> > -             if (strcmp(argv[0], "dev") != 0)
> > -                     usage();
> > -             filter_idx = if_nametoindex(argv[1]);
> > -             if (filter_idx == 0) {
> > -                     fprintf(stderr, "invalid dev name %s\n", argv[1]);
> > -                     return -1;
> > -             }
> > -     } else if (argc != 0) {
> > +     if (argc == 2)
> > +             filter_idx = net_parse_dev(&argc, &argv);
>
> You should check filter_idx is not negative here, no?
>

You're right.
I'll update it.

> > +     else if (argc != 0)
> >               usage();
> > -     }
> >
> >       ret = query_flow_dissector(&attach_info);
> >       if (ret)

Thank you for your assistance.

^ permalink raw reply

* Re: [PATCH bpf-next 1/3] bpf: support cloning sk storage on accept()
From: Martin Lau @ 2019-08-08  7:11 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Yonghong Song, Stanislav Fomichev, netdev@vger.kernel.org,
	bpf@vger.kernel.org, davem@davemloft.net, ast@kernel.org,
	daniel@iogearbox.net
In-Reply-To: <20190808000533.GA2820@mini-arch>

On Wed, Aug 07, 2019 at 05:05:33PM -0700, Stanislav Fomichev wrote:
[ ... ]
> > > +int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
> > > +{
> > > +	struct bpf_sk_storage *new_sk_storage = NULL;
> > > +	struct bpf_sk_storage *sk_storage;
> > > +	struct bpf_sk_storage_elem *selem;
> > > +	int ret;
> > > +
> > > +	RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
> > > +
> > > +	rcu_read_lock();
> > > +	sk_storage = rcu_dereference(sk->sk_bpf_storage);
> > > +
> > > +	if (!sk_storage || hlist_empty(&sk_storage->list))
> > > +		goto out;
> > > +
> > > +	hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) {
> > > +		struct bpf_sk_storage_map *smap;
> > > +		struct bpf_sk_storage_elem *copy_selem;
> > > +
> > > +		if (!selem->clone)
> > > +			continue;
> > > +
> > > +		smap = rcu_dereference(SDATA(selem)->smap);
> > > +		if (!smap)
> > > +			continue;
> > > +
> > > +		copy_selem = bpf_sk_storage_clone_elem(newsk, smap, selem);
> > > +		if (IS_ERR(copy_selem)) {
> > > +			ret = PTR_ERR(copy_selem);
> > > +			goto err;
> > > +		}
> > > +
> > > +		if (!new_sk_storage) {
> > > +			ret = sk_storage_alloc(newsk, smap, copy_selem);
> > > +			if (ret) {
> > > +				kfree(copy_selem);
> > > +				atomic_sub(smap->elem_size,
> > > +					   &newsk->sk_omem_alloc);
> > > +				goto err;
> > > +			}
> > > +
> > > +			new_sk_storage = rcu_dereference(copy_selem->sk_storage);
> > > +			continue;
> > > +		}
> > > +
> > > +		raw_spin_lock_bh(&new_sk_storage->lock);
> > > +		selem_link_map(smap, copy_selem);
> > > +		__selem_link_sk(new_sk_storage, copy_selem);
> > > +		raw_spin_unlock_bh(&new_sk_storage->lock);
> > 
> > Considering in this particular case, new socket is not visible to 
> > outside world yet (both kernel and user space), map_delete/map_update
> > operations are not applicable in this situation, so
> > the above raw_spin_lock_bh() probably not needed.
> I agree, it's doing nothing, but __selem_link_sk has the following comment:
> /* sk_storage->lock must be held and sk_storage->list cannot be empty */
I believe I may have forgotten to remove this comment after reusing it
in sk_storage_alloc() which also has a similar no-lock-needed situation
like here.

I would also prefer not to acquire the new_sk_storage->lock here to avoid
confusion when investigating racing bugs in the future.

> 
> Just wanted to keep that invariant for this call site as well (in case
> we add some lockdep enforcement or smth else). WDYT?
> 
> > > +	}
> > > +
> > > +out:
> > > +	rcu_read_unlock();
> > > +	return 0;
> > > +
> > > +err:
> > > +	rcu_read_unlock();
> > > +
> > > +	bpf_sk_storage_free(newsk);
> > > +	return ret;
> > > +}
> > > +
> > >   BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
> > >   	   void *, value, u64, flags)
> > >   {
> > >   	struct bpf_sk_storage_data *sdata;
> > >   
> > > -	if (flags > BPF_SK_STORAGE_GET_F_CREATE)
> > > +	if (flags & ~BPF_SK_STORAGE_GET_F_MASK)
> > > +		return (unsigned long)NULL;
> > > +
> > > +	if ((flags & BPF_SK_STORAGE_GET_F_CLONE) &&
> > > +	    !(flags & BPF_SK_STORAGE_GET_F_CREATE))
> > >   		return (unsigned long)NULL;
> > >   
> > >   	sdata = sk_storage_lookup(sk, map, true);
> > >   	if (sdata)
> > >   		return (unsigned long)sdata->data;
> > >   
> > > -	if (flags == BPF_SK_STORAGE_GET_F_CREATE &&
> > > +	if ((flags & BPF_SK_STORAGE_GET_F_CREATE) &&
> > >   	    /* Cannot add new elem to a going away sk.
> > >   	     * Otherwise, the new elem may become a leak
> > >   	     * (and also other memory issues during map
> > > @@ -762,6 +853,9 @@ BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
> > >   		/* sk must be a fullsock (guaranteed by verifier),
> > >   		 * so sock_gen_put() is unnecessary.
> > >   		 */
> > > +		if (!IS_ERR(sdata))
> > > +			SELEM(sdata)->clone =
> > > +				!!(flags & BPF_SK_STORAGE_GET_F_CLONE);
> > >   		sock_put(sk);
> > >   		return IS_ERR(sdata) ?
> > >   			(unsigned long)NULL : (unsigned long)sdata->data;
> > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > index d57b0cc995a0..f5e801a9cea4 100644
> > > --- a/net/core/sock.c
> > > +++ b/net/core/sock.c
> > > @@ -1851,9 +1851,12 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
> > >   			goto out;
> > >   		}
> > >   		RCU_INIT_POINTER(newsk->sk_reuseport_cb, NULL);
> > > -#ifdef CONFIG_BPF_SYSCALL
> > > -		RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
> > > -#endif
> > > +
> > > +		if (bpf_sk_storage_clone(sk, newsk)) {
> > > +			sk_free_unlock_clone(newsk);
> > > +			newsk = NULL;
> > > +			goto out;
> > > +		}
> > >   
> > >   		newsk->sk_err	   = 0;
> > >   		newsk->sk_err_soft = 0;
> > > 

^ permalink raw reply

* Re: [PATCH v2] Fix non-kerneldoc comment in realtek/rtlwifi/usb.c
From: Kalle Valo @ 2019-08-08  7:10 UTC (permalink / raw)
  To: Larry Finger
  Cc: Valdis Klētnieks, Ping-Ke Shih, David S. Miller,
	linux-wireless, netdev, linux-kernel
In-Reply-To: <15df2564-8815-f351-8fb2-b46611a90234@lwfinger.net>

Larry Finger <Larry.Finger@lwfinger.net> writes:

> On 8/7/19 8:51 PM, Valdis Klētnieks wrote:
>> Fix spurious warning message when building with W=1:
>>
>>    CC [M]  drivers/net/wireless/realtek/rtlwifi/usb.o
>> drivers/net/wireless/realtek/rtlwifi/usb.c:243: warning: Cannot understand  * on line 243 - I thought it was a doc line
>> drivers/net/wireless/realtek/rtlwifi/usb.c:760: warning: Cannot understand  * on line 760 - I thought it was a doc line
>> drivers/net/wireless/realtek/rtlwifi/usb.c:790: warning: Cannot understand  * on line 790 - I thought it was a doc line
>>
>> Clean up the comment format.
>>
>> Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
>>
>> ---
>> Changes since v1:  Larry Finger pointed out the patch wasn't checkpatch-clean.
>>
>> diff --git a/drivers/net/wireless/realtek/rtlwifi/usb.c b/drivers/net/wireless/realtek/rtlwifi/usb.c
>> index 34d68dbf4b4c..4b59f3b46b28 100644
>> --- a/drivers/net/wireless/realtek/rtlwifi/usb.c
>> +++ b/drivers/net/wireless/realtek/rtlwifi/usb.c
>> @@ -239,10 +239,7 @@ static void _rtl_usb_io_handler_release(struct ieee80211_hw *hw)
>>   	mutex_destroy(&rtlpriv->io.bb_mutex);
>>   }
>>   -/**
>> - *
>> - *	Default aggregation handler. Do nothing and just return the oldest skb.
>> - */
>> +/*	Default aggregation handler. Do nothing and just return the oldest skb.  */
>>   static struct sk_buff *_none_usb_tx_aggregate_hdl(struct ieee80211_hw *hw,
>>   						  struct sk_buff_head *list)
>>   {
>> @@ -756,11 +753,6 @@ static int rtl_usb_start(struct ieee80211_hw *hw)
>>   	return err;
>>   }
>>   -/**
>> - *
>> - *
>> - */
>> -
>>   /*=======================  tx =========================================*/
>>   static void rtl_usb_cleanup(struct ieee80211_hw *hw)
>>   {
>> @@ -786,11 +778,7 @@ static void rtl_usb_cleanup(struct ieee80211_hw *hw)
>>   	usb_kill_anchored_urbs(&rtlusb->tx_submitted);
>>   }
>>   -/**
>> - *
>> - * We may add some struct into struct rtl_usb later. Do deinit here.
>> - *
>> - */
>> +/* We may add some struct into struct rtl_usb later. Do deinit here.  */
>>   static void rtl_usb_deinit(struct ieee80211_hw *hw)
>>   {
>>   	rtl_usb_cleanup(hw);
>
> I missed that the subject line should be "rtwifi: Fix ....". Otherwise it is OK.

I can fix the subject during commit.

-- 
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [bpf-next PATCH 2/3] samples/bpf: make xdp_fwd more practically usable via devmap lookup
From: Jesper Dangaard Brouer @ 2019-08-08  6:51 UTC (permalink / raw)
  To: Y Song
  Cc: netdev, Daniel Borkmann, Alexei Starovoitov, Xdp, a.s.protopopov,
	David Ahern, brouer
In-Reply-To: <CAH3MdRUf_2Sk8v2dPeQ_+LfKPPwX9N3QoMDMCGFehd5JQVktcw@mail.gmail.com>

On Wed, 7 Aug 2019 11:04:17 -0700
Y Song <ys114321@gmail.com> wrote:

> On Wed, Aug 7, 2019 at 5:37 AM Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> >
> > This address the TODO in samples/bpf/xdp_fwd_kern.c, which points out
> > that the chosen egress index should be checked for existence in the
> > devmap. This can now be done via taking advantage of Toke's work in
> > commit 0cdbb4b09a06 ("devmap: Allow map lookups from eBPF").
> >
> > This change makes xdp_fwd more practically usable, as this allows for
> > a mixed environment, where IP-forwarding fallback to network stack, if
> > the egress device isn't configured to use XDP.
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> >  samples/bpf/xdp_fwd_kern.c |   20 ++++++++++++++------
> >  samples/bpf/xdp_fwd_user.c |   36 +++++++++++++++++++++++++-----------
> >  2 files changed, 39 insertions(+), 17 deletions(-)
> >
> > diff --git a/samples/bpf/xdp_fwd_kern.c b/samples/bpf/xdp_fwd_kern.c
> > index e6ffc4ea06f4..4a5ad381ed2a 100644
> > --- a/samples/bpf/xdp_fwd_kern.c
> > +++ b/samples/bpf/xdp_fwd_kern.c
> > @@ -104,13 +104,21 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
> >
> >         rc = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), flags);
> >
> > -       /* verify egress index has xdp support
> > -        * TO-DO bpf_map_lookup_elem(&tx_port, &key) fails with
> > -        *       cannot pass map_type 14 into func bpf_map_lookup_elem#1:
> > -        * NOTE: without verification that egress index supports XDP
> > -        *       forwarding packets are dropped.
> > -        */
> >         if (rc == 0) {
> > +               int *val;
> > +
> > +               /* Verify egress index has been configured as TX-port.
> > +                * (Note: User can still have inserted an egress ifindex that
> > +                * doesn't support XDP xmit, which will result in packet drops).
> > +                *
> > +                * Note: lookup in devmap supported since 0cdbb4b09a0.
> > +                * If not supported will fail with:
> > +                *  cannot pass map_type 14 into func bpf_map_lookup_elem#1:
> > +                */
> > +               val = bpf_map_lookup_elem(&tx_port, &fib_params.ifindex);  
> 
> It should be "xdp_tx_ports". Otherwise, you will have compilation errors.

Ups. This happened in my rebase, where I moved the rename patch [1/3]
before this one.  Thanks for catching this!

> > +               if (!val)
> > +                       return XDP_PASS;  
> 
> Also, maybe we can do
>          if (!bpf_map_lookup_elem(&tx_port, &fib_params.ifindex))
>             return XDP_PASS;
> so we do not need to define val at all.

I had it this way, because I also checked the contents (of the pointer
*val) to check if this was the correct ifindex, but I removed that
check again (as user side always insert correctly).  So, I guess I
could take your suggestion now.


> > +
> >                 if (h_proto == htons(ETH_P_IP))
> >                         ip_decrease_ttl(iph);
> >                 else if (h_proto == htons(ETH_P_IPV6))
> > diff --git a/samples/bpf/xdp_fwd_user.c b/samples/bpf/xdp_fwd_user.c
> > index ba012d9f93dd..20951bc27477 100644
> > --- a/samples/bpf/xdp_fwd_user.c
> > +++ b/samples/bpf/xdp_fwd_user.c
> > @@ -27,14 +27,20 @@
> >  #include "libbpf.h"
> >  #include <bpf/bpf.h>
> >
> > -
> > -static int do_attach(int idx, int fd, const char *name)
> > +static int do_attach(int idx, int prog_fd, int map_fd, const char
> > *name) {
> >         int err;
> >
> > -       err = bpf_set_link_xdp_fd(idx, fd, 0);
> > -       if (err < 0)
> > +       err = bpf_set_link_xdp_fd(idx, prog_fd, 0);
> > +       if (err < 0) {
> >                 printf("ERROR: failed to attach program to %s\n",
> > name);
> > +               return err;
> > +       }
> > +
> > +       /* Adding ifindex as a possible egress TX port */
> > +       err = bpf_map_update_elem(map_fd, &idx, &idx, 0);
> > +       if (err)
> > +               printf("ERROR: failed using device %s as
> > TX-port\n", name);
> >
> >         return err;
> >  }
> > @@ -47,6 +53,9 @@ static int do_detach(int idx, const char *name)
> >         if (err < 0)
> >                 printf("ERROR: failed to detach program from %s\n",
> > name);
> >
> > +       /* TODO: Remember to cleanup map, when adding use of shared
> > map
> > +        *  bpf_map_delete_elem((map_fd, &idx);
> > +        */
> >         return err;
> >  }
> >
> > @@ -67,10 +76,10 @@ int main(int argc, char **argv)
> >         };
> >         const char *prog_name = "xdp_fwd";
> >         struct bpf_program *prog;
> > +       int prog_fd, map_fd = -1;
> >         char filename[PATH_MAX];
> >         struct bpf_object *obj;
> >         int opt, i, idx, err;
> > -       int prog_fd, map_fd;
> >         int attach = 1;
> >         int ret = 0;
> >
> > @@ -103,8 +112,17 @@ int main(int argc, char **argv)
> >                         return 1;
> >                 }
> >
> > -               if (bpf_prog_load_xattr(&prog_load_attr, &obj,
> > &prog_fd))
> > +               err = bpf_prog_load_xattr(&prog_load_attr, &obj,
> > &prog_fd);
> > +               if (err) {
> > +                       if (err == -22) {  
> 
> -EINVAL?

Yes.

> For -EINVAL, many things could go wrong. But maybe the blow error
> is the most common one so I am fine with that.

Yes, it is rather sad, that we don't have better/more return codes,
such that we can react better to these.

E.g. if this was part of a open source project (external to the
kernel), I could have two XDP-BPF programs, one that use this feature
and one that don't.  If I had a more specific return code, then I could
load the other if the first failed.


> > +                               printf("Does kernel support devmap lookup?\n");
> > +                               /* If not, the error message will be:
> > +                                * "cannot pass map_type 14 into func
> > +                                * bpf_map_lookup_elem#1"
> > +                                */
> > +                       }
> >                         return 1;
> > +               }
> >
> >                 prog = bpf_object__find_program_by_title(obj, prog_name);
> >                 prog_fd = bpf_program__fd(prog);
> > @@ -119,10 +137,6 @@ int main(int argc, char **argv)
> >                         return 1;
> >                 }
> >         }
> > -       if (attach) {
> > -               for (i = 1; i < 64; ++i)
> > -                       bpf_map_update_elem(map_fd, &i, &i, 0);
> > -       }
> >
> >         for (i = optind; i < argc; ++i) {
> >                 idx = if_nametoindex(argv[i]);
> > @@ -138,7 +152,7 @@ int main(int argc, char **argv)
> >                         if (err)
> >                                 ret = err;
> >                 } else {
> > -                       err = do_attach(idx, prog_fd, argv[i]);
> > +                       err = do_attach(idx, prog_fd, map_fd, argv[i]);
> >                         if (err)
> >                                 ret = err;
> >                 }
> >  

I'll send a V2

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH bpf-next 1/3] bpf: support cloning sk storage on accept()
From: Martin Lau @ 2019-08-08  6:39 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, davem@davemloft.net,
	ast@kernel.org, daniel@iogearbox.net
In-Reply-To: <20190807154720.260577-2-sdf@google.com>

On Wed, Aug 07, 2019 at 08:47:18AM -0700, Stanislav Fomichev wrote:
> Add new helper bpf_sk_storage_clone which optionally clones sk storage
> and call it from bpf_sk_storage_clone. Reuse the gap in
> bpf_sk_storage_elem to store clone/non-clone flag.
> 
> Cc: Martin KaFai Lau <kafai@fb.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  include/net/bpf_sk_storage.h |  10 ++++
>  include/uapi/linux/bpf.h     |   1 +
>  net/core/bpf_sk_storage.c    | 102 +++++++++++++++++++++++++++++++++--
>  net/core/sock.c              |   9 ++--
>  4 files changed, 115 insertions(+), 7 deletions(-)
> 
> diff --git a/include/net/bpf_sk_storage.h b/include/net/bpf_sk_storage.h
> index b9dcb02e756b..8e4f831d2e52 100644
> --- a/include/net/bpf_sk_storage.h
> +++ b/include/net/bpf_sk_storage.h
> @@ -10,4 +10,14 @@ void bpf_sk_storage_free(struct sock *sk);
>  extern const struct bpf_func_proto bpf_sk_storage_get_proto;
>  extern const struct bpf_func_proto bpf_sk_storage_delete_proto;
>  
> +#ifdef CONFIG_BPF_SYSCALL
> +int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk);
> +#else
> +static inline int bpf_sk_storage_clone(const struct sock *sk,
> +				       struct sock *newsk)
> +{
> +	return 0;
> +}
> +#endif
> +
>  #endif /* _BPF_SK_STORAGE_H */
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 4393bd4b2419..00459ca4c8cf 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2931,6 +2931,7 @@ enum bpf_func_id {
>  
>  /* BPF_FUNC_sk_storage_get flags */
>  #define BPF_SK_STORAGE_GET_F_CREATE	(1ULL << 0)
> +#define BPF_SK_STORAGE_GET_F_CLONE	(1ULL << 1)
It is only used in bpf_sk_storage_get().
What if the elem is created from bpf_fd_sk_storage_update_elem()
i.e. from the syscall API ?

What may be the use case for a map to have both CLONE and non-CLONE
elements?  If it is not the case, would it be better to add
BPF_F_CLONE to bpf_attr->map_flags?

>  
>  /* Mode for BPF_FUNC_skb_adjust_room helper. */
>  enum bpf_adj_room_mode {
> diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
> index 94c7f77ecb6b..b6dea67965bc 100644
> --- a/net/core/bpf_sk_storage.c
> +++ b/net/core/bpf_sk_storage.c
> @@ -12,6 +12,9 @@
>  
>  static atomic_t cache_idx;
>  
> +#define BPF_SK_STORAGE_GET_F_MASK	(BPF_SK_STORAGE_GET_F_CREATE | \
> +					 BPF_SK_STORAGE_GET_F_CLONE)
> +
>  struct bucket {
>  	struct hlist_head list;
>  	raw_spinlock_t lock;
> @@ -66,7 +69,8 @@ struct bpf_sk_storage_elem {
>  	struct hlist_node snode;	/* Linked to bpf_sk_storage */
>  	struct bpf_sk_storage __rcu *sk_storage;
>  	struct rcu_head rcu;
> -	/* 8 bytes hole */
> +	u8 clone:1;
> +	/* 7 bytes hole */
>  	/* The data is stored in aother cacheline to minimize
>  	 * the number of cachelines access during a cache hit.
>  	 */
> @@ -509,7 +513,7 @@ static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
>  	return 0;
>  }
>  
> -/* Called by __sk_destruct() */
> +/* Called by __sk_destruct() & bpf_sk_storage_clone() */
>  void bpf_sk_storage_free(struct sock *sk)
>  {
>  	struct bpf_sk_storage_elem *selem;
> @@ -739,19 +743,106 @@ static int bpf_fd_sk_storage_delete_elem(struct bpf_map *map, void *key)
>  	return err;
>  }
>  
> +static struct bpf_sk_storage_elem *
> +bpf_sk_storage_clone_elem(struct sock *newsk,
> +			  struct bpf_sk_storage_map *smap,
> +			  struct bpf_sk_storage_elem *selem)
> +{
> +	struct bpf_sk_storage_elem *copy_selem;
> +
> +	copy_selem = selem_alloc(smap, newsk, NULL, true);
> +	if (!copy_selem)
> +		return ERR_PTR(-ENOMEM);
nit.
may be just return NULL as selem_alloc() does.

> +
> +	if (map_value_has_spin_lock(&smap->map))
> +		copy_map_value_locked(&smap->map, SDATA(copy_selem)->data,
> +				      SDATA(selem)->data, true);
> +	else
> +		copy_map_value(&smap->map, SDATA(copy_selem)->data,
> +			       SDATA(selem)->data);
> +
> +	return copy_selem;
> +}
> +
> +int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
> +{
> +	struct bpf_sk_storage *new_sk_storage = NULL;
> +	struct bpf_sk_storage *sk_storage;
> +	struct bpf_sk_storage_elem *selem;
> +	int ret;
> +
> +	RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
> +
> +	rcu_read_lock();
> +	sk_storage = rcu_dereference(sk->sk_bpf_storage);
> +
> +	if (!sk_storage || hlist_empty(&sk_storage->list))
> +		goto out;
> +
> +	hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) {
> +		struct bpf_sk_storage_map *smap;
> +		struct bpf_sk_storage_elem *copy_selem;
> +
> +		if (!selem->clone)
> +			continue;
> +
> +		smap = rcu_dereference(SDATA(selem)->smap);
> +		if (!smap)
smap should not be NULL.

> +			continue;
> +
> +		copy_selem = bpf_sk_storage_clone_elem(newsk, smap, selem);
> +		if (IS_ERR(copy_selem)) {
> +			ret = PTR_ERR(copy_selem);
> +			goto err;
> +		}
> +
> +		if (!new_sk_storage) {
> +			ret = sk_storage_alloc(newsk, smap, copy_selem);
> +			if (ret) {
> +				kfree(copy_selem);
> +				atomic_sub(smap->elem_size,
> +					   &newsk->sk_omem_alloc);
> +				goto err;
> +			}
> +
> +			new_sk_storage = rcu_dereference(copy_selem->sk_storage);
> +			continue;
> +		}
> +
> +		raw_spin_lock_bh(&new_sk_storage->lock);
> +		selem_link_map(smap, copy_selem);
Unlike the existing selem-update use-cases in bpf_sk_storage.c,
the smap->map.refcnt has not been held here.  Reading the smap
is fine.  However, adding a new selem to a deleting smap is an issue.
Hence, I think bpf_map_inc_not_zero() should be done first.

> +		__selem_link_sk(new_sk_storage, copy_selem);
> +		raw_spin_unlock_bh(&new_sk_storage->lock);
> +	}
> +
> +out:
> +	rcu_read_unlock();
> +	return 0;
> +
> +err:
> +	rcu_read_unlock();
> +
> +	bpf_sk_storage_free(newsk);
> +	return ret;
> +}
> +
>  BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
>  	   void *, value, u64, flags)
>  {
>  	struct bpf_sk_storage_data *sdata;
>  
> -	if (flags > BPF_SK_STORAGE_GET_F_CREATE)
> +	if (flags & ~BPF_SK_STORAGE_GET_F_MASK)
> +		return (unsigned long)NULL;
> +
> +	if ((flags & BPF_SK_STORAGE_GET_F_CLONE) &&
> +	    !(flags & BPF_SK_STORAGE_GET_F_CREATE))
>  		return (unsigned long)NULL;
>  
>  	sdata = sk_storage_lookup(sk, map, true);
>  	if (sdata)
>  		return (unsigned long)sdata->data;
>  
> -	if (flags == BPF_SK_STORAGE_GET_F_CREATE &&
> +	if ((flags & BPF_SK_STORAGE_GET_F_CREATE) &&
>  	    /* Cannot add new elem to a going away sk.
>  	     * Otherwise, the new elem may become a leak
>  	     * (and also other memory issues during map
> @@ -762,6 +853,9 @@ BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
>  		/* sk must be a fullsock (guaranteed by verifier),
>  		 * so sock_gen_put() is unnecessary.
>  		 */
> +		if (!IS_ERR(sdata))
> +			SELEM(sdata)->clone =
> +				!!(flags & BPF_SK_STORAGE_GET_F_CLONE);
>  		sock_put(sk);
>  		return IS_ERR(sdata) ?
>  			(unsigned long)NULL : (unsigned long)sdata->data;
> diff --git a/net/core/sock.c b/net/core/sock.c
> index d57b0cc995a0..f5e801a9cea4 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1851,9 +1851,12 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
>  			goto out;
>  		}
>  		RCU_INIT_POINTER(newsk->sk_reuseport_cb, NULL);
> -#ifdef CONFIG_BPF_SYSCALL
> -		RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
> -#endif
> +
> +		if (bpf_sk_storage_clone(sk, newsk)) {
> +			sk_free_unlock_clone(newsk);
> +			newsk = NULL;
> +			goto out;
> +		}
>  
>  		newsk->sk_err	   = 0;
>  		newsk->sk_err_soft = 0;
> -- 
> 2.22.0.770.g0f2c4a37fd-goog
> 

^ permalink raw reply

* Re: [PATCH net-next] net: openvswitch: Set OvS recirc_id from tc chain index
From: Paul Blakey @ 2019-08-08  6:39 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: netdev@vger.kernel.org, David S. Miller, Justin Pettit,
	Pravin B Shelar, Simon Horman, Vlad Buslov, Jiri Pirko, Roi Dayan,
	Yossi Kuperman, Rony Efraim, Oz Shlomo
In-Reply-To: <20190807150026.GE21609@localhost.localdomain>


On 8/7/2019 6:00 PM, Marcelo Ricardo Leitner wrote:
> On Wed, Aug 07, 2019 at 03:08:42PM +0300, Paul Blakey wrote:
>> Offloaded OvS datapath rules are translated one to one to tc rules,
>> for example the following simplified OvS rule:
>>
>> recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)
>>
>> Will be translated to the following tc rule:
>>
>> $ tc filter add dev dev1 ingress \
>> 	    prio 1 chain 0 proto ip \
>> 		flower tcp ct_state -trk \
>> 		action ct pipe \
>> 		action goto chain 2
>>
>> Received packets will first travel though tc, and if they aren't stolen
>> by it, like in the above rule, they will continue to OvS datapath.
>> Since we already did some actions (action ct in this case) which might
>> modify the packets, and updated action stats, we would like to continue
>> the proccessing with the correct recirc_id in OvS (here recirc_id(2))
>> where we left off.
>>
>> To support this, introduce a new skb extension for tc, which
>> will be used for translating tc chain to ovs recirc_id to
>> handle these miss cases. Last tc chain index will be set
>> by tc goto chain action and read by OvS datapath.
>>
>> Signed-off-by: Paul Blakey <paulb@mellanox.com>
>> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>> Acked-by: Jiri Pirko <jiri@mellanox.com>
> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

Thanks!


>> ---
>>   include/linux/skbuff.h    | 13 +++++++++++++
>>   include/net/sch_generic.h |  5 ++++-
>>   net/core/skbuff.c         |  6 ++++++
>>   net/openvswitch/flow.c    |  9 +++++++++
>>   net/sched/Kconfig         | 13 +++++++++++++
>>   net/sched/act_api.c       |  1 +
>>   net/sched/cls_api.c       | 12 ++++++++++++
>>   7 files changed, 58 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index 3aef8d8..fb2a792 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -279,6 +279,16 @@ struct nf_bridge_info {
>>   };
>>   #endif
>>   
>> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>> +/* Chain in tc_skb_ext will be used to share the tc chain with
>> + * ovs recirc_id. It will be set to the current chain by tc
>> + * and read by ovs to recirc_id.
>> + */
>> +struct tc_skb_ext {
>> +	__u32 chain;
>> +};
>> +#endif
>> +
>>   struct sk_buff_head {
>>   	/* These two members must be first. */
>>   	struct sk_buff	*next;
>> @@ -4050,6 +4060,9 @@ enum skb_ext_id {
>>   #ifdef CONFIG_XFRM
>>   	SKB_EXT_SEC_PATH,
>>   #endif
>> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>> +	TC_SKB_EXT,
>> +#endif
>>   	SKB_EXT_NUM, /* must be last */
>>   };
>>   
>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>> index 6b6b012..871feea 100644
>> --- a/include/net/sch_generic.h
>> +++ b/include/net/sch_generic.h
>> @@ -275,7 +275,10 @@ struct tcf_result {
>>   			unsigned long	class;
>>   			u32		classid;
>>   		};
>> -		const struct tcf_proto *goto_tp;
>> +		struct {
>> +			const struct tcf_proto *goto_tp;
>> +			u32 goto_index;
>> +		};
>>   
>>   		/* used in the skb_tc_reinsert function */
>>   		struct {
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index ea8e8d3..2b40b5a 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -4087,6 +4087,9 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
>>   #ifdef CONFIG_XFRM
>>   	[SKB_EXT_SEC_PATH] = SKB_EXT_CHUNKSIZEOF(struct sec_path),
>>   #endif
>> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>> +	[TC_SKB_EXT] = SKB_EXT_CHUNKSIZEOF(struct tc_skb_ext),
>> +#endif
>>   };
>>   
>>   static __always_inline unsigned int skb_ext_total_length(void)
>> @@ -4098,6 +4101,9 @@ static __always_inline unsigned int skb_ext_total_length(void)
>>   #ifdef CONFIG_XFRM
>>   		skb_ext_type_len[SKB_EXT_SEC_PATH] +
>>   #endif
>> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>> +		skb_ext_type_len[TC_SKB_EXT] +
>> +#endif
>>   		0;
>>   }
>>   
>> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
>> index bc89e16..0287ead 100644
>> --- a/net/openvswitch/flow.c
>> +++ b/net/openvswitch/flow.c
>> @@ -816,6 +816,9 @@ static int key_extract_mac_proto(struct sk_buff *skb)
>>   int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>>   			 struct sk_buff *skb, struct sw_flow_key *key)
>>   {
>> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>> +	struct tc_skb_ext *tc_ext;
>> +#endif
>>   	int res, err;
>>   
>>   	/* Extract metadata from packet. */
>> @@ -848,7 +851,13 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>>   	if (res < 0)
>>   		return res;
>>   	key->mac_proto = res;
>> +
>> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>> +	tc_ext = skb_ext_find(skb, TC_SKB_EXT);
>> +	key->recirc_id = tc_ext ? tc_ext->chain : 0;
>> +#else
>>   	key->recirc_id = 0;
>> +#endif
>>   
>>   	err = key_extract(skb, key);
>>   	if (!err)
>> diff --git a/net/sched/Kconfig b/net/sched/Kconfig
>> index afd2ba1..b3faafe 100644
>> --- a/net/sched/Kconfig
>> +++ b/net/sched/Kconfig
>> @@ -963,6 +963,19 @@ config NET_IFE_SKBTCINDEX
>>           tristate "Support to encoding decoding skb tcindex on IFE action"
>>           depends on NET_ACT_IFE
>>   
>> +config NET_TC_SKB_EXT
>> +	bool "TC recirculation support"
>> +	depends on NET_CLS_ACT
>> +	default y if NET_CLS_ACT
>> +	select SKB_EXTENSIONS
>> +
>> +	help
>> +	  Say Y here to allow tc chain misses to continue in OvS datapath in
>> +	  the correct recirc_id, and hardware chain misses to continue in
>> +	  the correct chain in tc software datapath.
>> +
>> +	  Say N here if you won't be using tc<->ovs offload or tc chains offload.
>> +
>>   endif # NET_SCHED
>>   
>>   config NET_SCH_FIFO
>> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>> index 3397122..c393604 100644
>> --- a/net/sched/act_api.c
>> +++ b/net/sched/act_api.c
>> @@ -27,6 +27,7 @@ static void tcf_action_goto_chain_exec(const struct tc_action *a,
>>   {
>>   	const struct tcf_chain *chain = rcu_dereference_bh(a->goto_chain);
>>   
>> +	res->goto_index = chain->index;
>>   	res->goto_tp = rcu_dereference_bh(chain->filter_chain);
>>   }
>>   
>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>> index 3565d9a..b0b829a 100644
>> --- a/net/sched/cls_api.c
>> +++ b/net/sched/cls_api.c
>> @@ -1660,6 +1660,18 @@ int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
>>   			goto reset;
>>   		} else if (unlikely(TC_ACT_EXT_CMP(err, TC_ACT_GOTO_CHAIN))) {
>>   			first_tp = res->goto_tp;
>> +
>> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>> +			{
>> +				struct tc_skb_ext *ext;
>> +
>> +				ext = skb_ext_add(skb, TC_SKB_EXT);
>> +				if (WARN_ON_ONCE(!ext))
>> +					return TC_ACT_SHOT;
>> +
>> +				ext->chain = res->goto_index;
>> +			}
>> +#endif
>>   			goto reset;
>>   		}
>>   #endif
>> -- 
>> 1.8.3.1
>>

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox