Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH] ath9k: mark expected switch fall-throughs
From: Gustavo A. R. Silva @ 2018-05-25 21:22 UTC (permalink / raw)
  To: QCA ath9k Development, Kalle Valo, David S. Miller
  Cc: linux-wireless, netdev, linux-kernel, Gustavo A. R. Silva

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
 drivers/net/wireless/ath/ath9k/ar5008_phy.c | 2 ++
 drivers/net/wireless/ath/ath9k/ar9002_phy.c | 1 +
 drivers/net/wireless/ath/ath9k/main.c       | 1 +
 3 files changed, 4 insertions(+)

diff --git a/drivers/net/wireless/ath/ath9k/ar5008_phy.c b/drivers/net/wireless/ath/ath9k/ar5008_phy.c
index 7922550..ef2dd68 100644
--- a/drivers/net/wireless/ath/ath9k/ar5008_phy.c
+++ b/drivers/net/wireless/ath/ath9k/ar5008_phy.c
@@ -583,12 +583,14 @@ static void ar5008_hw_init_chain_masks(struct ath_hw *ah)
 	case 0x5:
 		REG_SET_BIT(ah, AR_PHY_ANALOG_SWAP,
 			    AR_PHY_SWAP_ALT_CHAIN);
+		/* fall through */
 	case 0x3:
 		if (ah->hw_version.macVersion == AR_SREV_REVISION_5416_10) {
 			REG_WRITE(ah, AR_PHY_RX_CHAINMASK, 0x7);
 			REG_WRITE(ah, AR_PHY_CAL_CHAINMASK, 0x7);
 			break;
 		}
+		/* else: fall through */
 	case 0x1:
 	case 0x2:
 	case 0x7:
diff --git a/drivers/net/wireless/ath/ath9k/ar9002_phy.c b/drivers/net/wireless/ath/ath9k/ar9002_phy.c
index 61a9b85..7132918 100644
--- a/drivers/net/wireless/ath/ath9k/ar9002_phy.c
+++ b/drivers/net/wireless/ath/ath9k/ar9002_phy.c
@@ -119,6 +119,7 @@ static int ar9002_hw_set_channel(struct ath_hw *ah, struct ath9k_channel *chan)
 				aModeRefSel = 2;
 			if (aModeRefSel)
 				break;
+			/* else: fall through */
 		case 1:
 		default:
 			aModeRefSel = 0;
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index a3be8ad..11d84f4 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -1928,6 +1928,7 @@ static int ath9k_ampdu_action(struct ieee80211_hw *hw,
 	case IEEE80211_AMPDU_TX_STOP_FLUSH:
 	case IEEE80211_AMPDU_TX_STOP_FLUSH_CONT:
 		flush = true;
+		/* fall through */
 	case IEEE80211_AMPDU_TX_STOP_CONT:
 		ath9k_ps_wakeup(sc);
 		ath_tx_aggr_stop(sc, sta, tid);
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH] PCI: allow drivers to limit the number of VFs to 0
From: Jakub Kicinski @ 2018-05-25 21:05 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, linux-pci, netdev, Sathya Perla, Felix Manlunas,
	alexander.duyck, john.fastabend, Jacob Keller, Donald Dutile,
	oss-drivers, Christoph Hellwig
In-Reply-To: <20180525140223.GA45098@bhelgaas-glaptop.roam.corp.google.com>

On Fri, 25 May 2018 09:02:23 -0500, Bjorn Helgaas wrote:
> On Thu, May 24, 2018 at 06:20:15PM -0700, Jakub Kicinski wrote:
> > On Thu, 24 May 2018 18:57:48 -0500, Bjorn Helgaas wrote:  
> > > On Mon, Apr 02, 2018 at 03:46:52PM -0700, Jakub Kicinski wrote:  
> > > > Some user space depends on enabling sriov_totalvfs number of VFs
> > > > to not fail, e.g.:
> > > > 
> > > > $ cat .../sriov_totalvfs > .../sriov_numvfs
> > > > 
> > > > For devices which VF support depends on loaded FW we have the
> > > > pci_sriov_{g,s}et_totalvfs() API.  However, this API uses 0 as
> > > > a special "unset" value, meaning drivers can't limit sriov_totalvfs
> > > > to 0.  Remove the special values completely and simply initialize
> > > > driver_max_VFs to total_VFs.  Then always use driver_max_VFs.
> > > > Add a helper for drivers to reset the VF limit back to total.    
> > > 
> > > I still can't really make sense out of the changelog.
> > >
> > > I think part of the reason it's confusing is because there are two
> > > things going on:
> > > 
> > >   1) You want this:
> > >   
> > >        pci_sriov_set_totalvfs(dev, 0);
> > >        x = pci_sriov_get_totalvfs(dev) 
> > > 
> > >      to return 0 instead of total_VFs.  That seems to connect with
> > >      your subject line.  It means "sriov_totalvfs" in sysfs could be
> > >      0, but I don't know how that is useful (I'm sure it is; just
> > >      educate me :))  
> > 
> > Let me just quote the bug report that got filed on our internal bug
> > tracker :)
> > 
> >   When testing Juju Openstack with Ubuntu 18.04, enabling SR-IOV causes
> >   errors because Juju gets the sriov_totalvfs for SR-IOV-capable device
> >   then tries to set that as the sriov_numvfs parameter.
> > 
> >   For SR-IOV incapable FW, the sriov_totalvfs parameter should be 0, 
> >   but it's set to max.  When FW is switched to flower*, the correct 
> >   sriov_totalvfs value is presented.
> > 
> > * flower is a project name  
> 
> From the point of view of the PCI core (which knows nothing about
> device firmware and relies on the architected config space described
> by the PCIe spec), this sounds like an erratum: with some firmware
> installed, the device is not capable of SR-IOV, but still advertises
> an SR-IOV capability with "TotalVFs > 0".
> 
> Regardless of whether that's an erratum, we do allow PF drivers to use
> pci_sriov_set_totalvfs() to limit the number of VFs that may be
> enabled by writing to the PF's "sriov_numvfs" sysfs file.

Think more of an FPGA which can be reprogrammed at runtime to have
different capabilities than an erratum.  Some FWs simply have no use
for VFs and save resources (and validation time) by not supporting it.

> But the current implementation does not allow a PF driver to limit VFs
> to 0, and that does seem nonsensical.
> 
> > My understanding is OpenStack uses sriov_totalvfs to determine how many
> > VFs can be enabled, looks like this is the code:
> > 
> > http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n464
> >   
> > >   2) You're adding the pci_sriov_reset_totalvfs() interface.  I'm not
> > >      sure what you intend for this.  Is *every* driver supposed to
> > >      call it in .remove()?  Could/should this be done in the core
> > >      somehow instead of depending on every driver?  
> > 
> > Good question, I was just thinking yesterday we may want to call it
> > from the core, but I don't think it's strictly necessary nor always
> > sufficient (we may reload FW without re-probing).
> > 
> > We have a device which supports different number of VFs based on the FW
> > loaded.  Some legacy FWs does not inform the driver how many VFs it can
> > support, because it supports max.  So the flow in our driver is this:
> > 
> > load_fw(dev);
> > ...
> > max_vfs = ask_fw_for_max_vfs(dev);
> > if (max_vfs >= 0)
> > 	return pci_sriov_set_totalvfs(dev, max_vfs);
> > else /* FW didn't tell us, assume max */
> > 	return pci_sriov_reset_totalvfs(dev); 
> > 
> > We also reset the max on device remove, but that's not strictly
> > necessary.
> > 
> > Other users of pci_sriov_set_totalvfs() always know the value to set
> > the total to (either always get it from FW or it's a constant).
> > 
> > If you prefer we can work out the correct max for those legacy cases in
> > the driver as well, although it seemed cleaner to just ask the core,
> > since it already has total_VFs value handy :)
> >   
> > > I'm also having a hard time connecting your user-space command example
> > > with the rest of this.  Maybe it will make more sense to me tomorrow
> > > after some coffee.  
> > 
> > OpenStack assumes it will always be able to set sriov_numvfs to
> > sriov_totalvfs, see this 'if':
> > 
> > http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n512  
> 
> Thanks for educating me.  I think there are two issues here that we
> can separate.  I extracted the patch below for the first.
> 
> The second is the question of resetting driver_max_VFs.  I think we
> currently have a general issue in the core:
> 
>   - load PF driver 1
>   - driver calls pci_sriov_set_totalvfs() to reduce driver_max_VFs
>   - unload PF driver 1
>   - load PF driver 2
> 
> Now driver_max_VFs is still stuck at the lower value set by driver 1.
> I don't think that's the way this should work.
> 
> I guess this is partly a consequence of setting driver_max_VFs in
> sriov_init(), which is called before driver attach and should only
> depend on hardware characteristics, so it is related to the patch
> below.  But I think we should fix it in general, not just for
> netronome.

Okay, perfect.  That makes sense.  The patch below certainly fixes the
first issue for us.  Thank you!

As far as the second issue goes - agreed, having the core reset the
number of VFs to total_VFs definitely makes sense.  It doesn't cater to
the case where FW is reloaded without reprobing, but we don't do this
today anyway.

Should I try to come up with a patch to reset total_VFs after detach?

> commit 4a338bc6f94b9ad824ac944f5dfc249d6838719c
> Author: Jakub Kicinski <jakub.kicinski@netronome.com>
> Date:   Fri May 25 08:18:34 2018 -0500
> 
>     PCI/IOV: Allow PF drivers to limit total_VFs to 0
>     
>     Some SR-IOV PF drivers implement .sriov_configure(), which allows
>     user-space to enable VFs by writing the desired number of VFs to the sysfs
>     "sriov_numvfs" file (see sriov_numvfs_store()).
>     
>     The PCI core limits the number of VFs to the TotalVFs advertised by the
>     device in its SR-IOV capability.  The PF driver can limit the number of VFs
>     to even fewer (it may have pre-allocated data structures or knowledge of
>     device limitations) by calling pci_sriov_set_totalvfs(), but previously it
>     could not limit the VFs to 0.
>     
>     Change pci_sriov_get_totalvfs() so it always respects the VF limit imposed
>     by the PF driver, even if the limit is 0.
>     
>     This sequence:
>     
>       pci_sriov_set_totalvfs(dev, 0);
>       x = pci_sriov_get_totalvfs(dev);
>     
>     previously set "x" to TotalVFs from the SR-IOV capability.  Now it will set
>     "x" to 0.
>     
>     Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> 
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 192b82898a38..d0d73dbbd5ca 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -469,6 +469,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
>  	iov->nres = nres;
>  	iov->ctrl = ctrl;
>  	iov->total_VFs = total;
> +	iov->driver_max_VFs = total;
>  	pci_read_config_word(dev, pos + PCI_SRIOV_VF_DID, &iov->vf_device);
>  	iov->pgsz = pgsz;
>  	iov->self = dev;
> @@ -827,10 +828,7 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev)
>  	if (!dev->is_physfn)
>  		return 0;
>  
> -	if (dev->sriov->driver_max_VFs)
> -		return dev->sriov->driver_max_VFs;
> -
> -	return dev->sriov->total_VFs;
> +	return dev->sriov->driver_max_VFs;
>  }
>  EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs);
>  

^ permalink raw reply

* Inefficient call to ipv6_chk_acast_addr_src in icmp6_send
From: Salam Noureddine @ 2018-05-25 21:02 UTC (permalink / raw)
  To: Network Development

Hi,

The call to ipv6_chk_acast_addr_src in icmp6_send can be pretty costly on
systems with a lot of net_devices since it can end up looping through all
net_devices in a net namespace searching for an anycast address. A few
thousand icmp6 error packets can end up consuming a whole CPU.
I am thinking of fixing this by adding a hash table along the lines of
inet6_addr_lst,
providing a fast lookup for anycast addresses. Is that the right way to go?

Thanks,

Salam

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2018-05-25 20:58 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


Let's begin the holiday weekend with some networking fixes:

1) Whoops need to restrict cfg80211 wiphy names even more to 64
   bytes.  From Eric Biggers.

2) Fix flags being ignored when using kernel_connect() with SCTP, from
   Xin Long.

3) Use after free in DCCP, from Alexey Kodanev.

4) Need to check rhltable_init() return value in ipmr code, from
   Eric Dumazet.

5) XDP handling fixes in virtio_net from Jason Wang.

6) Missing RTA_TABLE in rtm_ipv4_policy[], from Roopa Prabhu.

7) Need to use IRQ disabling spinlocks in mlx4_qp_lookup(), from Jack
   Morgenstein.

8) Prevent out-of-bounds speculation using indexes in BPF, from Daniel
   Borkmann.

9) Fix regression added by AF_PACKET link layer cure, from Willem
   de Bruijn.

10) Correct ENIC dma mask, from Govindarajulu Varadarajan.

11) Missing config options for PMTU tests, from Stefano Brivio.

Please pull, thanks a lot.

The following changes since commit 6741c4bb389da103c0d79ad1961884628900bfe6:

  Merge tag 'mips_fixes_4.17_2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips (2018-05-21 08:58:00 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to eb110410b9f6477726026669f3f0c0567e8241e6:

  ibmvnic: Fix partial success login retries (2018-05-25 16:32:48 -0400)

----------------------------------------------------------------
Alexey Kodanev (1):
      dccp: don't free ccid2_hc_tx_sock struct in dccp_disconnect()

Anders Roxell (2):
      selftests: bpf: config: enable NET_SCH_INGRESS for xdp_meta.sh
      selftests: net: reuseport_bpf_numa: don't fail if no numa support

Andrew Zaborowski (1):
      mac80211_hwsim: Fix radio dump for radio idx 0

Bo Chen (1):
      pcnet32: add an error handling path in pcnet32_probe_pci()

Bob Copeland (1):
      mac80211: mesh: fix premature update of rc stats

Colin Ian King (2):
      batman-adv: don't pass a NULL hard_iface to batadv_hardif_put
      net/mlx4: fix spelling mistake: "Inrerface" -> "Interface" and rephrase message

Daniel Borkmann (1):
      bpf: properly enforce index mask to prevent out-of-bounds speculation

David S. Miller (6):
      Merge tag 'mac80211-for-davem-2018-05-23' of git://git.kernel.org/.../jberg/mac80211
      Merge branch 'virtio_net-mergeable-XDP'
      Merge tag 'wireless-drivers-for-davem-2018-05-22' of git://git.kernel.org/.../kvalo/wireless-drivers
      Merge tag 'mlx5-fixes-2018-05-24' of git://git.kernel.org/.../saeed/linux
      Merge tag 'batadv-net-for-davem-20180524' of git://git.open-mesh.org/linux-merge
      Merge git://git.kernel.org/.../bpf/bpf

Dedy Lansky (1):
      nl80211: fix nlmsg allocation in cfg80211_ft_event

Eran Ben Elisha (1):
      net/mlx5e: When RXFCS is set, add FCS data into checksum calculation

Eric Biggers (2):
      cfg80211: further limit wiphy names to 64 bytes
      ppp: remove the PPPIOCDETACH ioctl

Eric Dumazet (1):
      ipmr: properly check rhltable_init() return value

Fabio Estevam (2):
      net: fec: ptp: Switch to SPDX identifier
      net: fec: Add a SPDX identifier

Florian Fainelli (2):
      net: phy: broadcom: Fix auxiliary control register reads
      net: phy: broadcom: Fix bcm_write_exp()

Govindarajulu Varadarajan (1):
      enic: set DMA mask to 47 bit

Haim Dreyfuss (1):
      cfg80211: fix NULL pointer derference when querying regdb

Jack Morgenstein (1):
      net/mlx4: Fix irq-unsafe spinlock usage

Jason Wang (6):
      virtio-net: correctly redirect linearized packet
      virtio-net: correctly transmit XDP buff after linearizing
      virtio-net: correctly check num_buf during err path
      virtio-net: fix leaking page for gso packet during mergeable XDP
      tuntap: correctly set SOCKWQ_ASYNC_NOSPACE
      vhost: synchronize IOTLB message with dev cleanup

Kalle Valo (3):
      MAINTAINERS: update Kalle's email address
      MAINTAINERS: change Kalle as ath.ko maintainer
      MAINTAINERS: change Kalle as wcn36xx maintainer

Linus Lüssing (1):
      batman-adv: Fix TT sync flags for intermediate TT responses

Marek Lindner (1):
      batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs

Nathan Fontenot (1):
      ibmvnic: Only do H_EOI for mobility events

Or Gerlitz (1):
      net : sched: cls_api: deal with egdev path only if needed

Qing Huang (1):
      mlx4_core: allocate ICM memory in page size chunks

Rafał Miłecki (3):
      bcma: fix buffer size caused crash in bcma_core_mips_print_irq()
      Revert "ssb: Prevent build of PCI host features in module"
      ssb: make SSB_PCICORE_HOSTMODE depend on SSB = y

Roopa Prabhu (1):
      net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy

Shahed Shaikh (1):
      qed: Fix mask for physical address in ILT entry

Stefano Brivio (1):
      selftests/net: Add missing config options for PMTU tests

Sven Eckelmann (1):
      batman-adv: Avoid race in TT TVLV allocator helper

Thomas Falcon (1):
      ibmvnic: Fix partial success login retries

Wenwen Wang (1):
      isdn: eicon: fix a missing-check bug

Willem de Bruijn (2):
      packet: fix reserve calculation
      ipv4: remove warning in ip_recv_error

Xin Long (1):
      sctp: fix the issue that flags are ignored when using kernel_connect

Yossi Kuperman (1):
      net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands

 Documentation/networking/ppp_generic.txt             |  6 ------
 MAINTAINERS                                          |  8 ++++----
 drivers/bcma/driver_mips.c                           |  2 +-
 drivers/isdn/hardware/eicon/diva.c                   | 22 +++++++++++++++-------
 drivers/isdn/hardware/eicon/diva.h                   |  5 +++--
 drivers/isdn/hardware/eicon/divasmain.c              | 18 +++++++++++-------
 drivers/net/ethernet/amd/pcnet32.c                   | 10 +++++++---
 drivers/net/ethernet/cisco/enic/enic_main.c          |  8 ++++----
 drivers/net/ethernet/freescale/fec_main.c            |  1 +
 drivers/net/ethernet/freescale/fec_ptp.c             | 14 +-------------
 drivers/net/ethernet/ibm/ibmvnic.c                   | 22 +++++++++++++++-------
 drivers/net/ethernet/mellanox/mlx4/icm.c             | 16 +++++++++-------
 drivers/net/ethernet/mellanox/mlx4/intf.c            |  2 +-
 drivers/net/ethernet/mellanox/mlx4/qp.c              |  4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c      | 42 ++++++++++++++++++++++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c | 12 +++++-------
 drivers/net/ethernet/qlogic/qed/qed_cxt.c            |  2 +-
 drivers/net/phy/bcm-cygnus.c                         |  6 +++---
 drivers/net/phy/bcm-phy-lib.c                        |  2 +-
 drivers/net/phy/bcm-phy-lib.h                        |  7 +++++++
 drivers/net/phy/bcm7xxx.c                            |  4 ++--
 drivers/net/ppp/ppp_generic.c                        | 27 +++++----------------------
 drivers/net/tun.c                                    | 19 +++++++++++++++----
 drivers/net/virtio_net.c                             | 21 ++++++++++-----------
 drivers/net/wireless/mac80211_hwsim.c                |  4 ++--
 drivers/ssb/Kconfig                                  |  4 ++--
 drivers/vhost/vhost.c                                |  3 +++
 include/linux/bpf_verifier.h                         |  2 +-
 include/net/sctp/sctp.h                              |  2 ++
 include/uapi/linux/nl80211.h                         |  2 +-
 include/uapi/linux/ppp-ioctl.h                       |  2 +-
 kernel/bpf/verifier.c                                | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------
 net/batman-adv/multicast.c                           |  2 +-
 net/batman-adv/translation-table.c                   | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------
 net/dccp/proto.c                                     |  2 --
 net/ipv4/fib_frontend.c                              |  1 +
 net/ipv4/ip_sockglue.c                               |  2 --
 net/ipv4/ipmr_base.c                                 |  5 ++++-
 net/mac80211/mesh_plink.c                            |  8 ++++----
 net/packet/af_packet.c                               |  2 +-
 net/sched/cls_api.c                                  |  2 +-
 net/sctp/ipv6.c                                      |  2 +-
 net/sctp/protocol.c                                  |  2 +-
 net/sctp/socket.c                                    | 51 +++++++++++++++++++++++++++++++++++----------------
 net/wireless/nl80211.c                               |  3 ++-
 net/wireless/reg.c                                   |  3 +++
 tools/testing/selftests/bpf/config                   |  2 ++
 tools/testing/selftests/net/config                   |  5 +++++
 tools/testing/selftests/net/reuseport_bpf_numa.c     |  4 +++-
 49 files changed, 372 insertions(+), 193 deletions(-)

^ permalink raw reply

* Re: [PATCH net-next] net: dsa: dsa_loop: Make dynamic debugging helpful
From: David Miller @ 2018-05-25 20:52 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, andrew, vivien.didelot, linux-kernel
In-Reply-To: <20180525035215.19341-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Thu, 24 May 2018 20:52:14 -0700

> Remove redundant debug prints from phy_read/write since we can trace those
> calls through trace events. Enhance dynamic debug prints to print arguments
> which helps figuring how what is going on at the driver level with higher level
> configuration interfaces.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Applied, thanks Florian.

^ permalink raw reply

* Re: [PATCH] PCI: allow drivers to limit the number of VFs to 0
From: Bjorn Helgaas @ 2018-05-25 20:46 UTC (permalink / raw)
  To: Don Dutile
  Cc: Jakub Kicinski, Bjorn Helgaas, linux-pci, netdev, Sathya Perla,
	Felix Manlunas, alexander.duyck, john.fastabend, Jacob Keller,
	oss-drivers, Christoph Hellwig
In-Reply-To: <88390255-55f7-57a6-5324-d443373d1984@redhat.com>

On Fri, May 25, 2018 at 03:27:52PM -0400, Don Dutile wrote:
> On 05/25/2018 10:02 AM, Bjorn Helgaas wrote:
> > On Thu, May 24, 2018 at 06:20:15PM -0700, Jakub Kicinski wrote:
> > > Hi Bjorn!
> > > 
> > > On Thu, 24 May 2018 18:57:48 -0500, Bjorn Helgaas wrote:
> > > > On Mon, Apr 02, 2018 at 03:46:52PM -0700, Jakub Kicinski wrote:
> > > > > Some user space depends on enabling sriov_totalvfs number of VFs
> > > > > to not fail, e.g.:
> > > > > 
> > > > > $ cat .../sriov_totalvfs > .../sriov_numvfs
> > > > > 
> > > > > For devices which VF support depends on loaded FW we have the
> > > > > pci_sriov_{g,s}et_totalvfs() API.  However, this API uses 0 as
> > > > > a special "unset" value, meaning drivers can't limit sriov_totalvfs
> > > > > to 0.  Remove the special values completely and simply initialize
> > > > > driver_max_VFs to total_VFs.  Then always use driver_max_VFs.
> > > > > Add a helper for drivers to reset the VF limit back to total.
> > > > 
> > > > I still can't really make sense out of the changelog.
> > > > 
> > > > I think part of the reason it's confusing is because there are two
> > > > things going on:
> > > > 
> > > >    1) You want this:
> > > >         pci_sriov_set_totalvfs(dev, 0);
> > > >         x = pci_sriov_get_totalvfs(dev)
> > > > 
> > > >       to return 0 instead of total_VFs.  That seems to connect with
> > > >       your subject line.  It means "sriov_totalvfs" in sysfs could be
> > > >       0, but I don't know how that is useful (I'm sure it is; just
> > > >       educate me :))
> > > 
> > > Let me just quote the bug report that got filed on our internal bug
> > > tracker :)
> > > 
> > >    When testing Juju Openstack with Ubuntu 18.04, enabling SR-IOV causes
> > >    errors because Juju gets the sriov_totalvfs for SR-IOV-capable device
> > >    then tries to set that as the sriov_numvfs parameter.
> > > 
> > >    For SR-IOV incapable FW, the sriov_totalvfs parameter should be 0,
> > >    but it's set to max.  When FW is switched to flower*, the correct
> > >    sriov_totalvfs value is presented.
> > > 
> > > * flower is a project name
> > 
> >  From the point of view of the PCI core (which knows nothing about
> > device firmware and relies on the architected config space described
> > by the PCIe spec), this sounds like an erratum: with some firmware
> > installed, the device is not capable of SR-IOV, but still advertises
> > an SR-IOV capability with "TotalVFs > 0".
> > 
> > Regardless of whether that's an erratum, we do allow PF drivers to use
> > pci_sriov_set_totalvfs() to limit the number of VFs that may be
> > enabled by writing to the PF's "sriov_numvfs" sysfs file.
> > 
> +1.
> 
> > But the current implementation does not allow a PF driver to limit VFs
> > to 0, and that does seem nonsensical.
> > 
> Well, not really -- claiming to support VFs, and then wanting it to be 0...
> I could certainly argue is non-sensical.
> From a sw perspective, sure, see if we can set VFs to 0 (and reset to another value later).
> 
> /me wishes that implementers would follow the architecture vs torquing it into strange shapes.
> 
> > > My understanding is OpenStack uses sriov_totalvfs to determine how many
> > > VFs can be enabled, looks like this is the code:
> > > 
> > > http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n464
> > > 
> > > >    2) You're adding the pci_sriov_reset_totalvfs() interface.  I'm not
> > > >       sure what you intend for this.  Is *every* driver supposed to
> > > >       call it in .remove()?  Could/should this be done in the core
> > > >       somehow instead of depending on every driver?
> > > 
> > > Good question, I was just thinking yesterday we may want to call it
> > > from the core, but I don't think it's strictly necessary nor always
> > > sufficient (we may reload FW without re-probing).
> > > 
> > > We have a device which supports different number of VFs based on the FW
> > > loaded.  Some legacy FWs does not inform the driver how many VFs it can
> > > support, because it supports max.  So the flow in our driver is this:
> > > 
> > > load_fw(dev);
> > > ...
> > > max_vfs = ask_fw_for_max_vfs(dev);
> > > if (max_vfs >= 0)
> > > 	return pci_sriov_set_totalvfs(dev, max_vfs);
> > > else /* FW didn't tell us, assume max */
> > > 	return pci_sriov_reset_totalvfs(dev);
> > > 
> > > We also reset the max on device remove, but that's not strictly
> > > necessary.
> > > 
> > > Other users of pci_sriov_set_totalvfs() always know the value to set
> > > the total to (either always get it from FW or it's a constant).
> > > 
> > > If you prefer we can work out the correct max for those legacy cases in
> > > the driver as well, although it seemed cleaner to just ask the core,
> > > since it already has total_VFs value handy :)
> > > 
> > > > I'm also having a hard time connecting your user-space command example
> > > > with the rest of this.  Maybe it will make more sense to me tomorrow
> > > > after some coffee.
> > > 
> > > OpenStack assumes it will always be able to set sriov_numvfs to
> > > sriov_totalvfs, see this 'if':
> > > 
> > > http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n512
> > 
> > Thanks for educating me.  I think there are two issues here that we
> > can separate.  I extracted the patch below for the first.
> > 
> > The second is the question of resetting driver_max_VFs.  I think we
> > currently have a general issue in the core:
> > 
> >    - load PF driver 1
> >    - driver calls pci_sriov_set_totalvfs() to reduce driver_max_VFs
> >    - unload PF driver 1
> >    - load PF driver 2
> > 
> > Now driver_max_VFs is still stuck at the lower value set by driver 1.
> > I don't think that's the way this should work.
> > 
> > I guess this is partly a consequence of setting driver_max_VFs in
> > sriov_init(), which is called before driver attach and should only
> um, if it's at sriov_init() how is max changed by a PF driver?
> or am I missing something subtle (a new sysfs param) as to what is being changed?

sriov_init() basically just sets the default driver_max_VFs to Total_VFs.

If the PF driver later calls pci_sriov_set_totalvfs(), it can reduce
driver_max_VFs.

My concern is that there's nothing that resets driver_max_VFs back to
Total_VFs if we unload and reload the PF driver.

> > depend on hardware characteristics, so it is related to the patch
> > below.  But I think we should fix it in general, not just for
> > netronome.
> > 
> > 
> > commit 4a338bc6f94b9ad824ac944f5dfc249d6838719c
> > Author: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Date:   Fri May 25 08:18:34 2018 -0500
> > 
> >      PCI/IOV: Allow PF drivers to limit total_VFs to 0
> >      Some SR-IOV PF drivers implement .sriov_configure(), which allows
> >      user-space to enable VFs by writing the desired number of VFs to the sysfs
> >      "sriov_numvfs" file (see sriov_numvfs_store()).
> >      The PCI core limits the number of VFs to the TotalVFs advertised by the
> >      device in its SR-IOV capability.  The PF driver can limit the number of VFs
> >      to even fewer (it may have pre-allocated data structures or knowledge of
> >      device limitations) by calling pci_sriov_set_totalvfs(), but previously it
> >      could not limit the VFs to 0.
> >      Change pci_sriov_get_totalvfs() so it always respects the VF limit imposed
> >      by the PF driver, even if the limit is 0.
> >      This sequence:
> >        pci_sriov_set_totalvfs(dev, 0);
> >        x = pci_sriov_get_totalvfs(dev);
> >      previously set "x" to TotalVFs from the SR-IOV capability.  Now it will set
> >      "x" to 0.
> >      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> >      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> > 
> > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> > index 192b82898a38..d0d73dbbd5ca 100644
> > --- a/drivers/pci/iov.c
> > +++ b/drivers/pci/iov.c
> > @@ -469,6 +469,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
> >   	iov->nres = nres;
> >   	iov->ctrl = ctrl;
> >   	iov->total_VFs = total;
> > +	iov->driver_max_VFs = total;
> >   	pci_read_config_word(dev, pos + PCI_SRIOV_VF_DID, &iov->vf_device);
> >   	iov->pgsz = pgsz;
> >   	iov->self = dev;
> > @@ -827,10 +828,7 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev)
> >   	if (!dev->is_physfn)
> >   		return 0;
> > -	if (dev->sriov->driver_max_VFs)
> > -		return dev->sriov->driver_max_VFs;
> > -
> > -	return dev->sriov->total_VFs;
> > +	return dev->sriov->driver_max_VFs;
> >   }
> >   EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs);
> > 
> 

^ permalink raw reply

* Re: [PATCH net-next v5 0/2] openvswitch: Support conntrack zone limit
From: David Miller @ 2018-05-25 20:45 UTC (permalink / raw)
  To: yihung.wei; +Cc: netdev, pshelar
In-Reply-To: <1527209803-48274-1-git-send-email-yihung.wei@gmail.com>

From: Yi-Hung Wei <yihung.wei@gmail.com>
Date: Thu, 24 May 2018 17:56:41 -0700

> Currently, nf_conntrack_max is used to limit the maximum number of
> conntrack entries in the conntrack table for every network namespace.
> For the VMs and containers that reside in the same namespace,
> they share the same conntrack table, and the total # of conntrack entries
> for all the VMs and containers are limited by nf_conntrack_max.  In this
> case, if one of the VM/container abuses the usage the conntrack entries,
> it blocks the others from committing valid conntrack entries into the
> conntrack table.  Even if we can possibly put the VM in different network
> namespace, the current nf_conntrack_max configuration is kind of rigid
> that we cannot limit different VM/container to have different # conntrack
> entries.
> 
> To address the aforementioned issue, this patch proposes to have a
> fine-grained mechanism that could further limit the # of conntrack entries
> per-zone.  For example, we can designate different zone to different VM,
> and set conntrack limit to each zone.  By providing this isolation, a
> mis-behaved VM only consumes the conntrack entries in its own zone, and
> it will not influence other well-behaved VMs.  Moreover, the users can
> set various conntrack limit to different zone based on their preference.
> 
> The proposed implementation utilizes Netfilter's nf_conncount backend
> to count the number of connections in a particular zone.  If the number of
> connection is above a configured limitation, OVS will return ENOMEM to the
> userspace.  If userspace does not configure the zone limit, the limit
> defaults to zero that is no limitation, which is backward compatible to
> the behavior without this patch.
> 
> The first patch defines the conntrack limit netlink definition, and the
> second patch provides the implementation.
 ...

Series applied, thanks for sticking with it so long and responding to the
feedback you received.

^ permalink raw reply

* Re: [PATCH v4 2/3] media: rc: introduce BPF_PROG_LIRC_MODE2
From: Alexei Starovoitov @ 2018-05-25 20:45 UTC (permalink / raw)
  To: Sean Young
  Cc: linux-media, linux-kernel, Alexei Starovoitov,
	Mauro Carvalho Chehab, Daniel Borkmann, netdev, Matthias Reichl,
	Devin Heitmueller, Y Song, Quentin Monnet
In-Reply-To: <cd5140387a0f9c5ffc68d1846774f12fed45f34d.1526651592.git.sean@mess.org>

On Fri, May 18, 2018 at 03:07:29PM +0100, Sean Young wrote:
> Add support for BPF_PROG_LIRC_MODE2. This type of BPF program can call
> rc_keydown() to reported decoded IR scancodes, or rc_repeat() to report
> that the last key should be repeated.
> 
> The bpf program can be attached to using the bpf(BPF_PROG_ATTACH) syscall;
> the target_fd must be the /dev/lircN device.
> 
> Signed-off-by: Sean Young <sean@mess.org>
...
>  enum bpf_attach_type {
> @@ -158,6 +159,7 @@ enum bpf_attach_type {
>  	BPF_CGROUP_INET6_CONNECT,
>  	BPF_CGROUP_INET4_POST_BIND,
>  	BPF_CGROUP_INET6_POST_BIND,
> +	BPF_LIRC_MODE2,
>  	__MAX_BPF_ATTACH_TYPE
>  };
>  
> @@ -1902,6 +1904,53 @@ union bpf_attr {
>   *		egress otherwise). This is the only flag supported for now.
>   *	Return
>   *		**SK_PASS** on success, or **SK_DROP** on error.
> + *
> + * int bpf_rc_keydown(void *ctx, u32 protocol, u64 scancode, u32 toggle)
> + *	Description
> + *		This helper is used in programs implementing IR decoding, to
> + *		report a successfully decoded key press with *scancode*,
> + *		*toggle* value in the given *protocol*. The scancode will be
> + *		translated to a keycode using the rc keymap, and reported as
> + *		an input key down event. After a period a key up event is
> + *		generated. This period can be extended by calling either
> + *		**bpf_rc_keydown** () with the same values, or calling
> + *		**bpf_rc_repeat** ().
> + *
> + *		Some protocols include a toggle bit, in case the button
> + *		was released and pressed again between consecutive scancodes
> + *
> + *		The *ctx* should point to the lirc sample as passed into
> + *		the program.
> + *
> + *		The *protocol* is the decoded protocol number (see
> + *		**enum rc_proto** for some predefined values).
> + *
> + *		This helper is only available is the kernel was compiled with
> + *		the **CONFIG_BPF_LIRC_MODE2** configuration option set to
> + *		"**y**".
> + *
> + *	Return
> + *		0
> + *
> + * int bpf_rc_repeat(void *ctx)
> + *	Description
> + *		This helper is used in programs implementing IR decoding, to
> + *		report a successfully decoded repeat key message. This delays
> + *		the generation of a key up event for previously generated
> + *		key down event.
> + *
> + *		Some IR protocols like NEC have a special IR message for
> + *		repeating last button, for when a button is held down.
> + *
> + *		The *ctx* should point to the lirc sample as passed into
> + *		the program.
> + *
> + *		This helper is only available is the kernel was compiled with
> + *		the **CONFIG_BPF_LIRC_MODE2** configuration option set to
> + *		"**y**".

Hi Sean,

thank you for working on this. The patch set looks good to me.
I'd only ask to change above two helper names to something more specific.
Since BPF_PROG_TYPE_LIRC_MODE2 is the name of new prog type and kconfig.
May be bpf_lirc2_keydown() and bpf_lirc2_repeat() ?

> @@ -1576,6 +1577,8 @@ static int bpf_prog_attach(const union bpf_attr *attr)
>  	case BPF_SK_SKB_STREAM_PARSER:
>  	case BPF_SK_SKB_STREAM_VERDICT:
>  		return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_SKB, true);
> +	case BPF_LIRC_MODE2:
> +		return rc_dev_prog_attach(attr);
...
> +	case BPF_LIRC_MODE2:
> +		return rc_dev_prog_detach(attr);

and similar rename for internal function names that go into bpf core.

Please add accumulated acks when you respin.

Thanks

^ permalink raw reply

* Re: [PATCH net-next] ifb: fix packets checksum
From: David Miller @ 2018-05-25 20:43 UTC (permalink / raw)
  To: jmaxwell37
  Cc: dsahern, mschiffer, zhangshengju, ktkhai, netdev, linux-kernel,
	jmaxwell
In-Reply-To: <20180524213829.15208-1-jmaxwell37@gmail.com>

From: Jon Maxwell <jmaxwell37@gmail.com>
Date: Fri, 25 May 2018 07:38:29 +1000

> Fixup the checksum for CHECKSUM_COMPLETE when pulling skbs on RX path. 
> Otherwise we get splats when tc mirred is used to redirect packets to ifb.
> 
> Before fix:
> 
> nic: hw csum failure
> 
> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>

This definitely seems correct, but I am really surprised a bug like this has
lasted as long as it has.

So I'll let this sit for another day or two for review.

^ permalink raw reply

* Re: [pull request][net-next V2 0/6] Mellanox, mlx5e updates 2018-05-19
From: David Miller @ 2018-05-25 20:42 UTC (permalink / raw)
  To: saeedm; +Cc: netdev
In-Reply-To: <20180524213820.5910-1-saeedm@mellanox.com>

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Thu, 24 May 2018 14:38:14 -0700

> This is a mlx5e only pull request, for more information please see tag
> log below.
> 
> Please pull and let me know if there's any problem.
> 
> v1->v2:
> 1) patch #1 commit message: lldptool usage example and explanation on why 
>    dcbnl is selected over devlink interface as was agreed on mailing list.
> 
> 2) patches #1 and #6: Add total_size in dcbnl_buffer to report the total
>    available buffer size of the netdev, as suggested by John.
> 
> 3) Added Reviewed-by tag to all the patches.

Ok, thanks for the discussion and details in patch #1.

Pulled, thanks.

^ permalink raw reply

* Re: [PATCH 00/14] Modify action API for implementing lockless actions
From: Vlad Buslov @ 2018-05-25 20:39 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jiri Pirko, Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
	Alexei Starovoitov, Daniel Borkmann, Eric Dumazet, Kees Cook,
	LKML, NetFilter, coreteam, kliteyn
In-Reply-To: <CAM_iQpXMbtUWsaGBrpJH08dM4p9oVwpMSGrev1PThbP5d23sdA@mail.gmail.com>

On Thu 24 May 2018 at 23:34, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Mon, May 14, 2018 at 7:27 AM, Vlad Buslov <vladbu@mellanox.com> wrote:
>> Currently, all netlink protocol handlers for updating rules, actions and
>> qdiscs are protected with single global rtnl lock which removes any
>> possibility for parallelism. This patch set is a first step to remove
>> rtnl lock dependency from TC rules update path. It updates act API to
>> use atomic operations, rcu and spinlocks for fine-grained locking. It
>> also extend API with functions that are needed to update existing
>> actions for parallel execution.
>
> Can you give a summary here for what and how it is achieved?

Got it, will expand cover letter in V2 with summary.
>
> You said this is the first step, what do you want to achieve in this
> very first step? And how do you achieve it? Do you break the RTNL

But aren't this questions answered in paragraph you quoted?
What: Change act API to not rely on one-big-global-RTNL-lock and to use
more fine-grained synchronization methods to allow safe concurrent
execution.
How: Refactor act API code to use atomics, rcu and spinlocks, etc. for
protecting shared data structures, add new functions required to update
specific actions implementation for parallel execution. (step 2)

If you feel that this cover letter is too terse, I will add outline of
changes in V2.

> lock down to, for a quick example, a per-device lock? Or perhaps you
> completely remove it because of what reason?

I want to remove RTNL _dependency_ from act API data structures and
code. I probably should me more specific in this case:

Florian recently made a change that allows registering netlink protocol
handlers with flag RTNL_FLAG_DOIT_UNLOCKED. Handlers registered with
this flag are called without RTNL taken. My end goal is to have rule
update handlers(RTM_NEWTFILTER, RTM_DELTFILTER, etc.) to be registered
with UNLOCKED flag to allow parallel execution.

I do not intend to globally remove or break RTNL.

>
> I go through all the descriptions of your 14 patches (but not any code),
> I still have no clue how you successfully avoid RTNL. Please don't
> let me read into your code to understand that, there must be some
> high-level justification on how it works. Without it, I don't event want
> to read into the code.

On internal code review I've been asked not to duplicate info from
commit messages in cover letter, but I guess I can expand it with some
high level outline in V2.

>
> Thanks.

Thank you for your feedback!

^ permalink raw reply

* Re: [PATCH] 8139too: Remove unnecessary netif_napi_del()
From: David Miller @ 2018-05-25 20:37 UTC (permalink / raw)
  To: chenbo; +Cc: netdev, linux-kernel
In-Reply-To: <20180524194835.14700-1-chenbo@pdx.edu>

From: Bo Chen <chenbo@pdx.edu>
Date: Thu, 24 May 2018 12:48:35 -0700

> The call to free_netdev() in __rtl8139_cleanup_dev() clears the network device
> napi list, and explicit calls to netif_napi_del() are unnecessary.
> 
> Signed-off-by: Bo Chen <chenbo@pdx.edu>

Since this is just unnecessary work and not a bug, applied to net-next.

Thanks.

^ permalink raw reply

* Re: [PATCH net] ibmvnic: Fix partial success login retries
From: David Miller @ 2018-05-25 20:36 UTC (permalink / raw)
  To: tlfalcon; +Cc: netdev, nfont, jallen
In-Reply-To: <1527190673-11146-1-git-send-email-tlfalcon@linux.vnet.ibm.com>

From: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Date: Thu, 24 May 2018 14:37:53 -0500

> In its current state, the driver will handle backing device
> login in a loop for a certain number of retries while the
> device returns a partial success, indicating that the driver
> may need to try again using a smaller number of resources.
> 
> The variable it checks to continue retrying may change
> over the course of operations, resulting in reallocation
> of resources but exits without sending the login attempt.
> Guard against this by introducing a boolean variable that
> will retain the state indicating that the driver needs to
> reattempt login with backing device firmware.
> 
> Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>

Applied.

^ permalink raw reply

* Re: Poor TCP performance with XPS enabled after scrubbing skb
From: David Miller @ 2018-05-25 20:29 UTC (permalink / raw)
  To: fbl; +Cc: eric.dumazet, netdev, pabeni
In-Reply-To: <20180524191729.GA3770@plex.lan>

From: Flavio Leitner <fbl@sysclose.org>
Date: Thu, 24 May 2018 16:17:29 -0300

> veth originally called skb_orphan() on veth_xmit() most probably
> because there was no TX completion. Then the code got generalized to
> dev_forward_skb() and later on moved to skb_scrub_packet().
> 
> The issue is that we call skb_scrub_packet() on TX and RX paths and
> that is done while crossing netns.  It doesn't look correct to keep
> the ->sk because I suspect that iptables/selinux/bpf, or some code
> path that I am probably missing could expose/use the wrong ->sk, for
> example.
> 
> However, netdev_pick_tx() can't store the queue mapping without ->sk.
> 
> The hack in the first email relies on the headers (skb_tx_hash) to
> always selected the same TX queue, which solves the original problem
> but not the TCP small queues you mentioned.

Right, we can't allow a socket reference to escape over a netns
crossing.

However, that is where we get the queue mapping state.

We might need to put the sk based decision into the skb somehow in
order to satisfy these two incompatibel requirements.

^ permalink raw reply

* Re: [PATCH net-next] tcp: use data length instead of skb->len in tcp_probe
From: David Miller @ 2018-05-25 20:27 UTC (permalink / raw)
  To: songliubraving; +Cc: laoar.shao, netdev, linux-kernel
In-Reply-To: <630ED9AE-A932-4B6A-8236-D133CE3E0726@fb.com>

From: Song Liu <songliubraving@fb.com>
Date: Thu, 24 May 2018 17:44:46 +0000

> We should also rename __entry->length to __entry->data_len, so that whoever
> using this field will notice the change. 

Agreed.

^ permalink raw reply

* Re: [PATCH net-next 0/5] qed*: ethtool rx flow classification enhancements.
From: David Miller @ 2018-05-25 20:11 UTC (permalink / raw)
  To: manish.chopra; +Cc: netdev, ariel.elior, michal.kalderon
In-Reply-To: <20180524165453.11852-1-manish.chopra@cavium.com>

From: Manish Chopra <manish.chopra@cavium.com>
Date: Thu, 24 May 2018 09:54:48 -0700

> This series re-structures the driver's ethtool rx flow
> classification flow, following that it adds other flow
> profiles and rx flow classification enhancements
> via "ethtool -N/-U"
> 
> Please consider applying this to "net-next"

The code is definitely easier to read and understand, especially after
patch #1.

Series applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next] vrf: add CRC32c offload to device features
From: David Ahern @ 2018-05-25 20:06 UTC (permalink / raw)
  To: Davide Caratti, Vlad Yasevich, Marcelo Ricardo Leitner; +Cc: linux-sctp, netdev
In-Reply-To: <bb3aa69eaef613f033f8f52674740286ba67dc31.1527175921.git.dcaratti@redhat.com>

On 5/24/18 9:49 AM, Davide Caratti wrote:
> SCTP sockets originated in a VRF can improve their performance if CRC32c
> computation is delegated to underlying devices: update device features,
> setting NETIF_F_SCTP_CRC. Iterating the following command in the topology
> proposed with [1],
> 
>  # ip vrf exec vrf-h2 netperf -H 192.0.2.1 -t SCTP_STREAM -- -m 10K
> 
> the measured throughput in Mbit/s improved from 2395 ± 1% to 2720 ± 1%.
> 
> [1] https://www.spinics.net/lists/netdev/msg486007.html
> 
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
> ---
>  drivers/net/vrf.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Acked-by: David Ahern <dsa@cumulusnetworks.com>

^ permalink raw reply

* Re: pull-request: bpf 2018-05-24
From: Daniel Borkmann @ 2018-05-25 19:54 UTC (permalink / raw)
  To: David Miller; +Cc: ast, netdev
In-Reply-To: <20180525.155153.205200316874660386.davem@davemloft.net>

On 05/25/2018 09:51 PM, David Miller wrote:
> From: Daniel Borkmann <daniel@iogearbox.net>
> Date: Thu, 24 May 2018 18:38:02 +0200
> 
>> The following pull-request contains BPF updates for your *net* tree.
>>
>> The main changes are:
>>
>> 1) Fix a bug in the original fix to prevent out of bounds speculation when
>>    multiple tail call maps from different branches or calls end up at the
>>    same tail call helper invocation, from Daniel.
>>
>> 2) Two selftest fixes, one in reuseport_bpf_numa where test is skipped in
>>    case of missing numa support and another one to update kernel config to
>>    properly support xdp_meta.sh test, from Anders.
>>
>> Please consider pulling these changes from:
>>
>>   git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
> 
> Pulled, thanks Danile.
> 
>> Would be great if you have a chance to merge net into net-next after that.
>>
>> The verifier fix would be needed later as a dependency in bpf-next for
>> upcomig work there. When you do the merge there's a trivial conflict on
>> BPF side with 849fa50662fb ("bpf/verifier: refine retval R0 state for
>> bpf_get_stack helper"): Resolution is to keep both functions, the
>> do_refine_retval_range() and record_func_map().
> 
> I'll try to push it along as soon as I can.
> 
> Thanks for the merge conflict heads-up.

Awesome, thanks a lot David!

^ permalink raw reply

* Re: pull-request: bpf 2018-05-24
From: David Miller @ 2018-05-25 19:51 UTC (permalink / raw)
  To: daniel; +Cc: ast, netdev
In-Reply-To: <20180524163802.2938-1-daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Thu, 24 May 2018 18:38:02 +0200

> The following pull-request contains BPF updates for your *net* tree.
> 
> The main changes are:
> 
> 1) Fix a bug in the original fix to prevent out of bounds speculation when
>    multiple tail call maps from different branches or calls end up at the
>    same tail call helper invocation, from Daniel.
> 
> 2) Two selftest fixes, one in reuseport_bpf_numa where test is skipped in
>    case of missing numa support and another one to update kernel config to
>    properly support xdp_meta.sh test, from Anders.
> 
> Please consider pulling these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Pulled, thanks Danile.

> Would be great if you have a chance to merge net into net-next after that.
> 
> The verifier fix would be needed later as a dependency in bpf-next for
> upcomig work there. When you do the merge there's a trivial conflict on
> BPF side with 849fa50662fb ("bpf/verifier: refine retval R0 state for
> bpf_get_stack helper"): Resolution is to keep both functions, the
> do_refine_retval_range() and record_func_map().

I'll try to push it along as soon as I can.

Thanks for the merge conflict heads-up.

^ permalink raw reply

* [PATCH 2/2] batman-adv: Drop "experimental" from BATMAN_V Kconfig
From: Sven Eckelmann @ 2018-05-25 19:48 UTC (permalink / raw)
  To: davem
  Cc: netdev, b.a.t.m.a.n, a, sw, mareklindner, joe, sergei.shtylyov,
	Sven Eckelmann
In-Reply-To: <20180525194837.12589-1-sven@narfation.org>

The Kconfig option BATMAN_ADV_BATMAN_V is now enabled by default when the
BATMAN_ADV is enabled. A feature which is enabled by default for a module
should not be considered experimental.

Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
---
 net/batman-adv/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/batman-adv/Kconfig b/net/batman-adv/Kconfig
index 41bb67d70c83..da0b7aa98be9 100644
--- a/net/batman-adv/Kconfig
+++ b/net/batman-adv/Kconfig
@@ -32,7 +32,7 @@ config BATMAN_ADV
           tools.
 
 config BATMAN_ADV_BATMAN_V
-	bool "B.A.T.M.A.N. V protocol (experimental)"
+	bool "B.A.T.M.A.N. V protocol"
 	depends on BATMAN_ADV && !(CFG80211=m && BATMAN_ADV=y)
 	default y
 	help
-- 
2.17.0

^ permalink raw reply related

* [PATCH 1/2] batman-adv: Remove "default n" in Kconfig
From: Sven Eckelmann @ 2018-05-25 19:48 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: mareklindner-rVWd3aGhH2z5bpWLKbzFeg,
	sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r, a,
	joe-6d6DIl74uiNBDgjK7y7TUQ

The "default n" is the default value for any bool or tristate Kconfig
setting. It is therefore not necessary to add it to the an config entry.

Reported-by: Sergei Shtylyov <sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8@public.gmane.org>
Signed-off-by: Sven Eckelmann <sven-KaDOiPu9UxWEi8DpZVb4nw@public.gmane.org>
---
 net/batman-adv/Kconfig | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/net/batman-adv/Kconfig b/net/batman-adv/Kconfig
index de8034d80623..41bb67d70c83 100644
--- a/net/batman-adv/Kconfig
+++ b/net/batman-adv/Kconfig
@@ -24,7 +24,6 @@ config BATMAN_ADV
 	depends on NET
 	select CRC16
 	select LIBCRC32C
-        default n
 	help
           B.A.T.M.A.N. (better approach to mobile ad-hoc networking) is
           a routing protocol for multi-hop ad-hoc mesh networks. The
@@ -60,7 +59,6 @@ config BATMAN_ADV_BLA
 config BATMAN_ADV_DAT
 	bool "Distributed ARP Table"
 	depends on BATMAN_ADV && INET
-	default n
 	help
 	  This option enables DAT (Distributed ARP Table), a DHT based
 	  mechanism that increases ARP reliability on sparse wireless
@@ -70,7 +68,6 @@ config BATMAN_ADV_DAT
 config BATMAN_ADV_NC
 	bool "Network Coding"
 	depends on BATMAN_ADV
-	default n
 	help
 	  This option enables network coding, a mechanism that aims to
 	  increase the overall network throughput by fusing multiple
@@ -84,7 +81,6 @@ config BATMAN_ADV_NC
 config BATMAN_ADV_MCAST
 	bool "Multicast optimisation"
 	depends on BATMAN_ADV && INET && !(BRIDGE=m && BATMAN_ADV=y)
-	default n
 	help
 	  This option enables the multicast optimisation which aims to
 	  reduce the air overhead while improving the reliability of
@@ -94,7 +90,6 @@ config BATMAN_ADV_DEBUGFS
 	bool "batman-adv debugfs entries"
 	depends on BATMAN_ADV
 	depends on DEBUG_FS
-	default n
 	help
 	  Enable this to export routing related debug tables via debugfs.
 	  The information for each soft-interface and used hard-interface can be
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH net] net: sched: check netif_xmit_frozen_or_stopped() in sch_direct_xmit()
From: Song Liu @ 2018-05-25 19:46 UTC (permalink / raw)
  To: Song Liu, ast; +Cc: netdev, kernel-team, John Fastabend, David S . Miller
In-Reply-To: <20180525181144.224395-1-songliubraving@fb.com>

On Fri, May 25, 2018 at 11:11 AM, Song Liu <songliubraving@fb.com> wrote:
> Summary:
>
> At the end of sch_direct_xmit(), we are in the else path of
> !dev_xmit_complete(ret), which means ret == NETDEV_TX_OK. The following
> condition will always fail and netif_xmit_frozen_or_stopped() is not
> checked at all.
>
>     if (ret && netif_xmit_frozen_or_stopped(txq))
>          return false;
>
> In this patch, this condition is fixed as:
>
>     if (netif_xmit_frozen_or_stopped(txq))
>          return false;
>
> and further simplifies the code as:
>
>     return !netif_xmit_frozen_or_stopped(txq);
>
> Fixes: 29b86cdac00a ("net: sched: remove remaining uses for qdisc_qlen in xmit path")
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  net/sched/sch_generic.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 39c144b..8261d48 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -346,10 +346,7 @@ bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
>                 return false;
>         }
>
> -       if (ret && netif_xmit_frozen_or_stopped(txq))
> -               return false;
> -
> -       return true;
> +       return !netif_xmit_frozen_or_stopped(txq);
>  }
>
>  /*
> --
> 2.9.5
>

Alexei and I discussed about this offline. We would like to share our
discussion here to
clarify the motivation.

Before 29b86cdac00a, ret in condition "if (ret &&
netif_xmit_frozen_or_stopped()" is not
the value from dev_hard_start_xmit(), because ret is overwritten by
either qdisc_qlen()
or dev_requeue_skb(). Therefore, 29b86cdac00a changed the behavior of
this condition.

For ret from dev_hard_start_xmit(), I dig into the function and found
it is from return value
of ndo_start_xmit(). Per netdevice.h, ndo_start_xmit() should only
return NETDEV_TX_OK
or NETDEV_TX_BUSY. I survey many drivers, and they all follow the rule. The only
exception is vlan.

Given ret could only be NETDEV_TX_OK or NETDEV_TX_BUSY (ignore vlan for now),
if it fails condition "if (!dev_xmit_complete(ret))", ret must be
NETDEV_TX_OK == 0. So
netif_xmit_frozen_or_stopped() will always be bypassed.

It is probably OK to ignore netif_xmit_frozen_or_stopped(), and return true from
sch_direct_xmit(), as I didn't see that break any functionality. But
it is more like "correct
by accident" to me. This is the motivation of my original patch.

Alexei pointed out that, the following condition is more like original logic:

      if (qdisc_qlen(q) && netif_xmit_frozen_or_stopped(txq))
            return false;

However, I think John would like to remove qdisc_qlen() from the tx
path. I didn't see
any issue without the extra qdisc_qlen() check, so the patch is
probably good AS-IS.

Please share your comments and feedback on this.

Thanks,
Song

^ permalink raw reply

* Re: [PATCH bpf-next v2 0/3] bpf: add boot parameters for sysctl knobs
From: Alexei Starovoitov @ 2018-05-25 19:44 UTC (permalink / raw)
  To: Eugene Syromiatnikov
  Cc: Jesper Dangaard Brouer, netdev, linux-kernel, linux-doc,
	Kees Cook, Kai-Heng Feng, Daniel Borkmann, Alexei Starovoitov,
	Jonathan Corbet, Jiri Olsa
In-Reply-To: <20180525165009.GA20952@asgard.redhat.com>

On Fri, May 25, 2018 at 06:50:09PM +0200, Eugene Syromiatnikov wrote:
> On Thu, May 24, 2018 at 04:34:51PM -0700, Alexei Starovoitov wrote:
> > On Thu, May 24, 2018 at 09:41:08AM +0200, Jesper Dangaard Brouer wrote:
> > > On Wed, 23 May 2018 15:02:45 -0700
> > > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > > 
> > > > On Wed, May 23, 2018 at 02:18:19PM +0200, Eugene Syromiatnikov wrote:
> > > > > Some BPF sysctl knobs affect the loading of BPF programs, and during
> > > > > system boot/init stages these sysctls are not yet configured.
> > > > > A concrete example is systemd, that has implemented loading of BPF
> > > > > programs.
> > > > > 
> > > > > Thus, to allow controlling these setting at early boot, this patch set
> > > > > adds the ability to change the default setting of these sysctl knobs
> > > > > as well as option to override them via a boot-time kernel parameter
> > > > > (in order to avoid rebuilding kernel each time a need of changing these
> > > > > defaults arises).
> > > > > 
> > > > > The sysctl knobs in question are kernel.unprivileged_bpf_disable,
> > > > > net.core.bpf_jit_harden, and net.core.bpf_jit_kallsyms.  
> > > > 
> > > > - systemd is root. today it only uses cgroup-bpf progs which require root,
> > > >   so disabling unpriv during boot time makes no difference to systemd.
> > > >   what is the actual reason to present time?
> systemd also runs a lot of code, some of which is unprivileged.

systemd processes sysctl configs first. It's essential for system
security to do so. If you have concerns in how systemd does that
please bring it up with systemd folks.

> > > > - say in the future systemd wants to use so_reuseport+bpf for faster
> > > >   networking. With unpriv disable during boot, it will force systemd
> > > >   to do such networking from root, which will lower its security barrier.
> No, it will force systemd not to use SO_REUSEPORT BPF.

sorry this argument makes no sense to me.

> > > > - bpf_jit_kallsyms sysctl has immediate effect on loaded programs.
> > > >   Flipping it during the boot or right after or any time after
> > > >   is the same thing. Why add such boot flag then?
> Well, that one was for completeness.

Should we convert all sysctls to bootparams for 'completeness' ?

> > > > - jit_harden can be turned on by systemd. so turning it during the boot
> > > >   will make systemd progs to be constant blinded.
> > > >   Constant blinding protects kernel from unprivileged JIT spraying.
> > > >   Are you worried that systemd will attack the kernel with JIT spraying?
> I'm worried that systemd can be exploited for a JIT spraying attack.

I'm afraid we're not on the same page with definition of 'JIT spraying attack'.

> Another thing I'm concerned with is that the generated code is different,
> which introduces additional complication during debugging.

specifically what kind of complication?

^ permalink raw reply

* Re: [PATCH] net: netsec: reduce DMA mask to 40 bits
From: Robin Murphy @ 2018-05-25 19:37 UTC (permalink / raw)
  To: Jassi Brar
  Cc: Ard Biesheuvel, netdev, David S. Miller, Masahisa Kojima,
	Ilias Apalodimas, nd
In-Reply-To: <CAJe_ZhfeEuRUTncyueb2ZRJq1uKrV44eiYVvfNrFufXdbDFRAw@mail.gmail.com>

On Sat, 26 May 2018 00:33:05 +0530
Jassi Brar <jaswinder.singh@linaro.org> wrote:

> On 25 May 2018 at 18:20, Ard Biesheuvel <ard.biesheuvel@linaro.org>
> wrote:
> > The netsec network controller IP can drive 64 address bits for DMA,
> > and the DMA mask is set accordingly in the driver. However, the
> > SynQuacer SoC, which is the only silicon incorporating this IP at
> > the moment, integrates this IP in a manner that leaves address bits
> > [63:40] unconnected.
> >
> > Up until now, this has not resulted in any problems, given that the
> > DDR controller doesn't decode those bits to begin with. However,
> > recent firmware updates for platforms incorporating this SoC allow
> > the IOMMU to be enabled, which does decode address bits [47:40],
> > and allocates top down from the IOVA space, producing DMA addresses
> > that have bits set that have been left unconnected.
> >
> > Both the DT and ACPI (IORT) descriptions of the platform take this
> > into account, and only describe a DMA address space of 40 bits
> > (using either dma-ranges DT properties, or DMA address limits in
> > IORT named component nodes). However, even though our IOMMU and bus
> > layers may take such limitations into account by setting a narrower
> > DMA mask when creating the platform device, the netsec probe()
> > entrypoint follows the common practice of setting the DMA mask
> > uncondionally, according to the capabilities of the IP block itself
> > rather than to its integration into the chip.
> >
> > It is currently unclear what the correct fix is here. We could hack
> > around it by only setting the DMA mask if it deviates from its
> > default value of DMA_BIT_MASK(32). However, this makes it
> > impossible for the bus layer to use DMA_BIT_MASK(32) as the bus
> > limit, and so it appears that a more comprehensive approach is
> > required to take DMA limits imposed by the SoC as a whole into
> > account.
> >
> > In the mean time, let's limit the DMA mask to 40 bits. Given that
> > there is currently only one SoC that incorporates this IP, this is
> > a reasonable approach that can be backported to -stable and buys us
> > some time to come up with a proper fix going forward.
> >  
> I am sure you already thought about it, but why not let the platform
> specify the bit mask for the driver (via some "bus-width" property),
> to override the default 64 bit mask?

Because lack of a property to describe the integration is not the
problem. There are already at least two ways: the general DT/IORT
properties for describing DMA addressing - which it would be a bit
ungainly for a driver to parse for this reason, but not impossible -
and inferring it from a SoC-specific compatible - which is more
appropriate, and what we happen to be able to do here.

Robin.

^ permalink raw reply

* Re: [PATCH net-next] vrf: add CRC32c offload to device features
From: David Miller @ 2018-05-25 19:36 UTC (permalink / raw)
  To: dcaratti; +Cc: dsa, vyasevich, marcelo.leitner, linux-sctp, netdev
In-Reply-To: <bb3aa69eaef613f033f8f52674740286ba67dc31.1527175921.git.dcaratti@redhat.com>

From: Davide Caratti <dcaratti@redhat.com>
Date: Thu, 24 May 2018 17:49:35 +0200

> SCTP sockets originated in a VRF can improve their performance if CRC32c
> computation is delegated to underlying devices: update device features,
> setting NETIF_F_SCTP_CRC. Iterating the following command in the topology
> proposed with [1],
> 
>  # ip vrf exec vrf-h2 netperf -H 192.0.2.1 -t SCTP_STREAM -- -m 10K
> 
> the measured throughput in Mbit/s improved from 2395 ± 1% to 2720 ± 1%.
> 
> [1] https://www.spinics.net/lists/netdev/msg486007.html
> 
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>

David A., please review.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox