Netdev List
 help / color / mirror / Atom feed
* Re: [RFC v4 3/5] virtio_ring: add packed ring support
From: Jason Wang @ 2018-05-21  2:30 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann
In-Reply-To: <20180519022938.GA18888@debian>



On 2018年05月19日 10:29, Tiwei Bie wrote:
>> I don't hope so.
>>
>>> I agreed driver should track the DMA addrs or some
>>> other necessary things from the very beginning. And
>>> I also repeated the spec to emphasize that it does
>>> make sense. And I'd like to do that.
>>>
>>> What I was saying is that, to support OOO, we may
>>> need to manage these context (which saves DMA addrs
>>> etc) via a list which is similar to the desc list
>>> maintained via `next` in split ring instead of an
>>> array whose elements always can be indexed directly.
>> My point is these context is a must (not only for OOO).
> Yeah, and I have the exactly same point after you
> pointed that I shouldn't get the addrs from descs.
> I do think it makes sense. I'll do it in the next
> version. I don't have any doubt about it. All my
> questions are about the OOO, instead of whether we
> should save context or not. It just seems that you
> thought I don't want to do it, and were trying to
> convince me that I should do it.

Right, but looks like I was wrong :)

>
>>> The desc ring in split ring is an array, but its
>>> free entries are managed as list via next. I was
>>> just wondering, do we want to manage such a list
>>> because of OOO. It's just a very simple question
>>> that I want to hear your opinion... (It doesn't
>>> means anything, e.g. It doesn't mean I don't want
>>> to support OOO. It's just a simple question...)
>> So the question is yes. But I admit I don't have better idea other than what
>> you propose here (something like split ring which is a little bit sad).
>> Maybe Michael had.
> Yeah, that's why I asked this question. It will
> make the packed ring a bit similar to split ring
> at least in the driver part. So I want to draw
> your attention on this to make sure that we're
> on the same page.

Yes. I think we are.

Thanks

> Best regards,
> Tiwei Bie
>

^ permalink raw reply

* Re: [PATCH net-next] sctp: add support for SCTP_REUSE_PORT sockopt
From: Marcelo Ricardo Leitner @ 2018-05-21  1:54 UTC (permalink / raw)
  To: Neil Horman; +Cc: Xin Long, network dev, linux-sctp, davem
In-Reply-To: <20180521005059.GA15233@neilslaptop.think-freely.org>

On Sun, May 20, 2018 at 08:50:59PM -0400, Neil Horman wrote:
> On Sat, May 19, 2018 at 03:44:40PM +0800, Xin Long wrote:
> > This feature is actually already supported by sk->sk_reuse which can be
> > set by SO_REUSEADDR. But it's not working exactly as RFC6458 demands in
> > section 8.1.27, like:
> > 
> >   - This option only supports one-to-one style SCTP sockets
> >   - This socket option must not be used after calling bind()
> >     or sctp_bindx().
> > 
> > Besides, SCTP_REUSE_PORT sockopt should be provided for user's programs.
> > Otherwise, the programs with SCTP_REUSE_PORT from other systems will not
> > work in linux.
> > 
> > This patch reuses sk->sk_reuse and works pretty much as SO_REUSEADDR,
> > just with some extra setup limitations that are neeeded when it is being
> > enabled.
> > 
> > "It should be noted that the behavior of the socket-level socket option
> > to reuse ports and/or addresses for SCTP sockets is unspecified", so it
> > leaves SO_REUSEADDR as is for the compatibility.
> > 
> > Signed-off-by: Xin Long <lucien.xin@gmail.com>
> > ---
> >  include/uapi/linux/sctp.h |  1 +
> >  net/sctp/socket.c         | 48 +++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 49 insertions(+)
> > 
> A few things:
> 
> 1) I agree with Tom, this feature is a complete duplication of the SK_REUSEPORT
> socket option.  I understand that this is an implementation of the option in the
> RFC, but its definately a duplication of a feature, which makes several things
> really messy.
> 
> 2) The overloading of the sk_reuse opeion is a bad idea, for several reasons.
> Chief among them is the behavioral interference between this patch and the
> SO_REUSEADDR socket level option, that also sets this feature.  If you set
> sk_reuse via SO_REUSEADDR, you will set the SCTP port reuse feature regardless
> of the bind or 1:1/1:m state of the socket.  Vice versa, if you set this socket
> option via the SCTP_PORT_REUSE option you will inadvertently turn on address
> reuse for the socket.  We can't do that.

Given your comments, going a bit further here, one other big
implication is that a port would never be able to be considered to
fully meet SCTP standards regarding reuse because a rogue application
may always abuse of the socket level opt to gain access to the port.

IOW, the patch allows the application to use such restrictions against
itself and nothing else, which undermines the patch idea.

I lack the knowledge on why the SCTP option was proposed in the RFC. I
guess they had a good reason to add the restriction on 1:1/1:m style.
Does the usage of the current imply in any risk to SCTP sockets? If
yes, that would give some grounds for going forward with the SCTP
option.

> 
> Its a bit frustrating, since SO_REUSEPORT is widely available on multiple
> operating systems, but isn't standard (AFAIK).  I would say however, given the
> prevalence of the socket level option, we should likely advocate for the removal
> of the sctp specific option, or at the least implement and document it as being

Is it possible, to remove/deprecate an option once it is published on a RFC?

> an alias for SO_REUSEPORT
> 
> 
> As this stands however, its a NACK from me.
> 
> Neil
> 

^ permalink raw reply

* RE: [PATCH 1/2] net: fec: ptp: Switch to SPDX identifier
From: Andy Duan @ 2018-05-21  1:32 UTC (permalink / raw)
  To: Fabio Estevam, davem@davemloft.net; +Cc: netdev@vger.kernel.org, Fabio Estevam
In-Reply-To: <1526835319-17408-1-git-send-email-festevam@gmail.com>

From: Fabio Estevam <festevam@gmail.com> Sent: 2018年5月21日 0:55
> Adopt the SPDX license identifier headers to ease license compliance
> management.
> 
> Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>

Acked-by: Fugang Duan <fugang.duan@nxp.com>

> ---
>  drivers/net/ethernet/freescale/fec_ptp.c | 14 +-------------
>  1 file changed, 1 insertion(+), 13 deletions(-)
> 
> diff --git a/drivers/net/ethernet/freescale/fec_ptp.c
> b/drivers/net/ethernet/freescale/fec_ptp.c
> index f814397..43d9732 100644
> --- a/drivers/net/ethernet/freescale/fec_ptp.c
> +++ b/drivers/net/ethernet/freescale/fec_ptp.c
> @@ -1,20 +1,8 @@
> +// SPDX-License-Identifier: GPL-2.0
>  /*
>   * Fast Ethernet Controller (ENET) PTP driver for MX6x.
>   *
>   * Copyright (C) 2012 Freescale Semiconductor, Inc.
> - *
> - * This program is free software; you can redistribute it and/or modify it
> - * under the terms and conditions of the GNU General Public License,
> - * version 2, as published by the Free Software Foundation.
> - *
> - * This program is distributed in the hope it will be useful, but WITHOUT
> - * ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY or
> - * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> License for
> - * more details.
> - *
> - * You should have received a copy of the GNU General Public License
> along with
> - * this program; if not, write to the Free Software Foundation, Inc.,
> - * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
>   */
> 
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> --
> 2.7.4


^ permalink raw reply

* RE: [PATCH 2/2] net: fec: Add a SPDX identifier
From: Andy Duan @ 2018-05-21  1:32 UTC (permalink / raw)
  To: Fabio Estevam, davem@davemloft.net; +Cc: netdev@vger.kernel.org, Fabio Estevam
In-Reply-To: <1526835319-17408-2-git-send-email-festevam@gmail.com>

From: Fabio Estevam <festevam@gmail.com>  Sent: 2018年5月21日 0:55
> From: Fabio Estevam <fabio.estevam@nxp.com>
> 
> Currently there is no license information in the header of this file.
> 
> The MODULE_LICENSE field contains ("GPL"), which means GNU Public
> License v2 or later, so add a corresponding SPDX license identifier.
> 
> Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>

Acked-by: Fugang Duan <fugang.duan@nxp.com>

> ---
>  drivers/net/ethernet/freescale/fec_main.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/freescale/fec_main.c
> b/drivers/net/ethernet/freescale/fec_main.c
> index f3e43db..1625b74 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -1,3 +1,4 @@
> +// SPDX-License-Identifier: GPL-2.0+
>  /*
>   * Fast Ethernet Controller (FEC) driver for Motorola MPC8xx.
>   * Copyright (c) 1997 Dan Malek (dmalek@jlc.net)
> --
> 2.7.4


^ permalink raw reply

* Re: [PATCH net-next] sctp: add support for SCTP_REUSE_PORT sockopt
From: Neil Horman @ 2018-05-21  0:50 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, davem
In-Reply-To: <be8e672439406e6c8b838321d63aa109b9620f4c.1526715880.git.lucien.xin@gmail.com>

On Sat, May 19, 2018 at 03:44:40PM +0800, Xin Long wrote:
> This feature is actually already supported by sk->sk_reuse which can be
> set by SO_REUSEADDR. But it's not working exactly as RFC6458 demands in
> section 8.1.27, like:
> 
>   - This option only supports one-to-one style SCTP sockets
>   - This socket option must not be used after calling bind()
>     or sctp_bindx().
> 
> Besides, SCTP_REUSE_PORT sockopt should be provided for user's programs.
> Otherwise, the programs with SCTP_REUSE_PORT from other systems will not
> work in linux.
> 
> This patch reuses sk->sk_reuse and works pretty much as SO_REUSEADDR,
> just with some extra setup limitations that are neeeded when it is being
> enabled.
> 
> "It should be noted that the behavior of the socket-level socket option
> to reuse ports and/or addresses for SCTP sockets is unspecified", so it
> leaves SO_REUSEADDR as is for the compatibility.
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  include/uapi/linux/sctp.h |  1 +
>  net/sctp/socket.c         | 48 +++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 49 insertions(+)
> 
A few things:

1) I agree with Tom, this feature is a complete duplication of the SK_REUSEPORT
socket option.  I understand that this is an implementation of the option in the
RFC, but its definately a duplication of a feature, which makes several things
really messy.

2) The overloading of the sk_reuse opeion is a bad idea, for several reasons.
Chief among them is the behavioral interference between this patch and the
SO_REUSEADDR socket level option, that also sets this feature.  If you set
sk_reuse via SO_REUSEADDR, you will set the SCTP port reuse feature regardless
of the bind or 1:1/1:m state of the socket.  Vice versa, if you set this socket
option via the SCTP_PORT_REUSE option you will inadvertently turn on address
reuse for the socket.  We can't do that.

Its a bit frustrating, since SO_REUSEPORT is widely available on multiple
operating systems, but isn't standard (AFAIK).  I would say however, given the
prevalence of the socket level option, we should likely advocate for the removal
of the sctp specific option, or at the least implement and document it as being
an alias for SO_REUSEPORT


As this stands however, its a NACK from me.

Neil

^ permalink raw reply

* Re: [PATCH v2] isdn: eicon: fix a missing-check bug
From: kbuild test robot @ 2018-05-21  0:01 UTC (permalink / raw)
  To: Wenwen Wang
  Cc: kbuild-all, Wenwen Wang, Kangjie Lu, Armin Schindler,
	Karsten Keil, open list:ISDN SUBSYSTEM, open list
In-Reply-To: <1526679228-1596-1-git-send-email-wang6495@umn.edu>

[-- Attachment #1: Type: text/plain, Size: 3744 bytes --]

Hi Wenwen,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17-rc5 next-20180517]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Wenwen-Wang/isdn-eicon-fix-a-missing-check-bug/20180521-034229
config: i386-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   In file included from drivers/isdn/hardware/eicon/divasfunc.c:18:0:
>> drivers/isdn/hardware/eicon/diva.h:23:18: error: unknown type name 'diva_xdi_um_cfg_cmd_t'; did you mean 'diva_xdi_capi_cfg_t'?
         int length, diva_xdi_um_cfg_cmd_t *msg,
                     ^~~~~~~~~~~~~~~~~~~~~
                     diva_xdi_capi_cfg_t
   drivers/isdn/hardware/eicon/diva.h:27:20: error: unknown type name 'diva_xdi_um_cfg_cmd_t'; did you mean 'diva_xdi_capi_cfg_t'?
           int length, diva_xdi_um_cfg_cmd_t *msg,
                       ^~~~~~~~~~~~~~~~~~~~~
                       diva_xdi_capi_cfg_t
--
   In file included from drivers/isdn/hardware/eicon/divasmain.c:30:0:
>> drivers/isdn/hardware/eicon/diva.h:23:18: error: unknown type name 'diva_xdi_um_cfg_cmd_t'
         int length, diva_xdi_um_cfg_cmd_t *msg,
                     ^~~~~~~~~~~~~~~~~~~~~
   drivers/isdn/hardware/eicon/diva.h:27:20: error: unknown type name 'diva_xdi_um_cfg_cmd_t'
           int length, diva_xdi_um_cfg_cmd_t *msg,
                       ^~~~~~~~~~~~~~~~~~~~~
   drivers/isdn/hardware/eicon/divasmain.c: In function 'divas_write':
>> drivers/isdn/hardware/eicon/divasmain.c:598:24: error: implicit declaration of function 'diva_xdi_open_adapter'; did you mean 'diva_xdi_close_adapter'? [-Werror=implicit-function-declaration]
      file->private_data = diva_xdi_open_adapter(file, buf,
                           ^~~~~~~~~~~~~~~~~~~~~
                           diva_xdi_close_adapter
>> drivers/isdn/hardware/eicon/divasmain.c:598:22: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
      file->private_data = diva_xdi_open_adapter(file, buf,
                         ^
>> drivers/isdn/hardware/eicon/divasmain.c:603:9: error: implicit declaration of function 'diva_xdi_write'; did you mean 'divas_write'? [-Werror=implicit-function-declaration]
      ret = diva_xdi_write(file->private_data, file,
            ^~~~~~~~~~~~~~
            divas_write
   drivers/isdn/hardware/eicon/divasmain.c: In function 'divas_read':
   drivers/isdn/hardware/eicon/divasmain.c:632:22: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
      file->private_data = diva_xdi_open_adapter(file, buf,
                         ^
   cc1: some warnings being treated as errors

vim +23 drivers/isdn/hardware/eicon/diva.h

    12	
    13	typedef int (*divas_xdi_copy_to_user_fn_t) (void *os_handle, void __user *dst,
    14						    const void *src, int length);
    15	
    16	typedef int (*divas_xdi_copy_from_user_fn_t) (void *os_handle, void *dst,
    17						      const void __user *src, int length);
    18	
    19	int diva_xdi_read(void *adapter, void *os_handle, void __user *dst,
    20			  int max_length, divas_xdi_copy_to_user_fn_t cp_fn);
    21	
    22	int diva_xdi_write(void *adapter, void *os_handle, const void __user *src,
  > 23			   int length, diva_xdi_um_cfg_cmd_t *msg,
    24			   divas_xdi_copy_from_user_fn_t cp_fn);
    25	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 62952 bytes --]

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2018-05-20 23:30 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


Sorry, no drug fuelled pull request this time.  As for the commits
themselves, I can't say for sure.... :-)

1) Fix refcounting bug for connections in on-packet scheduling
   mode of IPVS, from Julian Anastasov.

2) Set network header properly in AF_PACKET's packet_snd, from
   Willem de Bruijn.

3) Fix regressions in 3c59x by converting to generic DMA API.
   It was relying upon the hack that the PCI DMA interfaces
   would accept NULL for EISA devices.  From Christoph
   Hellwig.

4) Remove RDMA devices before unregistering netdev in QEDE driver,
   from Michal Kalderon.

5) Use after free in TUN driver ptr_ring usage, from Jason Wang.

6) Properly check for missing netlink attributes in SMC_PNETID
   requests, from Eric Biggers.

7) Set DMA mask before performaing any DMA operations in vmxnet3
   driver, from Regis Duchesne.

8) Fix mlx5 build with SMP=n, from Saeed Mahameed.

9) Classifier fixes in bcm_sf2 driver from Florian Fainelli.

10) Tuntap use after free during release, from Jason Wang.

11) Don't use stack memory in scatterlists in tls code, from
    Matt Mullins.

12) Not fully initialized flow key object in ipv4 routing code,
    from David Ahern.

13) Various packet headroom bug fixes in ip6_gre driver, from Petr
    Machata.

14) Remove queues from XPS maps using correct index, from Amritha
    Nambiar.

15) Fix use after free in sock_diag, from Eric Dumazet.

Please pull, thanks a lot!

The following changes since commit 4bc871984f7cb5b2dec3ae64b570cb02f9ce2227:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2018-05-11 14:14:46 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to b80d0b93b991e551a32157e0d9d38fc5bc9348a7:

  net: ip6_gre: fix tunnel metadata device sharing. (2018-05-19 23:32:12 -0400)

----------------------------------------------------------------
Amritha Nambiar (1):
      net: Fix a bug in removing queues from XPS map

Christoph Hellwig (1):
      3c59x: convert to generic DMA API

Colin Ian King (1):
      netfilter: nf_tables: fix memory leak on error exit return

Daniel Borkmann (1):
      bpf: fix truncated jump targets on heavy expansions

David Ahern (1):
      net/ipv4: Initialize proto and ports in flow struct

David S. Miller (7):
      Merge git://git.kernel.org/.../pablo/nf
      Merge git://git.kernel.org/.../bpf/bpf
      Merge branch 'dsa-bcm_sf2-CFP-fixes'
      Merge branch 'qed-LL2-fixes'
      Merge branch 'ibmvnic-Fix-bugs-and-memory-leaks'
      Merge branch 'ip6_gre-Fixes-in-headroom-handling'
      Merge git://git.kernel.org/.../bpf/bpf

Davide Caratti (1):
      net/sched: fix refcnt leak in the error path of tcf_vlan_init()

Eric Biggers (1):
      net/smc: check for missing nlattrs in SMC_PNETID messages

Eric Dumazet (2):
      tcp: purge write queue in tcp_connect_init()
      sock_diag: fix use-after-free read in __sk_free

Florian Fainelli (4):
      net: dsa: bcm_sf2: Fix RX_CLS_LOC_ANY overwrite for last rule
      net: dsa: bcm_sf2: Fix IPv6 rules and chain ID
      net: dsa: bcm_sf2: Fix IPv6 rule half deletion
      net: dsa: Do not register devlink for unused ports

Florian Westphal (9):
      netfilter: x_tables: check name length in find_match/target, too
      netfilter: nf_tables: skip synchronize_rcu if transaction log is empty
      netfilter: nf_tables: nft_compat: fix refcount leak on xt module
      netfilter: core: add missing __rcu annotation
      netfilter: prefer nla_strlcpy for dealing with NLA_STRING attributes
      netfilter: x_tables: add module alias for icmp matches
      netfilter: nf_tables: don't assume chain stats are set when jumplabel is set
      netfilter: nft_compat: prepare for indirect info storage
      netfilter: nft_compat: fix handling of large matchinfo size

Geert Uytterhoeven (2):
      net: 8390: ne: Fix accidentally removed RBTX4927 support
      sh_eth: Change platform check to CONFIG_ARCH_RENESAS

Jakub Kicinski (2):
      nfp: bpf: allow zero-length capabilities
      tools: bpf: handle NULL return in bpf_prog_load_xattr()

Jason Wang (2):
      tun: fix use after free for ptr_ring
      tuntap: fix use after free during release

Jesper Dangaard Brouer (1):
      selftests/bpf: check return value of fopen in test_verifier.c

John Fastabend (2):
      bpf: sockmap update rollback on error can incorrectly dec prog refcnt
      bpf: parse and verdict prog attach may race with bpf map update

Jozsef Kadlecsik (1):
      netfilter: Fix handling simultaneous open in TCP conntrack

Julian Anastasov (2):
      ipvs: fix refcount usage for conns in ops mode
      ipvs: fix stats update from local clients

Keefe Liu (1):
      ipvlan: call netdevice notifier when master mac address changed

Kumar Sanghvi (1):
      cxgb4: Correct ntuple mask validation for hash filters

Markus Niebel (1):
      net: phy: micrel: add 125MHz reference clock workaround

Matt Mullins (1):
      tls: don't use stack memory in a scatterlist

Michal Kalderon (4):
      qede: Fix ref-cnt usage count
      qed: LL2 flush isles when connection is closed
      qed: Fix possibility of list corruption during rmmod flows
      qed: Fix LL2 race during connection terminate

Pablo Neira Ayuso (1):
      netfilter: nf_tables: bogus EBUSY in chain deletions

Paolo Abeni (1):
      net: sched: red: avoid hashing NULL child

Petr Machata (7):
      net: ip6_gre: Request headroom in __gre6_xmit()
      net: ip6_gre: Fix headroom request in ip6erspan_tunnel_xmit()
      net: ip6_gre: Split up ip6gre_tnl_link_config()
      net: ip6_gre: Split up ip6gre_tnl_change()
      net: ip6_gre: Split up ip6gre_newlink()
      net: ip6_gre: Split up ip6gre_changelink()
      net: ip6_gre: Fix ip6erspan hlen calculation

Rahul Lakkireddy (1):
      cxgb4: fix offset in collecting TX rate limit info

Saeed Mahameed (1):
      net/mlx5: Fix build break when CONFIG_SMP=n

Stephen Hemminger (1):
      netfilter: bridge: stp fix reference to uninitialized data

Tarick Bedeir (1):
      net/mlx4_core: Fix error handling in mlx4_init_port_info.

Thomas Falcon (3):
      ibmvnic: Free coherent DMA memory if FW map failed
      ibmvnic: Fix non-fatal firmware error reset
      ibmvnic: Fix statistics buffers memory leak

Willem de Bruijn (2):
      packet: in packet_snd start writing at link layer allocation
      net: test tailroom before appending to linear skb

William Tu (2):
      erspan: fix invalid erspan version.
      net: ip6_gre: fix tunnel metadata device sharing.

hpreg@vmware.com (2):
      vmxnet3: set the DMA mask before the first DMA map operation
      vmxnet3: use DMA memory barriers where required

 Documentation/devicetree/bindings/net/micrel-ksz90x1.txt |   7 ++
 drivers/net/dsa/bcm_sf2_cfp.c                            |  36 +++++----
 drivers/net/ethernet/3com/3c59x.c                        | 104 ++++++++++++------------
 drivers/net/ethernet/8390/ne.c                           |   4 +-
 drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h        |  28 +++----
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c        |  88 +++++++--------------
 drivers/net/ethernet/ibm/ibmvnic.c                       |  28 ++++---
 drivers/net/ethernet/mellanox/mlx4/main.c                |   4 +-
 drivers/net/ethernet/netronome/nfp/bpf/main.c            |   2 +-
 drivers/net/ethernet/qlogic/qed/qed_ll2.c                |  61 +++++++++++---
 drivers/net/ethernet/qlogic/qede/qede_main.c             |   3 +-
 drivers/net/ethernet/renesas/sh_eth.h                    |   2 +-
 drivers/net/ipvlan/ipvlan_main.c                         |   4 +-
 drivers/net/phy/micrel.c                                 |  31 ++++++++
 drivers/net/tun.c                                        |  27 +++----
 drivers/net/vmxnet3/vmxnet3_drv.c                        |  72 +++++++++++------
 drivers/net/vmxnet3/vmxnet3_int.h                        |   8 +-
 include/linux/mlx5/driver.h                              |  12 +--
 include/net/netfilter/nf_tables.h                        |   5 ++
 include/net/tls.h                                        |   3 +
 include/uapi/linux/netfilter/nf_conntrack_tcp.h          |   3 +
 kernel/bpf/core.c                                        | 100 +++++++++++++++++------
 kernel/bpf/sockmap.c                                     |  18 ++---
 net/bridge/netfilter/ebt_stp.c                           |   4 +-
 net/core/dev.c                                           |   2 +-
 net/core/filter.c                                        |  11 ++-
 net/core/sock.c                                          |   2 +-
 net/dsa/dsa2.c                                           |   9 ++-
 net/ipv4/fib_frontend.c                                  |   8 +-
 net/ipv4/ip_gre.c                                        |   4 +-
 net/ipv4/ip_output.c                                     |   3 +-
 net/ipv4/netfilter/ip_tables.c                           |   1 +
 net/ipv4/netfilter/ipt_rpfilter.c                        |   2 +-
 net/ipv4/route.c                                         |   7 +-
 net/ipv4/tcp_output.c                                    |   7 +-
 net/ipv6/ip6_gre.c                                       | 286 ++++++++++++++++++++++++++++++++++++++++++++++++++++--------------
 net/ipv6/ip6_output.c                                    |   3 +-
 net/ipv6/netfilter/ip6_tables.c                          |   1 +
 net/netfilter/core.c                                     |   3 +-
 net/netfilter/ipvs/ip_vs_conn.c                          |  17 ++--
 net/netfilter/ipvs/ip_vs_core.c                          |  12 +++
 net/netfilter/nf_conntrack_proto_tcp.c                   |  11 +++
 net/netfilter/nf_tables_api.c                            |  77 ++++++++++++++----
 net/netfilter/nf_tables_core.c                           |  21 +++--
 net/netfilter/nfnetlink_acct.c                           |   2 +-
 net/netfilter/nfnetlink_cthelper.c                       |   7 +-
 net/netfilter/nft_compat.c                               | 201 +++++++++++++++++++++++++++++++++++-----------
 net/netfilter/nft_immediate.c                            |  15 +++-
 net/netfilter/x_tables.c                                 |   6 ++
 net/packet/af_packet.c                                   |   4 +-
 net/sched/act_vlan.c                                     |   2 +
 net/sched/sch_red.c                                      |   5 +-
 net/sched/sch_tbf.c                                      |   5 +-
 net/smc/smc_pnet.c                                       |  71 +++++++++--------
 net/tls/tls_sw.c                                         |   9 +--
 tools/lib/bpf/libbpf.c                                   |   2 +-
 tools/testing/selftests/bpf/test_verifier.c              |   5 ++
 57 files changed, 1010 insertions(+), 465 deletions(-)

^ permalink raw reply

* Re: [PATCH v2] packet: track ring entry use using a shadow ring to prevent RX ring overrun
From: Willem de Bruijn @ 2018-05-20 23:22 UTC (permalink / raw)
  To: Jon Rosen
  Cc: David S. Miller, Willem de Bruijn, Eric Dumazet, Kees Cook,
	David Windsor, Rosen, Rami, Reshetova, Elena, Mike Maloney,
	Benjamin Poirier, Thomas Gleixner, Greg Kroah-Hartman,
	open list:NETWORKING [GENERAL], open list
In-Reply-To: <CAF=yD-+kjceErK=CApiu1oY9uTjVWG2qeX=NkGVYufVVC_L2PA@mail.gmail.com>

On Sun, May 20, 2018 at 6:51 PM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> On Sat, May 19, 2018 at 8:07 AM, Jon Rosen <jrosen@cisco.com> wrote:
>> Fix PACKET_RX_RING bug for versions TPACKET_V1 and TPACKET_V2 which
>> casues the ring to get corrupted by allowing multiple kernel threads
>> to claim ownership of the same ring entry. Track ownership in a shadow
>> ring structure to prevent other kernel threads from reusing the same
>> entry before it's fully filled in, passed to user space, and then
>> eventually passed back to the kernel for use with a new packet.
>>
>> Signed-off-by: Jon Rosen <jrosen@cisco.com>
>> ---
>>
>> There is a bug in net/packet/af_packet.c:tpacket_rcv in how it manages
>> the PACKET_RX_RING for versions TPACKET_V1 and TPACKET_V2.  This bug makes
>> it possible for multiple kernel threads to claim ownership of the same
>> ring entry, corrupting the ring and the corresponding packet(s).
>>
>> These diffs are the second proposed solution, previous proposal was described
>> in https://www.mail-archive.com/netdev@vger.kernel.org/msg227468.html
>> subject [RFC PATCH] packet: mark ring entry as in-use inside spin_lock
>> to prevent RX ring overrun
>>
>> Those diffs would have changed the binary interface and have broken certain
>> applications. Consensus was that such a change would be inappropriate.
>>
>> These new diffs use a shadow ring in kernel space for tracking intermediate
>> state of an entry and prevent more than one kernel thread from simultaneously
>> allocating a ring entry. This avoids any impact to the binary interface
>> between kernel and userspace but comes at the additional cost of requiring a
>> second spin_lock when passing ownership of a ring entry to userspace.
>>
>> Jon Rosen (1):
>>   packet: track ring entry use using a shadow ring to prevent RX ring
>>     overrun
>>
>>  net/packet/af_packet.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  net/packet/internal.h  | 14 +++++++++++
>>  2 files changed, 78 insertions(+)
>>
>
>> @@ -2383,7 +2412,11 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
>>  #endif
>>
>>         if (po->tp_version <= TPACKET_V2) {
>> +               spin_lock(&sk->sk_receive_queue.lock);
>>                 __packet_set_status(po, h.raw, status);
>> +               packet_rx_shadow_release(rx_shadow_ring_entry);
>> +               spin_unlock(&sk->sk_receive_queue.lock);
>> +
>>                 sk->sk_data_ready(sk);
>
> Thanks for continuing to look at this. I spent some time on it last time
> around but got stuck, too.
>
> This version takes an extra spinlock in the hot path. That will be very
> expensive. Once we need to accept that, we could opt for a simpler
> implementation akin to the one discussed in the previous thread:
>
> stash a value in tp_padding or similar while tp_status remains
> TP_STATUS_KERNEL to signal ownership to concurrent kernel
> threads. The issue previously was that that field could not atomically
> be cleared together with __packet_set_status. This is no longer
> an issue when holding the queue lock.
>
> With a field like tp_padding, unlike tp_len, it is arguably also safe to
> clear it after flipping status (userspace should treat it as undefined).
>
> With v1 tpacket_hdr, no explicit padding field is defined but due to
> TPACKET_HDRLEN alignment it exists on both 32 and 64 bit
> platforms.
>
> The danger with using padding is that a process may write to it
> and cause deadlock, of course. There is no logical reason for doing
> so.

For the ring, there is no requirement to allocate exactly the amount
specified by the user request. Safer than relying on shared memory
and simpler than the extra allocation in this patch would be to allocate
extra shadow memory at the end of the ring (and not mmap that).

That still leaves an extra cold cacheline vs using tp_padding.

^ permalink raw reply

* Re: WARNING in ip_recv_error
From: Willem de Bruijn @ 2018-05-20 23:13 UTC (permalink / raw)
  To: David Miller
  Cc: Eric Dumazet, DaeLyong Jeong, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Network Development, LKML, Byoungyoung Lee, Kyungtae Kim,
	bammanag, Willem de Bruijn
In-Reply-To: <CAF=yD-+Ai35L2=dGZzpjYYavBmBGNFXd-q9ju93WrPuewnhELg@mail.gmail.com>

On Fri, May 18, 2018 at 2:59 PM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> On Fri, May 18, 2018 at 2:46 PM, Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
>> On Fri, May 18, 2018 at 2:44 PM, Willem de Bruijn
>> <willemdebruijn.kernel@gmail.com> wrote:
>>> On Fri, May 18, 2018 at 1:09 PM, Willem de Bruijn
>>> <willemdebruijn.kernel@gmail.com> wrote:
>>>> On Fri, May 18, 2018 at 11:44 AM, David Miller <davem@davemloft.net> wrote:
>>>>> From: Eric Dumazet <eric.dumazet@gmail.com>
>>>>> Date: Fri, 18 May 2018 08:30:43 -0700
>>>>>
>>>>>> We probably need to revert Willem patch (7ce875e5ecb8562fd44040f69bda96c999e38bbc)
>>>>>
>>>>> Is it really valid to reach ip_recv_err with an ipv6 socket?
>>>>
>>>> I guess the issue is that setsockopt IPV6_ADDRFORM is not an
>>>> atomic operation, so that the socket is neither fully ipv4 nor fully
>>>> ipv6 by the time it reaches ip_recv_error.
>>>>
>>>>   sk->sk_socket->ops = &inet_dgram_ops;
>>>>   < HERE >
>>>>   sk->sk_family = PF_INET;
>>>>
>>>> Even calling inet_recv_error to demux would not necessarily help.
>>>>
>>>> Safest would be to look up by skb->protocol, similar to what
>>>> ipv6_recv_error does to handle v4-mapped-v6.
>>>>
>>>> Or to make that function safe with PF_INET and swap the order
>>>> of the above two operations.
>>>>
>>>> All sound needlessly complicated for this rare socket option, but
>>>> I don't have a better idea yet. Dropping on the floor is not nice,
>>>> either.
>>>
>>> Ensuring that ip_recv_error correctly handles packets from either
>>> socket and removing the warning should indeed be good.
>>>
>>> It is robust against v4-mapped packets from an AF_INET6 socket,
>>> but see caveat on reconnect below.
>>>
>>> The code between ipv6_recv_error for v4-mapped addresses and
>>> ip_recv_error is essentially the same, the main difference being
>>> whether to return network headers as sockaddr_in with SOL_IP
>>> or sockaddr_in6 with SOL_IPV6.
>>>
>>> There are very few other locations in the stack that explicitly test
>>> sk_family in this way and thus would be vulnerable to races with
>>> IPV6_ADDRFORM.
>>>
>>> I'm not sure whether it is possible for a udpv6 socket to queue a
>>> real ipv6 packet on the error queue, disconnect, connect to an
>>> ipv4 address, call IPV6_ADDRFORM and then call ip_recv_error
>>> on a true ipv6 packet. That would return buggy data, e.g., in
>>> msg_name.
>>
>> In do_ipv6_setsockopt IPV6_ADDRFORM we can test that the
>> error queue is empty, and then take its lock for the duration of the
>> operation.
>
> Actually, no reason to hold the lock. This setsockopt holds the socket
> lock, which connect would need, too. So testing that the queue
> is empty after testing that it is connected to a v4 address is
> sufficient to ensure that no ipv6 packets are queued for reception.
>
> diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
> index 4d780c7f0130..a975d6311341 100644
> --- a/net/ipv6/ipv6_sockglue.c
> +++ b/net/ipv6/ipv6_sockglue.c
> @@ -199,6 +199,11 @@ static int do_ipv6_setsockopt(struct sock *sk,
> int level, int optname,
>
>                         if (ipv6_only_sock(sk) ||
>                             !ipv6_addr_v4mapped(&sk->sk_v6_daddr)) {
>                                 retv = -EADDRNOTAVAIL;
>                                 break;
>                         }
>
> +                       if (!skb_queue_empty(&sk->sk_error_queue)) {
> +                               retv = -EBUSY;
> +                               break;
> +                       }
> +
>                         fl6_free_socklist(sk);
>                         __ipv6_sock_mc_close(sk);
>
> After this it should be safe to remove the warning in ip_recv_error.

Hmm.. nope.

This ensures that the socket cannot produce any new true v6 packets.
But it does not guarantee that they are not already in the system, e.g.
queued in tc, and will find their way to the error queue later.

We'll have to just be able to handle ipv6 packets in ip_recv_error.
Since IPV6_ADDRFORM is used to pass to legacy v4-only
processes and those likely are only confused by SOL_IPV6
error messages, it is probably best to just drop them and perhaps
WARN_ONCE.

^ permalink raw reply

* Re: WARNING in __static_key_slow_dec
From: Willem de Bruijn @ 2018-05-20 23:07 UTC (permalink / raw)
  To: DaeRyong Jeong
  Cc: David Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Network Development, LKML, Byoungyoung Lee, Kyungtae Kim,
	bammanag
In-Reply-To: <CAF=yD-+jS5BJKHEDp+K4fB41YnY1mR8ksVg98oWPJV=mG01pfg@mail.gmail.com>

>
> -static void sock_disable_timestamp(struct sock *sk, unsigned long flags)
> +static void sock_disable_timestamp(struct sock *sk, unsigned long flag)
>  {
> -       if (sk->sk_flags & flags) {
> -               sk->sk_flags &= ~flags;
> -               if (sock_needs_netstamp(sk) &&
> -                   !(sk->sk_flags & SK_FLAGS_TIMESTAMP))
> -                       net_disable_timestamp();
> -       }
> +       unsigned long prev;
> +
> +       do {
> +               prev = READ_ONCE(sk->sk_flags);
> +
> +               if (!(prev & flag))
> +                       return;
> +
> +               if (cmpxchg(&sk->sk_flags, prev, prev & ~flag) == prev)
> +                       break;
> +       } while (1);

and this can just use set_mask_bits

^ permalink raw reply

* [PATCH net-next] mv88e6xxx: Fix uninitialized variable warning.
From: David Miller @ 2018-05-20 23:05 UTC (permalink / raw)
  To: andrew; +Cc: netdev


In mv88e6xxx_probe(), ("np" or "pdata") might be an invariant
but GCC can't see that, therefore:

drivers/net/dsa/mv88e6xxx/chip.c: In function ‘mv88e6xxx_probe’:
drivers/net/dsa/mv88e6xxx/chip.c:4420:13: warning: ‘compat_info’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  chip->info = compat_info;

Actually, it should have warned on the "if (!compat_info)" test, but
whatever.

Explicitly initialize to NULL in the variable declaration to
deal with this.

Fixes: 877b7cb0b6f2 ("net: dsa: mv88e6xxx: Add minimal platform_data support")
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 1fa1f820a437..12df00f593b7 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -4382,9 +4382,9 @@ static const void *pdata_device_get_match_data(struct device *dev)
 static int mv88e6xxx_probe(struct mdio_device *mdiodev)
 {
 	struct dsa_mv88e6xxx_pdata *pdata = mdiodev->dev.platform_data;
+	const struct mv88e6xxx_info *compat_info = NULL;
 	struct device *dev = &mdiodev->dev;
 	struct device_node *np = dev->of_node;
-	const struct mv88e6xxx_info *compat_info;
 	struct mv88e6xxx_chip *chip;
 	int port;
 	int err;
-- 
2.17.0


^ permalink raw reply related

* Re: WARNING in __static_key_slow_dec
From: Willem de Bruijn @ 2018-05-20 23:04 UTC (permalink / raw)
  To: DaeRyong Jeong
  Cc: David Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Network Development, LKML, Byoungyoung Lee, Kyungtae Kim,
	bammanag
In-Reply-To: <CAF=yD-JbiE0W6pv77DUO0pYrJCYfgszm6zwCEfkv8HedvRfujQ@mail.gmail.com>

On Fri, May 18, 2018 at 4:30 PM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> On Fri, May 18, 2018 at 4:03 AM, DaeRyong Jeong <threeearcat@gmail.com> wrote:
>> We report the crash: WARNING in __static_key_slow_dec
>>
>> This crash has been found in v4.8 using RaceFuzzer (a modified
>> version of Syzkaller), which we describe more at the end of this
>> report.
>> Even though v4.8 is the relatively old version, we did manual verification
>> and we think the bug still exists.
>> Our analysis shows that the race occurs when invoking two syscalls
>> concurrently, setsockopt() with optname SO_TIMESTAMPING and ioctl() with
>> cmd SIOCGSTAMPNS.
>>
>>
>> Diagnosis:
>> We think if timestamp was previously enabled with
>> SOCK_TIMESTAMPING_RX_SOFTWARE flag, the concurrent execution of
>> sock_disable_timestamp() and sock_enable_timestamp() causes the crash.
>>
>>
>> Thread interleaving:
>> (Assume sk->flag has the SOCK_TIMESTAMPING_RX_SOFTWARE flag by the
>> previous setsockopt() call with SO_TIMESTAMPING)
>>
>> CPU0 (sock_disable_timestamp())                 CPU1 (sock_enable_timestamp())
>> =====                                           =====
>> (flag == 1UL << SOCK_TIMESTAMPING_RX_SOFTWARE)  (flag == SOCK_TIMESTAMP)
>>
>>                                                 if (!sock_flag(sk, flag)) {
>>                                                         unsigned long previous_flags = sk->sk_flags;
>>
>> if (sk->sk_flags & flags) {
>>         sk->sk_flags &= ~flags;
>>         if (sock_needs_netstamp(sk) &&
>>             !(sk->sk_flags & SK_FLAGS_TIMESTAMP))
>>                 net_disable_timestamp();
>>                                                         sock_set_flag(sk, flag);
>>
>>                                                         if (sock_needs_netstamp(sk) &&
>>                                                             !(previous_flags & SK_FLAGS_TIMESTAMP))
>>                                                                 net_enable_timestamp();
>>                                                         /* Here, net_enable_timestamp() is not called because
>>                                                          * previous_flags has the SOCK_TIMESTAMPING_RX_SOFTWARE
>>                                                          * flag
>>                                                          */
>> /* After the race, sk->sk has the flag SOCK_TIMESTAMP, but
>>  * net_enable_timestamp() is not called one more time.
>>  * Consequently, when the socket is closed, __sk_destruct()
>>  * calls net_disable_timestamp() that leads WARNING.
>>  */
>
> Thanks for the detailed analysis.
>
> Indeed the updates to sk->sk_flags and calls to net_(dis|en)able_timestamp
> should happen atomically, but this is not the case. The setsockopt
> path holds the socket lock, but not all ioctl paths.
>
> Perhaps we can take lock_sock_fast in sock_get_timestamp and
> variants.

Some callers of sock_get_timestamp already hold the socket lock,
e.g., ax25_ioctl, so that is out.

There is some known non-determinism in this path. Callers of
sock_get_timestamp do not necessarily expect a valid sk_stamp
when they enable the timestamp, so that function can continue
to test sk_flags lockless.

net_enable_timestamp enables timestamping using a static_branch
and possibly a workqueue, so already does not complete synchronously
in the sock_enable_timestamp call.

The only requirement is that updates to sk_flags do not race. This
should be solvable with cmpxchg. The situation is slightly complicated
because sk_flags has two bits that may toggle timestamping. Only the
first bit set must trigger a call to net_enable_timestamp and only the
last bit cleared must call net_disable_timestamp.

Something like

-static bool sock_needs_netstamp(const struct sock *sk)
+static bool sock_needs_netstamp(const struct sock *sk, unsigned long flags)
 {
        switch (sk->sk_family) {
        case AF_UNSPEC:
        case AF_UNIX:
                return false;
        default:
-               return true;
+               return (flags & SK_FLAGS_TIMESTAMP);
        }
 }

-static void sock_disable_timestamp(struct sock *sk, unsigned long flags)
+static void sock_disable_timestamp(struct sock *sk, unsigned long flag)
 {
-       if (sk->sk_flags & flags) {
-               sk->sk_flags &= ~flags;
-               if (sock_needs_netstamp(sk) &&
-                   !(sk->sk_flags & SK_FLAGS_TIMESTAMP))
-                       net_disable_timestamp();
-       }
+       unsigned long prev;
+
+       do {
+               prev = READ_ONCE(sk->sk_flags);
+
+               if (!(prev & flag))
+                       return;
+
+               if (cmpxchg(&sk->sk_flags, prev, prev & ~flag) == prev)
+                       break;
+       } while (1);
+
+       /* disable only if this operation removed the last tstamp flag */
+       if (!sock_needs_netstamp(sk, prev & ~flag))
+               net_disable_timestamp();
 }

and analogous for enable.

^ permalink raw reply

* Re: [PATCH net-next v2] net: dsa: b53: Extend platform data to include DSA ports
From: David Miller @ 2018-05-20 22:59 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, andrew, vivien.didelot, linux-kernel
In-Reply-To: <20180520155630.23432-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Sun, 20 May 2018 08:56:30 -0700

> The b53 driver already defines and internally uses platform data to let the
> glue drivers specify parameters such as the chip id.  What we were missing was
> a way to tell the core DSA layer about the ports and their type.
> 
> Place a dsa_chip_data structure at the beginning of b53_platform_data for
> dsa_register_switch() to access it. This does not require modifications to
> b53_common.c which will pass platform_data trough.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next 0/3] Platform data support for mv88exxx
From: David Miller @ 2018-05-20 22:58 UTC (permalink / raw)
  To: andrew; +Cc: vivien.didelot, f.fainelli, netdev
In-Reply-To: <1526761895-15839-1-git-send-email-andrew@lunn.ch>

From: Andrew Lunn <andrew@lunn.ch>
Date: Sat, 19 May 2018 22:31:32 +0200

> There are a few Intel based platforms making use of the mv88exxx.
> These don't easily have access to device tree in order to instantiate
> the switch driver. These patches allow the use of platform data to
> hold the configuration.

Series applied, thank you.

^ permalink raw reply

* Re: [PATCH 0/3] sh_eth: fix typos/grammar
From: David Miller @ 2018-05-20 22:56 UTC (permalink / raw)
  To: sergei.shtylyov; +Cc: netdev, linux-renesas-soc
In-Reply-To: <d47afee4-6fd5-af6d-885c-5b07996749fc@cogentembedded.com>

From: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Date: Sat, 19 May 2018 23:57:50 +0300

> Here's a set of 3 patches against DaveM's 'net-next.git' repo plus the R8A77980
> support patches posted earlier. They fix the comments typos/grammar and another
> typo in the EESR bit...

Series applied.

^ permalink raw reply

* Re: [PATCH net-next 0/9] Misc. bug fixes and cleanup for HNS3 driver
From: David Miller @ 2018-05-20 22:56 UTC (permalink / raw)
  To: salil.mehta
  Cc: yisen.zhuang, lipeng321, mehta.salil, netdev, linux-kernel,
	linuxarm
In-Reply-To: <20180519155323.68960-1-salil.mehta@huawei.com>

From: Salil Mehta <salil.mehta@huawei.com>
Date: Sat, 19 May 2018 16:53:14 +0100

> This patch-set presents miscellaneous bug fixes and cleanups found
> during internal review, system testing and cleanup.

Series applied, thank you.

^ permalink raw reply

* Re: [PATCH v2] packet: track ring entry use using a shadow ring to prevent RX ring overrun
From: Willem de Bruijn @ 2018-05-20 22:51 UTC (permalink / raw)
  To: Jon Rosen
  Cc: David S. Miller, Willem de Bruijn, Eric Dumazet, Kees Cook,
	David Windsor, Rosen, Rami, Reshetova, Elena, Mike Maloney,
	Benjamin Poirier, Thomas Gleixner, Greg Kroah-Hartman,
	open list:NETWORKING [GENERAL], open list
In-Reply-To: <1526731655-10087-1-git-send-email-jrosen@cisco.com>

On Sat, May 19, 2018 at 8:07 AM, Jon Rosen <jrosen@cisco.com> wrote:
> Fix PACKET_RX_RING bug for versions TPACKET_V1 and TPACKET_V2 which
> casues the ring to get corrupted by allowing multiple kernel threads
> to claim ownership of the same ring entry. Track ownership in a shadow
> ring structure to prevent other kernel threads from reusing the same
> entry before it's fully filled in, passed to user space, and then
> eventually passed back to the kernel for use with a new packet.
>
> Signed-off-by: Jon Rosen <jrosen@cisco.com>
> ---
>
> There is a bug in net/packet/af_packet.c:tpacket_rcv in how it manages
> the PACKET_RX_RING for versions TPACKET_V1 and TPACKET_V2.  This bug makes
> it possible for multiple kernel threads to claim ownership of the same
> ring entry, corrupting the ring and the corresponding packet(s).
>
> These diffs are the second proposed solution, previous proposal was described
> in https://www.mail-archive.com/netdev@vger.kernel.org/msg227468.html
> subject [RFC PATCH] packet: mark ring entry as in-use inside spin_lock
> to prevent RX ring overrun
>
> Those diffs would have changed the binary interface and have broken certain
> applications. Consensus was that such a change would be inappropriate.
>
> These new diffs use a shadow ring in kernel space for tracking intermediate
> state of an entry and prevent more than one kernel thread from simultaneously
> allocating a ring entry. This avoids any impact to the binary interface
> between kernel and userspace but comes at the additional cost of requiring a
> second spin_lock when passing ownership of a ring entry to userspace.
>
> Jon Rosen (1):
>   packet: track ring entry use using a shadow ring to prevent RX ring
>     overrun
>
>  net/packet/af_packet.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  net/packet/internal.h  | 14 +++++++++++
>  2 files changed, 78 insertions(+)
>

> @@ -2383,7 +2412,11 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
>  #endif
>
>         if (po->tp_version <= TPACKET_V2) {
> +               spin_lock(&sk->sk_receive_queue.lock);
>                 __packet_set_status(po, h.raw, status);
> +               packet_rx_shadow_release(rx_shadow_ring_entry);
> +               spin_unlock(&sk->sk_receive_queue.lock);
> +
>                 sk->sk_data_ready(sk);

Thanks for continuing to look at this. I spent some time on it last time
around but got stuck, too.

This version takes an extra spinlock in the hot path. That will be very
expensive. Once we need to accept that, we could opt for a simpler
implementation akin to the one discussed in the previous thread:

stash a value in tp_padding or similar while tp_status remains
TP_STATUS_KERNEL to signal ownership to concurrent kernel
threads. The issue previously was that that field could not atomically
be cleared together with __packet_set_status. This is no longer
an issue when holding the queue lock.

With a field like tp_padding, unlike tp_len, it is arguably also safe to
clear it after flipping status (userspace should treat it as undefined).

With v1 tpacket_hdr, no explicit padding field is defined but due to
TPACKET_HDRLEN alignment it exists on both 32 and 64 bit
platforms.

The danger with using padding is that a process may write to it
and cause deadlock, of course. There is no logical reason for doing
so.

^ permalink raw reply

* Re: [PATCH 13/14] net: sched: use unique idr insert function in unlocked actions
From: Marcelo Ricardo Leitner @ 2018-05-20 22:43 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Vlad Buslov, Linux Netdev List, David Miller, Jamal Hadi Salim,
	Cong Wang, Jiri Pirko, Pablo Neira Ayuso, kadlec,
	Florian Westphal, Alexei Starovoitov, Daniel Borkmann,
	Eric Dumazet, keescook, Linux Kernel, netfilter-devel, coreteam,
	Yevgeny Kliteynik
In-Reply-To: <CAJ3xEMji3Mjb7sWnZyqt2nYjQ9c8s6Ok8DVTSwVdGqS1EVOjtg@mail.gmail.com>

On Mon, May 21, 2018 at 12:40:54AM +0300, Or Gerlitz wrote:
> On Mon, May 21, 2018 at 12:33 AM, Marcelo Ricardo Leitner
> <marcelo.leitner@gmail.com> wrote:
> > On Mon, May 21, 2018 at 12:13:06AM +0300, Or Gerlitz wrote:
> >> On Sun, May 20, 2018 at 1:20 AM, Marcelo Ricardo Leitner
> >> <marcelo.leitner@gmail.com> wrote:
> >> > On Mon, May 14, 2018 at 05:27:14PM +0300, Vlad Buslov wrote:
> >> >> Substitute calls to action insert function with calls to action insert
> >> >> unique function that warns if insertion overwrites index in idr.
> >> >
> >> > I know this patch may be gone on V2, but a general comment, please use
> >> > the function names themselves instead of a textualized version. I.e.,
> >> > s/action insert unique/tcf_idr_insert_unique/
> >>
> >> disagree. While doing reviews I found out that if I ask the developer
> >> to use higher
> >> level / descriptive language and specifically to avoid putting
> >> variable / function names in
> >> patch titles and change logs, the quality gets ++ big time, vs if the
> >> developer is allowed to say
> >>
> >> net/mlx5: Changed add_vovo_bobo()
> >>
> >> Added variable do_it_right to add_vovo_bobo(), now we are terribly good.
> >
> > In your example I agree that it is not helping and it is even allowing
> > such empty changelog, just as in the section I highlighted, the
> > descriptive language is also not helping IMHO.
> >
> > I had to read it 3 times to make sure I wasn't missing a modifier word
> > when comparing the two functions and well, it's just saying
> > "Substitute calls to foo function to bar function". I don't see how
> > the textualized version helps in this case while, at least in this
> > one, I would have visually recognized the function names way faster.
> >
> > Sounds like 2 bad examples for either approach.
>
> Properly written descriptive language is maybe harder to come up with
> (specifically for those of us who are not native english speakers, like me)
> but is more expressive and helpful for reviews and maintenance. Lets try
> to add Vlad to come up with the right higher language and not ask him to
> quote function and variable named.

Alright.

^ permalink raw reply

* Re: [PATCH net-next] r8169: fix network error on resume from suspend
From: David Miller @ 2018-05-20 22:39 UTC (permalink / raw)
  To: hkallweit1; +Cc: nic_swsd, netdev
In-Reply-To: <aa718c92-294e-d489-5a09-79acdc9eedfe@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Sat, 19 May 2018 10:29:33 +0200

> This commit removed calls to rtl_set_rx_mode(). This is ok for the
> standard path if the link is brought up, however it breaks system
> resume from suspend. Link comes up but no network traffic.
> 
> Meanwhile common code from rtl_hw_start_8169/8101/8168() was moved
> to rtl_hw_start(), therefore re-add the call to rtl_set_rx_mode()
> there.
> 
> Due to adding this call we have to move definition of rtl_hw_start()
> after definition of rtl_set_rx_mode().
> 
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> Fixes: 82d3ff6dd199 ("r8169: remove calls to rtl_set_rx_mode")

Applied, thank you.

^ permalink raw reply

* Re: [PATCH v2] isdn: eicon: fix a missing-check bug
From: David Miller @ 2018-05-20 22:37 UTC (permalink / raw)
  To: wang6495; +Cc: kjlu, mac, isdn, netdev, linux-kernel
In-Reply-To: <1526679228-1596-1-git-send-email-wang6495@umn.edu>

From: Wenwen Wang <wang6495@umn.edu>
Date: Fri, 18 May 2018 16:33:47 -0500

> In divasmain.c, the function divas_write() firstly invokes the function
> diva_xdi_open_adapter() to open the adapter that matches with the adapter
> number provided by the user, and then invokes the function diva_xdi_write()
> to perform the write operation using the matched adapter. The two functions
> diva_xdi_open_adapter() and diva_xdi_write() are located in diva.c.

This doesn't even compile:

In file included from drivers/isdn/hardware/eicon/divasmain.c:30:
drivers/isdn/hardware/eicon/diva.h:23:18: error: unknown type name ‘diva_xdi_um_cfg_cmd_t’
      int length, diva_xdi_um_cfg_cmd_t *msg,
                  ^~~~~~~~~~~~~~~~~~~~~
drivers/isdn/hardware/eicon/diva.h:27:20: error: unknown type name ‘diva_xdi_um_cfg_cmd_t’
        int length, diva_xdi_um_cfg_cmd_t *msg,
                    ^~~~~~~~~~~~~~~~~~~~~

^ permalink raw reply

* Re: [PATCHv2 net-next] erspan: set bso bit based on mirrored packet's len
From: David Miller @ 2018-05-20 22:32 UTC (permalink / raw)
  To: u9012063; +Cc: netdev, tobin
In-Reply-To: <1526697661-8958-1-git-send-email-u9012063@gmail.com>

From: William Tu <u9012063@gmail.com>
Date: Fri, 18 May 2018 19:41:01 -0700

> Before the patch, the erspan BSO bit (Bad/Short/Oversized) is not
> handled.  BSO has 4 possible values:
>   00 --> Good frame with no error, or unknown integrity
>   11 --> Payload is a Bad Frame with CRC or Alignment Error
>   01 --> Payload is a Short Frame
>   10 --> Payload is an Oversized Frame
> 
> Based the short/oversized definitions in RFC1757, the patch sets
> the bso bit based on the mirrored packet's size.
> 
> Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com>
> Signed-off-by: William Tu <u9012063@gmail.com>
> ---
> v1->v2
>   Improve code comments, make enum erspan_bso clearer

Applied, thanks William.

^ permalink raw reply

* Re: pull request: bluetooth-next 2018-05-18
From: David Miller @ 2018-05-20 22:25 UTC (permalink / raw)
  To: johan.hedberg; +Cc: linux-bluetooth, netdev
In-Reply-To: <20180518195105.GA15231@x1c.home>

From: Johan Hedberg <johan.hedberg@gmail.com>
Date: Fri, 18 May 2018 22:51:05 +0300

> Here's the first bluetooth-next pull request for the 4.18 kernel:
> 
>  - Refactoring of the btbcm driver
>  - New USB IDs for QCA_ROME and LiteOn controllers
>  - Buffer overflow fix if the controller sends invalid advertising data length
>  - Various cleanups & fixes for Qualcomm controllers
> 
> Please let me know if there are any issues pulling. Thanks.

Pulled, thank you.

^ permalink raw reply

* Re: [net-next] Revert "ixgbe: release lock for the duration of ixgbe_suspend_close()"
From: David Miller @ 2018-05-20 22:23 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <20180518185830.23681-1-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Fri, 18 May 2018 11:58:30 -0700

> This reverts commit 6710f970d9979d8f03f6e292bb729b2ee1526d0e.
> 
> Gotta love when developers have offline discussions, thinking everyone
> is reading their responses/dialog.
> 
> The change had the potential for a number of race conditions on
> shutdown, which is why we are reverting the change.
> 
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [PATCH v2] net: qcom/emac: Allocate buffers from local node
From: David Miller @ 2018-05-20 22:23 UTC (permalink / raw)
  To: hpuranik; +Cc: netdev, linux-kernel, timur
In-Reply-To: <1526614169-9586-1-git-send-email-hpuranik@codeaurora.org>

From: Hemanth Puranik <hpuranik@codeaurora.org>
Date: Fri, 18 May 2018 08:59:29 +0530

> Currently we use non-NUMA aware allocation for TPD and RRD buffers,
> this patch modifies to use NUMA friendly allocation.
> 
> Signed-off-by: Hemanth Puranik <hpuranik@codeaurora.org>
> ---
> Change since v1:
> - Addressed comments related to ordering

Applied to net-next.

^ permalink raw reply

* Re: [PATCH 13/14] net: sched: use unique idr insert function in unlocked actions
From: Or Gerlitz @ 2018-05-20 21:40 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Vlad Buslov, Linux Netdev List, David Miller, Jamal Hadi Salim,
	Cong Wang, Jiri Pirko, Pablo Neira Ayuso, kadlec,
	Florian Westphal, Alexei Starovoitov, Daniel Borkmann,
	Eric Dumazet, keescook, Linux Kernel, netfilter-devel, coreteam,
	Yevgeny Kliteynik
In-Reply-To: <20180520213349.GC26212@localhost.localdomain>

On Mon, May 21, 2018 at 12:33 AM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Mon, May 21, 2018 at 12:13:06AM +0300, Or Gerlitz wrote:
>> On Sun, May 20, 2018 at 1:20 AM, Marcelo Ricardo Leitner
>> <marcelo.leitner@gmail.com> wrote:
>> > On Mon, May 14, 2018 at 05:27:14PM +0300, Vlad Buslov wrote:
>> >> Substitute calls to action insert function with calls to action insert
>> >> unique function that warns if insertion overwrites index in idr.
>> >
>> > I know this patch may be gone on V2, but a general comment, please use
>> > the function names themselves instead of a textualized version. I.e.,
>> > s/action insert unique/tcf_idr_insert_unique/
>>
>> disagree. While doing reviews I found out that if I ask the developer
>> to use higher
>> level / descriptive language and specifically to avoid putting
>> variable / function names in
>> patch titles and change logs, the quality gets ++ big time, vs if the
>> developer is allowed to say
>>
>> net/mlx5: Changed add_vovo_bobo()
>>
>> Added variable do_it_right to add_vovo_bobo(), now we are terribly good.
>
> In your example I agree that it is not helping and it is even allowing
> such empty changelog, just as in the section I highlighted, the
> descriptive language is also not helping IMHO.
>
> I had to read it 3 times to make sure I wasn't missing a modifier word
> when comparing the two functions and well, it's just saying
> "Substitute calls to foo function to bar function". I don't see how
> the textualized version helps in this case while, at least in this
> one, I would have visually recognized the function names way faster.
>
> Sounds like 2 bad examples for either approach.

Properly written descriptive language is maybe harder to come up with
(specifically for those of us who are not native english speakers, like me)
but is more expressive and helpful for reviews and maintenance. Lets try
to add Vlad to come up with the right higher language and not ask him to
quote function and variable named.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox