Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net-next 0/4] net: Generic UDP Encapsulation
From: Tom Herbert @ 2014-10-02  4:46 UTC (permalink / raw)
  To: davem, netdev

Generic UDP Encapsulation (GUE) is UDP encapsulation protocol that
encapsulates packets of various IP protocols. The GUE protocol is
described in http://tools.ietf.org/html/draft-herbert-gue-01.

The receive path of GUE is implemented in the FOU over UDP module (FOU).
This includes a UDP encap receive function for GUE as well as GUE
specific GRO functions. Management and configuration of GUE ports shares
most of the same code with FOU.

For the transmit path, the previous FOU support for IPIP, sit, and GRE
was simply extended for GUE (when GUE is enabled insert the GUE
header on transmit in addition to UDP header inserted for FOU).

Semantically GUE is the same as FOU in that the encapsulation (UDP
and GUE headers) that are inserted on transmission and removed on
reception so that IP packet is processed with the inner header.

This patch set includes:
 - Some fixes to FOU, removal of IPv4,v6 specific GRO functions
 - Support to configure a GUE receive port
 - Implementation of GUE receive path (normal and GRO)
 - Additions to ip_tunnel netlink to configure GUE
 - GUE header inserion in ip_tunnel transmit path

Follow on patches will include

Testng:

I ran performance numbers using netperf TCP_RR with 200 streams,
comparing encapsulation without GUE, encapsulation with GUE, and
encapsulation with FOU.

 GRE
    TCP_STREAM
      IPv4, FOU, UDP checksum enabled
        14.04% TX CPU utilization
        13.17% RX CPU utilization
        9211 Mbps
      IPv4, GUE, UDP checksum enabled
        14.99% TX CPU utilization
        13.79% RX CPU utilization
        9185 Mbps
      IPv4, FOU, UDP checksum disabled
        13.14% TX CPU utilization
        23.18% RX CPU utilization
        9277 Mbps
      IPv4, GUE, UDP checksum disabled
        13.66% TX CPU utilization
        23.57% RX CPU utilization
        9184 Mbps
    TCP_RR
      IPv4, FOU, UDP checksum enabled
        94.2% CPU utilization
        155/249/460 90/95/99% latencies
        1.17018e+06 tps
      IPv4, GUE, UDP checksum enabled
        93.9% CPU utilization
        158/253/472 90/95/99% latencies
        1.15045e+06 tps


  IPIP
    TCP_STREAM
      FOU, UDP checksum enabled
        15.28% TX CPU utilization
        13.92% RX CPU utilization
        9342 Mbps
      GUE, UDP checksum enabled
        13.99% TX CPU utilization
        13.34% RX CPU utilization
        9210 Mbps
      FOU, UDP checksum disabled
        15.08% TX CPU utilization
        24.64% RX CPU utilization
        9226 Mbps
      GUE, UDP checksum disabled
        15.90% TX CPU utilization
        24.77% RX CPU utilization
        9197 Mbps
    TCP_RR
      FOU, UDP checksum enabled
        94.23% CPU utilization
        149/237/429 90/95/99% latencies
        1.19553e+06 tps
      GUE, UDP checksum enabled
        93.75% CPU utilization
        152/243/442 90/95/99% latencies
        1.17027e+06 tps

  SIT
    TCP_STREAM
      FOU, UDP checksum enabled
        14.47% TX CPU utilization
        14.58% RX CPU utilization
        9106 Mbps
      GUE, UDP checksum enabled
        15.09% TX CPU utilization
        14.84% RX CPU utilization
        9080 Mbps
      FOU, UDP checksum disabled
        15.70% TX CPU utilization
        27.93% RX CPU utilization
        9097 Mbps
      GUE, UDP checksum disabled
        15.04% TX CPU utilization
        27.54% RX CPU utilization
        9073 Mbps
    TCP_RR
      FOU, UDP checksum enabled
        96.9% CPU utilization
        170/281/581 90/95/99% latencies
        1.03372e+06 tps
      GUE, UDP checksum enabled
        97.16% CPU utilization
        172/286/576 90/95/99% latencies
        1.00469e+06 tps

Tom Herbert (4):
  ip_tunnel: Account for secondary encapsulation header in max_headroom
  fou: eliminate IPv4,v6 specific GRO functions
  gue: Receive side for Generic UDP Encapsulation
  ip_tunnel: Add GUE support

 include/linux/netdevice.h      |   3 +
 include/uapi/linux/fou.h       |   7 ++
 include/uapi/linux/if_tunnel.h |   1 +
 net/ipv4/fou.c                 | 224 ++++++++++++++++++++++++++++++++++-------
 net/ipv4/ip_tunnel.c           |  15 ++-
 net/ipv4/udp_offload.c         |   1 +
 net/ipv6/udp_offload.c         |   1 +
 7 files changed, 212 insertions(+), 40 deletions(-)

-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply

* Re: [PATCH net-next v6 2/2] bonding: Simplify the xmit function for modes that use xmit_hash
From: Cong Wang @ 2014-10-02  4:42 UTC (permalink / raw)
  To: Mahesh Bandewar
  Cc: Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David Miller,
	netdev, Eric Dumazet, Maciej Zenczykowski
In-Reply-To: <CAHA+R7PiP2Ce-+3S6Uy5kkKjk1sN6uE2gXzvL_HtfnyVHLoKoQ@mail.gmail.com>

On Wed, Oct 1, 2014 at 9:40 PM, Cong Wang <cwang@twopensource.com> wrote:
> On Wed, Oct 1, 2014 at 1:38 AM, Mahesh Bandewar <maheshb@google.com> wrote:
>> +int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
>> +{
>> +       struct slave *slave;
>> +       struct list_head *iter;
>> +       struct bond_up_slave *new_arr, *old_arr;
>> +       int slaves_in_agg;
>> +       int agg_id = 0;
>> +       int ret = 0;
>> +
>> +#ifdef CONFIG_LOCKDEP
>> +       WARN_ON(lockdep_is_held(&bond->mode_lock));
>> +#endif
>
>
> I think you can use lockdep_is_held().

I meant lockdep_assert_held()....

^ permalink raw reply

* Re: [PATCH net-next v6 2/2] bonding: Simplify the xmit function for modes that use xmit_hash
From: Cong Wang @ 2014-10-02  4:40 UTC (permalink / raw)
  To: Mahesh Bandewar
  Cc: Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David Miller,
	netdev, Eric Dumazet, Maciej Zenczykowski
In-Reply-To: <1412152711-12646-1-git-send-email-maheshb@google.com>

On Wed, Oct 1, 2014 at 1:38 AM, Mahesh Bandewar <maheshb@google.com> wrote:
> +int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
> +{
> +       struct slave *slave;
> +       struct list_head *iter;
> +       struct bond_up_slave *new_arr, *old_arr;
> +       int slaves_in_agg;
> +       int agg_id = 0;
> +       int ret = 0;
> +
> +#ifdef CONFIG_LOCKDEP
> +       WARN_ON(lockdep_is_held(&bond->mode_lock));
> +#endif


I think you can use lockdep_is_held().

> +
> +       new_arr = kzalloc(offsetof(struct bond_up_slave, arr[bond->slave_cnt]),
> +                         GFP_KERNEL);
> +       if (!new_arr) {
> +               ret = -ENOMEM;
> +               pr_err("Failed to build slave-array.\n");
> +               goto out;
> +       }


No need to print an error message for OOM, it is already noisy. :)


> +       if (BOND_MODE(bond) == BOND_MODE_8023AD) {
> +               struct ad_info ad_info;
> +
> +               if (bond_3ad_get_active_agg_info(bond, &ad_info)) {
> +                       pr_debug("bond_3ad_get_active_agg_info failed\n");


I suspect how useful this debug info is since your patch is almost ready
to merge.

> +                       kfree_rcu(new_arr, rcu);
> +                       /* No active aggragator means its not safe to use

s/its/it's/

> +                        * the previous array.
> +                        */
> +                       old_arr = rtnl_dereference(bond->slave_arr);
> +                       if (old_arr) {
> +                               RCU_INIT_POINTER(bond->slave_arr, NULL);
> +                               kfree_rcu(old_arr, rcu);
> +                       }
> +                       goto out;
> +               }
> +               slaves_in_agg = ad_info.ports;
> +               agg_id = ad_info.aggregator_id;
> +       }



Thanks.

^ permalink raw reply

* Re: [PATCH v2 net-next] mlx4: optimize xmit path
From: Eric Dumazet @ 2014-10-02  4:35 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Alexei Starovoitov, David S. Miller, Jesper Dangaard Brouer,
	Eric Dumazet, John Fastabend, Linux Netdev List, Amir Vadai,
	Or Gerlitz
In-Reply-To: <1411964353.30721.6.camel@edumazet-glaptop2.roam.corp.google.com>

On Sun, 2014-09-28 at 21:19 -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>

...
> 6) mdev->mr.key stored in ring->mr_key to also avoid bswap() and access
>    to cold cache line.
>  


>  		ring->bf.offset ^= ring->bf.buf_size;
>  	} else {
> +		tx_desc->ctrl.vlan_tag = cpu_to_be16(vlan_tag);
> +		tx_desc->ctrl.ins_vlan = MLX4_WQE_CTRL_INS_VLAN *
> +			!!vlan_tx_tag_present(skb);
> +		tx_desc->ctrl.fence_size = real_size;
> +
>  		/* Ensure new descriptor hits memory
>  		 * before setting ownership of this descriptor to HW
>  		 */


Sorry, there is a missing replacement of 

	iowrite32be(ring->doorbell_qpn,
		    ring->bf.uar->map + MLX4_SEND_DOORBELL);

by iowrite32(ring->doorbell_qpn,
	     ring->bf.uar->map + MLX4_SEND_DOORBELL);

Since doorbel_qpn was changed to a __be32 and setup in
mlx4_en_activate_tx_ring()

^ permalink raw reply

* linux-next: manual merge of the net-next tree with the net tree
From: Stephen Rothwell @ 2014-10-02  4:16 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: linux-next, linux-kernel, hayeswang

[-- Attachment #1: Type: text/plain, Size: 467 bytes --]

Hi all,

Today's linux-next merge of the net-next tree got a conflict in
drivers/net/usb/r8152.c between commit 204c87041289 ("r8152: remove
clearing bp") from the net tree and commit 8ddfa07778af ("r8152: use
usleep_range") from the net-next tree.

I fixed it up (the former removed some of the code updated by the
latter) and can carry the fix as necessary (no action is required).

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH iproute2 v2] iplink: do not require assigning negative ifindex at link creation
From: Cong Wang @ 2014-10-02  4:16 UTC (permalink / raw)
  To: Atzm Watanabe; +Cc: netdev, Stephen Hemminger
In-Reply-To: <87tx3o5ojd.wl%atzm@stratosphere.co.jp>

On Tue, Sep 30, 2014 at 10:47 PM, Atzm Watanabe <atzm@stratosphere.co.jp> wrote:
> Since commit 3c682146aeff, iplink requires assigning negative
> ifindex (-1) to the kernel when creating interface without
> specifying index.
>
> v2: checking whether index is -1, suggested by Cong Wang.
>
> Cc: Cong Wang <cwang@twopensource.com>
> Signed-off-by: Atzm Watanabe <atzm@stratosphere.co.jp>

Acked-by:  Cong Wang <cwang@twopensource.com>

Thanks!

^ permalink raw reply

* Re: [PATCH net-next 2/2] sunvnet: vnet_start_xmit() must hold a refcnt on port.
From: Raghuram Kothakota @ 2014-10-02  3:36 UTC (permalink / raw)
  To: David Miller; +Cc: david.stevens, sowmini.varadhan, netdev
In-Reply-To: <20141001.155210.882272719949254470.davem@davemloft.net>

On Oct 1, 2014, at 12:52 PM, David Miller <davem@davemloft.net> wrote:

> From: David L Stevens <david.stevens@oracle.com>
> Date: Wed, 01 Oct 2014 15:31:49 -0400
> 
>> 
>> 
>> On 10/01/2014 03:23 PM, Sowmini Varadhan wrote:
>>> On (10/01/14 15:06), David L Stevens wrote:
>>>> 
>>>> This "vp->switch_port" addition doesn't appear to be related to the port refcnt
>>>> change, and doesn't allow for multiple switch ports.
>>> 
>>> The switch_port is the connection to Dom0. Do you envision us having more than
>>> one switch_port? How?
>> 
>> While Dom0 might only create one port with the "switch" flag, the flag just means
>> "I can reach anybody" and is not inherently unique. I don't think an attached
>> VM should assume there is always only one; it prevents multipath load balancing
>> kinds of things in the future.
>> 
>> Also, there is the broader point that this sort of change should be a separate patch.
>> It isn't required for fixing the dangling reference -- it is an independent change.
> 
> Multiple switch ports are absolutely allowed by the protocol spec and can
> provide the suggested facilities David mentioned, don't prevent them from
> being used.

In reality, introducing multiple switch-ports will need Guest driver change
as well. The existing sunvnet driver will not automatically utilize all switch-ports
and requires changes. When we add the full support to use multiple switch-ports,
I am sure we can change the current optimization for switch-port lookup with
a different method probably with another optimized method than what it is today.

-Raghuram

^ permalink raw reply

* Re: [PATCH net-next 2/2] sunvnet: vnet_start_xmit() must hold a refcnt on port.
From: Raghuram Kothakota @ 2014-10-02  3:23 UTC (permalink / raw)
  To: David L Stevens; +Cc: Sowmini Varadhan, davem, netdev
In-Reply-To: <542C56A5.4070805@oracle.com>

On Oct 1, 2014, at 12:31 PM, David L Stevens <david.stevens@oracle.com> wrote:

> 
> 
> On 10/01/2014 03:23 PM, Sowmini Varadhan wrote:
>> On (10/01/14 15:06), David L Stevens wrote:
>>> 
>>> This "vp->switch_port" addition doesn't appear to be related to the port refcnt
>>> change, and doesn't allow for multiple switch ports.
>> 
>> The switch_port is the connection to Dom0. Do you envision us having more than
>> one switch_port? How?
> 
> While Dom0 might only create one port with the "switch" flag, the flag just means
> "I can reach anybody" and is not inherently unique. I don't think an attached
> VM should assume there is always only one; it prevents multipath load balancing
> kinds of things in the future.

At the moment our architecture defines only one switch-port. We certainly toyed
with the idea of multiple paths(actually LDCs) mainly for the performance but
we were able to achieve our goal with one LDC so that was not introduced.
Our original design included the idea of multiple LDCs in a port, but not 
multiple ports for performance purpose. We also explored adding paths to
different virtual switches, mainly intended for path failover when one of them
failed due to service domain crash/panic.  We have such feature in our virtual
disk, there we realized it is very proprietary implementation and cost of maintaining
and improving those features is high. We abandoned these proprietary solutions
but rely on the rich features that already exists in the Guest OS operating systems
with multiple network devices.  In summary, currently we do not have any plans
to multiple paths.

Even if we implement multiple paths, we expect it is the choice of the Guest on how
it utilizes those paths as it need to have the knowledge of these multiple paths
and use them as designed.  That is, the driver implementation has to change.
For now, I do not see any issues with this driver assuming one switch-port. If we
ever support multiple paths, I would expect this driver to change, at that time
the code can implement appropriate method of load balancing or failover.

-Raghuram
> 
> Also, there is the broader point that this sort of change should be a separate patch.
> It isn't required for fixing the dangling reference -- it is an independent change.
> 
> 							+-DLS
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2014-10-02  3:03 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) Don't halt the firmware in r8152 driver, from Hayes Wang.

2) Handle full sized 802.1ad frames in bnx2 and tg3 drivers
   properly, from Vlad Yasevich.

3) Don't sleep while holding tx_clean_lock in netxen driver,
   fix from Manish Chopra.

4) Certain kinds of ipv6 routes can end up endlessly failing
   the route validation test, causing it to be re-looked up
   over and over again.  This particularly kills input route
   caching in TCP sockets.  Fix from Hannes Frederic Sowa.

5) netvsc_start_xmit() has a use-after-free access to skb->len,
   fix from K. Y. Srinivasan.

6) Fix matching of inverted containers in ematch module, from
   Ignacy Gawędzki.

7) Aggregation of GRO frames via SKB ->frag_list for linear skbs isn't
   handled properly, regression fix from Eric Dumazet.

8) Don't test return value of ipv4_neigh_lookup(), which returns an
   error pointer, against NULL.  From WANG Cong.

9) Fix an old regression where we mistakenly allow a double add
   of the same tunnel.  Fixes from Steffen Klassert.

10) macvtap device delete and open can run in parallel and corrupt
    lists etc., fix from Vlad Yasevich.

11) Fix build error with IPV6=m NETFILTER_XT_TARGET_TPROXY=y, from
    Pablo Neira Ayuso.

12) rhashtable_destroy() triggers lockdep splats, fix also from
    Pablo.

Please pull, thanks a lot!

The following changes since commit b94d525e58dc9638dd3f98094cb468bcfb262039:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2014-09-24 12:45:24 -0700)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master

for you to fetch changes up to 439e9575e777bdad1d9da15941e02adf34f4c392:

  bna: Update Maintainer Email (2014-10-01 22:13:41 -0400)

----------------------------------------------------------------
David S. Miller (5):
      Merge branch 'qlcnic'
      Merge git://git.kernel.org/.../pablo/nf
      Merge branch 'ipv6_tunnel'
      Merge branch 'netxen'
      Merge branch 'r8152'

Eric Dumazet (1):
      gro: fix aggregation for skb using frag_list

Hannes Frederic Sowa (1):
      ipv6: remove rt6i_genid

Ignacy Gawędzki (1):
      ematch: Fix matching of inverted containers.

KY Srinivasan (1):
      hyperv: Fix a bug in netvsc_start_xmit()

Kweh, Hock Leong (1):
      net: stmmac: fix stmmac_pci_probe failed when CONFIG_HAVE_CLK is selected

Manish Chopra (5):
      qlcnic: Fix memory corruption while reading stats using ethtool.
      qlcnic: Remove __QLCNIC_DEV_UP bit check to read TX queues statistics.
      qlcnic: Fix ordering of stats in stats buffer.
      netxen: Fix BUG "sleeping function called from invalid context"
      netxen: Fix bug in Tx completion path.

Matan Barak (1):
      net/mlx4_core: Allow not to specify probe_vf in SRIOV IB mode

Nicolas Dichtel (1):
      ip6gre: add a rtnl link alias for ip6gretap

Pablo Neira Ayuso (5):
      netfilter: nft_hash: no need for rcu in the hash set destroy path
      netfilter: nft_rbtree: no need for spinlock from set destroy path
      rhashtable: fix lockdep splat in rhashtable_destroy()
      netfilter: nfnetlink: deliver netlink errors on batch completion
      netfilter: xt_TPROXY: undefined reference to `udp6_lib_lookup'

Rasesh Mody (1):
      bna: Update Maintainer Email

Sony Chacko (1):
      qlcnic: Use qlcnic_83xx_flash_read32() API instead of lockless version of the API.

Soren Brinkmann (1):
      Revert "net/macb: add pinctrl consumer support"

Steffen Klassert (4):
      ip_tunnel: Don't allow to add the same tunnel multiple times.
      ip6_tunnel: Return an error when adding an existing tunnel.
      ip6_vti: Return an error when adding an existing tunnel.
      ip6_gre: Return an error when adding an existing tunnel.

Vlad Yasevich (3):
      macvtap: Fix race between device delete and open.
      tg3: Allow for recieve of full-size 8021AD frames
      bnx2: Correctly receive full sized 802.1ad fragmes

WANG Cong (1):
      neigh: check error pointer instead of NULL for ipv4_neigh_lookup()

hayeswang (4):
      r8152: fix the carrier off when autoresuming
      r8152: fix setting RTL8152_UNPLUG
      r8152: remove clearing bp
      r8152: disable power cut for RTL8153

 MAINTAINERS                                           |  2 +-
 drivers/net/ethernet/broadcom/bnx2.c                  |  5 ++--
 drivers/net/ethernet/broadcom/tg3.c                   |  3 +-
 drivers/net/ethernet/cadence/macb.c                   | 11 -------
 drivers/net/ethernet/mellanox/mlx4/main.c             |  4 +--
 drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c  |  6 ++--
 drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c  |  2 --
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c |  5 ++--
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c   | 10 +++----
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c     | 11 +++++--
 drivers/net/hyperv/netvsc_drv.c                       |  3 +-
 drivers/net/macvtap.c                                 | 18 +++++------
 drivers/net/usb/r8152.c                               | 88 ++++++++++++++++++++++++++----------------------------
 include/net/ip6_fib.h                                 |  5 +---
 include/net/net_namespace.h                           | 20 ++-----------
 lib/rhashtable.c                                      |  8 ++---
 net/core/skbuff.c                                     |  3 ++
 net/ipv4/ip_tunnel.c                                  | 11 +++++--
 net/ipv4/route.c                                      |  2 +-
 net/ipv6/addrconf.c                                   |  3 +-
 net/ipv6/addrconf_core.c                              |  7 +++++
 net/ipv6/ip6_fib.c                                    | 20 +++++++++++++
 net/ipv6/ip6_gre.c                                    |  3 ++
 net/ipv6/ip6_tunnel.c                                 |  6 +++-
 net/ipv6/ip6_vti.c                                    |  6 +++-
 net/ipv6/route.c                                      |  4 ---
 net/netfilter/Kconfig                                 |  1 +
 net/netfilter/nfnetlink.c                             | 64 ++++++++++++++++++++++++++++++++++++++-
 net/netfilter/nft_hash.c                              | 12 ++++----
 net/netfilter/nft_rbtree.c                            |  2 --
 net/sched/ematch.c                                    |  6 ++--
 31 files changed, 217 insertions(+), 134 deletions(-)

^ permalink raw reply

* Re: [PATCH net-next v6 2/2] bonding: Simplify the xmit function for modes that use xmit_hash
From: Mahesh Bandewar @ 2014-10-02  2:56 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Veaceslav Falico, Andy Gospodarek, David Miller, netdev,
	Eric Dumazet, Maciej Zenczykowski
In-Reply-To: <6356.1412180365@famine>

On Wed, Oct 1, 2014 at 9:49 PM, Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
> Mahesh Bandewar <maheshb@google.com> wrote:
>
>>Earlier change to use usable slave array for TLB mode had an additional
>>performance advantage. So extending the same logic to all other modes
>>that use xmit-hash for slave selection (viz 802.3AD, and XOR modes).
>>Also consolidating this with the earlier TLB change.
>>
>>The main idea is to build the usable slaves array in the control path
>>and use that array for slave selection during xmit operation.
>>
>>Measured performance in a setup with a bond of 4x1G NICs with 200
>>instances of netperf for the modes involved (3ad, xor, tlb)
>>cmd: netperf -t TCP_RR -H <TargetHost> -l 60 -s 5
>>
>>Mode        TPS-Before   TPS-After
>>
>>802.3ad   : 468,694      493,101
>>TLB (lb=0): 392,583      392,965
>>XOR       : 475,696      484,517
>>
>>Signed-off-by: Mahesh Bandewar <maheshb@google.com>
>>---
>>v1:
>>  (a) If bond_update_slave_arr() fails to allocate memory, it will overwrite
>>      the slave that need to be removed.
>>  (b) Freeing of array will assign NULL (to handle bond->down to bond->up
>>      transition gracefully.
>>  (c) Change from pr_debug() to pr_err() if bond_update_slave_arr() returns
>>      failure.
>>  (d) XOR: bond_update_slave_arr() will consider mii-mon, arp-mon cases and
>>      will populate the array even if these parameters are not used.
>>  (e) 3AD: Should handle the ad_agg_selection_logic correctly.
>>v2:
>>  (a) Removed rcu_read_{un}lock() calls from array manipulation code.
>>  (b) Slave link-events now refresh array for all these modes.
>>  (c) Moved free-array call from bond_close() to bond_uninit().
>>v3:
>>  (a) Fixed null pointer dereference.
>>  (b) Removed bond->lock lockdep dependency.
>>v4:
>>  (a) Made to changes to comply with Nikolay's locking changes
>>  (b) Added a work-queue to refresh slave-array when RTNL is not held
>>  (c) Array refresh happens ONLY with RTNL now.
>>  (d) alloc changed from GFP_ATOMIC to GFP_KERNEL
>>v5:
>>  (a) Consolidated all delayed slave-array updates at one place in
>>      3ad_state_machine_handler()
>>v6:
>>  (a) Free slave array when there is no active aggregator
>>
>> drivers/net/bonding/bond_3ad.c  | 140 +++++++++++------------------
>> drivers/net/bonding/bond_alb.c  |  51 ++---------
>> drivers/net/bonding/bond_alb.h  |   8 --
>> drivers/net/bonding/bond_main.c | 192 +++++++++++++++++++++++++++++++++++++---
>> drivers/net/bonding/bonding.h   |  10 +++
>> 5 files changed, 249 insertions(+), 152 deletions(-)
>>
>>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>>index 7e9e522fd476..2110215f3528 100644
>>--- a/drivers/net/bonding/bond_3ad.c
>>+++ b/drivers/net/bonding/bond_3ad.c
>>@@ -102,17 +102,20 @@ static const u8 lacpdu_mcast_addr[ETH_ALEN] = MULTICAST_LACPDU_ADDR;
>> /* ================= main 802.3ad protocol functions ================== */
>> static int ad_lacpdu_send(struct port *port);
>> static int ad_marker_send(struct port *port, struct bond_marker *marker);
>>-static void ad_mux_machine(struct port *port);
>>+static void ad_mux_machine(struct port *port, bool *update_slave_arr);
>> static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port);
>> static void ad_tx_machine(struct port *port);
>> static void ad_periodic_machine(struct port *port);
>>-static void ad_port_selection_logic(struct port *port);
>>-static void ad_agg_selection_logic(struct aggregator *aggregator);
>>+static void ad_port_selection_logic(struct port *port, bool *update_slave_arr);
>>+static void ad_agg_selection_logic(struct aggregator *aggregator,
>>+                                 bool *update_slave_arr);
>> static void ad_clear_agg(struct aggregator *aggregator);
>> static void ad_initialize_agg(struct aggregator *aggregator);
>> static void ad_initialize_port(struct port *port, int lacp_fast);
>>-static void ad_enable_collecting_distributing(struct port *port);
>>-static void ad_disable_collecting_distributing(struct port *port);
>>+static void ad_enable_collecting_distributing(struct port *port,
>>+                                            bool *update_slave_arr);
>>+static void ad_disable_collecting_distributing(struct port *port,
>>+                                             bool *update_slave_arr);
>> static void ad_marker_info_received(struct bond_marker *marker_info,
>>                                   struct port *port);
>> static void ad_marker_response_received(struct bond_marker *marker,
>>@@ -796,8 +799,9 @@ static int ad_marker_send(struct port *port, struct bond_marker *marker)
>> /**
>>  * ad_mux_machine - handle a port's mux state machine
>>  * @port: the port we're looking at
>>+ * @update_slave_arr: Does slave array need update?
>>  */
>>-static void ad_mux_machine(struct port *port)
>>+static void ad_mux_machine(struct port *port, bool *update_slave_arr)
>> {
>>       mux_states_t last_state;
>>
>>@@ -901,7 +905,8 @@ static void ad_mux_machine(struct port *port)
>>               switch (port->sm_mux_state) {
>>               case AD_MUX_DETACHED:
>>                       port->actor_oper_port_state &= ~AD_STATE_SYNCHRONIZATION;
>>-                      ad_disable_collecting_distributing(port);
>>+                      ad_disable_collecting_distributing(port,
>>+                                                         update_slave_arr);
>>                       port->actor_oper_port_state &= ~AD_STATE_COLLECTING;
>>                       port->actor_oper_port_state &= ~AD_STATE_DISTRIBUTING;
>>                       port->ntt = true;
>>@@ -913,13 +918,15 @@ static void ad_mux_machine(struct port *port)
>>                       port->actor_oper_port_state |= AD_STATE_SYNCHRONIZATION;
>>                       port->actor_oper_port_state &= ~AD_STATE_COLLECTING;
>>                       port->actor_oper_port_state &= ~AD_STATE_DISTRIBUTING;
>>-                      ad_disable_collecting_distributing(port);
>>+                      ad_disable_collecting_distributing(port,
>>+                                                         update_slave_arr);
>>                       port->ntt = true;
>>                       break;
>>               case AD_MUX_COLLECTING_DISTRIBUTING:
>>                       port->actor_oper_port_state |= AD_STATE_COLLECTING;
>>                       port->actor_oper_port_state |= AD_STATE_DISTRIBUTING;
>>-                      ad_enable_collecting_distributing(port);
>>+                      ad_enable_collecting_distributing(port,
>>+                                                        update_slave_arr);
>>                       port->ntt = true;
>>                       break;
>>               default:
>>@@ -1187,12 +1194,13 @@ static void ad_periodic_machine(struct port *port)
>> /**
>>  * ad_port_selection_logic - select aggregation groups
>>  * @port: the port we're looking at
>>+ * @update_slave_arr: Does slave array need update?
>>  *
>>  * Select aggregation groups, and assign each port for it's aggregetor. The
>>  * selection logic is called in the inititalization (after all the handshkes),
>>  * and after every lacpdu receive (if selected is off).
>>  */
>>-static void ad_port_selection_logic(struct port *port)
>>+static void ad_port_selection_logic(struct port *port, bool *update_slave_arr)
>
>         Since this function is void, why not have it return a value
> instead of the bool *update_slave_arr?  That would eliminate the need
> for some call sites to pass a "dummy" to the function.  This comment
> applies to ad_agg_selection_logic and ad_enable_collecting_distributing
> as well.
>
Yes, I had similar discussion with Nik earlier and overloading the
return value did not feel clean and future-proof and hence decided to
take this approach.

>         -J
>
>> {
>>       struct aggregator *aggregator, *free_aggregator = NULL, *temp_aggregator;
>>       struct port *last_port = NULL, *curr_port;
>>@@ -1347,7 +1355,7 @@ static void ad_port_selection_logic(struct port *port)
>>                             __agg_ports_are_ready(port->aggregator));
>>
>>       aggregator = __get_first_agg(port);
>>-      ad_agg_selection_logic(aggregator);
>>+      ad_agg_selection_logic(aggregator, update_slave_arr);
>> }
>>
>> /* Decide if "agg" is a better choice for the new active aggregator that
>>@@ -1435,6 +1443,7 @@ static int agg_device_up(const struct aggregator *agg)
>> /**
>>  * ad_agg_selection_logic - select an aggregation group for a team
>>  * @aggregator: the aggregator we're looking at
>>+ * @update_slave_arr: Does slave array need update?
>>  *
>>  * It is assumed that only one aggregator may be selected for a team.
>>  *
>>@@ -1457,7 +1466,8 @@ static int agg_device_up(const struct aggregator *agg)
>>  * __get_active_agg() won't work correctly. This function should be better
>>  * called with the bond itself, and retrieve the first agg from it.
>>  */
>>-static void ad_agg_selection_logic(struct aggregator *agg)
>>+static void ad_agg_selection_logic(struct aggregator *agg,
>>+                                 bool *update_slave_arr)
>> {
>>       struct aggregator *best, *active, *origin;
>>       struct bonding *bond = agg->slave->bond;
>>@@ -1550,6 +1560,8 @@ static void ad_agg_selection_logic(struct aggregator *agg)
>>                               __disable_port(port);
>>                       }
>>               }
>>+              /* Slave array needs update. */
>>+              *update_slave_arr = true;
>>       }
>>
>>       /* if the selected aggregator is of join individuals
>>@@ -1678,24 +1690,30 @@ static void ad_initialize_port(struct port *port, int lacp_fast)
>> /**
>>  * ad_enable_collecting_distributing - enable a port's transmit/receive
>>  * @port: the port we're looking at
>>+ * @update_slave_arr: Does slave array need update?
>>  *
>>  * Enable @port if it's in an active aggregator
>>  */
>>-static void ad_enable_collecting_distributing(struct port *port)
>>+static void ad_enable_collecting_distributing(struct port *port,
>>+                                            bool *update_slave_arr)
>> {
>>       if (port->aggregator->is_active) {
>>               pr_debug("Enabling port %d(LAG %d)\n",
>>                        port->actor_port_number,
>>                        port->aggregator->aggregator_identifier);
>>               __enable_port(port);
>>+              /* Slave array needs update */
>>+              *update_slave_arr = true;
>>       }
>> }
>>
>> /**
>>  * ad_disable_collecting_distributing - disable a port's transmit/receive
>>  * @port: the port we're looking at
>>+ * @update_slave_arr: Does slave array need update?
>>  */
>>-static void ad_disable_collecting_distributing(struct port *port)
>>+static void ad_disable_collecting_distributing(struct port *port,
>>+                                             bool *update_slave_arr)
>> {
>>       if (port->aggregator &&
>>           !MAC_ADDRESS_EQUAL(&(port->aggregator->partner_system),
>>@@ -1704,6 +1722,8 @@ static void ad_disable_collecting_distributing(struct port *port)
>>                        port->actor_port_number,
>>                        port->aggregator->aggregator_identifier);
>>               __disable_port(port);
>>+              /* Slave array needs an update */
>>+              *update_slave_arr = true;
>>       }
>> }
>>
>>@@ -1868,6 +1888,7 @@ void bond_3ad_unbind_slave(struct slave *slave)
>>       struct bonding *bond = slave->bond;
>>       struct slave *slave_iter;
>>       struct list_head *iter;
>>+      bool dummy_slave_update; /* Ignore this value as caller updates array */
>>
>>       /* Sync against bond_3ad_state_machine_handler() */
>>       spin_lock_bh(&bond->mode_lock);
>>@@ -1951,7 +1972,8 @@ void bond_3ad_unbind_slave(struct slave *slave)
>>                               ad_clear_agg(aggregator);
>>
>>                               if (select_new_active_agg)
>>-                                      ad_agg_selection_logic(__get_first_agg(port));
>>+                                      ad_agg_selection_logic(__get_first_agg(port),
>>+                                                             &dummy_slave_update);
>>                       } else {
>>                               netdev_warn(bond->dev, "unbinding aggregator, and could not find a new aggregator for its ports\n");
>>                       }
>>@@ -1966,7 +1988,8 @@ void bond_3ad_unbind_slave(struct slave *slave)
>>                               /* select new active aggregator */
>>                               temp_aggregator = __get_first_agg(port);
>>                               if (temp_aggregator)
>>-                                      ad_agg_selection_logic(temp_aggregator);
>>+                                      ad_agg_selection_logic(temp_aggregator,
>>+                                                             &dummy_slave_update);
>>                       }
>>               }
>>       }
>>@@ -1996,7 +2019,8 @@ void bond_3ad_unbind_slave(struct slave *slave)
>>                                       if (select_new_active_agg) {
>>                                               netdev_info(bond->dev, "Removing an active aggregator\n");
>>                                               /* select new active aggregator */
>>-                                              ad_agg_selection_logic(__get_first_agg(port));
>>+                                              ad_agg_selection_logic(__get_first_agg(port),
>>+                                                                     &dummy_slave_update);
>>                                       }
>>                               }
>>                               break;
>>@@ -2031,6 +2055,7 @@ void bond_3ad_state_machine_handler(struct work_struct *work)
>>       struct slave *slave;
>>       struct port *port;
>>       bool should_notify_rtnl = BOND_SLAVE_NOTIFY_LATER;
>>+      bool update_slave_arr = false;
>>
>>       /* Lock to protect data accessed by all (e.g., port->sm_vars) and
>>        * against running with bond_3ad_unbind_slave. ad_rx_machine may run
>>@@ -2058,7 +2083,7 @@ void bond_3ad_state_machine_handler(struct work_struct *work)
>>                       }
>>
>>                       aggregator = __get_first_agg(port);
>>-                      ad_agg_selection_logic(aggregator);
>>+                      ad_agg_selection_logic(aggregator, &update_slave_arr);
>>               }
>>               bond_3ad_set_carrier(bond);
>>       }
>>@@ -2074,8 +2099,8 @@ void bond_3ad_state_machine_handler(struct work_struct *work)
>>
>>               ad_rx_machine(NULL, port);
>>               ad_periodic_machine(port);
>>-              ad_port_selection_logic(port);
>>-              ad_mux_machine(port);
>>+              ad_port_selection_logic(port, &update_slave_arr);
>>+              ad_mux_machine(port, &update_slave_arr);
>>               ad_tx_machine(port);
>>
>>               /* turn off the BEGIN bit, since we already handled it */
>>@@ -2093,6 +2118,9 @@ re_arm:
>>       rcu_read_unlock();
>>       spin_unlock_bh(&bond->mode_lock);
>>
>>+      if (update_slave_arr)
>>+              bond_slave_arr_work_rearm(bond, 0);
>>+
>>       if (should_notify_rtnl && rtnl_trylock()) {
>>               bond_slave_state_notify(bond);
>>               rtnl_unlock();
>>@@ -2283,6 +2311,11 @@ void bond_3ad_handle_link_change(struct slave *slave, char link)
>>       port->sm_vars |= AD_PORT_BEGIN;
>>
>>       spin_unlock_bh(&slave->bond->mode_lock);
>>+
>>+      /* RTNL is held and mode_lock is released so it's safe
>>+       * to update slave_array here.
>>+       */
>>+      bond_update_slave_arr(slave->bond, NULL);
>> }
>>
>> /**
>>@@ -2377,73 +2410,6 @@ int bond_3ad_get_active_agg_info(struct bonding *bond, struct ad_info *ad_info)
>>       return ret;
>> }
>>
>>-int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
>>-{
>>-      struct bonding *bond = netdev_priv(dev);
>>-      struct slave *slave, *first_ok_slave;
>>-      struct aggregator *agg;
>>-      struct ad_info ad_info;
>>-      struct list_head *iter;
>>-      int slaves_in_agg;
>>-      int slave_agg_no;
>>-      int agg_id;
>>-
>>-      if (__bond_3ad_get_active_agg_info(bond, &ad_info)) {
>>-              netdev_dbg(dev, "__bond_3ad_get_active_agg_info failed\n");
>>-              goto err_free;
>>-      }
>>-
>>-      slaves_in_agg = ad_info.ports;
>>-      agg_id = ad_info.aggregator_id;
>>-
>>-      if (slaves_in_agg == 0) {
>>-              netdev_dbg(dev, "active aggregator is empty\n");
>>-              goto err_free;
>>-      }
>>-
>>-      slave_agg_no = bond_xmit_hash(bond, skb) % slaves_in_agg;
>>-      first_ok_slave = NULL;
>>-
>>-      bond_for_each_slave_rcu(bond, slave, iter) {
>>-              agg = SLAVE_AD_INFO(slave)->port.aggregator;
>>-              if (!agg || agg->aggregator_identifier != agg_id)
>>-                      continue;
>>-
>>-              if (slave_agg_no >= 0) {
>>-                      if (!first_ok_slave && bond_slave_can_tx(slave))
>>-                              first_ok_slave = slave;
>>-                      slave_agg_no--;
>>-                      continue;
>>-              }
>>-
>>-              if (bond_slave_can_tx(slave)) {
>>-                      bond_dev_queue_xmit(bond, skb, slave->dev);
>>-                      goto out;
>>-              }
>>-      }
>>-
>>-      if (slave_agg_no >= 0) {
>>-              netdev_err(dev, "Couldn't find a slave to tx on for aggregator ID %d\n",
>>-                         agg_id);
>>-              goto err_free;
>>-      }
>>-
>>-      /* we couldn't find any suitable slave after the agg_no, so use the
>>-       * first suitable found, if found.
>>-       */
>>-      if (first_ok_slave)
>>-              bond_dev_queue_xmit(bond, skb, first_ok_slave->dev);
>>-      else
>>-              goto err_free;
>>-
>>-out:
>>-      return NETDEV_TX_OK;
>>-err_free:
>>-      /* no suitable interface, frame not sent */
>>-      dev_kfree_skb_any(skb);
>>-      goto out;
>>-}
>>-
>> int bond_3ad_lacpdu_recv(const struct sk_buff *skb, struct bonding *bond,
>>                        struct slave *slave)
>> {
>>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>>index 615f3bebd019..d2eadab787c5 100644
>>--- a/drivers/net/bonding/bond_alb.c
>>+++ b/drivers/net/bonding/bond_alb.c
>>@@ -177,7 +177,6 @@ static int tlb_initialize(struct bonding *bond)
>> static void tlb_deinitialize(struct bonding *bond)
>> {
>>       struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>>-      struct tlb_up_slave *arr;
>>
>>       spin_lock_bh(&bond->mode_lock);
>>
>>@@ -185,10 +184,6 @@ static void tlb_deinitialize(struct bonding *bond)
>>       bond_info->tx_hashtbl = NULL;
>>
>>       spin_unlock_bh(&bond->mode_lock);
>>-
>>-      arr = rtnl_dereference(bond_info->slave_arr);
>>-      if (arr)
>>-              kfree_rcu(arr, rcu);
>> }
>>
>> static long long compute_gap(struct slave *slave)
>>@@ -1336,39 +1331,9 @@ out:
>>       return NETDEV_TX_OK;
>> }
>>
>>-static int bond_tlb_update_slave_arr(struct bonding *bond,
>>-                                   struct slave *skipslave)
>>-{
>>-      struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>>-      struct slave *tx_slave;
>>-      struct list_head *iter;
>>-      struct tlb_up_slave *new_arr, *old_arr;
>>-
>>-      new_arr = kzalloc(offsetof(struct tlb_up_slave, arr[bond->slave_cnt]),
>>-                        GFP_ATOMIC);
>>-      if (!new_arr)
>>-              return -ENOMEM;
>>-
>>-      bond_for_each_slave(bond, tx_slave, iter) {
>>-              if (!bond_slave_can_tx(tx_slave))
>>-                      continue;
>>-              if (skipslave == tx_slave)
>>-                      continue;
>>-              new_arr->arr[new_arr->count++] = tx_slave;
>>-      }
>>-
>>-      old_arr = rtnl_dereference(bond_info->slave_arr);
>>-      rcu_assign_pointer(bond_info->slave_arr, new_arr);
>>-      if (old_arr)
>>-              kfree_rcu(old_arr, rcu);
>>-
>>-      return 0;
>>-}
>>-
>> int bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
>> {
>>       struct bonding *bond = netdev_priv(bond_dev);
>>-      struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>>       struct ethhdr *eth_data;
>>       struct slave *tx_slave = NULL;
>>       u32 hash_index;
>>@@ -1389,12 +1354,14 @@ int bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
>>                                                             hash_index & 0xFF,
>>                                                             skb->len);
>>                       } else {
>>-                              struct tlb_up_slave *slaves;
>>+                              struct bond_up_slave *slaves;
>>+                              unsigned int count;
>>
>>-                              slaves = rcu_dereference(bond_info->slave_arr);
>>-                              if (slaves && slaves->count)
>>+                              slaves = rcu_dereference(bond->slave_arr);
>>+                              count = slaves ? ACCESS_ONCE(slaves->count) : 0;
>>+                              if (likely(count))
>>                                       tx_slave = slaves->arr[hash_index %
>>-                                                             slaves->count];
>>+                                                             count];
>>                       }
>>                       break;
>>               }
>>@@ -1641,10 +1608,6 @@ void bond_alb_deinit_slave(struct bonding *bond, struct slave *slave)
>>               rlb_clear_slave(bond, slave);
>>       }
>>
>>-      if (bond_is_nondyn_tlb(bond))
>>-              if (bond_tlb_update_slave_arr(bond, slave))
>>-                      pr_err("Failed to build slave-array for TLB mode.\n");
>>-
>> }
>>
>> void bond_alb_handle_link_change(struct bonding *bond, struct slave *slave, char link)
>>@@ -1669,7 +1632,7 @@ void bond_alb_handle_link_change(struct bonding *bond, struct slave *slave, char
>>       }
>>
>>       if (bond_is_nondyn_tlb(bond)) {
>>-              if (bond_tlb_update_slave_arr(bond, NULL))
>>+              if (bond_update_slave_arr(bond, NULL))
>>                       pr_err("Failed to build slave-array for TLB mode.\n");
>>       }
>> }
>>diff --git a/drivers/net/bonding/bond_alb.h b/drivers/net/bonding/bond_alb.h
>>index 3c6a7ff974d7..1ad473b4ade5 100644
>>--- a/drivers/net/bonding/bond_alb.h
>>+++ b/drivers/net/bonding/bond_alb.h
>>@@ -139,19 +139,11 @@ struct tlb_slave_info {
>>                        */
>> };
>>
>>-struct tlb_up_slave {
>>-      unsigned int    count;
>>-      struct rcu_head rcu;
>>-      struct slave    *arr[0];
>>-};
>>-
>> struct alb_bond_info {
>>       struct tlb_client_info  *tx_hashtbl; /* Dynamically allocated */
>>       u32                     unbalanced_load;
>>       int                     tx_rebalance_counter;
>>       int                     lp_counter;
>>-      /* -------- non-dynamic tlb mode only ---------*/
>>-      struct tlb_up_slave __rcu *slave_arr;     /* Up slaves */
>>       /* -------- rlb parameters -------- */
>>       int rlb_enabled;
>>       struct rlb_client_info  *rx_hashtbl;    /* Receive hash table */
>>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>index c2adc2755ff6..6f79f495b01a 100644
>>--- a/drivers/net/bonding/bond_main.c
>>+++ b/drivers/net/bonding/bond_main.c
>>@@ -210,6 +210,7 @@ static int bond_init(struct net_device *bond_dev);
>> static void bond_uninit(struct net_device *bond_dev);
>> static struct rtnl_link_stats64 *bond_get_stats(struct net_device *bond_dev,
>>                                               struct rtnl_link_stats64 *stats);
>>+static void bond_slave_arr_handler(struct work_struct *work);
>>
>> /*---------------------------- General routines -----------------------------*/
>>
>>@@ -1551,6 +1552,9 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>>               unblock_netpoll_tx();
>>       }
>>
>>+      if (bond_mode_uses_xmit_hash(bond))
>>+              bond_update_slave_arr(bond, NULL);
>>+
>>       netdev_info(bond_dev, "Enslaving %s as %s interface with %s link\n",
>>                   slave_dev->name,
>>                   bond_is_active_slave(new_slave) ? "an active" : "a backup",
>>@@ -1668,6 +1672,9 @@ static int __bond_release_one(struct net_device *bond_dev,
>>       if (BOND_MODE(bond) == BOND_MODE_8023AD)
>>               bond_3ad_unbind_slave(slave);
>>
>>+      if (bond_mode_uses_xmit_hash(bond))
>>+              bond_update_slave_arr(bond, slave);
>>+
>>       netdev_info(bond_dev, "Releasing %s interface %s\n",
>>                   bond_is_active_slave(slave) ? "active" : "backup",
>>                   slave_dev->name);
>>@@ -1970,6 +1977,9 @@ static void bond_miimon_commit(struct bonding *bond)
>>                               bond_alb_handle_link_change(bond, slave,
>>                                                           BOND_LINK_UP);
>>
>>+                      if (BOND_MODE(bond) == BOND_MODE_XOR)
>>+                              bond_update_slave_arr(bond, NULL);
>>+
>>                       if (!bond->curr_active_slave || slave == primary)
>>                               goto do_failover;
>>
>>@@ -1997,6 +2007,9 @@ static void bond_miimon_commit(struct bonding *bond)
>>                               bond_alb_handle_link_change(bond, slave,
>>                                                           BOND_LINK_DOWN);
>>
>>+                      if (BOND_MODE(bond) == BOND_MODE_XOR)
>>+                              bond_update_slave_arr(bond, NULL);
>>+
>>                       if (slave == rcu_access_pointer(bond->curr_active_slave))
>>                               goto do_failover;
>>
>>@@ -2453,6 +2466,8 @@ static void bond_loadbalance_arp_mon(struct work_struct *work)
>>
>>               if (slave_state_changed) {
>>                       bond_slave_state_change(bond);
>>+                      if (BOND_MODE(bond) == BOND_MODE_XOR)
>>+                              bond_update_slave_arr(bond, NULL);
>>               } else if (do_failover) {
>>                       block_netpoll_tx();
>>                       bond_select_active_slave(bond);
>>@@ -2829,8 +2844,20 @@ static int bond_slave_netdev_event(unsigned long event,
>>                       if (old_duplex != slave->duplex)
>>                               bond_3ad_adapter_duplex_changed(slave);
>>               }
>>+              /* Refresh slave-array if applicable!
>>+               * If the setup does not use miimon or arpmon (mode-specific!),
>>+               * then these events will not cause the slave-array to be
>>+               * refreshed. This will cause xmit to use a slave that is not
>>+               * usable. Avoid such situation by refeshing the array at these
>>+               * events. If these (miimon/arpmon) parameters are configured
>>+               * then array gets refreshed twice and that should be fine!
>>+               */
>>+              if (bond_mode_uses_xmit_hash(bond))
>>+                      bond_update_slave_arr(bond, NULL);
>>               break;
>>       case NETDEV_DOWN:
>>+              if (bond_mode_uses_xmit_hash(bond))
>>+                      bond_update_slave_arr(bond, NULL);
>>               break;
>>       case NETDEV_CHANGEMTU:
>>               /* TODO: Should slaves be allowed to
>>@@ -3010,6 +3037,7 @@ static void bond_work_init_all(struct bonding *bond)
>>       else
>>               INIT_DELAYED_WORK(&bond->arp_work, bond_loadbalance_arp_mon);
>>       INIT_DELAYED_WORK(&bond->ad_work, bond_3ad_state_machine_handler);
>>+      INIT_DELAYED_WORK(&bond->slave_arr_work, bond_slave_arr_handler);
>> }
>>
>> static void bond_work_cancel_all(struct bonding *bond)
>>@@ -3019,6 +3047,7 @@ static void bond_work_cancel_all(struct bonding *bond)
>>       cancel_delayed_work_sync(&bond->alb_work);
>>       cancel_delayed_work_sync(&bond->ad_work);
>>       cancel_delayed_work_sync(&bond->mcast_work);
>>+      cancel_delayed_work_sync(&bond->slave_arr_work);
>> }
>>
>> static int bond_open(struct net_device *bond_dev)
>>@@ -3068,6 +3097,9 @@ static int bond_open(struct net_device *bond_dev)
>>               bond_3ad_initiate_agg_selection(bond, 1);
>>       }
>>
>>+      if (bond_mode_uses_xmit_hash(bond))
>>+              bond_update_slave_arr(bond, NULL);
>>+
>>       return 0;
>> }
>>
>>@@ -3573,20 +3605,148 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *bond_d
>>       return NETDEV_TX_OK;
>> }
>>
>>-/* In bond_xmit_xor() , we determine the output device by using a pre-
>>- * determined xmit_hash_policy(), If the selected device is not enabled,
>>- * find the next active slave.
>>+/* Use this to update slave_array when (a) it's not appropriate to update
>>+ * slave_array right away (note that update_slave_array() may sleep)
>>+ * and / or (b) RTNL is not held.
>>  */
>>-static int bond_xmit_xor(struct sk_buff *skb, struct net_device *bond_dev)
>>+void bond_slave_arr_work_rearm(struct bonding *bond, unsigned long delay)
>> {
>>-      struct bonding *bond = netdev_priv(bond_dev);
>>-      int slave_cnt = ACCESS_ONCE(bond->slave_cnt);
>>+      queue_delayed_work(bond->wq, &bond->slave_arr_work, delay);
>>+}
>>
>>-      if (likely(slave_cnt))
>>-              bond_xmit_slave_id(bond, skb,
>>-                                 bond_xmit_hash(bond, skb) % slave_cnt);
>>-      else
>>+/* Slave array work handler. Holds only RTNL */
>>+static void bond_slave_arr_handler(struct work_struct *work)
>>+{
>>+      struct bonding *bond = container_of(work, struct bonding,
>>+                                          slave_arr_work.work);
>>+      int ret;
>>+
>>+      if (!rtnl_trylock())
>>+              goto err;
>>+
>>+      ret = bond_update_slave_arr(bond, NULL);
>>+      rtnl_unlock();
>>+      if (ret) {
>>+              pr_warn_ratelimited("Failed to update slave array from WT\n");
>>+              goto err;
>>+      }
>>+      return;
>>+
>>+err:
>>+      bond_slave_arr_work_rearm(bond, 1);
>>+}
>>+
>>+/* Build the usable slaves array in control path for modes that use xmit-hash
>>+ * to determine the slave interface -
>>+ * (a) BOND_MODE_8023AD
>>+ * (b) BOND_MODE_XOR
>>+ * (c) BOND_MODE_TLB && tlb_dynamic_lb == 0
>>+ *
>>+ * The caller is expected to hold RTNL only and NO other lock!
>>+ */
>>+int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
>>+{
>>+      struct slave *slave;
>>+      struct list_head *iter;
>>+      struct bond_up_slave *new_arr, *old_arr;
>>+      int slaves_in_agg;
>>+      int agg_id = 0;
>>+      int ret = 0;
>>+
>>+#ifdef CONFIG_LOCKDEP
>>+      WARN_ON(lockdep_is_held(&bond->mode_lock));
>>+#endif
>>+
>>+      new_arr = kzalloc(offsetof(struct bond_up_slave, arr[bond->slave_cnt]),
>>+                        GFP_KERNEL);
>>+      if (!new_arr) {
>>+              ret = -ENOMEM;
>>+              pr_err("Failed to build slave-array.\n");
>>+              goto out;
>>+      }
>>+      if (BOND_MODE(bond) == BOND_MODE_8023AD) {
>>+              struct ad_info ad_info;
>>+
>>+              if (bond_3ad_get_active_agg_info(bond, &ad_info)) {
>>+                      pr_debug("bond_3ad_get_active_agg_info failed\n");
>>+                      kfree_rcu(new_arr, rcu);
>>+                      /* No active aggragator means its not safe to use
>>+                       * the previous array.
>>+                       */
>>+                      old_arr = rtnl_dereference(bond->slave_arr);
>>+                      if (old_arr) {
>>+                              RCU_INIT_POINTER(bond->slave_arr, NULL);
>>+                              kfree_rcu(old_arr, rcu);
>>+                      }
>>+                      goto out;
>>+              }
>>+              slaves_in_agg = ad_info.ports;
>>+              agg_id = ad_info.aggregator_id;
>>+      }
>>+      bond_for_each_slave(bond, slave, iter) {
>>+              if (BOND_MODE(bond) == BOND_MODE_8023AD) {
>>+                      struct aggregator *agg;
>>+
>>+                      agg = SLAVE_AD_INFO(slave)->port.aggregator;
>>+                      if (!agg || agg->aggregator_identifier != agg_id)
>>+                              continue;
>>+              }
>>+              if (!bond_slave_can_tx(slave))
>>+                      continue;
>>+              if (skipslave == slave)
>>+                      continue;
>>+              new_arr->arr[new_arr->count++] = slave;
>>+      }
>>+
>>+      old_arr = rtnl_dereference(bond->slave_arr);
>>+      rcu_assign_pointer(bond->slave_arr, new_arr);
>>+      if (old_arr)
>>+              kfree_rcu(old_arr, rcu);
>>+out:
>>+      if (ret != 0 && skipslave) {
>>+              int idx;
>>+
>>+              /* Rare situation where caller has asked to skip a specific
>>+               * slave but allocation failed (most likely!). BTW this is
>>+               * only possible when the call is initiated from
>>+               * __bond_release_one(). In this situation; overwrite the
>>+               * skipslave entry in the array with the last entry from the
>>+               * array to avoid a situation where the xmit path may choose
>>+               * this to-be-skipped slave to send a packet out.
>>+               */
>>+              old_arr = rtnl_dereference(bond->slave_arr);
>>+              for (idx = 0; idx < old_arr->count; idx++) {
>>+                      if (skipslave == old_arr->arr[idx]) {
>>+                              old_arr->arr[idx] =
>>+                                  old_arr->arr[old_arr->count-1];
>>+                              old_arr->count--;
>>+                              break;
>>+                      }
>>+              }
>>+      }
>>+      return ret;
>>+}
>>+
>>+/* Use this Xmit function for 3AD as well as XOR modes. The current
>>+ * usable slave array is formed in the control path. The xmit function
>>+ * just calculates hash and sends the packet out.
>>+ */
>>+int bond_3ad_xor_xmit(struct sk_buff *skb, struct net_device *dev)
>>+{
>>+      struct bonding *bond = netdev_priv(dev);
>>+      struct slave *slave;
>>+      struct bond_up_slave *slaves;
>>+      unsigned int count;
>>+
>>+      slaves = rcu_dereference(bond->slave_arr);
>>+      count = slaves ? ACCESS_ONCE(slaves->count) : 0;
>>+      if (likely(count)) {
>>+              slave = slaves->arr[bond_xmit_hash(bond, skb) % count];
>>+              bond_dev_queue_xmit(bond, skb, slave->dev);
>>+      } else {
>>               dev_kfree_skb_any(skb);
>>+              atomic_long_inc(&dev->tx_dropped);
>>+      }
>>
>>       return NETDEV_TX_OK;
>> }
>>@@ -3682,12 +3842,11 @@ static netdev_tx_t __bond_start_xmit(struct sk_buff *skb, struct net_device *dev
>>               return bond_xmit_roundrobin(skb, dev);
>>       case BOND_MODE_ACTIVEBACKUP:
>>               return bond_xmit_activebackup(skb, dev);
>>+      case BOND_MODE_8023AD:
>>       case BOND_MODE_XOR:
>>-              return bond_xmit_xor(skb, dev);
>>+              return bond_3ad_xor_xmit(skb, dev);
>>       case BOND_MODE_BROADCAST:
>>               return bond_xmit_broadcast(skb, dev);
>>-      case BOND_MODE_8023AD:
>>-              return bond_3ad_xmit_xor(skb, dev);
>>       case BOND_MODE_ALB:
>>               return bond_alb_xmit(skb, dev);
>>       case BOND_MODE_TLB:
>>@@ -3861,6 +4020,7 @@ static void bond_uninit(struct net_device *bond_dev)
>>       struct bonding *bond = netdev_priv(bond_dev);
>>       struct list_head *iter;
>>       struct slave *slave;
>>+      struct bond_up_slave *arr;
>>
>>       bond_netpoll_cleanup(bond_dev);
>>
>>@@ -3869,6 +4029,12 @@ static void bond_uninit(struct net_device *bond_dev)
>>               __bond_release_one(bond_dev, slave->dev, true);
>>       netdev_info(bond_dev, "Released all slaves\n");
>>
>>+      arr = rtnl_dereference(bond->slave_arr);
>>+      if (arr) {
>>+              kfree_rcu(arr, rcu);
>>+              RCU_INIT_POINTER(bond->slave_arr, NULL);
>>+      }
>>+
>>       list_del(&bond->bond_list);
>>
>>       bond_debug_unregister(bond);
>>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>>index 5b022da9cad2..10920f0686e2 100644
>>--- a/drivers/net/bonding/bonding.h
>>+++ b/drivers/net/bonding/bonding.h
>>@@ -179,6 +179,12 @@ struct slave {
>>       struct rtnl_link_stats64 slave_stats;
>> };
>>
>>+struct bond_up_slave {
>>+      unsigned int    count;
>>+      struct rcu_head rcu;
>>+      struct slave    *arr[0];
>>+};
>>+
>> /*
>>  * Link pseudo-state only used internally by monitors
>>  */
>>@@ -193,6 +199,7 @@ struct bonding {
>>       struct   slave __rcu *curr_active_slave;
>>       struct   slave __rcu *current_arp_slave;
>>       struct   slave __rcu *primary_slave;
>>+      struct   bond_up_slave __rcu *slave_arr; /* Array of usable slaves */
>>       bool     force_primary;
>>       s32      slave_cnt; /* never change this value outside the attach/detach wrappers */
>>       int     (*recv_probe)(const struct sk_buff *, struct bonding *,
>>@@ -222,6 +229,7 @@ struct bonding {
>>       struct   delayed_work alb_work;
>>       struct   delayed_work ad_work;
>>       struct   delayed_work mcast_work;
>>+      struct   delayed_work slave_arr_work;
>> #ifdef CONFIG_DEBUG_FS
>>       /* debugging support via debugfs */
>>       struct   dentry *debug_dir;
>>@@ -534,6 +542,8 @@ const char *bond_slave_link_status(s8 link);
>> struct bond_vlan_tag *bond_verify_device_path(struct net_device *start_dev,
>>                                             struct net_device *end_dev,
>>                                             int level);
>>+int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave);
>>+void bond_slave_arr_work_rearm(struct bonding *bond, unsigned long delay);
>>
>> #ifdef CONFIG_PROC_FS
>> void bond_create_proc_entry(struct bonding *bond);
>>--
>>2.1.0.rc2.206.gedb03e5
>
> ---
>         -Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply

* Re: [PATCH net-next v6 2/2] bonding: Simplify the xmit function for modes that use xmit_hash
From: Mahesh Bandewar @ 2014-10-02  2:52 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David Miller,
	netdev, Eric Dumazet, Maciej Zenczykowski
In-Reply-To: <542BD293.5090404@redhat.com>

On Wed, Oct 1, 2014 at 3:38 PM, Nikolay Aleksandrov <nikolay@redhat.com> wrote:
> On 01/10/14 10:38, Mahesh Bandewar wrote:
>>
>> Earlier change to use usable slave array for TLB mode had an additional
>> performance advantage. So extending the same logic to all other modes
>> that use xmit-hash for slave selection (viz 802.3AD, and XOR modes).
>> Also consolidating this with the earlier TLB change.
>>
>> The main idea is to build the usable slaves array in the control path
>> and use that array for slave selection during xmit operation.
>>
>> Measured performance in a setup with a bond of 4x1G NICs with 200
>> instances of netperf for the modes involved (3ad, xor, tlb)
>> cmd: netperf -t TCP_RR -H <TargetHost> -l 60 -s 5
>>
>> Mode        TPS-Before   TPS-After
>>
>> 802.3ad   : 468,694      493,101
>> TLB (lb=0): 392,583      392,965
>> XOR       : 475,696      484,517
>>
>> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
>> ---
>> v1:
>>    (a) If bond_update_slave_arr() fails to allocate memory, it will
>> overwrite
>>        the slave that need to be removed.
>>    (b) Freeing of array will assign NULL (to handle bond->down to bond->up
>>        transition gracefully.
>>    (c) Change from pr_debug() to pr_err() if bond_update_slave_arr()
>> returns
>>        failure.
>>    (d) XOR: bond_update_slave_arr() will consider mii-mon, arp-mon cases
>> and
>>        will populate the array even if these parameters are not used.
>>    (e) 3AD: Should handle the ad_agg_selection_logic correctly.
>> v2:
>>    (a) Removed rcu_read_{un}lock() calls from array manipulation code.
>>    (b) Slave link-events now refresh array for all these modes.
>>    (c) Moved free-array call from bond_close() to bond_uninit().
>> v3:
>>    (a) Fixed null pointer dereference.
>>    (b) Removed bond->lock lockdep dependency.
>> v4:
>>    (a) Made to changes to comply with Nikolay's locking changes
>>    (b) Added a work-queue to refresh slave-array when RTNL is not held
>>    (c) Array refresh happens ONLY with RTNL now.
>>    (d) alloc changed from GFP_ATOMIC to GFP_KERNEL
>> v5:
>>    (a) Consolidated all delayed slave-array updates at one place in
>>        3ad_state_machine_handler()
>> v6:
>>    (a) Free slave array when there is no active aggregator
>>
>>   drivers/net/bonding/bond_3ad.c  | 140 +++++++++++------------------
>>   drivers/net/bonding/bond_alb.c  |  51 ++---------
>>   drivers/net/bonding/bond_alb.h  |   8 --
>>   drivers/net/bonding/bond_main.c | 192
>> +++++++++++++++++++++++++++++++++++++---
>>   drivers/net/bonding/bonding.h   |  10 +++
>>   5 files changed, 249 insertions(+), 152 deletions(-)
>>
> <<<snip>>>
>>
>> @@ -3869,6 +4029,12 @@ static void bond_uninit(struct net_device
>> *bond_dev)
>>                 __bond_release_one(bond_dev, slave->dev, true);
>>         netdev_info(bond_dev, "Released all slaves\n");
>>
>> +       arr = rtnl_dereference(bond->slave_arr);
>> +       if (arr) {
>> +               kfree_rcu(arr, rcu);
>> +               RCU_INIT_POINTER(bond->slave_arr, NULL);
>> +       }
>> +
>>         list_del(&bond->bond_list);
>>
>>         bond_debug_unregister(bond);
>
> <<<snip>>>
> I'm fine with this version, just one last question about something I just
> noticed in the hunk above:
> You first call kfree_rcu() and then RCU_INIT_POINTER(). This feels wrong as
> the currently used slave_arr can get freed before it's set to NULL if we get
> preempted after the kfree_rcu(). Now, I know it's not really a problem
> because at this point the bond device has been closed and shouldn't operate,
> but just in case I think it'd be nice to first NULL it and call kfree_rcu()
> after that.
>
I don't see that as a problem but that's a trivial change and I'll
reverse the order.

> Thanks for all your hard work on this.
>
> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
>

^ permalink raw reply

* [PATCH] drivers:ethernet:davinci_emac.c:Fixes flaw in mac address handling.
From: Michael Welling @ 2014-10-02  2:32 UTC (permalink / raw)
  To: davem, tony, netdev, linux-kernel; +Cc: Michael Welling

The code currently checks the mac_addr variable that is clearly
zero'd out during allocation.

Further code is added to bring the mac_addr from the partial pdata.

Signed-off-by: Michael Welling <mwelling@ieee.org>
---
 drivers/net/ethernet/ti/davinci_emac.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index ea71251..e06f97c 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -1804,11 +1804,9 @@ davinci_emac_of_get_pdata(struct platform_device *pdev, struct emac_priv *priv)
 	np = pdev->dev.of_node;
 	pdata->version = EMAC_VERSION_2;
 
-	if (!is_valid_ether_addr(pdata->mac_addr)) {
-		mac_addr = of_get_mac_address(np);
-		if (mac_addr)
-			memcpy(pdata->mac_addr, mac_addr, ETH_ALEN);
-	}
+	mac_addr = of_get_mac_address(np);
+	if (mac_addr)
+		memcpy(pdata->mac_addr, mac_addr, ETH_ALEN);
 
 	of_property_read_u32(np, "ti,davinci-ctrl-reg-offset",
 			     &pdata->ctrl_reg_offset);
@@ -1834,6 +1832,8 @@ davinci_emac_of_get_pdata(struct platform_device *pdev, struct emac_priv *priv)
 	if (auxdata) {
 		pdata->interrupt_enable = auxdata->interrupt_enable;
 		pdata->interrupt_disable = auxdata->interrupt_disable;
+		if (is_valid_ether_addr(auxdata->mac_addr))
+			memcpy(pdata->mac_addr, auxdata->mac_addr, ETH_ALEN);
 	}
 
 	match = of_match_device(davinci_emac_of_match, &pdev->dev);
-- 
1.7.9.5

^ permalink raw reply related

* Re: [net-next 1/1] bna: Update Maintainer Email
From: David Miller @ 2014-10-02  2:13 UTC (permalink / raw)
  To: rasesh.mody; +Cc: netdev
In-Reply-To: <1412198441-32366-1-git-send-email-rasesh.mody@qlogic.com>

From: <rasesh.mody@qlogic.com>
Date: Wed, 1 Oct 2014 17:20:41 -0400

> From: Rasesh Mody <rasesh.mody@qlogic.com>
> 
> Update the maintainer email for BNA driver.
> 
> Signed-off-by: Rasesh Mody <rasesh.mody@qlogic.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH v2 net-next] net: phy: add BCM7425 and BCM7429 PHYs
From: David Miller @ 2014-10-02  2:12 UTC (permalink / raw)
  To: pgynther; +Cc: netdev, f.fainelli
In-Reply-To: <20141001185802.4479F100A08@puck.mtv.corp.google.com>

From: Petri Gynther <pgynther@google.com>
Date: Wed,  1 Oct 2014 11:58:02 -0700 (PDT)

> Signed-off-by: Petri Gynther <pgynther@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net: bcmgenet: fix bcmgenet_put_tx_csum()
From: David Miller @ 2014-10-02  2:12 UTC (permalink / raw)
  To: pgynther; +Cc: netdev, f.fainelli
In-Reply-To: <20141001183001.C7D68100A08@puck.mtv.corp.google.com>

From: Petri Gynther <pgynther@google.com>
Date: Wed,  1 Oct 2014 11:30:01 -0700 (PDT)

> bcmgenet_put_tx_csum() needs to return skb pointer back to the caller
> because it reallocates a new one in case of lack of skb headroom.
> 
> Signed-off-by: Petri Gynther <pgynther@google.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH v2 net-next] net: pktgen: packet bursting via skb->xmit_more
From: David Miller @ 2014-10-02  2:08 UTC (permalink / raw)
  To: ast; +Cc: edumazet, brouer, netdev
In-Reply-To: <1412124801-32096-1-git-send-email-ast@plumgrid.com>

From: Alexei Starovoitov <ast@plumgrid.com>
Date: Tue, 30 Sep 2014 17:53:21 -0700

> This patch demonstrates the effect of delaying update of HW tailptr.
> (based on earlier patch by Jesper)
> 
> burst=1 is the default. It sends one packet with xmit_more=false
> burst=2 sends one packet with xmit_more=true and
>         2nd copy of the same packet with xmit_more=false
> burst=3 sends two copies of the same packet with xmit_more=true and
>         3rd copy with xmit_more=false
> 
> Performance with ixgbe (usec 30):
> burst=1  tx:9.2 Mpps
> burst=2  tx:13.5 Mpps
> burst=3  tx:14.5 Mpps full 10G line rate
> 
> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>

Applied, great work.

^ permalink raw reply

* Re: [PATCH net-next] net: bridge: add a br_set_state helper function
From: David Miller @ 2014-10-02  2:05 UTC (permalink / raw)
  To: f.fainelli; +Cc: stephen, netdev, vyasevich, bridge, jiri
In-Reply-To: <1412118799-17128-1-git-send-email-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Tue, 30 Sep 2014 16:13:19 -0700

> In preparation for being able to propagate port states to e.g: notifiers
> or other kernel parts, do not manipulate the port state directly, but
> instead use a helper function which will allow us to do a bit more than
> just setting the state.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

This looks fine to me, applied, thanks Florian.

^ permalink raw reply

* Re: [Patch net-next] net_sched: avoid calling tcf_unbind_filter() in call_rcu callback
From: David Miller @ 2014-10-02  2:01 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, john.r.fastabend
In-Reply-To: <1412118444-29179-2-git-send-email-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Tue, 30 Sep 2014 16:07:24 -0700

> This fixes the following crash:
 ...
> tp could be freed in call_rcu callback too, the order is not guaranteed.
> 
> Cc: John Fastabend <john.r.fastabend@intel.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied, and I added John's description of why this is legal to the
commit message.

^ permalink raw reply

* Re: [Patch net-next] net_sched: fix another crash in cls_tcindex
From: David Miller @ 2014-10-02  2:01 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, john.r.fastabend
In-Reply-To: <1412118444-29179-1-git-send-email-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Tue, 30 Sep 2014 16:07:23 -0700

> This patch fixes the following crash:
 ...
> struct list_head can not be simply copied and we should always init it.
> 
> Cc: John Fastabend <john.r.fastabend@intel.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied.

^ permalink raw reply

* Soft lockup in 3.17-rc7 when using PPP over L2TP over IPSEC
From: Alan Stern @ 2014-10-02  1:59 UTC (permalink / raw)
  To: netdev, linux-ppp, linux-wireless; +Cc: Kernel development list

I reliably get the following lockup when trying to set up a VPN tunnel 
using L2TP over IPSEC:

[ 2214.970639] BUG: soft lockup - CPU#1 stuck for 22s! [pppd:9423]
[ 2214.970648] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core pppoe pppox ppp_generic slhc authenc cmac rmd160 crypto_null ip_vti ip_tunnel af_key ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_ro xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm6_tunnel tunnel6 xfrm_ipcomp salsa20_i586 camellia_generic cast6_generic cast5_generic cast_common deflate cts gcm ccm serpent_sse2_i586 serpent_generic glue_helper blowfish_generic blowfish_common twofish_generic twofish_i586 twofish_common xcbc sha512_generic des_generic geode_aes tpm_rng tpm timeriomem_rng virtio_rng uas usb_storage fuse ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 ip6table_filter xt_conntrack ip6_tables nf_con
 ntrack vfat
[ 2214.970769]  fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel arc4 iwldvm snd_hda_controller snd_hda_codec uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev mac80211 snd_hwdep coretemp kvm_intel kvm media snd_seq snd_seq_device iTCO_wdt iTCO_vendor_support snd_pcm snd_timer snd joydev iwlwifi microcode serio_raw cfg80211 asus_laptop lpc_ich atl1c soundcore sparse_keymap rfkill input_polldev acpi_cpufreq binfmt_misc i915 i2c_algo_bit drm_kms_helper drm i2c_core video
[ 2214.970854] CPU: 1 PID: 9423 Comm: pppd Tainted: G        W     3.16.3-200.fc20.i686 #1
[ 2214.970860] Hardware name: ASUSTeK Computer Inc.         UL20A               /UL20A     , BIOS 207     11/02/2009
[ 2214.970866] task: f0706a00 ti: e359c000 task.ti: e359c000
[ 2214.970873] EIP: 0060:[<c0a077b8>] EFLAGS: 00200287 CPU: 1
[ 2214.970885] EIP is at _raw_spin_lock_bh+0x28/0x40
[ 2214.970890] EAX: e5ff02a4 EBX: e5ff02a4 ECX: 00000060 EDX: 0000005f
[ 2214.970895] ESI: e5ff02b0 EDI: e3470d40 EBP: e359dc34 ESP: e359dc34
[ 2214.970900]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 2214.970906] CR0: 8005003b CR2: b72eb000 CR3: 25f28000 CR4: 000407d0
[ 2214.970910] Stack:
[ 2214.970914]  e359dc9c f94efe2a f5140000 e359dc50 c045aba6 f5140000 00200286 e359dc78
[ 2214.970929]  c045c553 00000001 f5140000 001f9076 00200286 628a17c6 f075acc0 f075acc0
[ 2214.970942]  00000000 00200246 f4464848 00200246 00200246 e359dc9c e3470d84 f075acc0
[ 2214.970957] Call Trace:
[ 2214.970973]  [<f94efe2a>] ppp_push+0x32a/0x550 [ppp_generic]
[ 2214.970986]  [<c045aba6>] ? internal_add_timer+0x26/0x60
[ 2214.970994]  [<c045c553>] ? mod_timer_pending+0x63/0x130
[ 2214.971005]  [<f94f288d>] ppp_xmit_process+0x3cd/0x5e0 [ppp_generic]
[ 2214.971007]  [<c0914ae1>] ? harmonize_features+0x31/0x1d0
[ 2214.971007]  [<f94f2c78>] ppp_start_xmit+0x108/0x180 [ppp_generic]
[ 2214.971007]  [<c0915024>] dev_hard_start_xmit+0x2c4/0x540
[ 2214.971007]  [<c093244f>] sch_direct_xmit+0x9f/0x170
[ 2214.971007]  [<c091546a>] __dev_queue_xmit+0x1ca/0x430
[ 2214.971007]  [<c094c9b0>] ? ip_fragment+0x930/0x930
[ 2214.971007]  [<c09156df>] dev_queue_xmit+0xf/0x20
[ 2214.971007]  [<c091bacf>] neigh_direct_output+0xf/0x20
[ 2214.971007]  [<c094cb5a>] ip_finish_output+0x1aa/0x850
[ 2214.971007]  [<c094c9b0>] ? ip_fragment+0x930/0x930
[ 2214.971007]  [<c094dbbf>] ip_output+0x8f/0xe0
[ 2214.971007]  [<c094c9b0>] ? ip_fragment+0x930/0x930
[ 2214.971007]  [<c09a4f52>] xfrm_output_resume+0x342/0x3a0
[ 2214.971007]  [<c09a5013>] xfrm_output+0x43/0xf0
[ 2214.971007]  [<c0998f4d>] xfrm4_output_finish+0x3d/0x40
[ 2214.971007]  [<c0998e25>] __xfrm4_output+0x25/0x40
[ 2214.971007]  [<c0998f7f>] xfrm4_output+0x2f/0x70
[ 2214.971007]  [<c0998e00>] ? xfrm4_udp_encap_rcv+0x1b0/0x1b0
[ 2214.971007]  [<c094d2e7>] ip_local_out_sk+0x27/0x30
[ 2214.971007]  [<c094d5f4>] ip_queue_xmit+0x124/0x3f0
[ 2214.971007]  [<c0999f04>] ? xfrm_bundle_ok+0x64/0x170
[ 2214.971007]  [<c099a0ab>] ? xfrm_dst_check+0x1b/0x30
[ 2214.971007]  [<f94fd618>] l2tp_xmit_skb+0x298/0x4b0 [l2tp_core]
[ 2214.971007]  [<f950cd04>] pppol2tp_xmit+0x124/0x1d0 [l2tp_ppp]
[ 2214.971007]  [<f94f2adb>] ppp_channel_push+0x3b/0xb0 [ppp_generic]
[ 2214.971007]  [<f94f2d77>] ppp_write+0x87/0xc8 [ppp_generic]
[ 2214.971007]  [<f94f2cf0>] ? ppp_start_xmit+0x180/0x180 [ppp_generic]
[ 2214.971007]  [<c057723d>] vfs_write+0x9d/0x1d0
[ 2214.971007]  [<c0577951>] SyS_write+0x51/0xb0
[ 2214.971007]  [<c0a07b9f>] sysenter_do_call+0x12/0x12
[ 2214.971007] Code: 00 00 00 55 89 e5 66 66 66 66 90 64 81 05 90 b6 dc c0 00 02 00 00 ba 00 01 00 00 f0 66 0f c1 10 0f b6 ce 38 d1 75 04 5d c3 f3 90 <0f> b6 10 38 ca 75 f7 5d c3 90 90 90 90 90 90 90 90 90 90 90 90
[ 2220.002045] iwlwifi 0000:01:00.0: No space in command queue
[ 2220.002058] iwlwifi 0000:01:00.0: Restarting adapter queue is full
[ 2220.002073] iwlwifi 0000:01:00.0: Error sending REPLY_LEDS_CMD: enqueue_hcmd failed: -28
[ 2220.002524] ieee80211 phy0: Hardware restart was requested

A few seconds after this appears, the second CPU also locks up and the 
system becomes useless.

I don't know if the problem is in the networking core or in the 
wireless driver.  If anyone wants, I can try testing using a wired 
Ethernet connection.

Note: This problem is not new.  It has been happening for at least the
last several kernel versions -- in fact, I don't know if this has ever
worked.

Can this be debugged and fixed?

Alan Stern


^ permalink raw reply

* Re: [PATCH net-next] enic: add client name to port profile
From: David Miller @ 2014-10-02  1:55 UTC (permalink / raw)
  To: _govind; +Cc: netdev, ssujith, benve
In-Reply-To: <1412114357-11557-1-git-send-email-_govind@gmx.com>

From: Govindarajulu Varadarajan <_govind@gmx.com>
Date: Wed,  1 Oct 2014 03:29:17 +0530

> Firmware has support for sending client name of the port profile to the switch
> it's connected to.
> 
> This patch adds client name to port profile which is sent to hardware while
> associating a port profile to VF.
> 
> Since port profile are defined in switch, this patch makes it easier to check
> what VM is using a port profile.
> 
> Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>

I think you should split this up into two patches, one which adds
the new netlink facility, and second which adds support to the
specific driver.

Thanks.

^ permalink raw reply

* Re: [PATCH v2 net 0/3] bridge: Add vlan filtering support for default pvid
From: David Miller @ 2014-10-02  1:53 UTC (permalink / raw)
  To: vyasevich; +Cc: stephen, netdev, vyasevic, bridge
In-Reply-To: <1412105462-340-1-git-send-email-vyasevic@redhat.com>

From: Vladislav Yasevich <vyasevich@gmail.com>
Date: Tue, 30 Sep 2014 15:30:59 -0400

> Version 2 of the series to introduce the default pvid support to
> vlan filtering in the bridge.  VLAN 1 (as recommended by 802.1q spec)
> is used as default pvid on ports. 
> The the user can over-ride this configuration by configuring their
> own vlan information. 
> The user can additionally change the default value throught the
> sysfs interface (netlink comming shortly).
> The user can turn off default pvid functionality by setting default
> pvid to 0. 
> This series changes the default behavior of the bridge when
> vlan filtering is turned on.  Currently, ports without any vlan
> filtering configured will not recevie any traffic at all.  This patch
> changes the behavior of the above ports to receive only untagged traffic.
> 
> Since v2:
> - Add ability to turn off default_pvid settings.
> - Drop the automiatic filtering support based on configured vlan devices (will
>   be its own series)

Please address the given feedback, thanks Vladislav.

^ permalink raw reply

* Re: [PATCH v2 net-next 0/5] udp: Generalize GSO for UDP tunnels
From: David Miller @ 2014-10-02  1:36 UTC (permalink / raw)
  To: therbert; +Cc: netdev
In-Reply-To: <1412047353-28502-1-git-send-email-therbert@google.com>

From: Tom Herbert <therbert@google.com>
Date: Mon, 29 Sep 2014 20:22:28 -0700

> This patch set generalizes the UDP tunnel segmentation functions so
> that they can work with various protocol encapsulations. The primary
> change is to set the inner_protocol field in the skbuff when creating
> the encapsulated packet, and then in skb_udp_tunnel_segment this data
> is used to determine the function for segmenting the encapsulated
> packet. The inner_protocol field is overloaded to take either an
> Ethertype or IP protocol.
> 
> The inner_protocol is set on transmit using skb_set_inner_ipproto or
> skb_set_inner_protocol functions. VXLAN and IP tunnels (for fou GSO)
> were modified to call these.

Series applied, thanks Tom.

^ permalink raw reply

* Re: [PATCH net-next 0/2] bpf: add search pruning optimization and tests
From: David Miller @ 2014-10-02  1:31 UTC (permalink / raw)
  To: ast
  Cc: mingo, torvalds, luto, dborkman, hannes, chema, edumazet,
	a.p.zijlstra, pablo, hpa, akpm, keescook, netdev, linux-kernel
In-Reply-To: <1412041802-24858-1-git-send-email-ast@plumgrid.com>

From: Alexei Starovoitov <ast@plumgrid.com>
Date: Mon, 29 Sep 2014 18:50:00 -0700

> patch #1 commit log explains why eBPF verifier has to examine some
> instructions multiple times and describes the search pruning optimization
> that improves verification speed for branchy programs and allows more
> complex programs to be verified successfully.
> This patch completes the core verifier logic.
> 
> patch #2 adds more verifier tests related to branches and search pruning
> 
> I'm still working on Andy's 'bitmask for stack slots' suggestion. It will be
> done on top of this patch.
> 
> The current verifier algorithm is brute force depth first search with
> state pruning. If anyone can come up with another algorithm that demonstrates
> better results, we'll replace the algorithm without affecting user space.
> 
> Note verifier doesn't guarantee that all possible valid programs are accepted.
> Overly complex programs may still be rejected.
> Verifier improvements/optimizations will guarantee that if a program
> was passing verification in the past, it will still be passing.

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH v3 1/1] net: fec: implement rx_copybreak to improve rx performance
From: David Miller @ 2014-10-02  1:28 UTC (permalink / raw)
  To: b38611; +Cc: b20596, netdev, bhutchings, shawn.guo, romieu, eric.dumazet
In-Reply-To: <1412040485-19130-1-git-send-email-b38611@freescale.com>

From: Fugang Duan <b38611@freescale.com>
Date: Tue, 30 Sep 2014 09:28:05 +0800

> - Copy short frames and keep the buffers mapped, re-allocate skb instead of
>   memory copy for long frames.
> - Add support for setting/getting rx_copybreak using generic ethtool tunable
> 
> Changes V3:
> * As Eric Dumazet's suggestion that removing the copybreak module parameter
>   and only keep the ethtool API support for rx_copybreak.
> 
> Changes V2:
> * Implements rx_copybreak
> * Rx_copybreak provides module parameter to change this value
> * Add tunable_ops support for rx_copybreak
> 
> Signed-off-by: Fugang Duan <B38611@freescale.com>
> Signed-off-by: Frank Li <Frank.Li@freescale.com>

Applied, thanks.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox