netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Florian Fainelli <f.fainelli@gmail.com>
To: Alexander Duyck <alexander.h.duyck@intel.com>,
	intel-wired-lan@osuosl.org, jeffrey.t.kirsher@intel.com,
	netdev@vger.kernel.org
Subject: Re: [jkirsher/next-queue PATCH v2 0/7] Add support for L2 Fwd Offload w/o ndo_select_queue
Date: Tue, 12 Jun 2018 10:56:48 -0700	[thread overview]
Message-ID: <be1b5bed-d8b2-244e-167a-1f79bfb5f6e9@gmail.com> (raw)
In-Reply-To: <20180612151322.86792.97587.stgit@ahduyck-green-test.jf.intel.com>

On 06/12/2018 08:18 AM, Alexander Duyck wrote:
> This patch series is meant to allow support for the L2 forward offload, aka
> MACVLAN offload without the need for using ndo_select_queue.
> 
> The existing solution currently requires that we use ndo_select_queue in
> the transmit path if we want to associate specific Tx queues with a given
> MACVLAN interface. In order to get away from this we need to repurpose the
> tc_to_txq array and XPS pointer for the MACVLAN interface and use those as
> a means of accessing the queues on the lower device. As a result we cannot
> offload a device that is configured as multiqueue, however it doesn't
> really make sense to configure a macvlan interfaced as being multiqueue
> anyway since it doesn't really have a qdisc of its own in the first place.

Interesting, so at some point I had came up with the following for
mapping queues between the DSA slave network devices and the DSA master
network device (doing the actual transmission). The DSA master network
device driver is just a normal network device driver.

The set-up is as follows: 4 external Ethernet switch ports, each with 8
egress queues and the DSA master (bcmsysport.c), aka CPU Ethernet
controller has 32 output queues, so you can do a 1:1 mapping of those,
that's actually what we want. A subsequent hardware generation only
provides 16 output queues, so we can still do 2:1 mapping.

The implementation is done like this:

- DSA slave network devices are always created after the DSA master
network device so we can leverage that

- a specific notifier is running from the DSA core and tells the DSA
master about the switch position in the tree (position 0 = directly
attached), and the switch port number and a pointer to the slave network
device

- we establish the mapping between the queues within the bcmsysport
driver as a simple array

- when transmitting, DSA slave network devices set a specific queue/port
number within the 16-bits that skb->queue_mapping permits

- this gets re-used by bcmsysport.c to extract the correct queue number
during ndo_select_queue such that the appropriate queue number gets used
and congestion works end-to-end.

The reason why we do that is because there is some out of band HW that
monitors the queue depth of the switch port's egress queue and
back-pressure the Ethernet controller directly when trying to transmit
to a congested queue.

I had initially considered establishing the mapping using tc and some
custom "bind" argument of some kind, but ended-up doing things the way
they are which are more automatic though they leave less configuration
to an user. This has a number of caveats though:

- this is made generic within the context of DSA in that nothing is
switch driver or Ethernet MAC driver specific and the notifier
represents the contract between these two seemingly independent subsystems

- the queue indicated between DSA slave and master is unfortunately
switch driver/controller specific (BRCM_TAG_SET_PORT_QUEUE,
BRCM_TAG_GET_PORT, BRCM_TAG_GET_QUEUE)

What I like about your patchset is the mapping establishment, but as you
will read from my reply in patch 2, I think the (upper) 1:N (lower)
mapping might not work for my specific use case.

Anyhow, not intended to be blocking this, as it seems to be going in the
right direction anyway.

> 
> I am submitting this as an RFC for the netdev mailing list, and officially
> submitting it for testing to Jeff Kirsher's next-queue in order to validate
> the ixgbe specific bits.
> 
> The big changes in this set are:
>   Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN
>   Disable XPS for single queue devices
>   Replace accel_priv with sb_dev in ndo_select_queue
>   Add sb_dev parameter to fallback function for ndo_select_queue
>   Consolidated ndo_select_queue functions that appeared to be duplicates

Interesting, turns out I had a possibly similar use case with DSA with
the slave network devices need to select an outgoing queue number for

> 
> v2: Implement generic "select_queue" functions instead of "fallback" functions.
>     Tweak last two patches to account for changes in dev_pick_tx_xxx functions.
> 
> ---
> 
> Alexander Duyck (7):
>       net-sysfs: Drop support for XPS and traffic_class on single queue device
>       net: Add support for subordinate device traffic classes
>       ixgbe: Add code to populate and use macvlan tc to Tx queue map
>       net: Add support for subordinate traffic classes to netdev_pick_tx
>       net: Add generic ndo_select_queue functions
>       net: allow ndo_select_queue to pass netdev
>       net: allow fallback function to pass netdev
> 
> 
>  drivers/infiniband/hw/hfi1/vnic_main.c            |    2 
>  drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c |    4 -
>  drivers/net/bonding/bond_main.c                   |    3 
>  drivers/net/ethernet/amazon/ena/ena_netdev.c      |    5 -
>  drivers/net/ethernet/broadcom/bcmsysport.c        |    6 -
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c   |    6 +
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h   |    3 
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c   |    5 -
>  drivers/net/ethernet/hisilicon/hns/hns_enet.c     |    5 -
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |   62 ++++++--
>  drivers/net/ethernet/lantiq_etop.c                |   10 -
>  drivers/net/ethernet/mellanox/mlx4/en_tx.c        |    7 +
>  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h      |    3 
>  drivers/net/ethernet/mellanox/mlx5/core/en.h      |    3 
>  drivers/net/ethernet/mellanox/mlx5/core/en_tx.c   |    5 -
>  drivers/net/ethernet/renesas/ravb_main.c          |    3 
>  drivers/net/ethernet/sun/ldmvsw.c                 |    3 
>  drivers/net/ethernet/sun/sunvnet.c                |    3 
>  drivers/net/ethernet/ti/netcp_core.c              |    9 -
>  drivers/net/hyperv/netvsc_drv.c                   |    6 -
>  drivers/net/macvlan.c                             |   10 -
>  drivers/net/net_failover.c                        |    7 +
>  drivers/net/team/team.c                           |    3 
>  drivers/net/tun.c                                 |    3 
>  drivers/net/wireless/marvell/mwifiex/main.c       |    3 
>  drivers/net/xen-netback/interface.c               |    4 -
>  drivers/net/xen-netfront.c                        |    3 
>  drivers/staging/netlogic/xlr_net.c                |    9 -
>  drivers/staging/rtl8188eu/os_dep/os_intfs.c       |    3 
>  drivers/staging/rtl8723bs/os_dep/os_intfs.c       |    7 -
>  include/linux/netdevice.h                         |   34 ++++-
>  net/core/dev.c                                    |  156 ++++++++++++++++++---
>  net/core/net-sysfs.c                              |   36 ++++-
>  net/mac80211/iface.c                              |    4 -
>  net/packet/af_packet.c                            |    7 +
>  35 files changed, 312 insertions(+), 130 deletions(-)
> 
> --
> 


-- 
Florian

  parent reply	other threads:[~2018-06-12 17:56 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-12 15:18 [jkirsher/next-queue PATCH v2 0/7] Add support for L2 Fwd Offload w/o ndo_select_queue Alexander Duyck
2018-06-12 15:18 ` [jkirsher/next-queue PATCH v2 1/7] net-sysfs: Drop support for XPS and traffic_class on single queue device Alexander Duyck
2018-06-12 15:18 ` [jkirsher/next-queue PATCH v2 2/7] net: Add support for subordinate device traffic classes Alexander Duyck
2018-06-12 17:49   ` Florian Fainelli
2018-06-12 22:18     ` [Intel-wired-lan] " Alexander Duyck
2018-06-12 15:18 ` [jkirsher/next-queue PATCH v2 3/7] ixgbe: Add code to populate and use macvlan tc to Tx queue map Alexander Duyck
2018-06-12 15:18 ` [jkirsher/next-queue PATCH v2 4/7] net: Add support for subordinate traffic classes to netdev_pick_tx Alexander Duyck
2018-06-12 15:18 ` [jkirsher/next-queue PATCH v2 5/7] net: Add generic ndo_select_queue functions Alexander Duyck
2018-06-12 15:18 ` [jkirsher/next-queue PATCH v2 6/7] net: allow ndo_select_queue to pass netdev Alexander Duyck
2018-06-12 15:19 ` [jkirsher/next-queue PATCH v2 7/7] net: allow fallback function " Alexander Duyck
2018-06-12 17:50 ` [jkirsher/next-queue PATCH v2 0/7] Add support for L2 Fwd Offload w/o ndo_select_queue Stephen Hemminger
2018-06-12 22:47   ` [Intel-wired-lan] " Alexander Duyck
2018-06-12 17:56 ` Florian Fainelli [this message]
2018-06-12 22:33   ` Alexander Duyck
2018-07-03 18:42 ` Jeff Kirsher
  -- strict thread matches above, loose matches on Subject: below --
2018-07-09 16:19 Alexander Duyck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=be1b5bed-d8b2-244e-167a-1f79bfb5f6e9@gmail.com \
    --to=f.fainelli@gmail.com \
    --cc=alexander.h.duyck@intel.com \
    --cc=intel-wired-lan@osuosl.org \
    --cc=jeffrey.t.kirsher@intel.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).