Netdev List
* [PATCH] amd-xgbe: Fix unused suspend handlers build warning
From: Borislav Petkov @ 2016-11-26 20:53 UTC
  To: LKML; +Cc: Tom Lendacky, netdev

From: Borislav Petkov <bp@suse.de>

Fix:

  drivers/net/ethernet/amd/xgbe/xgbe-main.c:835:12: warning: ‘xgbe_suspend’ defined
    but not used [-Wunused-function]
  drivers/net/ethernet/amd/xgbe/xgbe-main.c:855:12: warning: ‘xgbe_resume’ defined
    but not used [-Wunused-function]

I see it during randconfig builds here.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: netdev@vger.kernel.org
---
 drivers/net/ethernet/amd/xgbe/xgbe-main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-main.c b/drivers/net/ethernet/amd/xgbe/xgbe-main.c
index e10e569c0d5f..2e8451b0a74a 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-main.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-main.c
@@ -831,7 +831,7 @@ static int xgbe_remove(struct platform_device *pdev)
 	return 0;
 }
 
-#ifdef CONFIG_PM
+#ifdef CONFIG_PM_SLEEP
 static int xgbe_suspend(struct device *dev)
 {
 	struct net_device *netdev = dev_get_drvdata(dev);
@@ -876,7 +876,7 @@ static int xgbe_resume(struct device *dev)
 
 	return ret;
 }
-#endif /* CONFIG_PM */
+#endif /* CONFIG_PM_SLEEP */
 
 #ifdef CONFIG_ACPI
 static const struct acpi_device_id xgbe_acpi_match[] = {
-- 
2.10.0
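
The warning can be reproduced outside the kernel with a small model. This is only a sketch: it assumes the handlers are referenced through a macro that drops its arguments unless system sleep support is configured (as the kernel's SIMPLE_DEV_PM_OPS does), and every DEMO_* / demo_* name below is a hypothetical stand-in rather than a kernel symbol.

```c
#include <stddef.h>

/* Model of a config with runtime PM enabled but system sleep disabled. */
#define DEMO_CONFIG_PM 1
/* #define DEMO_CONFIG_PM_SLEEP 1 */

struct demo_pm_ops {
	const char *name;
	int (*suspend)(void *dev);
	int (*resume)(void *dev);
};

/* The callbacks are only referenced when sleep support is configured in;
 * otherwise the macro expands to nothing.
 */
#ifdef DEMO_CONFIG_PM_SLEEP
#define DEMO_SLEEP_OPS(s, r) .suspend = (s), .resume = (r),
#else
#define DEMO_SLEEP_OPS(s, r)
#endif

/* Guarding these with DEMO_CONFIG_PM instead would define both functions
 * with nothing referencing them, which is what -Wunused-function reports.
 * Using the same condition as the macro keeps definition and use in sync.
 */
#ifdef DEMO_CONFIG_PM_SLEEP
static int demo_suspend(void *dev) { (void)dev; return 0; }
static int demo_resume(void *dev) { (void)dev; return 0; }
#endif

static const struct demo_pm_ops demo_ops = {
	.name = "demo",
	DEMO_SLEEP_OPS(demo_suspend, demo_resume)
};

/* Returns 1 when the sleep callbacks were wired up, 0 otherwise. */
int demo_has_sleep_ops(void)
{
	return demo_ops.suspend != NULL && demo_ops.resume != NULL;
}
```

Changing the guard around the two functions to DEMO_CONFIG_PM and building with -Wall should show the same pair of "defined but not used" warnings quoted in the commit message.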


* [GIT] Networking
From: David Miller @ 2016-11-26 21:04 UTC
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) Fix leak in fsl/fman driver, from Dan Carpenter.

2) Call flow dissector initcall earlier than any networking driver can
   register and start to use it, from Eric Dumazet.

3) Some dup header fixes from Geliang Tang.

4) TIPC link monitoring compat fix from Jon Paul Maloy.

5) Link changes require EEE re-negotiation in bcm_sf2 driver, from
   Florian Fainelli.

6) Fix bogus handle ID passed into tfilter_notify_chain(), from
   Roman Mashak.

7) Fix dump size calculation in rtnl_calcit(), from Zhang Shengju.

Please pull, thanks a lot!

The following changes since commit 3b404a519815b9820f73f1ecf404e5546c9270ba:

  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security (2016-11-21 15:27:41 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to 6998cc6ec23740347670da13186d2979c5401903:

  tipc: resolve connection flow control compatibility problem (2016-11-25 21:38:16 -0500)

----------------------------------------------------------------
Andrew Lunn (1):
      net: ethernet: mvneta: Remove IFF_UNICAST_FLT which is not implemented

Andy Gospodarek (1):
      bnxt: do not busy-poll when link is down

Arnd Bergmann (1):
      mvpp2: use correct size for memset

Christophe Jaillet (1):
      bnxt_en: Fix a VXLAN vs GENEVE issue

Dan Carpenter (1):
      fsl/fman: fix a leak in tgec_free()

David S. Miller (2):
      Merge branch 'for-upstream' of git://git.kernel.org/.../bluetooth/bluetooth
      Merge tag 'linux-can-fixes-for-4.9-20161123' of git://git.kernel.org/.../mkl/linux-can

Eric Dumazet (2):
      flow_dissect: call init_default_flow_dissectors() earlier
      udplite: call proper backlog handlers

Florian Fainelli (1):
      net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change

Gao Feng (1):
      driver: macvlan: Check if need rollback multicast setting in macvlan_open

Geliang Tang (4):
      dwc_eth_qos: drop duplicate headers
      ibmvnic: drop duplicate header seq_file.h
      net: ieee802154: drop duplicate header delay.h
      net/mlx5: drop duplicate header delay.h

Johan Hedberg (1):
      Bluetooth: Fix using the correct source address type

Jon Paul Maloy (3):
      tipc: fix compatibility bug in link monitoring
      tipc: improve sanity check for received domain records
      tipc: resolve connection flow control compatibility problem

Kirill Esipov (1):
      net: phy: micrel: fix KSZ8041FTL supported value

Miroslav Lichvar (1):
      net: ethtool: don't require CAP_NET_ADMIN for ETHTOOL_GLINKSETTINGS

Oliver Hartkopp (1):
      can: bcm: fix support for CAN FD frames

Paolo Abeni (1):
      ipv6: bump genid when the IFA_F_TENTATIVE flag is clear

Randy Dunlap (1):
      netdevice.h: fix kernel-doc warning

Roman Mashak (1):
      net sched filters: fix filter handle ID in tfilter_notify_chain()

Tariq Toukan (1):
      net/mlx4_en: Free netdev resources under state lock

WANG Cong (1):
      net: revert "net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit"

Zhang Shengju (1):
      rtnetlink: fix the wrong minimal dump size getting from rtnl_calcit()

 drivers/net/dsa/bcm_sf2.c                       |  4 ++++
 drivers/net/ethernet/broadcom/bnxt/bnxt.c       | 15 ++++++++++++---
 drivers/net/ethernet/freescale/fman/fman_tgec.c |  3 ---
 drivers/net/ethernet/ibm/ibmvnic.c              |  1 -
 drivers/net/ethernet/marvell/mvneta.c           |  2 +-
 drivers/net/ethernet/marvell/mvpp2.c            |  2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |  5 ++++-
 drivers/net/ethernet/mellanox/mlx5/core/main.c  |  1 -
 drivers/net/ethernet/synopsys/dwc_eth_qos.c     |  2 --
 drivers/net/ieee802154/adf7242.c                |  1 -
 drivers/net/macvlan.c                           |  3 ++-
 drivers/net/phy/micrel.c                        |  8 ++++----
 include/linux/netdevice.h                       |  2 +-
 include/net/bluetooth/hci_core.h                |  2 +-
 net/bluetooth/6lowpan.c                         |  4 ++--
 net/bluetooth/hci_conn.c                        | 26 ++++++++++++++++++++++++--
 net/bluetooth/l2cap_core.c                      |  2 +-
 net/bluetooth/rfcomm/tty.c                      |  2 +-
 net/bluetooth/sco.c                             |  2 +-
 net/can/bcm.c                                   | 18 ++++++++++--------
 net/core/ethtool.c                              |  1 +
 net/core/flow_dissector.c                       |  2 +-
 net/core/rtnetlink.c                            |  2 +-
 net/ipv4/udp.c                                  |  2 +-
 net/ipv4/udp_impl.h                             |  2 +-
 net/ipv4/udplite.c                              |  2 +-
 net/ipv6/addrconf.c                             | 18 ++++++++++++------
 net/ipv6/udp.c                                  |  2 +-
 net/ipv6/udp_impl.h                             |  2 +-
 net/ipv6/udplite.c                              |  2 +-
 net/l2tp/l2tp_eth.c                             |  2 +-
 net/sched/cls_api.c                             |  2 +-
 net/tipc/link.c                                 |  5 +++--
 net/tipc/monitor.c                              | 10 +++++-----
 net/tipc/socket.c                               |  2 +-
 35 files changed, 101 insertions(+), 60 deletions(-)


* Re: [PATCH 1/1] NET: usb: cdc_ncm: adding MBIM RESET_FUNCTION request and modifying ncm bind common code
From: Bjørn Mork @ 2016-11-26 21:17 UTC
  To: Daniele Palmas
  Cc: Oliver Neukum, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <87k2bq7zn8.fsf-3F4PFWf5pNjpjLOzFPqGjWGXanvQGlWp@public.gmane.org>

Bjørn Mork <bjorn-yOkvZcmFvRU@public.gmane.org> writes:

> Finally, I found my modems (or at least a number of them) again today.
> But I'm sorry to say, that the troublesome Huawei E3372h-153 is still
> giving us a hard time.  It does not work with your patch. The symptom is
> the same as earlier:  The modem returns MBIM frames with 32bit headers.
>
> So for now, I have to NAK this patch.
>
> I am sure we can find a good solution that makes all of these modems
> work, but I cannot support a patch that breaks previously working
> configurations. Sorry.  I'll do a few experiments and see if there is a
> simple fix for this.  Otherwise we'll probably have to do the quirk
> game.


This is a proof-of-concept only, but it appears to be working.  Please
test with your device(s) too.  It's still mostly your code, as you can
see.

If this turns out to work, then I believe we should refactor
cdc_ncm_init() and cdc_ncm_bind_common() to make the whole
initialisation sequence a bit cleaner.  And maybe also include
cdc_mbim_bind().  Ideally, the MBIM specific RESET should happen there
instead of "polluting" the NCM driver with MBIM specific code.

But anyway:  The sequence that seems to work for both the E3372h-153
and the EM7455 is

 USB_CDC_GET_NTB_PARAMETERS
 USB_CDC_RESET_FUNCTION
 usb_set_interface(dev->udev, 'data interface no', 0);
 remaining parts of cdc_ncm_init(), excluding USB_CDC_GET_NTB_PARAMETERS
 usb_set_interface(dev->udev, 'data interface no', 'data alt setting');

without any additional delay between the two usb_set_interface() calls.
So the major difference from your patch is that I moved the two control
requests out of cdc_ncm_init() to allow running them _before_ setting
the data interface to altsetting 0.

But maybe I was just lucky.  This was barely proof tested.  Needs a lot
more testing and cleanups as suggested.  I'd appreciate it if you
continued that, as I don't really have any time for it...

FWIW, I also ran a quick test with a D-Link DWM-156A7 (Mediatek MBIM
firmware) and a Huawei E367 (Qualcomm device with early Huawei MBIM
firmware, distinctly different from the E3372h-153 and most other
MBIM devices I've seen).



Bjørn

---
 drivers/net/usb/cdc_ncm.c    | 48 ++++++++++++++++++++++++++++----------------
 include/uapi/linux/usb/cdc.h |  1 +
 2 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index 877c9516e781..be019cbf1719 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -488,16 +488,6 @@ static int cdc_ncm_init(struct usbnet *dev)
 	u8 iface_no = ctx->control->cur_altsetting->desc.bInterfaceNumber;
 	int err;
 
-	err = usbnet_read_cmd(dev, USB_CDC_GET_NTB_PARAMETERS,
-			      USB_TYPE_CLASS | USB_DIR_IN
-			      |USB_RECIP_INTERFACE,
-			      0, iface_no, &ctx->ncm_parm,
-			      sizeof(ctx->ncm_parm));
-	if (err < 0) {
-		dev_err(&dev->intf->dev, "failed GET_NTB_PARAMETERS\n");
-		return err; /* GET_NTB_PARAMETERS is required */
-	}
-
 	/* set CRC Mode */
 	if (cdc_ncm_flags(dev) & USB_CDC_NCM_NCAP_CRC_MODE) {
 		dev_dbg(&dev->intf->dev, "Setting CRC mode off\n");
@@ -837,12 +827,43 @@ int cdc_ncm_bind_common(struct usbnet *dev, struct usb_interface *intf, u8 data_
 		}
 	}
 
+	iface_no = ctx->control->cur_altsetting->desc.bInterfaceNumber;
+	temp = usbnet_read_cmd(dev, USB_CDC_GET_NTB_PARAMETERS,
+			       USB_TYPE_CLASS | USB_DIR_IN
+			       | USB_RECIP_INTERFACE,
+			       0, iface_no, &ctx->ncm_parm,
+			       sizeof(ctx->ncm_parm));
+	if (temp < 0) {
+		dev_err(&dev->intf->dev, "failed GET_NTB_PARAMETERS\n");
+		goto error; /* GET_NTB_PARAMETERS is required */
+	}
+
+	/* Some modems (e.g. Telit LE922A6) need to reset the MBIM function
+	 * or they will fail to work properly.
+	 * For details on RESET_FUNCTION request see document
+	 * "USB Communication Class Subclass Specification for MBIM"
+	 * RESET_FUNCTION should be harmless for all the other MBIM modems
+	 */
+	if (cdc_ncm_comm_intf_is_mbim(ctx->control->cur_altsetting)) {
+		temp = usbnet_write_cmd(dev, USB_CDC_RESET_FUNCTION,
+					USB_TYPE_CLASS | USB_DIR_OUT
+					| USB_RECIP_INTERFACE,
+					0, iface_no, NULL, 0);
+		if (temp < 0)
+			dev_err(&dev->intf->dev, "failed RESET_FUNCTION\n");
+	}
+
 	iface_no = ctx->data->cur_altsetting->desc.bInterfaceNumber;
 
 	/* Reset data interface. Some devices will not reset properly
 	 * unless they are configured first.  Toggle the altsetting to
 	 * force a reset
+	 * This is applied only to ncm devices, since it has been verified
+	 * to cause issues with some MBIM modems (e.g. Telit LE922A6).
+	 * MBIM devices reset is achieved using MBIM request RESET_FUNCTION
+	 * in cdc_ncm_init
 	 */
+
 	usb_set_interface(dev->udev, iface_no, data_altsetting);
 	temp = usb_set_interface(dev->udev, iface_no, 0);
 	if (temp) {
@@ -854,13 +875,6 @@ int cdc_ncm_bind_common(struct usbnet *dev, struct usb_interface *intf, u8 data_
 	if (cdc_ncm_init(dev))
 		goto error2;
 
-	/* Some firmwares need a pause here or they will silently fail
-	 * to set up the interface properly.  This value was decided
-	 * empirically on a Sierra Wireless MC7455 running 02.08.02.00
-	 * firmware.
-	 */
-	usleep_range(10000, 20000);


* Re: [PATCH] mlx4: give precise rx/tx bytes/packets counters
From: Saeed Mahameed @ 2016-11-26 22:47 UTC
  To: Eric Dumazet; +Cc: David Miller, netdev, Tariq Toukan
In-Reply-To: <1480088780.8455.543.camel@edumazet-glaptop3.roam.corp.google.com>

On Fri, Nov 25, 2016 at 5:46 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> mlx4 stats are chaotic because a deferred work queue is responsible
> to update them every 250 ms.
>
Hello Eric,

Well, the only historical reason for this deferred work is that we
query FW for some counters, which might sleep, and there is one place
in the kernel where dev_get_stats(dev, &temp) is called under a rw
lock, "read_lock(&dev_base_lock);", in
http://lxr.free-electrons.com/source/net/core/net-sysfs.c#L552. I am
not sure why it is this way. Maybe it is time to fix this and get rid
of the deferred work, which would give you the same precision even
when reading ethtool stats, which this patch didn't take care of.
This would also improve other drivers that might sleep while reading
stats.

> Even sampling stats every one second with "sar -n DEV 1" gives
> variations like the following :
>
> lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
> 07:39:22         eth0 146877.00 3265554.00   9467.15 4828168.50
> 07:39:23         eth0 146587.00 3260329.00   9448.15 4820445.98
> 07:39:24         eth0 146894.00 3259989.00   9468.55 4819943.26
> 07:39:25         eth0 110368.00 2454497.00   7113.95 3629012.17  <<>>
> 07:39:26         eth0 146563.00 3257502.00   9447.25 4816266.23
> 07:39:27         eth0 145678.00 3258292.00   9389.79 4817414.39
> 07:39:28         eth0 145268.00 3253171.00   9363.85 4809852.46
> 07:39:29         eth0 146439.00 3262185.00   9438.97 4823172.48
> 07:39:30         eth0 146758.00 3264175.00   9459.94 4826124.13
> 07:39:31         eth0 146843.00 3256903.00   9465.44 4815381.97
> Average:         eth0 142827.50 3179259.70   9206.30 4700578.16
>
> This patch allows rx/tx bytes/packets counters being folded at the
> time we need stats.
>
> We now can fetch stats every 1 ms if we want to check NIC behavior
> on a small time window. It is also easier to detect anomalies.
>
> lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
> 07:42:50         eth0 142915.00 3177696.00   9212.06 4698270.42
> 07:42:51         eth0 143741.00 3200232.00   9265.15 4731593.02
> 07:42:52         eth0 142781.00 3171600.00   9202.92 4689260.16
> 07:42:53         eth0 143835.00 3192932.00   9271.80 4720761.39
> 07:42:54         eth0 141922.00 3165174.00   9147.64 4679759.21
> 07:42:55         eth0 142993.00 3207038.00   9216.78 4741653.05
> 07:42:56         eth0 141394.06 3154335.64   9113.85 4663731.73
> 07:42:57         eth0 141850.00 3161202.00   9144.48 4673866.07
> 07:42:58         eth0 143439.00 3180736.00   9246.05 4702755.35
> 07:42:59         eth0 143501.00 3210992.00   9249.99 4747501.84
> Average:         eth0 142835.66 3182165.93   9206.98 4704874.08
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |    2
>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |    1
>  drivers/net/ethernet/mellanox/mlx4/en_port.c    |   77 +++++++++-----
>  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h    |    1
>  4 files changed, 58 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> index 487a58f9c192896852fef271b6cce9bde132deb7..d9c9f86a30df953fa555934c5406057dcaf28960 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> @@ -367,6 +367,8 @@ static void mlx4_en_get_ethtool_stats(struct net_device *dev,
>
>         spin_lock_bh(&priv->stats_lock);
>
> +       mlx4_en_fold_software_stats(dev);
> +
>         for (i = 0; i < NUM_MAIN_STATS; i++, bitmap_iterator_inc(&it))
>                 if (bitmap_iterator_test(&it))
>                         data[index++] = ((unsigned long *)&dev->stats)[i];
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 9018bb1b2e12142e048281a9d28ddf95e0023a61..d28d841db23ce885d2011877a156bacf23f65afe 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -1321,6 +1321,7 @@ mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
>         struct mlx4_en_priv *priv = netdev_priv(dev);
>
>         spin_lock_bh(&priv->stats_lock);
> +       mlx4_en_fold_software_stats(dev);
>         netdev_stats_to_stats64(stats, &dev->stats);
>         spin_unlock_bh(&priv->stats_lock);
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.c b/drivers/net/ethernet/mellanox/mlx4/en_port.c
> index 1eb4c1e10bad1dad26049876acf107a2073a6ab1..c6c4f1238923e09eced547454b86c68720292859 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_port.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c
> @@ -147,6 +147,39 @@ static unsigned long en_stats_adder(__be64 *start, __be64 *next, int num)
>         return ret;
>  }
>
> +void mlx4_en_fold_software_stats(struct net_device *dev)
> +{
> +       struct mlx4_en_priv *priv = netdev_priv(dev);
> +       struct mlx4_en_dev *mdev = priv->mdev;
> +       unsigned long packets, bytes;
> +       int i;
> +
> +       if (mlx4_is_master(mdev->dev))
> +               return;

Hmm, I think here you are just carrying over a wrong decision made in
the mlx4 driver: that the PF (only in SRIOV mode) netdev stats should
report the whole port stats from the MLX4_CMD_DUMP_ETH_STATS FW command.

IMHO mlx4_en_get_stats64 should always report SW stats.
Regardless, this function, "mlx4_en_fold_software_stats", should
fold the SW stats unconditionally, and we should work around it
somewhere else if SW stats should be reported from FW. Otherwise we
will keep dragging this confusion along.
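
To make the suggestion concrete, here is a userspace sketch of unconditional folding: per-ring counters are summed at the moment stats are requested, with no PF/VF special case inside the helper. It is only an illustration; the demo_* types and functions are hypothetical stand-ins for the driver structures, and demo_read_once() merely approximates READ_ONCE() with a volatile load.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-ring counter block. */
struct demo_ring {
	unsigned long packets;
	unsigned long bytes;
};

/* Hypothetical stand-in for the folded netdev stats. */
struct demo_stats {
	unsigned long packets;
	unsigned long bytes;
};

/* Load a counter exactly once through a volatile access, so the
 * compiler cannot re-read it while another context is updating it.
 */
static unsigned long demo_read_once(const unsigned long *p)
{
	return *(const volatile unsigned long *)p;
}

/* Fold all ring counters into 'out', unconditionally. Any decision to
 * report firmware counters instead would live in the caller.
 */
void demo_fold_stats(const struct demo_ring *rings, size_t nrings,
		     struct demo_stats *out)
{
	unsigned long packets = 0, bytes = 0;
	size_t i;

	for (i = 0; i < nrings; i++) {
		packets += demo_read_once(&rings[i].packets);
		bytes   += demo_read_once(&rings[i].bytes);
	}
	out->packets = packets;
	out->bytes = bytes;
}
```

Because the totals are accumulated in locals and stored once at the end, a concurrent reader of 'out' never observes a half-summed value, which mirrors the structure of the helper in the quoted patch.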

> +
> +       packets = 0;
> +       bytes = 0;
> +       for (i = 0; i < priv->rx_ring_num; i++) {
> +               const struct mlx4_en_rx_ring *ring = priv->rx_ring[i];
> +
> +               packets += READ_ONCE(ring->packets);
> +               bytes   += READ_ONCE(ring->bytes);
> +       }
> +       dev->stats.rx_packets = packets;
> +       dev->stats.rx_bytes = bytes;
> +
> +       packets = 0;
> +       bytes = 0;
> +       for (i = 0; i < priv->tx_ring_num[TX]; i++) {
> +               const struct mlx4_en_tx_ring *ring = priv->tx_ring[TX][i];
> +
> +               packets += READ_ONCE(ring->packets);
> +               bytes   += READ_ONCE(ring->bytes);
> +       }
> +       dev->stats.tx_packets = packets;
> +       dev->stats.tx_bytes = bytes;
> +}
> +
>  int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset)
>  {
>         struct mlx4_counter tmp_counter_stats;
> @@ -159,6 +192,7 @@ int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset)
>         u64 in_mod = reset << 8 | port;
>         int err;
>         int i, counter_index;
> +       unsigned long sw_tx_dropped = 0;
>         unsigned long sw_rx_dropped = 0;
>
>         mailbox = mlx4_alloc_cmd_mailbox(mdev->dev);
> @@ -174,8 +208,8 @@ int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset)
>
>         spin_lock_bh(&priv->stats_lock);
>
> -       stats->rx_packets = 0;
> -       stats->rx_bytes = 0;
> +       mlx4_en_fold_software_stats(dev);
> +
>         priv->port_stats.rx_chksum_good = 0;
>         priv->port_stats.rx_chksum_none = 0;
>         priv->port_stats.rx_chksum_complete = 0;
> @@ -183,19 +217,16 @@ int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset)
>         priv->xdp_stats.rx_xdp_tx      = 0;
>         priv->xdp_stats.rx_xdp_tx_full = 0;
>         for (i = 0; i < priv->rx_ring_num; i++) {
> -               stats->rx_packets += priv->rx_ring[i]->packets;
> -               stats->rx_bytes += priv->rx_ring[i]->bytes;
> -               sw_rx_dropped += priv->rx_ring[i]->dropped;
> -               priv->port_stats.rx_chksum_good += priv->rx_ring[i]->csum_ok;
> -               priv->port_stats.rx_chksum_none += priv->rx_ring[i]->csum_none;
> -               priv->port_stats.rx_chksum_complete += priv->rx_ring[i]->csum_complete;
> -               priv->xdp_stats.rx_xdp_drop    += priv->rx_ring[i]->xdp_drop;
> -               priv->xdp_stats.rx_xdp_tx      += priv->rx_ring[i]->xdp_tx;
> -               priv->xdp_stats.rx_xdp_tx_full += priv->rx_ring[i]->xdp_tx_full;
> +               const struct mlx4_en_rx_ring *ring = priv->rx_ring[i];
> +
> +               sw_rx_dropped                   += READ_ONCE(ring->dropped);
> +               priv->port_stats.rx_chksum_good += READ_ONCE(ring->csum_ok);
> +               priv->port_stats.rx_chksum_none += READ_ONCE(ring->csum_none);
> +               priv->port_stats.rx_chksum_complete += READ_ONCE(ring->csum_complete);
> +               priv->xdp_stats.rx_xdp_drop     += READ_ONCE(ring->xdp_drop);
> +               priv->xdp_stats.rx_xdp_tx       += READ_ONCE(ring->xdp_tx);
> +               priv->xdp_stats.rx_xdp_tx_full  += READ_ONCE(ring->xdp_tx_full);
>         }
> -       stats->tx_packets = 0;
> -       stats->tx_bytes = 0;
> -       stats->tx_dropped = 0;
>         priv->port_stats.tx_chksum_offload = 0;
>         priv->port_stats.queue_stopped = 0;
>         priv->port_stats.wake_queue = 0;
> @@ -205,15 +236,14 @@ int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset)
>         for (i = 0; i < priv->tx_ring_num[TX]; i++) {
>                 const struct mlx4_en_tx_ring *ring = priv->tx_ring[TX][i];
>
> -               stats->tx_packets += ring->packets;
> -               stats->tx_bytes += ring->bytes;
> -               stats->tx_dropped += ring->tx_dropped;
> -               priv->port_stats.tx_chksum_offload += ring->tx_csum;
> -               priv->port_stats.queue_stopped     += ring->queue_stopped;
> -               priv->port_stats.wake_queue        += ring->wake_queue;
> -               priv->port_stats.tso_packets       += ring->tso_packets;
> -               priv->port_stats.xmit_more         += ring->xmit_more;
> +               sw_tx_dropped                      += READ_ONCE(ring->tx_dropped);
> +               priv->port_stats.tx_chksum_offload += READ_ONCE(ring->tx_csum);
> +               priv->port_stats.queue_stopped     += READ_ONCE(ring->queue_stopped);
> +               priv->port_stats.wake_queue        += READ_ONCE(ring->wake_queue);
> +               priv->port_stats.tso_packets       += READ_ONCE(ring->tso_packets);
> +               priv->port_stats.xmit_more         += READ_ONCE(ring->xmit_more);
>         }
> +
>         if (mlx4_is_master(mdev->dev)) {
>                 stats->rx_packets = en_stats_adder(&mlx4_en_stats->RTOT_prio_0,
>                                                    &mlx4_en_stats->RTOT_prio_1,

As you can see here, in SRIOV mode the PF (only) reads SW stats from FW.
Tariq, I think we need to fix this.


> @@ -251,7 +281,8 @@ int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset)
>         stats->rx_length_errors = be32_to_cpu(mlx4_en_stats->RdropLength);
>         stats->rx_crc_errors = be32_to_cpu(mlx4_en_stats->RCRC);
>         stats->rx_fifo_errors = be32_to_cpu(mlx4_en_stats->RdropOvflw);
> -       stats->tx_dropped += be32_to_cpu(mlx4_en_stats->TDROP);
> +       stats->tx_dropped = be32_to_cpu(mlx4_en_stats->TDROP) +
> +                           sw_tx_dropped;
>
>         /* RX stats */
>         priv->pkstats.rx_multicast_packets = stats->multicast;
> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> index 574bcbb1b38fc4758511d8f7bd17a87b0a507a73..20a936428f4a44c8ca0a7161855da310f9166b50 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> @@ -755,6 +755,7 @@ void mlx4_en_rx_irq(struct mlx4_cq *mcq);
>  int mlx4_SET_MCAST_FLTR(struct mlx4_dev *dev, u8 port, u64 mac, u64 clear, u8 mode);
>  int mlx4_SET_VLAN_FLTR(struct mlx4_dev *dev, struct mlx4_en_priv *priv);
>
> +void mlx4_en_fold_software_stats(struct net_device *dev);
>  int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset);
>  int mlx4_en_QUERY_PORT(struct mlx4_en_dev *mdev, u8 port);
>
>
>


* [PATCH net] net, sched: respect rcu grace period on cls destruction
From: Daniel Borkmann @ 2016-11-27  0:18 UTC
  To: davem
  Cc: xiyou.wangcong, john.fastabend, roid, ast, hannes, jiri, netdev,
	Daniel Borkmann

Roi reported a crash in flower where tp->root was NULL in ->classify()
callbacks. Reason is that in ->destroy() tp->root is set to NULL via
RCU_INIT_POINTER(). It's problematic for some of the classifiers, because
this doesn't respect RCU grace period for them, and as a result, still
outstanding readers from tc_classify() will try to blindly dereference
a NULL tp->root.

The tp->root object is strictly private to the classifier implementation
and holds internal data the core such as tc_ctl_tfilter() doesn't know
about. Within some classifiers, such as cls_bpf, cls_basic, etc, tp->root
is only checked for NULL in ->get() callback, but nowhere else. This is
misleading and seemed to be copied from old classifier code that was not
cleaned up properly. For example, d3fa76ee6b4a ("[NET_SCHED]: cls_basic:
fix NULL pointer dereference") moved tp->root initialization into the ->init()
routine, where before it was part of ->change(). Back then ->get() had to deal
with tp->root being NULL, so that was indeed a valid case; after
d3fa76ee6b4a, not really anymore. We used to set tp->root to NULL long
ago in ->destroy(), see 47a1a1d4be29 ("pkt_sched: remove unnecessary xchg()
in packet classifiers"). The NULLifying was reintroduced with the
RCUification, but it's not correct for every classifier implementation.

In the cases that are fixed here with one exception of cls_cgroup, tp->root
object is allocated and initialized inside ->init() callback, which is always
performed at a point in time after we allocate a new tp, which means tp and
thus tp->root was not globally visible in the tp chain yet (see tc_ctl_tfilter()).
Also, on destruction tp->root is strictly kfree_rcu()'ed in ->destroy()
handler, same for the tp which is kfree_rcu()'ed right when we return
from ->destroy() in tcf_destroy(). This means, the head object's lifetime
for such classifiers is always tied to the tp lifetime. The RCU callback
invocation for the two kfree_rcu() could be out of order, but that's fine
since both are independent.

Dropping the RCU_INIT_POINTER(tp->root, NULL) for these classifiers here
means that 1) we don't need a useless NULL check in the fast path, and 2)
outstanding readers of that tp in tc_classify() can still execute with the
RCU grace period respected, as is actually expected.

Things that haven't been touched here: cls_fw and cls_route. They each
handle tp->root being NULL in ->classify() path for historic reasons, so
their ->destroy() implementation can stay as is. If someone actually
cares, they could get cleaned up at some point to avoid the test in fast
path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a
!head check, should anyone actually be using/testing it, so it at least aligns with
cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable
destruction (to a sleepable context) after RCU grace period as concurrent
readers might still access it. (Note that in this case we need to hold module
reference to keep work callback address intact, since we only wait on module
unload for all call_rcu()s to finish.)
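
The two-stage teardown described for cls_flower (wait for the RCU grace period, then finish in a sleepable context) can be modelled in plain userspace C. This is a single-threaded toy and not real RCU: the two queues are drained by hand, the module reference is not modelled, and every demo_* name is a hypothetical stand-in for call_rcu(), schedule_work() and struct cls_fl_head.

```c
#include <stddef.h>
#include <stdlib.h>

#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* One deferred callback; stands in for both rcu_head and work_struct. */
struct demo_cb {
	void (*func)(struct demo_cb *cb);
	struct demo_cb *next;
};

struct demo_queue {
	struct demo_cb *head, *tail;
};

static struct demo_queue demo_rcu_q;	/* models call_rcu() */
static struct demo_queue demo_work_q;	/* models schedule_work() */
static int demo_freed;

static void demo_enqueue(struct demo_queue *q, struct demo_cb *cb,
			 void (*func)(struct demo_cb *cb))
{
	cb->func = func;
	cb->next = NULL;
	if (q->tail)
		q->tail->next = cb;
	else
		q->head = cb;
	q->tail = cb;
}

/* Run everything queued so far; each entry is unlinked before it runs. */
static void demo_drain(struct demo_queue *q)
{
	struct demo_cb *cb;

	while ((cb = q->head) != NULL) {
		q->head = cb->next;
		if (!q->head)
			q->tail = NULL;
		cb->func(cb);
	}
}

struct demo_head {
	int table_live;		/* stands in for the rhashtable */
	union {			/* used one after the other, never together */
		struct demo_cb work;
		struct demo_cb rcu;
	};
};

/* Stage 2: sleepable context, safe to tear down the table and free. */
static void demo_destroy_sleepable(struct demo_cb *work)
{
	struct demo_head *head = demo_container_of(work, struct demo_head, work);

	head->table_live = 0;
	free(head);
	demo_freed++;
}

/* Stage 1: the grace period is over, hand off to the work queue. */
static void demo_destroy_rcu(struct demo_cb *rcu)
{
	struct demo_head *head = demo_container_of(rcu, struct demo_head, rcu);

	demo_enqueue(&demo_work_q, &head->work, demo_destroy_sleepable);
}

/* Returns 1 if the head stayed usable until the work stage ran. */
int demo_two_stage_destroy(void)
{
	struct demo_head *head = malloc(sizeof(*head));
	int ok;

	if (!head)
		return -1;
	head->table_live = 1;
	demo_freed = 0;

	demo_enqueue(&demo_rcu_q, &head->rcu, demo_destroy_rcu);
	ok = head->table_live;			/* readers may still run */
	demo_drain(&demo_rcu_q);		/* "grace period" ends */
	ok = ok && head->table_live && !demo_freed; /* not torn down yet */
	demo_drain(&demo_work_q);		/* sleepable stage */
	return ok && demo_freed == 1;
}
```

Sharing storage between the two callback slots is safe in this model for the same reason the patch can use a union: the first-stage entry has already been unlinked from its queue by the time the second-stage entry is initialised.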

This fixes one race to bring RCU grace period guarantees back. Next step
as worked on by Cong however is to fix 1e052be69d04 ("net_sched: destroy
proto tp when all filters are gone") to get the order of unlinking the tp
in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving
RCU_INIT_POINTER() before tcf_destroy() and let the notification for
removal be done through the prior ->delete() callback. Both are independent
issues. Once we have that right, we can then clean tp->root up for a number
of classifiers by not making them RCU pointers, which requires a new callback
(->uninit) that is triggered from tp's RCU callback, where we just kfree()
tp->root from there.

Fixes: 1f947bf151e9 ("net: sched: rcu'ify cls_bpf")
Fixes: 9888faefe132 ("net: sched: cls_basic use RCU")
Fixes: 70da9f0bf999 ("net: sched: cls_flow use RCU")
Fixes: 77b9900ef53a ("tc: introduce Flower classifier")
Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier")
Fixes: 952313bd6258 ("net: sched: cls_cgroup use RCU")
Reported-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Roi Dayan <roid@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_basic.c    |  4 ----
 net/sched/cls_bpf.c      |  4 ----
 net/sched/cls_cgroup.c   |  7 +++----
 net/sched/cls_flow.c     |  1 -
 net/sched/cls_flower.c   | 31 ++++++++++++++++++++++++++-----
 net/sched/cls_matchall.c |  1 -
 net/sched/cls_rsvp.h     |  3 ++-
 net/sched/cls_tcindex.c  |  1 -
 8 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index eb219b7..5877f60 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -62,9 +62,6 @@ static unsigned long basic_get(struct tcf_proto *tp, u32 handle)
 	struct basic_head *head = rtnl_dereference(tp->root);
 	struct basic_filter *f;
 
-	if (head == NULL)
-		return 0UL;
-
 	list_for_each_entry(f, &head->flist, link) {
 		if (f->handle == handle) {
 			l = (unsigned long) f;
@@ -109,7 +106,6 @@ static bool basic_destroy(struct tcf_proto *tp, bool force)
 		tcf_unbind_filter(tp, &f->res);
 		call_rcu(&f->rcu, basic_delete_filter);
 	}
-	RCU_INIT_POINTER(tp->root, NULL);
 	kfree_rcu(head, rcu);
 	return true;
 }
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index bb1d5a4..0a47ba5 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -292,7 +292,6 @@ static bool cls_bpf_destroy(struct tcf_proto *tp, bool force)
 		call_rcu(&prog->rcu, __cls_bpf_delete_prog);
 	}
 
-	RCU_INIT_POINTER(tp->root, NULL);
 	kfree_rcu(head, rcu);
 	return true;
 }
@@ -303,9 +302,6 @@ static unsigned long cls_bpf_get(struct tcf_proto *tp, u32 handle)
 	struct cls_bpf_prog *prog;
 	unsigned long ret = 0UL;
 
-	if (head == NULL)
-		return 0UL;
-
 	list_for_each_entry(prog, &head->plist, link) {
 		if (prog->handle == handle) {
 			ret = (unsigned long) prog;
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index 85233c47..c1f2007 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -137,11 +137,10 @@ static bool cls_cgroup_destroy(struct tcf_proto *tp, bool force)
 
 	if (!force)
 		return false;
-
-	if (head) {
-		RCU_INIT_POINTER(tp->root, NULL);
+	/* Head can still be NULL due to cls_cgroup_init(). */
+	if (head)
 		call_rcu(&head->rcu, cls_cgroup_destroy_rcu);
-	}
+
 	return true;
 }
 
diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
index e396723..6575aba 100644
--- a/net/sched/cls_flow.c
+++ b/net/sched/cls_flow.c
@@ -596,7 +596,6 @@ static bool flow_destroy(struct tcf_proto *tp, bool force)
 		list_del_rcu(&f->list);
 		call_rcu(&f->rcu, flow_destroy_filter);
 	}
-	RCU_INIT_POINTER(tp->root, NULL);
 	kfree_rcu(head, rcu);
 	return true;
 }
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index f6f40fb..b296f39 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -13,6 +13,7 @@
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/rhashtable.h>
+#include <linux/workqueue.h>
 
 #include <linux/if_ether.h>
 #include <linux/in6.h>
@@ -64,7 +65,10 @@ struct cls_fl_head {
 	bool mask_assigned;
 	struct list_head filters;
 	struct rhashtable_params ht_params;
-	struct rcu_head rcu;
+	union {
+		struct work_struct work;
+		struct rcu_head	rcu;
+	};
 };
 
 struct cls_fl_filter {
@@ -269,6 +273,24 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f)
 	dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle, tp->protocol, &tc);
 }
 
+static void fl_destroy_sleepable(struct work_struct *work)
+{
+	struct cls_fl_head *head = container_of(work, struct cls_fl_head,
+						work);
+	if (head->mask_assigned)
+		rhashtable_destroy(&head->ht);
+	kfree(head);
+	module_put(THIS_MODULE);
+}
+
+static void fl_destroy_rcu(struct rcu_head *rcu)
+{
+	struct cls_fl_head *head = container_of(rcu, struct cls_fl_head, rcu);
+
+	INIT_WORK(&head->work, fl_destroy_sleepable);
+	schedule_work(&head->work);
+}
+
 static bool fl_destroy(struct tcf_proto *tp, bool force)
 {
 	struct cls_fl_head *head = rtnl_dereference(tp->root);
@@ -282,10 +304,9 @@ static bool fl_destroy(struct tcf_proto *tp, bool force)
 		list_del_rcu(&f->list);
 		call_rcu(&f->rcu, fl_destroy_filter);
 	}
-	RCU_INIT_POINTER(tp->root, NULL);
-	if (head->mask_assigned)
-		rhashtable_destroy(&head->ht);
-	kfree_rcu(head, rcu);
+
+	__module_get(THIS_MODULE);
+	call_rcu(&head->rcu, fl_destroy_rcu);
 	return true;
 }
 
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index 25927b6..f935429 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -114,7 +114,6 @@ static bool mall_destroy(struct tcf_proto *tp, bool force)
 
 		call_rcu(&f->rcu, mall_destroy_filter);
 	}
-	RCU_INIT_POINTER(tp->root, NULL);
 	kfree_rcu(head, rcu);
 	return true;
 }
diff --git a/net/sched/cls_rsvp.h b/net/sched/cls_rsvp.h
index 4f05a19..322438f 100644
--- a/net/sched/cls_rsvp.h
+++ b/net/sched/cls_rsvp.h
@@ -152,7 +152,8 @@ static int rsvp_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 		return -1;
 	nhptr = ip_hdr(skb);
 #endif
-
+	if (unlikely(!head))
+		return -1;
 restart:
 
 #if RSVP_DST_LEN == 4
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 96144bd..0751245 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -543,7 +543,6 @@ static bool tcindex_destroy(struct tcf_proto *tp, bool force)
 	walker.fn = tcindex_destroy_element;
 	tcindex_walk(tp, &walker);
 
-	RCU_INIT_POINTER(tp->root, NULL);
 	call_rcu(&p->rcu, __tcindex_destroy);
 	return true;
 }
-- 
1.9.3

^ permalink raw reply related
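
As the cls_flower hunk above reads, the head teardown now happens in two
stages: the RCU callback (fl_destroy_rcu) only queues a work item, and the
work item (fl_destroy_sleepable) does the part that may sleep, i.e.
rhashtable_destroy(), before freeing the head. Because the two stages never
overlap in time, rcu_head and work_struct can share storage in a union. The
following is a minimal user-space sketch of that hand-off, not kernel code.
All toy_* names, the one-slot "queues" and the heads_freed counter are
assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's rcu_head and work_struct. */
struct toy_rcu_head { void (*func)(struct toy_rcu_head *); };
struct toy_work { void (*func)(struct toy_work *); };

struct toy_head {
	int table_live;		/* stands in for the rhashtable owned by the head */
	union {			/* stages run one after the other, so storage is shared */
		struct toy_work work;
		struct toy_rcu_head rcu;
	};
};

#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct toy_rcu_head *pending_rcu;	/* one-slot "RCU callback queue" */
static struct toy_work *pending_work;		/* one-slot "workqueue" */
static int heads_freed;

/* Stage 2: models a context in which sleeping is allowed. */
static void toy_destroy_sleepable(struct toy_work *work)
{
	struct toy_head *head = toy_container_of(work, struct toy_head, work);

	assert(head->table_live);
	head->table_live = 0;	/* the potentially sleeping teardown goes here */
	free(head);
	heads_freed++;
}

/* Stage 1: models the RCU callback, which must not sleep and only re-queues. */
static void toy_destroy_rcu(struct toy_rcu_head *rcu)
{
	struct toy_head *head = toy_container_of(rcu, struct toy_head, rcu);

	/* Overwrites the rcu member of the union; it is no longer needed. */
	head->work.func = toy_destroy_sleepable;
	pending_work = &head->work;
}

/* Models the destroy path: nothing is freed here, only queued. */
static void toy_destroy(struct toy_head *head)
{
	head->rcu.func = toy_destroy_rcu;
	pending_rcu = &head->rcu;
}

/* Models the end of a grace period: run the queued RCU callback, if any. */
static void toy_grace_period_elapsed(void)
{
	struct toy_rcu_head *rcu = pending_rcu;

	pending_rcu = NULL;
	if (rcu)
		rcu->func(rcu);
}

/* Models the workqueue thread picking up the queued work item, if any. */
static void toy_run_workqueue(void)
{
	struct toy_work *work = pending_work;

	pending_work = NULL;
	if (work)
		work->func(work);
}
```

Calling toy_destroy(), then toy_grace_period_elapsed(), then
toy_run_workqueue() walks one head through both stages. The sketch leaves
out the module reference that the real hunk takes with __module_get() and
drops with module_put() around the deferred work.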

* Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()
From: Daniel Borkmann @ 2016-11-27  0:33 UTC (permalink / raw)
  To: Cong Wang
  Cc: Roi Dayan, Linux Kernel Network Developers, Jiri Pirko,
	John Fastabend
In-Reply-To: <58396D71.8070703@iogearbox.net>

On 11/26/2016 12:09 PM, Daniel Borkmann wrote:
> On 11/26/2016 07:46 AM, Cong Wang wrote:
>> On Thu, Nov 24, 2016 at 7:20 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
[...]
>>> Ok, strange, qdisc_destroy() calls into ops->destroy(), where ingress
>>> drops its entire chain via tcf_destroy_chain(), so that will be NULL
>>> eventually. The tps are freed by call_rcu() as well as qdisc itself
>>> later on via qdisc_rcu_free(), where it frees per-cpu bstats as well.
>>> Outstanding readers should either bail out due to if (!cl) or can still
>>> process the chain until read section ends, but during that time, cl->q
>>> resp. bstats should be good. Do you happen to know what's at address
>>> ffff880a68b04028? I was wondering wrt call_rcu() vs call_rcu_bh(), but
>>> at least on ingress (netif_receive_skb_internal()) we hold rcu_read_lock()
>>> here. The KASAN report is reliably happening at this location, right?
>>
>> I am confused as well, I don't see how it could be related to my patch yet.
>> I will take a deep look in the weekend.
>
> Ok, I'm currently on the run. Got too late yesterday night, but I'll
> write what I found in the evening today, not related to ingress though.

Just pushed out my analysis to netdev under "[PATCH net] net, sched: respect
rcu grace period on cls destruction". My conclusion is that both issues are
actually separate, and that one is small enough where we could route it via
net actually. Perhaps this at the same time shrinks your "[PATCH net-next]
net_sched: move the empty tp check from ->destroy() to ->delete()" to a
reasonable size that it's suitable to net as well. Your ->delete()/->destroy()
one is definitely needed, too. The tp->root one is independent of ->delete()/
->destroy() as they are different races, and the tp->root race could also happen
when you just destroy the whole tp directly. That seems like a good path forward
to me.

Thanks,
Daniel

^ permalink raw reply
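
The tp->root race mentioned in the reply above can be pictured with a toy
model. As I read the "respect rcu grace period on cls destruction" patch,
the concern is that a reader still inside its read-side section may
dereference tp->root after ->destroy() has already set it to NULL. The
sketch below is single-threaded user-space C, not kernel code. The toy_*
names are hypothetical, and toy_classify() returns -1 where an unchecked
kernel reader would instead dereference a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_root { int verdict; };
struct toy_tp { struct toy_root *root; };

static struct toy_root *deferred_free;	/* freed once the grace period ends */

static struct toy_tp toy_tp_new(int verdict)
{
	struct toy_tp tp = { malloc(sizeof(struct toy_root)) };

	assert(tp.root != NULL);
	tp.root->verdict = verdict;
	return tp;
}

/* Models a reader that is still running inside its read-side section. */
static int toy_classify(const struct toy_tp *tp)
{
	if (!tp->root)
		return -1;	/* an unchecked reader would crash here */
	return tp->root->verdict;
}

/* Old behaviour: clear the pointer right away, free after the grace period. */
static void toy_destroy_clearing_root(struct toy_tp *tp)
{
	assert(deferred_free == NULL);
	deferred_free = tp->root;
	tp->root = NULL;
}

/* New behaviour: leave the pointer alone, free after the grace period. */
static void toy_destroy_keeping_root(struct toy_tp *tp)
{
	assert(deferred_free == NULL);
	deferred_free = tp->root;
}

/* Models the point at which no reader can still hold the old root. */
static void toy_grace_period_end(void)
{
	free(deferred_free);
	deferred_free = NULL;
}
```

In this model a reader that runs between destroy and toy_grace_period_end()
only sees a usable root with the second variant.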

* [PATCH net-next 0/4] bnxt_en: Add DCBNL support.
From: Michael Chan @ 2016-11-27  0:38 UTC (permalink / raw)
  To: davem; +Cc: netdev

This series adds DCBNL operations to support host-based IEEE DCBX.

Michael Chan (4):
  bnxt_en: Re-factor bnxt_setup_tc().
  bnxt_en: Update firmware header file to include DCB command structs.
  bnxt_en: Implement DCBNL to support host-based DCBX.
  bnxt_en: Add PFC statistics.

 drivers/net/ethernet/broadcom/Kconfig             |  10 +
 drivers/net/ethernet/broadcom/bnxt/Makefile       |   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c         |  30 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h         |  18 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c     | 502 ++++++++++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.h     |  59 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c |  23 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h     | 326 ++++++++++++++
 8 files changed, 952 insertions(+), 18 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.h

-- 
1.8.3.1

^ permalink raw reply

* [PATCH net-next 1/4] bnxt_en: Re-factor bnxt_setup_tc().
From: Michael Chan @ 2016-11-27  0:38 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1480207115-21294-1-git-send-email-michael.chan@broadcom.com>

Add a new function bnxt_setup_mq_tc() to handle MQPRIO.  This new function
will be called during ETS setup when we add DCBNL in the next patch.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 18 ++++++++++--------
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  1 +
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 8c7bdbe..b75f4d0 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6328,17 +6328,10 @@ static int bnxt_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
-static int bnxt_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
-			 struct tc_to_netdev *ntc)
+int bnxt_setup_mq_tc(struct net_device *dev, u8 tc)
 {
 	struct bnxt *bp = netdev_priv(dev);
 	bool sh = false;
-	u8 tc;
-
-	if (ntc->type != TC_SETUP_MQPRIO)
-		return -EINVAL;
-
-	tc = ntc->tc;
 
 	if (tc > bp->max_tc) {
 		netdev_err(dev, "too many traffic classes requested: %d Max supported is %d\n",
@@ -6381,6 +6374,15 @@ static int bnxt_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
 	return 0;
 }
 
+static int bnxt_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+			 struct tc_to_netdev *ntc)
+{
+	if (ntc->type != TC_SETUP_MQPRIO)
+		return -EINVAL;
+
+	return bnxt_setup_mq_tc(dev, ntc->tc);
+}
+
 #ifdef CONFIG_RFS_ACCEL
 static bool bnxt_fltr_match(struct bnxt_ntuple_filter *f1,
 			    struct bnxt_ntuple_filter *f2)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 47be789..fcd07ee 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1225,5 +1225,6 @@ static inline void bnxt_disable_poll(struct bnxt_napi *bnapi)
 int bnxt_hwrm_fw_set_time(struct bnxt *);
 int bnxt_open_nic(struct bnxt *, bool, bool);
 int bnxt_close_nic(struct bnxt *, bool, bool);
+int bnxt_setup_mq_tc(struct net_device *dev, u8 tc);
 int bnxt_get_max_rings(struct bnxt *, int *, int *, bool);
 #endif
-- 
1.8.3.1

^ permalink raw reply related
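
The refactoring in 1/4 splits the MQPRIO work out of the ndo callback so
that a second caller can reach it without building a tc_to_netdev request.
A minimal user-space sketch of that shape follows. The toy_* types, the
toy_ets_apply() caller and the -EINVAL returned for an over-large tc are
assumptions for illustration, not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum toy_setup_type { TOY_SETUP_MQPRIO, TOY_SETUP_OTHER };

struct toy_tc_request {
	enum toy_setup_type type;
	uint8_t tc;
};

struct toy_dev {
	uint8_t max_tc;		/* limit reported by the hardware */
	uint8_t num_tc;		/* currently configured traffic classes */
};

/* Shared worker: knows nothing about the request wrapper. */
int toy_setup_mq_tc(struct toy_dev *dev, uint8_t tc)
{
	assert(dev != NULL);
	if (tc > dev->max_tc)
		return -EINVAL;	/* assumed error code for this sketch */
	dev->num_tc = tc;
	return 0;
}

/* Thin callback: validates the request type, then delegates. */
static int toy_setup_tc(struct toy_dev *dev, const struct toy_tc_request *req)
{
	if (req->type != TOY_SETUP_MQPRIO)
		return -EINVAL;
	return toy_setup_mq_tc(dev, req->tc);
}

/* Hypothetical second caller, standing in for the later ETS setup path. */
int toy_ets_apply(struct toy_dev *dev, uint8_t tcs_needed)
{
	return toy_setup_mq_tc(dev, tcs_needed);
}
```

Keeping the type check in the wrapper means the shared worker takes only
the device and the number of traffic classes, which is all the second
caller has to offer.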

* [PATCH net-next 2/4] bnxt_en: Update firmware header file to include DCB command structs.
From: Michael Chan @ 2016-11-27  0:38 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1480207115-21294-1-git-send-email-michael.chan@broadcom.com>

Get and store the max number of lossless TCs the hardware can support.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |   4 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |   1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h | 326 ++++++++++++++++++++++++++
 3 files changed, 331 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index b75f4d0..58a75f4 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4252,12 +4252,16 @@ static int bnxt_hwrm_queue_qportcfg(struct bnxt *bp)
 		goto qportcfg_exit;
 	}
 	bp->max_tc = resp->max_configurable_queues;
+	bp->max_lltc = resp->max_configurable_lossless_queues;
 	if (bp->max_tc > BNXT_MAX_QUEUE)
 		bp->max_tc = BNXT_MAX_QUEUE;
 
 	if (resp->queue_cfg_info & QUEUE_QPORTCFG_RESP_QUEUE_CFG_INFO_ASYM_CFG)
 		bp->max_tc = 1;
 
+	if (bp->max_lltc > bp->max_tc)
+		bp->max_lltc = bp->max_tc;
+
 	qptr = &resp->queue_id0;
 	for (i = 0; i < bp->max_tc; i++) {
 		bp->q_info[i].queue_id = *qptr++;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index fcd07ee..edde11e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1010,6 +1010,7 @@ struct bnxt {
 	u32			rss_hash_cfg;
 
 	u8			max_tc;
+	u8			max_lltc;	/* lossless TCs */
 	struct bnxt_queue_info	q_info[BNXT_MAX_QUEUE];
 
 	unsigned int		current_interval;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h
index 0456d5b..5565612 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h
@@ -2355,6 +2355,39 @@ struct hwrm_queue_cfg_output {
 	u8 valid;
 };
 
+/* hwrm_queue_pfcenable_qcfg */
+/* Input (24 bytes) */
+struct hwrm_queue_pfcenable_qcfg_input {
+	__le16 req_type;
+	__le16 cmpl_ring;
+	__le16 seq_id;
+	__le16 target_id;
+	__le64 resp_addr;
+	__le16 port_id;
+	__le16 unused_0[3];
+};
+
+/* Output (16 bytes) */
+struct hwrm_queue_pfcenable_qcfg_output {
+	__le16 error_code;
+	__le16 req_type;
+	__le16 seq_id;
+	__le16 resp_len;
+	__le32 flags;
+	#define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI0_PFC_ENABLED   0x1UL
+	#define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI1_PFC_ENABLED   0x2UL
+	#define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI2_PFC_ENABLED   0x4UL
+	#define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI3_PFC_ENABLED   0x8UL
+	#define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI4_PFC_ENABLED   0x10UL
+	#define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI5_PFC_ENABLED   0x20UL
+	#define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI6_PFC_ENABLED   0x40UL
+	#define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI7_PFC_ENABLED   0x80UL
+	u8 unused_0;
+	u8 unused_1;
+	u8 unused_2;
+	u8 valid;
+};
+
 /* hwrm_queue_pfcenable_cfg */
 /* Input (24 bytes) */
 struct hwrm_queue_pfcenable_cfg_input {
@@ -2389,6 +2422,48 @@ struct hwrm_queue_pfcenable_cfg_output {
 	u8 valid;
 };
 
+/* hwrm_queue_pri2cos_qcfg */
+/* Input (24 bytes) */
+struct hwrm_queue_pri2cos_qcfg_input {
+	__le16 req_type;
+	__le16 cmpl_ring;
+	__le16 seq_id;
+	__le16 target_id;
+	__le64 resp_addr;
+	__le32 flags;
+	#define QUEUE_PRI2COS_QCFG_REQ_FLAGS_PATH		    0x1UL
+	#define QUEUE_PRI2COS_QCFG_REQ_FLAGS_PATH_TX		   (0x0UL << 0)
+	#define QUEUE_PRI2COS_QCFG_REQ_FLAGS_PATH_RX		   (0x1UL << 0)
+	#define QUEUE_PRI2COS_QCFG_REQ_FLAGS_PATH_LAST    QUEUE_PRI2COS_QCFG_REQ_FLAGS_PATH_RX
+	#define QUEUE_PRI2COS_QCFG_REQ_FLAGS_IVLAN		    0x2UL
+	u8 port_id;
+	u8 unused_0[3];
+};
+
+/* Output (24 bytes) */
+struct hwrm_queue_pri2cos_qcfg_output {
+	__le16 error_code;
+	__le16 req_type;
+	__le16 seq_id;
+	__le16 resp_len;
+	u8 pri0_cos_queue_id;
+	u8 pri1_cos_queue_id;
+	u8 pri2_cos_queue_id;
+	u8 pri3_cos_queue_id;
+	u8 pri4_cos_queue_id;
+	u8 pri5_cos_queue_id;
+	u8 pri6_cos_queue_id;
+	u8 pri7_cos_queue_id;
+	u8 queue_cfg_info;
+	#define QUEUE_PRI2COS_QCFG_RESP_QUEUE_CFG_INFO_ASYM_CFG    0x1UL
+	u8 unused_0;
+	__le16 unused_1;
+	u8 unused_2;
+	u8 unused_3;
+	u8 unused_4;
+	u8 valid;
+};
+
 /* hwrm_queue_pri2cos_cfg */
 /* Input (40 bytes) */
 struct hwrm_queue_pri2cos_cfg_input {
@@ -2439,6 +2514,257 @@ struct hwrm_queue_pri2cos_cfg_output {
 	u8 valid;
 };
 
+/* hwrm_queue_cos2bw_qcfg */
+/* Input (24 bytes) */
+struct hwrm_queue_cos2bw_qcfg_input {
+	__le16 req_type;
+	__le16 cmpl_ring;
+	__le16 seq_id;
+	__le16 target_id;
+	__le64 resp_addr;
+	__le16 port_id;
+	__le16 unused_0[3];
+};
+
+/* Output (112 bytes) */
+struct hwrm_queue_cos2bw_qcfg_output {
+	__le16 error_code;
+	__le16 req_type;
+	__le16 seq_id;
+	__le16 resp_len;
+	u8 queue_id0;
+	u8 unused_0;
+	__le16 unused_1;
+	__le32 queue_id0_min_bw;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_MIN_BW_BW_VALUE_MASK 0xfffffffUL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_MIN_BW_BW_VALUE_SFT 0
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_MIN_BW_RSVD       0x10000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_MIN_BW_BW_VALUE_UNIT_MASK 0xe0000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_MIN_BW_BW_VALUE_UNIT_SFT 29
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_MIN_BW_BW_VALUE_UNIT_MBPS (0x0UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_MIN_BW_BW_VALUE_UNIT_PERCENT1_100 (0x1UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_MIN_BW_BW_VALUE_UNIT_INVALID (0x7UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_MIN_BW_BW_VALUE_UNIT_LAST    QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_MIN_BW_BW_VALUE_UNIT_INVALID
+	__le32 queue_id0_max_bw;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_MAX_BW_BW_VALUE_MASK 0xfffffffUL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_MAX_BW_BW_VALUE_SFT 0
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_MAX_BW_RSVD       0x10000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_MAX_BW_BW_VALUE_UNIT_MASK 0xe0000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_MAX_BW_BW_VALUE_UNIT_SFT 29
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_MAX_BW_BW_VALUE_UNIT_MBPS (0x0UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_MAX_BW_BW_VALUE_UNIT_PERCENT1_100 (0x1UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_MAX_BW_BW_VALUE_UNIT_INVALID (0x7UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_MAX_BW_BW_VALUE_UNIT_LAST    QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_MAX_BW_BW_VALUE_UNIT_INVALID
+	u8 queue_id0_tsa_assign;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_TSA_ASSIGN_SP    0x0UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_TSA_ASSIGN_ETS   0x1UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_TSA_ASSIGN_RESERVED_FIRST 0x2UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_TSA_ASSIGN_RESERVED_LAST 0xffUL
+	u8 queue_id0_pri_lvl;
+	u8 queue_id0_bw_weight;
+	u8 queue_id1;
+	__le32 queue_id1_min_bw;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_MIN_BW_BW_VALUE_MASK 0xfffffffUL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_MIN_BW_BW_VALUE_SFT 0
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_MIN_BW_RSVD       0x10000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_MIN_BW_BW_VALUE_UNIT_MASK 0xe0000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_MIN_BW_BW_VALUE_UNIT_SFT 29
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_MIN_BW_BW_VALUE_UNIT_MBPS (0x0UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_MIN_BW_BW_VALUE_UNIT_PERCENT1_100 (0x1UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_MIN_BW_BW_VALUE_UNIT_INVALID (0x7UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_MIN_BW_BW_VALUE_UNIT_LAST    QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_MIN_BW_BW_VALUE_UNIT_INVALID
+	__le32 queue_id1_max_bw;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_MAX_BW_BW_VALUE_MASK 0xfffffffUL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_MAX_BW_BW_VALUE_SFT 0
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_MAX_BW_RSVD       0x10000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_MAX_BW_BW_VALUE_UNIT_MASK 0xe0000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_MAX_BW_BW_VALUE_UNIT_SFT 29
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_MAX_BW_BW_VALUE_UNIT_MBPS (0x0UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_MAX_BW_BW_VALUE_UNIT_PERCENT1_100 (0x1UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_MAX_BW_BW_VALUE_UNIT_INVALID (0x7UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_MAX_BW_BW_VALUE_UNIT_LAST    QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_MAX_BW_BW_VALUE_UNIT_INVALID
+	u8 queue_id1_tsa_assign;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_TSA_ASSIGN_SP    0x0UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_TSA_ASSIGN_ETS   0x1UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_TSA_ASSIGN_RESERVED_FIRST 0x2UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID1_TSA_ASSIGN_RESERVED_LAST 0xffUL
+	u8 queue_id1_pri_lvl;
+	u8 queue_id1_bw_weight;
+	u8 queue_id2;
+	__le32 queue_id2_min_bw;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_MIN_BW_BW_VALUE_MASK 0xfffffffUL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_MIN_BW_BW_VALUE_SFT 0
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_MIN_BW_RSVD       0x10000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_MIN_BW_BW_VALUE_UNIT_MASK 0xe0000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_MIN_BW_BW_VALUE_UNIT_SFT 29
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_MIN_BW_BW_VALUE_UNIT_MBPS (0x0UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_MIN_BW_BW_VALUE_UNIT_PERCENT1_100 (0x1UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_MIN_BW_BW_VALUE_UNIT_INVALID (0x7UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_MIN_BW_BW_VALUE_UNIT_LAST    QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_MIN_BW_BW_VALUE_UNIT_INVALID
+	__le32 queue_id2_max_bw;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_MAX_BW_BW_VALUE_MASK 0xfffffffUL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_MAX_BW_BW_VALUE_SFT 0
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_MAX_BW_RSVD       0x10000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_MAX_BW_BW_VALUE_UNIT_MASK 0xe0000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_MAX_BW_BW_VALUE_UNIT_SFT 29
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_MAX_BW_BW_VALUE_UNIT_MBPS (0x0UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_MAX_BW_BW_VALUE_UNIT_PERCENT1_100 (0x1UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_MAX_BW_BW_VALUE_UNIT_INVALID (0x7UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_MAX_BW_BW_VALUE_UNIT_LAST    QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_MAX_BW_BW_VALUE_UNIT_INVALID
+	u8 queue_id2_tsa_assign;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_TSA_ASSIGN_SP    0x0UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_TSA_ASSIGN_ETS   0x1UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_TSA_ASSIGN_RESERVED_FIRST 0x2UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID2_TSA_ASSIGN_RESERVED_LAST 0xffUL
+	u8 queue_id2_pri_lvl;
+	u8 queue_id2_bw_weight;
+	u8 queue_id3;
+	__le32 queue_id3_min_bw;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_MIN_BW_BW_VALUE_MASK 0xfffffffUL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_MIN_BW_BW_VALUE_SFT 0
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_MIN_BW_RSVD       0x10000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_MIN_BW_BW_VALUE_UNIT_MASK 0xe0000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_MIN_BW_BW_VALUE_UNIT_SFT 29
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_MIN_BW_BW_VALUE_UNIT_MBPS (0x0UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_MIN_BW_BW_VALUE_UNIT_PERCENT1_100 (0x1UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_MIN_BW_BW_VALUE_UNIT_INVALID (0x7UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_MIN_BW_BW_VALUE_UNIT_LAST    QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_MIN_BW_BW_VALUE_UNIT_INVALID
+	__le32 queue_id3_max_bw;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_MAX_BW_BW_VALUE_MASK 0xfffffffUL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_MAX_BW_BW_VALUE_SFT 0
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_MAX_BW_RSVD       0x10000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_MAX_BW_BW_VALUE_UNIT_MASK 0xe0000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_MAX_BW_BW_VALUE_UNIT_SFT 29
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_MAX_BW_BW_VALUE_UNIT_MBPS (0x0UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_MAX_BW_BW_VALUE_UNIT_PERCENT1_100 (0x1UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_MAX_BW_BW_VALUE_UNIT_INVALID (0x7UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_MAX_BW_BW_VALUE_UNIT_LAST    QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_MAX_BW_BW_VALUE_UNIT_INVALID
+	u8 queue_id3_tsa_assign;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_TSA_ASSIGN_SP    0x0UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_TSA_ASSIGN_ETS   0x1UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_TSA_ASSIGN_RESERVED_FIRST 0x2UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID3_TSA_ASSIGN_RESERVED_LAST 0xffUL
+	u8 queue_id3_pri_lvl;
+	u8 queue_id3_bw_weight;
+	u8 queue_id4;
+	__le32 queue_id4_min_bw;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_MIN_BW_BW_VALUE_MASK 0xfffffffUL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_MIN_BW_BW_VALUE_SFT 0
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_MIN_BW_RSVD       0x10000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_MIN_BW_BW_VALUE_UNIT_MASK 0xe0000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_MIN_BW_BW_VALUE_UNIT_SFT 29
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_MIN_BW_BW_VALUE_UNIT_MBPS (0x0UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_MIN_BW_BW_VALUE_UNIT_PERCENT1_100 (0x1UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_MIN_BW_BW_VALUE_UNIT_INVALID (0x7UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_MIN_BW_BW_VALUE_UNIT_LAST    QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_MIN_BW_BW_VALUE_UNIT_INVALID
+	__le32 queue_id4_max_bw;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_MAX_BW_BW_VALUE_MASK 0xfffffffUL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_MAX_BW_BW_VALUE_SFT 0
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_MAX_BW_RSVD       0x10000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_MAX_BW_BW_VALUE_UNIT_MASK 0xe0000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_MAX_BW_BW_VALUE_UNIT_SFT 29
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_MAX_BW_BW_VALUE_UNIT_MBPS (0x0UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_MAX_BW_BW_VALUE_UNIT_PERCENT1_100 (0x1UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_MAX_BW_BW_VALUE_UNIT_INVALID (0x7UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_MAX_BW_BW_VALUE_UNIT_LAST    QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_MAX_BW_BW_VALUE_UNIT_INVALID
+	u8 queue_id4_tsa_assign;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_TSA_ASSIGN_SP    0x0UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_TSA_ASSIGN_ETS   0x1UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_TSA_ASSIGN_RESERVED_FIRST 0x2UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID4_TSA_ASSIGN_RESERVED_LAST 0xffUL
+	u8 queue_id4_pri_lvl;
+	u8 queue_id4_bw_weight;
+	u8 queue_id5;
+	__le32 queue_id5_min_bw;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_MIN_BW_BW_VALUE_MASK 0xfffffffUL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_MIN_BW_BW_VALUE_SFT 0
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_MIN_BW_RSVD       0x10000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_MIN_BW_BW_VALUE_UNIT_MASK 0xe0000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_MIN_BW_BW_VALUE_UNIT_SFT 29
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_MIN_BW_BW_VALUE_UNIT_MBPS (0x0UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_MIN_BW_BW_VALUE_UNIT_PERCENT1_100 (0x1UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_MIN_BW_BW_VALUE_UNIT_INVALID (0x7UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_MIN_BW_BW_VALUE_UNIT_LAST    QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_MIN_BW_BW_VALUE_UNIT_INVALID
+	__le32 queue_id5_max_bw;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_MAX_BW_BW_VALUE_MASK 0xfffffffUL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_MAX_BW_BW_VALUE_SFT 0
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_MAX_BW_RSVD       0x10000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_MAX_BW_BW_VALUE_UNIT_MASK 0xe0000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_MAX_BW_BW_VALUE_UNIT_SFT 29
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_MAX_BW_BW_VALUE_UNIT_MBPS (0x0UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_MAX_BW_BW_VALUE_UNIT_PERCENT1_100 (0x1UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_MAX_BW_BW_VALUE_UNIT_INVALID (0x7UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_MAX_BW_BW_VALUE_UNIT_LAST    QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_MAX_BW_BW_VALUE_UNIT_INVALID
+	u8 queue_id5_tsa_assign;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_TSA_ASSIGN_SP    0x0UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_TSA_ASSIGN_ETS   0x1UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_TSA_ASSIGN_RESERVED_FIRST 0x2UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID5_TSA_ASSIGN_RESERVED_LAST 0xffUL
+	u8 queue_id5_pri_lvl;
+	u8 queue_id5_bw_weight;
+	u8 queue_id6;
+	__le32 queue_id6_min_bw;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_MIN_BW_BW_VALUE_MASK 0xfffffffUL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_MIN_BW_BW_VALUE_SFT 0
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_MIN_BW_RSVD       0x10000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_MIN_BW_BW_VALUE_UNIT_MASK 0xe0000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_MIN_BW_BW_VALUE_UNIT_SFT 29
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_MIN_BW_BW_VALUE_UNIT_MBPS (0x0UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_MIN_BW_BW_VALUE_UNIT_PERCENT1_100 (0x1UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_MIN_BW_BW_VALUE_UNIT_INVALID (0x7UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_MIN_BW_BW_VALUE_UNIT_LAST    QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_MIN_BW_BW_VALUE_UNIT_INVALID
+	__le32 queue_id6_max_bw;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_MAX_BW_BW_VALUE_MASK 0xfffffffUL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_MAX_BW_BW_VALUE_SFT 0
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_MAX_BW_RSVD       0x10000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_MAX_BW_BW_VALUE_UNIT_MASK 0xe0000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_MAX_BW_BW_VALUE_UNIT_SFT 29
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_MAX_BW_BW_VALUE_UNIT_MBPS (0x0UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_MAX_BW_BW_VALUE_UNIT_PERCENT1_100 (0x1UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_MAX_BW_BW_VALUE_UNIT_INVALID (0x7UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_MAX_BW_BW_VALUE_UNIT_LAST    QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_MAX_BW_BW_VALUE_UNIT_INVALID
+	u8 queue_id6_tsa_assign;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_TSA_ASSIGN_SP    0x0UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_TSA_ASSIGN_ETS   0x1UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_TSA_ASSIGN_RESERVED_FIRST 0x2UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID6_TSA_ASSIGN_RESERVED_LAST 0xffUL
+	u8 queue_id6_pri_lvl;
+	u8 queue_id6_bw_weight;
+	u8 queue_id7;
+	__le32 queue_id7_min_bw;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_MIN_BW_BW_VALUE_MASK 0xfffffffUL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_MIN_BW_BW_VALUE_SFT 0
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_MIN_BW_RSVD       0x10000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_MIN_BW_BW_VALUE_UNIT_MASK 0xe0000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_MIN_BW_BW_VALUE_UNIT_SFT 29
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_MIN_BW_BW_VALUE_UNIT_MBPS (0x0UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_MIN_BW_BW_VALUE_UNIT_PERCENT1_100 (0x1UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_MIN_BW_BW_VALUE_UNIT_INVALID (0x7UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_MIN_BW_BW_VALUE_UNIT_LAST    QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_MIN_BW_BW_VALUE_UNIT_INVALID
+	__le32 queue_id7_max_bw;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_MAX_BW_BW_VALUE_MASK 0xfffffffUL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_MAX_BW_BW_VALUE_SFT 0
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_MAX_BW_RSVD       0x10000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_MAX_BW_BW_VALUE_UNIT_MASK 0xe0000000UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_MAX_BW_BW_VALUE_UNIT_SFT 29
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_MAX_BW_BW_VALUE_UNIT_MBPS (0x0UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_MAX_BW_BW_VALUE_UNIT_PERCENT1_100 (0x1UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_MAX_BW_BW_VALUE_UNIT_INVALID (0x7UL << 29)
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_MAX_BW_BW_VALUE_UNIT_LAST    QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_MAX_BW_BW_VALUE_UNIT_INVALID
+	u8 queue_id7_tsa_assign;
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_TSA_ASSIGN_SP    0x0UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_TSA_ASSIGN_ETS   0x1UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_TSA_ASSIGN_RESERVED_FIRST 0x2UL
+	#define QUEUE_COS2BW_QCFG_RESP_QUEUE_ID7_TSA_ASSIGN_RESERVED_LAST 0xffUL
+	u8 queue_id7_pri_lvl;
+	u8 queue_id7_bw_weight;
+	u8 unused_2;
+	u8 unused_3;
+	u8 unused_4;
+	u8 unused_5;
+	u8 valid;
+};
+
 /* hwrm_queue_cos2bw_cfg */
 /* Input (128 bytes) */
 struct hwrm_queue_cos2bw_cfg_input {
-- 
1.8.3.1

^ permalink raw reply related
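
Two details of 2/4 may be easier to follow in isolation: the order in
which max_tc and max_lltc are clamped in bnxt_hwrm_queue_qportcfg(), and
how the *_BW_VALUE_* and *_BW_VALUE_UNIT_* masks split a bandwidth word.
The sketch below is user-space C under stated assumptions: TOY_MAX_QUEUE
is assumed to be 8, the shortened macro names are mine, and the 32-bit
words are assumed to have been converted from little-endian to host order
already.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_QUEUE		8	/* assumed value of the driver's queue cap */

/* Same bit layout as the QUEUE_COS2BW_QCFG_RESP_QUEUE_IDn_*_BW_* defines. */
#define TOY_BW_VALUE_MASK	0x0fffffffUL
#define TOY_BW_VALUE_SFT	0
#define TOY_BW_UNIT_MASK	0xe0000000UL
#define TOY_BW_UNIT_SFT		29
#define TOY_BW_UNIT_MBPS	0x0UL
#define TOY_BW_UNIT_PERCENT	0x1UL
#define TOY_BW_UNIT_INVALID	0x7UL

struct toy_caps {
	uint8_t max_tc;
	uint8_t max_lltc;	/* lossless TCs */
};

/* Mirrors the clamp order in the hunk: cap max_tc, apply the asymmetric
 * configuration override, then make sure max_lltc never exceeds max_tc.
 */
static struct toy_caps toy_clamp_caps(uint8_t fw_max_tc, uint8_t fw_max_lltc,
				      int asym_cfg)
{
	struct toy_caps c = { fw_max_tc, fw_max_lltc };

	if (c.max_tc > TOY_MAX_QUEUE)
		c.max_tc = TOY_MAX_QUEUE;
	if (asym_cfg)
		c.max_tc = 1;
	if (c.max_lltc > c.max_tc)
		c.max_lltc = c.max_tc;
	return c;
}

/* Lower 28 bits: the bandwidth number itself. */
static uint32_t toy_bw_value(uint32_t bw)
{
	return (uint32_t)((bw & TOY_BW_VALUE_MASK) >> TOY_BW_VALUE_SFT);
}

/* Upper 3 bits: the unit the number is expressed in. */
static uint32_t toy_bw_unit(uint32_t bw)
{
	return (uint32_t)((bw & TOY_BW_UNIT_MASK) >> TOY_BW_UNIT_SFT);
}

/* PRIn_PFC_ENABLED is bit n of the flags word, for n in 0..7. */
static int toy_pfc_enabled(uint32_t flags, unsigned int pri)
{
	assert(pri < 8);
	return (int)((flags >> pri) & 1U);
}
```

For example, a word built as (TOY_BW_UNIT_PERCENT << TOY_BW_UNIT_SFT) | 40
decodes to value 40 with the percent unit, and a flags word of 0x8 reports
PFC enabled for priority 3 only.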

* [PATCH net-next 3/4] bnxt_en: Implement DCBNL to support host-based DCBX.
From: Michael Chan @ 2016-11-27  0:38 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1480207115-21294-1-git-send-email-michael.chan@broadcom.com>

Support only IEEE DCBX initially.  Add IEEE DCBNL ops and functions to
get and set the hardware DCBX parameters.  The DCB code is conditional on
Kconfig CONFIG_BNXT_DCB.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/Kconfig         |  10 +
 drivers/net/ethernet/broadcom/bnxt/Makefile   |   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |   8 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |   9 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c | 490 ++++++++++++++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.h |  59 ++++
 6 files changed, 575 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.h

diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
index bd8c80c..404c020 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -203,4 +203,14 @@ config BNXT_SRIOV
 	  Virtualization support in the NetXtreme-C/E products. This
 	  allows for virtual function acceleration in virtual environments.
 
+config BNXT_DCB
+	bool "Data Center Bridging (DCB) Support"
+	default n
+	depends on BNXT && DCB
+	---help---
+	  Say Y here if you want to use Data Center Bridging (DCB) in the
+	  driver.
+
+	  If unsure, say N.
+
 endif # NET_VENDOR_BROADCOM
diff --git a/drivers/net/ethernet/broadcom/bnxt/Makefile b/drivers/net/ethernet/broadcom/bnxt/Makefile
index 97e78e2..b233a86 100644
--- a/drivers/net/ethernet/broadcom/bnxt/Makefile
+++ b/drivers/net/ethernet/broadcom/bnxt/Makefile
@@ -1,3 +1,3 @@
 obj-$(CONFIG_BNXT) += bnxt_en.o
 
-bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o
+bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 58a75f4..cec24b4 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -54,6 +54,7 @@
 #include "bnxt.h"
 #include "bnxt_sriov.h"
 #include "bnxt_ethtool.h"
+#include "bnxt_dcb.h"
 
 #define BNXT_TX_TIMEOUT		(5 * HZ)
 
@@ -4988,7 +4989,7 @@ static void bnxt_enable_napi(struct bnxt *bp)
 	}
 }
 
-static void bnxt_tx_disable(struct bnxt *bp)
+void bnxt_tx_disable(struct bnxt *bp)
 {
 	int i;
 	struct bnxt_tx_ring_info *txr;
@@ -5006,7 +5007,7 @@ static void bnxt_tx_disable(struct bnxt *bp)
 	netif_carrier_off(bp->dev);
 }
 
-static void bnxt_tx_enable(struct bnxt *bp)
+void bnxt_tx_enable(struct bnxt *bp)
 {
 	int i;
 	struct bnxt_tx_ring_info *txr;
@@ -6677,6 +6678,7 @@ static void bnxt_remove_one(struct pci_dev *pdev)
 
 	bnxt_hwrm_func_drv_unrgtr(bp);
 	bnxt_free_hwrm_resources(bp);
+	bnxt_dcb_free(bp);
 	pci_iounmap(pdev, bp->bar2);
 	pci_iounmap(pdev, bp->bar1);
 	pci_iounmap(pdev, bp->bar0);
@@ -6904,6 +6906,8 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	dev->min_mtu = ETH_ZLEN;
 	dev->max_mtu = 9500;
 
+	bnxt_dcb_init(bp);
+
 #ifdef CONFIG_BNXT_SRIOV
 	init_waitqueue_head(&bp->sriov_cfg_wait);
 #endif
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index edde11e..275e560 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1026,6 +1026,13 @@ struct bnxt {
 	struct bnxt_irq	*irq_tbl;
 	u8			mac_addr[ETH_ALEN];
 
+#ifdef CONFIG_BNXT_DCB
+	struct ieee_pfc		*ieee_pfc;
+	struct ieee_ets		*ieee_ets;
+	u8			dcbx_cap;
+	u8			default_pri;
+#endif /* CONFIG_BNXT_DCB */
+
 	u32			msg_enable;
 
 	u32			hwrm_spec_code;
@@ -1221,6 +1228,8 @@ static inline void bnxt_disable_poll(struct bnxt_napi *bnapi)
 int hwrm_send_message_silent(struct bnxt *, void *, u32, int);
 int bnxt_hwrm_set_coal(struct bnxt *);
 int bnxt_hwrm_func_qcaps(struct bnxt *);
+void bnxt_tx_disable(struct bnxt *bp);
+void bnxt_tx_enable(struct bnxt *bp);
 int bnxt_hwrm_set_pause(struct bnxt *);
 int bnxt_hwrm_set_link_setting(struct bnxt *, bool, bool);
 int bnxt_hwrm_fw_set_time(struct bnxt *);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
new file mode 100644
index 0000000..f391b47
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
@@ -0,0 +1,490 @@
+/* Broadcom NetXtreme-C/E network driver.
+ *
+ * Copyright (c) 2014-2016 Broadcom Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/netdevice.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/rtnetlink.h>
+#include <linux/interrupt.h>
+#include <linux/pci.h>
+#include <linux/etherdevice.h>
+#include "bnxt_hsi.h"
+#include "bnxt.h"
+#include "bnxt_dcb.h"
+
+#ifdef CONFIG_BNXT_DCB
+static int bnxt_hwrm_queue_pri2cos_cfg(struct bnxt *bp, struct ieee_ets *ets)
+{
+	struct hwrm_queue_pri2cos_cfg_input req = {0};
+	int rc = 0, i;
+	u8 *pri2cos;
+
+	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_QUEUE_PRI2COS_CFG, -1, -1);
+	req.flags = cpu_to_le32(QUEUE_PRI2COS_CFG_REQ_FLAGS_PATH_BIDIR |
+				QUEUE_PRI2COS_CFG_REQ_FLAGS_IVLAN);
+
+	pri2cos = &req.pri0_cos_queue_id;
+	for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
+		req.enables |= cpu_to_le32(
+			QUEUE_PRI2COS_CFG_REQ_ENABLES_PRI0_COS_QUEUE_ID << i);
+
+		pri2cos[i] = bp->q_info[ets->prio_tc[i]].queue_id;
+	}
+	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+	return rc;
+}
+
+static int bnxt_hwrm_queue_pri2cos_qcfg(struct bnxt *bp, struct ieee_ets *ets)
+{
+	struct hwrm_queue_pri2cos_qcfg_output *resp = bp->hwrm_cmd_resp_addr;
+	struct hwrm_queue_pri2cos_qcfg_input req = {0};
+	int rc = 0;
+
+	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_QUEUE_PRI2COS_QCFG, -1, -1);
+	req.flags = cpu_to_le32(QUEUE_PRI2COS_QCFG_REQ_FLAGS_IVLAN);
+	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+	if (!rc) {
+		u8 *pri2cos = &resp->pri0_cos_queue_id;
+		int i, j;
+
+		for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
+			u8 queue_id = pri2cos[i];
+
+			for (j = 0; j < bp->max_tc; j++) {
+				if (bp->q_info[j].queue_id == queue_id) {
+					ets->prio_tc[i] = j;
+					break;
+				}
+			}
+		}
+	}
+	return rc;
+}
+
+static int bnxt_hwrm_queue_cos2bw_cfg(struct bnxt *bp, struct ieee_ets *ets,
+				      u8 max_tc)
+{
+	struct hwrm_queue_cos2bw_cfg_input req = {0};
+	struct bnxt_cos2bw_cfg cos2bw;
+	int rc = 0, i;
+	void *data;
+
+	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_QUEUE_COS2BW_CFG, -1, -1);
+	data = &req.unused_0;
+	for (i = 0; i < max_tc; i++, data += sizeof(cos2bw) - 4) {
+		req.enables |= cpu_to_le32(
+			QUEUE_COS2BW_CFG_REQ_ENABLES_COS_QUEUE_ID0_VALID << i);
+
+		memset(&cos2bw, 0, sizeof(cos2bw));
+		cos2bw.queue_id = bp->q_info[i].queue_id;
+		if (ets->tc_tsa[i] == IEEE_8021QAZ_TSA_STRICT) {
+			cos2bw.tsa =
+				QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_TSA_ASSIGN_SP;
+			cos2bw.pri_lvl = i;
+		} else {
+			cos2bw.tsa =
+				QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_TSA_ASSIGN_ETS;
+			cos2bw.bw_weight = ets->tc_tx_bw[i];
+		}
+		memcpy(data, &cos2bw.queue_id, sizeof(cos2bw) - 4);
+		if (i == 0) {
+			req.queue_id0 = cos2bw.queue_id;
+			req.unused_0 = 0;
+		}
+	}
+	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+	return rc;
+}
+
+static int bnxt_hwrm_queue_cos2bw_qcfg(struct bnxt *bp, struct ieee_ets *ets)
+{
+	struct hwrm_queue_cos2bw_qcfg_output *resp = bp->hwrm_cmd_resp_addr;
+	struct hwrm_queue_cos2bw_qcfg_input req = {0};
+	struct bnxt_cos2bw_cfg cos2bw;
+	void *data;
+	int rc, i;
+
+	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_QUEUE_COS2BW_QCFG, -1, -1);
+	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+	if (rc)
+		return rc;
+
+	data = &resp->queue_id0 + offsetof(struct bnxt_cos2bw_cfg, queue_id);
+	for (i = 0; i < bp->max_tc; i++, data += sizeof(cos2bw) - 4) {
+		int j;
+
+		memcpy(&cos2bw.queue_id, data, sizeof(cos2bw) - 4);
+		if (i == 0)
+			cos2bw.queue_id = resp->queue_id0;
+
+		for (j = 0; j < bp->max_tc; j++) {
+			if (bp->q_info[j].queue_id != cos2bw.queue_id)
+				continue;
+			if (cos2bw.tsa ==
+			    QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_TSA_ASSIGN_SP) {
+				ets->tc_tsa[j] = IEEE_8021QAZ_TSA_STRICT;
+			} else {
+				ets->tc_tsa[j] = IEEE_8021QAZ_TSA_ETS;
+				ets->tc_tx_bw[j] = cos2bw.bw_weight;
+			}
+		}
+	}
+	return 0;
+}
+
+static int bnxt_hwrm_queue_cfg(struct bnxt *bp, unsigned int lltc_mask)
+{
+	struct hwrm_queue_cfg_input req = {0};
+	int i;
+
+	if (netif_running(bp->dev))
+		bnxt_tx_disable(bp);
+
+	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_QUEUE_CFG, -1, -1);
+	req.flags = cpu_to_le32(QUEUE_CFG_REQ_FLAGS_PATH_BIDIR);
+	req.enables = cpu_to_le32(QUEUE_CFG_REQ_ENABLES_SERVICE_PROFILE);
+
+	/* Configure lossless queues to lossy first */
+	req.service_profile = QUEUE_CFG_REQ_SERVICE_PROFILE_LOSSY;
+	for (i = 0; i < bp->max_tc; i++) {
+		if (BNXT_LLQ(bp->q_info[i].queue_profile)) {
+			req.queue_id = cpu_to_le32(bp->q_info[i].queue_id);
+			hwrm_send_message(bp, &req, sizeof(req),
+					  HWRM_CMD_TIMEOUT);
+			bp->q_info[i].queue_profile =
+				QUEUE_CFG_REQ_SERVICE_PROFILE_LOSSY;
+		}
+	}
+
+	/* Now configure desired queues to lossless */
+	req.service_profile = QUEUE_CFG_REQ_SERVICE_PROFILE_LOSSLESS;
+	for (i = 0; i < bp->max_tc; i++) {
+		if (lltc_mask & (1 << i)) {
+			req.queue_id = cpu_to_le32(bp->q_info[i].queue_id);
+			hwrm_send_message(bp, &req, sizeof(req),
+					  HWRM_CMD_TIMEOUT);
+			bp->q_info[i].queue_profile =
+				QUEUE_CFG_REQ_SERVICE_PROFILE_LOSSLESS;
+		}
+	}
+	if (netif_running(bp->dev))
+		bnxt_tx_enable(bp);
+
+	return 0;
+}
+
+static int bnxt_hwrm_queue_pfc_cfg(struct bnxt *bp, struct ieee_pfc *pfc)
+{
+	struct hwrm_queue_pfcenable_cfg_input req = {0};
+	struct ieee_ets *my_ets = bp->ieee_ets;
+	unsigned int tc_mask = 0, pri_mask = 0;
+	u8 i, pri, lltc_count = 0;
+	bool need_q_recfg = false;
+	int rc;
+
+	if (!my_ets)
+		return -EINVAL;
+
+	for (i = 0; i < bp->max_tc; i++) {
+		for (pri = 0; pri < IEEE_8021QAZ_MAX_TCS; pri++) {
+			if ((pfc->pfc_en & (1 << pri)) &&
+			    (my_ets->prio_tc[pri] == i)) {
+				pri_mask |= 1 << pri;
+				tc_mask |= 1 << i;
+			}
+		}
+		if (tc_mask & (1 << i))
+			lltc_count++;
+	}
+	if (lltc_count > bp->max_lltc)
+		return -EINVAL;
+
+	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_QUEUE_PFCENABLE_CFG, -1, -1);
+	req.flags = cpu_to_le32(pri_mask);
+	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+	if (rc)
+		return rc;
+
+	for (i = 0; i < bp->max_tc; i++) {
+		if (tc_mask & (1 << i)) {
+			if (!BNXT_LLQ(bp->q_info[i].queue_profile))
+				need_q_recfg = true;
+		}
+	}
+
+	if (need_q_recfg)
+		rc = bnxt_hwrm_queue_cfg(bp, tc_mask);
+
+	return rc;
+}
+
+static int bnxt_hwrm_queue_pfc_qcfg(struct bnxt *bp, struct ieee_pfc *pfc)
+{
+	struct hwrm_queue_pfcenable_qcfg_output *resp = bp->hwrm_cmd_resp_addr;
+	struct hwrm_queue_pfcenable_qcfg_input req = {0};
+	u8 pri_mask;
+	int rc;
+
+	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_QUEUE_PFCENABLE_QCFG, -1, -1);
+	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+	if (rc)
+		return rc;
+
+	pri_mask = le32_to_cpu(resp->flags);
+	pfc->pfc_en = pri_mask;
+	return 0;
+}
+
+static int bnxt_ets_validate(struct bnxt *bp, struct ieee_ets *ets, u8 *tc)
+{
+	int total_ets_bw = 0;
+	u8 max_tc = 0;
+	int i;
+
+	for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
+		if (ets->prio_tc[i] > bp->max_tc) {
+			netdev_err(bp->dev, "priority to TC mapping exceeds TC count %d\n",
+				   ets->prio_tc[i]);
+			return -EINVAL;
+		}
+		if (ets->prio_tc[i] > max_tc)
+			max_tc = ets->prio_tc[i];
+
+		if ((ets->tc_tx_bw[i] || ets->tc_tsa[i]) && i > bp->max_tc)
+			return -EINVAL;
+
+		switch (ets->tc_tsa[i]) {
+		case IEEE_8021QAZ_TSA_STRICT:
+			break;
+		case IEEE_8021QAZ_TSA_ETS:
+			total_ets_bw += ets->tc_tx_bw[i];
+			break;
+		default:
+			return -ENOTSUPP;
+		}
+	}
+	if (total_ets_bw > 100)
+		return -EINVAL;
+
+	*tc = max_tc + 1;
+	return 0;
+}
+
+static int bnxt_dcbnl_ieee_getets(struct net_device *dev, struct ieee_ets *ets)
+{
+	struct bnxt *bp = netdev_priv(dev);
+	struct ieee_ets *my_ets = bp->ieee_ets;
+
+	ets->ets_cap = bp->max_tc;
+
+	if (!my_ets) {
+		int rc;
+
+		if (bp->dcbx_cap & DCB_CAP_DCBX_HOST)
+			return 0;
+
+		my_ets = kzalloc(sizeof(*my_ets), GFP_KERNEL);
+		if (!my_ets)
+			return 0;
+		rc = bnxt_hwrm_queue_cos2bw_qcfg(bp, my_ets);
+		if (rc)
+			return 0;
+		rc = bnxt_hwrm_queue_pri2cos_qcfg(bp, my_ets);
+		if (rc)
+			return 0;
+	}
+
+	ets->cbs = my_ets->cbs;
+	memcpy(ets->tc_tx_bw, my_ets->tc_tx_bw, sizeof(ets->tc_tx_bw));
+	memcpy(ets->tc_rx_bw, my_ets->tc_rx_bw, sizeof(ets->tc_rx_bw));
+	memcpy(ets->tc_tsa, my_ets->tc_tsa, sizeof(ets->tc_tsa));
+	memcpy(ets->prio_tc, my_ets->prio_tc, sizeof(ets->prio_tc));
+	return 0;
+}
+
+static int bnxt_dcbnl_ieee_setets(struct net_device *dev, struct ieee_ets *ets)
+{
+	struct bnxt *bp = netdev_priv(dev);
+	struct ieee_ets *my_ets = bp->ieee_ets;
+	u8 max_tc = 0;
+	int rc, i;
+
+	if (!(bp->dcbx_cap & DCB_CAP_DCBX_VER_IEEE) ||
+	    !(bp->dcbx_cap & DCB_CAP_DCBX_HOST))
+		return -EINVAL;
+
+	rc = bnxt_ets_validate(bp, ets, &max_tc);
+	if (!rc) {
+		if (!my_ets) {
+			my_ets = kzalloc(sizeof(*my_ets), GFP_KERNEL);
+			if (!my_ets)
+				return -ENOMEM;
+			/* initialize PRI2TC mappings to invalid value */
+			for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++)
+				my_ets->prio_tc[i] = IEEE_8021QAZ_MAX_TCS;
+			bp->ieee_ets = my_ets;
+		}
+		rc = bnxt_setup_mq_tc(dev, max_tc);
+		if (rc)
+			return rc;
+		rc = bnxt_hwrm_queue_cos2bw_cfg(bp, ets, max_tc);
+		if (rc)
+			return rc;
+		rc = bnxt_hwrm_queue_pri2cos_cfg(bp, ets);
+		if (rc)
+			return rc;
+		memcpy(my_ets, ets, sizeof(*my_ets));
+	}
+	return rc;
+}
+
+static int bnxt_dcbnl_ieee_getpfc(struct net_device *dev, struct ieee_pfc *pfc)
+{
+	struct bnxt *bp = netdev_priv(dev);
+	struct ieee_pfc *my_pfc = bp->ieee_pfc;
+	int rc;
+
+	pfc->pfc_cap = bp->max_lltc;
+
+	if (!my_pfc) {
+		if (bp->dcbx_cap & DCB_CAP_DCBX_HOST)
+			return 0;
+
+		my_pfc = kzalloc(sizeof(*my_pfc), GFP_KERNEL);
+		if (!my_pfc)
+			return 0;
+		bp->ieee_pfc = my_pfc;
+		rc = bnxt_hwrm_queue_pfc_qcfg(bp, my_pfc);
+		if (rc)
+			return 0;
+	}
+
+	pfc->pfc_en = my_pfc->pfc_en;
+	pfc->mbc = my_pfc->mbc;
+	pfc->delay = my_pfc->delay;
+
+	return 0;
+}
+
+static int bnxt_dcbnl_ieee_setpfc(struct net_device *dev, struct ieee_pfc *pfc)
+{
+	struct bnxt *bp = netdev_priv(dev);
+	struct ieee_pfc *my_pfc = bp->ieee_pfc;
+	int rc;
+
+	if (!(bp->dcbx_cap & DCB_CAP_DCBX_VER_IEEE) ||
+	    !(bp->dcbx_cap & DCB_CAP_DCBX_HOST))
+		return -EINVAL;
+
+	if (!my_pfc) {
+		my_pfc = kzalloc(sizeof(*my_pfc), GFP_KERNEL);
+		if (!my_pfc)
+			return -ENOMEM;
+		bp->ieee_pfc = my_pfc;
+	}
+	rc = bnxt_hwrm_queue_pfc_cfg(bp, pfc);
+	if (!rc)
+		memcpy(my_pfc, pfc, sizeof(*my_pfc));
+
+	return rc;
+}
+
+static int bnxt_dcbnl_ieee_setapp(struct net_device *dev, struct dcb_app *app)
+{
+	struct bnxt *bp = netdev_priv(dev);
+	int rc = -EINVAL;
+
+	if (!(bp->dcbx_cap & DCB_CAP_DCBX_VER_IEEE) ||
+	    !(bp->dcbx_cap & DCB_CAP_DCBX_HOST))
+		return -EINVAL;
+
+	rc = dcb_ieee_setapp(dev, app);
+	return rc;
+}
+
+static int bnxt_dcbnl_ieee_delapp(struct net_device *dev, struct dcb_app *app)
+{
+	struct bnxt *bp = netdev_priv(dev);
+	int rc;
+
+	if (!(bp->dcbx_cap & DCB_CAP_DCBX_VER_IEEE))
+		return -EINVAL;
+
+	rc = dcb_ieee_delapp(dev, app);
+	return rc;
+}
+
+static u8 bnxt_dcbnl_getdcbx(struct net_device *dev)
+{
+	struct bnxt *bp = netdev_priv(dev);
+
+	return bp->dcbx_cap;
+}
+
+static u8 bnxt_dcbnl_setdcbx(struct net_device *dev, u8 mode)
+{
+	struct bnxt *bp = netdev_priv(dev);
+
+	/* only support IEEE */
+	if ((mode & DCB_CAP_DCBX_VER_CEE) || !(mode & DCB_CAP_DCBX_VER_IEEE))
+		return 1;
+
+	if ((mode & DCB_CAP_DCBX_HOST) && BNXT_VF(bp))
+		return 1;
+
+	if (mode == bp->dcbx_cap)
+		return 0;
+
+	bp->dcbx_cap = mode;
+	return 0;
+}
+
+static const struct dcbnl_rtnl_ops dcbnl_ops = {
+	.ieee_getets	= bnxt_dcbnl_ieee_getets,
+	.ieee_setets	= bnxt_dcbnl_ieee_setets,
+	.ieee_getpfc	= bnxt_dcbnl_ieee_getpfc,
+	.ieee_setpfc	= bnxt_dcbnl_ieee_setpfc,
+	.ieee_setapp	= bnxt_dcbnl_ieee_setapp,
+	.ieee_delapp	= bnxt_dcbnl_ieee_delapp,
+	.getdcbx	= bnxt_dcbnl_getdcbx,
+	.setdcbx	= bnxt_dcbnl_setdcbx,
+};
+
+void bnxt_dcb_init(struct bnxt *bp)
+{
+	if (bp->hwrm_spec_code < 0x10501)
+		return;
+
+	bp->dcbx_cap = DCB_CAP_DCBX_VER_IEEE;
+	if (BNXT_PF(bp))
+		bp->dcbx_cap |= DCB_CAP_DCBX_HOST;
+	else
+		bp->dcbx_cap |= DCB_CAP_DCBX_LLD_MANAGED;
+	bp->dev->dcbnl_ops = &dcbnl_ops;
+}
+
+void bnxt_dcb_free(struct bnxt *bp)
+{
+	kfree(bp->ieee_pfc);
+	kfree(bp->ieee_ets);
+	bp->ieee_pfc = NULL;
+	bp->ieee_ets = NULL;
+}
+
+#else
+
+void bnxt_dcb_init(struct bnxt *bp)
+{
+}
+
+void bnxt_dcb_free(struct bnxt *bp)
+{
+}
+
+#endif
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.h
new file mode 100644
index 0000000..bef31fb
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.h
@@ -0,0 +1,59 @@
+/* Broadcom NetXtreme-C/E network driver.
+ *
+ * Copyright (c) 2014-2016 Broadcom Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef BNXT_DCB_H
+#define BNXT_DCB_H
+
+#include <net/dcbnl.h>
+
+struct bnxt_dcb {
+	u8			max_tc;
+	struct ieee_pfc		*ieee_pfc;
+	struct ieee_ets		*ieee_ets;
+	u8			dcbx_cap;
+	u8			default_pri;
+};
+
+struct bnxt_cos2bw_cfg {
+	u8			pad[3];
+	u8			queue_id;
+	__le32			min_bw;
+	__le32			max_bw;
+	u8			tsa;
+	u8			pri_lvl;
+	u8			bw_weight;
+	u8			unused;
+};
+
+#define BNXT_LLQ(q_profile)	\
+	((q_profile) == QUEUE_QPORTCFG_RESP_QUEUE_ID0_SERVICE_PROFILE_LOSSLESS)
+
+struct struct_data_dcbx_app_cfg {
+	__le16		protocol_id;
+	u8		protocol_selector;
+	u8		priority;
+	u8		valid;
+	u8		rsvd1[3];
+};
+
+struct hwrm_struct_data_hdr {
+	__le16		struct_id;
+#define HWRM_STRUCT_DATA_HDR_STRUCT_ID_DCBX_CFG_APP	0x0421
+	__le16		len;
+	u8		version;
+	u8		count;
+	__le16		subtype;
+#define HWRM_STRUCT_DATA_SUBTYPE_HOST_OPERATIONAL	0x0300
+	__le16		next_offset;
+	u8		rsvd1[6];
+};
+
+void bnxt_dcb_init(struct bnxt *bp);
+void bnxt_dcb_free(struct bnxt *bp);
+#endif
-- 
1.8.3.1
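
For readers following the PFC path in bnxt_hwrm_queue_pfc_cfg() above, here
is a minimal userspace sketch of how the priority mask and the lossless-TC
mask are derived from pfc_en and the prio_tc[] mapping. It is not driver
code: demo_pfc_masks() and DEMO_MAX_PRIO are hypothetical names, and the
nested loop from the patch is collapsed into a single pass that should give
the same masks.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_PRIO 8	/* stands in for IEEE_8021QAZ_MAX_TCS */

/*
 * Hypothetical helper, not part of the driver: build the mask of
 * PFC-enabled priorities and the mask of traffic classes they map to.
 * Returns the number of traffic classes that would have to be lossless.
 */
int demo_pfc_masks(uint8_t pfc_en, const uint8_t *prio_tc, unsigned int max_tc,
		   unsigned int *pri_mask, unsigned int *tc_mask)
{
	unsigned int pri, tc;
	int lltc_count = 0;

	assert(max_tc <= DEMO_MAX_PRIO);
	*pri_mask = 0;
	*tc_mask = 0;

	for (pri = 0; pri < DEMO_MAX_PRIO; pri++) {
		if (!(pfc_en & (1u << pri)))
			continue;	/* PFC not requested for this priority */
		if (prio_tc[pri] >= max_tc)
			continue;	/* mapped to a TC the device does not have */
		*pri_mask |= 1u << pri;
		*tc_mask |= 1u << prio_tc[pri];
	}

	/* Each TC is counted once, however many priorities map to it. */
	for (tc = 0; tc < max_tc; tc++)
		if (*tc_mask & (1u << tc))
			lltc_count++;

	return lltc_count;
}
```

With prio_tc = {0, 1, 1, 1, 2, 2, 3, 3} and pfc_en = 0x0a, priorities 1 and 3
both land on TC 1, so only one lossless queue is needed. The caller would
compare the returned count against the device limit (max_lltc in the patch).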

^ permalink raw reply related

* [PATCH net-next 4/4] bnxt_en: Add PFC statistics.
From: Michael Chan @ 2016-11-27  0:38 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1480207115-21294-1-git-send-email-michael.chan@broadcom.com>

Report PFC statistics to ethtool -S and DCBNL.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h         |  7 +++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c     | 14 +++++++++++++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 23 ++++++++++++++++-------
 3 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 275e560..a72adec 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1124,6 +1124,13 @@ struct bnxt {
 	u32			lpi_tmr_hi;
 };
 
+#define BNXT_RX_STATS_OFFSET(counter)			\
+	(offsetof(struct rx_port_stats, counter) / 8)
+
+#define BNXT_TX_STATS_OFFSET(counter)			\
+	((offsetof(struct tx_port_stats, counter) +	\
+	  sizeof(struct rx_port_stats) + 512) / 8)
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
 static inline void bnxt_enable_poll(struct bnxt_napi *bnapi)
 {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
index f391b47..fdf2d8c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
@@ -347,8 +347,10 @@ static int bnxt_dcbnl_ieee_setets(struct net_device *dev, struct ieee_ets *ets)
 static int bnxt_dcbnl_ieee_getpfc(struct net_device *dev, struct ieee_pfc *pfc)
 {
 	struct bnxt *bp = netdev_priv(dev);
+	__le64 *stats = (__le64 *)bp->hw_rx_port_stats;
 	struct ieee_pfc *my_pfc = bp->ieee_pfc;
-	int rc;
+	long rx_off, tx_off;
+	int i, rc;
 
 	pfc->pfc_cap = bp->max_lltc;
 
@@ -369,6 +371,16 @@ static int bnxt_dcbnl_ieee_getpfc(struct net_device *dev, struct ieee_pfc *pfc)
 	pfc->mbc = my_pfc->mbc;
 	pfc->delay = my_pfc->delay;
 
+	if (!stats)
+		return 0;
+
+	rx_off = BNXT_RX_STATS_OFFSET(rx_pfc_ena_frames_pri0);
+	tx_off = BNXT_TX_STATS_OFFSET(tx_pfc_ena_frames_pri0);
+	for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++, rx_off++, tx_off++) {
+		pfc->requests[i] = le64_to_cpu(*(stats + tx_off));
+		pfc->indications[i] = le64_to_cpu(*(stats + rx_off));
+	}
+
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index fa6125e..784aa77 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -107,16 +107,9 @@ static int bnxt_set_coalesce(struct net_device *dev,
 
 #define BNXT_NUM_STATS	21
 
-#define BNXT_RX_STATS_OFFSET(counter)	\
-	(offsetof(struct rx_port_stats, counter) / 8)
-
 #define BNXT_RX_STATS_ENTRY(counter)	\
 	{ BNXT_RX_STATS_OFFSET(counter), __stringify(counter) }
 
-#define BNXT_TX_STATS_OFFSET(counter)			\
-	((offsetof(struct tx_port_stats, counter) +	\
-	  sizeof(struct rx_port_stats) + 512) / 8)
-
 #define BNXT_TX_STATS_ENTRY(counter)	\
 	{ BNXT_TX_STATS_OFFSET(counter), __stringify(counter) }
 
@@ -150,6 +143,14 @@ static int bnxt_set_coalesce(struct net_device *dev,
 	BNXT_RX_STATS_ENTRY(rx_tagged_frames),
 	BNXT_RX_STATS_ENTRY(rx_double_tagged_frames),
 	BNXT_RX_STATS_ENTRY(rx_good_frames),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri0),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri1),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri2),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri3),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri4),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri5),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri6),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri7),
 	BNXT_RX_STATS_ENTRY(rx_undrsz_frames),
 	BNXT_RX_STATS_ENTRY(rx_eee_lpi_events),
 	BNXT_RX_STATS_ENTRY(rx_eee_lpi_duration),
@@ -179,6 +180,14 @@ static int bnxt_set_coalesce(struct net_device *dev,
 	BNXT_TX_STATS_ENTRY(tx_fcs_err_frames),
 	BNXT_TX_STATS_ENTRY(tx_err),
 	BNXT_TX_STATS_ENTRY(tx_fifo_underruns),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri0),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri1),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri2),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri3),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri4),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri5),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri6),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri7),
 	BNXT_TX_STATS_ENTRY(tx_eee_lpi_events),
 	BNXT_TX_STATS_ENTRY(tx_eee_lpi_duration),
 	BNXT_TX_STATS_ENTRY(tx_total_collisions),
-- 
1.8.3.1
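
As an illustration of the offset macros moved into bnxt.h above, the
following userspace sketch shows the indexing idea: the RX and TX counter
blocks are treated as one flat array of 64-bit words, with the TX block
starting after the RX block plus a 512-byte gap. The demo_* structs are
hypothetical, heavily shortened stand-ins for rx_port_stats and
tx_port_stats, and the le64_to_cpu() conversion is left out (the sketch
assumes host-endian data).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_PRI	3	/* shortened; the patch walks 8 priorities */
#define DEMO_TX_GAP	512	/* gap between RX and TX blocks, as in the patch */

/* Hypothetical stand-ins for the firmware statistics layouts. */
struct demo_rx_stats {
	uint64_t rx_good_frames;
	uint64_t rx_pfc_frames_pri0;
	uint64_t rx_pfc_frames_pri1;
	uint64_t rx_pfc_frames_pri2;
};

struct demo_tx_stats {
	uint64_t tx_good_frames;
	uint64_t tx_pfc_frames_pri0;
	uint64_t tx_pfc_frames_pri1;
	uint64_t tx_pfc_frames_pri2;
};

/* Index, in 64-bit words, of an RX counter within the flat array. */
#define DEMO_RX_OFFSET(counter) \
	(offsetof(struct demo_rx_stats, counter) / 8)

/* Same for a TX counter: skip the whole RX block and the gap first. */
#define DEMO_TX_OFFSET(counter) \
	((offsetof(struct demo_tx_stats, counter) + \
	  sizeof(struct demo_rx_stats) + DEMO_TX_GAP) / 8)

/*
 * Copy the per-priority PFC counters out of the flat array. This relies
 * on the pri0..priN fields being consecutive, as the loop in the patch does.
 */
void demo_read_pfc(const uint64_t *stats, uint64_t *requests,
		   uint64_t *indications)
{
	size_t rx_off = DEMO_RX_OFFSET(rx_pfc_frames_pri0);
	size_t tx_off = DEMO_TX_OFFSET(tx_pfc_frames_pri0);
	int i;

	assert(stats != NULL);
	for (i = 0; i < DEMO_NUM_PRI; i++, rx_off++, tx_off++) {
		requests[i] = stats[tx_off];	/* PFC frames sent */
		indications[i] = stats[rx_off];	/* PFC frames received */
	}
}
```

With these demo layouts the RX block is 32 bytes, so the first RX PFC
counter sits at word 1 and the first TX PFC counter at word
(8 + 32 + 512) / 8 = 69.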

^ permalink raw reply related

* Re: Large performance regression with 6in4 tunnel (sit)
From: Stephen Rothwell @ 2016-11-27  0:54 UTC (permalink / raw)
  To: Eli Cooper; +Cc: netdev
In-Reply-To: <a4a7b888-04d1-55ce-28de-cc15f01a15fc@gmx.com>

Hi Eli,

On Fri, 25 Nov 2016 14:05:04 +0800 Eli Cooper <elicooper@gmx.com> wrote:
>
> I think this is similar to the bug I fixed in commit ae148b085876
> ("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()").
> 
> I can reproduce a similar problem by applying xfrm to sit traffic.
> TSO/GSO packets are dropped when IPSec is enabled, and IPv6 throughput
> drops to 10s of Kbps. I am not sure if this is the same issue you
> experienced, but I wrote a patch that fixed at least the issue I had.
> 
> Could you test the patch I sent to the mailing list just now?

Thanks for the patch!

It's a bit tricky to test since the problem only occurs in a production
machine (I tried reproducing in a VM, but the problem did not occur),
but I will try to just rebuild the sit module and see if I can insert
the modified one.

-- 
Cheers,
Stephen Rothwell

^ permalink raw reply

* Re: [PATCH v3 net-next 1/2] net: ethernet: slicoss: add slicoss gigabit ethernet driver
From: Lino Sanfilippo @ 2016-11-27  1:02 UTC (permalink / raw)
  To: Rami Rosen
  Cc: devel, andrew, gregkh, linux-kernel@vger.kernel.org, liodot,
	Netdev, David Miller
In-Reply-To: <CAKoUArncMYtQZQZ62efWsXr25RZPiAx4USJFjf2emzVRkhELjg@mail.gmail.com>


Hi Rami,


On 26.11.2016 16:48, Rami Rosen wrote:
>> @@ -0,0 +1,28 @@
>> +config NET_VENDOR_ALACRITECH
>> +        bool "Alacritech devices"
>> +        default y
>> +        ---help---
>> +          If you have a network (Ethernet) card belonging to this class, say Y.
>> +
>> +          Note that the answer to this question doesn't directly affect the
>> +          kernel: saying N will just cause the configurator to skip all
> 
> Shouldn't it be "Alacritech devices" here, as appears earlier ?
> 
>> +          the questions about Renesas devices. If you say Y, you will be asked

Yes, it definitely should not be Renesas :). This is a stupid copy-and-paste error; I will fix it,
thank you!

>> +          for your specific device in the following questions.
>> +
> 
> ...
> ...
> ...
>> +struct slic_device {
>> +       struct pci_dev *pdev;
> ...
>> +       bool promisc;
> 
> Seems that the autoneg boolean is not used anywhere, apart from
> setting it once to true in
> the slic_set_link_autoneg() method. Apart from this member it is not
> accessed anywhere, so it seems it should be removed.
> 
>> +       bool autoneg;
>> +       int speed;

Agreed, this variable can be removed.

> ...
> 
>> +static int slic_load_rcvseq_firmware(struct slic_device *sdev)
>> +{
>> +       const struct firmware *fw;
>> +       const char *file;
>> +       u32 codelen;
>> +       int idx = 0;
>> +       u32 instr;
>> +       u32 addr;
>> +       int err;
>> +
> ...
>> +       /* Do an initial sanity check concerning firmware size now. A further
>> +        * check follows below.
>> +        */
>> +       if (fw->size < SLIC_FIRMWARE_MIN_SIZE) {
>> +               dev_err(&sdev->pdev->dev,
>> +                       "invalid firmware size %zu (min %u expected)\n",
>> +                       fw->size, SLIC_FIRMWARE_MIN_SIZE);
>> +               err = -EINVAL;
> 
> in the release label, always 0 is returned:
> 
>> +               goto release;
>> +       }
>> +
>> +       codelen = slic_read_dword_from_firmware(fw, &idx);
>> +
>> +       /* do another sanity check against firmware size */
>> +       if ((codelen + 4) > fw->size) {
>> +               dev_err(&sdev->pdev->dev,
>> +                       "invalid rcv-sequencer firmware size %zu\n", fw->size);
>> +               err = -EINVAL;
> 
> Again, in the release label, always 0 is returned:
> 
>> +               goto release;
>> +       }
>> +
>>
>> +release:
>> +       release_firmware(fw);
>> +
>> +       return 0;
>> +}

This should return "err", I will fix it.
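
To make the intended shape concrete, here is a minimal userspace sketch of
the pattern (not the actual driver code): every failure sets err before
jumping to the shared release label, and the label returns err instead of
0. The demo_* names and the DEMO_FW_MIN_SIZE value are made up for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_FW_MIN_SIZE 64	/* hypothetical minimum, for illustration only */

/* Hypothetical stand-in for struct firmware plus a "was released" flag. */
struct demo_fw {
	size_t size;
	int released;
};

int demo_load_firmware(struct demo_fw *fw, size_t codelen)
{
	int err = 0;

	assert(fw != NULL);

	/* Initial sanity check on the firmware size. */
	if (fw->size < DEMO_FW_MIN_SIZE) {
		err = -EINVAL;
		goto release;
	}

	/* Second check: the advertised code length must fit in the image. */
	if (codelen + 4 > fw->size) {
		err = -EINVAL;
		goto release;
	}

	/* ... the actual download would happen here ... */

release:
	fw->released = 1;	/* cleanup runs on success and on failure */
	return err;		/* propagate the error rather than returning 0 */
}
```

A caller then sees -EINVAL for a truncated image and 0 only when both
checks pass, while the release step runs in either case.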

Thanks a lot for the review!

Regards,
Lino

^ permalink raw reply

* Re: netlink: GPF in sock_sndtimeo
From: Cong Wang @ 2016-11-27  1:11 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: David Miller, Johannes Berg, Florian Westphal, Eric Dumazet,
	Herbert Xu, netdev, LKML, syzkaller
In-Reply-To: <CACT4Y+aG1+91U1PWMTwpE_6vbEuqG7CdLCM1H=3WVJWtz=F7Ug@mail.gmail.com>

On Sat, Nov 26, 2016 at 7:44 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> Hello,
>
> The following program triggers GPF in sock_sndtimeo:
> https://gist.githubusercontent.com/dvyukov/c19cadd309791cf5cb9b2bf936d3f48d/raw/1743ba0211079a5465d039512b427bc6b59b1a76/gistfile1.txt
>
> On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).
>
> general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 19950 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: ffff88002a0d0840 task.stack: ffff880036920000
> RIP: 0010:[<ffffffff86cb35e1>]  [<     inline     >] sock_sndtimeo
> include/net/sock.h:2075
> RIP: 0010:[<ffffffff86cb35e1>]  [<ffffffff86cb35e1>]
> netlink_unicast+0xe1/0x730 net/netlink/af_netlink.c:1232
> RSP: 0018:ffff880036926f68  EFLAGS: 00010202
> RAX: 0000000000000068 RBX: ffff880036927000 RCX: ffffc900021d0000
> RDX: 0000000000000d63 RSI: 00000000024000c0 RDI: 0000000000000340
> RBP: ffff880036927028 R08: ffffed0006ea7aab R09: ffffed0006ea7aab
> R10: 0000000000000001 R11: ffffed0006ea7aaa R12: dffffc0000000000
> R13: 0000000000000000 R14: ffff880035de3400 R15: ffff880035de3400
> FS:  00007f90a2fc7700(0000) GS:ffff88003ed00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000006de0c0 CR3: 0000000035de6000 CR4: 00000000000006e0
> Stack:
>  ffff880035de3400 ffffffff819f02a1 1ffff10006d24df4 0000000000000004
>  00004db400000014 ffff880036926fd8 ffffffff00000000 0000000041b58ab3
>  ffffffff89653c11 ffffffff86cb3500 ffffffff819f0345 ffff880035de3400
> Call Trace:
>  [<     inline     >] audit_replace kernel/audit.c:817
>  [<ffffffff816c34b9>] audit_receive_msg+0x22c9/0x2ce0 kernel/audit.c:894
>  [<     inline     >] audit_receive_skb kernel/audit.c:1120
>  [<ffffffff816c40ac>] audit_receive+0x1dc/0x360 kernel/audit.c:1133
>  [<     inline     >] netlink_unicast_kernel net/netlink/af_netlink.c:1214
>  [<ffffffff86cb3a14>] netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1240
>  [<ffffffff86cb46d4>] netlink_sendmsg+0xaa4/0xe50 net/netlink/af_netlink.c:1786
>  [<     inline     >] sock_sendmsg_nosec net/socket.c:621
>  [<ffffffff86a6d54f>] sock_sendmsg+0xcf/0x110 net/socket.c:631
>  [<ffffffff86a6d8bb>] sock_write_iter+0x32b/0x620 net/socket.c:829
>  [<     inline     >] new_sync_write fs/read_write.c:499
>  [<ffffffff81a6f24e>] __vfs_write+0x4fe/0x830 fs/read_write.c:512
>  [<ffffffff81a70cf5>] vfs_write+0x175/0x4e0 fs/read_write.c:560
>  [<     inline     >] SYSC_write fs/read_write.c:607
>  [<ffffffff81a75180>] SyS_write+0x100/0x240 fs/read_write.c:599
>  [<ffffffff81009a24>] do_syscall_64+0x2f4/0x940 arch/x86/entry/common.c:280
>  [<ffffffff88149e8d>] entry_SYSCALL64_slow_path+0x25/0x25
> Code: fe 4c 89 f7 e8 31 16 ff ff 8b 8d 70 ff ff ff 49 89 c7 31 c0 85
> c9 75 25 e8 7d 4a a3 fa 49 8d bd 40 03 00 00 48 89 f8 48 c1 e8 03 <42>
> 80 3c 20 00 0f 85 3a 06 00 00 49 8b 85 40 03 00 00 4c 8d 73
> RIP  [<     inline     >] sock_sndtimeo include/net/sock.h:2075
> RIP  [<ffffffff86cb35e1>] netlink_unicast+0xe1/0x730
> net/netlink/af_netlink.c:1232
>  RSP <ffff880036926f68>
> ---[ end trace 8383a15fba6fdc59 ]---

It is racy on audit_sock, especially on the netns exit path.

Could the following patch help a little bit? Also, I don't see how the
synchronize_net() here could sync with the netlink rcv path, since unlike
packets from the wire, netlink messages are not handled in BH context,
nor do I see any RCU lock taken on the rcv path.

diff --git a/kernel/audit.c b/kernel/audit.c
index f1ca116..20bc79e 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1167,10 +1167,13 @@ static void __net_exit audit_net_exit(struct net *net)
 {
        struct audit_net *aunet = net_generic(net, audit_net_id);
        struct sock *sock = aunet->nlsk;
+
+       mutex_lock(&audit_cmd_mutex);
        if (sock == audit_sock) {
                audit_pid = 0;
                audit_sock = NULL;
        }
+       mutex_unlock(&audit_cmd_mutex);

        RCU_INIT_POINTER(aunet->nlsk, NULL);
        synchronize_net();
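
The idea behind the diff above can be sketched in userspace as follows.
This is only an illustration and rests on one assumption: that the path
which sends to the daemon already runs under the same mutex. All demo_*
names are hypothetical, and a pthread mutex stands in for audit_cmd_mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

struct demo_sock {
	int id;
};

/* Stand-ins for audit_cmd_mutex, audit_sock and audit_pid. */
static pthread_mutex_t demo_cmd_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct demo_sock *demo_audit_sock;
static int demo_audit_pid;

/* A daemon registers itself: socket and pid are published together. */
void demo_register(struct demo_sock *sock, int pid)
{
	assert(sock != NULL);
	pthread_mutex_lock(&demo_cmd_mutex);
	demo_audit_sock = sock;
	demo_audit_pid = pid;
	pthread_mutex_unlock(&demo_cmd_mutex);
}

/* Namespace teardown: clear the shared pointer only under the mutex. */
void demo_net_exit(struct demo_sock *sock)
{
	pthread_mutex_lock(&demo_cmd_mutex);
	if (sock == demo_audit_sock) {
		demo_audit_pid = 0;
		demo_audit_sock = NULL;
	}
	pthread_mutex_unlock(&demo_cmd_mutex);
}

/*
 * Sender: because it holds the same mutex, it sees either a valid socket
 * or NULL, never a pointer that is cleared halfway through its use.
 * Returns the pid that would receive the message, or -ECONNREFUSED.
 */
int demo_send_to_daemon(void)
{
	int ret;

	pthread_mutex_lock(&demo_cmd_mutex);
	if (!demo_audit_sock)
		ret = -ECONNREFUSED;
	else
		ret = demo_audit_pid;
	pthread_mutex_unlock(&demo_cmd_mutex);
	return ret;
}
```

Tearing down an unrelated socket leaves the registered daemon in place;
tearing down the registered one makes later sends fail cleanly instead of
dereferencing a stale pointer.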

^ permalink raw reply related

* Re: Large performance regression with 6in4 tunnel (sit)
From: Stephen Rothwell @ 2016-11-27  2:02 UTC (permalink / raw)
  To: Eli Cooper; +Cc: netdev
In-Reply-To: <20161127115441.7654a4e0@canb.auug.org.au>

Hi Eli,

On Sun, 27 Nov 2016 11:54:41 +1100 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> On Fri, 25 Nov 2016 14:05:04 +0800 Eli Cooper <elicooper@gmx.com> wrote:
> >
> > I think this is similar to the bug I fixed in commit ae148b085876
> > ("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()").
> > 
> > I can reproduce a similar problem by applying xfrm to sit traffic.
> > TSO/GSO packets are dropped when IPSec is enabled, and IPv6 throughput
> > drops to 10s of Kbps. I am not sure if this is the same issue you
> > experienced, but I wrote a patch that fixed at least the issue I had.
> > 
> > Could you test the patch I sent to the mailing list just now?  
> 
> Thanks for the patch!
> 
> It's a bit tricky to test since the problem only occurs in a production
> machine (I tried reproducing in a VM, but the problem did not occur),
> but I will try to just rebuild the sit module and see if I can insert
> the modified one.

OK, I tried your patch and unfortunately, it doesn't seem to have
worked ... I still get the large packets dropped and resent smaller.

-- 
Cheers,
Stephen Rothwell

^ permalink raw reply

* Re: [PATCH net] sit: Set skb->protocol properly in ipip6_tunnel_xmit()
From: Stephen Rothwell @ 2016-11-27  2:04 UTC (permalink / raw)
  To: Eli Cooper; +Cc: netdev, David S . Miller
In-Reply-To: <20161125055017.7053-1-elicooper@gmx.com>

Hi Eli,

[Just for Dave's information]

On Fri, 25 Nov 2016 13:50:17 +0800 Eli Cooper <elicooper@gmx.com> wrote:
>
> Similar to commit ae148b085876
> ("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"),
> sit tunnels also need to update skb->protocol; otherwise, TSO/GSO packets
> might not be properly segmented, which causes the packets being dropped.
> 
> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> Tested-by: Eli Cooper <elicooper@gmx.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Eli Cooper <elicooper@gmx.com>

I tested this patch and it does *not* solve my problem.

-- 
Cheers,
Stephen Rothwell

^ permalink raw reply

* Re: [PATCH net-next 6/6] tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING
From: kbuild test robot @ 2016-11-27  2:06 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: kbuild-all, davem, soheil, francisyyan, netdev, ncardwell,
	edumazet, Yuchung Cheng
In-Reply-To: <1480191016-73210-7-git-send-email-ycheng@google.com>

[-- Attachment #1: Type: text/plain, Size: 1227 bytes --]

Hi Francis,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Yuchung-Cheng/tcp-sender-chronographs-instrumentation/20161127-041428
config: arm-spear6xx_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   net/built-in.o: In function `__skb_tstamp_tx':
>> net/core/skbuff.c:3846: undefined reference to `tcp_get_timestamping_opt_stats'

vim +3846 net/core/skbuff.c

  3840			return;
  3841	
  3842		if (tsonly) {
  3843			if ((sk->sk_tsflags & SOF_TIMESTAMPING_OPT_STATS) &&
  3844			    sk->sk_protocol == IPPROTO_TCP &&
  3845			    sk->sk_type == SOCK_STREAM)
> 3846				skb = tcp_get_timestamping_opt_stats(sk);
  3847			else
  3848				skb = alloc_skb(0, GFP_ATOMIC);
  3849		} else {

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 14586 bytes --]

^ permalink raw reply

* Re: Crash due to mutex genl_lock called from RCU context
From: Cong Wang @ 2016-11-27  2:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: subashab, Thomas Graf, Linux Kernel Network Developers
In-Reply-To: <1480136078.8455.589.camel@edumazet-glaptop3.roam.corp.google.com>

On Fri, Nov 25, 2016 at 8:54 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Oh well, this won't work, since sk->sk_destruct will be called from RCU
> callback.
>
> Grabbing the mutex should not be done from netlink_sock_destruct() but
> from netlink_release()

But you also change the behavior of cb.done(): currently it is called when the
last sock ref is gone, but with your patch it is called when the first
sock is closed.
No?

I don't see why we need to get genl_lock in ->done() here, because we are
already the last sock using it and module ref protects the ops from being
removed via module, seems we are pretty safe without any lock.

^ permalink raw reply

* Re: [PATCH] mlx4: give precise rx/tx bytes/packets counters
From: Eric Dumazet @ 2016-11-27  2:16 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: David Miller, netdev, Tariq Toukan
In-Reply-To: <CALzJLG8nFBFy=-pjy-XBYb-y-F=grC5wjN35zTt9eUOT-612Eg@mail.gmail.com>

On Sun, 2016-11-27 at 00:47 +0200, Saeed Mahameed wrote:
> On Fri, Nov 25, 2016 at 5:46 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:

> As you see here in SRIOV mode (PF only) reads sw stats from FW.
> Tariq, I think we need to fix this.

Sure, my patch does not change this at all.

If mlx4_is_master() is false, then we aggregate the software stats and
only the software stats.

My patch makes this aggregation possible at the time ethtool or
ndo_get_stats64() are called, since this absolutely does not depend on the
250 ms timer fetching hardware stats.
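
A minimal user-space sketch of the on-demand aggregation described above.
The names (ring_stats, sum_ring_stats) are hypothetical and are not taken
from the mlx4 driver; the sketch only illustrates summing per-ring software
counters at the moment they are queried instead of from a periodic timer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-ring software counters (not the mlx4 layout). */
struct ring_stats {
	uint64_t packets;
	uint64_t bytes;
};

/* Sum the per-ring counters at query time; no timer is involved. */
static struct ring_stats sum_ring_stats(const struct ring_stats *rings, size_t n)
{
	struct ring_stats total = { 0, 0 };
	size_t i;

	assert(rings != NULL || n == 0);

	for (i = 0; i < n; i++) {
		total.packets += rings[i].packets;
		total.bytes += rings[i].bytes;
	}
	return total;
}
```

A stats callback would call such a helper each time it is invoked, so the
reported values are as fresh as the per-ring counters themselves.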

^ permalink raw reply

* Re: Crash due to mutex genl_lock called from RCU context
From: Eric Dumazet @ 2016-11-27  2:26 UTC (permalink / raw)
  To: Cong Wang; +Cc: subashab, Thomas Graf, Linux Kernel Network Developers
In-Reply-To: <CAM_iQpU+7arMOHSwDhE5npH_82R9LTNrmny6jK1H-fkW2+9Jcg@mail.gmail.com>

On Sat, 2016-11-26 at 18:08 -0800, Cong Wang wrote:
> On Fri, Nov 25, 2016 at 8:54 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > Oh well, this won't work, since sk->sk_destruct will be called from RCU
> > callback.
> >
> > Grabbing the mutex should not be done from netlink_sock_destruct() but
> > from netlink_release()
> 
> But you also change the behavior of cb.done(): currently it is called when the
> last sock ref is gone, but with your patch it is called when the first
> sock is closed.

No. It will be called when last refcount on the socket is released,
sk_refcnt transitions to final 0.


My patch changes the sock_hold() to the variant that makes sure
sk_refcnt is not 0 before increase, otherwise a race can happen and
release could be called twice.

Classic refcounting stuff coupled to rcu rules.
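
The "take a reference only if the count is not already zero" pattern can be
sketched in user-space C11 as below. This is an illustration only: struct
obj and both helpers are hypothetical stand-ins, not the socket code. The
kernel provides atomic_inc_not_zero() for this pattern; whether the patch
under discussion uses exactly that helper is not stated in this message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object with a reference count (stands in for sk_refcnt). */
struct obj {
	atomic_int refcnt;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns false if the object is already on its way to being freed.
 */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller released the last one. */
static bool obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0);	/* dropping below zero would be a bug */
	return old == 1;
}
```

A plain unconditional increment would let a lookup racing with the final
put bring the count back from 0 to 1, and the release path could then run
twice; the conditional variant makes that lookup fail instead.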

> No?


Are you telling me inet_release() is called when we close() the first
file descriptor ?

fd1 = socket()
fd2 = dup(fd1);
close(fd2) -> release() ???


> 
> I don't see why we need to get genl_lock in ->done() here, because we are
> already the last sock using it and module ref protects the ops from being
> removed via module, seems we are pretty safe without any lock.

Well, at least this exposes a real bug in Thomas' patch.

Removing the lock might be done for net-next, not stable branches.

^ permalink raw reply

* ip manpage comments
From: Jon LaBadie @ 2016-11-27  2:49 UTC (permalink / raw)
  To: netdev

Though not new to *nix, I am new to using the ip(8) command.
Thus some of my historical assumptions about ip may be wrong.

It seems that an inclusive manpage for the ip command was
broken up into a shorter ip(8) manpage and 15 or more
ip-<subcommand>(8) manpages.  I'm basing this assumption
on long, inclusive manpages on https://linux.die.net/man/8/ip
and CentOS 6 while CentOS 7 and Fedora 24 each have the
sub-divided style.

I won't debate the wisdom of this subdivision, only comment
on how it was done.

The ip(8) manpage makes no mention of additional subordinate
documents.  The listing of the additional documents in the
See Also section is insufficient.  This section is typically
used to mention related commands and other sources of reference
materials such as info docs, wikis, blogs, or mailing lists.

When one does investigate one of the subordinate manpages,
they do not state that they document subcommands of the
ip command.  In fact, on the ip-address(8) manpage it says

  The `ip address command' ...   (quotes added)

My first thought was "typo", this is the manpage about the
"ip-address" command.  Of course there is no ip-address command.
But "ip address" is not a command either, it is the "ip" command
with an argument.

There are several commands that have broken their manpage into
several manpages.  Two which come to mind are zsh(1) and perl(1).
The authors of those pages clearly state on the primary manpage
that this is an overview page and give clear pointers to the
additional manpages as well as additional documentation.  I would
recommend reorganizing the ip(8) manpage in a similar fashion.

Thank you for consideration of my opinion and for the development
of an awesome tool.

Jon
-- 
Jon H. LaBadie                  jonfu@jgcomp.com

^ permalink raw reply

* Re: Large performance regression with 6in4 tunnel (sit)
From: Stephen Rothwell @ 2016-11-27  3:23 UTC (permalink / raw)
  To: Sven-Haegar Koch; +Cc: Eli Cooper, netdev, Eric Dumazet
In-Reply-To: <alpine.DEB.2.20.1611250503250.22094@aurora.sdinet.de>

Hi Sven-Haegar,

On Fri, 25 Nov 2016 05:06:53 +0100 (CET) Sven-Haegar Koch <haegar@sdinet.de> wrote:
>
> Somehow this problem description really reminds me of a report on 
> netdev a bit ago, which the following patch fixed:
> 
> commit 9ee6c5dc816aa8256257f2cd4008a9291ec7e985
> Author: Lance Richardson <lrichard@redhat.com>
> Date:   Wed Nov 2 16:36:17 2016 -0400
> 
>     ipv4: allow local fragmentation in ip_finish_output_gso()
>     
>     Some configurations (e.g. geneve interface with default
>     MTU of 1500 over an ethernet interface with 1500 MTU) result
>     in the transmission of packets that exceed the configured MTU.
>     While this should be considered to be a "bad" configuration,
>     it is still allowed and should not result in the sending
>     of packets that exceed the configured MTU.
> 
> Could this be related?
> 
> I suppose it would be difficult to test this patch on this machine?

The kernel I am running on is based on 4.7.8, so the above patch
doesn't come close to applying. Most of what it is reverting was
introduced in commit 359ebda25aa0 ("net/ipv4: Introduce IPSKB_FRAG_SEGS
bit to inet_skb_parm.flags") in v4.8-rc1.

-- 
Cheers,
Stephen Rothwell

^ permalink raw reply

* Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()
From: Roi Dayan @ 2016-11-27  4:47 UTC (permalink / raw)
  To: Daniel Borkmann, Cong Wang
  Cc: roid, Linux Kernel Network Developers, Jiri Pirko, John Fastabend
In-Reply-To: <583A29E3.8030809@iogearbox.net>



On 27/11/2016 02:33, Daniel Borkmann wrote:
> On 11/26/2016 12:09 PM, Daniel Borkmann wrote:
>> On 11/26/2016 07:46 AM, Cong Wang wrote:
>>> On Thu, Nov 24, 2016 at 7:20 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> [...]
>>>> Ok, strange, qdisc_destroy() calls into ops->destroy(), where ingress
>>>> drops its entire chain via tcf_destroy_chain(), so that will be NULL
>>>> eventually. The tps are freed by call_rcu() as well as qdisc itself
>>>> later on via qdisc_rcu_free(), where it frees per-cpu bstats as well.
>>>> Outstanding readers should either bail out due to if (!cl) or can still
>>>> process the chain until read section ends, but during that time, cl->q
>>>> resp. bstats should be good. Do you happen to know what's at address
>>>> ffff880a68b04028? I was wondering wrt call_rcu() vs call_rcu_bh(), but
>>>> at least on ingress (netif_receive_skb_internal()) we hold rcu_read_lock()
>>>> here. The KASAN report is reliably happening at this location, right?
>>>
>>> I am confused as well, I don't see how it could be related to my patch yet.
>>> I will take a deep look in the weekend.



Hi Cong,

When I reported the new trace, I didn't mean that it's related to your
patch; I just wanted to point out that it exposed something. I should have
been clear about it.


>>
>> Ok, I'm currently on the run. Got too late yesterday night, but I'll
>> write what I found in the evening today, not related to ingress though.
>
> Just pushed out my analysis to netdev under "[PATCH net] net, sched:
> respect rcu grace period on cls destruction". My conclusion is that both
> issues are actually separate, and that one is small enough where we could
> route it via net actually. Perhaps this at the same time shrinks your
> "[PATCH net-next] net_sched: move the empty tp check from ->destroy() to
> ->delete()" to a reasonable size that it's suitable to net as well. Your
> ->delete()/->destroy() one is definitely needed, too. The tp->root one is
> independent of ->delete()/->destroy() as they are different races and
> tp->root could also happen when you just destroy the whole tp directly. I
> think that seems like a good path forward to me.
>
> Thanks,
> Daniel



Hi Daniel,

As for the tainted kernel: I was on an old (a week or two) net-next tree
and only cherry-picked patches from the latest net-next related to Mellanox
HCA, cls_api, cls_flower and devlink, so those are the tainted modules.
The issue reproduces in that tree, so I wanted to check it with Cong's
patch instead of the latest net-next.
I'll try reproducing the issue with your new patch and later try the
latest net-next as well.

Thanks,
Roi

^ permalink raw reply

* Re: [PATCH 1/2] Documentation: devicetree: clarify usage of the RGMII phy-modes
From: Florian Fainelli @ 2016-11-27  5:53 UTC (permalink / raw)
  To: Martin Blumenstingl, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	mark.rutland-5wv7dgnIgG8, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	sean.wang-NuS5LvNUpcJWk0Htik3J/w, netdev-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-amlogic-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	jbrunet-rdvid1DuHRBWk0Htik3J/w
In-Reply-To: <20161125131201.19994-2-martin.blumenstingl-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>



On 11/25/2016 05:12 AM, Martin Blumenstingl wrote:
> RGMII requires special RX and/or TX delays depending on the actual
> hardware circuit/wiring. These delays can be added by the MAC, the PHY
> or the designer of the circuit (the latter means that no delay has to
> be added by PHY or MAC).
> There are 4 RGMII phy-modes used to describe where a delay should be
> applied:
> - rgmii: the RX and TX delays are either added by the MAC (where the
>   exact delay is typically configurable, and can be turned off when no
>   extra delay is needed) or not needed at all (because the hardware
>   wiring adds the delay already). The PHY should neither add the RX nor
>   TX delay in this case.
> - rgmii-rxid: configures the PHY to enable the RX delay. The MAC should
>   not add the RX delay in this case.
> - rgmii-txid: configures the PHY to enable the TX delay. The MAC should
>   not add the TX delay in this case.
> - rgmii-id: combines rgmii-rxid and rgmii-txid and thus configures the
>   PHY to enable the RX and TX delays. The MAC should neither add the RX
>   nor TX delay in this case.
> 
> Document these cases in the ethernet.txt documentation to make it clear
> when to use each mode.
> If applied incorrectly one might end up with MAC and PHY both enabling
> for example the TX delay, which breaks ethernet TX traffic on 1000Mbit/s
> links.
> 
> Signed-off-by: Martin Blumenstingl <martin.blumenstingl-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>

Reviewed-by: Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
-- 
Florian
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 2/2] net: phy: realtek: fix enabling of the TX-delay for RTL8211F
From: Florian Fainelli @ 2016-11-27  5:55 UTC (permalink / raw)
  To: Martin Blumenstingl, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	mark.rutland-5wv7dgnIgG8, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	sean.wang-NuS5LvNUpcJWk0Htik3J/w, netdev-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-amlogic-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	jbrunet-rdvid1DuHRBWk0Htik3J/w
In-Reply-To: <20161125131201.19994-3-martin.blumenstingl-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>



On 11/25/2016 05:12 AM, Martin Blumenstingl wrote:
> The old logic always enabled the TX-delay when the phy-mode was set to
> PHY_INTERFACE_MODE_RGMII. There are dedicated phy-modes which tell the
> PHY driver to enable the RX and/or TX delays:
> - PHY_INTERFACE_MODE_RGMII should disable the RX and TX delay in the
>   PHY (if required, the MAC should add the delays in this case)
> - PHY_INTERFACE_MODE_RGMII_ID should enable RX and TX delay in the PHY
> - PHY_INTERFACE_MODE_RGMII_TXID should enable the TX delay in the PHY
> - PHY_INTERFACE_MODE_RGMII_RXID should enable the RX delay in the PHY
>   (currently not supported by RTL8211F)
> 
> With this patch we enable the TX delay for PHY_INTERFACE_MODE_RGMII_ID
> and PHY_INTERFACE_MODE_RGMII_TXID.
> Additionally we now explicitly disable the TX-delay, which seems to be
> enabled automatically after a hard-reset of the PHY (by triggering its
> reset pin) to get a consistent state (as defined by the phy-mode).
> 
> This fixes a compatibility problem with some SoCs where the TX-delay was
> also added by the MAC. With the TX-delay being applied twice the TX
> clock was off and TX traffic was broken or very slow (<10Mbit/s) on
> 1000Mbit/s links.
> 
> Signed-off-by: Martin Blumenstingl <martin.blumenstingl-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>

Reviewed-by: Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
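
The phy-mode to delay mapping described in the quoted commit message can be
sketched as follows. This is a user-space illustration under stated
assumptions, not the realtek driver code: the enum values stand in for the
PHY_INTERFACE_MODE_RGMII* constants, and the register value and delay bit
passed to apply_tx_delay() are hypothetical parameters. The RX helper shows
the generic mapping only; the commit message notes that the RX delay is
currently not supported for the RTL8211F.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the PHY_INTERFACE_MODE_RGMII* values. */
enum rgmii_mode {
	MODE_RGMII,		/* PHY adds no delay */
	MODE_RGMII_ID,		/* PHY adds RX and TX delay */
	MODE_RGMII_RXID,	/* PHY adds RX delay only */
	MODE_RGMII_TXID,	/* PHY adds TX delay only */
};

static bool phy_adds_tx_delay(enum rgmii_mode mode)
{
	return mode == MODE_RGMII_ID || mode == MODE_RGMII_TXID;
}

static bool phy_adds_rx_delay(enum rgmii_mode mode)
{
	return mode == MODE_RGMII_ID || mode == MODE_RGMII_RXID;
}

/*
 * Set or clear the TX-delay bit explicitly, so the result does not depend
 * on whatever state the PHY was left in after a reset.
 */
static unsigned int apply_tx_delay(unsigned int reg, unsigned int tx_delay_bit,
				   enum rgmii_mode mode)
{
	assert(tx_delay_bit != 0);

	if (phy_adds_tx_delay(mode))
		return reg | tx_delay_bit;
	return reg & ~tx_delay_bit;
}
```

Clearing the bit in the non-TXID modes, rather than leaving it untouched,
is what avoids the delay being applied twice when the MAC also adds it.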
-- 
Florian

^ permalink raw reply

