* Re: [PATCH net-next 01/10] net: sched: protect block offload-related fields with rw_semaphore
From: Jakub Kicinski @ 2019-08-22 22:15 UTC (permalink / raw)
To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem, pablo, Jiri Pirko
In-Reply-To: <20190822124353.16902-2-vladbu@mellanox.com>
On Thu, 22 Aug 2019 15:43:44 +0300, Vlad Buslov wrote:
> @@ -2987,19 +3007,26 @@ int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
> int ok_count = 0;
> int err;
>
> + down_read(&block->cb_lock);
> /* Make sure all netdevs sharing this block are offload-capable. */
> - if (block->nooffloaddevcnt && err_stop)
> - return -EOPNOTSUPP;
> + if (block->nooffloaddevcnt && err_stop) {
> + ok_count = -EOPNOTSUPP;
> + goto errout;
> + }
>
> list_for_each_entry(block_cb, &block->flow_block.cb_list, list) {
> err = block_cb->cb(type, type_data, block_cb->cb_priv);
> if (err) {
> - if (err_stop)
> - return err;
> + if (err_stop) {
> + ok_count = err;
> + goto errout;
> + }
> } else {
> ok_count++;
> }
> }
> +errout:
Please name the labels with the first action they perform. Here:
err_unlock:
> + up_read(&block->cb_lock);
> return ok_count;
^ permalink raw reply
* Re: [PATCH 0/3] Add NETIF_F_HW_BRIDGE feature
From: Nikolay Aleksandrov @ 2019-08-22 22:26 UTC (permalink / raw)
To: Horatiu Vultur, roopa, davem, UNGLinuxDriver, alexandre.belloni,
allan.nielsen, netdev, linux-kernel, bridge
In-Reply-To: <1e16da88-08c5-abd5-0a3e-b8e6c3db134a@cumulusnetworks.com>
On 8/23/19 1:09 AM, Nikolay Aleksandrov wrote:
> On 22/08/2019 22:07, Horatiu Vultur wrote:
>> Current implementation of the SW bridge is setting the interfaces in
>> promisc mode when they are added to bridge if learning of the frames is
>> enabled.
>> In case of Ocelot which has HW capabilities to switch frames, it is not
>> needed to set the ports in promisc mode because the HW already capable of
>> doing that. Therefore add NETIF_F_HW_BRIDGE feature to indicate that the
>> HW has bridge capabilities. Therefore the SW bridge doesn't need to set
>> the ports in promisc mode to do the switching.
>> This optimization takes places only if all the interfaces that are part
>> of the bridge have this flag and have the same network driver.
>>
>> If the bridge interfaces is added in promisc mode then also the ports part
>> of the bridge are set in promisc mode.
>>
>> Horatiu Vultur (3):
>> net: Add HW_BRIDGE offload feature
>> net: mscc: Use NETIF_F_HW_BRIDGE
>> net: mscc: Implement promisc mode.
>>
>> drivers/net/ethernet/mscc/ocelot.c | 26 ++++++++++++++++++++++++--
>> include/linux/netdev_features.h | 3 +++
>> net/bridge/br_if.c | 29 ++++++++++++++++++++++++++++-
>> net/core/ethtool.c | 1 +
>> 4 files changed, 56 insertions(+), 3 deletions(-)
>>
>
Just to clarify:
> IMO the name is misleading.
- that's not mandatory or anything, just saying people might get confused when they see it
> Why do the devices have to be from the same driver ? This is too specific targeting some
> devices. The bridge should not care what's the port device, it should be the other way
That was mostly a rhetorical question, it's obvious why but please add an explanation
at least in the commit message and please fix the typos in the comment. Also HW
is capable of doing switching, this needs some clarification that the whole process
stays in HW IIUC. More details here would be great.
> around, so adding device-specific code to the bridge is not ok. Isn't there a solution
> where you can use NETDEV_JOIN and handle it all from your driver ?
> Would all HW-learned entries be hidden from user-space in this case ?
>
I.e. isn't there a way to do this without introducing a new feature flag ?
^ permalink raw reply
* [PATCH net-next,v5, 0/6] Add software backchannel and mlx5e HV VHCA stats
From: Haiyang Zhang @ 2019-08-22 22:26 UTC (permalink / raw)
To: sashal@kernel.org, davem@davemloft.net, saeedm@mellanox.com,
leon@kernel.org, eranbe@mellanox.com, lorenzo.pieralisi@arm.com,
bhelgaas@google.com, linux-pci@vger.kernel.org,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org
Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
linux-kernel@vger.kernel.org
This patch set adds paravirtual backchannel in software in pci_hyperv,
which is required by the mlx5e driver HV VHCA stats agent.
The stats agent is responsible on running a periodic rx/tx packets/bytes
stats update.
Dexuan Cui (1):
PCI: hv: Add a paravirtual backchannel in software
Eran Ben Elisha (4):
net/mlx5: Add wrappers for HyperV PCIe operations
net/mlx5: Add HV VHCA infrastructure
net/mlx5: Add HV VHCA control agent
net/mlx5e: Add mlx5e HV VHCA stats agent
Haiyang Zhang (1):
PCI: hv: Add a Hyper-V PCI interface driver for software backchannel interface
MAINTAINERS | 1 +
drivers/net/ethernet/mellanox/mlx5/core/Makefile | 2 +
drivers/net/ethernet/mellanox/mlx5/core/en.h | 13 +
.../ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c | 162 +++++++++
.../ethernet/mellanox/mlx5/core/en/hv_vhca_stats.h | 25 ++
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 +
drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c | 64 ++++
drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h | 22 ++
.../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c | 371 +++++++++++++++++++++
.../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h | 104 ++++++
drivers/net/ethernet/mellanox/mlx5/core/main.c | 7 +
drivers/pci/Kconfig | 1 +
drivers/pci/controller/Kconfig | 7 +
drivers/pci/controller/Makefile | 1 +
drivers/pci/controller/pci-hyperv-intf.c | 67 ++++
drivers/pci/controller/pci-hyperv.c | 308 +++++++++++++++++
include/linux/hyperv.h | 29 ++
include/linux/mlx5/driver.h | 2 +
18 files changed, 1189 insertions(+)
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.h
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
create mode 100644 drivers/pci/controller/pci-hyperv-intf.c
--
1.8.3.1
^ permalink raw reply
* [PATCH net-next,v5, 2/6] PCI: hv: Add a Hyper-V PCI interface driver for software backchannel interface
From: Haiyang Zhang @ 2019-08-22 22:26 UTC (permalink / raw)
To: sashal@kernel.org, davem@davemloft.net, saeedm@mellanox.com,
leon@kernel.org, eranbe@mellanox.com, lorenzo.pieralisi@arm.com,
bhelgaas@google.com, linux-pci@vger.kernel.org,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org
Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
linux-kernel@vger.kernel.org
In-Reply-To: <1566512708-13785-1-git-send-email-haiyangz@microsoft.com>
This interface driver is a helper driver allows other drivers to
have a common interface with the Hyper-V PCI frontend driver.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
MAINTAINERS | 1 +
drivers/pci/Kconfig | 1 +
drivers/pci/controller/Kconfig | 7 ++++
drivers/pci/controller/Makefile | 1 +
drivers/pci/controller/pci-hyperv-intf.c | 67 ++++++++++++++++++++++++++++++++
drivers/pci/controller/pci-hyperv.c | 12 ++++--
include/linux/hyperv.h | 30 ++++++++++----
7 files changed, 108 insertions(+), 11 deletions(-)
create mode 100644 drivers/pci/controller/pci-hyperv-intf.c
diff --git a/MAINTAINERS b/MAINTAINERS
index a406947..9860853 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7469,6 +7469,7 @@ F: drivers/hid/hid-hyperv.c
F: drivers/hv/
F: drivers/input/serio/hyperv-keyboard.c
F: drivers/pci/controller/pci-hyperv.c
+F: drivers/pci/controller/pci-hyperv-intf.c
F: drivers/net/hyperv/
F: drivers/scsi/storvsc_drv.c
F: drivers/uio/uio_hv_generic.c
diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 2ab9240..c313de9 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -182,6 +182,7 @@ config PCI_LABEL
config PCI_HYPERV
tristate "Hyper-V PCI Frontend"
depends on X86 && HYPERV && PCI_MSI && PCI_MSI_IRQ_DOMAIN && X86_64
+ select PCI_HYPERV_INTERFACE
help
The PCI device frontend driver allows the kernel to import arbitrary
PCI devices from a PCI backend to support PCI driver domains.
diff --git a/drivers/pci/controller/Kconfig b/drivers/pci/controller/Kconfig
index fe9f9f1..70e0782 100644
--- a/drivers/pci/controller/Kconfig
+++ b/drivers/pci/controller/Kconfig
@@ -281,5 +281,12 @@ config VMD
To compile this driver as a module, choose M here: the
module will be called vmd.
+config PCI_HYPERV_INTERFACE
+ tristate "Hyper-V PCI Interface"
+ depends on X86 && HYPERV && PCI_MSI && PCI_MSI_IRQ_DOMAIN && X86_64
+ help
+ The Hyper-V PCI Interface is a helper driver allows other drivers to
+ have a common interface with the Hyper-V PCI frontend driver.
+
source "drivers/pci/controller/dwc/Kconfig"
endmenu
diff --git a/drivers/pci/controller/Makefile b/drivers/pci/controller/Makefile
index d56a507..a2a22c9 100644
--- a/drivers/pci/controller/Makefile
+++ b/drivers/pci/controller/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_PCIE_CADENCE_HOST) += pcie-cadence-host.o
obj-$(CONFIG_PCIE_CADENCE_EP) += pcie-cadence-ep.o
obj-$(CONFIG_PCI_FTPCI100) += pci-ftpci100.o
obj-$(CONFIG_PCI_HYPERV) += pci-hyperv.o
+obj-$(CONFIG_PCI_HYPERV_INTERFACE) += pci-hyperv-intf.o
obj-$(CONFIG_PCI_MVEBU) += pci-mvebu.o
obj-$(CONFIG_PCI_AARDVARK) += pci-aardvark.o
obj-$(CONFIG_PCI_TEGRA) += pci-tegra.o
diff --git a/drivers/pci/controller/pci-hyperv-intf.c b/drivers/pci/controller/pci-hyperv-intf.c
new file mode 100644
index 0000000..cc96be4
--- /dev/null
+++ b/drivers/pci/controller/pci-hyperv-intf.c
@@ -0,0 +1,67 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) Microsoft Corporation.
+ *
+ * Author:
+ * Haiyang Zhang <haiyangz@microsoft.com>
+ *
+ * This small module is a helper driver allows other drivers to
+ * have a common interface with the Hyper-V PCI frontend driver.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/hyperv.h>
+
+struct hyperv_pci_block_ops hvpci_block_ops;
+EXPORT_SYMBOL_GPL(hvpci_block_ops);
+
+int hyperv_read_cfg_blk(struct pci_dev *dev, void *buf, unsigned int buf_len,
+ unsigned int block_id, unsigned int *bytes_returned)
+{
+ if (!hvpci_block_ops.read_block)
+ return -EOPNOTSUPP;
+
+ return hvpci_block_ops.read_block(dev, buf, buf_len, block_id,
+ bytes_returned);
+}
+EXPORT_SYMBOL_GPL(hyperv_read_cfg_blk);
+
+int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, unsigned int len,
+ unsigned int block_id)
+{
+ if (!hvpci_block_ops.write_block)
+ return -EOPNOTSUPP;
+
+ return hvpci_block_ops.write_block(dev, buf, len, block_id);
+}
+EXPORT_SYMBOL_GPL(hyperv_write_cfg_blk);
+
+int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
+ void (*block_invalidate)(void *context,
+ u64 block_mask))
+{
+ if (!hvpci_block_ops.reg_blk_invalidate)
+ return -EOPNOTSUPP;
+
+ return hvpci_block_ops.reg_blk_invalidate(dev, context,
+ block_invalidate);
+}
+EXPORT_SYMBOL_GPL(hyperv_reg_block_invalidate);
+
+static void __exit exit_hv_pci_intf(void)
+{
+}
+
+static int __init init_hv_pci_intf(void)
+{
+ return 0;
+}
+
+module_init(init_hv_pci_intf);
+module_exit(exit_hv_pci_intf);
+
+MODULE_DESCRIPTION("Hyper-V PCI Interface");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 57adeca..9c93ac2 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -983,7 +983,6 @@ int hv_read_config_block(struct pci_dev *pdev, void *buf, unsigned int len,
*bytes_returned = comp_pkt.bytes_returned;
return 0;
}
-EXPORT_SYMBOL(hv_read_config_block);
/**
* hv_pci_write_config_compl() - Invoked when a response packet for a write
@@ -1070,7 +1069,6 @@ int hv_write_config_block(struct pci_dev *pdev, void *buf, unsigned int len,
return 0;
}
-EXPORT_SYMBOL(hv_write_config_block);
/**
* hv_register_block_invalidate() - Invoked when a config block invalidation
@@ -1101,7 +1099,6 @@ int hv_register_block_invalidate(struct pci_dev *pdev, void *context,
return 0;
}
-EXPORT_SYMBOL(hv_register_block_invalidate);
/* Interrupt management hooks */
static void hv_int_desc_free(struct hv_pci_dev *hpdev,
@@ -3045,10 +3042,19 @@ static int hv_pci_remove(struct hv_device *hdev)
static void __exit exit_hv_pci_drv(void)
{
vmbus_driver_unregister(&hv_pci_drv);
+
+ hvpci_block_ops.read_block = NULL;
+ hvpci_block_ops.write_block = NULL;
+ hvpci_block_ops.reg_blk_invalidate = NULL;
}
static int __init init_hv_pci_drv(void)
{
+ /* Initialize PCI block r/w interface */
+ hvpci_block_ops.read_block = hv_read_config_block;
+ hvpci_block_ops.write_block = hv_write_config_block;
+ hvpci_block_ops.reg_blk_invalidate = hv_register_block_invalidate;
+
return vmbus_driver_register(&hv_pci_drv);
}
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 9d37f8c..2afe6fd 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1579,18 +1579,32 @@ struct vmpacket_descriptor *
pkt = hv_pkt_iter_next(channel, pkt))
/*
- * Functions for passing data between SR-IOV PF and VF drivers. The VF driver
+ * Interface for passing data between SR-IOV PF and VF drivers. The VF driver
* sends requests to read and write blocks. Each block must be 128 bytes or
* smaller. Optionally, the VF driver can register a callback function which
* will be invoked when the host says that one or more of the first 64 block
* IDs is "invalid" which means that the VF driver should reread them.
*/
#define HV_CONFIG_BLOCK_SIZE_MAX 128
-int hv_read_config_block(struct pci_dev *dev, void *buf, unsigned int buf_len,
- unsigned int block_id, unsigned int *bytes_returned);
-int hv_write_config_block(struct pci_dev *dev, void *buf, unsigned int len,
- unsigned int block_id);
-int hv_register_block_invalidate(struct pci_dev *dev, void *context,
- void (*block_invalidate)(void *context,
- u64 block_mask));
+
+int hyperv_read_cfg_blk(struct pci_dev *dev, void *buf, unsigned int buf_len,
+ unsigned int block_id, unsigned int *bytes_returned);
+int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, unsigned int len,
+ unsigned int block_id);
+int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
+ void (*block_invalidate)(void *context,
+ u64 block_mask));
+
+struct hyperv_pci_block_ops {
+ int (*read_block)(struct pci_dev *dev, void *buf, unsigned int buf_len,
+ unsigned int block_id, unsigned int *bytes_returned);
+ int (*write_block)(struct pci_dev *dev, void *buf, unsigned int len,
+ unsigned int block_id);
+ int (*reg_blk_invalidate)(struct pci_dev *dev, void *context,
+ void (*block_invalidate)(void *context,
+ u64 block_mask));
+};
+
+extern struct hyperv_pci_block_ops hvpci_block_ops;
+
#endif /* _HYPERV_H */
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next,v5, 3/6] net/mlx5: Add wrappers for HyperV PCIe operations
From: Haiyang Zhang @ 2019-08-22 22:26 UTC (permalink / raw)
To: sashal@kernel.org, davem@davemloft.net, saeedm@mellanox.com,
leon@kernel.org, eranbe@mellanox.com, lorenzo.pieralisi@arm.com,
bhelgaas@google.com, linux-pci@vger.kernel.org,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org
Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
linux-kernel@vger.kernel.org
In-Reply-To: <1566512708-13785-1-git-send-email-haiyangz@microsoft.com>
From: Eran Ben Elisha <eranbe@mellanox.com>
Add wrapper functions for HyperV PCIe read / write /
block_invalidate_register operations. This will be used as an
infrastructure in the downstream patch for software communication.
This will be enabled by default if CONFIG_PCI_HYPERV_INTERFACE is set.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
drivers/net/ethernet/mellanox/mlx5/core/Makefile | 1 +
drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c | 64 ++++++++++++++++++++++++
drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h | 22 ++++++++
3 files changed, 87 insertions(+)
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index bcf3655..fd32a5b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -45,6 +45,7 @@ mlx5_core-$(CONFIG_MLX5_ESWITCH) += eswitch.o eswitch_offloads.o eswitch_offlo
mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
mlx5_core-$(CONFIG_VXLAN) += lib/vxlan.o
mlx5_core-$(CONFIG_PTP_1588_CLOCK) += lib/clock.o
+mlx5_core-$(CONFIG_PCI_HYPERV_INTERFACE) += lib/hv.o
#
# Ipoib netdev
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c
new file mode 100644
index 0000000..fb19008
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+// Copyright (c) 2018 Mellanox Technologies
+
+#include <linux/hyperv.h>
+#include "mlx5_core.h"
+#include "lib/hv.h"
+
+static int mlx5_hv_config_common(struct mlx5_core_dev *dev, void *buf, int len,
+ int offset, bool read)
+{
+ int rc = -EOPNOTSUPP;
+ int bytes_returned;
+ int block_id;
+
+ if (offset % HV_CONFIG_BLOCK_SIZE_MAX || len % HV_CONFIG_BLOCK_SIZE_MAX)
+ return -EINVAL;
+
+ block_id = offset / HV_CONFIG_BLOCK_SIZE_MAX;
+
+ rc = read ?
+ hyperv_read_cfg_blk(dev->pdev, buf,
+ HV_CONFIG_BLOCK_SIZE_MAX, block_id,
+ &bytes_returned) :
+ hyperv_write_cfg_blk(dev->pdev, buf,
+ HV_CONFIG_BLOCK_SIZE_MAX, block_id);
+
+ /* Make sure len bytes were read successfully */
+ if (read && !rc && len != bytes_returned)
+ rc = -EIO;
+
+ if (rc) {
+ mlx5_core_err(dev, "Failed to %s hv config, err = %d, len = %d, offset = %d\n",
+ read ? "read" : "write", rc, len,
+ offset);
+ return rc;
+ }
+
+ return 0;
+}
+
+int mlx5_hv_read_config(struct mlx5_core_dev *dev, void *buf, int len,
+ int offset)
+{
+ return mlx5_hv_config_common(dev, buf, len, offset, true);
+}
+
+int mlx5_hv_write_config(struct mlx5_core_dev *dev, void *buf, int len,
+ int offset)
+{
+ return mlx5_hv_config_common(dev, buf, len, offset, false);
+}
+
+int mlx5_hv_register_invalidate(struct mlx5_core_dev *dev, void *context,
+ void (*block_invalidate)(void *context,
+ u64 block_mask))
+{
+ return hyperv_reg_block_invalidate(dev->pdev, context,
+ block_invalidate);
+}
+
+void mlx5_hv_unregister_invalidate(struct mlx5_core_dev *dev)
+{
+ hyperv_reg_block_invalidate(dev->pdev, NULL, NULL);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h
new file mode 100644
index 0000000..f9a4557
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2019 Mellanox Technologies. */
+
+#ifndef __LIB_HV_H__
+#define __LIB_HV_H__
+
+#if IS_ENABLED(CONFIG_PCI_HYPERV_INTERFACE)
+
+#include <linux/hyperv.h>
+#include <linux/mlx5/driver.h>
+
+int mlx5_hv_read_config(struct mlx5_core_dev *dev, void *buf, int len,
+ int offset);
+int mlx5_hv_write_config(struct mlx5_core_dev *dev, void *buf, int len,
+ int offset);
+int mlx5_hv_register_invalidate(struct mlx5_core_dev *dev, void *context,
+ void (*block_invalidate)(void *context,
+ u64 block_mask));
+void mlx5_hv_unregister_invalidate(struct mlx5_core_dev *dev);
+#endif
+
+#endif /* __LIB_HV_H__ */
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next,v5, 4/6] net/mlx5: Add HV VHCA infrastructure
From: Haiyang Zhang @ 2019-08-22 22:26 UTC (permalink / raw)
To: sashal@kernel.org, davem@davemloft.net, saeedm@mellanox.com,
leon@kernel.org, eranbe@mellanox.com, lorenzo.pieralisi@arm.com,
bhelgaas@google.com, linux-pci@vger.kernel.org,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org
Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
linux-kernel@vger.kernel.org
In-Reply-To: <1566512708-13785-1-git-send-email-haiyangz@microsoft.com>
From: Eran Ben Elisha <eranbe@mellanox.com>
HV VHCA is a layer which provides PF to VF communication channel based on
HyperV PCI config channel. It implements Mellanox's Inter VHCA control
communication protocol. The protocol contains control block in order to
pass messages between the PF and VF drivers, and data blocks in order to
pass actual data.
The infrastructure is agent based. Each agent will be responsible of
contiguous buffer blocks in the VHCA config space. This infrastructure will
bind agents to their blocks, and those agents can only access read/write
the buffer blocks assigned to them. Each agent will provide three
callbacks (control, invalidate, cleanup). Control will be invoked when
block-0 is invalidated with a command that concerns this agent. Invalidate
callback will be invoked if one of the blocks assigned to this agent was
invalidated. Cleanup will be invoked before the agent is being freed in
order to clean all of its open resources or deferred works.
Block-0 serves as the control block. All execution commands from the PF
will be written by the PF over this block. VF will ack on those by
writing on block-0 as well. Its format is described by struct
mlx5_hv_vhca_control_block layout.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
drivers/net/ethernet/mellanox/mlx5/core/Makefile | 2 +-
.../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c | 253 +++++++++++++++++++++
.../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h | 102 +++++++++
drivers/net/ethernet/mellanox/mlx5/core/main.c | 7 +
include/linux/mlx5/driver.h | 2 +
5 files changed, 365 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index fd32a5b..8d443fc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -45,7 +45,7 @@ mlx5_core-$(CONFIG_MLX5_ESWITCH) += eswitch.o eswitch_offloads.o eswitch_offlo
mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
mlx5_core-$(CONFIG_VXLAN) += lib/vxlan.o
mlx5_core-$(CONFIG_PTP_1588_CLOCK) += lib/clock.o
-mlx5_core-$(CONFIG_PCI_HYPERV_INTERFACE) += lib/hv.o
+mlx5_core-$(CONFIG_PCI_HYPERV_INTERFACE) += lib/hv.o lib/hv_vhca.o
#
# Ipoib netdev
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
new file mode 100644
index 0000000..84d1d75
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
@@ -0,0 +1,253 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+// Copyright (c) 2018 Mellanox Technologies
+
+#include <linux/hyperv.h>
+#include "mlx5_core.h"
+#include "lib/hv.h"
+#include "lib/hv_vhca.h"
+
+struct mlx5_hv_vhca {
+ struct mlx5_core_dev *dev;
+ struct workqueue_struct *work_queue;
+ struct mlx5_hv_vhca_agent *agents[MLX5_HV_VHCA_AGENT_MAX];
+ struct mutex agents_lock; /* Protect agents array */
+};
+
+struct mlx5_hv_vhca_work {
+ struct work_struct invalidate_work;
+ struct mlx5_hv_vhca *hv_vhca;
+ u64 block_mask;
+};
+
+struct mlx5_hv_vhca_data_block {
+ u16 sequence;
+ u16 offset;
+ u8 reserved[4];
+ u64 data[15];
+};
+
+struct mlx5_hv_vhca_agent {
+ enum mlx5_hv_vhca_agent_type type;
+ struct mlx5_hv_vhca *hv_vhca;
+ void *priv;
+ u16 seq;
+ void (*control)(struct mlx5_hv_vhca_agent *agent,
+ struct mlx5_hv_vhca_control_block *block);
+ void (*invalidate)(struct mlx5_hv_vhca_agent *agent,
+ u64 block_mask);
+ void (*cleanup)(struct mlx5_hv_vhca_agent *agent);
+};
+
+struct mlx5_hv_vhca *mlx5_hv_vhca_create(struct mlx5_core_dev *dev)
+{
+ struct mlx5_hv_vhca *hv_vhca = NULL;
+
+ hv_vhca = kzalloc(sizeof(*hv_vhca), GFP_KERNEL);
+ if (!hv_vhca)
+ return ERR_PTR(-ENOMEM);
+
+ hv_vhca->work_queue = create_singlethread_workqueue("mlx5_hv_vhca");
+ if (!hv_vhca->work_queue) {
+ kfree(hv_vhca);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ hv_vhca->dev = dev;
+ mutex_init(&hv_vhca->agents_lock);
+
+ return hv_vhca;
+}
+
+void mlx5_hv_vhca_destroy(struct mlx5_hv_vhca *hv_vhca)
+{
+ if (IS_ERR_OR_NULL(hv_vhca))
+ return;
+
+ destroy_workqueue(hv_vhca->work_queue);
+ kfree(hv_vhca);
+}
+
+static void mlx5_hv_vhca_invalidate_work(struct work_struct *work)
+{
+ struct mlx5_hv_vhca_work *hwork;
+ struct mlx5_hv_vhca *hv_vhca;
+ int i;
+
+ hwork = container_of(work, struct mlx5_hv_vhca_work, invalidate_work);
+ hv_vhca = hwork->hv_vhca;
+
+ mutex_lock(&hv_vhca->agents_lock);
+ for (i = 0; i < MLX5_HV_VHCA_AGENT_MAX; i++) {
+ struct mlx5_hv_vhca_agent *agent = hv_vhca->agents[i];
+
+ if (!agent || !agent->invalidate)
+ continue;
+
+ if (!(BIT(agent->type) & hwork->block_mask))
+ continue;
+
+ agent->invalidate(agent, hwork->block_mask);
+ }
+ mutex_unlock(&hv_vhca->agents_lock);
+
+ kfree(hwork);
+}
+
+void mlx5_hv_vhca_invalidate(void *context, u64 block_mask)
+{
+ struct mlx5_hv_vhca *hv_vhca = (struct mlx5_hv_vhca *)context;
+ struct mlx5_hv_vhca_work *work;
+
+ work = kzalloc(sizeof(*work), GFP_ATOMIC);
+ if (!work)
+ return;
+
+ INIT_WORK(&work->invalidate_work, mlx5_hv_vhca_invalidate_work);
+ work->hv_vhca = hv_vhca;
+ work->block_mask = block_mask;
+
+ queue_work(hv_vhca->work_queue, &work->invalidate_work);
+}
+
+int mlx5_hv_vhca_init(struct mlx5_hv_vhca *hv_vhca)
+{
+ if (IS_ERR_OR_NULL(hv_vhca))
+ return IS_ERR_OR_NULL(hv_vhca);
+
+ return mlx5_hv_register_invalidate(hv_vhca->dev, hv_vhca,
+ mlx5_hv_vhca_invalidate);
+}
+
+void mlx5_hv_vhca_cleanup(struct mlx5_hv_vhca *hv_vhca)
+{
+ int i;
+
+ if (IS_ERR_OR_NULL(hv_vhca))
+ return;
+
+ mutex_lock(&hv_vhca->agents_lock);
+ for (i = 0; i < MLX5_HV_VHCA_AGENT_MAX; i++)
+ WARN_ON(hv_vhca->agents[i]);
+
+ mutex_unlock(&hv_vhca->agents_lock);
+
+ mlx5_hv_unregister_invalidate(hv_vhca->dev);
+}
+
+struct mlx5_hv_vhca_agent *
+mlx5_hv_vhca_agent_create(struct mlx5_hv_vhca *hv_vhca,
+ enum mlx5_hv_vhca_agent_type type,
+ void (*control)(struct mlx5_hv_vhca_agent*,
+ struct mlx5_hv_vhca_control_block *block),
+ void (*invalidate)(struct mlx5_hv_vhca_agent*,
+ u64 block_mask),
+ void (*cleaup)(struct mlx5_hv_vhca_agent *agent),
+ void *priv)
+{
+ struct mlx5_hv_vhca_agent *agent;
+
+ if (IS_ERR_OR_NULL(hv_vhca))
+ return ERR_PTR(-ENOMEM);
+
+ if (type >= MLX5_HV_VHCA_AGENT_MAX)
+ return ERR_PTR(-EINVAL);
+
+ mutex_lock(&hv_vhca->agents_lock);
+ if (hv_vhca->agents[type]) {
+ mutex_unlock(&hv_vhca->agents_lock);
+ return ERR_PTR(-EINVAL);
+ }
+ mutex_unlock(&hv_vhca->agents_lock);
+
+ agent = kzalloc(sizeof(*agent), GFP_KERNEL);
+ if (!agent)
+ return ERR_PTR(-ENOMEM);
+
+ agent->type = type;
+ agent->hv_vhca = hv_vhca;
+ agent->priv = priv;
+ agent->control = control;
+ agent->invalidate = invalidate;
+ agent->cleanup = cleaup;
+
+ mutex_lock(&hv_vhca->agents_lock);
+ hv_vhca->agents[type] = agent;
+ mutex_unlock(&hv_vhca->agents_lock);
+
+ return agent;
+}
+
+void mlx5_hv_vhca_agent_destroy(struct mlx5_hv_vhca_agent *agent)
+{
+ struct mlx5_hv_vhca *hv_vhca = agent->hv_vhca;
+
+ mutex_lock(&hv_vhca->agents_lock);
+
+ if (WARN_ON(agent != hv_vhca->agents[agent->type])) {
+ mutex_unlock(&hv_vhca->agents_lock);
+ return;
+ }
+
+ hv_vhca->agents[agent->type] = NULL;
+ mutex_unlock(&hv_vhca->agents_lock);
+
+ if (agent->cleanup)
+ agent->cleanup(agent);
+
+ kfree(agent);
+}
+
+static int mlx5_hv_vhca_data_block_prepare(struct mlx5_hv_vhca_agent *agent,
+ struct mlx5_hv_vhca_data_block *data_block,
+ void *src, int len, int *offset)
+{
+ int bytes = min_t(int, (int)sizeof(data_block->data), len);
+
+ data_block->sequence = agent->seq;
+ data_block->offset = (*offset)++;
+ memcpy(data_block->data, src, bytes);
+
+ return bytes;
+}
+
+static void mlx5_hv_vhca_agent_seq_update(struct mlx5_hv_vhca_agent *agent)
+{
+ agent->seq++;
+}
+
+int mlx5_hv_vhca_agent_write(struct mlx5_hv_vhca_agent *agent,
+ void *buf, int len)
+{
+ int offset = agent->type * HV_CONFIG_BLOCK_SIZE_MAX;
+ int block_offset = 0;
+ int total = 0;
+ int err;
+
+ while (len) {
+ struct mlx5_hv_vhca_data_block data_block = {0};
+ int bytes;
+
+ bytes = mlx5_hv_vhca_data_block_prepare(agent, &data_block,
+ buf + total,
+ len, &block_offset);
+ if (!bytes)
+ return -ENOMEM;
+
+ err = mlx5_hv_write_config(agent->hv_vhca->dev, &data_block,
+ sizeof(data_block), offset);
+ if (err)
+ return err;
+
+ total += bytes;
+ len -= bytes;
+ }
+
+ mlx5_hv_vhca_agent_seq_update(agent);
+
+ return 0;
+}
+
+void *mlx5_hv_vhca_agent_priv(struct mlx5_hv_vhca_agent *agent)
+{
+ return agent->priv;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
new file mode 100644
index 0000000..cdf1303
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2019 Mellanox Technologies. */
+
+#ifndef __LIB_HV_VHCA_H__
+#define __LIB_HV_VHCA_H__
+
+#include "en.h"
+#include "lib/hv.h"
+
+struct mlx5_hv_vhca_agent;
+struct mlx5_hv_vhca;
+struct mlx5_hv_vhca_control_block;
+
+enum mlx5_hv_vhca_agent_type {
+ MLX5_HV_VHCA_AGENT_MAX = 32,
+};
+
+#if IS_ENABLED(CONFIG_PCI_HYPERV_INTERFACE)
+
+struct mlx5_hv_vhca_control_block {
+ u32 capabilities;
+ u32 control;
+ u16 command;
+ u16 command_ack;
+ u16 version;
+ u16 rings;
+ u32 reserved1[28];
+};
+
+struct mlx5_hv_vhca *mlx5_hv_vhca_create(struct mlx5_core_dev *dev);
+void mlx5_hv_vhca_destroy(struct mlx5_hv_vhca *hv_vhca);
+int mlx5_hv_vhca_init(struct mlx5_hv_vhca *hv_vhca);
+void mlx5_hv_vhca_cleanup(struct mlx5_hv_vhca *hv_vhca);
+void mlx5_hv_vhca_invalidate(void *context, u64 block_mask);
+
+struct mlx5_hv_vhca_agent *
+mlx5_hv_vhca_agent_create(struct mlx5_hv_vhca *hv_vhca,
+ enum mlx5_hv_vhca_agent_type type,
+ void (*control)(struct mlx5_hv_vhca_agent*,
+ struct mlx5_hv_vhca_control_block *block),
+ void (*invalidate)(struct mlx5_hv_vhca_agent*,
+ u64 block_mask),
+ void (*cleanup)(struct mlx5_hv_vhca_agent *agent),
+ void *context);
+
+void mlx5_hv_vhca_agent_destroy(struct mlx5_hv_vhca_agent *agent);
+int mlx5_hv_vhca_agent_write(struct mlx5_hv_vhca_agent *agent,
+ void *buf, int len);
+void *mlx5_hv_vhca_agent_priv(struct mlx5_hv_vhca_agent *agent);
+
+#else
+
+static inline struct mlx5_hv_vhca *
+mlx5_hv_vhca_create(struct mlx5_core_dev *dev)
+{
+ return NULL;
+}
+
+static inline void mlx5_hv_vhca_destroy(struct mlx5_hv_vhca *hv_vhca)
+{
+}
+
+static inline int mlx5_hv_vhca_init(struct mlx5_hv_vhca *hv_vhca)
+{
+ return 0;
+}
+
+static inline void mlx5_hv_vhca_cleanup(struct mlx5_hv_vhca *hv_vhca)
+{
+}
+
+static inline void mlx5_hv_vhca_invalidate(void *context,
+ u64 block_mask)
+{
+}
+
+static inline struct mlx5_hv_vhca_agent *
+mlx5_hv_vhca_agent_create(struct mlx5_hv_vhca *hv_vhca,
+ enum mlx5_hv_vhca_agent_type type,
+ void (*control)(struct mlx5_hv_vhca_agent*,
+ struct mlx5_hv_vhca_control_block *block),
+ void (*invalidate)(struct mlx5_hv_vhca_agent*,
+ u64 block_mask),
+ void (*cleanup)(struct mlx5_hv_vhca_agent *agent),
+ void *context)
+{
+ return NULL;
+}
+
+static inline void mlx5_hv_vhca_agent_destroy(struct mlx5_hv_vhca_agent *agent)
+{
+}
+
+static inline int
+mlx5_hv_vhca_write_agent(struct mlx5_hv_vhca_agent *agent,
+ void *buf, int len)
+{
+ return 0;
+}
+#endif
+
+#endif /* __LIB_HV_VHCA_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 0b70b1d..61388ca 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -69,6 +69,7 @@
#include "lib/pci_vsc.h"
#include "diag/fw_tracer.h"
#include "ecpf.h"
+#include "lib/hv_vhca.h"
MODULE_AUTHOR("Eli Cohen <eli@mellanox.com>");
MODULE_DESCRIPTION("Mellanox 5th generation network adapters (ConnectX series) core driver");
@@ -870,6 +871,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
}
dev->tracer = mlx5_fw_tracer_create(dev);
+ dev->hv_vhca = mlx5_hv_vhca_create(dev);
return 0;
@@ -900,6 +902,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
static void mlx5_cleanup_once(struct mlx5_core_dev *dev)
{
+ mlx5_hv_vhca_destroy(dev->hv_vhca);
mlx5_fw_tracer_destroy(dev->tracer);
mlx5_fpga_cleanup(dev);
mlx5_eswitch_cleanup(dev->priv.eswitch);
@@ -1067,6 +1070,8 @@ static int mlx5_load(struct mlx5_core_dev *dev)
goto err_fw_tracer;
}
+ mlx5_hv_vhca_init(dev->hv_vhca);
+
err = mlx5_fpga_device_start(dev);
if (err) {
mlx5_core_err(dev, "fpga device start failed %d\n", err);
@@ -1122,6 +1127,7 @@ static int mlx5_load(struct mlx5_core_dev *dev)
err_ipsec_start:
mlx5_fpga_device_stop(dev);
err_fpga_start:
+ mlx5_hv_vhca_cleanup(dev->hv_vhca);
mlx5_fw_tracer_cleanup(dev->tracer);
err_fw_tracer:
mlx5_eq_table_destroy(dev);
@@ -1142,6 +1148,7 @@ static void mlx5_unload(struct mlx5_core_dev *dev)
mlx5_accel_ipsec_cleanup(dev);
mlx5_accel_tls_cleanup(dev);
mlx5_fpga_device_stop(dev);
+ mlx5_hv_vhca_cleanup(dev->hv_vhca);
mlx5_fw_tracer_cleanup(dev->tracer);
mlx5_eq_table_destroy(dev);
mlx5_irq_table_destroy(dev);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index df23f17..13b4cf2 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -659,6 +659,7 @@ struct mlx5_clock {
struct mlx5_fw_tracer;
struct mlx5_vxlan;
struct mlx5_geneve;
+struct mlx5_hv_vhca;
struct mlx5_core_dev {
struct device *device;
@@ -706,6 +707,7 @@ struct mlx5_core_dev {
struct mlx5_ib_clock_info *clock_info;
struct mlx5_fw_tracer *tracer;
u32 vsc_addr;
+ struct mlx5_hv_vhca *hv_vhca;
};
struct mlx5_db {
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next,v5, 6/6] net/mlx5e: Add mlx5e HV VHCA stats agent
From: Haiyang Zhang @ 2019-08-22 22:26 UTC (permalink / raw)
To: sashal@kernel.org, davem@davemloft.net, saeedm@mellanox.com,
leon@kernel.org, eranbe@mellanox.com, lorenzo.pieralisi@arm.com,
bhelgaas@google.com, linux-pci@vger.kernel.org,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org
Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
linux-kernel@vger.kernel.org
In-Reply-To: <1566512708-13785-1-git-send-email-haiyangz@microsoft.com>
From: Eran Ben Elisha <eranbe@mellanox.com>
HV VHCA stats agent is responsible on running a preiodic rx/tx
packets/bytes stats update. Currently the supported format is version
MLX5_HV_VHCA_STATS_VERSION. Block ID 1 is dedicated for statistics data
transfer from the VF to the PF.
The reporter fetch the statistics data from all opened channels, fill it
in a buffer and send it to mlx5_hv_vhca_write_agent.
As the stats layer should include some metadata per block (sequence and
offset), the HV VHCA layer shall modify the buffer before actually send it
over block 1.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
drivers/net/ethernet/mellanox/mlx5/core/Makefile | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en.h | 13 ++
.../ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c | 162 +++++++++++++++++++++
.../ethernet/mellanox/mlx5/core/en/hv_vhca_stats.h | 25 ++++
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 +
.../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h | 1 +
6 files changed, 205 insertions(+)
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.h
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 8d443fc..f4de9cc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -36,6 +36,7 @@ mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) += en_dcbnl.o en/port_buffer.o
mlx5_core-$(CONFIG_MLX5_ESWITCH) += en_rep.o en_tc.o en/tc_tun.o lib/port_tun.o lag_mp.o \
lib/geneve.o en/tc_tun_vxlan.o en/tc_tun_gre.o \
en/tc_tun_geneve.o diag/en_tc_tracepoint.o
+mlx5_core-$(CONFIG_PCI_HYPERV_INTERFACE) += en/hv_vhca_stats.o
#
# Core extra
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 7316571..4467927 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -54,6 +54,7 @@
#include "mlx5_core.h"
#include "en_stats.h"
#include "en/fs.h"
+#include "lib/hv_vhca.h"
extern const struct net_device_ops mlx5e_netdev_ops;
struct page_pool;
@@ -782,6 +783,15 @@ struct mlx5e_modify_sq_param {
int rl_index;
};
+#if IS_ENABLED(CONFIG_PCI_HYPERV_INTERFACE)
+struct mlx5e_hv_vhca_stats_agent {
+ struct mlx5_hv_vhca_agent *agent;
+ struct delayed_work work;
+ u16 delay;
+ void *buf;
+};
+#endif
+
struct mlx5e_xsk {
/* UMEMs are stored separately from channels, because we don't want to
* lose them when channels are recreated. The kernel also stores UMEMs,
@@ -853,6 +863,9 @@ struct mlx5e_priv {
struct devlink_health_reporter *tx_reporter;
struct devlink_health_reporter *rx_reporter;
struct mlx5e_xsk xsk;
+#if IS_ENABLED(CONFIG_PCI_HYPERV_INTERFACE)
+ struct mlx5e_hv_vhca_stats_agent stats_agent;
+#endif
};
struct mlx5e_profile {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
new file mode 100644
index 0000000..c37b4ac
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
@@ -0,0 +1,162 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+// Copyright (c) 2018 Mellanox Technologies
+
+#include "en.h"
+#include "en/hv_vhca_stats.h"
+#include "lib/hv_vhca.h"
+#include "lib/hv.h"
+
+struct mlx5e_hv_vhca_per_ring_stats {
+ u64 rx_packets;
+ u64 rx_bytes;
+ u64 tx_packets;
+ u64 tx_bytes;
+};
+
+static void
+mlx5e_hv_vhca_fill_ring_stats(struct mlx5e_priv *priv, int ch,
+ struct mlx5e_hv_vhca_per_ring_stats *data)
+{
+ struct mlx5e_channel_stats *stats;
+ int tc;
+
+ stats = &priv->channel_stats[ch];
+ data->rx_packets = stats->rq.packets;
+ data->rx_bytes = stats->rq.bytes;
+
+ for (tc = 0; tc < priv->max_opened_tc; tc++) {
+ data->tx_packets += stats->sq[tc].packets;
+ data->tx_bytes += stats->sq[tc].bytes;
+ }
+}
+
+static void mlx5e_hv_vhca_fill_stats(struct mlx5e_priv *priv, u64 *data,
+ int buf_len)
+{
+ int ch, i = 0;
+
+ for (ch = 0; ch < priv->max_nch; ch++) {
+ u64 *buf = data + i;
+
+ if (WARN_ON_ONCE(buf +
+ sizeof(struct mlx5e_hv_vhca_per_ring_stats) >
+ data + buf_len))
+ return;
+
+ mlx5e_hv_vhca_fill_ring_stats(priv, ch,
+ (struct mlx5e_hv_vhca_per_ring_stats *)buf);
+ i += sizeof(struct mlx5e_hv_vhca_per_ring_stats) / sizeof(u64);
+ }
+}
+
+static int mlx5e_hv_vhca_stats_buf_size(struct mlx5e_priv *priv)
+{
+ return (sizeof(struct mlx5e_hv_vhca_per_ring_stats) *
+ priv->max_nch);
+}
+
+static void mlx5e_hv_vhca_stats_work(struct work_struct *work)
+{
+ struct mlx5e_hv_vhca_stats_agent *sagent;
+ struct mlx5_hv_vhca_agent *agent;
+ struct delayed_work *dwork;
+ struct mlx5e_priv *priv;
+ int buf_len, rc;
+ void *buf;
+
+ dwork = to_delayed_work(work);
+ sagent = container_of(dwork, struct mlx5e_hv_vhca_stats_agent, work);
+ priv = container_of(sagent, struct mlx5e_priv, stats_agent);
+ buf_len = mlx5e_hv_vhca_stats_buf_size(priv);
+ agent = sagent->agent;
+ buf = sagent->buf;
+
+ memset(buf, 0, buf_len);
+ mlx5e_hv_vhca_fill_stats(priv, buf, buf_len);
+
+ rc = mlx5_hv_vhca_agent_write(agent, buf, buf_len);
+ if (rc) {
+ mlx5_core_err(priv->mdev,
+ "%s: Failed to write stats, err = %d\n",
+ __func__, rc);
+ return;
+ }
+
+ if (sagent->delay)
+ queue_delayed_work(priv->wq, &sagent->work, sagent->delay);
+}
+
+enum {
+ MLX5_HV_VHCA_STATS_VERSION = 1,
+ MLX5_HV_VHCA_STATS_UPDATE_ONCE = 0xFFFF,
+};
+
+static void mlx5e_hv_vhca_stats_control(struct mlx5_hv_vhca_agent *agent,
+ struct mlx5_hv_vhca_control_block *block)
+{
+ struct mlx5e_hv_vhca_stats_agent *sagent;
+ struct mlx5e_priv *priv;
+
+ priv = mlx5_hv_vhca_agent_priv(agent);
+ sagent = &priv->stats_agent;
+
+ block->version = MLX5_HV_VHCA_STATS_VERSION;
+ block->rings = priv->max_nch;
+
+ if (!block->command) {
+ cancel_delayed_work_sync(&priv->stats_agent.work);
+ return;
+ }
+
+ sagent->delay = block->command == MLX5_HV_VHCA_STATS_UPDATE_ONCE ? 0 :
+ msecs_to_jiffies(block->command * 100);
+
+ queue_delayed_work(priv->wq, &sagent->work, sagent->delay);
+}
+
+static void mlx5e_hv_vhca_stats_cleanup(struct mlx5_hv_vhca_agent *agent)
+{
+ struct mlx5e_priv *priv = mlx5_hv_vhca_agent_priv(agent);
+
+ cancel_delayed_work_sync(&priv->stats_agent.work);
+}
+
+int mlx5e_hv_vhca_stats_create(struct mlx5e_priv *priv)
+{
+ int buf_len = mlx5e_hv_vhca_stats_buf_size(priv);
+ struct mlx5_hv_vhca_agent *agent;
+
+ priv->stats_agent.buf = kvzalloc(buf_len, GFP_KERNEL);
+ if (!priv->stats_agent.buf)
+ return -ENOMEM;
+
+ agent = mlx5_hv_vhca_agent_create(priv->mdev->hv_vhca,
+ MLX5_HV_VHCA_AGENT_STATS,
+ mlx5e_hv_vhca_stats_control, NULL,
+ mlx5e_hv_vhca_stats_cleanup,
+ priv);
+
+ if (IS_ERR_OR_NULL(agent)) {
+ if (IS_ERR(agent))
+ netdev_warn(priv->netdev,
+ "Failed to create hv vhca stats agent, err = %ld\n",
+ PTR_ERR(agent));
+
+ kfree(priv->stats_agent.buf);
+ return IS_ERR_OR_NULL(agent);
+ }
+
+ priv->stats_agent.agent = agent;
+ INIT_DELAYED_WORK(&priv->stats_agent.work, mlx5e_hv_vhca_stats_work);
+
+ return 0;
+}
+
+void mlx5e_hv_vhca_stats_destroy(struct mlx5e_priv *priv)
+{
+ if (IS_ERR_OR_NULL(priv->stats_agent.agent))
+ return;
+
+ mlx5_hv_vhca_agent_destroy(priv->stats_agent.agent);
+ kfree(priv->stats_agent.buf);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.h
new file mode 100644
index 0000000..664463f
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2019 Mellanox Technologies. */
+
+#ifndef __MLX5_EN_STATS_VHCA_H__
+#define __MLX5_EN_STATS_VHCA_H__
+#include "en.h"
+
+#if IS_ENABLED(CONFIG_PCI_HYPERV_INTERFACE)
+
+int mlx5e_hv_vhca_stats_create(struct mlx5e_priv *priv);
+void mlx5e_hv_vhca_stats_destroy(struct mlx5e_priv *priv);
+
+#else
+
+static inline int mlx5e_hv_vhca_stats_create(struct mlx5e_priv *priv)
+{
+ return 0;
+}
+
+static inline void mlx5e_hv_vhca_stats_destroy(struct mlx5e_priv *priv)
+{
+}
+#endif
+
+#endif /* __MLX5_EN_STATS_VHCA_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 7fdea64..fa4bf2d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -62,6 +62,7 @@
#include "en/xsk/setup.h"
#include "en/xsk/rx.h"
#include "en/xsk/tx.h"
+#include "en/hv_vhca_stats.h"
bool mlx5e_check_fragmented_striding_rq_cap(struct mlx5_core_dev *mdev)
@@ -5109,6 +5110,7 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
if (mlx5e_monitor_counter_supported(priv))
mlx5e_monitor_counter_init(priv);
+ mlx5e_hv_vhca_stats_create(priv);
if (netdev->reg_state != NETREG_REGISTERED)
return;
#ifdef CONFIG_MLX5_CORE_EN_DCB
@@ -5141,6 +5143,7 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv)
queue_work(priv->wq, &priv->set_rx_mode_work);
+ mlx5e_hv_vhca_stats_destroy(priv);
if (mlx5e_monitor_counter_supported(priv))
mlx5e_monitor_counter_cleanup(priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
index 984e7ad..4bad6a5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
@@ -13,6 +13,7 @@
enum mlx5_hv_vhca_agent_type {
MLX5_HV_VHCA_AGENT_CONTROL = 0,
+ MLX5_HV_VHCA_AGENT_STATS = 1,
MLX5_HV_VHCA_AGENT_MAX = 32,
};
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next,v5, 5/6] net/mlx5: Add HV VHCA control agent
From: Haiyang Zhang @ 2019-08-22 22:26 UTC (permalink / raw)
To: sashal@kernel.org, davem@davemloft.net, saeedm@mellanox.com,
leon@kernel.org, eranbe@mellanox.com, lorenzo.pieralisi@arm.com,
bhelgaas@google.com, linux-pci@vger.kernel.org,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org
Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
linux-kernel@vger.kernel.org
In-Reply-To: <1566512708-13785-1-git-send-email-haiyangz@microsoft.com>
From: Eran Ben Elisha <eranbe@mellanox.com>
Control agent is responsible over of the control block (ID 0). It should
update the PF via this block about every capability change. In addition,
upon block 0 invalidate, it should activate all other supported agents
with data requests from the PF.
Upon agent create/destroy, the invalidate callback of the control agent
is being called in order to update the PF driver about this change.
The control agent is an integral part of HV VHCA and will be created
and destroy as part of the HV VHCA init/cleanup flow.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
.../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c | 122 ++++++++++++++++++++-
.../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h | 1 +
2 files changed, 121 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
index 84d1d75..4047629 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
@@ -109,22 +109,131 @@ void mlx5_hv_vhca_invalidate(void *context, u64 block_mask)
queue_work(hv_vhca->work_queue, &work->invalidate_work);
}
+#define AGENT_MASK(type) (type ? BIT(type - 1) : 0 /* control */)
+
+static void mlx5_hv_vhca_agents_control(struct mlx5_hv_vhca *hv_vhca,
+ struct mlx5_hv_vhca_control_block *block)
+{
+ int i;
+
+ for (i = 0; i < MLX5_HV_VHCA_AGENT_MAX; i++) {
+ struct mlx5_hv_vhca_agent *agent = hv_vhca->agents[i];
+
+ if (!agent || !agent->control)
+ continue;
+
+ if (!(AGENT_MASK(agent->type) & block->control))
+ continue;
+
+ agent->control(agent, block);
+ }
+}
+
+static void mlx5_hv_vhca_capabilities(struct mlx5_hv_vhca *hv_vhca,
+ u32 *capabilities)
+{
+ int i;
+
+ for (i = 0; i < MLX5_HV_VHCA_AGENT_MAX; i++) {
+ struct mlx5_hv_vhca_agent *agent = hv_vhca->agents[i];
+
+ if (agent)
+ *capabilities |= AGENT_MASK(agent->type);
+ }
+}
+
+static void
+mlx5_hv_vhca_control_agent_invalidate(struct mlx5_hv_vhca_agent *agent,
+ u64 block_mask)
+{
+ struct mlx5_hv_vhca *hv_vhca = agent->hv_vhca;
+ struct mlx5_core_dev *dev = hv_vhca->dev;
+ struct mlx5_hv_vhca_control_block *block;
+ u32 capabilities = 0;
+ int err;
+
+ block = kzalloc(sizeof(*block), GFP_KERNEL);
+ if (!block)
+ return;
+
+ err = mlx5_hv_read_config(dev, block, sizeof(*block), 0);
+ if (err)
+ goto free_block;
+
+ mlx5_hv_vhca_capabilities(hv_vhca, &capabilities);
+
+ /* In case no capabilities, send empty block in return */
+ if (!capabilities) {
+ memset(block, 0, sizeof(*block));
+ goto write;
+ }
+
+ if (block->capabilities != capabilities)
+ block->capabilities = capabilities;
+
+ if (block->control & ~capabilities)
+ goto free_block;
+
+ mlx5_hv_vhca_agents_control(hv_vhca, block);
+ block->command_ack = block->command;
+
+write:
+ mlx5_hv_write_config(dev, block, sizeof(*block), 0);
+
+free_block:
+ kfree(block);
+}
+
+static struct mlx5_hv_vhca_agent *
+mlx5_hv_vhca_control_agent_create(struct mlx5_hv_vhca *hv_vhca)
+{
+ return mlx5_hv_vhca_agent_create(hv_vhca, MLX5_HV_VHCA_AGENT_CONTROL,
+ NULL,
+ mlx5_hv_vhca_control_agent_invalidate,
+ NULL, NULL);
+}
+
+static void mlx5_hv_vhca_control_agent_destroy(struct mlx5_hv_vhca_agent *agent)
+{
+ mlx5_hv_vhca_agent_destroy(agent);
+}
+
int mlx5_hv_vhca_init(struct mlx5_hv_vhca *hv_vhca)
{
+ struct mlx5_hv_vhca_agent *agent;
+ int err;
+
if (IS_ERR_OR_NULL(hv_vhca))
return IS_ERR_OR_NULL(hv_vhca);
- return mlx5_hv_register_invalidate(hv_vhca->dev, hv_vhca,
- mlx5_hv_vhca_invalidate);
+ err = mlx5_hv_register_invalidate(hv_vhca->dev, hv_vhca,
+ mlx5_hv_vhca_invalidate);
+ if (err)
+ return err;
+
+ agent = mlx5_hv_vhca_control_agent_create(hv_vhca);
+ if (IS_ERR_OR_NULL(agent)) {
+ mlx5_hv_unregister_invalidate(hv_vhca->dev);
+ return IS_ERR_OR_NULL(agent);
+ }
+
+ hv_vhca->agents[MLX5_HV_VHCA_AGENT_CONTROL] = agent;
+
+ return 0;
}
void mlx5_hv_vhca_cleanup(struct mlx5_hv_vhca *hv_vhca)
{
+ struct mlx5_hv_vhca_agent *agent;
int i;
if (IS_ERR_OR_NULL(hv_vhca))
return;
+ agent = hv_vhca->agents[MLX5_HV_VHCA_AGENT_CONTROL];
+ if (agent)
+ mlx5_hv_vhca_control_agent_destroy(agent);
+
mutex_lock(&hv_vhca->agents_lock);
for (i = 0; i < MLX5_HV_VHCA_AGENT_MAX; i++)
WARN_ON(hv_vhca->agents[i]);
@@ -134,6 +243,11 @@ void mlx5_hv_vhca_cleanup(struct mlx5_hv_vhca *hv_vhca)
mlx5_hv_unregister_invalidate(hv_vhca->dev);
}
+static void mlx5_hv_vhca_agents_update(struct mlx5_hv_vhca *hv_vhca)
+{
+ mlx5_hv_vhca_invalidate(hv_vhca, BIT(MLX5_HV_VHCA_AGENT_CONTROL));
+}
+
struct mlx5_hv_vhca_agent *
mlx5_hv_vhca_agent_create(struct mlx5_hv_vhca *hv_vhca,
enum mlx5_hv_vhca_agent_type type,
@@ -174,6 +288,8 @@ struct mlx5_hv_vhca_agent *
hv_vhca->agents[type] = agent;
mutex_unlock(&hv_vhca->agents_lock);
+ mlx5_hv_vhca_agents_update(hv_vhca);
+
return agent;
}
@@ -195,6 +311,8 @@ void mlx5_hv_vhca_agent_destroy(struct mlx5_hv_vhca_agent *agent)
agent->cleanup(agent);
kfree(agent);
+
+ mlx5_hv_vhca_agents_update(hv_vhca);
}
static int mlx5_hv_vhca_data_block_prepare(struct mlx5_hv_vhca_agent *agent,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
index cdf1303..984e7ad 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
@@ -12,6 +12,7 @@
struct mlx5_hv_vhca_control_block;
enum mlx5_hv_vhca_agent_type {
+ MLX5_HV_VHCA_AGENT_CONTROL = 0,
MLX5_HV_VHCA_AGENT_MAX = 32,
};
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next,v5, 1/6] PCI: hv: Add a paravirtual backchannel in software
From: Haiyang Zhang @ 2019-08-22 22:26 UTC (permalink / raw)
To: sashal@kernel.org, davem@davemloft.net, saeedm@mellanox.com,
leon@kernel.org, eranbe@mellanox.com, lorenzo.pieralisi@arm.com,
bhelgaas@google.com, linux-pci@vger.kernel.org,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org
Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
linux-kernel@vger.kernel.org, Dexuan Cui, Jake Oshins
In-Reply-To: <1566512708-13785-1-git-send-email-haiyangz@microsoft.com>
From: Dexuan Cui <decui@microsoft.com>
Windows SR-IOV provides a backchannel mechanism in software for communication
between a VF driver and a PF driver. These "configuration blocks" are
similar in concept to PCI configuration space, but instead of doing reads and
writes in 32-bit chunks through a very slow path, packets of up to 128 bytes
can be sent or received asynchronously.
Nearly every SR-IOV device contains just such a communications channel in
hardware, so using this one in software is usually optional. Using the
software channel, however, allows driver implementers to leverage software
tools that fuzz the communications channel looking for vulnerabilities.
The usage model for these packets puts the responsibility for reading or
writing on the VF driver. The VF driver sends a read or a write packet,
indicating which "block" is being referred to by number.
If the PF driver wishes to initiate communication, it can "invalidate" one or
more of the first 64 blocks. This invalidation is delivered via a callback
supplied by the VF driver by this driver.
No protocol is implied, except that supplied by the PF and VF drivers.
Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
drivers/pci/controller/pci-hyperv.c | 302 ++++++++++++++++++++++++++++++++++++
include/linux/hyperv.h | 15 ++
2 files changed, 317 insertions(+)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 40b6254..57adeca 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -365,6 +365,39 @@ struct pci_delete_interrupt {
struct tran_int_desc int_desc;
} __packed;
+/*
+ * Note: the VM must pass a valid block id, wslot and bytes_requested.
+ */
+struct pci_read_block {
+ struct pci_message message_type;
+ u32 block_id;
+ union win_slot_encoding wslot;
+ u32 bytes_requested;
+} __packed;
+
+struct pci_read_block_response {
+ struct vmpacket_descriptor hdr;
+ u32 status;
+ u8 bytes[HV_CONFIG_BLOCK_SIZE_MAX];
+} __packed;
+
+/*
+ * Note: the VM must pass a valid block id, wslot and byte_count.
+ */
+struct pci_write_block {
+ struct pci_message message_type;
+ u32 block_id;
+ union win_slot_encoding wslot;
+ u32 byte_count;
+ u8 bytes[HV_CONFIG_BLOCK_SIZE_MAX];
+} __packed;
+
+struct pci_dev_inval_block {
+ struct pci_incoming_message incoming;
+ union win_slot_encoding wslot;
+ u64 block_mask;
+} __packed;
+
struct pci_dev_incoming {
struct pci_incoming_message incoming;
union win_slot_encoding wslot;
@@ -499,6 +532,9 @@ struct hv_pci_dev {
struct hv_pcibus_device *hbus;
struct work_struct wrk;
+ void (*block_invalidate)(void *context, u64 block_mask);
+ void *invalidate_context;
+
/*
* What would be observed if one wrote 0xFFFFFFFF to a BAR and then
* read it back, for each of the BAR offsets within config space.
@@ -817,6 +853,256 @@ static int hv_pcifront_write_config(struct pci_bus *bus, unsigned int devfn,
.write = hv_pcifront_write_config,
};
+/*
+ * Paravirtual backchannel
+ *
+ * Hyper-V SR-IOV provides a backchannel mechanism in software for
+ * communication between a VF driver and a PF driver. These
+ * "configuration blocks" are similar in concept to PCI configuration space,
+ * but instead of doing reads and writes in 32-bit chunks through a very slow
+ * path, packets of up to 128 bytes can be sent or received asynchronously.
+ *
+ * Nearly every SR-IOV device contains just such a communications channel in
+ * hardware, so using this one in software is usually optional. Using the
+ * software channel, however, allows driver implementers to leverage software
+ * tools that fuzz the communications channel looking for vulnerabilities.
+ *
+ * The usage model for these packets puts the responsibility for reading or
+ * writing on the VF driver. The VF driver sends a read or a write packet,
+ * indicating which "block" is being referred to by number.
+ *
+ * If the PF driver wishes to initiate communication, it can "invalidate" one or
+ * more of the first 64 blocks. This invalidation is delivered via a callback
+ * supplied by the VF driver by this driver.
+ *
+ * No protocol is implied, except that supplied by the PF and VF drivers.
+ */
+
+struct hv_read_config_compl {
+ struct hv_pci_compl comp_pkt;
+ void *buf;
+ unsigned int len;
+ unsigned int bytes_returned;
+};
+
+/**
+ * hv_pci_read_config_compl() - Invoked when a response packet
+ * for a read config block operation arrives.
+ * @context: Identifies the read config operation
+ * @resp: The response packet itself
+ * @resp_packet_size: Size in bytes of the response packet
+ */
+static void hv_pci_read_config_compl(void *context, struct pci_response *resp,
+ int resp_packet_size)
+{
+ struct hv_read_config_compl *comp = context;
+ struct pci_read_block_response *read_resp =
+ (struct pci_read_block_response *)resp;
+ unsigned int data_len, hdr_len;
+
+ hdr_len = offsetof(struct pci_read_block_response, bytes);
+ if (resp_packet_size < hdr_len) {
+ comp->comp_pkt.completion_status = -1;
+ goto out;
+ }
+
+ data_len = resp_packet_size - hdr_len;
+ if (data_len > 0 && read_resp->status == 0) {
+ comp->bytes_returned = min(comp->len, data_len);
+ memcpy(comp->buf, read_resp->bytes, comp->bytes_returned);
+ } else {
+ comp->bytes_returned = 0;
+ }
+
+ comp->comp_pkt.completion_status = read_resp->status;
+out:
+ complete(&comp->comp_pkt.host_event);
+}
+
+/**
+ * hv_read_config_block() - Sends a read config block request to
+ * the back-end driver running in the Hyper-V parent partition.
+ * @pdev: The PCI driver's representation for this device.
+ * @buf: Buffer into which the config block will be copied.
+ * @len: Size in bytes of buf.
+ * @block_id: Identifies the config block which has been requested.
+ * @bytes_returned: Size which came back from the back-end driver.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int hv_read_config_block(struct pci_dev *pdev, void *buf, unsigned int len,
+ unsigned int block_id, unsigned int *bytes_returned)
+{
+ struct hv_pcibus_device *hbus =
+ container_of(pdev->bus->sysdata, struct hv_pcibus_device,
+ sysdata);
+ struct {
+ struct pci_packet pkt;
+ char buf[sizeof(struct pci_read_block)];
+ } pkt;
+ struct hv_read_config_compl comp_pkt;
+ struct pci_read_block *read_blk;
+ int ret;
+
+ if (len == 0 || len > HV_CONFIG_BLOCK_SIZE_MAX)
+ return -EINVAL;
+
+ init_completion(&comp_pkt.comp_pkt.host_event);
+ comp_pkt.buf = buf;
+ comp_pkt.len = len;
+
+ memset(&pkt, 0, sizeof(pkt));
+ pkt.pkt.completion_func = hv_pci_read_config_compl;
+ pkt.pkt.compl_ctxt = &comp_pkt;
+ read_blk = (struct pci_read_block *)&pkt.pkt.message;
+ read_blk->message_type.type = PCI_READ_BLOCK;
+ read_blk->wslot.slot = devfn_to_wslot(pdev->devfn);
+ read_blk->block_id = block_id;
+ read_blk->bytes_requested = len;
+
+ ret = vmbus_sendpacket(hbus->hdev->channel, read_blk,
+ sizeof(*read_blk), (unsigned long)&pkt.pkt,
+ VM_PKT_DATA_INBAND,
+ VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+ if (ret)
+ return ret;
+
+ ret = wait_for_response(hbus->hdev, &comp_pkt.comp_pkt.host_event);
+ if (ret)
+ return ret;
+
+ if (comp_pkt.comp_pkt.completion_status != 0 ||
+ comp_pkt.bytes_returned == 0) {
+ dev_err(&hbus->hdev->device,
+ "Read Config Block failed: 0x%x, bytes_returned=%d\n",
+ comp_pkt.comp_pkt.completion_status,
+ comp_pkt.bytes_returned);
+ return -EIO;
+ }
+
+ *bytes_returned = comp_pkt.bytes_returned;
+ return 0;
+}
+EXPORT_SYMBOL(hv_read_config_block);
+
+/**
+ * hv_pci_write_config_compl() - Invoked when a response packet for a write
+ * config block operation arrives.
+ * @context: Identifies the write config operation
+ * @resp: The response packet itself
+ * @resp_packet_size: Size in bytes of the response packet
+ */
+static void hv_pci_write_config_compl(void *context, struct pci_response *resp,
+ int resp_packet_size)
+{
+ struct hv_pci_compl *comp_pkt = context;
+
+ comp_pkt->completion_status = resp->status;
+ complete(&comp_pkt->host_event);
+}
+
+/**
+ * hv_write_config_block() - Sends a write config block request to the
+ * back-end driver running in the Hyper-V parent partition.
+ * @pdev: The PCI driver's representation for this device.
+ * @buf: Buffer from which the config block will be copied.
+ * @len: Size in bytes of buf.
+ * @block_id: Identifies the config block which is being written.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int hv_write_config_block(struct pci_dev *pdev, void *buf, unsigned int len,
+ unsigned int block_id)
+{
+ struct hv_pcibus_device *hbus =
+ container_of(pdev->bus->sysdata, struct hv_pcibus_device,
+ sysdata);
+ struct {
+ struct pci_packet pkt;
+ char buf[sizeof(struct pci_write_block)];
+ u32 reserved;
+ } pkt;
+ struct hv_pci_compl comp_pkt;
+ struct pci_write_block *write_blk;
+ u32 pkt_size;
+ int ret;
+
+ if (len == 0 || len > HV_CONFIG_BLOCK_SIZE_MAX)
+ return -EINVAL;
+
+ init_completion(&comp_pkt.host_event);
+
+ memset(&pkt, 0, sizeof(pkt));
+ pkt.pkt.completion_func = hv_pci_write_config_compl;
+ pkt.pkt.compl_ctxt = &comp_pkt;
+ write_blk = (struct pci_write_block *)&pkt.pkt.message;
+ write_blk->message_type.type = PCI_WRITE_BLOCK;
+ write_blk->wslot.slot = devfn_to_wslot(pdev->devfn);
+ write_blk->block_id = block_id;
+ write_blk->byte_count = len;
+ memcpy(write_blk->bytes, buf, len);
+ pkt_size = offsetof(struct pci_write_block, bytes) + len;
+ /*
+ * This quirk is required on some hosts shipped around 2018, because
+ * these hosts don't check the pkt_size correctly (new hosts have been
+ * fixed since early 2019). The quirk is also safe on very old hosts
+ * and new hosts, because, on them, what really matters is the length
+ * specified in write_blk->byte_count.
+ */
+ pkt_size += sizeof(pkt.reserved);
+
+ ret = vmbus_sendpacket(hbus->hdev->channel, write_blk, pkt_size,
+ (unsigned long)&pkt.pkt, VM_PKT_DATA_INBAND,
+ VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+ if (ret)
+ return ret;
+
+ ret = wait_for_response(hbus->hdev, &comp_pkt.host_event);
+ if (ret)
+ return ret;
+
+ if (comp_pkt.completion_status != 0) {
+ dev_err(&hbus->hdev->device,
+ "Write Config Block failed: 0x%x\n",
+ comp_pkt.completion_status);
+ return -EIO;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL(hv_write_config_block);
+
+/**
+ * hv_register_block_invalidate() - Invoked when a config block invalidation
+ * arrives from the back-end driver.
+ * @pdev: The PCI driver's representation for this device.
+ * @context: Identifies the device.
+ * @block_invalidate: Identifies all of the blocks being invalidated.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int hv_register_block_invalidate(struct pci_dev *pdev, void *context,
+ void (*block_invalidate)(void *context,
+ u64 block_mask))
+{
+ struct hv_pcibus_device *hbus =
+ container_of(pdev->bus->sysdata, struct hv_pcibus_device,
+ sysdata);
+ struct hv_pci_dev *hpdev;
+
+ hpdev = get_pcichild_wslot(hbus, devfn_to_wslot(pdev->devfn));
+ if (!hpdev)
+ return -ENODEV;
+
+ hpdev->block_invalidate = block_invalidate;
+ hpdev->invalidate_context = context;
+
+ put_pcichild(hpdev);
+ return 0;
+
+}
+EXPORT_SYMBOL(hv_register_block_invalidate);
+
/* Interrupt management hooks */
static void hv_int_desc_free(struct hv_pci_dev *hpdev,
struct tran_int_desc *int_desc)
@@ -1968,6 +2254,7 @@ static void hv_pci_onchannelcallback(void *context)
struct pci_response *response;
struct pci_incoming_message *new_message;
struct pci_bus_relations *bus_rel;
+ struct pci_dev_inval_block *inval;
struct pci_dev_incoming *dev_message;
struct hv_pci_dev *hpdev;
@@ -2045,6 +2332,21 @@ static void hv_pci_onchannelcallback(void *context)
}
break;
+ case PCI_INVALIDATE_BLOCK:
+
+ inval = (struct pci_dev_inval_block *)buffer;
+ hpdev = get_pcichild_wslot(hbus,
+ inval->wslot.slot);
+ if (hpdev) {
+ if (hpdev->block_invalidate) {
+ hpdev->block_invalidate(
+ hpdev->invalidate_context,
+ inval->block_mask);
+ }
+ put_pcichild(hpdev);
+ }
+ break;
+
default:
dev_warn(&hbus->hdev->device,
"Unimplemented protocol message %x\n",
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 6256cc3..9d37f8c 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1578,4 +1578,19 @@ struct vmpacket_descriptor *
for (pkt = hv_pkt_iter_first(channel); pkt; \
pkt = hv_pkt_iter_next(channel, pkt))
+/*
+ * Functions for passing data between SR-IOV PF and VF drivers. The VF driver
+ * sends requests to read and write blocks. Each block must be 128 bytes or
+ * smaller. Optionally, the VF driver can register a callback function which
+ * will be invoked when the host says that one or more of the first 64 block
+ * IDs is "invalid" which means that the VF driver should reread them.
+ */
+#define HV_CONFIG_BLOCK_SIZE_MAX 128
+int hv_read_config_block(struct pci_dev *dev, void *buf, unsigned int buf_len,
+ unsigned int block_id, unsigned int *bytes_returned);
+int hv_write_config_block(struct pci_dev *dev, void *buf, unsigned int len,
+ unsigned int block_id);
+int hv_register_block_invalidate(struct pci_dev *dev, void *context,
+ void (*block_invalidate)(void *context,
+ u64 block_mask));
#endif /* _HYPERV_H */
--
1.8.3.1
^ permalink raw reply related
* Re: New skb extension for use by LSMs (skb "security blob")?
From: David Miller @ 2019-08-22 22:28 UTC (permalink / raw)
To: casey; +Cc: fw, paul, netdev, linux-security-module, selinux
In-Reply-To: <e2e22b41-2aa1-6a52-107d-e4efd9dcacf4@schaufler-ca.com>
From: Casey Schaufler <casey@schaufler-ca.com>
Date: Thu, 22 Aug 2019 14:59:37 -0700
> Sure, you *can* do that, but it would be insane to do so.
We look up the neighbour table entries on every single packet we
transmit from the kernel in the same exact way.
And it was exactly to get rid of a pointer in a data structure.
^ permalink raw reply
* Re: [net-next v2 00/13][pull request] 40GbE Intel Wired LAN Driver Updates 2019-08-22
From: David Miller @ 2019-08-22 22:29 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann
In-Reply-To: <20190822203039.15668-1-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 22 Aug 2019 13:30:26 -0700
> This series contains updates to i40e driver only.
Pulled, thanks Jeff.
^ permalink raw reply
* Re: [PATCH 0/3] Add NETIF_F_HW_BRIDGE feature
From: David Miller @ 2019-08-22 22:32 UTC (permalink / raw)
To: horatiu.vultur
Cc: roopa, nikolay, UNGLinuxDriver, alexandre.belloni, allan.nielsen,
netdev, linux-kernel, bridge
In-Reply-To: <1566500850-6247-1-git-send-email-horatiu.vultur@microchip.com>
From: Horatiu Vultur <horatiu.vultur@microchip.com>
Date: Thu, 22 Aug 2019 21:07:27 +0200
> Current implementation of the SW bridge is setting the interfaces in
> promisc mode when they are added to bridge if learning of the frames is
> enabled.
> In case of Ocelot which has HW capabilities to switch frames, it is not
> needed to set the ports in promisc mode because the HW already capable of
> doing that. Therefore add NETIF_F_HW_BRIDGE feature to indicate that the
> HW has bridge capabilities. Therefore the SW bridge doesn't need to set
> the ports in promisc mode to do the switching.
> This optimization takes places only if all the interfaces that are part
> of the bridge have this flag and have the same network driver.
>
> If the bridge interfaces is added in promisc mode then also the ports part
> of the bridge are set in promisc mode.
This doesn't look right at all.
The Linux bridge provides a software bridge.
By default, all hardware must provide a hardware implementation of
that software bridge behavior.
Anything that deviates from that behavior has to be explicitly asked
for by the user by explicit config commands.
^ permalink raw reply
* r8169: regression on MIPS/Loongson
From: Aaro Koskinen @ 2019-08-22 22:25 UTC (permalink / raw)
To: Heiner Kallweit; +Cc: netdev, linux-mips
Hi,
After upgrading from v5.2 to v5.3-rc5 on MIPS/Loongson board, copying
large files from network with scp started to fail with "Integrity error".
Bisected to:
f072218cca5b076dd99f3dfa3aaafedfd0023a51 is the first bad commit
commit f072218cca5b076dd99f3dfa3aaafedfd0023a51
Author: Heiner Kallweit <hkallweit1@gmail.com>
Date: Thu Jun 27 23:19:09 2019 +0200
r8169: remove not needed call to dma_sync_single_for_device
Any idea what goes wrong? Should this change be reverted?
A.
^ permalink raw reply
* Re: [PATCH net-next,v5, 0/6] Add software backchannel and mlx5e HV VHCA stats
From: David Miller @ 2019-08-22 22:33 UTC (permalink / raw)
To: haiyangz
Cc: sashal, saeedm, leon, eranbe, lorenzo.pieralisi, bhelgaas,
linux-pci, linux-hyperv, netdev, kys, sthemmin, linux-kernel
In-Reply-To: <1566512708-13785-1-git-send-email-haiyangz@microsoft.com>
I applied this patch series already to net-next, what are you doing?
^ permalink raw reply
* Re: New skb extension for use by LSMs (skb "security blob")?
From: Casey Schaufler @ 2019-08-22 22:34 UTC (permalink / raw)
To: David Miller; +Cc: fw, paul, netdev, linux-security-module, selinux, casey
In-Reply-To: <20190822.152857.1388207414767202364.davem@davemloft.net>
On 8/22/2019 3:28 PM, David Miller wrote:
> From: Casey Schaufler <casey@schaufler-ca.com>
> Date: Thu, 22 Aug 2019 14:59:37 -0700
>
>> Sure, you *can* do that, but it would be insane to do so.
> We look up the neighbour table entries on every single packet we
> transmit from the kernel in the same exact way.
>
> And it was exactly to get rid of a pointer in a data structure.
I very much expect that the lifecycle management issues would
be completely different, but I'll admit to having little understanding
of the details of the neighbour table.
^ permalink raw reply
* Re: New skb extension for use by LSMs (skb "security blob")?
From: David Miller @ 2019-08-22 22:36 UTC (permalink / raw)
To: casey; +Cc: fw, paul, netdev, linux-security-module, selinux
In-Reply-To: <5d524e45-0d80-3a1c-4fd2-7610d2197bf8@schaufler-ca.com>
From: Casey Schaufler <casey@schaufler-ca.com>
Date: Thu, 22 Aug 2019 15:34:44 -0700
> On 8/22/2019 3:28 PM, David Miller wrote:
>> From: Casey Schaufler <casey@schaufler-ca.com>
>> Date: Thu, 22 Aug 2019 14:59:37 -0700
>>
>>> Sure, you *can* do that, but it would be insane to do so.
>> We look up the neighbour table entries on every single packet we
>> transmit from the kernel in the same exact way.
>>
>> And it was exactly to get rid of a pointer in a data structure.
>
> I very much expect that the lifecycle management issues would
> be completely different, but I'll admit to having little understanding
> of the details of the neighbour table.
Neighbour table entries can live anywhere from essentially forever down
to several microseconds.
If your hash is good, and you use RCU locking on the read side, it's a
single pointer dereference in cost.
^ permalink raw reply
* RE: [PATCH net-next,v5, 0/6] Add software backchannel and mlx5e HV VHCA stats
From: Haiyang Zhang @ 2019-08-22 22:37 UTC (permalink / raw)
To: David Miller
Cc: sashal@kernel.org, saeedm@mellanox.com, leon@kernel.org,
eranbe@mellanox.com, lorenzo.pieralisi@arm.com,
bhelgaas@google.com, linux-pci@vger.kernel.org,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
KY Srinivasan, Stephen Hemminger, linux-kernel@vger.kernel.org
In-Reply-To: <20190822.153315.1245817410062415025.davem@davemloft.net>
> -----Original Message-----
> From: David Miller <davem@davemloft.net>
> Sent: Thursday, August 22, 2019 3:33 PM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: sashal@kernel.org; saeedm@mellanox.com; leon@kernel.org;
> eranbe@mellanox.com; lorenzo.pieralisi@arm.com; bhelgaas@google.com;
> linux-pci@vger.kernel.org; linux-hyperv@vger.kernel.org;
> netdev@vger.kernel.org; KY Srinivasan <kys@microsoft.com>; Stephen
> Hemminger <sthemmin@microsoft.com>; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH net-next,v5, 0/6] Add software backchannel and mlx5e
> HV VHCA stats
>
>
> I applied this patch series already to net-next, what are you doing?
The v5 is pretty much the same as v4, except Eran had a fix to patch #3 in response to
Leon Romanovsky <leon@kernel.org>.
Thanks,
- Haiyang
^ permalink raw reply
* Re: [PATCH net-next,v5, 0/6] Add software backchannel and mlx5e HV VHCA stats
From: David Miller @ 2019-08-22 22:39 UTC (permalink / raw)
To: haiyangz
Cc: sashal, saeedm, leon, eranbe, lorenzo.pieralisi, bhelgaas,
linux-pci, linux-hyperv, netdev, kys, sthemmin, linux-kernel
In-Reply-To: <DM6PR21MB133743FB2006A28AE10A170CCAA50@DM6PR21MB1337.namprd21.prod.outlook.com>
From: Haiyang Zhang <haiyangz@microsoft.com>
Date: Thu, 22 Aug 2019 22:37:13 +0000
> The v5 is pretty much the same as v4, except Eran had a fix to patch #3 in response to
> Leon Romanovsky <leon@kernel.org>.
Well you now have to send me a patch relative to v4 in order to fix that.
When I say "applied", the series is in my tree and is therefore permanent.
It is therefore never appropriate to then post a new version of the series.
^ permalink raw reply
* Re: [PATCH net-next v4 0/2] dt-bindings: net: meson-dwmac: convert to yaml
From: David Miller @ 2019-08-22 22:42 UTC (permalink / raw)
To: narmstrong
Cc: robh+dt, martin.blumenstingl, devicetree, netdev, linux-amlogic,
linux-arm-kernel, linux-kernel
In-Reply-To: <20190820075742.14857-1-narmstrong@baylibre.com>
From: Neil Armstrong <narmstrong@baylibre.com>
Date: Tue, 20 Aug 2019 09:57:40 +0200
> This patchsets converts the Amlogic Meson DWMAC glue bindings over to
> YAML schemas using the already converted dwmac bindings.
>
> The first patch is needed because the Amlogic glue needs a supplementary
> reg cell to access the DWMAC glue registers.
>
> Changes since v3:
> - Specified net-next target tree
>
> Changes since v2:
> - Added review tags
> - Updated allwinner,sun7i-a20-gmac.yaml reg maxItems
Series applied, thanks.
^ permalink raw reply
* RE: [PATCH net-next,v5, 0/6] Add software backchannel and mlx5e HV VHCA stats
From: Haiyang Zhang @ 2019-08-22 22:43 UTC (permalink / raw)
To: David Miller, eranbe@mellanox.com
Cc: sashal@kernel.org, saeedm@mellanox.com, leon@kernel.org,
lorenzo.pieralisi@arm.com, bhelgaas@google.com,
linux-pci@vger.kernel.org, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org, KY Srinivasan, Stephen Hemminger,
linux-kernel@vger.kernel.org
In-Reply-To: <20190822.153912.2269276523787180347.davem@davemloft.net>
> -----Original Message-----
> From: David Miller <davem@davemloft.net>
> Sent: Thursday, August 22, 2019 3:39 PM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: sashal@kernel.org; saeedm@mellanox.com; leon@kernel.org;
> eranbe@mellanox.com; lorenzo.pieralisi@arm.com; bhelgaas@google.com;
> linux-pci@vger.kernel.org; linux-hyperv@vger.kernel.org;
> netdev@vger.kernel.org; KY Srinivasan <kys@microsoft.com>; Stephen
> Hemminger <sthemmin@microsoft.com>; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH net-next,v5, 0/6] Add software backchannel and mlx5e
> HV VHCA stats
>
> From: Haiyang Zhang <haiyangz@microsoft.com>
> Date: Thu, 22 Aug 2019 22:37:13 +0000
>
> > The v5 is pretty much the same as v4, except Eran had a fix to patch #3 in
> response to
> > Leon Romanovsky <leon@kernel.org>.
>
> Well you now have to send me a patch relative to v4 in order to fix that.
>
> When I say "applied", the series is in my tree and is therefore permanent.
> It is therefore never appropriate to then post a new version of the series.
Thanks.
Eran, could you submit another patch for the fix to patch #3?
- Haiyang
^ permalink raw reply
* Re: [PATCH v2] sock: fix potential memory leak in proto_register()
From: David Miller @ 2019-08-22 22:43 UTC (permalink / raw)
To: zhang.lin16
Cc: ast, daniel, kafai, songliubraving, yhs, willemb, edumazet,
deepa.kernel, arnd, dh.herrmann, gnault, netdev, linux-kernel,
bpf, xue.zhihong, wang.yi59, jiang.xuexin
In-Reply-To: <1566348158-43942-1-git-send-email-zhang.lin16@zte.com.cn>
From: zhanglin <zhang.lin16@zte.com.cn>
Date: Wed, 21 Aug 2019 08:42:38 +0800
> @@ -3243,18 +3245,24 @@ int proto_register(struct proto *prot, int alloc_slab)
> }
>
> mutex_lock(&proto_list_mutex);
> + if (assign_proto_idx(prot)) {
> + mutex_unlock(&proto_list_mutex);
> + goto out_free_timewait_sock_slab_name;
> + }
Propagate the error code properly please.
^ permalink raw reply
* Re: [v4] ocelot_ace: fix action of trap
From: David Miller @ 2019-08-22 22:44 UTC (permalink / raw)
To: yangbo.lu
Cc: netdev, allan.nielsen, alexandre.belloni, UNGLinuxDriver, andrew
In-Reply-To: <20190821015912.43151-1-yangbo.lu@nxp.com>
From: Yangbo Lu <yangbo.lu@nxp.com>
Date: Wed, 21 Aug 2019 09:59:12 +0800
> The trap action should be copying the frame to CPU and
> dropping it for forwarding, but current setting was just
> copying frame to CPU.
>
> Fixes: b596229448dd ("net: mscc: ocelot: Add support for tcam")
> Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
> Acked-by: Allan W. Nielsen <allan.nielsen@microchip.com>
> ---
> Changes for v4:
> - Added ACK and Fixes info in commit message.
> Changes for v3:
> - Set MASK_MODE to 1 for dropping forwarding.
> - Dropped other patches of patch-set.
> Changes for v2:
> - None.
Applied.
^ permalink raw reply
* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Alexei Starovoitov @ 2019-08-22 22:48 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Andy Lutomirski, Song Liu, Kees Cook, Networking, bpf,
Alexei Starovoitov, Kernel Team, Lorenz Bauer, Jann Horn, Greg KH,
Linux API, LSM List, Chenbo Feng
In-Reply-To: <98fee747-795a-ff10-fa98-10ddb5afcc03@iogearbox.net>
On Thu, Aug 22, 2019 at 04:17:43PM +0200, Daniel Borkmann wrote:
>
> > > Hence unprivileged bpf is actually something that can be deprecated.
>
> There is actually a publicly known use-case on unprivileged bpf wrt
> socket filters, see the SO_ATTACH_BPF on sockets section as an example:
>
> https://blog.cloudflare.com/cloudflare-architecture-and-how-bpf-eats-the-world/
>
> If I'd have to take a good guess, I'd think it's major use-case is in
> SO_ATTACH_REUSEPORT_EBPF in the wild, I don't think the sysctl can be
> outright flipped or deprecated w/o breaking existing applications unless
> it's cleanly modeled into some sort of customizable CAP_BPF* type policy
> (more below) where this would be the lowest common denominator.
The cloudflare use case is the perfect example that a lot of program types
are used together.
Do people use SO_ATTACH_BPF ? Of course.
All program types are used by somebody. Before accepting them we had long
conversations with authors to understand that the use cases are real.
Some progs are probably used less than others by now.
Like cls_bpf without exts_integrated is probably not used at all.
We still have to support it, of course.
That cloudflare example demonstrates that kernel.unprivileged_bpf_disabled=1
is a reality. Companies that care about security already switched it on.
Different bits in cloudflare setup need different level of capabilities.
Some (like SO_ATTACH_BPF) need unpriv. Another need CAP_NET_ADMIN.
But common demoninator for them all is still CAP_SYS_ADMIN.
And that's why the system as a whole is not as safe as it could have
been with CAP_BPF. The system needs root in many places.
Folks going out of the way to reduce that SYS_ADMIN to something less.
The example with systemd and NET_ADMIN is just one of them.
^ permalink raw reply
* Re: r8169: regression on MIPS/Loongson
From: Heiner Kallweit @ 2019-08-22 22:52 UTC (permalink / raw)
To: Aaro Koskinen; +Cc: netdev, linux-mips
In-Reply-To: <20190822222549.GF30291@darkstar.musicnaut.iki.fi>
On 23.08.2019 00:25, Aaro Koskinen wrote:
> Hi,
>
> After upgrading from v5.2 to v5.3-rc5 on MIPS/Loongson board, copying
> large files from network with scp started to fail with "Integrity error".
> Bisected to:
>
> f072218cca5b076dd99f3dfa3aaafedfd0023a51 is the first bad commit
> commit f072218cca5b076dd99f3dfa3aaafedfd0023a51
> Author: Heiner Kallweit <hkallweit1@gmail.com>
> Date: Thu Jun 27 23:19:09 2019 +0200
>
> r8169: remove not needed call to dma_sync_single_for_device
>
> Any idea what goes wrong? Should this change be reverted?
>
Typically the Realtek chips are used on Intel platforms and I haven't
seen any such report yet, so it seems to be platform-specific.
Which board (DT config) is it, and can you provide a full dmesg?
> A.
>
Heiner
^ permalink raw reply
* Re: [PATCH net-next 03/10] net: sched: refactor block offloads counter usage
From: Jakub Kicinski @ 2019-08-22 22:53 UTC (permalink / raw)
To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem, pablo, Jiri Pirko
In-Reply-To: <20190822124353.16902-4-vladbu@mellanox.com>
On Thu, 22 Aug 2019 15:43:46 +0300, Vlad Buslov wrote:
> Without rtnl lock protection filters can no longer safely manage block
> offloads counter themselves. Refactor cls API to protect block offloadcnt
> with tcf_block->cb_lock that is already used to protect driver callback
> list and nooffloaddevcnt counter. The counter can be modified by concurrent
> tasks by new functions that execute block callbacks (which is safe with
> previous patch that changed its type to atomic_t), however, block
> bind/unbind code that checks the counter value takes cb_lock in write mode
> to exclude any concurrent modifications. This approach prevents race
> conditions between bind/unbind and callback execution code but allows for
> concurrency for tc rule update path.
>
> Move block offload counter, filter in hardware counter and filter flags
> management from classifiers into cls hardware offloads API. Make functions
> tcf_block_offload_inc() and tcf_block_offload_dec() to be cls API private.
> Implement following new cls API to be used instead:
>
> tc_setup_cb_add() - non-destructive filter add. If filter that wasn't
> already in hardware is successfully offloaded, increment block offloads
> counter, set filter in hardware counter and flag. On failure, previously
> offloaded filter is considered to be intact and offloads counter is not
> decremented.
>
> tc_setup_cb_replace() - destructive filter replace. Release existing
> filter block offload counter and reset its in hardware counter and flag.
> Set new filter in hardware counter and flag. On failure, previously
> offloaded filter is considered to be destroyed and offload counter is
> decremented.
>
> tc_setup_cb_destroy() - filter destroy. Unconditionally decrement block
> offloads counter.
>
> Refactor all offload-capable classifiers to atomically offload filters to
> hardware, change block offload counter, and set filter in hardware counter
> and flag by means of the new cls API functions.
>
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> Acked-by: Jiri Pirko <jiri@mellanox.com>
Looks good, minor nits
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 8502bd006b37..4215c849f4a3 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -3000,13 +3000,97 @@ int tcf_exts_dump_stats(struct sk_buff *skb, struct tcf_exts *exts)
> }
> EXPORT_SYMBOL(tcf_exts_dump_stats);
>
> -int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
> - void *type_data, bool err_stop)
> +static void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
> +{
> + if (*flags & TCA_CLS_FLAGS_IN_HW)
> + return;
> + *flags |= TCA_CLS_FLAGS_IN_HW;
> + atomic_inc(&block->offloadcnt);
> +}
> +
> +static void tcf_block_offload_dec(struct tcf_block *block, u32 *flags)
> +{
> + if (!(*flags & TCA_CLS_FLAGS_IN_HW))
> + return;
> + *flags &= ~TCA_CLS_FLAGS_IN_HW;
> + atomic_dec(&block->offloadcnt);
> +}
> +
> +void tc_cls_offload_cnt_update(struct tcf_block *block, struct tcf_proto *tp,
> + u32 *cnt, u32 *flags, u32 diff, bool add)
> +{
> + lockdep_assert_held(&block->cb_lock);
> +
> + spin_lock(&tp->lock);
> + if (add) {
> + if (!*cnt)
> + tcf_block_offload_inc(block, flags);
> + (*cnt) += diff;
brackets unnecessary
> + } else {
> + (*cnt) -= diff;
> + if (!*cnt)
> + tcf_block_offload_dec(block, flags);
> + }
> + spin_unlock(&tp->lock);
> +}
> +EXPORT_SYMBOL(tc_cls_offload_cnt_update);
> +
> +static void
> +tc_cls_offload_cnt_reset(struct tcf_block *block, struct tcf_proto *tp,
> + u32 *cnt, u32 *flags)
> +{
> + lockdep_assert_held(&block->cb_lock);
> +
> + spin_lock(&tp->lock);
> + tcf_block_offload_dec(block, flags);
> + (*cnt) = 0;
ditto
> + spin_unlock(&tp->lock);
> +}
> +
> +static int
> +__tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
> + void *type_data, bool err_stop)
> {
> struct flow_block_cb *block_cb;
> int ok_count = 0;
> int err;
>
> + list_for_each_entry(block_cb, &block->flow_block.cb_list, list) {
> + err = block_cb->cb(type, type_data, block_cb->cb_priv);
> + if (err) {
> + if (err_stop)
> + return err;
> + } else {
> + ok_count++;
> + }
> + }
> + return ok_count;
> +}
> +
> +int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
> + void *type_data, bool err_stop, bool rtnl_held)
> +{
> + int ok_count;
> +
> + down_read(&block->cb_lock);
> + ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
> + up_read(&block->cb_lock);
> + return ok_count;
> +}
> +EXPORT_SYMBOL(tc_setup_cb_call);
> +
> +/* Non-destructive filter add. If filter that wasn't already in hardware is
> + * successfully offloaded, increment block offloads counter. On failure,
> + * previously offloaded filter is considered to be intact and offloads counter
> + * is not decremented.
> + */
> +
Spurious new line here?
> +int tc_setup_cb_add(struct tcf_block *block, struct tcf_proto *tp,
> + enum tc_setup_type type, void *type_data, bool err_stop,
> + u32 *flags, unsigned int *in_hw_count, bool rtnl_held)
> +{
> + int ok_count;
> +
> down_read(&block->cb_lock);
> /* Make sure all netdevs sharing this block are offload-capable. */
> if (block->nooffloaddevcnt && err_stop) {
> @@ -3014,22 +3098,67 @@ int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
> goto errout;
> }
>
> - list_for_each_entry(block_cb, &block->flow_block.cb_list, list) {
> - err = block_cb->cb(type, type_data, block_cb->cb_priv);
> - if (err) {
> - if (err_stop) {
> - ok_count = err;
> - goto errout;
> - }
> - } else {
> - ok_count++;
> - }
> + ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
> + if (ok_count > 0)
> + tc_cls_offload_cnt_update(block, tp, in_hw_count, flags,
> + ok_count, true);
> +errout:
and the labels again
> + up_read(&block->cb_lock);
> + return ok_count;
> +}
> +EXPORT_SYMBOL(tc_setup_cb_add);
> +
> +/* Destructive filter replace. If filter that wasn't already in hardware is
> + * successfully offloaded, increment block offload counter. On failure,
> + * previously offloaded filter is considered to be destroyed and offload counter
> + * is decremented.
> + */
> +
spurious new line?
> +int tc_setup_cb_replace(struct tcf_block *block, struct tcf_proto *tp,
> + enum tc_setup_type type, void *type_data, bool err_stop,
> + u32 *old_flags, unsigned int *old_in_hw_count,
> + u32 *new_flags, unsigned int *new_in_hw_count,
> + bool rtnl_held)
> +{
> + int ok_count;
> +
> + down_read(&block->cb_lock);
> + /* Make sure all netdevs sharing this block are offload-capable. */
> + if (block->nooffloaddevcnt && err_stop) {
> + ok_count = -EOPNOTSUPP;
> + goto errout;
> }
> +
> + tc_cls_offload_cnt_reset(block, tp, old_in_hw_count, old_flags);
> +
> + ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
> + if (ok_count > 0)
> + tc_cls_offload_cnt_update(block, tp, new_in_hw_count, new_flags,
> + ok_count, true);
> errout:
> up_read(&block->cb_lock);
> return ok_count;
> }
> -EXPORT_SYMBOL(tc_setup_cb_call);
> +EXPORT_SYMBOL(tc_setup_cb_replace);
> +
> +/* Destroy filter and decrement block offload counter, if filter was previously
> + * offloaded.
> + */
> +
hm.. is this gap between comment and function it pertains to
intentional?
> +int tc_setup_cb_destroy(struct tcf_block *block, struct tcf_proto *tp,
> + enum tc_setup_type type, void *type_data, bool err_stop,
> + u32 *flags, unsigned int *in_hw_count, bool rtnl_held)
> +{
> + int ok_count;
> +
> + down_read(&block->cb_lock);
> + ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
> +
> + tc_cls_offload_cnt_reset(block, tp, in_hw_count, flags);
> + up_read(&block->cb_lock);
> + return ok_count;
> +}
> +EXPORT_SYMBOL(tc_setup_cb_destroy);
>
> int tc_setup_flow_action(struct flow_action *flow_action,
> const struct tcf_exts *exts)
> diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
> index 3f7a9c02b70c..7f304db7e697 100644
> --- a/net/sched/cls_bpf.c
> +++ b/net/sched/cls_bpf.c
> @@ -162,17 +162,21 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
> cls_bpf.name = obj->bpf_name;
> cls_bpf.exts_integrated = obj->exts_integrated;
>
> - if (oldprog)
> - tcf_block_offload_dec(block, &oldprog->gen_flags);
> + if (cls_bpf.oldprog)
why the change from oldprog to cls_bpf.oldprog?
> + err = tc_setup_cb_replace(block, tp, TC_SETUP_CLSBPF, &cls_bpf,
> + skip_sw, &oldprog->gen_flags,
> + &oldprog->in_hw_count,
> + &prog->gen_flags, &prog->in_hw_count,
> + true);
> + else
> + err = tc_setup_cb_add(block, tp, TC_SETUP_CLSBPF, &cls_bpf,
> + skip_sw, &prog->gen_flags,
> + &prog->in_hw_count, true);
>
> - err = tc_setup_cb_call(block, TC_SETUP_CLSBPF, &cls_bpf, skip_sw);
> if (prog) {
> if (err < 0) {
> cls_bpf_offload_cmd(tp, oldprog, prog, extack);
> return err;
> - } else if (err > 0) {
> - prog->in_hw_count = err;
> - tcf_block_offload_inc(block, &prog->gen_flags);
> }
> }
>
> @@ -230,7 +234,7 @@ static void cls_bpf_offload_update_stats(struct tcf_proto *tp,
> cls_bpf.name = prog->bpf_name;
> cls_bpf.exts_integrated = prog->exts_integrated;
>
> - tc_setup_cb_call(block, TC_SETUP_CLSBPF, &cls_bpf, false);
> + tc_setup_cb_call(block, TC_SETUP_CLSBPF, &cls_bpf, false, true);
> }
>
> static int cls_bpf_init(struct tcf_proto *tp)
> @@ -680,8 +684,8 @@ static int cls_bpf_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb
> continue;
> }
>
> - tc_cls_offload_cnt_update(block, &prog->in_hw_count,
> - &prog->gen_flags, add);
> + tc_cls_offload_cnt_update(block, tp, &prog->in_hw_count,
> + &prog->gen_flags, 1, add);
Since we're adding those higher level add/replace/destroy helpers,
would it also be possible to have a helper which takes care of
reoffload? tc_cls_offload_cnt_update() is kind of low level now, it'd
be cool to also hide it in the core.
> }
>
> return 0;
> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
> index 054123742e32..0001a933d48b 100644
> --- a/net/sched/cls_flower.c
> +++ b/net/sched/cls_flower.c
> @@ -419,10 +419,10 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f,
> cls_flower.command = FLOW_CLS_DESTROY;
> cls_flower.cookie = (unsigned long) f;
>
> - tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false);
> + tc_setup_cb_destroy(block, tp, TC_SETUP_CLSFLOWER, &cls_flower, false,
> + &f->flags, &f->in_hw_count, true);
> spin_lock(&tp->lock);
> list_del_init(&f->hw_list);
> - tcf_block_offload_dec(block, &f->flags);
> spin_unlock(&tp->lock);
>
> if (!rtnl_held)
> @@ -466,18 +466,15 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
> goto errout;
> }
>
> - err = tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, skip_sw);
> + err = tc_setup_cb_add(block, tp, TC_SETUP_CLSFLOWER, &cls_flower,
> + skip_sw, &f->flags, &f->in_hw_count, true);
> kfree(cls_flower.rule);
>
> if (err < 0) {
> fl_hw_destroy_filter(tp, f, true, NULL);
> goto errout;
> } else if (err > 0) {
> - f->in_hw_count = err;
> err = 0;
Why does the tc_setup_cb* API still return the positive values, the
callers should no longer care, right?
> - spin_lock(&tp->lock);
> - tcf_block_offload_inc(block, &f->flags);
> - spin_unlock(&tp->lock);
> }
>
> if (skip_sw && !(f->flags & TCA_CLS_FLAGS_IN_HW)) {
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox