* [PATCH v5 net-next 3/4] net: macb: Add phy-handle DT support
From: Brad Mouring @ 2018-03-13 21:32 UTC (permalink / raw)
To: Nicolas Ferre, Rob Herring, David S . Miller, Michael Grzeschik,
Andrew Lunn
Cc: Mark Rutland, netdev, Julia Cartwright, devicetree, Brad Mouring
In-Reply-To: <20180313213216.109153-1-brad.mouring@ni.com>
This optional binding (as described in the ethernet DT bindings doc)
directs the netdev to the phydev to use. This is useful when a phy
chip contains more than one phy and two netdevs use the same chip
(i.e. the second mac's phy lives on the first mac's MDIO bus).
The devicetree snippet would look something like this:
	ethernet@feedf00d {
		...
		phy-handle = <&phy0>; // the first netdev is physically wired to phy0
		...
		phy0: phy@0 {
			...
			reg = <0x0>; // MDIO address 0
			...
		};
		phy1: phy@1 {
			...
			reg = <0x1>; // MDIO address 1
			...
		};
		...
	};
	ethernet@deadbeef {
		...
		phy-handle = <&phy1>; // tells the driver to use phy1 on the
				      // first mac's mdio bus (it's wired thusly)
		...
	};
The work done to add the phy_node in the first place (dacdbb4dfc1a1:
"net: macb: add fixed-link node support") will consume the
device_node (if found).
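The order in which the patched macb_mii_probe() decides where its phy comes from (for a device that has a DT node) can be modelled in plain userspace C. This is only an illustrative sketch: the enum, its labels and macb_pick_phy_source() are hypothetical names that do not exist in macb_main.c, and the three booleans stand in for of_phy_is_fixed_link(), a successful of_parse_phandle() on "phy-handle", and a non-NULL phy_find_first().

```c
#include <stdbool.h>

/* Hypothetical outcome labels; these names are not from macb_main.c. */
enum phy_source {
	PHY_FROM_FIXED_LINK,	/* the MAC node itself describes a fixed link */
	PHY_FROM_PHY_HANDLE,	/* "phy-handle" pointed at a phy node */
	PHY_FROM_DT_CHILD,	/* DT MDIO registration already found a phy */
	PHY_FROM_BUS_SCAN,	/* nothing found yet: scan every MDIO address */
};

/*
 * Mirrors the order of the checks in the diff below:
 * fixed-link first, then phy-handle, and the address scan only when
 * neither a phy-handle nor an already registered phy exists.
 */
static enum phy_source macb_pick_phy_source(bool is_fixed_link,
					    bool has_phy_handle,
					    bool bus_has_phy)
{
	if (is_fixed_link)
		return PHY_FROM_FIXED_LINK;
	if (has_phy_handle)
		return PHY_FROM_PHY_HANDLE;
	if (bus_has_phy)
		return PHY_FROM_DT_CHILD;
	return PHY_FROM_BUS_SCAN;
}
```

For the ethernet@deadbeef node in the snippet above, the sketch would be called with has_phy_handle set and bus_has_phy clear, so the address scan is skipped.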
Signed-off-by: Brad Mouring <brad.mouring@ni.com>
---
drivers/net/ethernet/cadence/macb_main.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index db1dc301bed3..d09bd43680b3 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -488,10 +488,12 @@ static int macb_mii_probe(struct net_device *dev)
}
bp->phy_node = of_node_get(np);
} else {
- /* fallback to standard phy registration if no phy were
- * found during dt phy registration
+ bp->phy_node = of_parse_phandle(np, "phy-handle", 0);
+ /* fallback to standard phy registration if no
+ * phy-handle was found nor any phy found during
+ * dt phy registration
*/
- if (!phy_find_first(bp->mii_bus)) {
+ if (!bp->phy_node && !phy_find_first(bp->mii_bus)) {
for (i = 0; i < PHY_MAX_ADDR; i++) {
struct phy_device *phydev;
--
2.16.2
* [PATCH v5 net-next 1/4] net: macb: Reorganize macb_mii bringup
From: Brad Mouring @ 2018-03-13 21:32 UTC (permalink / raw)
To: Nicolas Ferre, Rob Herring, David S . Miller, Michael Grzeschik,
Andrew Lunn
Cc: Mark Rutland, netdev, Julia Cartwright, devicetree, Brad Mouring
In-Reply-To: <20180313213216.109153-1-brad.mouring@ni.com>
The macb mii setup (mii_probe() and mii_init()) previously was
somewhat interspersed, likely a result of organic growth and hacking.
This change moves mii bus registration into mii_init and probing the
bus for devices into mii_probe.
Signed-off-by: Brad Mouring <brad.mouring@ni.com>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
---
drivers/net/ethernet/cadence/macb_main.c | 79 +++++++++++++++++---------------
1 file changed, 41 insertions(+), 38 deletions(-)
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index e84afcf1ecb5..4b514c705e57 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -472,8 +472,42 @@ static int macb_mii_probe(struct net_device *dev)
struct macb *bp = netdev_priv(dev);
struct macb_platform_data *pdata;
struct phy_device *phydev;
- int phy_irq;
- int ret;
+ struct device_node *np;
+ int phy_irq, ret, i;
+
+ pdata = dev_get_platdata(&bp->pdev->dev);
+ np = bp->pdev->dev.of_node;
+ ret = 0;
+
+ if (np) {
+ if (of_phy_is_fixed_link(np)) {
+ if (of_phy_register_fixed_link(np) < 0) {
+ dev_err(&bp->pdev->dev,
+ "broken fixed-link specification\n");
+ return -ENODEV;
+ }
+ bp->phy_node = of_node_get(np);
+ } else {
+ /* fallback to standard phy registration if no phy were
+ * found during dt phy registration
+ */
+ if (!phy_find_first(bp->mii_bus)) {
+ for (i = 0; i < PHY_MAX_ADDR; i++) {
+ struct phy_device *phydev;
+
+ phydev = mdiobus_scan(bp->mii_bus, i);
+ if (IS_ERR(phydev) &&
+ PTR_ERR(phydev) != -ENODEV) {
+ ret = PTR_ERR(phydev);
+ break;
+ }
+ }
+
+ if (ret)
+ return -ENODEV;
+ }
+ }
+ }
if (bp->phy_node) {
phydev = of_phy_connect(dev, bp->phy_node,
@@ -488,7 +522,6 @@ static int macb_mii_probe(struct net_device *dev)
return -ENXIO;
}
- pdata = dev_get_platdata(&bp->pdev->dev);
if (pdata) {
if (gpio_is_valid(pdata->phy_irq_pin)) {
ret = devm_gpio_request(&bp->pdev->dev,
@@ -533,7 +566,7 @@ static int macb_mii_init(struct macb *bp)
{
struct macb_platform_data *pdata;
struct device_node *np;
- int err = -ENXIO, i;
+ int err, i;
/* Enable management port */
macb_writel(bp, NCR, MACB_BIT(MPE));
@@ -556,39 +589,9 @@ static int macb_mii_init(struct macb *bp)
dev_set_drvdata(&bp->dev->dev, bp->mii_bus);
np = bp->pdev->dev.of_node;
- if (np) {
- if (of_phy_is_fixed_link(np)) {
- if (of_phy_register_fixed_link(np) < 0) {
- dev_err(&bp->pdev->dev,
- "broken fixed-link specification\n");
- goto err_out_unregister_bus;
- }
- bp->phy_node = of_node_get(np);
-
- err = mdiobus_register(bp->mii_bus);
- } else {
- /* try dt phy registration */
- err = of_mdiobus_register(bp->mii_bus, np);
-
- /* fallback to standard phy registration if no phy were
- * found during dt phy registration
- */
- if (!err && !phy_find_first(bp->mii_bus)) {
- for (i = 0; i < PHY_MAX_ADDR; i++) {
- struct phy_device *phydev;
- phydev = mdiobus_scan(bp->mii_bus, i);
- if (IS_ERR(phydev) &&
- PTR_ERR(phydev) != -ENODEV) {
- err = PTR_ERR(phydev);
- break;
- }
- }
-
- if (err)
- goto err_out_unregister_bus;
- }
- }
+ if (np) {
+ err = of_mdiobus_register(bp->mii_bus, np);
} else {
for (i = 0; i < PHY_MAX_ADDR; i++)
bp->mii_bus->irq[i] = PHY_POLL;
@@ -610,10 +613,10 @@ static int macb_mii_init(struct macb *bp)
err_out_unregister_bus:
mdiobus_unregister(bp->mii_bus);
-err_out_free_mdiobus:
- of_node_put(bp->phy_node);
if (np && of_phy_is_fixed_link(np))
of_phy_deregister_fixed_link(np);
+err_out_free_mdiobus:
+ of_node_put(bp->phy_node);
mdiobus_free(bp->mii_bus);
err_out:
return err;
--
2.16.2
* [PATCH v5 net-next 0/4] net: macb: Introduce phy-handle DT functionality
From: Brad Mouring @ 2018-03-13 21:32 UTC (permalink / raw)
To: Nicolas Ferre, Rob Herring, David S . Miller, Michael Grzeschik,
Andrew Lunn
Cc: Mark Rutland, netdev, Julia Cartwright, devicetree
In-Reply-To: <88aa8148-bf57-86a0-8f0d-fae7191865a6@gmail.com>
Consider the situation where a macb netdev is connected through
a phydev that sits on a mii bus other than the one provided to
this particular netdev. This patchset aims to support that
situation through the existing phy-handle optional binding.
This optional binding (as described in the ethernet DT bindings doc)
directs the netdev to the phydev to use, so it makes sense to
introduce the functionality to this driver (where the physical
layout discussed was encountered).
The devicetree snippet would look something like this:
	...
	ethernet@feedf00d {
		...
		phy-handle = <&phy0>; // the first netdev is physically wired to phy0
		...
		phy0: phy@0 {
			...
			reg = <0x0>; // MDIO address 0
			...
		};
		phy1: phy@1 {
			...
			reg = <0x1>; // MDIO address 1
			...
		};
		...
	};
	ethernet@deadbeef {
		...
		phy-handle = <&phy1>; // tells the driver to use phy1 on the
				      // first mac's mdio bus (it's wired thusly)
		...
	};
	...
The work done to add the phy_node in the first place (dacdbb4dfc1a1:
"net: macb: add fixed-link node support") will consume the
device_node (if found).
v2: Reorganization of mii probe/init functions, suggested by Andrew Lunn
v3: Moved some of the bus init code back into init (erroneously moved to probe);
addressed some style issues and an uninitialized variable warning.
v4: Add Reviewed-by: tags
Skip fallback code if phy-handle phandle is found
v5: Cleanup formatting issues
Fix compile failure introduced in 1/4 "net: macb: Reorganize macb_mii
bringup"
Fix typo in "Documentation: macb: Document phy-handle binding"
* [PATCH v5 net-next 4/4] Documentation: macb: Document phy-handle binding
From: Brad Mouring @ 2018-03-13 21:32 UTC (permalink / raw)
To: Nicolas Ferre, Rob Herring, David S . Miller, Michael Grzeschik,
Andrew Lunn
Cc: Mark Rutland, netdev, Julia Cartwright, devicetree, Brad Mouring
In-Reply-To: <20180313213216.109153-1-brad.mouring@ni.com>
Document the existence of the optional binding, directing to the
general ethernet document that describes this binding.
Signed-off-by: Brad Mouring <brad.mouring@ni.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
---
Documentation/devicetree/bindings/net/macb.txt | 1 +
1 file changed, 1 insertion(+)
diff --git a/Documentation/devicetree/bindings/net/macb.txt b/Documentation/devicetree/bindings/net/macb.txt
index 27966ae741e0..457d5ae16f23 100644
--- a/Documentation/devicetree/bindings/net/macb.txt
+++ b/Documentation/devicetree/bindings/net/macb.txt
@@ -29,6 +29,7 @@ Optional properties for PHY child node:
- reset-gpios : Should specify the gpio for phy reset
- magic-packet : If present, indicates that the hardware supports waking
up via magic packet.
+- phy-handle : see ethernet.txt file in the same directory
Examples:
--
2.16.2
* [PATCH v5 net-next 2/4] net: macb: Remove redundant poll irq assignment
From: Brad Mouring @ 2018-03-13 21:32 UTC (permalink / raw)
To: Nicolas Ferre, Rob Herring, David S . Miller, Michael Grzeschik,
Andrew Lunn
Cc: Mark Rutland, netdev, Julia Cartwright, devicetree, Brad Mouring
In-Reply-To: <20180313213216.109153-1-brad.mouring@ni.com>
In phy_device's general probe, this device will already be set for
phy register polling, rendering this code redundant.
Signed-off-by: Brad Mouring <brad.mouring@ni.com>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
---
drivers/net/ethernet/cadence/macb_main.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 4b514c705e57..db1dc301bed3 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -566,7 +566,7 @@ static int macb_mii_init(struct macb *bp)
{
struct macb_platform_data *pdata;
struct device_node *np;
- int err, i;
+ int err;
/* Enable management port */
macb_writel(bp, NCR, MACB_BIT(MPE));
@@ -593,9 +593,6 @@ static int macb_mii_init(struct macb *bp)
if (np) {
err = of_mdiobus_register(bp->mii_bus, np);
} else {
- for (i = 0; i < PHY_MAX_ADDR; i++)
- bp->mii_bus->irq[i] = PHY_POLL;
-
if (pdata)
bp->mii_bus->phy_mask = pdata->phy_mask;
--
2.16.2
* Re: [PATCH bpf-next v4 1/2] bpf: extend stackmap to save binary_build_id+offset instead of address
From: Song Liu @ 2018-03-13 21:31 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: netdev@vger.kernel.org, ast@kernel.org, Peter Zijlstra,
Daniel Borkmann, Kernel Team, hannes@cmpxchg.org, Teng Qin
In-Reply-To: <C1507E62-2E3B-41A7-9C17-36CF86D8674B@fb.com>
> On Mar 12, 2018, at 3:47 PM, Song Liu <songliubraving@fb.com> wrote:
>
>
>
>> On Mar 12, 2018, at 2:31 PM, Alexei Starovoitov <ast@fb.com> wrote:
>>
>> On 3/12/18 2:12 PM, Song Liu wrote:
>>>
>>>> On Mar 12, 2018, at 2:00 PM, Alexei Starovoitov <ast@fb.com> wrote:
>>>>
>>>> On 3/12/18 1:39 PM, Song Liu wrote:
>>>>> + page = find_get_page(vma->vm_file->f_mapping, 0);
>>>>
>>>> did you test it with config_debug_atomic_sleep ?
>>>> it should have complained...
>>>
>>> Yeah, I have CONFIG_DEBUG_ATOMIC_SLEEP=y.
>>>
>>> I think find_get_page() will not sleep. The variation find_get_page_flags()
>>> may sleep with flag FGP_CREAT.
>>
>> I see. gfp_mask == 0 and no locks. should work indeed.
>> curious how perf report looks like for heavy bpf_get_stackid() usage?
>
> I modified samples/bpf/sampleip to only call bpf_get_stackid(). The following
> is captured with bpf_get_stackid() called at 10k Hz. stressapptest is running
> with 16 threads on a system with 56 cores.
>
>
> Samples: 1M of event 'cycles:pp', Event count (approx.): 628092326243
> Overhead Command Shared Object Symbol
> + 51.61% stressapptest stressapptest [.] AdlerMemcpyC
> - 20.82% stressapptest [kernel.vmlinux] [k] queued_spin_lock_slowpath
> - queued_spin_lock_slowpath
> - 20.80% pcpu_freelist_pop
> bpf_get_stackid
> bpf_get_stackid_tp
> - 0x590c
> 16.12% AdlerMemcpyC
> 4.50% OsLayer::CpuStressWorkload
> + 14.36% stressapptest stressapptest [.] OsLayer::CpuStressWorkload
> - 8.74% stressapptest [kernel.vmlinux] [k] _raw_spin_lock
> - _raw_spin_lock
> - 8.73% bpf_get_stackid
> bpf_get_stackid_tp
> + 0x590c
> - 0.67% stressapptest [kernel.vmlinux] [k] pcpu_freelist_pop
> - pcpu_freelist_pop
> - 0.67% bpf_get_stackid
> bpf_get_stackid_tp
> + 0x590c
>
> Seems lock contention is the dominating overhead here. This should be the same
> for original stackmap.
>
> Song
Samples: 172K of event 'cycles:pp', Event count (approx.): 102311012653
Overhead Command Shared Object Symbol
78.84% stressapptest stressapptest [.] AdlerMemcpyC
8.78% stressapptest stressapptest [.] OsLayer::CpuStressWorkload
3.14% stressapptest [kernel.vmlinux] [k] _raw_spin_lock
2.56% stressapptest stressapptest [.] WorkerThread::FillPage
0.45% stressapptest [kernel.vmlinux] [k] perf_callchain_user
0.37% stressapptest [kernel.vmlinux] [.] native_irq_return_iret
0.31% stressapptest [kernel.vmlinux] [k] clear_page_erms
0.29% stressapptest [kernel.vmlinux] [k] pcpu_freelist_pop
0.27% stressapptest stressapptest [.] CalculateAdlerChecksum
0.25% stressapptest [kernel.vmlinux] [k] bpf_get_stackid
0.22% swapper [kernel.vmlinux] [k] poll_idle
This perf output is taken with stressapptest running on 4 cores.
bpf_get_stackid() and pcpu_freelist_pop() combined account for about 0.54% of CPU.
Song
* [pci PATCH v6 5/5] pci-pf-stub: Add PF driver stub for PFs that function only to enable VFs
From: Alexander Duyck @ 2018-03-13 21:31 UTC (permalink / raw)
To: bhelgaas, alexander.h.duyck, linux-pci
Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
keith.busch, netanel, ddutile, mheyne, liang-min.wang,
mark.d.rustad, dwmw2, hch, dwmw
In-Reply-To: <20180313212508.3553.65326.stgit@localhost.localdomain>
From: Alexander Duyck <alexander.h.duyck@intel.com>
Add a new driver called "pci-pf-stub" to act as a "white-list" for PF
devices that provide no functionality other than acting as a means of
allocating a set of VFs. For now I only have one example ID, provided by
Amazon, of a device that requires this functionality. The general
idea is that in the future we will see other devices added as vendors come
up with devices where the PF is more or less just a lightweight shim used
to allocate VFs.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
v6: New driver to address concerns about Amazon devices left unsupported
drivers/pci/Kconfig | 12 +++++++++
drivers/pci/Makefile | 2 ++
drivers/pci/pci-pf-stub.c | 60 +++++++++++++++++++++++++++++++++++++++++++++
include/linux/pci_ids.h | 2 ++
4 files changed, 76 insertions(+)
create mode 100644 drivers/pci/pci-pf-stub.c
diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 34b56a8f8480..cdef2a2a9bc5 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -71,6 +71,18 @@ config PCI_STUB
When in doubt, say N.
+config PCI_PF_STUB
+ tristate "PCI PF Stub driver"
+ depends on PCI
+ depends on PCI_IOV
+ help
+ Say Y or M here if you want to enable support for devices that
+ require SR-IOV support, while at the same time the PF itself is
+ not providing any actual services on the host itself such as
+ storage or networking.
+
+ When in doubt, say N.
+
config XEN_PCIDEV_FRONTEND
tristate "Xen PCI Frontend"
depends on PCI && X86 && XEN
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 941970936840..4e133d3df403 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -43,6 +43,8 @@ obj-$(CONFIG_PCI_SYSCALL) += syscall.o
obj-$(CONFIG_PCI_STUB) += pci-stub.o
+obj-$(CONFIG_PCI_PF_STUB) += pci-pf-stub.o
+
obj-$(CONFIG_PCI_ECAM) += ecam.o
obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
diff --git a/drivers/pci/pci-pf-stub.c b/drivers/pci/pci-pf-stub.c
new file mode 100644
index 000000000000..d218924d9bdb
--- /dev/null
+++ b/drivers/pci/pci-pf-stub.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+/* pci-pf-stub - simple stub driver for PCI SR-IOV PF device
+ *
+ * This driver is meant to act as a "white-list" for devices that provide
+ * SR-IOV functionality while at the same time not actually needing a
+ * driver of their own.
+ */
+
+#include <linux/module.h>
+#include <linux/pci.h>
+
+/**
+ * pci_pf_stub_white_list - White list of devices to bind pci-pf-stub onto
+ *
+ * This table provides the list of IDs this driver is supposed to bind
+ * onto. You could think of this as a list of "quirked" devices where we
+ * are adding support for SR-IOV here since there are no other drivers
+ * that they would be running under.
+ *
+ * Layout of the table below is as follows:
+ * { Vendor ID, Device ID,
+ * SubVendor ID, SubDevice ID,
+ * Class, Class Mask,
+ * private data (not used) }
+ */
+static const struct pci_device_id pci_pf_stub_white_list[] = {
+ { PCI_VDEVICE(AMAZON, 0x0053) },
+ /* required last entry */
+ { 0 }
+};
+MODULE_DEVICE_TABLE(pci, pci_pf_stub_white_list);
+
+static int pci_pf_stub_probe(struct pci_dev *dev,
+ const struct pci_device_id *id)
+{
+ pci_info(dev, "claimed by pci-pf-stub\n");
+ return 0;
+}
+
+static struct pci_driver pf_stub_driver = {
+ .name = "pci-pf-stub",
+ .id_table = pci_pf_stub_white_list,
+ .probe = pci_pf_stub_probe,
+ .sriov_configure = pci_sriov_configure_simple,
+};
+
+static int __init pci_pf_stub_init(void)
+{
+ return pci_register_driver(&pf_stub_driver);
+}
+
+static void __exit pci_pf_stub_exit(void)
+{
+ pci_unregister_driver(&pf_stub_driver);
+}
+
+module_init(pci_pf_stub_init);
+module_exit(pci_pf_stub_exit);
+
+MODULE_LICENSE("GPL");
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index a6b30667a331..b10621896017 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2548,6 +2548,8 @@
#define PCI_VENDOR_ID_CIRCUITCO 0x1cc8
#define PCI_SUBSYSTEM_ID_CIRCUITCO_MINNOWBOARD 0x0001
+#define PCI_VENDOR_ID_AMAZON 0x1d0f
+
#define PCI_VENDOR_ID_TEKRAM 0x1de1
#define PCI_DEVICE_ID_TEKRAM_DC290 0xdc29
* [pci PATCH v6 4/5] nvme: Migrate over to unmanaged SR-IOV support
From: Alexander Duyck @ 2018-03-13 21:30 UTC (permalink / raw)
To: bhelgaas, alexander.h.duyck, linux-pci
Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
keith.busch, netanel, ddutile, mheyne, liang-min.wang,
mark.d.rustad, dwmw2, hch, dwmw
In-Reply-To: <20180313212508.3553.65326.stgit@localhost.localdomain>
From: Alexander Duyck <alexander.h.duyck@intel.com>
Instead of implementing our own version of a SR-IOV configuration stub in
the nvme driver we can just reuse the existing
pci_sriov_configure_simple function.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
v5: Replaced call to pci_sriov_configure_unmanaged with
pci_sriov_configure_simple
v6: Dropped "#ifdef" checks for IOV wrapping sriov_configure definition
drivers/nvme/host/pci.c | 20 +-------------------
1 file changed, 1 insertion(+), 19 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5933a5c732e8..5e963058882a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2580,24 +2580,6 @@ static void nvme_remove(struct pci_dev *pdev)
nvme_put_ctrl(&dev->ctrl);
}
-static int nvme_pci_sriov_configure(struct pci_dev *pdev, int numvfs)
-{
- int ret = 0;
-
- if (numvfs == 0) {
- if (pci_vfs_assigned(pdev)) {
- dev_warn(&pdev->dev,
- "Cannot disable SR-IOV VFs while assigned\n");
- return -EPERM;
- }
- pci_disable_sriov(pdev);
- return 0;
- }
-
- ret = pci_enable_sriov(pdev, numvfs);
- return ret ? ret : numvfs;
-}
-
#ifdef CONFIG_PM_SLEEP
static int nvme_suspend(struct device *dev)
{
@@ -2716,7 +2698,7 @@ static void nvme_error_resume(struct pci_dev *pdev)
.driver = {
.pm = &nvme_dev_pm_ops,
},
- .sriov_configure = nvme_pci_sriov_configure,
+ .sriov_configure = pci_sriov_configure_simple,
.err_handler = &nvme_err_handler,
};
* [pci PATCH v6 3/5] ena: Migrate over to unmanaged SR-IOV support
From: Alexander Duyck @ 2018-03-13 21:30 UTC (permalink / raw)
To: bhelgaas, alexander.h.duyck, linux-pci
Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
keith.busch, netanel, ddutile, mheyne, liang-min.wang,
mark.d.rustad, dwmw2, hch, dwmw
In-Reply-To: <20180313212508.3553.65326.stgit@localhost.localdomain>
From: Alexander Duyck <alexander.h.duyck@intel.com>
Instead of implementing our own version of a SR-IOV configuration stub in
the ena driver we can just reuse the existing
pci_sriov_configure_simple function.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
v5: Replaced call to pci_sriov_configure_unmanaged with
pci_sriov_configure_simple
v6: Dropped "#ifdef" checks for IOV wrapping sriov_configure definition
drivers/net/ethernet/amazon/ena/ena_netdev.c | 28 +-------------------------
1 file changed, 1 insertion(+), 27 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 6975150d144e..6054deb1e6aa 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3385,32 +3385,6 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
}
/*****************************************************************************/
-static int ena_sriov_configure(struct pci_dev *dev, int numvfs)
-{
- int rc;
-
- if (numvfs > 0) {
- rc = pci_enable_sriov(dev, numvfs);
- if (rc != 0) {
- dev_err(&dev->dev,
- "pci_enable_sriov failed to enable: %d vfs with the error: %d\n",
- numvfs, rc);
- return rc;
- }
-
- return numvfs;
- }
-
- if (numvfs == 0) {
- pci_disable_sriov(dev);
- return 0;
- }
-
- return -EINVAL;
-}
-
-/*****************************************************************************/
-/*****************************************************************************/
/* ena_remove - Device Removal Routine
* @pdev: PCI device information struct
@@ -3525,7 +3499,7 @@ static int ena_resume(struct pci_dev *pdev)
.suspend = ena_suspend,
.resume = ena_resume,
#endif
- .sriov_configure = ena_sriov_configure,
+ .sriov_configure = pci_sriov_configure_simple,
};
static int __init ena_init(void)
* [pci PATCH v6 2/5] virtio_pci: Add support for unmanaged SR-IOV on virtio_pci devices
From: Alexander Duyck @ 2018-03-13 21:29 UTC (permalink / raw)
To: bhelgaas, alexander.h.duyck, linux-pci
Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
keith.busch, netanel, ddutile, mheyne, liang-min.wang,
mark.d.rustad, dwmw2, hch, dwmw
In-Reply-To: <20180313212508.3553.65326.stgit@localhost.localdomain>
From: Alexander Duyck <alexander.h.duyck@intel.com>
Hardware-realized virtio_pci devices can implement SR-IOV, so this
patch enables its use. The device in question is an upcoming Intel
NIC that implements both a virtio_net PF and virtio_net VFs. These
are hardware realizations of what has up to now been a software
interface.
The device in question has the following 4-part PCI IDs:
PF: vendor: 1af4 device: 1041 subvendor: 8086 subdevice: 15fe
VF: vendor: 1af4 device: 1041 subvendor: 8086 subdevice: 05fe
The patch currently needs no check for device ID, because the callback
will never be made for devices that do not assert the capability or
when run on a platform incapable of SR-IOV.
One reason for this patch is that the hardware requires the
vendor ID of a VF to be the same as the vendor ID of the PF that
created it. So it seemed logical to simply have a fully-functioning
virtio_net PF create the VFs. This patch makes that possible.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
v4: Dropped call to pci_disable_sriov in virtio_pci_remove function
v5: Replaced call to pci_sriov_configure_unmanaged with
pci_sriov_configure_simple
v6: Dropped "#ifdef" checks for IOV wrapping sriov_configure definition
drivers/virtio/virtio_pci_common.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index 48d4d1cf1cb6..67a227fd7aa0 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -596,6 +596,7 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
#ifdef CONFIG_PM_SLEEP
.driver.pm = &virtio_pci_pm_ops,
#endif
+ .sriov_configure = pci_sriov_configure_simple,
};
module_pci_driver(virtio_pci_driver);
* [pci PATCH v6 1/5] pci: Add pci_sriov_configure_simple for PFs that don't manage VF resources
From: Alexander Duyck @ 2018-03-13 21:28 UTC (permalink / raw)
To: bhelgaas, alexander.h.duyck, linux-pci
Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
keith.busch, netanel, ddutile, mheyne, liang-min.wang,
mark.d.rustad, dwmw2, hch, dwmw
In-Reply-To: <20180313212508.3553.65326.stgit@localhost.localdomain>
From: Alexander Duyck <alexander.h.duyck@intel.com>
This patch adds a common configuration function called
pci_sriov_configure_simple that will allow for managing VFs on devices
where the PF is not capable of managing VF resources.
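The return-value contract of the helper (the number of VFs on a successful enable, 0 on disable, a negative errno otherwise) can be exercised with a small userspace model. This is a sketch under stated assumptions: struct fake_pf and fake_sriov_configure_simple() are hypothetical stand-ins, the three fields replace dev->is_physfn, pci_vfs_assigned() and dev->sriov->num_VFs, and sriov_enable() is assumed to always succeed.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few pieces of PF state the helper reads. */
struct fake_pf {
	bool is_physfn;		/* models dev->is_physfn */
	bool vfs_assigned;	/* models pci_vfs_assigned(dev) != 0 */
	int num_vfs;		/* models dev->sriov->num_VFs */
};

/* Same branch order as the helper added to drivers/pci/iov.c below. */
static int fake_sriov_configure_simple(struct fake_pf *pf, int nr_virtfn)
{
	int err = -EINVAL;

	if (!pf->is_physfn)
		return -ENODEV;

	if (pf->vfs_assigned) {
		/* VFs handed to guests must not change underneath them. */
		err = -EPERM;
	} else if (!nr_virtfn) {
		pf->num_vfs = 0;		/* models sriov_disable() */
		err = 0;
	} else if (!pf->num_vfs) {
		pf->num_vfs = nr_virtfn;	/* assumes sriov_enable() succeeds */
		err = 0;
	}
	/* Enabling while VFs already exist falls through with -EINVAL. */

	return err ? err : nr_virtfn;
}
```

In this model, changing the VF count on a PF that already has VFs enabled requires writing 0 first, because a non-zero request with num_vfs set is rejected with -EINVAL.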
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
v5: New patch replacing pci_sriov_configure_unmanaged with
pci_sriov_configure_simple
Dropped bits related to autoprobe changes
v6: Defined pci_sriov_configure_simple as NULL if IOV is disabled
drivers/pci/iov.c | 32 ++++++++++++++++++++++++++++++++
include/linux/pci.h | 3 +++
2 files changed, 35 insertions(+)
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 677924ae0350..bd7021491fdb 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -807,3 +807,35 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev)
return dev->sriov->total_VFs;
}
EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs);
+
+/**
+ * pci_sriov_configure_simple - helper to configure unmanaged SR-IOV
+ * @dev: the PCI device
+ * @nr_virtfn: number of virtual functions to enable, 0 to disable
+ *
+ * Used to provide generic enable/disable SR-IOV option for devices
+ * that do not manage the VFs generated by their driver
+ */
+int pci_sriov_configure_simple(struct pci_dev *dev, int nr_virtfn)
+{
+ int err = -EINVAL;
+
+ might_sleep();
+
+ if (!dev->is_physfn)
+ return -ENODEV;
+
+ if (pci_vfs_assigned(dev)) {
+ pci_warn(dev,
+ "Cannot modify SR-IOV while VFs are assigned\n");
+ err = -EPERM;
+ } else if (!nr_virtfn) {
+ sriov_disable(dev);
+ err = 0;
+ } else if (!dev->sriov->num_VFs) {
+ err = sriov_enable(dev, nr_virtfn);
+ }
+
+ return err ? err : nr_virtfn;
+}
+EXPORT_SYMBOL_GPL(pci_sriov_configure_simple);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 024a1beda008..f3099e940cda 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1953,6 +1953,7 @@ static inline void pci_mmcfg_late_init(void) { }
int pci_vfs_assigned(struct pci_dev *dev);
int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
int pci_sriov_get_totalvfs(struct pci_dev *dev);
+int pci_sriov_configure_simple(struct pci_dev *dev, int nr_virtfn);
resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
void pci_vf_drivers_autoprobe(struct pci_dev *dev, bool probe);
#else
@@ -1980,6 +1981,8 @@ static inline int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs)
{ return 0; }
static inline int pci_sriov_get_totalvfs(struct pci_dev *dev)
{ return 0; }
+/* since this is expected to be used as a function pointer just define as NULL */
+#define pci_sriov_configure_simple NULL
static inline resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
{ return 0; }
static inline void pci_vf_drivers_autoprobe(struct pci_dev *dev, bool probe) { }
* [pci PATCH v6 0/5] Add support for unmanaged SR-IOV
From: Alexander Duyck @ 2018-03-13 21:27 UTC (permalink / raw)
To: bhelgaas, alexander.h.duyck, linux-pci
Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
keith.busch, netanel, ddutile, mheyne, liang-min.wang,
mark.d.rustad, dwmw2, hch, dwmw
This series is meant to add support for SR-IOV on devices when the VFs are
not managed by the kernel. Examples of recent patches attempting to do this
include:
virtio - https://patchwork.kernel.org/patch/10241225/
pci-stub - https://patchwork.kernel.org/patch/10109935/
vfio - https://patchwork.kernel.org/patch/10103353/
uio - https://patchwork.kernel.org/patch/9974031/
Since this is quickly blowing up into a multi-driver problem it is probably
best to implement this solution as generically as possible.
This series is an attempt to do that. What we do with this patch set is
provide a generic framework to enable SR-IOV in the case that the PF driver
doesn't support managing the VFs itself.
I based my patch set originally on the patch by Mark Rustad but there isn't
much left after going through and cleaning out the bits that were no longer
needed, and after incorporating the feedback from David Miller. At this point
the only item to be fully reused is his patch description, which is now
present in patch 2 of the set.
This solution is limited in scope to just adding support for devices that
provide no functionality for SR-IOV other than allocating the VFs by
calling pci_enable_sriov. Previous sets had included patches for VFIO, but
for now I am dropping that as the scope of that work is larger than I
think I can take on at this time.
v2: Reduced scope back to just virtio_pci and vfio-pci
Broke into 3 patch set from single patch
Changed autoprobe behavior to always set when num_vfs is set non-zero
v3: Updated Documentation to clarify when sriov_unmanaged_autoprobe is used
Wrapped vfio_pci_sriov_configure to fix build errors w/o SR-IOV in kernel
v4: Dropped vfio-pci patch
Added ena and nvme to drivers now using pci_sriov_configure_unmanaged
Dropped pci_disable_sriov call in virtio_pci to be consistent with ena
v5: Dropped sriov_unmanaged_autoprobe and pci_sriov_configure_unmanaged
Added new patch that enables pci_sriov_configure_simple
Updated drivers to use pci_sriov_configure_simple
v6: Defined pci_sriov_configure_simple as NULL when SR-IOV is not enabled
Updated drivers to drop "#ifdef" checks for IOV
Added pci-pf-stub as place for PF-only drivers to add support
Cc: Mark Rustad <mark.d.rustad@intel.com>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Liang-Min Wang <liang-min.wang@intel.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
---
Alexander Duyck (5):
pci: Add pci_sriov_configure_simple for PFs that don't manage VF resources
virtio_pci: Add support for unmanaged SR-IOV on virtio_pci devices
ena: Migrate over to unmanaged SR-IOV support
nvme: Migrate over to unmanaged SR-IOV support
pci-pf-stub: Add PF driver stub for PFs that function only to enable VFs
drivers/net/ethernet/amazon/ena/ena_netdev.c | 28 ------------
drivers/nvme/host/pci.c | 20 ---------
drivers/pci/Kconfig | 12 +++++
drivers/pci/Makefile | 2 +
drivers/pci/iov.c | 32 ++++++++++++++
drivers/pci/pci-pf-stub.c | 60 ++++++++++++++++++++++++++
drivers/virtio/virtio_pci_common.c | 1
include/linux/pci.h | 3 +
include/linux/pci_ids.h | 2 +
9 files changed, 114 insertions(+), 46 deletions(-)
create mode 100644 drivers/pci/pci-pf-stub.c
--
* [PATCH iproute2 1/1] tc: use get_u32() in psample action to match types
From: Roman Mashak @ 2018-03-13 21:16 UTC (permalink / raw)
To: stephen; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri, Roman Mashak
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
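The patch carries no description beyond its subject, so as a hedged illustration of what "matching types" means for a 32-bit attribute, here is a minimal width-exact decimal parser. parse_u32_demo() is a hypothetical name; this is not iproute2's get_u32() implementation, only a sketch of a parser whose output type is exactly uint32_t and which rejects anything that does not fit.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of iproute2: parse arg into a uint32_t.
 * Returns 0 on success and -1 on empty input, a leading minus sign,
 * trailing garbage, or a value that does not fit in 32 bits.
 */
static int parse_u32_demo(uint32_t *val, const char *arg, int base)
{
	unsigned long res;
	char *end;

	if (!arg || !*arg || *arg == '-')
		return -1;

	errno = 0;
	res = strtoul(arg, &end, base);

	/* No digits consumed, or characters left over after the number. */
	if (end == arg || *end != '\0')
		return -1;

	/* Overflowed unsigned long, or fits in long but not in 32 bits. */
	if (errno == ERANGE || res > UINT32_MAX)
		return -1;

	*val = (uint32_t)res;
	return 0;
}
```

A caller in the style of the diff below would pass the address of a 32-bit variable, the argument string and base 10, and treat any non-zero return as an illegal value.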
---
tc/m_sample.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tc/m_sample.c b/tc/m_sample.c
index ff5ee6bd1ef6..dff986f59999 100644
--- a/tc/m_sample.c
+++ b/tc/m_sample.c
@@ -65,7 +65,7 @@ static int parse_sample(struct action_util *a, int *argc_p, char ***argv_p,
while (argc > 0) {
if (matches(*argv, "rate") == 0) {
NEXT_ARG();
- if (get_unsigned(&rate, *argv, 10) != 0) {
+ if (get_u32(&rate, *argv, 10) != 0) {
fprintf(stderr, "Illegal rate %s\n", *argv);
usage();
return -1;
@@ -73,7 +73,7 @@ static int parse_sample(struct action_util *a, int *argc_p, char ***argv_p,
rate_set = true;
} else if (matches(*argv, "group") == 0) {
NEXT_ARG();
- if (get_unsigned(&group, *argv, 10) != 0) {
+ if (get_u32(&group, *argv, 10) != 0) {
fprintf(stderr, "Illegal group num %s\n",
*argv);
usage();
@@ -82,7 +82,7 @@ static int parse_sample(struct action_util *a, int *argc_p, char ***argv_p,
group_set = true;
} else if (matches(*argv, "trunc") == 0) {
NEXT_ARG();
- if (get_unsigned(&trunc, *argv, 10) != 0) {
+ if (get_u32(&trunc, *argv, 10) != 0) {
fprintf(stderr, "Illegal truncation size %s\n",
*argv);
usage();
--
2.7.4
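The patch carries no description beyond its subject, so the following is only a sketch of why a 32-bit parser is the better match, assuming the parsed values are stored in 32-bit variables. demo_get_u32() is a hypothetical helper written for this example, not the iproute2 implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse arg into a uint32_t; 0 on success, -1 on error. */
static int demo_get_u32(uint32_t *val, const char *arg, int base)
{
	unsigned long res;
	char *end;

	assert(val != NULL);
	if (!arg || !*arg)
		return -1;

	errno = 0;
	res = strtoul(arg, &end, base);
	if (end == arg || *end != '\0')
		return -1;	/* no digits, or trailing junk */
	if (errno == ERANGE)
		return -1;	/* does not even fit in unsigned long */
	if (res > UINT32_MAX)
		return -1;	/* fits unsigned long but not 32 bits */

	*val = (uint32_t)res;
	return 0;
}
```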
^ permalink raw reply related
* Re: [Intel-wired-lan] [PATCH 12/15] ice: Add stats and ethtool support
From: Jesse Brandeburg @ 2018-03-13 21:14 UTC (permalink / raw)
To: Eric Dumazet, davem
Cc: Venkataramanan, Anirudh, kubakici@wp.pl, netdev@vger.kernel.org,
intel-wired-lan@lists.osuosl.org, jesse.brandeburg
In-Reply-To: <a6f3bd20-705e-2022-23c8-1e3b5a8c93a6@gmail.com>
On Tue, 13 Mar 2018 12:17:10 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Yes, this is a recurring mistake
>
> See commit
> bf909456f6a89654cb65c01fe83a4f9b133bf978 Revert "net: hns3: Add packet
> statistics of netdev"
Thanks for the pointer, that was a useful thread to review. I
understand the point that was made about not having the netdev stats
shown in ethtool -S. We definitely do provide per-queue stats as well
as these regular stats in ethtool -S.
I do remember from the past discussions that it *is* useless for the
driver to keep internally any stats that were already stored via the
get_stats NDO, and we missed it in the internal review that this driver
was doing that, so that will be fixed.
Maybe it's just that I've been doing this too long, but I regularly
(and many other customers/users do as well) depend on the ethtool stats
being atomically updated w.r.t. each other. This means that if I'm
getting the overall rx_packets, as well as the per-queue rx_packets, and I
read them all at once from the driver with ethtool, then I can check
that things are working as expected.
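The kind of check described above could look roughly like the sketch below. It assumes a single snapshot that holds both the total and the per-queue counters; the structure and function names are hypothetical and not taken from any driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of counters read in one pass. */
struct demo_stats_snapshot {
	uint64_t rx_packets;		/* total reported for the device */
	const uint64_t *rxq_packets;	/* one counter per RX queue */
	size_t num_queues;
};

/* Returns 1 if the per-queue counters add up to the total, 0 otherwise. */
static int demo_snapshot_consistent(const struct demo_stats_snapshot *s)
{
	uint64_t sum = 0;
	size_t i;

	assert(s != NULL);
	for (i = 0; i < s->num_queues; i++)
		sum += s->rxq_packets[i];

	return sum == s->rx_packets;
}
```

The comparison is only meaningful if both sets of counters come from the same read; values gathered from two sources at different times can differ even when the driver is working correctly.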
If I have to gather the netdev stats from /proc/net/dev (which iproute2
tool shows the /proc/net/dev stats?) and somehow atomically gather the
ethtool -S stats, what's the user to do in this brave new world where
ethtool doesn't at least have rx/tx bytes and packets?
^ permalink raw reply
* Re: [PATCH v2 iproute2-next 0/6] cm_id, cq, mr, and pd resource tracking
From: Jason Gunthorpe @ 2018-03-13 21:13 UTC (permalink / raw)
To: David Ahern; +Cc: Leon Romanovsky, stephen, Steve Wise, netdev, linux-rdma
In-Reply-To: <dde7cc61-22af-b589-348f-e91c6657a289@gmail.com>
On Tue, Mar 13, 2018 at 01:45:12PM -0700, David Ahern wrote:
> On 3/13/18 1:32 AM, Leon Romanovsky wrote:
> > On Mon, Mar 12, 2018 at 10:53:03AM -0700, David Ahern wrote:
> >> On 3/12/18 8:16 AM, Steve Wise wrote:
> >>> Hey all,
> >>>
> >>> The kernel side of this series has been merged for rdma-next [1]. Let me
> >>> know if this iproute2 series can be merged, or if it needs more changes.
> >>>
> >>
> >> The problem is that iproute2 headers are synced to kernel headers from
> >> DaveM's tree (net-next mainly). I take it this series will not appear in
> >> Dave's tree until after a merge through Linus' tree. Correct?
> >
> > David,
> >
> > Technically, you are right, and we would like to ask you for an extra tweak
> > to the flow for the RDMAtool, because current scheme causes delays at least
> > cycle.
> >
> > Every RDMAtool's patchset which requires changes to headers is always
> > includes header patch, can you please accept those series and once you
> > are bringing new net-next headers from Linus, simply overwrite all our
> > headers?
>
> I did not follow the discussion back when this decision was made, so how
> did rdma tool end up in iproute2? I do not need the overhead of
> sometimes I sync the rdma header file and sometimes I don't.
Could you pull the uapi headers from linux-next? That tree will have
both netdev and rdma stuff merged together properly.
Jason
^ permalink raw reply
* Re: [PATCH v3] kernel.h: Skip single-eval logic on literals in min()/max()
From: Andrew Morton @ 2018-03-13 21:02 UTC (permalink / raw)
To: Kees Cook
Cc: Linus Torvalds, Linux Kernel Mailing List, Josh Poimboeuf,
Rasmus Villemoes, Gustavo A. R. Silva, Tobin C. Harding,
Steven Rostedt, Jonathan Corbet, Chris Mason, Josef Bacik,
David Sterba, David S. Miller, Alexey Kuznetsov,
Hideaki YOSHIFUJI, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
Masahiro Yamada, Borislav Petkov, Randy Dunlap <
In-Reply-To: <CAGXu5j+M0qbBS0og8buXHdubA=oi=jMcK=ehLGDeevFd2H7JmA@mail.gmail.com>
On Mon, 12 Mar 2018 21:28:57 -0700 Kees Cook <keescook@chromium.org> wrote:
> On Mon, Mar 12, 2018 at 4:57 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Mon, Mar 12, 2018 at 3:55 PM, Andrew Morton
> > <akpm@linux-foundation.org> wrote:
> >>
> >> Replacing the __builtin_choose_expr() with ?: works of course.
> >
> > Hmm. That sounds like the right thing to do. We were so myopically
> > staring at the __builtin_choose_expr() problem that we overlooked the
> > obvious solution.
> >
> > Using __builtin_constant_p() together with a ?: is in fact our common
> > pattern, so that should be fine. The only real reason to use
> > __builtin_choose_expr() is if you want to get the *type* to vary
> > depending on which side you choose, but that's not an issue for
> > min/max.
>
> This doesn't solve it for -Wvla, unfortunately. That was the point of
> Josh's original suggestion of __builtin_choose_expr().
>
> Try building with KCFLAGS=-Wvla and checking net/ipv6/proc.c:
>
> net/ipv6/proc.c: In function ‘snmp6_seq_show_item’:
> net/ipv6/proc.c:198:2: warning: ISO C90 forbids array ‘buff’ whose
> size can’t be evaluated [-Wvla]
> unsigned long buff[SNMP_MIB_MAX];
> ^~~~~~~~
PITA. Didn't we once have a different way of detecting VLAs? Some
post-compilation asm parser, iirc.
I suppose the world wouldn't end if we had a gcc version ifdef in
kernel.h. We'll get to remove it in, oh, ten years.
^ permalink raw reply
* Re: [RESEND] rsi: Remove stack VLA usage
From: Andy Shevchenko @ 2018-03-13 21:00 UTC (permalink / raw)
To: tcharding
Cc: Kalle Valo, kernel-hardening, Linux Kernel Mailing List, netdev,
open list:TI WILINK WIRELES..., Tycho Andersen, Kees Cook
In-Reply-To: <20180313201757.GK8631@eros>
On Tue, Mar 13, 2018 at 10:17 PM, tcharding <me@tobin.cc> wrote:
> On Mon, Mar 12, 2018 at 09:46:06AM +0000, Kalle Valo wrote:
>> tcharding <me@tobin.cc> wrote:
I'm pretty sure it depends on the original email headers, like
above ^^^ (no name).
Perhaps git config should be set up on your side.
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply
* [PATCH] pktgen: use dynamic allocation for debug print buffer
From: Arnd Bergmann @ 2018-03-13 20:58 UTC (permalink / raw)
To: David S. Miller
Cc: Gustavo A . R . Silva, Arnd Bergmann, Dmitry Safonov,
Johannes Berg, Eric Dumazet, netdev, linux-kernel
After the removal of the VLA, we get a harmless warning about a large
stack frame:
net/core/pktgen.c: In function 'pktgen_if_write':
net/core/pktgen.c:1710:1: error: the frame size of 1076 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
The function was previously shown to be safe despite hitting
the 1024 byte warning level. To get rid of the annoying warning
while keeping it readable, this changes it to use strndup_user().
Obviously this is not a fast path, so the kmalloc() overhead
can be disregarded.
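As a userspace analogy only, the trade of a fixed on-stack buffer for a bounded heap copy could be sketched like this. demo_copy_for_debug() is a hypothetical helper; it truncates to 1023 bytes and assumes the source has at least count readable bytes, which is not the same as the semantics of strndup_user() in the kernel.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_DEBUG_MAX 1023	/* longest debug string, without the NUL */

/*
 * Return a NUL-terminated heap copy of at most DEMO_DEBUG_MAX bytes of
 * src. The caller frees the result; NULL means the allocation failed.
 */
static char *demo_copy_for_debug(const char *src, size_t count)
{
	size_t len = count < DEMO_DEBUG_MAX ? count : DEMO_DEBUG_MAX;
	char *copy;

	assert(src != NULL);
	copy = malloc(len + 1);
	if (!copy)
		return NULL;

	memcpy(copy, src, len);
	copy[len] = '\0';
	return copy;
}
```

The stack frame of the caller then only holds a pointer, at the cost of one allocation per debug print.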
Fixes: 35951393bbff ("pktgen: Remove VLA usage")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
net/core/pktgen.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index de17a9f3e1f6..9216cf99b5a0 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -906,13 +906,14 @@ static ssize_t pktgen_if_write(struct file *file,
i += len;
if (debug) {
- size_t copy = min_t(size_t, count, 1023);
- char tb[1024];
- if (copy_from_user(tb, user_buffer, copy))
- return -EFAULT;
- tb[copy] = 0;
- pr_debug("%s,%lu buffer -:%s:-\n",
- name, (unsigned long)count, tb);
+ size_t copy = min_t(size_t, count + 1, 1024);
+ char *tp = strndup_user(user_buffer, copy);
+
+ if (IS_ERR(tp))
+ return PTR_ERR(tp);
+
+ pr_debug("%s,%zu buffer -:%s:-\n", name, count, tp);
+ kfree(tp);
}
if (!strcmp(name, "min_pkt_size")) {
--
2.9.0
^ permalink raw reply related
* Re: [PATCH v2 iproute2-next 0/6] cm_id, cq, mr, and pd resource tracking
From: Doug Ledford @ 2018-03-13 20:58 UTC (permalink / raw)
To: David Ahern, Leon Romanovsky, stephen; +Cc: Steve Wise, netdev, linux-rdma
In-Reply-To: <dde7cc61-22af-b589-348f-e91c6657a289@gmail.com>
On Tue, 2018-03-13 at 13:45 -0700, David Ahern wrote:
> On 3/13/18 1:32 AM, Leon Romanovsky wrote:
> > On Mon, Mar 12, 2018 at 10:53:03AM -0700, David Ahern wrote:
> > > On 3/12/18 8:16 AM, Steve Wise wrote:
> > > > Hey all,
> > > >
> > > > The kernel side of this series has been merged for rdma-next [1]. Let me
> > > > know if this iproute2 series can be merged, or if it needs more changes.
> > > >
> > >
> > > The problem is that iproute2 headers are synced to kernel headers from
> > > DaveM's tree (net-next mainly). I take it this series will not appear in
> > > Dave's tree until after a merge through Linus' tree. Correct?
> >
> > David,
> >
> > Technically, you are right, and we would like to ask you for an extra tweak
> > to the flow for the RDMAtool, because current scheme causes delays at least
> > cycle.
> >
> > Every RDMAtool's patchset which requires changes to headers is always
> > includes header patch, can you please accept those series and once you
> > are bringing new net-next headers from Linus, simply overwrite all our
> > headers?
>
> I did not follow the discussion back when this decision was made, so how
> did rdma tool end up in iproute2?
It is modeled after the ip command, and for better or worse, the
iproute2 package has become the standard drop box for low level kernel
network configuring tools. The RDMA subsystem may not be IP networking,
but it is still networking, so it seemed an appropriate fit.
> I do not need the overhead of
> sometimes I sync the rdma header file and sometimes I don't.
>
> One option that comes to mind is to move the rdma header file under the
> rdma directory. It breaks the uapi model, but it seems that iproute2 is
> just a delivery vehicle for this command.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
^ permalink raw reply
* Re: [PATCH 12/15] ice: Add stats and ethtool support
From: David Miller @ 2018-03-13 20:54 UTC (permalink / raw)
To: anirudh.venkataramanan; +Cc: kubakici, netdev, intel-wired-lan
In-Reply-To: <1520967916.696.21.camel@intel.com>
From: "Venkataramanan, Anirudh" <anirudh.venkataramanan@intel.com>
Date: Tue, 13 Mar 2018 19:05:19 +0000
> Thanks for the feedback. I am not sure I understand what's being asked
> here. Do you mean to say that standard netdev stats should not be
> printed when we do ethtool -S or something else?
Yes, that is what he is saying.
It is duplicating information already available with a normal
netdev statistics dump.
^ permalink raw reply
* [PATCH v2 net 1/1] net sched actions: return explicit error when tunnel_key mode is not specified
From: Roman Mashak @ 2018-03-13 20:53 UTC (permalink / raw)
To: davem; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri, amir, Roman Mashak
If set/unset mode of the tunnel_key action is not provided, ->init() still
returns 0, and the caller proceeds with a bogus 'struct tc_action *' object,
which results in a crash:
% tc actions add action tunnel_key src_ip 1.1.1.1 dst_ip 2.2.2.1 id 7 index 1
[ 35.805515] general protection fault: 0000 [#1] SMP PTI
[ 35.806161] Modules linked in: act_tunnel_key kvm_intel kvm irqbypass
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64
crypto_simd glue_helper cryptd serio_raw
[ 35.808233] CPU: 1 PID: 428 Comm: tc Not tainted 4.16.0-rc4+ #286
[ 35.808929] RIP: 0010:tcf_action_init+0x90/0x190
[ 35.809457] RSP: 0018:ffffb8edc068b9a0 EFLAGS: 00010206
[ 35.810053] RAX: 1320c000000a0003 RBX: 0000000000000001 RCX: 0000000000000000
[ 35.810866] RDX: 0000000000000070 RSI: 0000000000007965 RDI: ffffb8edc068b910
[ 35.811660] RBP: ffffb8edc068b9d0 R08: 0000000000000000 R09: ffffb8edc068b808
[ 35.812463] R10: ffffffffc02bf040 R11: 0000000000000040 R12: ffffb8edc068bb38
[ 35.813235] R13: 0000000000000000 R14: 0000000000000000 R15: ffffb8edc068b910
[ 35.814006] FS: 00007f3d0d8556c0(0000) GS:ffff91d1dbc40000(0000)
knlGS:0000000000000000
[ 35.814881] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 35.815540] CR2: 000000000043f720 CR3: 0000000019248001 CR4: 00000000001606a0
[ 35.816457] Call Trace:
[ 35.817158] tc_ctl_action+0x11a/0x220
[ 35.817795] rtnetlink_rcv_msg+0x23d/0x2e0
[ 35.818457] ? __slab_alloc+0x1c/0x30
[ 35.819079] ? __kmalloc_node_track_caller+0xb1/0x2b0
[ 35.819544] ? rtnl_calcit.isra.30+0xe0/0xe0
[ 35.820231] netlink_rcv_skb+0xce/0x100
[ 35.820744] netlink_unicast+0x164/0x220
[ 35.821500] netlink_sendmsg+0x293/0x370
[ 35.822040] sock_sendmsg+0x30/0x40
[ 35.822508] ___sys_sendmsg+0x2c5/0x2e0
[ 35.823149] ? pagecache_get_page+0x27/0x220
[ 35.823714] ? filemap_fault+0xa2/0x640
[ 35.824423] ? page_add_file_rmap+0x108/0x200
[ 35.825065] ? alloc_set_pte+0x2aa/0x530
[ 35.825585] ? finish_fault+0x4e/0x70
[ 35.826140] ? __handle_mm_fault+0xbc1/0x10d0
[ 35.826723] ? __sys_sendmsg+0x41/0x70
[ 35.827230] __sys_sendmsg+0x41/0x70
[ 35.827710] do_syscall_64+0x68/0x120
[ 35.828195] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 35.828859] RIP: 0033:0x7f3d0ca4da67
[ 35.829331] RSP: 002b:00007ffc9f284338 EFLAGS: 00000246 ORIG_RAX:
000000000000002e
[ 35.830304] RAX: ffffffffffffffda RBX: 00007ffc9f284460 RCX: 00007f3d0ca4da67
[ 35.831247] RDX: 0000000000000000 RSI: 00007ffc9f2843b0 RDI: 0000000000000003
[ 35.832167] RBP: 000000005aa6a7a9 R08: 0000000000000001 R09: 0000000000000000
[ 35.833075] R10: 00000000000005f1 R11: 0000000000000246 R12: 0000000000000000
[ 35.833997] R13: 00007ffc9f2884c0 R14: 0000000000000001 R15: 0000000000674640
[ 35.834923] Code: 24 30 bb 01 00 00 00 45 31 f6 eb 5e 8b 50 08 83 c2 07 83 e2
fc 83 c2 70 49 8b 07 48 8b 40 70 48 85 c0 74 10 48 89 14 24 4c 89 ff <ff> d0 48
8b 14 24 48 01 c2 49 01 d6 45 85 ed 74 05 41 83 47 2c
[ 35.837442] RIP: tcf_action_init+0x90/0x190 RSP: ffffb8edc068b9a0
[ 35.838291] ---[ end trace a095c06ee4b97a26 ]---
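The failure mode described above, an error path that jumps to cleanup while the return value is still 0, can be reduced to a small userspace sketch. All demo_* names are hypothetical and the code only models the control flow, not the tunnel_key action itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

enum demo_mode { DEMO_MODE_UNSPEC, DEMO_MODE_SET, DEMO_MODE_RELEASE };

struct demo_action {
	enum demo_mode mode;
};

/* Returns 0 and a valid object in *out, or a negative errno and NULL. */
static int demo_action_init(enum demo_mode mode, struct demo_action **out)
{
	struct demo_action *act;
	int ret = 0;

	assert(out != NULL);
	*out = NULL;

	act = calloc(1, sizeof(*act));
	if (!act)
		return -ENOMEM;

	switch (mode) {
	case DEMO_MODE_SET:
	case DEMO_MODE_RELEASE:
		act->mode = mode;
		break;
	default:
		/* Without this assignment the cleanup path returns 0. */
		ret = -EINVAL;
		goto err_out;
	}

	*out = act;
	return ret;

err_out:
	free(act);
	return ret;
}
```

A caller that trusts a 0 return would go on to use an object that was never set up, which is the situation the oops shows.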
v2:
Submit for net tree
Fixes: d0f6dd8a914f ("net/sched: Introduce act_tunnel_key")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
---
net/sched/act_tunnel_key.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
index 0e23aac09ad6..fea772e66e62 100644
--- a/net/sched/act_tunnel_key.c
+++ b/net/sched/act_tunnel_key.c
@@ -153,6 +153,7 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
metadata->u.tun_info.mode |= IP_TUNNEL_INFO_TX;
break;
default:
+ ret = -EINVAL;
goto err_out;
}
--
2.7.4
^ permalink raw reply related
* Re: [PATCH v2 iproute2-next 0/6] cm_id, cq, mr, and pd resource tracking
From: David Ahern @ 2018-03-13 20:45 UTC (permalink / raw)
To: Leon Romanovsky, stephen; +Cc: Steve Wise, netdev, linux-rdma
In-Reply-To: <20180313083211.GB1080@mtr-leonro.local>
On 3/13/18 1:32 AM, Leon Romanovsky wrote:
> On Mon, Mar 12, 2018 at 10:53:03AM -0700, David Ahern wrote:
>> On 3/12/18 8:16 AM, Steve Wise wrote:
>>> Hey all,
>>>
>>> The kernel side of this series has been merged for rdma-next [1]. Let me
>>> know if this iproute2 series can be merged, or if it needs more changes.
>>>
>>
>> The problem is that iproute2 headers are synced to kernel headers from
>> DaveM's tree (net-next mainly). I take it this series will not appear in
>> Dave's tree until after a merge through Linus' tree. Correct?
>
> David,
>
> Technically, you are right, and we would like to ask you for an extra tweak
> to the flow for the RDMAtool, because current scheme causes delays at least
> cycle.
>
> Every RDMAtool's patchset which requires changes to headers is always
> includes header patch, can you please accept those series and once you
> are bringing new net-next headers from Linus, simply overwrite all our
> headers?
I did not follow the discussion back when this decision was made, so how
did rdma tool end up in iproute2? I do not need the overhead of
sometimes I sync the rdma header file and sometimes I don't.
One option that comes to mind is to move the rdma header file under the
rdma directory. It breaks the uapi model, but it seems that iproute2 is
just a delivery vehicle for this command.
^ permalink raw reply
* Re: [PATCH 12/15] ice: Add stats and ethtool support
From: Venkataramanan, Anirudh @ 2018-03-13 20:42 UTC (permalink / raw)
To: stephen@networkplumber.org
Cc: netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org
In-Reply-To: <20180310084243.4584ccd9@xeon-e3>
On Sat, 2018-03-10 at 08:42 -0800, Stephen Hemminger wrote:
> On Fri, 9 Mar 2018 09:21:33 -0800
> Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> wrote:
>
> > + /* VSI stats */
> > + struct rtnl_link_stats64 net_stats;
> > + struct rtnl_link_stats64 net_stats_prev;
> > + struct ice_eth_stats eth_stats;
> > + struct ice_eth_stats eth_stats_prev;
>
> You also don't need current and previous as separate copies since
> previous is only
> used while computing the current values.
Thanks for the feedback, Stephen.
eth_stats_prev is used in ice_update_eth_stats when we update
eth_stats.
While looking into this though, I found that net_stats_prev field in
struct ice_vsi (and consequently *prev_ns and *prev_es pointers in
ice_update_vsi_stats) may not be needed. Is this what you meant?
Thanks!
Ani
^ permalink raw reply
* [PATCH net-next 1/2] selftests/txtimestamp: Add more configurable parameters
From: Vinicius Costa Gomes @ 2018-03-13 20:35 UTC (permalink / raw)
To: netdev; +Cc: Vinicius Costa Gomes, randy.e.witt, davem, eric.dumazet
In-Reply-To: <20180313203519.8638-1-vinicius.gomes@intel.com>
Add a way to configure whether poll() should wait forever for an event,
the number of packets that should be sent for each test, and whether
there should be any delay between packets.
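A standalone sketch of how the three new options could be parsed is shown below. It covers only -c, -D and -F, the struct and function names are hypothetical, and the defaults mirror the values visible in the diff (4 packets, 100 ms timeout). It keeps the getopt() result in an int, since getopt() returns -1 at the end of the options.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical container for the three new settings. */
struct demo_cfg {
	int num_pkts;		/* -c N: packets per test */
	int poll_timeout;	/* poll() timeout in ms, -1 waits forever (-F) */
	bool no_delay;		/* -D: no delay between packets */
};

/* Returns 0 on success, -1 on an unknown option or missing argument. */
static int demo_parse_opt(int argc, char **argv, struct demo_cfg *cfg)
{
	int c;

	assert(cfg != NULL);
	cfg->num_pkts = 4;
	cfg->poll_timeout = 100;
	cfg->no_delay = false;
	optind = 1;	/* restart scanning so the function can be reused */

	while ((c = getopt(argc, argv, "c:DF")) != -1) {
		switch (c) {
		case 'c':
			cfg->num_pkts = (int)strtoul(optarg, NULL, 10);
			break;
		case 'D':
			cfg->no_delay = true;
			break;
		case 'F':
			cfg->poll_timeout = -1;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```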
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
.../selftests/networking/timestamping/txtimestamp.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/networking/timestamping/txtimestamp.c b/tools/testing/selftests/networking/timestamping/txtimestamp.c
index 5df07047ca86..5190b1dd78b1 100644
--- a/tools/testing/selftests/networking/timestamping/txtimestamp.c
+++ b/tools/testing/selftests/networking/timestamping/txtimestamp.c
@@ -68,9 +68,11 @@ static int cfg_num_pkts = 4;
static int do_ipv4 = 1;
static int do_ipv6 = 1;
static int cfg_payload_len = 10;
+static int cfg_poll_timeout = 100;
static bool cfg_show_payload;
static bool cfg_do_pktinfo;
static bool cfg_loop_nodata;
+static bool cfg_no_delay;
static uint16_t dest_port = 9000;
static struct sockaddr_in daddr;
@@ -171,7 +173,7 @@ static void __poll(int fd)
memset(&pollfd, 0, sizeof(pollfd));
pollfd.fd = fd;
- ret = poll(&pollfd, 1, 100);
+ ret = poll(&pollfd, 1, cfg_poll_timeout);
if (ret != 1)
error(1, errno, "poll");
}
@@ -371,7 +373,8 @@ static void do_test(int family, unsigned int opt)
error(1, errno, "send");
/* wait for all errors to be queued, else ACKs arrive OOO */
- usleep(50 * 1000);
+ if (!cfg_no_delay)
+ usleep(50 * 1000);
__poll(fd);
@@ -397,6 +400,9 @@ static void __attribute__((noreturn)) usage(const char *filepath)
" -n: set no-payload option\n"
" -r: use raw\n"
" -R: use raw (IP_HDRINCL)\n"
+ " -D: no delay between packets\n"
+ " -F: poll() waits forever for an event\n"
+ " -c N: number of packets for each test\n"
" -p N: connect to port N\n"
" -u: use udp\n"
" -x: show payload (up to 70 bytes)\n",
@@ -409,7 +415,7 @@ static void parse_opt(int argc, char **argv)
int proto_count = 0;
char c;
- while ((c = getopt(argc, argv, "46hIl:np:rRux")) != -1) {
+ while ((c = getopt(argc, argv, "46hIl:np:rRuxc:DF")) != -1) {
switch (c) {
case '4':
do_ipv6 = 0;
@@ -447,6 +453,15 @@ static void parse_opt(int argc, char **argv)
case 'x':
cfg_show_payload = true;
break;
+ case 'c':
+ cfg_num_pkts = strtoul(optarg, NULL, 10);
+ break;
+ case 'D':
+ cfg_no_delay = true;
+ break;
+ case 'F':
+ cfg_poll_timeout = -1;
+ break;
case 'h':
default:
usage(argv[0]);
--
2.16.2
^ permalink raw reply related
* [PATCH net-next 0/2] skbuff: Fix applications not being woken for errors
From: Vinicius Costa Gomes @ 2018-03-13 20:35 UTC (permalink / raw)
To: netdev; +Cc: Vinicius Costa Gomes, randy.e.witt, davem, eric.dumazet
Hi,
Changes from the RFC:
- tweaked commit messages;
Original cover letter:
This is actually a "bug report"-RFC instead of the more usual "new
feature"-RFC.
We are developing an application that uses TX hardware timestamping to
make some measurements, and during development Randy Witt initially
reported that the application poll() never unblocked when TX hardware
timestamping was enabled.
After some investigation, it turned out the problem wasn't
exclusive to hardware timestamping, and could be reproduced with
software timestamping.
Applying patch (1), and running txtimestamp like this, for example:
$ ./txtimestamp -u -4 192.168.1.71 -c 1000 -D -l 1000 -F
('-u' to use UDP only, '-4' for ipv4 only, '-c 1000' to send 1000
packets for each test, '-D' to remove the delay between packets, '-l
1000' to set the payload to 1000 bytes, '-F' for configuring poll() to
wait forever)
will cause the application to become stuck in the poll() call most
of the time. (Note: I couldn't reproduce the issue running against an
address that is routed through loopback.)
Another interesting fact is that if the POLLIN event is added to the
poll() .events, poll() no longer becomes stuck, and more interestingly
the returned event in .revents is only POLLERR.
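As background for the observation above: poll() reports POLLERR in .revents even when no events are requested. The sketch below shows only that property, using the write end of a pipe whose read end has been closed as a stand-in error source (Linux behaviour assumed). It does not reproduce the socket error queue wakeup problem, and the demo_* helpers are hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Returns the write end of a pipe whose read end is already closed. */
static int demo_broken_pipe_fd(void)
{
	int fds[2];

	if (pipe(fds) != 0)
		return -1;
	close(fds[0]);
	return fds[1];
}

/*
 * Poll fd while requesting no events at all. Returns the reported
 * revents, or -1 if poll() timed out or failed.
 */
static int demo_poll_no_events(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = 0 };
	int ret;

	assert(fd >= 0);
	ret = poll(&pfd, 1, timeout_ms);
	if (ret != 1)
		return -1;
	return pfd.revents;
}
```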
After a few debugging sessions, we got to 'sock_queue_err_skb()' and
how it notifies applications of the error just enqueued. Changing it
to use 'sk->sk_error_report()', fixes the issue for hardware and
software timestamping. That is patch (2).
The "solution" proposed in patch (2) looks like too big a hammer. If
it's not, then it seems that this problem has existed for a long time
(pre git) and it was uncommon for folks to reach the necessary
conditions to trigger it (my hypothesis is that it only triggers when the
error is reported from a different task context than the application).
Am I missing something here?
Cheers,
--
Vinicius Costa Gomes (2):
selftests/txtimestamp: Add more configurable parameters
skbuff: Fix not waking applications when errors are enqueued
net/core/skbuff.c | 2 +-
.../selftests/networking/timestamping/txtimestamp.c | 21 ++++++++++++++++++---
2 files changed, 19 insertions(+), 4 deletions(-)
--
2.16.2
^ permalink raw reply