* Re: [bpf-next PATCH v2 10/18] bpf: add verifier tests for BPF_PROG_TYPE_SK_MSG
From: David Miller @ 2018-03-15 18:42 UTC (permalink / raw)
To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev
In-Reply-To: <20180312192355.8039.74913.stgit@john-Precision-Tower-5810>
From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:23:55 -0700
> Test read and writes for BPF_PROG_TYPE_SK_MSG.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
* Re: [bpf-next PATCH v2 11/18] bpf: sockmap sample, add option to attach SK_MSG program
From: David Miller @ 2018-03-15 18:42 UTC (permalink / raw)
To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev
In-Reply-To: <20180312192400.8039.31771.stgit@john-Precision-Tower-5810>
From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:24:00 -0700
> Add sockmap option to use SK_MSG program types.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
* Re: [bpf-next PATCH v2 12/18] bpf: sockmap sample, add sendfile test
From: David Miller @ 2018-03-15 18:42 UTC (permalink / raw)
To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev
In-Reply-To: <20180312192405.8039.14734.stgit@john-Precision-Tower-5810>
From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:24:05 -0700
> To exercise TX ULP sendpage implementation we need a test that does
> a sendfile. Add sendfile test option here.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
* [pci PATCH v7 2/5] virtio_pci: Add support for unmanaged SR-IOV on virtio_pci devices
From: Alexander Duyck @ 2018-03-15 18:42 UTC (permalink / raw)
To: bhelgaas, alexander.h.duyck, linux-pci
Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
keith.busch, netanel, ddutile, mheyne, liang-min.wang,
mark.d.rustad, dwmw2, hch, dwmw
In-Reply-To: <20180315183449.3102.64791.stgit@localhost.localdomain>
From: Alexander Duyck <alexander.h.duyck@intel.com>
Hardware-realized virtio_pci devices can implement SR-IOV, so this
patch enables its use. The device in question is an upcoming Intel
NIC that implements both a virtio_net PF and virtio_net VFs. These
are hardware realizations of what has up to now been a software
interface.
The device in question has the following 4-part PCI IDs:
PF: vendor: 1af4 device: 1041 subvendor: 8086 subdevice: 15fe
VF: vendor: 1af4 device: 1041 subvendor: 8086 subdevice: 05fe
The patch currently needs no check for device ID, because the callback
will never be made for devices that do not assert the capability or
when run on a platform incapable of SR-IOV.
One reason for this patch is that the hardware requires the
vendor ID of a VF to be the same as the vendor ID of the PF that
created it. So it seemed logical to simply have a fully-functioning
virtio_net PF create the VFs. This patch makes that possible.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
v4: Dropped call to pci_disable_sriov in virtio_pci_remove function
v5: Replaced call to pci_sriov_configure_unmanaged with
pci_sriov_configure_simple
v6: Dropped "#ifdef" checks for IOV wrapping sriov_configure definition
v7: No code change, added Reviewed-by
drivers/virtio/virtio_pci_common.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index 48d4d1cf1cb6..67a227fd7aa0 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -596,6 +596,7 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
#ifdef CONFIG_PM_SLEEP
.driver.pm = &virtio_pci_pm_ops,
#endif
+ .sriov_configure = pci_sriov_configure_simple,
};
module_pci_driver(virtio_pci_driver);
* Re: [bpf-next PATCH v2 13/18] bpf: sockmap sample, add data verification option
From: David Miller @ 2018-03-15 18:42 UTC (permalink / raw)
To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev
In-Reply-To: <20180312192410.8039.85654.stgit@john-Precision-Tower-5810>
From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:24:10 -0700
> To verify data is not being dropped or corrupted this adds an option
> to verify test-patterns on recv.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
* Re: [bpf-next PATCH v2 14/18] bpf: sockmap, add sample option to test apply_bytes helper
From: David Miller @ 2018-03-15 18:42 UTC (permalink / raw)
To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev
In-Reply-To: <20180312192415.8039.76958.stgit@john-Precision-Tower-5810>
From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:24:16 -0700
> This adds an option to test the apply_bytes helper. This option lets
> the user specify an int on the command line specifying how much data
> each verdict should apply to.
>
> When this is set a map entry is set with the bytes input by the user
> and then the specified program --txmsg or --txmsg_redir will use the
> value and set the applied data. If no other option is set then a
> default --txmsg_apply program is run. This program will drop pkts
> if an error is detected on the bytes map lookup. Useful to verify
> the map lookup and apply helper are working and causing a hard
> error if it is not.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
* Re: [bpf-next PATCH v2 15/18] bpf: sockmap sample support for bpf_msg_cork_bytes()
From: David Miller @ 2018-03-15 18:42 UTC (permalink / raw)
To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev
In-Reply-To: <20180312192421.8039.90630.stgit@john-Precision-Tower-5810>
From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:24:21 -0700
> Add sample application support for the bpf_msg_cork_bytes helper. This
> lets the user specify how many bytes each verdict should apply to.
>
> Similar to apply_bytes() tests these can be run as a stand-alone test
> when used without other options or inline with other tests by using
> the txmsg_cork option along with any of the basic tests txmsg,
> txmsg_redir, txmsg_drop.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
* Re: [bpf-next PATCH v2 16/18] bpf: sockmap add SK_DROP tests
From: David Miller @ 2018-03-15 18:43 UTC (permalink / raw)
To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev
In-Reply-To: <20180312192426.8039.59272.stgit@john-Precision-Tower-5810>
From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:24:26 -0700
> Add tests for SK_DROP.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
* Re: [bpf-next PATCH v2 17/18] bpf: sockmap sample test for bpf_msg_pull_data
From: David Miller @ 2018-03-15 18:43 UTC (permalink / raw)
To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev
In-Reply-To: <20180312192431.8039.36798.stgit@john-Precision-Tower-5810>
From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:24:31 -0700
> This adds an option to test the msg_pull_data helper. This
> uses two options txmsg_start and txmsg_end to let the user
> specify start and end bytes to pull.
>
> The options can be used with txmsg_apply, txmsg_cork options
> as well as with any of the basic tests, txmsg, txmsg_redir and
> txmsg_drop (plus noisy variants) to run pull_data inline with
> those tests. By giving user direct control over the variables
> we can easily do negative testing as well as positive tests.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
* Re: [bpf-next PATCH v2 18/18] bpf: sockmap test script
From: David Miller @ 2018-03-15 18:43 UTC (permalink / raw)
To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev
In-Reply-To: <20180312192436.8039.90188.stgit@john-Precision-Tower-5810>
From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:24:36 -0700
> This adds the test script I am currently using to validate
> the latest sockmap changes. Shortly sockmap will be ported
> to selftests and these will be run from the infrastructure
> there. Until then add the script here so we have a coverage
> checklist when porting into selftests.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
* [pci PATCH v7 3/5] ena: Migrate over to unmanaged SR-IOV support
From: Alexander Duyck @ 2018-03-15 18:43 UTC (permalink / raw)
To: bhelgaas, alexander.h.duyck, linux-pci
Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
keith.busch, netanel, ddutile, mheyne, liang-min.wang,
mark.d.rustad, dwmw2, hch, dwmw
In-Reply-To: <20180315183449.3102.64791.stgit@localhost.localdomain>
From: Alexander Duyck <alexander.h.duyck@intel.com>
Instead of implementing our own version of a SR-IOV configuration stub in
the ena driver we can just reuse the existing
pci_sriov_configure_simple function.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
v5: Replaced call to pci_sriov_configure_unmanaged with
pci_sriov_configure_simple
v6: Dropped "#ifdef" checks for IOV wrapping sriov_configure definition
v7: No change
drivers/net/ethernet/amazon/ena/ena_netdev.c | 28 +-------------------------
1 file changed, 1 insertion(+), 27 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 6975150d144e..6054deb1e6aa 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3385,32 +3385,6 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
}
/*****************************************************************************/
-static int ena_sriov_configure(struct pci_dev *dev, int numvfs)
-{
- int rc;
-
- if (numvfs > 0) {
- rc = pci_enable_sriov(dev, numvfs);
- if (rc != 0) {
- dev_err(&dev->dev,
- "pci_enable_sriov failed to enable: %d vfs with the error: %d\n",
- numvfs, rc);
- return rc;
- }
-
- return numvfs;
- }
-
- if (numvfs == 0) {
- pci_disable_sriov(dev);
- return 0;
- }
-
- return -EINVAL;
-}
-
-/*****************************************************************************/
-/*****************************************************************************/
/* ena_remove - Device Removal Routine
* @pdev: PCI device information struct
@@ -3525,7 +3499,7 @@ static int ena_resume(struct pci_dev *pdev)
.suspend = ena_suspend,
.resume = ena_resume,
#endif
- .sriov_configure = ena_sriov_configure,
+ .sriov_configure = pci_sriov_configure_simple,
};
static int __init ena_init(void)
* [pci PATCH v7 4/5] nvme: Migrate over to unmanaged SR-IOV support
From: Alexander Duyck @ 2018-03-15 18:43 UTC (permalink / raw)
To: bhelgaas, alexander.h.duyck, linux-pci
Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
keith.busch, netanel, ddutile, mheyne, liang-min.wang,
mark.d.rustad, dwmw2, hch, dwmw
In-Reply-To: <20180315183449.3102.64791.stgit@localhost.localdomain>
From: Alexander Duyck <alexander.h.duyck@intel.com>
Instead of implementing our own version of a SR-IOV configuration stub in
the nvme driver we can just reuse the existing
pci_sriov_configure_simple function.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
v5: Replaced call to pci_sriov_configure_unmanaged with
pci_sriov_configure_simple
v6: Dropped "#ifdef" checks for IOV wrapping sriov_configure definition
v7: No code change, added Reviewed-by
drivers/nvme/host/pci.c | 20 +-------------------
1 file changed, 1 insertion(+), 19 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5933a5c732e8..5e963058882a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2580,24 +2580,6 @@ static void nvme_remove(struct pci_dev *pdev)
nvme_put_ctrl(&dev->ctrl);
}
-static int nvme_pci_sriov_configure(struct pci_dev *pdev, int numvfs)
-{
- int ret = 0;
-
- if (numvfs == 0) {
- if (pci_vfs_assigned(pdev)) {
- dev_warn(&pdev->dev,
- "Cannot disable SR-IOV VFs while assigned\n");
- return -EPERM;
- }
- pci_disable_sriov(pdev);
- return 0;
- }
-
- ret = pci_enable_sriov(pdev, numvfs);
- return ret ? ret : numvfs;
-}
-
#ifdef CONFIG_PM_SLEEP
static int nvme_suspend(struct device *dev)
{
@@ -2716,7 +2698,7 @@ static void nvme_error_resume(struct pci_dev *pdev)
.driver = {
.pm = &nvme_dev_pm_ops,
},
- .sriov_configure = nvme_pci_sriov_configure,
+ .sriov_configure = pci_sriov_configure_simple,
.err_handler = &nvme_err_handler,
};
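The two driver-local stubs removed in patches 3/5 and 4/5 follow the same sriov_configure contract: return the number of VFs on a successful enable, 0 on a successful disable, and a negative errno otherwise. A minimal userspace sketch of that contract is given here for illustration only. It is not kernel code: struct model_pf and model_sriov_configure() are hypothetical names, the -EINVAL for an over-large request is an assumption standing in for a failing pci_enable_sriov(), and what pci_sriov_configure_simple() itself does is defined in patch 1/5, which is not shown in this part of the thread.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pieces of struct pci_dev this sketch needs. */
struct model_pf {
	int total_vfs;     /* most VFs the device can expose (assumed field) */
	int num_vfs;       /* VFs currently enabled */
	bool vfs_assigned; /* true while any VF is assigned to a guest */
};

/*
 * Model of an sriov_configure callback:
 *   numvfs == 0 -> disable VFs, return 0 (refused with -EPERM while VFs
 *                  are assigned, as in the removed nvme stub)
 *   numvfs  > 0 -> enable that many VFs, return numvfs
 *   numvfs  < 0 -> -EINVAL, as in the removed ena stub
 */
int model_sriov_configure(struct model_pf *pf, int numvfs)
{
	if (numvfs < 0)
		return -EINVAL;

	if (numvfs == 0) {
		if (pf->vfs_assigned)
			return -EPERM;
		pf->num_vfs = 0;
		return 0;
	}

	/* Assumption: an over-large request fails the way a failing
	 * pci_enable_sriov() would, and leaves the current state alone. */
	if (numvfs > pf->total_vfs)
		return -EINVAL;

	pf->num_vfs = numvfs;
	return numvfs;
}
```

Because the contract is the same for every PF that does nothing but switch VFs on and off, a single shared helper can replace each per-driver copy, which is what the one-line .sriov_configure changes in this series do.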
* [pci PATCH v7 5/5] pci-pf-stub: Add PF driver stub for PFs that function only to enable VFs
From: Alexander Duyck @ 2018-03-15 18:44 UTC (permalink / raw)
To: bhelgaas, alexander.h.duyck, linux-pci
Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
keith.busch, netanel, ddutile, mheyne, liang-min.wang,
mark.d.rustad, dwmw2, hch, dwmw
In-Reply-To: <20180315183449.3102.64791.stgit@localhost.localdomain>
From: Alexander Duyck <alexander.h.duyck@intel.com>
Add a new driver called "pci-pf-stub" to act as a "white-list" for PF
devices that provide no functionality other than acting as a means of
allocating a set of VFs. For now I only have one example ID provided by
Amazon in terms of devices that require this functionality. The general
idea is that in the future we will see other devices added as vendors come
up with devices where the PF is more or less just a lightweight shim used
to allocate VFs.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
v6: New driver to address concerns about Amazon devices left unsupported
v7: Dropped pci_id table explanation from pci-pf-stub driver
drivers/pci/Kconfig | 12 ++++++++++
drivers/pci/Makefile | 2 ++
drivers/pci/pci-pf-stub.c | 54 +++++++++++++++++++++++++++++++++++++++++++++
include/linux/pci_ids.h | 2 ++
4 files changed, 70 insertions(+)
create mode 100644 drivers/pci/pci-pf-stub.c
diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 34b56a8f8480..cdef2a2a9bc5 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -71,6 +71,18 @@ config PCI_STUB
When in doubt, say N.
+config PCI_PF_STUB
+ tristate "PCI PF Stub driver"
+ depends on PCI
+ depends on PCI_IOV
+ help
+ Say Y or M here if you want to enable support for devices that
+ require SR-IOV support, while at the same time the PF itself is
+ not providing any actual services on the host itself such as
+ storage or networking.
+
+ When in doubt, say N.
+
config XEN_PCIDEV_FRONTEND
tristate "Xen PCI Frontend"
depends on PCI && X86 && XEN
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 941970936840..4e133d3df403 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -43,6 +43,8 @@ obj-$(CONFIG_PCI_SYSCALL) += syscall.o
obj-$(CONFIG_PCI_STUB) += pci-stub.o
+obj-$(CONFIG_PCI_PF_STUB) += pci-pf-stub.o
+
obj-$(CONFIG_PCI_ECAM) += ecam.o
obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
diff --git a/drivers/pci/pci-pf-stub.c b/drivers/pci/pci-pf-stub.c
new file mode 100644
index 000000000000..9d5fdf20d485
--- /dev/null
+++ b/drivers/pci/pci-pf-stub.c
@@ -0,0 +1,54 @@
+// SPDX-License-Identifier: GPL-2.0
+/* pci-pf-stub - simple stub driver for PCI SR-IOV PF device
+ *
+ * This driver is meant to act as a "white-list" for devices that provide
+ * SR-IOV functionality while at the same time not actually needing a
+ * driver of their own.
+ */
+
+#include <linux/module.h>
+#include <linux/pci.h>
+
+/**
+ * pci_pf_stub_white_list - White list of devices to bind pci-pf-stub onto
+ *
+ * This table provides the list of IDs this driver is supposed to bind
+ * onto. You could think of this as a list of "quirked" devices where we
+ * are adding support for SR-IOV here since there are no other drivers
+ * that they would be running under.
+ */
+static const struct pci_device_id pci_pf_stub_white_list[] = {
+ { PCI_VDEVICE(AMAZON, 0x0053) },
+ /* required last entry */
+ { 0 }
+};
+MODULE_DEVICE_TABLE(pci, pci_pf_stub_white_list);
+
+static int pci_pf_stub_probe(struct pci_dev *dev,
+ const struct pci_device_id *id)
+{
+ pci_info(dev, "claimed by pci-pf-stub\n");
+ return 0;
+}
+
+static struct pci_driver pf_stub_driver = {
+ .name = "pci-pf-stub",
+ .id_table = pci_pf_stub_white_list,
+ .probe = pci_pf_stub_probe,
+ .sriov_configure = pci_sriov_configure_simple,
+};
+
+static int __init pci_pf_stub_init(void)
+{
+ return pci_register_driver(&pf_stub_driver);
+}
+
+static void __exit pci_pf_stub_exit(void)
+{
+ pci_unregister_driver(&pf_stub_driver);
+}
+
+module_init(pci_pf_stub_init);
+module_exit(pci_pf_stub_exit);
+
+MODULE_LICENSE("GPL");
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index a6b30667a331..b10621896017 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2548,6 +2548,8 @@
#define PCI_VENDOR_ID_CIRCUITCO 0x1cc8
#define PCI_SUBSYSTEM_ID_CIRCUITCO_MINNOWBOARD 0x0001
+#define PCI_VENDOR_ID_AMAZON 0x1d0f
+
#define PCI_VENDOR_ID_TEKRAM 0x1de1
#define PCI_DEVICE_ID_TEKRAM_DC290 0xdc29
* Re: [PATCH net-next 1/1] net sched actions: return explicit error when tunnel_key mode is not specified
From: David Miller @ 2018-03-15 18:44 UTC (permalink / raw)
To: mrv; +Cc: netdev, jhs, amir, xiyou.wangcong, jiri
In-Reply-To: <1520886058-18337-1-git-send-email-mrv@mojatatu.com>
From: Roman Mashak <mrv@mojatatu.com>
Date: Mon, 12 Mar 2018 16:20:58 -0400
> If set/unset mode of the tunnel_key action is not provided, ->init() still
> returns 0, and the caller proceeds with bogus 'struct tc_action *' object,
> this results in crash:
>
> % tc actions add action tunnel_key src_ip 1.1.1.1 dst_ip 2.2.2.1 id 7 index 1
...
> Fixes: d0f6dd8a914f ("net/sched: Introduce act_tunnel_key")
> Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Applied to net and queued up for -stable.
* Re: [PATCH v2] net: ethernet: ti: cpsw: add check for in-band mode setting with RGMII PHY interface
From: Grygorii Strashko @ 2018-03-15 19:19 UTC (permalink / raw)
To: Florian Fainelli, SZ Lin (林上智),
David S. Miller
Cc: Schuyler Patton, Ivan Khoronzhuk, Keerthy, Sekhar Nori,
linux-omap, netdev, linux-kernel
In-Reply-To: <f2a255a8-945a-6248-0c28-5ed96fbfde0b@gmail.com>
On 03/15/2018 01:29 PM, Florian Fainelli wrote:
> On 03/15/2018 11:18 AM, Grygorii Strashko wrote:
>>
>>
>> On 03/15/2018 12:39 PM, Grygorii Strashko wrote:
>>>
>>>
>>> On 03/15/2018 11:56 AM, SZ Lin (林上智) wrote:
>>>> According to AM335x TRM[1] 14.3.6.2, AM437x TRM[2] 15.3.6.2 and
>>>> DRA7 TRM[3] 24.11.4.8.7.3.3, in-band mode in EXT_EN(bit18) register
>>>> is only available when PHY is configured in RGMII mode with 10Mbps
>>>> speed. It will cause some networking issues without RGMII mode, such
>>>> as carrier sense errors and low throughput. TI also mentioned this
>>>> issue in their forum[4].
>>>>
>>>> This patch adds the check mechanism for PHY interface with RGMII
>>>> interface type, the in-band mode can only be set in RGMII mode with
>>>> 10Mbps speed.
>>>>
>>>> References:
>>>> [1]: https://www.ti.com/lit/ug/spruh73p/spruh73p.pdf
>>>> [2]: http://www.ti.com/lit/ug/spruhl7h/spruhl7h.pdf
>>>> [3]: http://www.ti.com/lit/ug/spruic2b/spruic2b.pdf
>>>> [4]: https://e2e.ti.com/support/arm/sitara_arm/f/791/p/640765/2392155
>>>>
>>>> Suggested-by: Holsety Chen (陳憲輝) <Holsety.Chen@moxa.com>
>>>> Signed-off-by: SZ Lin (林上智) <sz.lin@moxa.com>
>>>> Signed-off-by: Schuyler Patton <spatton@ti.com>
>>>> ---
>>>> Changes from v1:
>>>> - Use phy_interface_is_rgmii helper function
>>>> - Remove blank line
>>>>
>>>
>>> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
>>>
>>
>> Also could this be marked as stable material 4.9+?
>
> This is not how it works for networking changes, just make sure you
> provide a "Fixes:" tag, and David would usually take care of queueing
> the change to -stable accordingly:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/netdev-FAQ.txt#n148
>
Sorry, I know that, but this patch fixes a very old commit [1] and it can't be
applied to old kernels without merge conflicts or build errors :(,
so I've manually checked whether it can be applied to the most recent stable
kernels and noted the kernel versions here.
Also, there is a dependency on phy_interface_is_rgmii(), which was merged in v4.2.
[1] commit a81d8762d713 ("drivers: net cpsw: Enable In Band mode in cpsw for 10 mbps")
^ went in v3.13
--
regards,
-grygorii
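The rule stated in the quoted commit message (in-band mode only for an RGMII PHY interface at 10Mbps) can be sketched in plain C as below. This is a userspace illustration under stated assumptions, not the cpsw patch itself: the enum, model_if_is_rgmii() and model_apply_inband() are hypothetical names, model_if_is_rgmii() only plays the role of the kernel's phy_interface_is_rgmii(), and the only detail taken from the thread is that EXT_EN is bit 18.

```c
#include <stdbool.h>
#include <stdint.h>

/* EXT_EN is bit 18, per the quoted commit message. */
#define MODEL_EXT_EN (UINT32_C(1) << 18)

/* Hypothetical, reduced set of PHY interface modes for this sketch. */
enum model_phy_if {
	MODEL_IF_MII,
	MODEL_IF_RMII,
	MODEL_IF_RGMII,
	MODEL_IF_RGMII_ID,
	MODEL_IF_RGMII_RXID,
	MODEL_IF_RGMII_TXID,
};

/* Stand-in for phy_interface_is_rgmii(): true for every RGMII variant. */
bool model_if_is_rgmii(enum model_phy_if iface)
{
	return iface >= MODEL_IF_RGMII && iface <= MODEL_IF_RGMII_TXID;
}

/*
 * Return the register value with in-band mode (EXT_EN) set only when the
 * link runs at 10 Mbps over an RGMII interface; otherwise leave it as-is.
 */
uint32_t model_apply_inband(uint32_t reg, enum model_phy_if iface,
			    int speed_mbps)
{
	if (speed_mbps == 10 && model_if_is_rgmii(iface))
		reg |= MODEL_EXT_EN;
	return reg;
}
```

Checking the interface type through one helper rather than comparing against a single RGMII value keeps the delay variants (ID, RXID, TXID) covered, which appears to be the point of the v1 to v2 change noted in the quoted changelog.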
* [PATCH v4 0/2] Remove false-positive VLAs when using max()
From: Kees Cook @ 2018-03-15 19:47 UTC (permalink / raw)
To: Andrew Morton
Cc: Kees Cook, Linus Torvalds, Josh Poimboeuf, Rasmus Villemoes,
Randy Dunlap, Miguel Ojeda, Ingo Molnar, David Laight, Ian Abbott,
linux-input, linux-btrfs, netdev, linux-kernel, kernel-hardening
I'm calling this "v4" since the last effort at this was v3, even
if it's a different approach. Patch 1 adds const_max(), patch 2
uses it in all the places max() was used for stack arrays. Commit
log from patch 1:
---snip---
kernel.h: Introduce const_max() for VLA removal
In the effort to remove all VLAs from the kernel[1], it is desirable to
build with -Wvla. However, this warning is overly pessimistic, in that
it is only happy with stack array sizes that are declared as constant
expressions, and not constant values. One case of this is the evaluation
of the max() macro which, due to its construction, ends up converting
constant expression arguments into a constant value result. Attempts
to adjust the behavior of max() ran afoul of version-dependent compiler
behavior[2].
To work around this and still gain -Wvla coverage, this patch introduces
a new macro, const_max(), for use in these cases of stack array size
declaration, where the constant expressions are retained. Since this means
losing the double-evaluation protections of the max() macro, this macro is
designed to explicitly fail if used on non-constant arguments.
Older compilers will fail with the unhelpful message:
error: first argument to ‘__builtin_choose_expr’ not a constant
Newer compilers will fail with a hopefully more helpful message:
error: call to ‘__error_not_const_arg’ declared with attribute error: const_max() used with non-compile-time constant arg
To gain the ability to compare differing types, the arguments are
explicitly cast to size_t. Without this, some compiler versions will
fail when comparing different enum types or similar constant expression
cases. With the casting, it's possible to do things like:
int foo[const_max(6, sizeof(something))];
[1] https://lkml.org/lkml/2018/3/7/621
[2] https://lkml.org/lkml/2018/3/10/170
---eol---
Hopefully this reads well as a summary of all the things that got tried.
I've tested this on allmodconfig builds with gcc 4.4.4 and 6.3.0, with and
without -Wvla.
-Kees
* [PATCH v4 1/2] kernel.h: Introduce const_max() for VLA removal
From: Kees Cook @ 2018-03-15 19:47 UTC (permalink / raw)
To: Andrew Morton
Cc: Kees Cook, Linus Torvalds, Josh Poimboeuf, Rasmus Villemoes,
Randy Dunlap, Miguel Ojeda, Ingo Molnar, David Laight, Ian Abbott,
linux-input, linux-btrfs, netdev, linux-kernel, kernel-hardening
In-Reply-To: <1521143266-31350-1-git-send-email-keescook@chromium.org>
In the effort to remove all VLAs from the kernel[1], it is desirable to
build with -Wvla. However, this warning is overly pessimistic, in that
it is only happy with stack array sizes that are declared as constant
expressions, and not constant values. One case of this is the evaluation
of the max() macro which, due to its construction, ends up converting
constant expression arguments into a constant value result. Attempts
to adjust the behavior of max() ran afoul of version-dependent compiler
behavior[2].
To work around this and still gain -Wvla coverage, this patch introduces
a new macro, const_max(), for use in these cases of stack array size
declaration, where the constant expressions are retained. Since this means
losing the double-evaluation protections of the max() macro, this macro is
designed to explicitly fail if used on non-constant arguments.
Older compilers will fail with the unhelpful message:
error: first argument to ‘__builtin_choose_expr’ not a constant
Newer compilers will fail with a hopefully more helpful message:
error: call to ‘__error_not_const_arg’ declared with attribute error: const_max() used with non-compile-time constant arg
To gain the ability to compare differing types, the arguments are
explicitly cast to size_t. Without this, some compiler versions will
fail when comparing different enum types or similar constant expression
cases. With the casting, it's possible to do things like:
int foo[const_max(6, sizeof(something))];
[1] https://lkml.org/lkml/2018/3/7/621
[2] https://lkml.org/lkml/2018/3/10/170
Signed-off-by: Kees Cook <keescook@chromium.org>
---
include/linux/kernel.h | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 3fd291503576..012f588b5a25 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -820,6 +820,25 @@ static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
x, y)
/**
+ * const_max - return maximum of two positive compile-time constant values
+ * @x: first compile-time constant value
+ * @y: second compile-time constant value
+ *
+ * This has no type checking nor multi-evaluation defenses, and must
+ * only ever be used with positive compile-time constant values (for
+ * example when calculating a stack array size).
+ */
+size_t __error_not_const_arg(void) \
+__compiletime_error("const_max() used with non-compile-time constant arg");
+#define const_max(x, y) \
+ __builtin_choose_expr(__builtin_constant_p(x) && \
+ __builtin_constant_p(y), \
+ (size_t)(x) > (size_t)(y) ? \
+ (size_t)(x) : \
+ (size_t)(y), \
+ __error_not_const_arg())
+
+/**
* min3 - return minimum of three values
* @x: first value
* @y: second value
--
2.7.4
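For readers who want to try the const_max() idea outside the kernel tree, the following is a rough userspace sketch, assuming GCC or a compatible compiler. The macro body mirrors the one in the patch; error_not_const_arg() is a hypothetical stand-in for the kernel's __error_not_const_arg() with __compiletime_error(), written here with the compiler's error attribute directly, and NAME_LEN_A, NAME_LEN_B and the two helper functions are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __error_not_const_arg(): declared but never
 * defined, and never called when both arguments are constants. */
size_t error_not_const_arg(void)
	__attribute__((error("const_max() used with non-compile-time constant arg")));

/* Same shape as the macro in the patch: both arguments are cast to size_t
 * so that differing constant types can be compared. */
#define const_max(x, y)						\
	__builtin_choose_expr(__builtin_constant_p(x) &&	\
			      __builtin_constant_p(y),		\
			      (size_t)(x) > (size_t)(y) ?	\
				      (size_t)(x) : (size_t)(y),	\
			      error_not_const_arg())

/* Illustrative sizes, not taken from any kernel header. */
#define NAME_LEN_A 255
#define NAME_LEN_B 64

/* Stack array sized by the larger of two macro constants. */
size_t name_buf_size(void)
{
	char namebuf[const_max(NAME_LEN_A, NAME_LEN_B)];

	memset(namebuf, 0, sizeof(namebuf));
	return sizeof(namebuf);
}

/* Mixing an int literal with a sizeof() result works because of the casts. */
size_t mixed_buf_size(void)
{
	char buf[const_max(6, sizeof(uint64_t))];

	memset(buf, 0, sizeof(buf));
	return sizeof(buf);
}
```

Passing a runtime variable to const_max() in this sketch should fail at build time rather than silently produce a VLA, which is the behavior the commit message describes; the exact diagnostic depends on the compiler version.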
* [PATCH v4 2/2] Remove false-positive VLAs when using max()
From: Kees Cook @ 2018-03-15 19:47 UTC (permalink / raw)
To: Andrew Morton
Cc: Kees Cook, Linus Torvalds, Josh Poimboeuf, Rasmus Villemoes,
Randy Dunlap, Miguel Ojeda, Ingo Molnar, David Laight, Ian Abbott,
linux-input, linux-btrfs, netdev, linux-kernel, kernel-hardening
In-Reply-To: <1521143266-31350-1-git-send-email-keescook@chromium.org>
As part of removing VLAs from the kernel[1], we want to build with -Wvla,
but it is overly pessimistic and only accepts constant expressions for
stack array sizes, instead of also constant values. The max() macro
triggers the warning, so this refactors these uses of max() to use the
new const_max() instead.
[1] https://lkml.org/lkml/2018/3/7/621
Signed-off-by: Kees Cook <keescook@chromium.org>
---
drivers/input/touchscreen/cyttsp4_core.c | 2 +-
fs/btrfs/tree-checker.c | 3 ++-
lib/vsprintf.c | 4 ++--
net/ipv4/proc.c | 8 ++++----
net/ipv6/proc.c | 10 ++++------
5 files changed, 13 insertions(+), 14 deletions(-)
diff --git a/drivers/input/touchscreen/cyttsp4_core.c b/drivers/input/touchscreen/cyttsp4_core.c
index 727c3232517c..f89497940051 100644
--- a/drivers/input/touchscreen/cyttsp4_core.c
+++ b/drivers/input/touchscreen/cyttsp4_core.c
@@ -868,7 +868,7 @@ static void cyttsp4_get_mt_touches(struct cyttsp4_mt_data *md, int num_cur_tch)
struct cyttsp4_touch tch;
int sig;
int i, j, t = 0;
- int ids[max(CY_TMA1036_MAX_TCH, CY_TMA4XX_MAX_TCH)];
+ int ids[const_max(CY_TMA1036_MAX_TCH, CY_TMA4XX_MAX_TCH)];
memset(ids, 0, si->si_ofs.tch_abs[CY_TCH_T].max * sizeof(int));
for (i = 0; i < num_cur_tch; i++) {
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index c3c8d48f6618..1ddd6cc3c4fc 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -341,7 +341,8 @@ static int check_dir_item(struct btrfs_root *root,
*/
if (key->type == BTRFS_DIR_ITEM_KEY ||
key->type == BTRFS_XATTR_ITEM_KEY) {
- char namebuf[max(BTRFS_NAME_LEN, XATTR_NAME_MAX)];
+ char namebuf[const_max(BTRFS_NAME_LEN,
+ XATTR_NAME_MAX)];
read_extent_buffer(leaf, namebuf,
(unsigned long)(di + 1), name_len);
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index d7a708f82559..9d5610b643ce 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -744,8 +744,8 @@ char *resource_string(char *buf, char *end, struct resource *res,
#define FLAG_BUF_SIZE (2 * sizeof(res->flags))
#define DECODED_BUF_SIZE sizeof("[mem - 64bit pref window disabled]")
#define RAW_BUF_SIZE sizeof("[mem - flags 0x]")
- char sym[max(2*RSRC_BUF_SIZE + DECODED_BUF_SIZE,
- 2*RSRC_BUF_SIZE + FLAG_BUF_SIZE + RAW_BUF_SIZE)];
+ char sym[const_max(2*RSRC_BUF_SIZE + DECODED_BUF_SIZE,
+ 2*RSRC_BUF_SIZE + FLAG_BUF_SIZE + RAW_BUF_SIZE)];
char *p = sym, *pend = sym + sizeof(sym);
int decode = (fmt[0] == 'R') ? 1 : 0;
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index dc5edc8f7564..fad6f989004e 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -46,7 +46,7 @@
#include <net/sock.h>
#include <net/raw.h>
-#define TCPUDP_MIB_MAX max_t(u32, UDP_MIB_MAX, TCP_MIB_MAX)
+#define TCPUDP_MIB_MAX const_max(UDP_MIB_MAX, TCP_MIB_MAX)
/*
* Report socket allocation statistics [mea@utu.fi]
@@ -404,7 +404,7 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v)
struct net *net = seq->private;
int i;
- memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
+ memset(buff, 0, sizeof(buff));
seq_puts(seq, "\nTcp:");
for (i = 0; snmp4_tcp_list[i].name; i++)
@@ -421,7 +421,7 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v)
seq_printf(seq, " %lu", buff[i]);
}
- memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
+ memset(buff, 0, sizeof(buff));
snmp_get_cpu_field_batch(buff, snmp4_udp_list,
net->mib.udp_statistics);
@@ -432,7 +432,7 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v)
for (i = 0; snmp4_udp_list[i].name; i++)
seq_printf(seq, " %lu", buff[i]);
- memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
+ memset(buff, 0, sizeof(buff));
/* the UDP and UDP-Lite MIBs are the same */
seq_puts(seq, "\nUdpLite:");
diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c
index b67814242f78..58bbfc4fa7fa 100644
--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -30,10 +30,8 @@
#include <net/transp_v6.h>
#include <net/ipv6.h>
-#define MAX4(a, b, c, d) \
- max_t(u32, max_t(u32, a, b), max_t(u32, c, d))
-#define SNMP_MIB_MAX MAX4(UDP_MIB_MAX, TCP_MIB_MAX, \
- IPSTATS_MIB_MAX, ICMP_MIB_MAX)
+#define SNMP_MIB_MAX const_max(const_max(UDP_MIB_MAX, TCP_MIB_MAX), \
+ const_max(IPSTATS_MIB_MAX, ICMP_MIB_MAX))
static int sockstat6_seq_show(struct seq_file *seq, void *v)
{
@@ -199,7 +197,7 @@ static void snmp6_seq_show_item(struct seq_file *seq, void __percpu *pcpumib,
int i;
if (pcpumib) {
- memset(buff, 0, sizeof(unsigned long) * SNMP_MIB_MAX);
+ memset(buff, 0, sizeof(buff));
snmp_get_cpu_field_batch(buff, itemlist, pcpumib);
for (i = 0; itemlist[i].name; i++)
@@ -218,7 +216,7 @@ static void snmp6_seq_show_item64(struct seq_file *seq, void __percpu *mib,
u64 buff64[SNMP_MIB_MAX];
int i;
- memset(buff64, 0, sizeof(u64) * SNMP_MIB_MAX);
+ memset(buff64, 0, sizeof(buff64));
snmp_get_cpu_field64_batch(buff64, itemlist, mib, syncpoff);
for (i = 0; itemlist[i].name; i++)
--
2.7.4
* [RFC 2/2] page_frag_cache: Store metadata in struct page
From: Matthew Wilcox @ 2018-03-15 19:53 UTC (permalink / raw)
To: Alexander Duyck; +Cc: linux-mm, netdev, Matthew Wilcox
In-Reply-To: <20180315195329.7787-1-willy@infradead.org>
From: Matthew Wilcox <mawilcox@microsoft.com>
Shrink page_frag_cache from 24 to 8 bytes (a single pointer to the
currently-in-use struct page) by using the page's refcount directly
(instead of maintaining a bias) and storing our current progress through
the page in the same bits currently used for page->index. We no longer
need to reflect the page pfmemalloc state if we're storing the page
directly.
On the downside, we now call page_address() on every allocation, and we
do an atomic_inc() rather than a non-atomic decrement, but we should
touch the same number of cachelines and there is far less code (and
the code is less complex).
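The scheme described above can be illustrated with a small userspace
sketch. Everything in it is a hypothetical stand-in rather than kernel
code: struct demo_page plays the role of struct page, malloc() plays the
role of the page allocator, a plain counter replaces the atomic refcount,
and the "reuse the old page if all fragments are freed" path is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096u

/* Hypothetical stand-in for struct page: allocator state lives here. */
struct demo_page {
	unsigned int refcount;	/* 1 for the cache + 1 per live fragment */
	unsigned int offset;	/* bytes still unallocated; counts down */
	unsigned char data[DEMO_PAGE_SIZE];
};

/* The cache is a single pointer to the page currently in use. */
struct demo_frag_cache {
	struct demo_page *page;
};

/* Drop one reference; the page is released when the last one goes. */
static void demo_put_page(struct demo_page *page)
{
	assert(page->refcount > 0);
	if (--page->refcount == 0)
		free(page);
}

/* Hand out `size` bytes from the current page, refilling if needed. */
static void *demo_frag_alloc(struct demo_frag_cache *nc, unsigned int size)
{
	struct demo_page *page = nc->page;

	if (size > DEMO_PAGE_SIZE)
		return NULL;

	if (!page || page->offset < size) {
		struct demo_page *fresh = malloc(sizeof(*fresh));

		if (!fresh)
			return NULL;
		fresh->refcount = 1;	/* the cache's own reference */
		fresh->offset = DEMO_PAGE_SIZE;
		if (page)
			demo_put_page(page);	/* cache lets go of the old page */
		nc->page = page = fresh;
	}

	page->refcount++;		/* one reference per fragment */
	page->offset -= size;
	return page->data + page->offset;
}

/* Simplification: the caller names the page instead of deriving it
 * from the fragment address. */
static void demo_frag_free(struct demo_page *page)
{
	demo_put_page(page);
}
```

Fragments are carved from the end of the page towards the start, so the
only per-allocation state is the countdown offset and the reference count.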
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
include/linux/mm_types.h | 17 +-----
mm/page_alloc.c | 135 ++++++++++++++++++++++++-----------------------
net/core/skbuff.c | 4 +-
3 files changed, 74 insertions(+), 82 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 1c5dea402501..f922cb62bd91 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -90,6 +90,7 @@ struct page {
union {
pgoff_t index; /* Our offset within mapping. */
void *freelist; /* sl[aou]b first free object */
+ unsigned int offset; /* page_frag highwater mark */
/* page_deferred_list().prev -- second tail page */
};
@@ -219,22 +220,8 @@ struct page {
#endif
} _struct_page_alignment;
-#define PAGE_FRAG_CACHE_MAX_SIZE __ALIGN_MASK(32768, ~PAGE_MASK)
-#define PAGE_FRAG_CACHE_MAX_ORDER get_order(PAGE_FRAG_CACHE_MAX_SIZE)
-
struct page_frag_cache {
- void * va;
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
- __u16 offset;
- __u16 size;
-#else
- __u32 offset;
-#endif
- /* we maintain a pagecount bias, so that we dont dirty cache line
- * containing page->_refcount every time we allocate a fragment.
- */
- unsigned int pagecnt_bias;
- bool pfmemalloc;
+ struct page *page;
};
typedef unsigned long vm_flags_t;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7a9c14214ed2..f8a176aab287 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4319,34 +4319,72 @@ void free_pages(unsigned long addr, unsigned int order)
EXPORT_SYMBOL(free_pages);
/*
- * Page Fragment:
- * An arbitrary-length arbitrary-offset area of memory which resides
- * within a 0 or higher order page. Multiple fragments within that page
- * are individually refcounted, in the page's reference counter.
+ * The page fragment allocator is simple, yet effective. It allocates
+ * pages from the page allocator, then hands out fragments of those
+ * pages to its callers. It makes no effort to track which parts of
+ * the page remain in use, always allocating fresh memory. The page
+ * reference count is used to keep track of whether any fragment is
+ * still in use; when all fragments in a page have been freed, the
+ * entire page is returned to the page allocator.
*
- * The page_frag functions below provide a simple allocation framework for
- * page fragments. This is used by the network stack and network device
- * drivers to provide a backing region of memory for use as either an
- * sk_buff->head, or to be used in the "frags" portion of skb_shared_info.
+ * The page fragment allocator performs no locking. The caller is
+ * expected to ensure that two callers cannot simultaneously allocate
+ * from the same page_frag_cache. Freeing is atomic and is permitted
+ * to happen simultaneously with other frees or an allocation.
+ *
+ * The allocator uses the struct page to store its state. The 'offset'
+ * field in struct page is used to track how far through the page the
+ * allocation has proceeded. The 'refcount' field is used to track
+ * how many fragments have been allocated from this page. All other
+ * fields in struct page may be used by the owner of the page_frag_cache.
+ * The refcount is incremented by one while the page is still actively being
+ * allocated from; this prevents it from being freed prematurely.
*/
-static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
- gfp_t gfp_mask)
+
+#define PAGE_FRAG_ALLOC_SIZE (64 * 1024)
+#define PAGE_FRAG_ORDER get_order(PAGE_FRAG_ALLOC_SIZE)
+
+static noinline
+struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
+ unsigned int size, gfp_t gfp_mask)
{
+ struct page *old = nc->page;
struct page *page = NULL;
- gfp_t gfp = gfp_mask;
-
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
- gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY |
- __GFP_NOMEMALLOC;
- page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
- PAGE_FRAG_CACHE_MAX_ORDER);
- nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
-#endif
- if (unlikely(!page))
- page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
- nc->va = page ? page_address(page) : NULL;
+ if (size > PAGE_FRAG_ALLOC_SIZE)
+ return NULL;
+
+ /*
+ * If all the previous allocations from this page have already been
+ * freed, reuse the page if it can satisfy this allocation.
+ */
+ if (old && page_ref_count(old) == 1) {
+ unsigned int offset = PAGE_SIZE << compound_order(old);
+
+ if (offset > size) {
+ old->offset = offset;
+ return old;
+ }
+ }
+
+ if (PAGE_FRAG_ORDER > 0) {
+ gfp_t gfp = gfp_mask | __GFP_COMP | __GFP_NOWARN |
+ __GFP_NORETRY | __GFP_NOMEMALLOC;
+
+ page = alloc_pages_node(NUMA_NO_NODE, gfp, PAGE_FRAG_ORDER);
+ if (unlikely(!page) && size > PAGE_SIZE)
+ return NULL;
+ }
+ if (unlikely(!page))
+ page = alloc_pages_node(NUMA_NO_NODE, gfp_mask, 0);
+ if (unlikely(!page))
+ return NULL;
+
+ if (old)
+ put_page(old);
+ nc->page = page;
+ page->offset = PAGE_SIZE << compound_order(page);
return page;
}
@@ -4366,56 +4404,23 @@ void __page_frag_cache_drain(struct page *page, unsigned int count)
EXPORT_SYMBOL(__page_frag_cache_drain);
void *page_frag_alloc(struct page_frag_cache *nc,
- unsigned int fragsz, gfp_t gfp_mask)
+ unsigned int size, gfp_t gfp_mask)
{
- unsigned int size = PAGE_SIZE;
- struct page *page;
- int offset;
+ struct page *page = nc->page;
+ unsigned int offset = page ? page->offset : 0;
- if (unlikely(!nc->va)) {
-refill:
- page = __page_frag_cache_refill(nc, gfp_mask);
+ if (unlikely(!page || offset < size)) {
+ page = __page_frag_cache_refill(nc, size, gfp_mask);
if (!page)
return NULL;
-
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
- /* if size can vary use size else just use PAGE_SIZE */
- size = nc->size;
-#endif
- /* Even if we own the page, we do not use atomic_set().
- * This would break get_page_unless_zero() users.
- */
- page_ref_add(page, size - 1);
-
- /* reset page count bias and offset to start of new frag */
- nc->pfmemalloc = page_is_pfmemalloc(page);
- nc->pagecnt_bias = size;
- nc->offset = size;
- }
-
- offset = nc->offset - fragsz;
- if (unlikely(offset < 0)) {
- page = virt_to_page(nc->va);
-
- if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
- goto refill;
-
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
- /* if size can vary use size else just use PAGE_SIZE */
- size = nc->size;
-#endif
- /* OK, page count is 0, we can safely set it */
- set_page_count(page, size);
-
- /* reset page count bias and offset to start of new frag */
- nc->pagecnt_bias = size;
- offset = size - fragsz;
+ offset = page->offset;
}
- nc->pagecnt_bias--;
- nc->offset = offset;
+ page_ref_inc(page);
+ offset -= size;
+ page->offset = offset;
- return nc->va + offset;
+ return page_address(page) + offset;
}
EXPORT_SYMBOL(page_frag_alloc);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 09bd89c90a71..59df4db31aed 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -412,7 +412,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
nc = this_cpu_ptr(&netdev_alloc_cache);
data = page_frag_alloc(nc, len, gfp_mask);
- pfmemalloc = nc->pfmemalloc;
+ pfmemalloc = page_is_pfmemalloc(nc->page);
local_irq_restore(flags);
@@ -486,7 +486,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
}
/* use OR instead of assignment to avoid clearing of bits in mask */
- if (nc->page.pfmemalloc)
+ if (page_is_pfmemalloc(nc->page.page))
skb->pfmemalloc = 1;
skb->head_frag = 1;
--
2.16.2
^ permalink raw reply related
* [RFC 1/2] mm: Use page->mapping to indicate pfmemalloc
From: Matthew Wilcox @ 2018-03-15 19:53 UTC (permalink / raw)
To: Alexander Duyck; +Cc: linux-mm, netdev, Matthew Wilcox
In-Reply-To: <20180315195329.7787-1-willy@infradead.org>
From: Matthew Wilcox <mawilcox@microsoft.com>
I want to use page->index for a different purpose, so move the pfmemalloc
indicator from page->index to page->mapping.
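As a rough illustration of the idea, a reserved pointer value that no real
mapping can have may serve as the flag. The sketch below is userspace C
with hypothetical names (struct demo_page, DEMO_MAPPING_PFMEMALLOC and the
demo_* helpers); it only mirrors the shape of the change and assumes that
casting a small integer to a pointer is acceptable on the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_mapping;	/* opaque; never dereferenced in this sketch */

/* Sentinel: a small non-NULL value that no real object can occupy. */
#define DEMO_MAPPING_PFMEMALLOC ((struct demo_mapping *)4)

struct demo_page {
	struct demo_mapping *mapping;
	unsigned long index;	/* now free for other uses */
};

static bool demo_page_is_pfmemalloc(const struct demo_page *page)
{
	return page->mapping == DEMO_MAPPING_PFMEMALLOC;
}

static void demo_set_page_pfmemalloc(struct demo_page *page)
{
	page->mapping = DEMO_MAPPING_PFMEMALLOC;
}

/* On free, the sentinel is cleared so it cannot leak to the next user. */
static void demo_free_prepare(struct demo_page *page)
{
	if (demo_page_is_pfmemalloc(page))
		page->mapping = NULL;
}
```

Because the flag is cleared on free rather than on every allocation, a
separate "clear" helper on the allocation path is no longer needed.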
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
include/linux/mm.h | 16 +++++-----------
mm/page_alloc.c | 8 +++-----
2 files changed, 8 insertions(+), 16 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4ef7fb1726ab..06ea71358bda 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1122,6 +1122,9 @@ static inline pgoff_t page_index(struct page *page)
bool page_mapped(struct page *page);
struct address_space *page_mapping(struct page *page);
+/* page->mapping cannot point into the zero page. */
+#define MAPPING_PFMEMALLOC ((struct address_space *)4)
+
/*
* Return true only if the page has been allocated with
* ALLOC_NO_WATERMARKS and the low watermark was not
@@ -1129,11 +1132,7 @@ struct address_space *page_mapping(struct page *page);
*/
static inline bool page_is_pfmemalloc(struct page *page)
{
- /*
- * Page index cannot be this large so this must be
- * a pfmemalloc page.
- */
- return page->index == -1UL;
+ return page->mapping == MAPPING_PFMEMALLOC;
}
/*
@@ -1142,12 +1141,7 @@ static inline bool page_is_pfmemalloc(struct page *page)
*/
static inline void set_page_pfmemalloc(struct page *page)
{
- page->index = -1UL;
-}
-
-static inline void clear_page_pfmemalloc(struct page *page)
-{
- page->index = 0;
+ page->mapping = MAPPING_PFMEMALLOC;
}
/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 796ce1b3e0a1..7a9c14214ed2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1047,7 +1047,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
(page + i)->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
}
}
- if (PageMappingFlags(page))
+ if (PageMappingFlags(page) || page_is_pfmemalloc(page))
page->mapping = NULL;
if (memcg_kmem_enabled() && PageKmemcg(page))
memcg_kmem_uncharge(page, order);
@@ -1802,8 +1802,8 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
set_page_owner(page, order, gfp_flags);
}
-static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
- unsigned int alloc_flags)
+static void prep_new_page(struct page *page, unsigned int order,
+ gfp_t gfp_flags, unsigned int alloc_flags)
{
int i;
@@ -1824,8 +1824,6 @@ static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags
*/
if (alloc_flags & ALLOC_NO_WATERMARKS)
set_page_pfmemalloc(page);
- else
- clear_page_pfmemalloc(page);
}
/*
--
2.16.2
^ permalink raw reply related
* [RFC 0/2] Shrink page_frag_cache
From: Matthew Wilcox @ 2018-03-15 19:53 UTC (permalink / raw)
To: Alexander Duyck; +Cc: linux-mm, netdev, Matthew Wilcox
From: Matthew Wilcox <mawilcox@microsoft.com>
I've just learned about the page_frag_cache allocator, and now I want
to use it everywhere ;-)
But before I start using it in other places, I want to see if it can
be improved at all. The pfmemalloc flag is pretty specific to how the
network stack uses it (with GFP_ATOMIC), and the pagecnt_bias is tricky
to understand. I think we can do better by just using the fields in
struct page directly. I don't have a suitable setup for performance
testing this code ... Alex, is there any chance you'd have time to give
this a spin?
Matthew Wilcox (2):
mm: Use page->mapping to indicate pfmemalloc
page_frag_cache: Store metadata in struct page
include/linux/mm.h | 16 ++----
include/linux/mm_types.h | 17 +-----
mm/page_alloc.c | 143 ++++++++++++++++++++++++-----------------------
net/core/skbuff.c | 4 +-
4 files changed, 82 insertions(+), 98 deletions(-)
--
2.16.2
^ permalink raw reply
* Re: [RFC 0/2] Shrink page_frag_cache
From: Alexander Duyck @ 2018-03-15 20:02 UTC (permalink / raw)
To: Matthew Wilcox, Jesper Dangaard Brouer
Cc: Alexander Duyck, linux-mm, Netdev, Matthew Wilcox
In-Reply-To: <20180315195329.7787-1-willy@infradead.org>
On Thu, Mar 15, 2018 at 12:53 PM, Matthew Wilcox <willy@infradead.org> wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
>
> I've just learned about the page_frag_cache allocator, and now I want
> to use it everywhere ;-)
>
> But before I start using it in other places, I want to see if it can
> be improved at all. The pfmemalloc flag is pretty specific to how the
> network stack uses it (with GFP_ATOMIC), and the pagecnt_bias is tricky
> to understand. I think we can do better by just using the fields in
> struct page directly. I don't have a suitable setup for performance
> testing this code ... Alex, is there any chance you'd have time to give
> this a spin?
>
> Matthew Wilcox (2):
> mm: Use page->mapping to indicate pfmemalloc
> page_frag_cache: Store metadata in struct page
I can try taking a look at it, but I am pretty swamped for the next
several days. I probably won't have any free time to really test
something like this until Monday or Tuesday of next week. I've added
Jesper to the Cc as this might be something he is interested in as
well.
Thanks.
- Alex
^ permalink raw reply
* [PATCH] json_print: fix print_uint with helper type extensions
From: Kevin Darbyshire-Bryant @ 2018-03-15 20:07 UTC (permalink / raw)
To: netdev; +Cc: Kevin Darbyshire-Bryant
Introduce print helper functions for int, uint, explicit int32, uint32,
int64 & uint64.
print_int used 'int' type internally, whereas print_uint used 'uint64_t'.
These helper functions eventually call vfprintf(fp, fmt, args) which is
a variable argument list function and is dependent upon 'fmt' containing
correct information about the length of the passed arguments.
Unfortunately print_int vs print_uint offered no clue to the programmer
that ints passed to print_uint were being promoted to 64 bits
internally, so the format given in the 'fmt' string and the integer
actually passed could be of different lengths. This is even more
interesting on big endian architectures, where 'vfprintf' would be
looking in the middle of an int64 type and hence produced wildly
incorrect values in tc qdisc output.
print_u/int now stick with native int size. print_u/int32 &
print_u/int64 functions offer explicit integer sizes.
To portably use these formats you should use the relevant PRIdN or PRIuN
formats as defined in inttypes.h.
e.g.
print_uint64(PRINT_ANY, "refcnt", "refcnt %" PRIu64 " ", t->tcm_info)
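A self-contained sketch of the same point follows. The names demo_format(),
demo_print_u32() and demo_print_u64() are hypothetical and are not the
iproute2 helpers; they write into a buffer with vsnprintf() so the result
can be inspected, but like the real helpers they hand the format string and
a va_list to a printf-family function that trusts the format completely.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Forwards the variable arguments; the callee relies on `fmt` alone to
 * know how wide each argument is. */
static int demo_format(char *buf, size_t len, const char *fmt, ...)
{
	va_list args;
	int n;

	va_start(args, fmt);
	n = vsnprintf(buf, len, fmt, args);
	va_end(args);
	return n;
}

/* Explicit-width wrappers: the parameter type fixes what gets passed,
 * so the caller knows which PRI* macro belongs in the format. */
static int demo_print_u32(char *buf, size_t len, const char *fmt, uint32_t v)
{
	return demo_format(buf, len, fmt, v);
}

static int demo_print_u64(char *buf, size_t len, const char *fmt, uint64_t v)
{
	return demo_format(buf, len, fmt, v);
}
```

A call such as demo_print_u64(buf, sizeof(buf), "refcnt %" PRIu64 " ", v)
pairs a 64-bit argument with a 64-bit conversion; pairing it with plain
"%u" would be the mismatch described above.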
Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
---
include/json_print.h | 6 +++++-
lib/json_print.c | 6 +++++-
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/include/json_print.h b/include/json_print.h
index 2ca7830a..fb62b142 100644
--- a/include/json_print.h
+++ b/include/json_print.h
@@ -56,10 +56,14 @@ void close_json_array(enum output_type type, const char *delim);
print_color_##type_name(t, COLOR_NONE, key, fmt, value); \
}
_PRINT_FUNC(int, int);
+_PRINT_FUNC(uint, unsigned int);
_PRINT_FUNC(bool, bool);
_PRINT_FUNC(null, const char*);
_PRINT_FUNC(string, const char*);
-_PRINT_FUNC(uint, uint64_t);
+_PRINT_FUNC(int32, int32_t);
+_PRINT_FUNC(uint32, uint32_t);
+_PRINT_FUNC(int64, int64_t);
+_PRINT_FUNC(uint64, uint64_t);
_PRINT_FUNC(hu, unsigned short);
_PRINT_FUNC(hex, unsigned int);
_PRINT_FUNC(0xhex, unsigned int);
diff --git a/lib/json_print.c b/lib/json_print.c
index 6518ba98..12ee26df 100644
--- a/lib/json_print.c
+++ b/lib/json_print.c
@@ -117,8 +117,12 @@ void close_json_array(enum output_type type, const char *str)
} \
}
_PRINT_FUNC(int, int);
+_PRINT_FUNC(uint, unsigned int);
_PRINT_FUNC(hu, unsigned short);
-_PRINT_FUNC(uint, uint64_t);
+_PRINT_FUNC(int32, int32_t);
+_PRINT_FUNC(uint32, uint32_t);
+_PRINT_FUNC(int64, int64_t);
+_PRINT_FUNC(uint64, uint64_t);
_PRINT_FUNC(lluint, unsigned long long int);
_PRINT_FUNC(float, double);
#undef _PRINT_FUNC
--
2.14.3 (Apple Git-98)
^ permalink raw reply related
* [PATCH net-next] net: ethernet: ti: cpsw: enable vlan rx vlan offload
From: Grygorii Strashko @ 2018-03-15 20:15 UTC (permalink / raw)
To: David S. Miller, netdev, Sekhar Nori
Cc: linux-kernel, linux-omap, Grygorii Strashko
In VLAN_AWARE mode CPSW can insert a VLAN header encapsulation word on Host
port 0 egress (RX) before the packet data if the RX_VLAN_ENCAP bit is set in
the CPSW_CONTROL register. The VLAN header encapsulation word has the
following format:
 HDR_PKT_Priority bits 29-31 - Header Packet VLAN prio (Highest prio: 7)
 HDR_PKT_CFI      bit  28    - Header Packet VLAN CFI bit
 HDR_PKT_Vid      bits 16-27 - Header Packet VLAN ID
 PKT_Type         bits 8-9   - Packet Type. Indicates whether the packet is
                               VLAN-tagged, priority-tagged, or non-tagged.
                               00: VLAN-tagged packet
                               01: Reserved
                               10: Priority-tagged packet
                               11: Non-tagged packet
This feature can be used to implement RX VLAN offload in case of
VLAN-tagged packets and to insert a VLAN tag in case a non-tagged packet
was received on a port with PVID set. As per documentation, CPSW never
modifies packet data on Host egress (RX). As a result, without this
feature enabled, the Host port cannot properly receive packets which
entered the switch non-tagged through an external Port with PVID set
(when a non-tagged packet is forwarded from an external Port with PVID
set to another external Port, the packet is VLAN tagged properly).
Implementation details:
- on RX the driver checks CPDMA status bit RX_VLAN_ENCAP BIT(19) in the CPPI
descriptor to identify when the VLAN header encapsulation word is present;
- if PKT_Type = 0x01 or 0x02 then ignore the VLAN header encapsulation word
and pass the packet as is;
- if HDR_PKT_Vid = 0 then ignore the VLAN header encapsulation word and pass
the packet as is;
- in dual mac mode traffic is separated between ports using default port
vlans, which are not visible to the Host and so should not be reported.
Hence, check for default port vlans in dual mac mode and ignore the VLAN
header encapsulation word;
- otherwise fill the SKB with VLAN info using __vlan_hwaccel_put_tag();
- if PKT_Type = 0x00 (VLAN-tagged) then strip the VLAN header from the SKB.
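The decision steps above can be sketched in plain userspace C. This is an
illustration only: struct demo_encap and demo_decode_encap() are
hypothetical names, the encapsulation word is assumed to be already in host
byte order, a default_vid of 0 is taken to mean "not in dual mac mode", and
the CFI bit is ignored just as in the list above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum demo_pkt_type {
	DEMO_PKT_VLAN_TAG = 0,	/* 00: VLAN-tagged */
	DEMO_PKT_RESERVED = 1,	/* 01: reserved */
	DEMO_PKT_PRIO_TAG = 2,	/* 10: priority-tagged */
	DEMO_PKT_UNTAG    = 3,	/* 11: non-tagged */
};

struct demo_encap {
	bool report;	/* a VLAN tag should be reported to the stack */
	bool strip;	/* the in-band VLAN header should be removed */
	uint16_t tci;	/* (prio << 13) | vid, valid when report is set */
};

static struct demo_encap demo_decode_encap(uint32_t hdr, uint16_t default_vid)
{
	struct demo_encap out = { false, false, 0 };
	unsigned int type = (hdr >> 8) & 0x3;	/* PKT_Type, bits 8-9 */
	uint16_t vid = (hdr >> 16) & 0x0fff;	/* HDR_PKT_Vid, bits 16-27 */
	uint16_t prio = (hdr >> 29) & 0x7;	/* HDR_PKT_Priority, bits 29-31 */

	/* Reserved and priority-tagged packets pass through untouched. */
	if (type == DEMO_PKT_RESERVED || type == DEMO_PKT_PRIO_TAG)
		return out;
	/* VID 0 carries no VLAN membership. */
	if (vid == 0)
		return out;
	/* Default port VLANs in dual mac mode stay hidden from the host. */
	if (default_vid != 0 && vid == default_vid)
		return out;

	out.report = true;
	out.strip = (type == DEMO_PKT_VLAN_TAG);
	out.tci = (uint16_t)((prio << 13) | vid);
	return out;
}
```

For example, a word with priority 5, VID 100 and PKT_Type 00 yields a
reported TCI of 0xA064 and requests stripping of the in-band header, while
the same VID with PKT_Type 11 is reported without stripping.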
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
---
drivers/net/ethernet/ti/cpsw.c | 67 +++++++++++++++++++++++++++++++--
drivers/net/ethernet/ti/davinci_cpdma.c | 2 +-
drivers/net/ethernet/ti/davinci_cpdma.h | 2 +
3 files changed, 67 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 1b1b78f..8af8891 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -120,14 +120,18 @@ do { \
#define CPDMA_RXCP 0x60
#define CPSW_POLL_WEIGHT 64
+#define CPSW_RX_VLAN_ENCAP_HDR_SIZE 4
#define CPSW_MIN_PACKET_SIZE (VLAN_ETH_ZLEN)
-#define CPSW_MAX_PACKET_SIZE (VLAN_ETH_FRAME_LEN + ETH_FCS_LEN)
+#define CPSW_MAX_PACKET_SIZE (VLAN_ETH_FRAME_LEN +\
+ ETH_FCS_LEN +\
+ CPSW_RX_VLAN_ENCAP_HDR_SIZE)
#define RX_PRIORITY_MAPPING 0x76543210
#define TX_PRIORITY_MAPPING 0x33221100
#define CPDMA_TX_PRIORITY_MAP 0x01234567
#define CPSW_VLAN_AWARE BIT(1)
+#define CPSW_RX_VLAN_ENCAP BIT(2)
#define CPSW_ALE_VLAN_AWARE 1
#define CPSW_FIFO_NORMAL_MODE (0 << 16)
@@ -148,6 +152,18 @@ do { \
#define CPSW_MAX_QUEUES 8
#define CPSW_CPDMA_DESCS_POOL_SIZE_DEFAULT 256
+#define CPSW_RX_VLAN_ENCAP_HDR_PRIO_SHIFT 29
+#define CPSW_RX_VLAN_ENCAP_HDR_PRIO_MSK GENMASK(2, 0)
+#define CPSW_RX_VLAN_ENCAP_HDR_VID_SHIFT 16
+#define CPSW_RX_VLAN_ENCAP_HDR_PKT_TYPE_SHIFT 8
+#define CPSW_RX_VLAN_ENCAP_HDR_PKT_TYPE_MSK GENMASK(1, 0)
+enum {
+ CPSW_RX_VLAN_ENCAP_HDR_PKT_VLAN_TAG = 0,
+ CPSW_RX_VLAN_ENCAP_HDR_PKT_RESERV,
+ CPSW_RX_VLAN_ENCAP_HDR_PKT_PRIO_TAG,
+ CPSW_RX_VLAN_ENCAP_HDR_PKT_UNTAG,
+};
+
static int debug_level;
module_param(debug_level, int, 0);
MODULE_PARM_DESC(debug_level, "cpsw debug level (NETIF_MSG bits)");
@@ -718,6 +734,49 @@ static void cpsw_tx_handler(void *token, int len, int status)
dev_kfree_skb_any(skb);
}
+static void cpsw_rx_vlan_encap(struct sk_buff *skb)
+{
+ struct cpsw_priv *priv = netdev_priv(skb->dev);
+ struct cpsw_common *cpsw = priv->cpsw;
+ u32 rx_vlan_encap_hdr = *((u32 *)skb->data);
+ u16 vtag, vid, prio, pkt_type;
+
+ /* Remove VLAN header encapsulation word */
+ skb_pull(skb, CPSW_RX_VLAN_ENCAP_HDR_SIZE);
+
+ pkt_type = (rx_vlan_encap_hdr >>
+ CPSW_RX_VLAN_ENCAP_HDR_PKT_TYPE_SHIFT) &
+ CPSW_RX_VLAN_ENCAP_HDR_PKT_TYPE_MSK;
+ /* Ignore unknown & Priority-tagged packets*/
+ if (pkt_type == CPSW_RX_VLAN_ENCAP_HDR_PKT_RESERV ||
+ pkt_type == CPSW_RX_VLAN_ENCAP_HDR_PKT_PRIO_TAG)
+ return;
+
+ vid = (rx_vlan_encap_hdr >>
+ CPSW_RX_VLAN_ENCAP_HDR_VID_SHIFT) &
+ VLAN_VID_MASK;
+ /* Ignore vid 0 and pass packet as is */
+ if (!vid)
+ return;
+ /* Ignore default vlans in dual mac mode */
+ if (cpsw->data.dual_emac &&
+ vid == cpsw->slaves[priv->emac_port].port_vlan)
+ return;
+
+ prio = (rx_vlan_encap_hdr >>
+ CPSW_RX_VLAN_ENCAP_HDR_PRIO_SHIFT) &
+ CPSW_RX_VLAN_ENCAP_HDR_PRIO_MSK;
+
+ vtag = (prio << VLAN_PRIO_SHIFT) | vid;
+ __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vtag);
+
+ /* strip vlan tag for VLAN-tagged packet */
+ if (pkt_type == CPSW_RX_VLAN_ENCAP_HDR_PKT_VLAN_TAG) {
+ memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN);
+ skb_pull(skb, VLAN_HLEN);
+ }
+}
+
static void cpsw_rx_handler(void *token, int len, int status)
{
struct cpdma_chan *ch;
@@ -752,6 +811,8 @@ static void cpsw_rx_handler(void *token, int len, int status)
if (new_skb) {
skb_copy_queue_mapping(new_skb, skb);
skb_put(skb, len);
+ if (status & CPDMA_RX_VLAN_ENCAP)
+ cpsw_rx_vlan_encap(skb);
cpts_rx_timestamp(cpsw->cpts, skb);
skb->protocol = eth_type_trans(skb, ndev);
netif_receive_skb(skb);
@@ -1406,7 +1467,7 @@ static void cpsw_init_host_port(struct cpsw_priv *priv)
cpsw_ale_control_set(cpsw->ale, HOST_PORT_NUM, ALE_VLAN_AWARE,
CPSW_ALE_VLAN_AWARE);
control_reg = readl(&cpsw->regs->control);
- control_reg |= CPSW_VLAN_AWARE;
+ control_reg |= CPSW_VLAN_AWARE | CPSW_RX_VLAN_ENCAP;
writel(control_reg, &cpsw->regs->control);
fifo_mode = (cpsw->data.dual_emac) ? CPSW_FIFO_DUAL_MAC_MODE :
CPSW_FIFO_NORMAL_MODE;
@@ -3122,7 +3183,7 @@ static int cpsw_probe(struct platform_device *pdev)
cpsw->quirk_irq = true;
}
- ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
+ ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_CTAG_RX;
ndev->netdev_ops = &cpsw_netdev_ops;
ndev->ethtool_ops = &cpsw_ethtool_ops;
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index 6f9173f..31ae041 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -1164,7 +1164,7 @@ static int __cpdma_chan_process(struct cpdma_chan *chan)
outlen -= CPDMA_DESC_CRC_LEN;
status = status & (CPDMA_DESC_EOQ | CPDMA_DESC_TD_COMPLETE |
- CPDMA_DESC_PORT_MASK);
+ CPDMA_DESC_PORT_MASK | CPDMA_RX_VLAN_ENCAP);
chan->head = desc_from_phys(pool, desc_read(desc, hw_next));
chan_write(chan, cp, desc_dma);
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.h b/drivers/net/ethernet/ti/davinci_cpdma.h
index fd65ce2..d399af5 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.h
+++ b/drivers/net/ethernet/ti/davinci_cpdma.h
@@ -19,6 +19,8 @@
#define CPDMA_RX_SOURCE_PORT(__status__) ((__status__ >> 16) & 0x7)
+#define CPDMA_RX_VLAN_ENCAP BIT(19)
+
#define CPDMA_EOI_RX_THRESH 0x0
#define CPDMA_EOI_RX 0x1
#define CPDMA_EOI_TX 0x2
--
2.10.5
^ permalink raw reply related
* Re: [bpf-next PATCH v2 15/18] bpf: sockmap sample support for bpf_msg_cork_bytes()
From: Alexei Starovoitov @ 2018-03-15 20:15 UTC (permalink / raw)
To: John Fastabend; +Cc: davem, ast, daniel, davejwatson, netdev
In-Reply-To: <20180312192421.8039.90630.stgit@john-Precision-Tower-5810>
On Mon, Mar 12, 2018 at 12:24:21PM -0700, John Fastabend wrote:
> Add sample application support for the bpf_msg_cork_bytes helper. This
> lets the user specify how many bytes each verdict should apply to.
>
> Similar to apply_bytes() tests these can be run as a stand-alone test
> when used without other options or inline with other tests by using
> the txmsg_cork option along with any of the basic tests txmsg,
> txmsg_redir, txmsg_drop.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
> include/uapi/linux/bpf_common.h | 7 ++--
> samples/sockmap/sockmap_kern.c | 53 +++++++++++++++++++++++++----
> samples/sockmap/sockmap_user.c | 19 ++++++++++
> tools/include/uapi/linux/bpf.h | 3 +-
> tools/testing/selftests/bpf/bpf_helpers.h | 2 +
> 5 files changed, 71 insertions(+), 13 deletions(-)
>
> diff --git a/include/uapi/linux/bpf_common.h b/include/uapi/linux/bpf_common.h
> index ee97668..18be907 100644
> --- a/include/uapi/linux/bpf_common.h
> +++ b/include/uapi/linux/bpf_common.h
> @@ -15,10 +15,9 @@
>
> /* ld/ldx fields */
> #define BPF_SIZE(code) ((code) & 0x18)
> -#define BPF_W 0x00 /* 32-bit */
> -#define BPF_H 0x08 /* 16-bit */
> -#define BPF_B 0x10 /* 8-bit */
> -/* eBPF BPF_DW 0x18 64-bit */
> +#define BPF_W 0x00
> +#define BPF_H 0x08
> +#define BPF_B 0x10
this hunk seems wrong here. Botched rebase?
^ permalink raw reply