* [PATCH net-next 1/2] fs/crashdd: add API to collect hardware dump in second kernel
From: Rahul Lakkireddy @ 2018-03-23 8:31 UTC (permalink / raw)
To: netdev-u79uwXL29TY76Z2rM5mHXA,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: indranil-ut6Up61K2wZBDgjK7y7TUQ, nirranjan-ut6Up61K2wZBDgjK7y7TUQ,
stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
ganeshgr-ut6Up61K2wZBDgjK7y7TUQ, Rahul Lakkireddy,
ebiederm-aS9lmoZGLiVWk0Htik3J/w,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
davem-fT/PcQaiUtIeIZ0/mPfg9Q,
viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn
In-Reply-To: <cover.1521793455.git.rahul.lakkireddy-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Add a new module crashdd that exports the /sys/kernel/crashdd/
directory in second kernel, containing collected hardware/firmware
dumps.
The sequence of actions done by device drivers to append their device
specific hardware/firmware logs to /sys/kernel/crashdd/ directory are
as follows:
1. During probe (before hardware is initialized), device drivers
register to the crashdd module (via crashdd_add_dump()), with
callback function, along with buffer size and log name needed for
firmware/hardware log collection.
2. Crashdd creates a driver's directory under
/sys/kernel/crashdd/<driver>. Then, it allocates the buffer with
requested size and invokes the device driver's registered callback
function.
3. Device driver collects all hardware/firmware logs into the buffer
and returns control back to crashdd.
4. Crashdd exposes the buffer as a binary file via
/sys/kernel/crashdd/<driver>/<dump_file>.
Suggested-by: Eric Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>.
Suggested-by: Stephen Hemminger <stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Signed-off-by: Ganesh Goudar <ganeshgr-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
---
Changes since rfc v2:
- Moved exporting crashdd from procfs to sysfs. Suggested by
Stephen Hemminger <stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org>
- Moved code from fs/proc/crashdd.c to fs/crashdd/ directory.
- Replaced all proc API with sysfs API and updated comments.
- Calling driver callback before creating the binary file under
crashdd sysfs.
- Changed binary dump file permission from S_IRUSR to S_IRUGO.
- Changed module name from CRASH_DRIVER_DUMP to CRASH_DEVICE_DUMP.
rfc v2:
- Collecting logs in 2nd kernel instead of during kernel panic.
Suggested by Eric Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>.
- Patch added in this series.
fs/Kconfig | 1 +
fs/Makefile | 1 +
fs/crashdd/Kconfig | 10 ++
fs/crashdd/Makefile | 3 +
fs/crashdd/crashdd.c | 234 ++++++++++++++++++++++++++++++++++++++++++
fs/crashdd/crashdd_internal.h | 24 +++++
include/linux/crashdd.h | 24 +++++
7 files changed, 297 insertions(+)
create mode 100644 fs/crashdd/Kconfig
create mode 100644 fs/crashdd/Makefile
create mode 100644 fs/crashdd/crashdd.c
create mode 100644 fs/crashdd/crashdd_internal.h
create mode 100644 include/linux/crashdd.h
diff --git a/fs/Kconfig b/fs/Kconfig
index bc821a86d965..aae1c55a7dad 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -208,6 +208,7 @@ config ARCH_HAS_GIGANTIC_PAGE
source "fs/configfs/Kconfig"
source "fs/efivarfs/Kconfig"
+source "fs/crashdd/Kconfig"
endmenu
diff --git a/fs/Makefile b/fs/Makefile
index add789ea270a..ff398a44f611 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -128,3 +128,4 @@ obj-y += exofs/ # Multiple modules
obj-$(CONFIG_CEPH_FS) += ceph/
obj-$(CONFIG_PSTORE) += pstore/
obj-$(CONFIG_EFIVAR_FS) += efivarfs/
+obj-$(CONFIG_CRASH_DEVICE_DUMP) += crashdd/
diff --git a/fs/crashdd/Kconfig b/fs/crashdd/Kconfig
new file mode 100644
index 000000000000..5db9c7c98c17
--- /dev/null
+++ b/fs/crashdd/Kconfig
@@ -0,0 +1,10 @@
+config CRASH_DEVICE_DUMP
+ bool "Crash Kernel Device Hardware/Firmware Logs"
+ depends on SYSFS && CRASH_DUMP
+ default y
+ ---help---
+ Device drivers can collect the device specific snapshot of
+ their hardware or firmware before they are initialized in
+ crash recovery kernel. If you say Y here a tree of device
+ specific dumps will be made available under /sys/kernel/crashdd/
+ directory.
diff --git a/fs/crashdd/Makefile b/fs/crashdd/Makefile
new file mode 100644
index 000000000000..8dbf946c0ea4
--- /dev/null
+++ b/fs/crashdd/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-y := crashdd.o
diff --git a/fs/crashdd/crashdd.c b/fs/crashdd/crashdd.c
new file mode 100644
index 000000000000..73882ff7722e
--- /dev/null
+++ b/fs/crashdd/crashdd.c
@@ -0,0 +1,234 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2018 Chelsio Communications, Inc. All rights reserved. */
+
+#include <linux/vmalloc.h>
+#include <linux/crash_dump.h>
+#include <linux/crashdd.h>
+
+#include "crashdd_internal.h"
+
+static LIST_HEAD(crashdd_list);
+static DEFINE_MUTEX(crashdd_mutex);
+
+#define CRASHDD_SYSFS_MODE 444 /* S_IRUGO */
+static struct kobject *crashdd_kobj;
+
+static ssize_t crashdd_read(struct file *filp, struct kobject *kobj,
+ struct bin_attribute *bin_attr,
+ char *buf, loff_t fpos, size_t count)
+{
+ struct crashdd_dump_node *dump = bin_attr->private;
+
+ memcpy(buf, dump->buf + fpos, count);
+ return count;
+}
+
+static struct kobject *crashdd_mkdir(const char *name)
+{
+ return kobject_create_and_add(name, crashdd_kobj);
+}
+
+static int crashdd_add_file(struct kobject *kobj, const char *name,
+ struct crashdd_dump_node *dump)
+{
+ dump->bin_attr.attr.name = name;
+ dump->bin_attr.attr.mode = CRASHDD_SYSFS_MODE;
+ dump->bin_attr.size = dump->size;
+ dump->bin_attr.read = crashdd_read;
+ dump->bin_attr.private = dump;
+
+ return sysfs_create_bin_file(kobj, &dump->bin_attr);
+}
+
+static void crashdd_rmdir(struct kobject *kobj)
+{
+ kobject_put(kobj);
+}
+
+/**
+ * crashdd_init_driver - create a sysfs driver entry.
+ * @name: Name of the directory.
+ *
+ * Creates a directory under /sys/kernel/crashdd/ with @name. Allocates
+ * and saves the sysfs entry. The sysfs entry is added to the global
+ * list and then returned to the caller. On failure, returns NULL.
+ */
+static struct crashdd_driver_node *crashdd_init_driver(const char *name)
+{
+ struct crashdd_driver_node *node;
+
+ node = vzalloc(sizeof(*node));
+ if (!node)
+ return NULL;
+
+ /* Create a driver's directory under /sys/kernel/crashdd/ */
+ node->kobj = crashdd_mkdir(name);
+ if (!node->kobj) {
+ vfree(node);
+ return NULL;
+ }
+
+ atomic_set(&node->refcnt, 1);
+
+ /* Initialize the list of dumps that go under this driver's
+ * directory.
+ */
+ INIT_LIST_HEAD(&node->dump_list);
+
+ /* Add the driver's entry to global list */
+ mutex_lock(&crashdd_mutex);
+ list_add_tail(&node->list, &crashdd_list);
+ mutex_unlock(&crashdd_mutex);
+
+ return node;
+}
+
+/**
+ * crashdd_get_driver - get an exisiting sysfs driver entry.
+ * @name: Name of the directory.
+ *
+ * Searches and fetches a sysfs entry having @name. If @name is
+ * found, then the reference count is incremented and the entry
+ * is returned. If @name is not found, NULL is returned.
+ */
+static struct crashdd_driver_node *crashdd_get_driver(const char *name)
+{
+ struct crashdd_driver_node *node;
+ int found = 0;
+
+ /* Search for an existing driver sysfs entry having @name */
+ mutex_lock(&crashdd_mutex);
+ list_for_each_entry(node, &crashdd_list, list) {
+ if (!strcmp(node->kobj->name, name)) {
+ atomic_inc(&node->refcnt);
+ found = 1;
+ break;
+ }
+ }
+ mutex_unlock(&crashdd_mutex);
+
+ if (found)
+ return node;
+
+ /* No driver with @name found */
+ return NULL;
+}
+
+/**
+ * crashdd_put_driver - put an exisiting sysfs driver entry.
+ * @node: driver sysfs entry.
+ *
+ * Decrement @node reference count. If there are no dumps left under it,
+ * delete the sysfs directory and remove it from the global list.
+ */
+static void crashdd_put_driver(struct crashdd_driver_node *node)
+{
+ mutex_lock(&crashdd_mutex);
+ if (atomic_dec_and_test(&node->refcnt)) {
+ /* Delete @node driver entry if it has no dumps under it */
+ crashdd_rmdir(node->kobj);
+ list_del(&node->list);
+ }
+ mutex_unlock(&crashdd_mutex);
+}
+
+/**
+ * crashdd_add_dump - Allocate a directory under /sys/kernel/crashdd/ and
+ * add the dump to it.
+ * @driver_name: directory name under which the dump should be added.
+ * @data: dump info.
+ *
+ * Search for /sys/kernel/crashdd/<@driver_name>/ directory. If not found,
+ * allocate a new directory under /sys/kernel/crashdd/ with @driver_name.
+ * Allocate the dump file's context and invoke the calling driver's dump
+ * collect routine. Once collection is done, add the dump under
+ * /sys/kernel/crashdd/<@driver_name>/ directory.
+ */
+int crashdd_add_dump(const char *driver_name, struct crashdd_data *data)
+{
+ struct crashdd_driver_node *node;
+ struct crashdd_dump_node *dump;
+ void *buf = NULL;
+ int ret;
+
+ if (!driver_name || !strlen(driver_name) ||
+ !data || !strlen(data->name) ||
+ !data->crashdd_callback || !data->size)
+ return -EINVAL;
+
+ /* Get a driver sysfs entry with specified name. */
+ node = crashdd_get_driver(driver_name);
+ if (!node) {
+ /* No driver sysfs entry found with specified name.
+ * So create a new one
+ */
+ node = crashdd_init_driver(driver_name);
+ if (!node)
+ return -ENOMEM;
+ }
+
+ dump = vzalloc(sizeof(*dump));
+ if (!dump) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+
+ /* Allocate buffer for driver's to write their dumps */
+ buf = vzalloc(data->size);
+ if (!buf) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+
+ /* Invoke the driver's dump collection routing */
+ ret = data->crashdd_callback(data, buf);
+ if (ret)
+ goto out_err;
+
+ dump->buf = buf;
+ dump->size = data->size;
+
+ /* Add a binary file under /sys/kernel/crashdd/@driver_name/ */
+ ret = crashdd_add_file(node->kobj, data->name, dump);
+ if (ret)
+ goto out_err;
+
+ /* Add the dump to driver sysfs list */
+ mutex_lock(&crashdd_mutex);
+ list_add_tail(&dump->list, &node->dump_list);
+ atomic_inc(&node->refcnt);
+ mutex_unlock(&crashdd_mutex);
+
+ /* Return back the driver sysfs reference */
+ crashdd_put_driver(node);
+ return 0;
+
+out_err:
+ if (buf)
+ vfree(buf);
+
+ if (dump)
+ vfree(dump);
+
+ crashdd_put_driver(node);
+ return ret;
+}
+EXPORT_SYMBOL(crashdd_add_dump);
+
+/* Init function for crash device dump module. */
+static int __init crashdd_init(void)
+{
+ /*
+ * Only export this directory in 2nd kernel.
+ */
+ if (!is_kdump_kernel())
+ return 0;
+
+ /* Create /sys/kernel/crashdd/ directory */
+ crashdd_kobj = kobject_create_and_add("crashdd", kernel_kobj);
+ if (!crashdd_kobj)
+ return -ENOMEM;
+
+ return 0;
+}
+fs_initcall(crashdd_init);
diff --git a/fs/crashdd/crashdd_internal.h b/fs/crashdd/crashdd_internal.h
new file mode 100644
index 000000000000..9162d1a4264b
--- /dev/null
+++ b/fs/crashdd/crashdd_internal.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2018 Chelsio Communications, Inc. All rights reserved. */
+
+#ifndef CRASH_DEVICE_DUMP_INTERNAL_H
+#define CRASH_DEVICE_DUMP_INTERNAL_H
+
+/* Binary dump file's context internal to crashdd */
+struct crashdd_dump_node {
+ /* Pointer to list of dumps under the driver sysfs entry */
+ struct list_head list;
+ void *buf; /* Buffer containing device's dump */
+ unsigned long size; /* Size of the buffer */
+ struct bin_attribute bin_attr; /* Binary dump file's attributes */
+};
+
+/* Driver sysfs entry internal to crashdd */
+struct crashdd_driver_node {
+ /* Pointer to global list of driver sysfs entries */
+ struct list_head list;
+ struct list_head dump_list; /* List of dumps under this driver */
+ atomic_t refcnt; /* Number of dumps under this directory */
+ struct kobject *kobj; /* Pointer to driver sysfs kobject */
+};
+#endif /* CRASH_DEVICE_DUMP_INTERNAL_H */
diff --git a/include/linux/crashdd.h b/include/linux/crashdd.h
new file mode 100644
index 000000000000..edaba8424019
--- /dev/null
+++ b/include/linux/crashdd.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2018 Chelsio Communications, Inc. All rights reserved. */
+
+#ifndef CRASH_DEVICE_DUMP_H
+#define CRASH_DEVICE_DUMP_H
+
+/* Max dump name length */
+#define CRASHDD_NAME_LENGTH 32
+
+/* Device Dump information to be filled by drivers */
+struct crashdd_data {
+ char name[CRASHDD_NAME_LENGTH]; /* Unique name of the dump */
+ unsigned long size; /* Size of the dump */
+ /* Driver's registered callback to be invoked to collect dump */
+ int (*crashdd_callback)(struct crashdd_data *data, void *buf);
+};
+
+#ifdef CONFIG_CRASH_DEVICE_DUMP
+int crashdd_add_dump(const char *driver_name, struct crashdd_data *data);
+#else
+#define crashdd_add_dump(x, y) 0
+#endif /* CONFIG_CRASH_DEVICE_DUMP */
+
+#endif /* CRASH_DEVICE_DUMP_H */
--
2.14.1
^ permalink raw reply related
* [PATCH net-next 2/2] cxgb4: collect hardware dump in second kernel
From: Rahul Lakkireddy @ 2018-03-23 8:31 UTC (permalink / raw)
To: netdev-u79uwXL29TY76Z2rM5mHXA,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: indranil-ut6Up61K2wZBDgjK7y7TUQ, nirranjan-ut6Up61K2wZBDgjK7y7TUQ,
stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
ganeshgr-ut6Up61K2wZBDgjK7y7TUQ, Rahul Lakkireddy,
ebiederm-aS9lmoZGLiVWk0Htik3J/w,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
davem-fT/PcQaiUtIeIZ0/mPfg9Q,
viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn
In-Reply-To: <cover.1521793455.git.rahul.lakkireddy-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Register callback to collect hardware/firmware dumps in second kernel
before hardware/firmware is initialized. The dumps for each device
will be available under /sys/kernel/crashdd/cxgb4/ directory in second
kernel.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Signed-off-by: Ganesh Goudar <ganeshgr-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
---
Changes since rfc v2:
- Update comments and commit message for sysfs change.
rfc v2:
- Updated dump registration to the new API in patch 1.
drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 4 ++++
drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c | 25 ++++++++++++++++++++++++
drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h | 3 +++
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 12 ++++++++++++
4 files changed, 44 insertions(+)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 688f95440af2..79a123bfb628 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -50,6 +50,7 @@
#include <linux/net_tstamp.h>
#include <linux/ptp_clock_kernel.h>
#include <linux/ptp_classify.h>
+#include <linux/crashdd.h>
#include <asm/io.h>
#include "t4_chip_type.h"
#include "cxgb4_uld.h"
@@ -964,6 +965,9 @@ struct adapter {
struct hma_data hma;
struct srq_data *srq;
+
+ /* Dump buffer for collecting logs in kdump kernel */
+ struct crashdd_data crashdd_buf;
};
/* Support for "sched-class" command to allow a TX Scheduling Class to be
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
index 143686c60234..ce9f544781af 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
@@ -488,3 +488,28 @@ void cxgb4_init_ethtool_dump(struct adapter *adapter)
adapter->eth_dump.version = adapter->params.fw_vers;
adapter->eth_dump.len = 0;
}
+
+static int cxgb4_cudbg_crashdd_collect(struct crashdd_data *data, void *buf)
+{
+ struct adapter *adap = container_of(data, struct adapter, crashdd_buf);
+ u32 len = data->size;
+
+ return cxgb4_cudbg_collect(adap, buf, &len, CXGB4_ETH_DUMP_ALL);
+}
+
+int cxgb4_cudbg_crashdd_add_dump(struct adapter *adap)
+{
+ struct crashdd_data *data = &adap->crashdd_buf;
+ u32 len;
+
+ len = sizeof(struct cudbg_hdr) +
+ sizeof(struct cudbg_entity_hdr) * CUDBG_MAX_ENTITY;
+ len += CUDBG_DUMP_BUFF_SIZE;
+
+ data->size = len;
+ snprintf(data->name, sizeof(data->name), "%s_%s", cxgb4_driver_name,
+ adap->name);
+ data->crashdd_callback = cxgb4_cudbg_crashdd_collect;
+
+ return crashdd_add_dump(cxgb4_driver_name, data);
+}
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h
index ce1ac9a1c878..095c6f04357e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h
@@ -41,8 +41,11 @@ enum CXGB4_ETHTOOL_DUMP_FLAGS {
CXGB4_ETH_DUMP_HW = (1 << 1), /* various FW and HW dumps */
};
+#define CXGB4_ETH_DUMP_ALL (CXGB4_ETH_DUMP_MEM | CXGB4_ETH_DUMP_HW)
+
u32 cxgb4_get_dump_length(struct adapter *adap, u32 flag);
int cxgb4_cudbg_collect(struct adapter *adap, void *buf, u32 *buf_size,
u32 flag);
void cxgb4_init_ethtool_dump(struct adapter *adapter);
+int cxgb4_cudbg_crashdd_add_dump(struct adapter *adap);
#endif /* __CXGB4_CUDBG_H__ */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 99c9b88d6d34..c4e0e4705cda 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -5526,6 +5526,18 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
if (err)
goto out_free_adapter;
+ if (is_kdump_kernel()) {
+ /* Collect hardware state and append to
+ * /sys/kernel/crashdd/cxgb4/ directory
+ */
+ err = cxgb4_cudbg_crashdd_add_dump(adapter);
+ if (err) {
+ dev_warn(adapter->pdev_dev,
+ "Fail collecting crash device dump, err: %d. Continuing\n",
+ err);
+ err = 0;
+ }
+ }
if (!is_t4(adapter->params.chip)) {
s_qpp = (QUEUESPERPAGEPF0_S +
--
2.14.1
^ permalink raw reply related
* [PATCH iproute2] ss: Fix rendering of continuous output (-E, --events)
From: Stefano Brivio @ 2018-03-23 8:37 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Roman Mashak, netdev
Roman Mashak reported that ss currently shows no output when it
should continuously report information about terminated sockets
(-E, --events switch).
This happens because I missed this case in 691bd854bf4a ("ss:
Buffer raw fields first, then render them as a table") and the
rendering function is simply not called.
To fix this, we need to:
- call render() every time we need to display new socket events
from generic_show_sock(), which is only used to follow events.
Always call it even if specific socket display functions
return errors to ensure we clean up buffers
- get the screen width every time we have new events to display,
thus factor out getting the screen width from main() into a
function we'll call whenever we calculate columns width
- reset the current field pointer after rendering, more output
might come after render() is called
Reported-by: Roman Mashak <mrv@mojatatu.com>
Fixes: 691bd854bf4a ("ss: Buffer raw fields first, then render them as a table")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
misc/ss.c | 61 ++++++++++++++++++++++++++++++++++++++++---------------------
1 file changed, 40 insertions(+), 21 deletions(-)
diff --git a/misc/ss.c b/misc/ss.c
index e087bef739b0..6338820bf4a0 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -1106,15 +1106,33 @@ static void buf_free_all(void)
buffer.head = NULL;
}
+/* Get current screen width, default to 80 columns if TIOCGWINSZ fails */
+static int render_screen_width(void)
+{
+ int width = 80;
+
+ if (isatty(STDOUT_FILENO)) {
+ struct winsize w;
+
+ if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &w) != -1) {
+ if (w.ws_col > 0)
+ width = w.ws_col;
+ }
+ }
+
+ return width;
+}
+
/* Calculate column width from contents length. If columns don't fit on one
* line, break them into the least possible amount of lines and keep them
* aligned across lines. Available screen space is equally spread between fields
* as additional spacing.
*/
-static void render_calc_width(int screen_width)
+static void render_calc_width(void)
{
- int first, len = 0, linecols = 0;
+ int screen_width = render_screen_width();
struct column *c, *eol = columns - 1;
+ int first, len = 0, linecols = 0;
/* First pass: set width for each column to measured content length */
for (first = 1, c = columns; c - columns < COL_MAX; c++) {
@@ -1195,7 +1213,7 @@ newline:
}
/* Render buffered output with spacing and delimiters, then free up buffers */
-static void render(int screen_width)
+static void render(void)
{
struct buf_token *token;
int printed, line_started = 0;
@@ -1209,7 +1227,7 @@ static void render(int screen_width)
/* Ensure end alignment of last token, it wasn't necessarily flushed */
buffer.tail->end += buffer.cur->len % 2;
- render_calc_width(screen_width);
+ render_calc_width();
/* Rewind and replay */
buffer.tail = buffer.head;
@@ -1245,6 +1263,7 @@ static void render(int screen_width)
}
buf_free_all();
+ current_field = columns;
}
static void sock_state_print(struct sockstat *s)
@@ -4264,23 +4283,33 @@ static int generic_show_sock(const struct sockaddr_nl *addr,
{
struct sock_diag_msg *r = NLMSG_DATA(nlh);
struct inet_diag_arg inet_arg = { .f = arg, .protocol = IPPROTO_MAX };
+ int ret;
switch (r->sdiag_family) {
case AF_INET:
case AF_INET6:
inet_arg.rth = inet_arg.f->rth_for_killing;
- return show_one_inet_sock(addr, nlh, &inet_arg);
+ ret = show_one_inet_sock(addr, nlh, &inet_arg);
+ break;
case AF_UNIX:
- return unix_show_sock(addr, nlh, arg);
+ ret = unix_show_sock(addr, nlh, arg);
+ break;
case AF_PACKET:
- return packet_show_sock(addr, nlh, arg);
+ ret = packet_show_sock(addr, nlh, arg);
+ break;
case AF_NETLINK:
- return netlink_show_sock(addr, nlh, arg);
+ ret = netlink_show_sock(addr, nlh, arg);
+ break;
case AF_VSOCK:
- return vsock_show_sock(addr, nlh, arg);
+ ret = vsock_show_sock(addr, nlh, arg);
+ break;
default:
- return -1;
+ ret = -1;
}
+
+ render();
+
+ return ret;
}
static int handle_follow_request(struct filter *f)
@@ -4647,7 +4676,6 @@ int main(int argc, char *argv[])
FILE *filter_fp = NULL;
int ch;
int state_filter = 0;
- int screen_width = 80;
while ((ch = getopt_long(argc, argv,
"dhaletuwxnro460spbEf:miA:D:F:vVzZN:KHS",
@@ -4949,15 +4977,6 @@ int main(int argc, char *argv[])
if (!(current_filter.states & (current_filter.states - 1)))
columns[COL_STATE].disabled = 1;
- if (isatty(STDOUT_FILENO)) {
- struct winsize w;
-
- if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &w) != -1) {
- if (w.ws_col > 0)
- screen_width = w.ws_col;
- }
- }
-
if (show_header)
print_header();
@@ -4988,7 +5007,7 @@ int main(int argc, char *argv[])
if (show_users || show_proc_ctx || show_sock_ctx)
user_ent_destroy();
- render(screen_width);
+ render();
return 0;
}
--
2.15.1
^ permalink raw reply related
* Re: [RFC v3 net-next 13/18] net/sched: Introduce the TBS Qdisc
From: Thomas Gleixner @ 2018-03-23 8:49 UTC (permalink / raw)
To: Jesus Sanchez-Palencia
Cc: netdev, jhs, xiyou.wangcong, jiri, vinicius.gomes, richardcochran,
anna-maria, henrik, john.stultz, levi.pearson, edumazet, willemb,
mlichvar
In-Reply-To: <4a61fd66-445d-1273-b63b-0b8989a217b8@intel.com>
On Thu, 22 Mar 2018, Jesus Sanchez-Palencia wrote:
> On 03/22/2018 03:11 PM, Thomas Gleixner wrote:
> So, are you just opposing to the case where sorting off + offload off is used?
> (i.e. the scheduled FIFO case)
FIFO does not make any sense if your packets have a fixed transmission
time. I yet have to see a reasonable explanation why FIFO in the context of
time ordered would be a good thing.
Thanks,
tglx
^ permalink raw reply
* Re: [RFC v3 net-next 14/18] net/sched: Add HW offloading capability to TBS
From: Thomas Gleixner @ 2018-03-23 8:51 UTC (permalink / raw)
To: Jesus Sanchez-Palencia
Cc: netdev, jhs, xiyou.wangcong, jiri, vinicius.gomes, richardcochran,
intel-wired-lan, anna-maria, henrik, john.stultz, levi.pearson,
edumazet, willemb, mlichvar
In-Reply-To: <8b6e4036-2357-3f7c-f72d-b6594da32348@intel.com>
On Thu, 22 Mar 2018, Jesus Sanchez-Palencia wrote:
> On 03/21/2018 07:22 AM, Thomas Gleixner wrote:
> > Bah, and probably there you need CLOCK_TAI because that's what PTP is based
> > on, so clock_system needs to accomodate that as well. Dammit, there goes
> > the simple 2 bits implementation. CLOCK_TAI is 11, so we'd need 4 clock
> > bits plus the adapter bit.
> >
> > Though we could spare a bit. The fixed CLOCK_* space goes from 0 to 15. I
> > don't see us adding new fixed clocks, so we really can reserve #15 for
> > selecting the adapter clock if sparing that extra bit is truly required.
>
>
> So what about just using the previous single 'clockid' argument, but then just
> adding to uapi time.h something like:
>
> #define DYNAMIC_CLOCKID 15
>
> And using it for that, instead. This way applications that will use the raw hw
> offload mode must use this value for their per-socket clockid, and the qdisc's
> clockid would be implicitly initialized to the same value.
That's what I suggested above.
Thanks,
tglx
^ permalink raw reply
* Re: [PATCH 1/2] bpf: Remove struct bpf_verifier_env argument from print_bpf_insn
From: Jiri Olsa @ 2018-03-23 9:09 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Quentin Monnet, Jiri Olsa, Alexei Starovoitov, lkml, netdev
In-Reply-To: <96093da6-9e0f-5d6b-59a4-91a744ad64cb@iogearbox.net>
On Thu, Mar 22, 2018 at 05:07:48PM +0100, Daniel Borkmann wrote:
SNIP
> >>> + va_end(args);
> >>> +}
> >>> EXPORT_SYMBOL_GPL(bpf_verifier_log_write);
> >>> +
> >>> +__printf(2, 3) static void print_ins(void *private_data,
> >>> + const char *fmt, ...)
> >>
> >> Unless I am mistaken, you could just call this one "verbose()" and
> >> simply remove the function alias?
> >
> > right you are ;-) I haven't realized that struct bpf_verifier_env *env
> > will cleanly cast to void *
> >
> > new version attached.. I'll make some tests and send full patch
>
> When you do so, please make sure to send a full fresh series with the two
> patches and also cover letter included, otherwise it's more fragile wrt
> potentially applying the wrong patch from one of the replies. :-)
will do, thanks
jirka
^ permalink raw reply
* [PATCH net-next RFC 0/5] ipv6: sr: introduce seg6local End.BPF action
From: Mathieu Xhonneux @ 2018-03-23 10:15 UTC (permalink / raw)
To: netdev; +Cc: dlebrun
As of Linux 4.14, it is possible to define advanced local processing for
IPv6 packets with a Segment Routing Header through the seg6local LWT
infrastructure. This LWT implements the network programming principles
defined in the IETF “SRv6 Network Programming” draft.
The implemented operations are generic, and it would be very interesting to
be able to implement user-specific seg6local actions, without having to
modify the kernel directly. To do so, this patchset adds an End.BPF action
to seg6local, powered by some specific Segment Routing-related helpers,
which provide SR functionalities that can be applied on the packet. This
BPF hook would then allow to implement specific actions at native kernel
speed such as OAM features, advanced SR SDN policies, SRv6 actions like
Segment Routing Header (SRH) encapsulation depending on the content of
the packet, etc ...
This patchset is divided in 5 patches, whose main features are :
- A new seg6local action End.BPF with the corresponding new BPF program
type BPF_PROG_TYPE_LWT_SEG6LOCAL. Such attached BPF program can be
passed to the LWT seg6local through netlink, the same way as the LWT
BPF hook operates.
- 3 new BPF helpers for the seg6local BPF hook, allowing to edit/grow/
shrink a SRH and apply on a packet some of the generic SRv6 actions.
- 1 new BPF helper for the LWT BPF IN hook, allowing to add a SRH through
encapsulation (via IPv6 encapsulation or inlining if the packet contains
already an IPv6 header).
As this patchset adds a new LWT BPF hook, I took into account the result of
the discussions when the LWT BPF infrastructure got merged. Hence, the
seg6local BPF hook doesn’t allow write access to skb->data directly, only
the SRH can be modified through specific helpers, which ensures that the
integrity of the packet is maintained.
More details are available in the related patches messages.
The performances of this BPF hook have been assessed with the BPF JIT
enabled on a Intel Xeon X3440 processors with 4 cores and 8 threads
clocked at 2.53 GHz. No throughput losses are noted with the seg6local
BPF hook when the BPF program does nothing (440kpps). Adding a 8-bytes
TLV (1 call each to bpf_lwt_seg6_adjust_srh and bpf_lwt_seg6_store_bytes)
drops the throughput to 410kpps, and inlining a SRH via
bpf_lwt_seg6_action drops the throughput to 420kpps.
All throughputs are stable.
Any comments on the patchset are welcome.
Thanks.
Mathieu Xhonneux (5):
ipv6: sr: export function lookup_nexthop
bpf: Add IPv6 Segment Routing helpers
bpf: Split lwt inout verifier structures
ipv6: sr: Add seg6local action End.BPF
selftests/bpf: test for seg6local End.BPF action
include/linux/bpf_types.h | 5 +-
include/net/seg6.h | 3 +-
include/net/seg6_local.h | 29 ++
include/uapi/linux/bpf.h | 57 ++-
include/uapi/linux/seg6_local.h | 3 +
kernel/bpf/verifier.c | 1 +
net/core/filter.c | 385 ++++++++++++++++---
net/ipv6/seg6_local.c | 170 ++++++++-
tools/include/uapi/linux/bpf.h | 58 ++-
tools/testing/selftests/bpf/Makefile | 5 +-
tools/testing/selftests/bpf/bpf_helpers.h | 12 +
tools/testing/selftests/bpf/test_lwt_seg6local.c | 438 ++++++++++++++++++++++
tools/testing/selftests/bpf/test_lwt_seg6local.sh | 140 +++++++
13 files changed, 1235 insertions(+), 71 deletions(-)
create mode 100644 include/net/seg6_local.h
create mode 100644 tools/testing/selftests/bpf/test_lwt_seg6local.c
create mode 100755 tools/testing/selftests/bpf/test_lwt_seg6local.sh
--
2.16.1
^ permalink raw reply
* [PATCH net-next RFC 1/5] ipv6: sr: export function lookup_nexthop
From: Mathieu Xhonneux @ 2018-03-23 10:16 UTC (permalink / raw)
To: netdev; +Cc: dlebrun
In-Reply-To: <cover.1521799488.git.m.xhonneux@gmail.com>
The function lookup_nexthop is essential to implement most of the seg6local
actions. As we want to provide a BPF helper allowing to apply some of these
actions on the packet being processed, the helper should be able to call
this function, hence the need to make it public.
Moreover, if one argument is incorrect or if the next hop can not be found,
an error should be returned by the BPF helper so the BPF program can adapt
its processing of the packet (return an error, properly force the drop,
...). This patch hence makes this function return dst->error to indicate a
possible error.
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Acked-by: David Lebrun <dlebrun@google.com>
---
include/net/seg6.h | 3 ++-
include/net/seg6_local.h | 24 ++++++++++++++++++++++++
net/ipv6/seg6_local.c | 20 +++++++++++---------
3 files changed, 37 insertions(+), 10 deletions(-)
create mode 100644 include/net/seg6_local.h
diff --git a/include/net/seg6.h b/include/net/seg6.h
index 099bad59dc90..f450bc37d196 100644
--- a/include/net/seg6.h
+++ b/include/net/seg6.h
@@ -63,5 +63,6 @@ extern bool seg6_validate_srh(struct ipv6_sr_hdr *srh, int len);
extern int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh,
int proto);
extern int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh);
-
+extern int seg6_lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
+ u32 tbl_id);
#endif
diff --git a/include/net/seg6_local.h b/include/net/seg6_local.h
new file mode 100644
index 000000000000..57498b23085d
--- /dev/null
+++ b/include/net/seg6_local.h
@@ -0,0 +1,24 @@
+/*
+ * SR-IPv6 implementation
+ *
+ * Authors:
+ * David Lebrun <david.lebrun@uclouvain.be>
+ * eBPF support: Mathieu Xhonneux <m.xhonneux@gmail.com>
+ *
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _NET_SEG6_LOCAL_H
+#define _NET_SEG6_LOCAL_H
+
+#include <linux/net.h>
+#include <linux/ipv6.h>
+
+extern int seg6_lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
+ u32 tbl_id);
+
+#endif
diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index 45722327375a..e9b23fb924ad 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -30,6 +30,7 @@
#ifdef CONFIG_IPV6_SEG6_HMAC
#include <net/seg6_hmac.h>
#endif
+#include <net/seg6_local.h>
#include <linux/etherdevice.h>
struct seg6_local_lwt;
@@ -140,8 +141,8 @@ static void advance_nextseg(struct ipv6_sr_hdr *srh, struct in6_addr *daddr)
*daddr = *addr;
}
-static void lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
- u32 tbl_id)
+int seg6_lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
+ u32 tbl_id)
{
struct net *net = dev_net(skb->dev);
struct ipv6hdr *hdr = ipv6_hdr(skb);
@@ -187,6 +188,7 @@ static void lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
skb_dst_drop(skb);
skb_dst_set(skb, dst);
+ return dst->error;
}
/* regular endpoint function */
@@ -200,7 +202,7 @@ static int input_action_end(struct sk_buff *skb, struct seg6_local_lwt *slwt)
advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
- lookup_nexthop(skb, NULL, 0);
+ seg6_lookup_nexthop(skb, NULL, 0);
return dst_input(skb);
@@ -220,7 +222,7 @@ static int input_action_end_x(struct sk_buff *skb, struct seg6_local_lwt *slwt)
advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
- lookup_nexthop(skb, &slwt->nh6, 0);
+ seg6_lookup_nexthop(skb, &slwt->nh6, 0);
return dst_input(skb);
@@ -239,7 +241,7 @@ static int input_action_end_t(struct sk_buff *skb, struct seg6_local_lwt *slwt)
advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
- lookup_nexthop(skb, NULL, slwt->table);
+ seg6_lookup_nexthop(skb, NULL, slwt->table);
return dst_input(skb);
@@ -331,7 +333,7 @@ static int input_action_end_dx6(struct sk_buff *skb,
if (!ipv6_addr_any(&slwt->nh6))
nhaddr = &slwt->nh6;
- lookup_nexthop(skb, nhaddr, 0);
+ seg6_lookup_nexthop(skb, nhaddr, 0);
return dst_input(skb);
drop:
@@ -380,7 +382,7 @@ static int input_action_end_dt6(struct sk_buff *skb,
if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
goto drop;
- lookup_nexthop(skb, NULL, slwt->table);
+ seg6_lookup_nexthop(skb, NULL, slwt->table);
return dst_input(skb);
@@ -406,7 +408,7 @@ static int input_action_end_b6(struct sk_buff *skb, struct seg6_local_lwt *slwt)
ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
skb_set_transport_header(skb, sizeof(struct ipv6hdr));
- lookup_nexthop(skb, NULL, 0);
+ seg6_lookup_nexthop(skb, NULL, 0);
return dst_input(skb);
@@ -438,7 +440,7 @@ static int input_action_end_b6_encap(struct sk_buff *skb,
ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
skb_set_transport_header(skb, sizeof(struct ipv6hdr));
- lookup_nexthop(skb, NULL, 0);
+ seg6_lookup_nexthop(skb, NULL, 0);
return dst_input(skb);
--
2.16.1
^ permalink raw reply related
* [PATCH next-next RFC 2/5] bpf: Add IPv6 Segment Routing helpers
From: Mathieu Xhonneux @ 2018-03-23 10:16 UTC (permalink / raw)
To: netdev; +Cc: dlebrun
In-Reply-To: <cover.1521799488.git.m.xhonneux@gmail.com>
The BPF seg6local hook should be powerful enough to enable users to
implement most of the use-cases one could think of. After some thinking,
we figured out that the following actions should be possible on a SRv6
packet, requiring 3 specific helpers :
- bpf_lwt_seg6_store_bytes: Modify non-sensitive fields of the SRH
- bpf_lwt_seg6_adjust_srh: Allow to grow or shrink a SRH
(to add/delete TLVs)
- bpf_lwt_seg6_action: Apply some SRv6 network programming actions
(specifically End.X, End.T, End.B6 and
End.B6.Encap)
The specifications of these helpers are provided in the patch (see
include/uapi/linux/bpf.h).
The non-sensitive fields of the SRH are the following : flags, tag and
TLVs. The other fields can not be modified, to maintain the SRH
integrity. Flags, tag and TLVs can easily be modified as their validity
can be checked afterwards via seg6_validate_srh. It is not allowed to
modify the segments directly. If one wants to add segments on the path,
he should stack a new SRH using the End.B6 action via
bpf_lwt_seg6_action.
Growing, shrinking or editing TLVs via the helpers will flag the SRH as
invalid, and it will have to be re-validated before re-entering the IPv6
layer. This flag is stored in the skb control block, along with the
current header length in bytes.
Storing the SRH len in bytes in the control block is mandatory when using
bpf_lwt_seg6_adjust_srh. The Header Ext. Length field contains the SRH
len rounded to 8 bytes (a padding TLV can be inserted to ensure the 8-bytes
boundary). When adding/deleting TLVs within the BPF program, the SRH may
temporary be in an invalid state where its length cannot be rounded to 8
bytes without remainder, hence the need to store the length in bytes
separately to keep track of it. Again, a final SRH modified by a BPF
program which doesn’t respect the 8-bytes boundary will be discarded as
it will be considered as invalid.
Finally, a fourth helper is provided, bpf_lwt_push_encap, which is
available from the LWT BPF IN hook, but not from the seg6local BPF one.
This helper allows to encapsulate a Segment Routing Header (either with
a new outer IPv6 header, or by inlining it directly in the existing IPv6
header) into a non-SRv6 packet. This helper is required if we want to
offer the possibility to dynamically encapsulate a SRH for non-SRv6 packet,
as the BPF seg6local hook only works on traffic already containing a SRH.
This is the BPF equivalent of the seg6 LWT infrastructure, which achieves
the same purpose but with a static SRH per route.
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Acked-by: David Lebrun <dlebrun@google.com>
---
include/net/seg6_local.h | 5 +
include/uapi/linux/bpf.h | 56 +++++++++-
net/core/filter.c | 263 +++++++++++++++++++++++++++++++++++++++++++----
3 files changed, 302 insertions(+), 22 deletions(-)
diff --git a/include/net/seg6_local.h b/include/net/seg6_local.h
index 57498b23085d..4eae7adfca34 100644
--- a/include/net/seg6_local.h
+++ b/include/net/seg6_local.h
@@ -21,4 +21,9 @@
extern int seg6_lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
u32 tbl_id);
+struct seg6_bpf_srh_state {
+ bool valid;
+ u16 hdrlen;
+};
+
#endif
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 18b7c510c511..f2679c29e353 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -729,6 +729,50 @@ union bpf_attr {
* @flags: reserved for future use
* Return: SK_PASS
*
+ * int lwt_push_encap(skb, type, hdr, len)
+ * Push an encapsulation header on top of current packet.
+ * @type: type of header to push :
+ * - BPF_LWT_ENCAP_SEG6 (push an IPv6 Segment Routing Header, struct
+ * ipv6_sr_hdr, the helper will add the outer IPv6 header)
+ * - BPF_LWT_ENCAP_SEG6_INLINE (push an IPv6 Segment Routing Header,
+ * struct ipv6_sr_hdr, inside the existing IPv6 header)
+ * @hdr: pointer where to copy the header from
+ * @len: size of hdr in bytes
+ * Return: 0 on success or negative error
+ *
+ * int lwt_seg6_store_bytes(skb, offset, from, len)
+ * Store bytes into the outermost Segment Routing header of an IPv6 header.
+ * Only the flags, tag and TLVs can be modified.
+ * @skb: pointer to skb
+ * @offset: offset within packet from skb->data
+ * @from: pointer where to copy bytes from
+ * @len: number of bytes to store into packet
+ * Return: 0 on success or negative error
+ *
+ * int lwt_seg6_adjust_srh(skb, offset, delta)
+ * Adjust the size allocated to TLVs in the outermost IPv6 Segment Routing
+ * Header (grow if delta > 0, else shrink)
+ * @skb: pointer to skb
+ * @offset: offset within packet from skb->data where SRH will grow/shrink,
+ * only offsets after the segments are accepted
+ * @delta: a positive/negative integer
+ * Return: 0 on success or negative on error
+ *
+ * int lwt_seg6_action(skb, action, param, param_len)
+ * Apply a IPv6 Segment Routing action on a packet with an IPv6 Segment
+ * Routing Header.
+ * @action:
+ * - End.X: SEG6_LOCAL_ACTION_END_X
+ * (type of param: struct in6_addr)
+ * - End.T: SEG6_LOCAL_ACTION_END_T
+ * (type of param: int)
+ * - End.B6: SEG6_LOCAL_ACTION_END_B6
+ * (type of param: struct ipv6_sr_hdr)
+ * - End.B6.Encap: SEG6_LOCAL_ACTION_END_B6_ENCAP
+ * (type of param: struct ipv6_sr_hdr)
+ * @param: pointer to the parameter required by the action
+ * @param_len: length of param in bytes
+ * Return: 0 on success or negative error
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -794,7 +838,11 @@ union bpf_attr {
FN(msg_redirect_map), \
FN(msg_apply_bytes), \
FN(msg_cork_bytes), \
- FN(msg_pull_data),
+ FN(msg_pull_data), \
+ FN(lwt_push_encap), \
+ FN(lwt_seg6_store_bytes), \
+ FN(lwt_seg6_adjust_srh), \
+ FN(lwt_seg6_action),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
@@ -852,6 +900,12 @@ enum bpf_adj_room_mode {
BPF_ADJ_ROOM_NET,
};
+/* Encapsulation type for BPF_FUNC_lwt_push_encap helper. */
+enum bpf_lwt_encap_mode {
+ BPF_LWT_ENCAP_SEG6,
+ BPF_LWT_ENCAP_SEG6_INLINE
+};
+
/* user accessible mirror of in-kernel sk_buff.
* new fields can only be added to the end of this structure
*/
diff --git a/net/core/filter.c b/net/core/filter.c
index c86f03fd9ea5..71019593a246 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -57,6 +57,10 @@
#include <net/busy_poll.h>
#include <net/tcp.h>
#include <linux/bpf_trace.h>
+#include <net/ipv6.h>
+#include <linux/seg6_local.h>
+#include <net/seg6.h>
+#include <net/seg6_local.h>
/**
* sk_filter_trim_cap - run a packet through a socket filter
@@ -3013,27 +3017,6 @@ static const struct bpf_func_proto bpf_xdp_redirect_map_proto = {
.arg3_type = ARG_ANYTHING,
};
-bool bpf_helper_changes_pkt_data(void *func)
-{
- if (func == bpf_skb_vlan_push ||
- func == bpf_skb_vlan_pop ||
- func == bpf_skb_store_bytes ||
- func == bpf_skb_change_proto ||
- func == bpf_skb_change_head ||
- func == bpf_skb_change_tail ||
- func == bpf_skb_adjust_room ||
- func == bpf_skb_pull_data ||
- func == bpf_clone_redirect ||
- func == bpf_l3_csum_replace ||
- func == bpf_l4_csum_replace ||
- func == bpf_xdp_adjust_head ||
- func == bpf_xdp_adjust_meta ||
- func == bpf_msg_pull_data)
- return true;
-
- return false;
-}
-
static unsigned long bpf_skb_copy(void *dst_buff, const void *skb,
unsigned long off, unsigned long len)
{
@@ -3597,6 +3580,244 @@ static const struct bpf_func_proto bpf_sock_ops_cb_flags_set_proto = {
.arg2_type = ARG_ANYTHING,
};
+int bpf_push_seg6_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len)
+{
+ int err;
+ struct ipv6_sr_hdr *srh = (struct ipv6_sr_hdr *)hdr;
+
+ if (!seg6_validate_srh(srh, len))
+ return -EINVAL;
+
+ switch (type) {
+ case BPF_LWT_ENCAP_SEG6_INLINE:
+ if (skb->protocol != htons(ETH_P_IPV6))
+ return -EBADMSG;
+
+ err = seg6_do_srh_inline(skb, srh);
+ break;
+ case BPF_LWT_ENCAP_SEG6:
+ skb_reset_inner_headers(skb);
+ skb->encapsulation = 1;
+ err = seg6_do_srh_encap(skb, srh, IPPROTO_IPV6);
+ break;
+ default:
+ return -EINVAL;
+ }
+ if (err)
+ return err;
+
+ ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
+ skb_set_transport_header(skb, sizeof(struct ipv6hdr));
+
+ return seg6_lookup_nexthop(skb, NULL, 0);
+}
+
+BPF_CALL_4(bpf_lwt_push_encap, struct sk_buff *, skb, u32, type, void *, hdr,
+ u32, len)
+{
+ switch (type) {
+ case BPF_LWT_ENCAP_SEG6:
+ case BPF_LWT_ENCAP_SEG6_INLINE:
+ return bpf_push_seg6_encap(skb, type, hdr, len);
+ default:
+ return -EINVAL;
+ }
+}
+
+static const struct bpf_func_proto bpf_lwt_push_encap_proto = {
+ .func = bpf_lwt_push_encap,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_ANYTHING,
+ .arg3_type = ARG_PTR_TO_MEM,
+ .arg4_type = ARG_CONST_SIZE
+};
+
+BPF_CALL_4(bpf_lwt_seg6_store_bytes, struct sk_buff *, skb, u32, offset,
+ const void *, from, u32, len)
+{
+ struct seg6_bpf_srh_state *srh_state = (struct seg6_bpf_srh_state *)
+ &skb->cb;
+ void *srh_tlvs, *srh_end, *ptr;
+ struct ipv6_sr_hdr *srh;
+ int srhoff = 0;
+
+ if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
+ return -EINVAL;
+
+ srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
+ srh_tlvs = (void *)((char *)srh + ((srh->first_segment + 1) << 4));
+ srh_end = (void *)((char *)srh + sizeof(*srh) + srh_state->hdrlen);
+
+ ptr = skb->data + offset;
+ if (ptr >= srh_tlvs && ptr + len <= srh_end)
+ srh_state->valid = 0;
+ else if (ptr < (void *)&srh->flags ||
+ ptr + len > (void *)&srh->segments)
+ return -EFAULT;
+
+ if (unlikely(bpf_try_make_writable(skb, offset + len)))
+ return -EFAULT;
+
+ memcpy(ptr, from, len);
+ return 0;
+}
+
+static const struct bpf_func_proto bpf_lwt_seg6_store_bytes_proto = {
+ .func = bpf_lwt_seg6_store_bytes,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_ANYTHING,
+ .arg3_type = ARG_PTR_TO_MEM,
+ .arg4_type = ARG_CONST_SIZE
+};
+
+BPF_CALL_4(bpf_lwt_seg6_action, struct sk_buff *, skb,
+ u32, action, void *, param, u32, param_len)
+{
+ struct seg6_bpf_srh_state *srh_state = (struct seg6_bpf_srh_state *)
+ &skb->cb;
+ struct ipv6_sr_hdr *srh;
+ int srhoff = 0;
+ int err;
+
+ if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
+ return -EINVAL;
+ srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
+
+ if (!srh_state->valid) {
+ if (unlikely((srh_state->hdrlen & 7) != 0))
+ return -EBADMSG;
+
+ srh->hdrlen = (u8)(srh_state->hdrlen >> 3);
+ if (unlikely(!seg6_validate_srh(srh, (srh->hdrlen + 1) << 3)))
+ return -EBADMSG;
+
+ srh_state->valid = 1;
+ }
+
+ switch (action) {
+ case SEG6_LOCAL_ACTION_END_X:
+ if (param_len != sizeof(struct in6_addr))
+ return -EINVAL;
+ return seg6_lookup_nexthop(skb, (struct in6_addr *)param, 0);
+ case SEG6_LOCAL_ACTION_END_T:
+ if (param_len != sizeof(int))
+ return -EINVAL;
+ return seg6_lookup_nexthop(skb, NULL, *(int *)param);
+ case SEG6_LOCAL_ACTION_END_B6:
+ err = bpf_push_seg6_encap(skb, BPF_LWT_ENCAP_SEG6_INLINE,
+ param, param_len);
+ if (!err)
+ srh_state->hdrlen =
+ ((struct ipv6_sr_hdr *)param)->hdrlen << 3;
+ return err;
+ case SEG6_LOCAL_ACTION_END_B6_ENCAP:
+ err = bpf_push_seg6_encap(skb, BPF_LWT_ENCAP_SEG6,
+ param, param_len);
+ if (!err)
+ srh_state->hdrlen =
+ ((struct ipv6_sr_hdr *)param)->hdrlen << 3;
+ return err;
+ default:
+ return -EINVAL;
+ }
+}
+
+static const struct bpf_func_proto bpf_lwt_seg6_action_proto = {
+ .func = bpf_lwt_seg6_action,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_ANYTHING,
+ .arg3_type = ARG_PTR_TO_MEM,
+ .arg4_type = ARG_CONST_SIZE
+};
+
+BPF_CALL_3(bpf_lwt_seg6_adjust_srh, struct sk_buff *, skb, u32, offset,
+ s32, len)
+{
+ struct seg6_bpf_srh_state *srh_state = (struct seg6_bpf_srh_state *)
+ &skb->cb;
+
+ void *srh_end, *srh_tlvs, *ptr;
+ struct ipv6_sr_hdr *srh;
+ struct ipv6hdr *hdr;
+ int srhoff = 0;
+ int ret;
+
+ if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
+ return -EINVAL;
+ srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
+
+ srh_tlvs = (void *)((unsigned char *)srh + sizeof(*srh) +
+ ((srh->first_segment + 1) << 4));
+ srh_end = (void *)((unsigned char *)srh + sizeof(*srh) +
+ srh_state->hdrlen);
+ ptr = skb->data + offset;
+
+ if (unlikely(ptr < srh_tlvs || ptr > srh_end))
+ return -EFAULT;
+ if (unlikely(len < 0 && (void *)((char *)ptr - len) > srh_end))
+ return -EFAULT;
+
+ if (len > 0) {
+ ret = skb_cow_head(skb, len);
+ if (unlikely(ret < 0))
+ return ret;
+
+ ret = bpf_skb_net_hdr_push(skb, offset, len);
+ } else {
+ ret = bpf_skb_net_hdr_pop(skb, offset, -1 * len);
+ }
+ if (unlikely(ret < 0))
+ return ret;
+
+ hdr = (struct ipv6hdr *)skb->data;
+ hdr->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
+
+ bpf_compute_data_pointers(skb);
+ srh_state->hdrlen += len;
+ srh_state->valid = 0;
+ return 0;
+}
+
+static const struct bpf_func_proto bpf_lwt_seg6_adjust_srh_proto = {
+ .func = bpf_lwt_seg6_adjust_srh,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_ANYTHING,
+ .arg3_type = ARG_ANYTHING,
+};
+
+bool bpf_helper_changes_pkt_data(void *func)
+{
+ if (func == bpf_skb_vlan_push ||
+ func == bpf_skb_vlan_pop ||
+ func == bpf_skb_store_bytes ||
+ func == bpf_skb_change_proto ||
+ func == bpf_skb_change_head ||
+ func == bpf_skb_change_tail ||
+ func == bpf_skb_adjust_room ||
+ func == bpf_skb_pull_data ||
+ func == bpf_clone_redirect ||
+ func == bpf_l3_csum_replace ||
+ func == bpf_l4_csum_replace ||
+ func == bpf_xdp_adjust_head ||
+ func == bpf_xdp_adjust_meta ||
+ func == bpf_msg_pull_data ||
+ func == bpf_lwt_push_encap ||
+ func == bpf_lwt_seg6_store_bytes ||
+ func == bpf_lwt_seg6_adjust_srh
+ )
+ return true;
+
+ return false;
+}
+
static const struct bpf_func_proto *
bpf_base_func_proto(enum bpf_func_id func_id)
{
--
2.16.1
^ permalink raw reply related
* [PATCH net-next RFC 3/5] bpf: Split lwt inout verifier structures
From: Mathieu Xhonneux @ 2018-03-23 10:16 UTC (permalink / raw)
To: netdev; +Cc: dlebrun
In-Reply-To: <cover.1521799488.git.m.xhonneux@gmail.com>
The new bpf_lwt_push_encap helper should only be accessible within the
LWT BPF IN hook, and not the OUT one, as this may lead to a skb under
panic.
At the moment, both LWT BPF IN and OUT share the same list of helpers,
whose calls are authorized by the verifier. This patch separates the
verifier ops for the IN and OUT hooks, and allows the IN hook to call the
bpf_lwt_push_encap helper.
This patch is also the occasion to put all lwt_*_func_proto functions
together for clarity. At the moment, socks_op_func_proto is in the middle
of lwt_inout_func_proto and lwt_xmit_func_proto.
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Acked-by: David Lebrun <dlebrun@google.com>
---
include/linux/bpf_types.h | 4 +--
net/core/filter.c | 85 ++++++++++++++++++++++++++++++-----------------
2 files changed, 56 insertions(+), 33 deletions(-)
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 5e2e8a49fb21..0f8f2525da9b 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -8,8 +8,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_SCHED_ACT, tc_cls_act)
BPF_PROG_TYPE(BPF_PROG_TYPE_XDP, xdp)
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SKB, cg_skb)
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCK, cg_sock)
-BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_IN, lwt_inout)
-BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_OUT, lwt_inout)
+BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_IN, lwt_in)
+BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_OUT, lwt_out)
BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_XMIT, lwt_xmit)
BPF_PROG_TYPE(BPF_PROG_TYPE_SOCK_OPS, sock_ops)
BPF_PROG_TYPE(BPF_PROG_TYPE_SK_SKB, sk_skb)
diff --git a/net/core/filter.c b/net/core/filter.c
index 71019593a246..741dd352937d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3965,33 +3965,6 @@ xdp_func_proto(enum bpf_func_id func_id)
}
}
-static const struct bpf_func_proto *
-lwt_inout_func_proto(enum bpf_func_id func_id)
-{
- switch (func_id) {
- case BPF_FUNC_skb_load_bytes:
- return &bpf_skb_load_bytes_proto;
- case BPF_FUNC_skb_pull_data:
- return &bpf_skb_pull_data_proto;
- case BPF_FUNC_csum_diff:
- return &bpf_csum_diff_proto;
- case BPF_FUNC_get_cgroup_classid:
- return &bpf_get_cgroup_classid_proto;
- case BPF_FUNC_get_route_realm:
- return &bpf_get_route_realm_proto;
- case BPF_FUNC_get_hash_recalc:
- return &bpf_get_hash_recalc_proto;
- case BPF_FUNC_perf_event_output:
- return &bpf_skb_event_output_proto;
- case BPF_FUNC_get_smp_processor_id:
- return &bpf_get_smp_processor_id_proto;
- case BPF_FUNC_skb_under_cgroup:
- return &bpf_skb_under_cgroup_proto;
- default:
- return bpf_base_func_proto(func_id);
- }
-}
-
static const struct bpf_func_proto *
sock_ops_func_proto(enum bpf_func_id func_id)
{
@@ -4049,6 +4022,44 @@ static const struct bpf_func_proto *sk_skb_func_proto(enum bpf_func_id func_id)
}
}
+static const struct bpf_func_proto *
+lwt_out_func_proto(enum bpf_func_id func_id)
+{
+ switch (func_id) {
+ case BPF_FUNC_skb_load_bytes:
+ return &bpf_skb_load_bytes_proto;
+ case BPF_FUNC_skb_pull_data:
+ return &bpf_skb_pull_data_proto;
+ case BPF_FUNC_csum_diff:
+ return &bpf_csum_diff_proto;
+ case BPF_FUNC_get_cgroup_classid:
+ return &bpf_get_cgroup_classid_proto;
+ case BPF_FUNC_get_route_realm:
+ return &bpf_get_route_realm_proto;
+ case BPF_FUNC_get_hash_recalc:
+ return &bpf_get_hash_recalc_proto;
+ case BPF_FUNC_perf_event_output:
+ return &bpf_skb_event_output_proto;
+ case BPF_FUNC_get_smp_processor_id:
+ return &bpf_get_smp_processor_id_proto;
+ case BPF_FUNC_skb_under_cgroup:
+ return &bpf_skb_under_cgroup_proto;
+ default:
+ return bpf_base_func_proto(func_id);
+ }
+}
+
+static const struct bpf_func_proto *
+lwt_in_func_proto(enum bpf_func_id func_id)
+{
+ switch (func_id) {
+ case BPF_FUNC_lwt_push_encap:
+ return &bpf_lwt_push_encap_proto;
+ default:
+ return lwt_out_func_proto(func_id);
+ }
+}
+
static const struct bpf_func_proto *
lwt_xmit_func_proto(enum bpf_func_id func_id)
{
@@ -4080,7 +4091,9 @@ lwt_xmit_func_proto(enum bpf_func_id func_id)
case BPF_FUNC_set_hash_invalid:
return &bpf_set_hash_invalid_proto;
default:
- return lwt_inout_func_proto(func_id);
+ return lwt_out_func_proto(func_id);
+ }
+}
}
}
@@ -5302,13 +5315,23 @@ const struct bpf_prog_ops cg_skb_prog_ops = {
.test_run = bpf_prog_test_run_skb,
};
-const struct bpf_verifier_ops lwt_inout_verifier_ops = {
- .get_func_proto = lwt_inout_func_proto,
+const struct bpf_verifier_ops lwt_in_verifier_ops = {
+ .get_func_proto = lwt_in_func_proto,
+ .is_valid_access = lwt_is_valid_access,
+ .convert_ctx_access = bpf_convert_ctx_access,
+};
+
+const struct bpf_prog_ops lwt_in_prog_ops = {
+ .test_run = bpf_prog_test_run_skb,
+};
+
+const struct bpf_verifier_ops lwt_out_verifier_ops = {
+ .get_func_proto = lwt_out_func_proto,
.is_valid_access = lwt_is_valid_access,
.convert_ctx_access = bpf_convert_ctx_access,
};
-const struct bpf_prog_ops lwt_inout_prog_ops = {
+const struct bpf_prog_ops lwt_out_prog_ops = {
.test_run = bpf_prog_test_run_skb,
};
--
2.16.1
^ permalink raw reply related
* [PATCH net-next RFC 4/5] ipv6: sr: Add seg6local action End.BPF
From: Mathieu Xhonneux @ 2018-03-23 10:16 UTC (permalink / raw)
To: netdev; +Cc: dlebrun
In-Reply-To: <cover.1521799488.git.m.xhonneux@gmail.com>
This patch adds the End.BPF action to the LWT seg6local infrastructure.
This action works like any other seg6local End action, meaning that an IPv6
header with SRH is needed, whose DA has to be equal to the SID of the
action. It will also advance the SRH to the next segment, the BPF program
doesn’t have to take care of this.
Since the BPF program may not be a source of instability in the kernel, it
is important to ensure that the integrity of the packet is maintained
before yielding it back to the IPv6 layer. The hook hence keeps track if
the SRH has been altered through the helpers, and re-validates its
content if needed with seg6_validate_srh. The state kept for validation is
stored in skb->cb, read and write access to it from the BPF program is
prohibited. The BPF program is not allowed to directly write into the
packet, and only some fields of the SRH can be altered through the helper
bpf_lwt_seg6_store_bytes.
Performances profiling has shown that the SRH re-validation doesn’t induce
a significant overhead. If the altered SRH is deemed as invalid, the packet
is dropped.
This validation is also done before executing any action through
bpf_lwt_seg6_action, and won’t be performed again if the SRH isn’t modified
after calling the action.
The BPF program may return 3 types of return codes :
- BPF_OK: the End.BPF action will look up the next destination through
seg6_lookup_nexthop.
- BPF_REDIRECT: if an action has been executed through the
bpf_lwt_seg6_action helper, the BPF program should return this
value, as the skb destination is already set and the default
lookup shouldn’t be performed.
- BPF_DROP : the packet will be dropped.
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Acked-by: David Lebrun <dlebrun@google.com>
---
include/linux/bpf_types.h | 1 +
include/uapi/linux/bpf.h | 1 +
include/uapi/linux/seg6_local.h | 3 +
kernel/bpf/verifier.c | 1 +
net/core/filter.c | 37 ++++++++++
net/ipv6/seg6_local.c | 150 +++++++++++++++++++++++++++++++++++++++-
6 files changed, 190 insertions(+), 3 deletions(-)
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 0f8f2525da9b..92bc613a3b59 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -11,6 +11,7 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCK, cg_sock)
BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_IN, lwt_in)
BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_OUT, lwt_out)
BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_XMIT, lwt_xmit)
+BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_SEG6LOCAL, lwt_seg6local)
BPF_PROG_TYPE(BPF_PROG_TYPE_SOCK_OPS, sock_ops)
BPF_PROG_TYPE(BPF_PROG_TYPE_SK_SKB, sk_skb)
BPF_PROG_TYPE(BPF_PROG_TYPE_SK_MSG, sk_msg)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f2679c29e353..2eeea4c4cc7f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -134,6 +134,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_SK_SKB,
BPF_PROG_TYPE_CGROUP_DEVICE,
BPF_PROG_TYPE_SK_MSG,
+ BPF_PROG_TYPE_LWT_SEG6LOCAL,
};
enum bpf_attach_type {
diff --git a/include/uapi/linux/seg6_local.h b/include/uapi/linux/seg6_local.h
index ef2d8c3e76c1..aadcc11fb918 100644
--- a/include/uapi/linux/seg6_local.h
+++ b/include/uapi/linux/seg6_local.h
@@ -25,6 +25,7 @@ enum {
SEG6_LOCAL_NH6,
SEG6_LOCAL_IIF,
SEG6_LOCAL_OIF,
+ SEG6_LOCAL_BPF,
__SEG6_LOCAL_MAX,
};
#define SEG6_LOCAL_MAX (__SEG6_LOCAL_MAX - 1)
@@ -59,6 +60,8 @@ enum {
SEG6_LOCAL_ACTION_END_AS = 13,
/* forward to SR-unaware VNF with masquerading */
SEG6_LOCAL_ACTION_END_AM = 14,
+ /* custom BPF action */
+ SEG6_LOCAL_ACTION_END_BPF = 15,
__SEG6_LOCAL_ACTION_MAX,
};
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e9f7c20691c1..3c645ad33ac7 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1239,6 +1239,7 @@ static bool may_access_direct_pkt_data(struct bpf_verifier_env *env,
switch (env->prog->type) {
case BPF_PROG_TYPE_LWT_IN:
case BPF_PROG_TYPE_LWT_OUT:
+ case BPF_PROG_TYPE_LWT_SEG6LOCAL:
/* dst_input() and dst_output() can't write for now */
if (t == BPF_WRITE)
return false;
diff --git a/net/core/filter.c b/net/core/filter.c
index 741dd352937d..4b46d09dcddc 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4094,6 +4094,19 @@ lwt_xmit_func_proto(enum bpf_func_id func_id)
return lwt_out_func_proto(func_id);
}
}
+
+static const struct bpf_func_proto *
+lwt_seg6local_func_proto(enum bpf_func_id func_id)
+{
+ switch (func_id) {
+ case BPF_FUNC_lwt_seg6_store_bytes:
+ return &bpf_lwt_seg6_store_bytes_proto;
+ case BPF_FUNC_lwt_seg6_action:
+ return &bpf_lwt_seg6_action_proto;
+ case BPF_FUNC_lwt_seg6_adjust_srh:
+ return &bpf_lwt_seg6_adjust_srh_proto;
+ default:
+ return lwt_out_func_proto(func_id);
}
}
@@ -4198,6 +4211,20 @@ static bool lwt_is_valid_access(int off, int size,
return bpf_skb_is_valid_access(off, size, type, info);
}
+static bool lwt_seg6local_is_valid_access(int off, int size,
+ enum bpf_access_type type,
+ struct bpf_insn_access_aux *info)
+{
+ switch (off) {
+ case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
+ return false;
+ default:
+ break;
+ }
+
+ return lwt_is_valid_access(off, size, type, info);
+}
+
static bool sock_filter_is_valid_access(int off, int size,
enum bpf_access_type type,
struct bpf_insn_access_aux *info)
@@ -5346,6 +5373,16 @@ const struct bpf_prog_ops lwt_xmit_prog_ops = {
.test_run = bpf_prog_test_run_skb,
};
+const struct bpf_verifier_ops lwt_seg6local_verifier_ops = {
+ .get_func_proto = lwt_seg6local_func_proto,
+ .is_valid_access = lwt_seg6local_is_valid_access,
+ .convert_ctx_access = bpf_convert_ctx_access,
+};
+
+const struct bpf_prog_ops lwt_seg6local_prog_ops = {
+ .test_run = bpf_prog_test_run_skb,
+};
+
const struct bpf_verifier_ops cg_sock_verifier_ops = {
.get_func_proto = sock_filter_func_proto,
.is_valid_access = sock_filter_is_valid_access,
diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index e9b23fb924ad..4805b7d404ed 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -1,8 +1,9 @@
/*
* SR-IPv6 implementation
*
- * Author:
+ * Authors:
* David Lebrun <david.lebrun@uclouvain.be>
+ * eBPF support: Mathieu Xhonneux <m.xhonneux@gmail.com>
*
*
* This program is free software; you can redistribute it and/or
@@ -32,6 +33,7 @@
#endif
#include <net/seg6_local.h>
#include <linux/etherdevice.h>
+#include <linux/bpf.h>
struct seg6_local_lwt;
@@ -42,6 +44,11 @@ struct seg6_action_desc {
int static_headroom;
};
+struct bpf_lwt_prog {
+ struct bpf_prog *prog;
+ char *name;
+};
+
struct seg6_local_lwt {
int action;
struct ipv6_sr_hdr *srh;
@@ -50,6 +57,7 @@ struct seg6_local_lwt {
struct in6_addr nh6;
int iif;
int oif;
+ struct bpf_lwt_prog bpf;
int headroom;
struct seg6_action_desc *desc;
@@ -449,6 +457,61 @@ static int input_action_end_b6_encap(struct sk_buff *skb,
return err;
}
+static int input_action_end_bpf(struct sk_buff *skb,
+ struct seg6_local_lwt *slwt)
+{
+ struct seg6_bpf_srh_state *srh_state = (struct seg6_bpf_srh_state *)
+ &skb->cb;
+ struct ipv6_sr_hdr *srh;
+ int srhoff = 0;
+ int ret;
+
+ srh = get_and_validate_srh(skb);
+ if (!srh)
+ goto drop;
+
+ advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
+ srh_state->hdrlen = srh->hdrlen << 3;
+ srh_state->valid = 1;
+
+ rcu_read_lock();
+ bpf_compute_data_pointers(skb);
+ ret = bpf_prog_run_save_cb(slwt->bpf.prog, skb);
+ rcu_read_unlock();
+
+ switch (ret) {
+ case BPF_OK:
+ case BPF_REDIRECT:
+ break;
+ case BPF_DROP:
+ goto drop;
+ default:
+ pr_warn_once("bpf-seg6local: Illegal return value %u\n", ret);
+ goto drop;
+ }
+
+ if (unlikely((srh_state->hdrlen & 7) != 0))
+ goto drop;
+
+ if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
+ goto drop;
+ srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
+ srh->hdrlen = (u8)(srh_state->hdrlen >> 3);
+
+ if (!srh_state->valid &&
+ unlikely(!seg6_validate_srh(srh, (srh->hdrlen + 1) << 3)))
+ goto drop;
+
+ if (ret != BPF_REDIRECT)
+ seg6_lookup_nexthop(skb, NULL, 0);
+
+ return dst_input(skb);
+
+drop:
+ kfree_skb(skb);
+ return -EINVAL;
+}
+
static struct seg6_action_desc seg6_action_table[] = {
{
.action = SEG6_LOCAL_ACTION_END,
@@ -495,7 +558,13 @@ static struct seg6_action_desc seg6_action_table[] = {
.attrs = (1 << SEG6_LOCAL_SRH),
.input = input_action_end_b6_encap,
.static_headroom = sizeof(struct ipv6hdr),
- }
+ },
+ {
+ .action = SEG6_LOCAL_ACTION_END_BPF,
+ .attrs = (1 << SEG6_LOCAL_BPF),
+ .input = input_action_end_bpf,
+ },
+
};
static struct seg6_action_desc *__get_action_desc(int action)
@@ -540,6 +609,7 @@ static const struct nla_policy seg6_local_policy[SEG6_LOCAL_MAX + 1] = {
.len = sizeof(struct in6_addr) },
[SEG6_LOCAL_IIF] = { .type = NLA_U32 },
[SEG6_LOCAL_OIF] = { .type = NLA_U32 },
+ [SEG6_LOCAL_BPF] = { .type = NLA_NESTED },
};
static int parse_nla_srh(struct nlattr **attrs, struct seg6_local_lwt *slwt)
@@ -717,6 +787,71 @@ static int cmp_nla_oif(struct seg6_local_lwt *a, struct seg6_local_lwt *b)
return 0;
}
+#define MAX_PROG_NAME 256
+static const struct nla_policy bpf_prog_policy[LWT_BPF_PROG_MAX + 1] = {
+ [LWT_BPF_PROG_FD] = { .type = NLA_U32, },
+ [LWT_BPF_PROG_NAME] = { .type = NLA_NUL_STRING,
+ .len = MAX_PROG_NAME },
+};
+
+static int parse_nla_bpf(struct nlattr **attrs, struct seg6_local_lwt *slwt)
+{
+ struct nlattr *tb[LWT_BPF_PROG_MAX + 1];
+ struct bpf_prog *p;
+ int ret;
+ u32 fd;
+
+ ret = nla_parse_nested(tb, LWT_BPF_PROG_MAX, attrs[SEG6_LOCAL_BPF],
+ bpf_prog_policy, NULL);
+ if (ret < 0)
+ return ret;
+
+ if (!tb[LWT_BPF_PROG_FD] || !tb[LWT_BPF_PROG_NAME])
+ return -EINVAL;
+
+ slwt->bpf.name = nla_memdup(tb[LWT_BPF_PROG_NAME], GFP_KERNEL);
+ if (!slwt->bpf.name)
+ return -ENOMEM;
+
+ fd = nla_get_u32(tb[LWT_BPF_PROG_FD]);
+ p = bpf_prog_get_type(fd, BPF_PROG_TYPE_LWT_SEG6LOCAL);
+ if (IS_ERR(p))
+ return PTR_ERR(p);
+
+ slwt->bpf.prog = p;
+
+ return 0;
+}
+
+static int put_nla_bpf(struct sk_buff *skb, struct seg6_local_lwt *slwt)
+{
+ struct nlattr *nest;
+
+ if (!slwt->bpf.prog)
+ return 0;
+
+ nest = nla_nest_start(skb, SEG6_LOCAL_BPF);
+ if (!nest)
+ return -EMSGSIZE;
+
+ if (slwt->bpf.name &&
+ nla_put_string(skb, LWT_BPF_PROG_NAME, slwt->bpf.name))
+ return -EMSGSIZE;
+
+ return nla_nest_end(skb, nest);
+}
+
+static int cmp_nla_bpf(struct seg6_local_lwt *a, struct seg6_local_lwt *b)
+{
+ if (!a->bpf.name && !b->bpf.name)
+ return 0;
+
+ if (!a->bpf.name || !b->bpf.name)
+ return 1;
+
+ return strcmp(a->bpf.name, b->bpf.name);
+}
+
struct seg6_action_param {
int (*parse)(struct nlattr **attrs, struct seg6_local_lwt *slwt);
int (*put)(struct sk_buff *skb, struct seg6_local_lwt *slwt);
@@ -747,6 +882,11 @@ static struct seg6_action_param seg6_action_params[SEG6_LOCAL_MAX + 1] = {
[SEG6_LOCAL_OIF] = { .parse = parse_nla_oif,
.put = put_nla_oif,
.cmp = cmp_nla_oif },
+
+ [SEG6_LOCAL_BPF] = { .parse = parse_nla_bpf,
+ .put = put_nla_bpf,
+ .cmp = cmp_nla_bpf },
+
};
static int parse_nla_action(struct nlattr **attrs, struct seg6_local_lwt *slwt)
@@ -795,7 +935,6 @@ static int seg6_local_build_state(struct nlattr *nla, unsigned int family,
err = nla_parse_nested(tb, SEG6_LOCAL_MAX, nla, seg6_local_policy,
extack);
-
if (err < 0)
return err;
@@ -884,6 +1023,11 @@ static int seg6_local_get_encap_size(struct lwtunnel_state *lwt)
if (attrs & (1 << SEG6_LOCAL_OIF))
nlsize += nla_total_size(4);
+ if (attrs & (1 << SEG6_LOCAL_BPF))
+ nlsize += nla_total_size(sizeof(struct nlattr)) +
+ nla_total_size(MAX_PROG_NAME) +
+ nla_total_size(4);
+
return nlsize;
}
--
2.16.1
^ permalink raw reply related
* [PATCH net-next RFC 5/5] selftests/bpf: test for seg6local End.BPF action
From: Mathieu Xhonneux @ 2018-03-23 10:16 UTC (permalink / raw)
To: netdev; +Cc: dlebrun
In-Reply-To: <cover.1521799488.git.m.xhonneux@gmail.com>
Add a new test for the seg6local End.BPF action. The following helpers
are also tested :
- bpf_lwt_push_encap within the LWT BPF IN hook
- bpf_lwt_seg6_action
- bpf_lwt_seg6_adjust_srh
- bpf_lwt_seg6_store_bytes
A chain of End.BPF actions is built. The SRH is injected through a LWT
BPF IN hook before the chain. Each End.BPF action validates the previous
one, otherwise the packet is dropped.
The test succeeds if the last node in the chain receives the packet and
the UDP datagram contained can be retrieved from userspace.
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
---
tools/include/uapi/linux/bpf.h | 58 ++-
tools/testing/selftests/bpf/Makefile | 5 +-
tools/testing/selftests/bpf/bpf_helpers.h | 12 +
tools/testing/selftests/bpf/test_lwt_seg6local.c | 438 ++++++++++++++++++++++
tools/testing/selftests/bpf/test_lwt_seg6local.sh | 140 +++++++
5 files changed, 650 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/bpf/test_lwt_seg6local.c
create mode 100755 tools/testing/selftests/bpf/test_lwt_seg6local.sh
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d245c41213ac..2eeea4c4cc7f 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -134,6 +134,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_SK_SKB,
BPF_PROG_TYPE_CGROUP_DEVICE,
BPF_PROG_TYPE_SK_MSG,
+ BPF_PROG_TYPE_LWT_SEG6LOCAL,
};
enum bpf_attach_type {
@@ -729,6 +730,50 @@ union bpf_attr {
* @flags: reserved for future use
* Return: SK_PASS
*
+ * int lwt_push_encap(skb, type, hdr, len)
+ * Push an encapsulation header on top of current packet.
+ * @type: type of header to push :
+ * - BPF_LWT_ENCAP_SEG6 (push an IPv6 Segment Routing Header, struct
+ * ipv6_sr_hdr, the helper will add the outer IPv6 header)
+ * - BPF_LWT_ENCAP_SEG6_INLINE (push an IPv6 Segment Routing Header,
+ * struct ipv6_sr_hdr, inside the existing IPv6 header)
+ * @hdr: pointer where to copy the header from
+ * @len: size of hdr in bytes
+ * Return: 0 on success or negative error
+ *
+ * int lwt_seg6_store_bytes(skb, offset, from, len)
+ * Store bytes into the outermost Segment Routing header of an IPv6 header.
+ * Only the flags, tag and TLVs can be modified.
+ * @skb: pointer to skb
+ * @offset: offset within packet from skb->data
+ * @from: pointer where to copy bytes from
+ * @len: number of bytes to store into packet
+ * Return: 0 on success or negative error
+ *
+ * int lwt_seg6_adjust_srh(skb, offset, delta)
+ * Adjust the size allocated to TLVs in the outermost IPv6 Segment Routing
+ * Header (grow if delta > 0, else shrink)
+ * @skb: pointer to skb
+ * @offset: offset within packet from skb->data where SRH will grow/shrink,
+ * only offsets after the segments are accepted
+ * @delta: a positive/negative integer
+ * Return: 0 on success or negative on error
+ *
+ * int lwt_seg6_action(skb, action, param, param_len)
+ * Apply a IPv6 Segment Routing action on a packet with an IPv6 Segment
+ * Routing Header.
+ * @action:
+ * - End.X: SEG6_LOCAL_ACTION_END_X
+ * (type of param: struct in6_addr)
+ * - End.T: SEG6_LOCAL_ACTION_END_T
+ * (type of param: int)
+ * - End.B6: SEG6_LOCAL_ACTION_END_B6
+ * (type of param: struct ipv6_sr_hdr)
+ * - End.B6.Encap: SEG6_LOCAL_ACTION_END_B6_ENCAP
+ * (type of param: struct ipv6_sr_hdr)
+ * @param: pointer to the parameter required by the action
+ * @param_len: length of param in bytes
+ * Return: 0 on success or negative error
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -794,7 +839,11 @@ union bpf_attr {
FN(msg_redirect_map), \
FN(msg_apply_bytes), \
FN(msg_cork_bytes), \
- FN(msg_pull_data),
+ FN(msg_pull_data), \
+ FN(lwt_push_encap), \
+ FN(lwt_seg6_store_bytes), \
+ FN(lwt_seg6_adjust_srh), \
+ FN(lwt_seg6_action),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
@@ -837,6 +886,7 @@ enum bpf_func_id {
/* BPF_FUNC_skb_set_tunnel_key flags. */
#define BPF_F_ZERO_CSUM_TX (1ULL << 1)
#define BPF_F_DONT_FRAGMENT (1ULL << 2)
+#define BPF_F_SEQ_NUMBER (1ULL << 3)
/* BPF_FUNC_perf_event_output, BPF_FUNC_perf_event_read and
* BPF_FUNC_perf_event_read_value flags.
@@ -851,6 +901,12 @@ enum bpf_adj_room_mode {
BPF_ADJ_ROOM_NET,
};
+/* Encapsulation type for BPF_FUNC_lwt_push_encap helper. */
+enum bpf_lwt_encap_mode {
+ BPF_LWT_ENCAP_SEG6,
+ BPF_LWT_ENCAP_SEG6_INLINE
+};
+
/* user accessible mirror of in-kernel sk_buff.
* new fields can only be added to the end of this structure
*/
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index f35fb02bdf56..95c6bb8548a3 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -30,14 +30,15 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
sockmap_verdict_prog.o dev_cgroup.o sample_ret0.o test_tracepoint.o \
test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o \
sample_map_ret0.o test_tcpbpf_kern.o test_stacktrace_build_id.o \
- sockmap_tcp_msg_prog.o
+ sockmap_tcp_msg_prog.o test_lwt_seg6local.o
# Order correspond to 'make run_tests' order
TEST_PROGS := test_kmod.sh \
test_libbpf.sh \
test_xdp_redirect.sh \
test_xdp_meta.sh \
- test_offload.py
+ test_offload.py \
+ test_lwt_seg6local.sh
# Compile but not part of 'make run_tests'
TEST_GEN_PROGS_EXTENDED = test_libbpf_open
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 7cae376d8d0c..9a89a2a4cea8 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -94,6 +94,18 @@ static int (*bpf_msg_cork_bytes)(void *ctx, int len) =
(void *) BPF_FUNC_msg_cork_bytes;
static int (*bpf_msg_pull_data)(void *ctx, int start, int end, int flags) =
(void *) BPF_FUNC_msg_pull_data;
+static int (*bpf_lwt_push_encap)(void *ctx, unsigned int type, void *hdr,
+ unsigned int len) =
+ (void *) BPF_FUNC_lwt_push_encap;
+static int (*bpf_lwt_seg6_store_bytes)(void *ctx, unsigned int offset,
+ void *from, unsigned int len) =
+ (void *) BPF_FUNC_lwt_seg6_store_bytes;
+static int (*bpf_lwt_seg6_action)(void *ctx, unsigned int action, void *param,
+ unsigned int param_len) =
+ (void *) BPF_FUNC_lwt_seg6_action;
+static int (*bpf_lwt_seg6_adjust_srh)(void *ctx, unsigned int offset,
+ unsigned int len) =
+ (void *) BPF_FUNC_lwt_seg6_adjust_srh;
/* llvm builtin functions that eBPF C program may use to
* emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/tools/testing/selftests/bpf/test_lwt_seg6local.c b/tools/testing/selftests/bpf/test_lwt_seg6local.c
new file mode 100644
index 000000000000..d752bc1fe81c
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_lwt_seg6local.c
@@ -0,0 +1,438 @@
+#include <stddef.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <linux/seg6_local.h>
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+#define bpf_printk(fmt, ...) \
+({ \
+ char ____fmt[] = fmt; \
+ bpf_trace_printk(____fmt, sizeof(____fmt), \
+ ##__VA_ARGS__); \
+})
+
+/* Packet parsing state machine helpers. */
+#define cursor_advance(_cursor, _len) \
+ ({ void *_tmp = _cursor; _cursor += _len; _tmp; })
+
+#define SR6_FLAG_ALERT (1 << 4)
+
+#define htonll(x) ((bpf_htonl(1)) == 1 ? (x) : ((uint64_t)bpf_htonl((x) & \
+ 0xFFFFFFFF) << 32) | bpf_htonl((x) >> 32))
+#define ntohll(x) ((bpf_ntohl(1)) == 1 ? (x) : ((uint64_t)bpf_ntohl((x) & \
+ 0xFFFFFFFF) << 32) | bpf_ntohl((x) >> 32))
+#define BPF_PACKET_HEADER __attribute__((packed))
+
+struct ip6_t {
+ unsigned int ver:4;
+ unsigned int priority:8;
+ unsigned int flow_label:20;
+ unsigned short payload_len;
+ unsigned char next_header;
+ unsigned char hop_limit;
+ unsigned long long src_hi;
+ unsigned long long src_lo;
+ unsigned long long dst_hi;
+ unsigned long long dst_lo;
+} BPF_PACKET_HEADER;
+
+struct ip6_addr_t {
+ unsigned long long hi;
+ unsigned long long lo;
+} BPF_PACKET_HEADER;
+
+struct ip6_srh_t {
+ unsigned char nexthdr;
+ unsigned char hdrlen;
+ unsigned char type;
+ unsigned char segments_left;
+ unsigned char first_segment;
+ unsigned char flags;
+ unsigned short tag;
+
+ struct ip6_addr_t segments[0];
+} BPF_PACKET_HEADER;
+
+struct sr6_tlv_t {
+ unsigned char type;
+ unsigned char len;
+ unsigned char value[0];
+} BPF_PACKET_HEADER;
+
+__attribute__((always_inline)) struct ip6_srh_t *get_srh(struct __sk_buff *skb)
+{
+ void *cursor, *data_end;
+ struct ip6_srh_t *srh;
+ struct ip6_t *ip;
+ uint8_t *ipver;
+
+ data_end = (void *)(long)skb->data_end;
+ cursor = (void *)(long)skb->data;
+ ipver = (uint8_t *)cursor;
+
+ if ((void *)ipver + sizeof(*ipver) > data_end)
+ return NULL;
+
+ if ((*ipver >> 4) != 6)
+ return NULL;
+
+ ip = cursor_advance(cursor, sizeof(*ip));
+ if ((void *)ip + sizeof(*ip) > data_end)
+ return NULL;
+
+ if (ip->next_header != 43)
+ return NULL;
+
+ srh = cursor_advance(cursor, sizeof(*srh));
+ if ((void *)srh + sizeof(*srh) > data_end)
+ return NULL;
+
+ if (srh->type != 4)
+ return NULL;
+
+ return srh;
+}
+
+__attribute__((always_inline))
+int update_tlv_pad(struct __sk_buff *skb, uint32_t new_pad,
+ uint32_t old_pad, uint32_t pad_off)
+{
+ int err;
+
+ if (new_pad != old_pad) {
+ err = bpf_lwt_seg6_adjust_srh(skb, pad_off,
+ (int) new_pad - (int) old_pad);
+ if (err)
+ return err;
+ }
+
+ if (new_pad > 0) {
+ char pad_tlv_buf[16] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0};
+ struct sr6_tlv_t *pad_tlv = (struct sr6_tlv_t *) pad_tlv_buf;
+
+ pad_tlv->type = SR6_TLV_PADDING;
+ pad_tlv->len = new_pad - 2;
+
+ err = bpf_lwt_seg6_store_bytes(skb, pad_off,
+ (void *)pad_tlv_buf, new_pad);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
+__attribute__((always_inline))
+int is_valid_tlv_boundary(struct __sk_buff *skb, struct ip6_srh_t *srh,
+ uint32_t *tlv_off, uint32_t *pad_size,
+ uint32_t *pad_off)
+{
+ uint32_t srh_off, cur_off;
+ int offset_valid = 0;
+ int err;
+
+ srh_off = (char *)srh - (char *)(long)skb->data;
+ // cur_off = end of segments, start of possible TLVs
+ cur_off = srh_off + sizeof(*srh) +
+ sizeof(struct ip6_addr_t) * (srh->first_segment + 1);
+
+ *pad_off = 0;
+
+ // we can only go as far as ~10 TLVs due to the BPF max stack size
+ #pragma clang loop unroll(full)
+ for (int i = 0; i < 10; i++) {
+ struct sr6_tlv_t tlv;
+
+ if (cur_off == *tlv_off)
+ offset_valid = 1;
+
+ if (cur_off >= srh_off + ((srh->hdrlen + 1) << 3))
+ break;
+
+ err = bpf_skb_load_bytes(skb, cur_off, &tlv, sizeof(tlv));
+ if (err)
+ return err;
+
+ if (tlv.type == SR6_TLV_PADDING) {
+ *pad_size = tlv.len + sizeof(tlv);
+ *pad_off = cur_off;
+
+ if (*tlv_off == srh_off) {
+ *tlv_off = cur_off;
+ offset_valid = 1;
+ }
+ break;
+
+ } else if (tlv.type == SR6_TLV_HMAC) {
+ break;
+ }
+
+ cur_off += sizeof(tlv) + tlv.len;
+ } // we reached the padding or HMAC TLVs, or the end of the SRH
+
+ if (*pad_off == 0)
+ *pad_off = cur_off;
+
+ if (*tlv_off == -1)
+ *tlv_off = cur_off;
+ else if (!offset_valid)
+ return -EINVAL;
+
+ return 0;
+}
+
+__attribute__((always_inline))
+int add_tlv(struct __sk_buff *skb, struct ip6_srh_t *srh, uint32_t tlv_off,
+ struct sr6_tlv_t *itlv, uint8_t tlv_size)
+{
+ uint32_t srh_off = (char *)srh - (char *)(long)skb->data;
+ uint8_t len_remaining, new_pad;
+ uint32_t pad_off = 0;
+ uint32_t pad_size = 0;
+ uint32_t partial_srh_len;
+ int err;
+
+ if (tlv_off != -1)
+ tlv_off += srh_off;
+
+ if (itlv->type == SR6_TLV_PADDING || itlv->type == SR6_TLV_HMAC)
+ return -EINVAL;
+
+ err = is_valid_tlv_boundary(skb, srh, &tlv_off, &pad_size, &pad_off);
+ if (err)
+ return err;
+
+ err = bpf_lwt_seg6_adjust_srh(skb, tlv_off, sizeof(*itlv) + itlv->len);
+ if (err)
+ return err;
+
+ err = bpf_lwt_seg6_store_bytes(skb, tlv_off, (void *)itlv, tlv_size);
+ if (err)
+ return err;
+
+ // the following can't be moved inside update_tlv_pad because the
+ // bpf verifier has some issues with it
+ pad_off += sizeof(*itlv) + itlv->len;
+ partial_srh_len = pad_off - srh_off;
+ len_remaining = partial_srh_len % 8;
+ new_pad = 8 - len_remaining;
+
+ if (new_pad == 1) // cannot pad for 1 byte only
+ new_pad = 9;
+ else if (new_pad == 8)
+ new_pad = 0;
+
+ return update_tlv_pad(skb, new_pad, pad_size, pad_off);
+}
+
+__attribute__((always_inline))
+int delete_tlv(struct __sk_buff *skb, struct ip6_srh_t *srh,
+ uint32_t tlv_off)
+{
+ uint32_t srh_off = (char *)srh - (char *)(long)skb->data;
+ uint8_t len_remaining, new_pad;
+ uint32_t partial_srh_len;
+ uint32_t pad_off = 0;
+ uint32_t pad_size = 0;
+ struct sr6_tlv_t tlv;
+ int err;
+
+ tlv_off += srh_off;
+
+ err = is_valid_tlv_boundary(skb, srh, &tlv_off, &pad_size, &pad_off);
+ if (err)
+ return err;
+
+ err = bpf_skb_load_bytes(skb, tlv_off, &tlv, sizeof(tlv));
+ if (err)
+ return err;
+
+ err = bpf_lwt_seg6_adjust_srh(skb, tlv_off, -(sizeof(tlv) + tlv.len));
+ if (err)
+ return err;
+
+ pad_off -= sizeof(tlv) + tlv.len;
+ partial_srh_len = pad_off - srh_off;
+ len_remaining = partial_srh_len % 8;
+ new_pad = 8 - len_remaining;
+ if (new_pad == 1) // cannot pad for 1 byte only
+ new_pad = 9;
+ else if (new_pad == 8)
+ new_pad = 0;
+
+ return update_tlv_pad(skb, new_pad, pad_size, pad_off);
+}
+
+__attribute__((always_inline))
+int has_egr_tlv(struct __sk_buff *skb, struct ip6_srh_t *srh)
+{
+ int tlv_offset = sizeof(struct ip6_t) + sizeof(struct ip6_srh_t) +
+ ((srh->first_segment + 1) << 4);
+ struct sr6_tlv_t tlv;
+
+ if (bpf_skb_load_bytes(skb, tlv_offset, &tlv, sizeof(struct sr6_tlv_t)))
+ return 0;
+
+ if (tlv.type == SR6_TLV_EGRESS && tlv.len == 18) {
+ struct ip6_addr_t egr_addr;
+
+ if (bpf_skb_load_bytes(skb, tlv_offset + 4, &egr_addr, 16))
+ return 0;
+
+ // check if egress TLV value is correct
+ if (ntohll(egr_addr.hi) == 0xfd00000000000000 &&
+ ntohll(egr_addr.lo) == 0x4)
+ return 1;
+ }
+
+ return 0;
+}
+
+// This function will push a SRH with segments fd00::1, fd00::2, fd00::3,
+// fd00::4
+SEC("encap_srh")
+int __encap_srh(struct __sk_buff *skb)
+{
+ bpf_printk("got pkt\n");
+ unsigned long long hi = 0xfd00000000000000;
+ struct ip6_addr_t *seg;
+ struct ip6_srh_t *srh;
+ char srh_buf[72]; // room for 4 segments
+ int err;
+
+ srh = (struct ip6_srh_t *)srh_buf;
+ srh->nexthdr = 0;
+ srh->hdrlen = 8;
+ srh->type = 4;
+ srh->segments_left = 3;
+ srh->first_segment = 3;
+ srh->flags = 0;
+ srh->tag = 0;
+
+ seg = (struct ip6_addr_t *)((char *)srh + sizeof(*srh));
+
+ #pragma clang loop unroll(full)
+ for (unsigned long long lo = 0; lo < 4; lo++) {
+ seg->lo = htonll(4 - lo);
+ seg->hi = htonll(hi);
+ seg = (struct ip6_addr_t *)((char *)seg + sizeof(*seg));
+ }
+
+ err = bpf_lwt_push_encap(skb, 0, (void *)srh, sizeof(srh_buf));
+ if (err)
+ return BPF_DROP;
+
+ return BPF_REDIRECT;
+}
+
+// Add an Egress TLV fc00::4, add the flag A,
+// and apply End.X action to fc42::1
+SEC("add_egr_x")
+int __add_egr_x(struct __sk_buff *skb)
+{
+ unsigned long long hi = 0xfc42000000000000;
+ unsigned long long lo = 0x1;
+ struct ip6_srh_t *srh = get_srh(skb);
+ uint8_t new_flags = SR6_FLAG_ALERT;
+ struct ip6_addr_t addr;
+ int err, offset;
+
+ if (srh == NULL)
+ return BPF_DROP;
+
+ uint8_t tlv[20] = {2, 18, 0, 0, 0xfd, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x4};
+
+ err = add_tlv(skb, srh, (srh->hdrlen+1) << 3,
+ (struct sr6_tlv_t *)&tlv, 20);
+ if (err)
+ return BPF_DROP;
+
+ offset = sizeof(struct ip6_t) + offsetof(struct ip6_srh_t, flags);
+ err = bpf_lwt_seg6_store_bytes(skb, offset,
+ (void *)&new_flags, sizeof(new_flags));
+ if (err)
+ return BPF_DROP;
+
+ addr.lo = htonll(lo);
+ addr.hi = htonll(hi);
+ err = bpf_lwt_seg6_action(skb, SEG6_LOCAL_ACTION_END_X,
+ (void *)&addr, sizeof(addr));
+ if (err)
+ return BPF_DROP;
+ return BPF_REDIRECT;
+}
+
+// Pop the Egress TLV, reset the flags, change the tag 2442 and finally do a
+// simple End action
+SEC("pop_egr")
+int __pop_egr(struct __sk_buff *skb)
+{
+ struct ip6_srh_t *srh = get_srh(skb);
+ uint16_t new_tag = bpf_htons(2442);
+ uint8_t new_flags = 0;
+ int err, offset;
+
+ if (srh == NULL)
+ return BPF_DROP;
+
+ if (srh->flags != SR6_FLAG_ALERT)
+ return BPF_DROP;
+
+ if (srh->hdrlen != 11) // 4 segments + Egress TLV + Padding TLV
+ return BPF_DROP;
+
+ if (!has_egr_tlv(skb, srh))
+ return BPF_DROP;
+
+ err = delete_tlv(skb, srh, 8 + (srh->first_segment + 1) * 16);
+ if (err)
+ return BPF_DROP;
+
+ offset = sizeof(struct ip6_t) + offsetof(struct ip6_srh_t, flags);
+ if (bpf_lwt_seg6_store_bytes(skb, offset, (void *)&new_flags,
+ sizeof(new_flags)))
+ return BPF_DROP;
+
+ offset = sizeof(struct ip6_t) + offsetof(struct ip6_srh_t, tag);
+ if (bpf_lwt_seg6_store_bytes(skb, offset, (void *)&new_tag,
+ sizeof(new_tag)))
+ return BPF_DROP;
+
+ return BPF_OK;
+}
+
+// Inspect if the Egress TLV and flag have been removed, if the tag is correct,
+// then apply a End.T action to reach the last segment
+SEC("inspect_t")
+int __inspect_t(struct __sk_buff *skb)
+{
+ struct ip6_srh_t *srh = get_srh(skb);
+ int table = 117;
+ int err;
+
+ if (srh == NULL)
+ return BPF_DROP;
+
+ if (srh->flags != 0)
+ return BPF_DROP;
+
+ if (srh->tag != bpf_htons(2442))
+ return BPF_DROP;
+
+ if (srh->hdrlen != 8) // 4 segments
+ return BPF_DROP;
+
+ err = bpf_lwt_seg6_action(skb, SEG6_LOCAL_ACTION_END_T,
+ (void *)&table, sizeof(table));
+
+ if (err)
+ return BPF_DROP;
+
+ return BPF_REDIRECT;
+}
+
+char __license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_lwt_seg6local.sh b/tools/testing/selftests/bpf/test_lwt_seg6local.sh
new file mode 100755
index 000000000000..1c77994b5e71
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_lwt_seg6local.sh
@@ -0,0 +1,140 @@
+#!/bin/bash
+# Connects 6 network namespaces through veths.
+# Each NS may have different IPv6 global scope addresses :
+# NS1 ---- NS2 ---- NS3 ---- NS4 ---- NS5 ---- NS6
+# fb00::1 fd00::1 fd00::2 fd00::3 fb00::6
+# fc42::1 fd00::4
+#
+# All IPv6 packets going to fb00::/16 through NS2 will be encapsulated in a
+# IPv6 header with a Segment Routing Header, with segments :
+# fd00::1 -> fd00::2 -> fd00::3 -> fd00::4
+#
+# 3 fd00::/16 IPv6 addresses are binded to seg6local End.BPF actions :
+# - fd00::1 : add a TLV, change the flags and apply a End.X action to fc42::1
+# - fd00::2 : remove the TLV, change the flags, add a tag
+# - fd00::3 : apply an End.T action to fd00::4, through routing table 117
+#
+# fd00::4 is a simple Segment Routing node decapsulating the inner IPv6 packet.
+# Each End.BPF action will validate the operations applied on the SRH by the
+# previous BPF program in the chain, otherwise the packet is dropped.
+#
+# An UDP datagram is sent from fb00::1 to fb00::6. The test succeeds if this
+# datagram can be read on NS6 when binding to fb00::6.
+
+TMP_FILE="/tmp/selftest_lwt_seg6local.txt"
+
+cleanup()
+{
+ if [ "$?" = "0" ]; then
+ echo "selftests: test_lwt_seg6local [PASS]";
+ else
+ echo "selftests: test_lwt_seg6local [FAILED]";
+ fi
+
+ set +e
+ ip netns del ns1 2> /dev/null
+ ip netns del ns2 2> /dev/null
+ ip netns del ns3 2> /dev/null
+ ip netns del ns4 2> /dev/null
+ ip netns del ns5 2> /dev/null
+ ip netns del ns6 2> /dev/null
+ rm -f $TMP_FILE
+}
+
+set -e
+
+ip netns add ns1
+ip netns add ns2
+ip netns add ns3
+ip netns add ns4
+ip netns add ns5
+ip netns add ns6
+
+trap cleanup 0 2 3 6 9
+
+ip link add veth1 type veth peer name veth2
+ip link add veth3 type veth peer name veth4
+ip link add veth5 type veth peer name veth6
+ip link add veth7 type veth peer name veth8
+ip link add veth9 type veth peer name veth10
+
+ip link set veth1 netns ns1
+ip link set veth2 netns ns2
+ip link set veth3 netns ns2
+ip link set veth4 netns ns3
+ip link set veth5 netns ns3
+ip link set veth6 netns ns4
+ip link set veth7 netns ns4
+ip link set veth8 netns ns5
+ip link set veth9 netns ns5
+ip link set veth10 netns ns6
+
+ip netns exec ns1 ip link set dev veth1 up
+ip netns exec ns2 ip link set dev veth2 up
+ip netns exec ns2 ip link set dev veth3 up
+ip netns exec ns3 ip link set dev veth4 up
+ip netns exec ns3 ip link set dev veth5 up
+ip netns exec ns4 ip link set dev veth6 up
+ip netns exec ns4 ip link set dev veth7 up
+ip netns exec ns5 ip link set dev veth8 up
+ip netns exec ns5 ip link set dev veth9 up
+ip netns exec ns6 ip link set dev veth10 up
+ip netns exec ns6 ip link set dev lo up
+
+# All link scope addresses and routes required between veths
+ip netns exec ns1 ip -6 addr add fb00::12/16 dev veth1 scope link
+ip netns exec ns1 ip -6 route add fb00::21 dev veth1 scope link
+ip netns exec ns2 ip -6 addr add fb00::21/16 dev veth2 scope link
+ip netns exec ns2 ip -6 addr add fb00::34/16 dev veth3 scope link
+ip netns exec ns2 ip -6 route add fb00::43 dev veth3 scope link
+ip netns exec ns3 ip -6 route add fb00::65 dev veth5 scope link
+ip netns exec ns3 ip -6 addr add fb00::43/16 dev veth4 scope link
+ip netns exec ns3 ip -6 addr add fb00::56/16 dev veth5 scope link
+ip netns exec ns4 ip -6 addr add fb00::65/16 dev veth6 scope link
+ip netns exec ns4 ip -6 addr add fb00::78/16 dev veth7 scope link
+ip netns exec ns4 ip -6 route add fb00::87 dev veth7 scope link
+ip netns exec ns5 ip -6 addr add fb00::87/16 dev veth8 scope link
+ip netns exec ns5 ip -6 addr add fb00::910/16 dev veth9 scope link
+ip netns exec ns5 ip -6 route add fb00::109 dev veth9 scope link
+ip netns exec ns5 ip -6 route add fb00::109 table 117 dev veth9 scope link
+ip netns exec ns6 ip -6 addr add fb00::109/16 dev veth10 scope link
+
+ip netns exec ns1 ip -6 addr add fb00::1/16 dev lo
+ip netns exec ns1 ip -6 route add fb00::6 dev veth1 via fb00::21
+
+ip netns exec ns2 ip -6 route add fb00::6 encap bpf in obj test_lwt_seg6local.o sec encap_srh dev veth2
+ip netns exec ns2 ip -6 route add fd00::1 dev veth3 via fb00::43 scope link
+
+ip netns exec ns3 ip -6 route add fc42::1 dev veth5 via fb00::65
+ip netns exec ns3 ip -6 route add fd00::1 encap seg6local action End.BPF obj test_lwt_seg6local.o sec add_egr_x dev veth4
+
+ip netns exec ns4 ip -6 route add fd00::2 encap seg6local action End.BPF obj test_lwt_seg6local.o sec pop_egr dev veth6
+ip netns exec ns4 ip -6 addr add fc42::1 dev lo
+ip netns exec ns4 ip -6 route add fd00::3 dev veth7 via fb00::87
+
+ip netns exec ns5 ip -6 route add fd00::4 table 117 dev veth9 via fb00::109
+ip netns exec ns5 ip -6 route add fd00::3 encap seg6local action End.BPF obj test_lwt_seg6local.o sec inspect_t dev veth8
+
+ip netns exec ns6 ip -6 addr add fb00::6/16 dev lo
+ip netns exec ns6 ip -6 addr add fd00::4/16 dev lo
+
+ip netns exec ns1 sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
+ip netns exec ns2 sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
+ip netns exec ns3 sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
+ip netns exec ns4 sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
+ip netns exec ns5 sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
+
+ip netns exec ns6 sysctl net.ipv6.conf.all.seg6_enabled=1 > /dev/null
+ip netns exec ns6 sysctl net.ipv6.conf.lo.seg6_enabled=1 > /dev/null
+ip netns exec ns6 sysctl net.ipv6.conf.veth10.seg6_enabled=1 > /dev/null
+
+ip netns exec ns6 nc -l -6 -u -d 7330 > $TMP_FILE &
+ip netns exec ns1 bash -c "echo 'foobar' | nc -w0 -6 -u -p 2121 -s fb00::1 fb00::6 7330"
+sleep 5 # wait enough time to ensure the UDP datagram arrived to the last segment
+kill -INT $!
+
+if [[ $(< $TMP_FILE) != "foobar" ]]; then
+ exit 1
+fi
+
+exit 0
--
2.16.1
^ permalink raw reply related
* Re: [bpf-next V4 PATCH 11/15] page_pool: refurbish version of page_pool code
From: Jesper Dangaard Brouer @ 2018-03-23 9:16 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: netdev, BjörnTöpel, magnus.karlsson, eugenia,
Jason Wang, John Fastabend, Eran Ben Elisha, Saeed Mahameed, galp,
Daniel Borkmann, Tariq Toukan, brouer
In-Reply-To: <20180322172156.zmd33sbzggqggkkj@ast-mbp.dhcp.thefacebook.com>
On Thu, 22 Mar 2018 10:21:58 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > new file mode 100644
> > index 000000000000..04112feb2df6
> > --- /dev/null
> > +++ b/net/core/page_pool.c
> > @@ -0,0 +1,329 @@
> > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
>
> These are kernel internal files. 'WITH ...' doesn't apply.
> See LICENSES/exceptions/Linux-syscall-note
Okay, that makes sense. I'll change it to plain GPL-2.0 then.
And send out a V5... adding Tariq's reviewed-by.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply
* [PATCH net-next 2/2] net: Drop NETDEV_UNREGISTER_FINAL
From: Kirill Tkhai @ 2018-03-23 9:39 UTC (permalink / raw)
To: davem, Michal.Kalderon, Ariel.Elior, dledford, jgg, benve,
1dgoodell, daniel, jakub.kicinski, ast, edumazet, linux,
john.fastabend, brouer, dsahern, netdev, ktkhai
In-Reply-To: <152179793764.13076.13815355125332565598.stgit@localhost.localdomain>
Last user is gone after bdf5bd7f2132 "rds: tcp: remove
register_netdevice_notifier infrastructure.", so we can
remove this netdevice command. This allows to delete
rtnl_lock() in netdev_run_todo(), which is hot path for
net namespace unregistration.
dev_change_net_namespace() and netdev_wait_allrefs()
have rcu_barrier() before NETDEV_UNREGISTER_FINAL call,
and the source commits say they were introduced to
delemit the call with NETDEV_UNREGISTER, but this patch
leaves them on the places, since they require additional
analysis, whether we need in them for something else.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
drivers/infiniband/hw/qedr/main.c | 4 ++--
include/linux/netdevice.h | 1 -
include/rdma/ib_verbs.h | 4 ++--
net/core/dev.c | 6 ------
4 files changed, 4 insertions(+), 11 deletions(-)
diff --git a/drivers/infiniband/hw/qedr/main.c b/drivers/infiniband/hw/qedr/main.c
index db4bf97c0e15..eb32abb0099a 100644
--- a/drivers/infiniband/hw/qedr/main.c
+++ b/drivers/infiniband/hw/qedr/main.c
@@ -90,8 +90,8 @@ static struct net_device *qedr_get_netdev(struct ib_device *dev, u8 port_num)
dev_hold(qdev->ndev);
/* The HW vendor's device driver must guarantee
- * that this function returns NULL before the net device reaches
- * NETDEV_UNREGISTER_FINAL state.
+ * that this function returns NULL before the net device has finished
+ * NETDEV_UNREGISTER state.
*/
return qdev->ndev;
}
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index dd5a04c971d5..2a2d9cf50aa2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2336,7 +2336,6 @@ enum netdev_cmd {
NETDEV_PRE_TYPE_CHANGE,
NETDEV_POST_TYPE_CHANGE,
NETDEV_POST_INIT,
- NETDEV_UNREGISTER_FINAL,
NETDEV_RELEASE,
NETDEV_NOTIFY_PEERS,
NETDEV_JOIN,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 73b2387e3f74..a1c5e8320f86 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2126,8 +2126,8 @@ struct ib_device {
* net device of device @device at port @port_num or NULL if such
* a net device doesn't exist. The vendor driver should call dev_hold
* on this net device. The HW vendor's device driver must guarantee
- * that this function returns NULL before the net device reaches
- * NETDEV_UNREGISTER_FINAL state.
+ * that this function returns NULL before the net device has finished
+ * NETDEV_UNREGISTER state.
*/
struct net_device *(*get_netdev)(struct ib_device *device,
u8 port_num);
diff --git a/net/core/dev.c b/net/core/dev.c
index bcc9fe68240b..c444da3e4b82 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8086,7 +8086,6 @@ static void netdev_wait_allrefs(struct net_device *dev)
rcu_barrier();
rtnl_lock();
- call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);
if (test_bit(__LINK_STATE_LINKWATCH_PENDING,
&dev->state)) {
/* We must not have linkwatch events
@@ -8158,10 +8157,6 @@ void netdev_run_todo(void)
= list_first_entry(&list, struct net_device, todo_list);
list_del(&dev->todo_list);
- rtnl_lock();
- call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);
- __rtnl_unlock();
-
if (unlikely(dev->reg_state != NETREG_UNREGISTERING)) {
pr_err("network todo '%s' but state %d\n",
dev->name, dev->reg_state);
@@ -8603,7 +8598,6 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
*/
call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
rcu_barrier();
- call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);
new_nsid = peernet2id_alloc(dev_net(dev), net);
/* If there is an ifindex conflict assign a new one */
^ permalink raw reply related
* [QUESTION] Mainline support for B43_PHY_AC wifi cards
From: Juri Lelli @ 2018-03-23 9:47 UTC (permalink / raw)
To: b43-dev; +Cc: linux-wireless, netdev, linux-kernel
Hi,
I've got a Dell XPS 13 9343/0TM99H (BIOS A15 01/23/2018) mounting a
BCM4352 802.11ac (rev 03) wireless card and so far I've been using it on
Fedora with broadcom-wl package (which I believe installs Broadcom's STA
driver?). It works good apart from occasional hiccups after suspend.
I'd like to get rid of that dependency (you can understand that it's
particularly annoying when testing mainline kernels), but I found out
that support for my card is BROKEN in mainline [1]. Just to see what
happens, I forcibly enabled it witnessing that it indeed crashes like
below as Kconfig warns. :)
bcma: bus0: Found chip with id 0x4352, rev 0x03 and package 0x00
bcma: bus0: Core 0 found: ChipCommon (manuf 0x4BF, id 0x800, rev 0x2B, class 0x0)
bcma: bus0: Core 1 found: IEEE 802.11 (manuf 0x4BF, id 0x812, rev 0x2A, class 0x0)
bcma: bus0: Core 2 found: ARM CR4 (manuf 0x4BF, id 0x83E, rev 0x02, class 0x0)
bcma: bus0: Core 3 found: PCIe Gen2 (manuf 0x4BF, id 0x83C, rev 0x01, class 0x0)
bcma: bus0: Core 4 found: USB 2.0 Device (manuf 0x4BF, id 0x81A, rev 0x11, class 0x0)
bcma: Unsupported SPROM revision: 11
bcma: bus0: Invalid SPROM read from the PCIe card, trying to use fallback SPROM
bcma: bus0: Using fallback SPROM failed (err -2)
bcma: bus0: No SPROM available
bcma: bus0: Bus registered
b43-phy0: Broadcom 4352 WLAN found (core revision 42)
b43-phy0: Found PHY: Analog 12, Type 11 (AC), Revision 1
b43-phy0: Found Radio: Manuf 0x17F, ID 0x2069, Revision 4, Version 0
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
So, question: is replacing my card the only way I can get rid of this
downstream dependency? :(
Thanks a lot.
Best,
- Juri
[1] https://elixir.bootlin.com/linux/v4.16-rc6/source/drivers/net/wireless/broadcom/b43/Kconfig#L151
^ permalink raw reply
* RE: [PATCH] fsl/fman: remove unnecessary set_dma_ops() call and HAS_DMA dependency
From: Madalin-cristian Bucur @ 2018-03-23 9:51 UTC (permalink / raw)
To: David Miller
Cc: geert.uytterhoeven@gmail.com, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <20180322.143159.2085752623277697593.davem@davemloft.net>
> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Thursday, March 22, 2018 8:32 PM
> To: Madalin-cristian Bucur <madalin.bucur@nxp.com>
> Cc: geert.uytterhoeven@gmail.com; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH] fsl/fman: remove unnecessary set_dma_ops() call and
> HAS_DMA dependency
>
> From: Madalin Bucur <madalin.bucur@nxp.com>
> Date: Wed, 21 Mar 2018 03:58:19 -0500
>
> > The platform device is no longer used for DMA mapping so the
> > (questionable) setting of the DMA ops done here is no longer
> > needed. Removing it together with the HAS_DMA dependency that
> > it required.
> >
> > Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
>
> This doesn't apply to any of my trees.
Sorry, it's caused by a patch in my tree that adds ARM 32b support, resulting in differences
in the context:
- depends on FSL_SOC || ARCH_LAYERSCAPE || COMPILE_TEST
+ depends on ARM || ARCH_LAYERSCAPE || FSL_SOC || COMPILE_TEST
I'll send a v2 based on a clean tree.
Madalin
^ permalink raw reply
* [PATCH 4.9 015/177] time: Change posix clocks ops interfaces to use timespec64
From: Greg Kroah-Hartman @ 2018-03-23 9:52 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Deepa Dinamani, arnd, y2038, netdev,
Richard Cochran, john.stultz, Thomas Gleixner, Sasha Levin
In-Reply-To: <20180323094205.090519271@linuxfoundation.org>
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Deepa Dinamani <deepa.kernel@gmail.com>
[ Upstream commit d340266e19ddb70dbd608f9deedcfb35fdb9d419 ]
struct timespec is not y2038 safe on 32 bit machines.
The posix clocks apis use struct timespec directly and through struct
itimerspec.
Replace the posix clock interfaces to use struct timespec64 and struct
itimerspec64 instead. Also fix up their implementations accordingly.
Note that the clock_getres() interface has also been changed to use
timespec64 even though this particular interface is not affected by the
y2038 problem. This helps verification for internal kernel code for y2038
readiness by getting rid of time_t/ timeval/ timespec.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: arnd@arndb.de
Cc: y2038@lists.linaro.org
Cc: netdev@vger.kernel.org
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: john.stultz@linaro.org
Link: http://lkml.kernel.org/r/1490555058-4603-3-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/ptp/ptp_clock.c | 18 +++++++-----------
include/linux/posix-clock.h | 10 +++++-----
kernel/time/posix-clock.c | 34 ++++++++++++++++++++++++----------
3 files changed, 36 insertions(+), 26 deletions(-)
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -97,30 +97,26 @@ static s32 scaled_ppm_to_ppb(long ppm)
/* posix clock implementation */
-static int ptp_clock_getres(struct posix_clock *pc, struct timespec *tp)
+static int ptp_clock_getres(struct posix_clock *pc, struct timespec64 *tp)
{
tp->tv_sec = 0;
tp->tv_nsec = 1;
return 0;
}
-static int ptp_clock_settime(struct posix_clock *pc, const struct timespec *tp)
+static int ptp_clock_settime(struct posix_clock *pc, const struct timespec64 *tp)
{
struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
- struct timespec64 ts = timespec_to_timespec64(*tp);
- return ptp->info->settime64(ptp->info, &ts);
+ return ptp->info->settime64(ptp->info, tp);
}
-static int ptp_clock_gettime(struct posix_clock *pc, struct timespec *tp)
+static int ptp_clock_gettime(struct posix_clock *pc, struct timespec64 *tp)
{
struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
- struct timespec64 ts;
int err;
- err = ptp->info->gettime64(ptp->info, &ts);
- if (!err)
- *tp = timespec64_to_timespec(ts);
+ err = ptp->info->gettime64(ptp->info, tp);
return err;
}
@@ -133,7 +129,7 @@ static int ptp_clock_adjtime(struct posi
ops = ptp->info;
if (tx->modes & ADJ_SETOFFSET) {
- struct timespec ts;
+ struct timespec64 ts;
ktime_t kt;
s64 delta;
@@ -146,7 +142,7 @@ static int ptp_clock_adjtime(struct posi
if ((unsigned long) ts.tv_nsec >= NSEC_PER_SEC)
return -EINVAL;
- kt = timespec_to_ktime(ts);
+ kt = timespec64_to_ktime(ts);
delta = ktime_to_ns(kt);
err = ops->adjtime(ops, delta);
} else if (tx->modes & ADJ_FREQUENCY) {
--- a/include/linux/posix-clock.h
+++ b/include/linux/posix-clock.h
@@ -59,23 +59,23 @@ struct posix_clock_operations {
int (*clock_adjtime)(struct posix_clock *pc, struct timex *tx);
- int (*clock_gettime)(struct posix_clock *pc, struct timespec *ts);
+ int (*clock_gettime)(struct posix_clock *pc, struct timespec64 *ts);
- int (*clock_getres) (struct posix_clock *pc, struct timespec *ts);
+ int (*clock_getres) (struct posix_clock *pc, struct timespec64 *ts);
int (*clock_settime)(struct posix_clock *pc,
- const struct timespec *ts);
+ const struct timespec64 *ts);
int (*timer_create) (struct posix_clock *pc, struct k_itimer *kit);
int (*timer_delete) (struct posix_clock *pc, struct k_itimer *kit);
void (*timer_gettime)(struct posix_clock *pc,
- struct k_itimer *kit, struct itimerspec *tsp);
+ struct k_itimer *kit, struct itimerspec64 *tsp);
int (*timer_settime)(struct posix_clock *pc,
struct k_itimer *kit, int flags,
- struct itimerspec *tsp, struct itimerspec *old);
+ struct itimerspec64 *tsp, struct itimerspec64 *old);
/*
* Optional character device methods:
*/
--- a/kernel/time/posix-clock.c
+++ b/kernel/time/posix-clock.c
@@ -300,14 +300,17 @@ out:
static int pc_clock_gettime(clockid_t id, struct timespec *ts)
{
struct posix_clock_desc cd;
+ struct timespec64 ts64;
int err;
err = get_clock_desc(id, &cd);
if (err)
return err;
- if (cd.clk->ops.clock_gettime)
- err = cd.clk->ops.clock_gettime(cd.clk, ts);
+ if (cd.clk->ops.clock_gettime) {
+ err = cd.clk->ops.clock_gettime(cd.clk, &ts64);
+ *ts = timespec64_to_timespec(ts64);
+ }
else
err = -EOPNOTSUPP;
@@ -319,14 +322,17 @@ static int pc_clock_gettime(clockid_t id
static int pc_clock_getres(clockid_t id, struct timespec *ts)
{
struct posix_clock_desc cd;
+ struct timespec64 ts64;
int err;
err = get_clock_desc(id, &cd);
if (err)
return err;
- if (cd.clk->ops.clock_getres)
- err = cd.clk->ops.clock_getres(cd.clk, ts);
+ if (cd.clk->ops.clock_getres) {
+ err = cd.clk->ops.clock_getres(cd.clk, &ts64);
+ *ts = timespec64_to_timespec(ts64);
+ }
else
err = -EOPNOTSUPP;
@@ -337,6 +343,7 @@ static int pc_clock_getres(clockid_t id,
static int pc_clock_settime(clockid_t id, const struct timespec *ts)
{
+ struct timespec64 ts64 = timespec_to_timespec64(*ts);
struct posix_clock_desc cd;
int err;
@@ -350,7 +357,7 @@ static int pc_clock_settime(clockid_t id
}
if (cd.clk->ops.clock_settime)
- err = cd.clk->ops.clock_settime(cd.clk, ts);
+ err = cd.clk->ops.clock_settime(cd.clk, &ts64);
else
err = -EOPNOTSUPP;
out:
@@ -403,29 +410,36 @@ static void pc_timer_gettime(struct k_it
{
clockid_t id = kit->it_clock;
struct posix_clock_desc cd;
+ struct itimerspec64 ts64;
if (get_clock_desc(id, &cd))
return;
- if (cd.clk->ops.timer_gettime)
- cd.clk->ops.timer_gettime(cd.clk, kit, ts);
-
+ if (cd.clk->ops.timer_gettime) {
+ cd.clk->ops.timer_gettime(cd.clk, kit, &ts64);
+ *ts = itimerspec64_to_itimerspec(&ts64);
+ }
put_clock_desc(&cd);
}
static int pc_timer_settime(struct k_itimer *kit, int flags,
struct itimerspec *ts, struct itimerspec *old)
{
+ struct itimerspec64 ts64 = itimerspec_to_itimerspec64(ts);
clockid_t id = kit->it_clock;
struct posix_clock_desc cd;
+ struct itimerspec64 old64;
int err;
err = get_clock_desc(id, &cd);
if (err)
return err;
- if (cd.clk->ops.timer_settime)
- err = cd.clk->ops.timer_settime(cd.clk, kit, flags, ts, old);
+ if (cd.clk->ops.timer_settime) {
+ err = cd.clk->ops.timer_settime(cd.clk, kit, flags, &ts64, &old64);
+ if (old)
+ *old = itimerspec64_to_itimerspec(&old64);
+ }
else
err = -EOPNOTSUPP;
^ permalink raw reply
* [PATCH 4.4 11/97] time: Change posix clocks ops interfaces to use timespec64
From: Greg Kroah-Hartman @ 2018-03-23 9:53 UTC (permalink / raw)
To: linux-kernel
Cc: arnd, y2038, Greg Kroah-Hartman, Richard Cochran, stable,
Sasha Levin, john.stultz, Deepa Dinamani, netdev, Thomas Gleixner
In-Reply-To: <20180323094157.535925724@linuxfoundation.org>
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Deepa Dinamani <deepa.kernel@gmail.com>
[ Upstream commit d340266e19ddb70dbd608f9deedcfb35fdb9d419 ]
struct timespec is not y2038 safe on 32 bit machines.
The posix clocks apis use struct timespec directly and through struct
itimerspec.
Replace the posix clock interfaces to use struct timespec64 and struct
itimerspec64 instead. Also fix up their implementations accordingly.
Note that the clock_getres() interface has also been changed to use
timespec64 even though this particular interface is not affected by the
y2038 problem. This helps verification for internal kernel code for y2038
readiness by getting rid of time_t/ timeval/ timespec.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: arnd@arndb.de
Cc: y2038@lists.linaro.org
Cc: netdev@vger.kernel.org
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: john.stultz@linaro.org
Link: http://lkml.kernel.org/r/1490555058-4603-3-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/ptp/ptp_clock.c | 18 +++++++-----------
include/linux/posix-clock.h | 10 +++++-----
kernel/time/posix-clock.c | 34 ++++++++++++++++++++++++----------
3 files changed, 36 insertions(+), 26 deletions(-)
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -97,30 +97,26 @@ static s32 scaled_ppm_to_ppb(long ppm)
/* posix clock implementation */
-static int ptp_clock_getres(struct posix_clock *pc, struct timespec *tp)
+static int ptp_clock_getres(struct posix_clock *pc, struct timespec64 *tp)
{
tp->tv_sec = 0;
tp->tv_nsec = 1;
return 0;
}
-static int ptp_clock_settime(struct posix_clock *pc, const struct timespec *tp)
+static int ptp_clock_settime(struct posix_clock *pc, const struct timespec64 *tp)
{
struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
- struct timespec64 ts = timespec_to_timespec64(*tp);
- return ptp->info->settime64(ptp->info, &ts);
+ return ptp->info->settime64(ptp->info, tp);
}
-static int ptp_clock_gettime(struct posix_clock *pc, struct timespec *tp)
+static int ptp_clock_gettime(struct posix_clock *pc, struct timespec64 *tp)
{
struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
- struct timespec64 ts;
int err;
- err = ptp->info->gettime64(ptp->info, &ts);
- if (!err)
- *tp = timespec64_to_timespec(ts);
+ err = ptp->info->gettime64(ptp->info, tp);
return err;
}
@@ -133,7 +129,7 @@ static int ptp_clock_adjtime(struct posi
ops = ptp->info;
if (tx->modes & ADJ_SETOFFSET) {
- struct timespec ts;
+ struct timespec64 ts;
ktime_t kt;
s64 delta;
@@ -146,7 +142,7 @@ static int ptp_clock_adjtime(struct posi
if ((unsigned long) ts.tv_nsec >= NSEC_PER_SEC)
return -EINVAL;
- kt = timespec_to_ktime(ts);
+ kt = timespec64_to_ktime(ts);
delta = ktime_to_ns(kt);
err = ops->adjtime(ops, delta);
} else if (tx->modes & ADJ_FREQUENCY) {
--- a/include/linux/posix-clock.h
+++ b/include/linux/posix-clock.h
@@ -59,23 +59,23 @@ struct posix_clock_operations {
int (*clock_adjtime)(struct posix_clock *pc, struct timex *tx);
- int (*clock_gettime)(struct posix_clock *pc, struct timespec *ts);
+ int (*clock_gettime)(struct posix_clock *pc, struct timespec64 *ts);
- int (*clock_getres) (struct posix_clock *pc, struct timespec *ts);
+ int (*clock_getres) (struct posix_clock *pc, struct timespec64 *ts);
int (*clock_settime)(struct posix_clock *pc,
- const struct timespec *ts);
+ const struct timespec64 *ts);
int (*timer_create) (struct posix_clock *pc, struct k_itimer *kit);
int (*timer_delete) (struct posix_clock *pc, struct k_itimer *kit);
void (*timer_gettime)(struct posix_clock *pc,
- struct k_itimer *kit, struct itimerspec *tsp);
+ struct k_itimer *kit, struct itimerspec64 *tsp);
int (*timer_settime)(struct posix_clock *pc,
struct k_itimer *kit, int flags,
- struct itimerspec *tsp, struct itimerspec *old);
+ struct itimerspec64 *tsp, struct itimerspec64 *old);
/*
* Optional character device methods:
*/
--- a/kernel/time/posix-clock.c
+++ b/kernel/time/posix-clock.c
@@ -300,14 +300,17 @@ out:
static int pc_clock_gettime(clockid_t id, struct timespec *ts)
{
struct posix_clock_desc cd;
+ struct timespec64 ts64;
int err;
err = get_clock_desc(id, &cd);
if (err)
return err;
- if (cd.clk->ops.clock_gettime)
- err = cd.clk->ops.clock_gettime(cd.clk, ts);
+ if (cd.clk->ops.clock_gettime) {
+ err = cd.clk->ops.clock_gettime(cd.clk, &ts64);
+ *ts = timespec64_to_timespec(ts64);
+ }
else
err = -EOPNOTSUPP;
@@ -319,14 +322,17 @@ static int pc_clock_gettime(clockid_t id
static int pc_clock_getres(clockid_t id, struct timespec *ts)
{
struct posix_clock_desc cd;
+ struct timespec64 ts64;
int err;
err = get_clock_desc(id, &cd);
if (err)
return err;
- if (cd.clk->ops.clock_getres)
- err = cd.clk->ops.clock_getres(cd.clk, ts);
+ if (cd.clk->ops.clock_getres) {
+ err = cd.clk->ops.clock_getres(cd.clk, &ts64);
+ *ts = timespec64_to_timespec(ts64);
+ }
else
err = -EOPNOTSUPP;
@@ -337,6 +343,7 @@ static int pc_clock_getres(clockid_t id,
static int pc_clock_settime(clockid_t id, const struct timespec *ts)
{
+ struct timespec64 ts64 = timespec_to_timespec64(*ts);
struct posix_clock_desc cd;
int err;
@@ -350,7 +357,7 @@ static int pc_clock_settime(clockid_t id
}
if (cd.clk->ops.clock_settime)
- err = cd.clk->ops.clock_settime(cd.clk, ts);
+ err = cd.clk->ops.clock_settime(cd.clk, &ts64);
else
err = -EOPNOTSUPP;
out:
@@ -403,29 +410,36 @@ static void pc_timer_gettime(struct k_it
{
clockid_t id = kit->it_clock;
struct posix_clock_desc cd;
+ struct itimerspec64 ts64;
if (get_clock_desc(id, &cd))
return;
- if (cd.clk->ops.timer_gettime)
- cd.clk->ops.timer_gettime(cd.clk, kit, ts);
-
+ if (cd.clk->ops.timer_gettime) {
+ cd.clk->ops.timer_gettime(cd.clk, kit, &ts64);
+ *ts = itimerspec64_to_itimerspec(&ts64);
+ }
put_clock_desc(&cd);
}
static int pc_timer_settime(struct k_itimer *kit, int flags,
struct itimerspec *ts, struct itimerspec *old)
{
+ struct itimerspec64 ts64 = itimerspec_to_itimerspec64(ts);
clockid_t id = kit->it_clock;
struct posix_clock_desc cd;
+ struct itimerspec64 old64;
int err;
err = get_clock_desc(id, &cd);
if (err)
return err;
- if (cd.clk->ops.timer_settime)
- err = cd.clk->ops.timer_settime(cd.clk, kit, flags, ts, old);
+ if (cd.clk->ops.timer_settime) {
+ err = cd.clk->ops.timer_settime(cd.clk, kit, flags, &ts64, &old64);
+ if (old)
+ *old = itimerspec64_to_itimerspec(&old64);
+ }
else
err = -EOPNOTSUPP;
_______________________________________________
Y2038 mailing list
Y2038@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/y2038
^ permalink raw reply
* [PATCH][next] net: hns3: make function hclge_inform_reset_assert_to_vf static
From: Colin King @ 2018-03-23 9:54 UTC (permalink / raw)
To: Yisen Zhuang, Salil Mehta, David S . Miller, Peng Li, netdev
Cc: kernel-janitors, linux-kernel
From: Colin Ian King <colin.king@canonical.com>
The function hclge_inform_reset_assert_to_vf is local to the source and
does not need to be in global scope, so make it static.
Cleans up sparse warning:
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c:82:5: warning:
symbol 'hclge_inform_reset_assert_to_vf' was not declared. Should it
be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
index 39013334a613..a6f7ffa9c259 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
@@ -79,7 +79,7 @@ static int hclge_send_mbx_msg(struct hclge_vport *vport, u8 *msg, u16 msg_len,
return status;
}
-int hclge_inform_reset_assert_to_vf(struct hclge_vport *vport)
+static int hclge_inform_reset_assert_to_vf(struct hclge_vport *vport)
{
u8 msg_data[2];
u8 dest_vfid;
--
2.15.1
^ permalink raw reply related
* [PATCH net-next] cxgb4: Setup FW queues before registering netdev
From: Ganesh Goudar @ 2018-03-23 9:55 UTC (permalink / raw)
To: netdev, davem
Cc: nirranjan, indranil, venkatesh, Arjun Vynipadath, Casey Leedom,
Ganesh Goudar
From: Arjun Vynipadath <arjun@chelsio.com>
When NetworkManager is enabled, there are chances that interface up
is called even before probe completes. This means we have not yet
allocated the FW sge queues, hence rest of ingress queue allocation
wont be proper. Fix this by calling setup_fw_sge_queues() before
register_netdev().
Fixes: 0fbc81b3ad51 ('chcr/cxgb4i/cxgbit/RDMA/cxgb4: Allocate resources dynamically for all cxgb4 ULD's')
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index c4e0e47..8f733ee 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -841,8 +841,6 @@ static int setup_fw_sge_queues(struct adapter *adap)
err = t4_sge_alloc_rxq(adap, &s->fw_evtq, true, adap->port[0],
adap->msi_idx, NULL, fwevtq_handler, NULL, -1);
- if (err)
- t4_free_sge_resources(adap);
return err;
}
@@ -5750,6 +5748,13 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
if (err)
goto out_free_dev;
+ err = setup_fw_sge_queues(adapter);
+ if (err) {
+ dev_err(adapter->pdev_dev,
+ "FW sge queue allocation failed, err %d", err);
+ goto out_free_dev;
+ }
+
/*
* The card is now ready to go. If any errors occur during device
* registration we do not fail the whole card but rather proceed only
@@ -5798,10 +5803,10 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
cxgb4_ptp_init(adapter);
print_adapter_info(adapter);
- setup_fw_sge_queues(adapter);
return 0;
out_free_dev:
+ t4_free_sge_resources(adapter);
free_some_resources(adapter);
if (adapter->flags & USING_MSIX)
free_msix_info(adapter);
--
2.1.0
^ permalink raw reply related
* [PATCH V2 net 1/1] net/ipv4: disable SMC TCP option with SYN Cookies
From: Ursula Braun @ 2018-03-23 10:05 UTC (permalink / raw)
To: davem
Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun,
hwippel, eric.dumazet
In-Reply-To: <20180323100545.78607-1-ubraun@linux.vnet.ibm.com>
From: Hans Wippel <hwippel@linux.vnet.ibm.com>
Currently, the SMC experimental TCP option in a SYN packet is lost on
the server side when SYN Cookies are active. However, the corresponding
SYNACK sent back to the client contains the SMC option. This causes an
inconsistent view of the SMC capabilities on the client and server.
This patch disables the SMC option in the SYNACK when SYN Cookies are
active to avoid this issue.
Fixes: 60e2a7780793b ("tcp: TCP experimental option for SMC")
Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
---
net/ipv4/syncookies.c | 2 ++
net/ipv4/tcp_input.c | 3 +++
net/ipv6/syncookies.c | 2 ++
3 files changed, 7 insertions(+)
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index fda37f2862c9..c3387dfd725b 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -349,6 +349,8 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
req->ts_recent = tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0;
treq->snt_synack = 0;
treq->tfo_listener = false;
+ if (IS_ENABLED(CONFIG_SMC))
+ ireq->smc_ok = 0;
ireq->ir_iif = inet_request_bound_dev_if(sk, skb);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9a1b3c1c1c14..ff6cd98ce8d5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6256,6 +6256,9 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
if (want_cookie && !tmp_opt.saw_tstamp)
tcp_clear_options(&tmp_opt);
+ if (IS_ENABLED(CONFIG_SMC) && want_cookie)
+ tmp_opt.smc_ok = 0;
+
tmp_opt.tstamp_ok = tmp_opt.saw_tstamp;
tcp_openreq_init(req, &tmp_opt, skb, sk);
inet_rsk(req)->no_srccheck = inet_sk(sk)->transparent;
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index e7a3a6b6cf56..e997141aed8c 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -217,6 +217,8 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
treq->snt_isn = cookie;
treq->ts_off = 0;
treq->txhash = net_tx_rndhash();
+ if (IS_ENABLED(CONFIG_SMC))
+ ireq->smc_ok = 0;
/*
* We need to lookup the dst_entry to get the correct window size.
--
2.13.5
^ permalink raw reply related
* [PATCH V2 net 0/1] net/ipv4: SMC and SYN Cookies
From: Ursula Braun @ 2018-03-23 10:05 UTC (permalink / raw)
To: davem
Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun,
hwippel, eric.dumazet
Dave,
net/smc code is based on the SMC TCP experimental option.
Please apply Hans' patch to fix an inconsistency when SYN cookies
are active.
V2 includes the changes proposed by Eric Dumazet.
Thanks, Ursula
Hans Wippel (1):
net/ipv4: disable SMC TCP option with SYN Cookies
net/ipv4/syncookies.c | 2 ++
net/ipv4/tcp_input.c | 3 +++
net/ipv6/syncookies.c | 2 ++
3 files changed, 7 insertions(+)
--
2.13.5
^ permalink raw reply
* [PATCH v1 net] lan78xx: Set ASD in MAC_CR when EEE is enabled.
From: Raghuram Chary J @ 2018-03-23 10:18 UTC (permalink / raw)
To: davem; +Cc: netdev, unglinuxdriver, woojung.huh, raghuramchary.jallipalli
Description:
EEE does not work with lan7800 when AutoSpeed is not set.
(This can happen when EEPROM is not populated or configured incorrectly)
Root-Cause:
When EEE is enabled, the mac config register ASD is not set
i.e. in default state, causing EEE fail.
Fix:
Set the register when eeprom is not present.
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
---
v0->v1:
* Resolved the style related problems.
* Removed 1/3 patch count from Subject line.
---
drivers/net/usb/lan78xx.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 60a604cc7647..90d176279152 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -2351,6 +2351,7 @@ static int lan78xx_reset(struct lan78xx_net *dev)
u32 buf;
int ret = 0;
unsigned long timeout;
+ u8 sig;
ret = lan78xx_read_reg(dev, HW_CFG, &buf);
buf |= HW_CFG_LRST_;
@@ -2450,6 +2451,15 @@ static int lan78xx_reset(struct lan78xx_net *dev)
/* LAN7801 only has RGMII mode */
if (dev->chipid == ID_REV_CHIP_ID_7801_)
buf &= ~MAC_CR_GMII_EN_;
+
+ if (dev->chipid == ID_REV_CHIP_ID_7800_) {
+ ret = lan78xx_read_raw_eeprom(dev, 0, 1, &sig);
+ if (!ret && sig != EEPROM_INDICATOR) {
+ /* Implies there is no external eeprom. Set mac speed */
+ netdev_info(dev->net, "No External EEPROM. Setting MAC Speed\n");
+ buf |= MAC_CR_AUTO_DUPLEX_ | MAC_CR_AUTO_SPEED_;
+ }
+ }
ret = lan78xx_write_reg(dev, MAC_CR, buf);
ret = lan78xx_read_reg(dev, MAC_TX, &buf);
--
2.16.2
^ permalink raw reply related
* [PATCH net-next] cxgb4: copy vlan_id in ndo_get_vf_config
From: Ganesh Goudar @ 2018-03-23 10:18 UTC (permalink / raw)
To: netdev, davem
Cc: nirranjan, indranil, venkatesh, Arjun Vynipadath, Casey Leedom,
Ganesh Goudhar
From: Arjun Vynipadath <arjun@chelsio.com>
Copy vlan_id to get it displayed in vf info.
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudhar <ganeshgr@chelsio.com>
---
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 8f733ee..55f158f 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -2695,13 +2695,17 @@ static int cxgb4_mgmt_get_vf_config(struct net_device *dev,
{
struct port_info *pi = netdev_priv(dev);
struct adapter *adap = pi->adapter;
+ struct vf_info *vfinfo;
if (vf >= adap->num_vfs)
return -EINVAL;
+ vfinfo = &adap->vfinfo[vf];
+
ivi->vf = vf;
- ivi->max_tx_rate = adap->vfinfo[vf].tx_rate;
+ ivi->max_tx_rate = vfinfo->tx_rate;
ivi->min_tx_rate = 0;
- ether_addr_copy(ivi->mac, adap->vfinfo[vf].vf_mac_addr);
+ ether_addr_copy(ivi->mac, vfinfo->vf_mac_addr);
+ ivi->vlan = vfinfo->vlan;
return 0;
}
--
2.1.0
^ permalink raw reply related
* [PATCH 0/2] bpf: Change print_bpf_insn interface
From: Jiri Olsa @ 2018-03-23 10:41 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann; +Cc: lkml, netdev, Quentin Monnet
hi,
this patchset removes struct bpf_verifier_env argument
from print_bpf_insn function (patch 1) and changes user
space bpftool user to use it that way (patch 2).
thanks,
jirka
---
Jiri Olsa (2):
bpf: Remove struct bpf_verifier_env argument from print_bpf_insn
bpftool: Adjust to new print_bpf_insn interface
kernel/bpf/disasm.c | 52 ++++++++++++++++++++++++++--------------------------
kernel/bpf/disasm.h | 5 +----
kernel/bpf/verifier.c | 44 +++++++++++++++++++++++++++-----------------
tools/bpf/bpftool/xlated_dumper.c | 12 ++++++------
4 files changed, 60 insertions(+), 53 deletions(-)
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox