Netdev List

Netdev List
 help / color / mirror / Atom feed

* [RFC PATCH 0/5] PTP: add support for Intel's TGPIO controller
From: Felipe Balbi @ 2019-07-16  7:20 UTC (permalink / raw)
  To: Richard Cochran
  Cc: netdev, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, linux-kernel, Christopher S . Hall,
	Felipe Balbi

TGPIO is a new IP which allows for time synchronization between systems
without any other means of synchronization such as PTP or NTP. The
driver is implemented as part of the PTP framework since its features
covered most of what this controller can do.

There are a few things that made me send this as a RFC, however:

(1) This version of the controller lacks an interrupt line. Currently I
	put a kthread that starts polling the controller whenever its
	pin is configured as input. Any better ideas for allowing
	userspace control the polling rate? Perhaps tap into ptp_poll()?

(2) ACPI IDs can't be shared at this moment, unfortunately.

(3) The change in arch/x86/kernel/tsc.c needs to be reviewed at length
	before going in.

Let me know what you guys think,
Cheers

Felipe Balbi (5):
  x86: tsc: add tsc to art helpers
  PTP: add a callback for counting timestamp events
  PTP: implement PTP_EVENT_COUNT_TSTAMP ioctl
  PTP: Add flag for non-periodic output
  PTP: Add support for Intel PMC Timed GPIO Controller

 arch/x86/include/asm/tsc.h        |   2 +
 arch/x86/kernel/tsc.c             |  32 +++
 drivers/ptp/Kconfig               |   8 +
 drivers/ptp/Makefile              |   1 +
 drivers/ptp/ptp-intel-pmc-tgpio.c | 378 ++++++++++++++++++++++++++++++
 drivers/ptp/ptp_chardev.c         |  15 ++
 include/linux/ptp_clock_kernel.h  |  12 +
 include/uapi/linux/ptp_clock.h    |   6 +-
 8 files changed, 453 insertions(+), 1 deletion(-)
 create mode 100644 drivers/ptp/ptp-intel-pmc-tgpio.c

-- 
2.22.0


^ permalink raw reply

* [RFC PATCH 2/5] PTP: add a callback for counting timestamp events
From: Felipe Balbi @ 2019-07-16  7:20 UTC (permalink / raw)
  To: Richard Cochran
  Cc: netdev, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, linux-kernel, Christopher S . Hall,
	Felipe Balbi
In-Reply-To: <20190716072038.8408-1-felipe.balbi@linux.intel.com>

This will be used for frequency discipline adjustments.

Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
---
 include/linux/ptp_clock_kernel.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/ptp_clock_kernel.h b/include/linux/ptp_clock_kernel.h
index 28eb9c792522..1a4e3f916128 100644
--- a/include/linux/ptp_clock_kernel.h
+++ b/include/linux/ptp_clock_kernel.h
@@ -35,6 +35,16 @@ struct ptp_system_timestamp {
 	struct timespec64 post_ts;
 };
 
+/**
+ * struct ptp_event_count_tstamp - device time vs event count for frequency discipline
+ */
+struct ptp_event_count_tstamp {
+	unsigned int index;
+
+	struct ptp_clock_time device_time;
+	u64 event_count;
+};
+
 /**
  * struct ptp_clock_info - decribes a PTP hardware clock
  *
@@ -134,6 +144,8 @@ struct ptp_clock_info {
 			  struct ptp_system_timestamp *sts);
 	int (*getcrosststamp)(struct ptp_clock_info *ptp,
 			      struct system_device_crosststamp *cts);
+	int (*counttstamp)(struct ptp_clock_info *ptp,
+			   struct ptp_event_count_tstamp *count);
 	int (*settime64)(struct ptp_clock_info *p, const struct timespec64 *ts);
 	int (*enable)(struct ptp_clock_info *ptp,
 		      struct ptp_clock_request *request, int on);
-- 
2.22.0


^ permalink raw reply related

* [RFC PATCH 1/5] x86: tsc: add tsc to art helpers
From: Felipe Balbi @ 2019-07-16  7:20 UTC (permalink / raw)
  To: Richard Cochran
  Cc: netdev, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, linux-kernel, Christopher S . Hall,
	Felipe Balbi
In-Reply-To: <20190716072038.8408-1-felipe.balbi@linux.intel.com>

Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
---
 arch/x86/include/asm/tsc.h |  2 ++
 arch/x86/kernel/tsc.c      | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 8a0c25c6bf09..b7a9f4385a82 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -32,6 +32,8 @@ static inline cycles_t get_cycles(void)
 
 extern struct system_counterval_t convert_art_to_tsc(u64 art);
 extern struct system_counterval_t convert_art_ns_to_tsc(u64 art_ns);
+extern void get_tsc_ns(struct system_counterval_t *tsc_counterval, u64 *tsc_ns);
+extern u64 get_art_ns_now(void);
 
 extern void tsc_early_init(void);
 extern void tsc_init(void);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 0b29e58f288e..333fffc1db7c 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1215,6 +1215,38 @@ struct system_counterval_t convert_art_to_tsc(u64 art)
 }
 EXPORT_SYMBOL(convert_art_to_tsc);
 
+void get_tsc_ns(struct system_counterval_t *tsc_counterval, u64 *tsc_ns)
+{
+	u64 tmp, res, rem;
+	u64 cycles;
+
+	tsc_counterval->cycles = clocksource_tsc.read(NULL);
+	cycles = tsc_counterval->cycles;
+	tsc_counterval->cs = art_related_clocksource;
+
+	rem = do_div(cycles, tsc_khz);
+
+	res = cycles * USEC_PER_SEC;
+	tmp = rem * USEC_PER_SEC;
+
+	do_div(tmp, tsc_khz);
+	res += tmp;
+
+	*tsc_ns = res;
+}
+EXPORT_SYMBOL(get_tsc_ns);
+
+u64 get_art_ns_now(void)
+{
+	struct system_counterval_t tsc_cycles;
+	u64 tsc_ns;
+
+	get_tsc_ns(&tsc_cycles, &tsc_ns);
+
+	return tsc_ns;
+}
+EXPORT_SYMBOL(get_art_ns_now);
+
 /**
  * convert_art_ns_to_tsc() - Convert ART in nanoseconds to TSC.
  * @art_ns: ART (Always Running Timer) in unit of nanoseconds
-- 
2.22.0


^ permalink raw reply related

* [RFC PATCH 5/5] PTP: Add support for Intel PMC Timed GPIO Controller
From: Felipe Balbi @ 2019-07-16  7:20 UTC (permalink / raw)
  To: Richard Cochran
  Cc: netdev, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, linux-kernel, Christopher S . Hall,
	Felipe Balbi
In-Reply-To: <20190716072038.8408-1-felipe.balbi@linux.intel.com>

Add a driver supporting Intel Timed GPIO controller available as part
of some Intel PMCs.

Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
---
 drivers/ptp/Kconfig               |   8 +
 drivers/ptp/Makefile              |   1 +
 drivers/ptp/ptp-intel-pmc-tgpio.c | 378 ++++++++++++++++++++++++++++++
 3 files changed, 387 insertions(+)
 create mode 100644 drivers/ptp/ptp-intel-pmc-tgpio.c

diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
index 9b8fee5178e8..bb0fce70a783 100644
--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
@@ -107,6 +107,14 @@ config PTP_1588_CLOCK_PCH
 	  To compile this driver as a module, choose M here: the module
 	  will be called ptp_pch.
 
+config PTP_INTEL_PMC_TGPIO
+	tristate "Intel PMC Timed GPIO"
+	depends on X86
+	depends on ACPI
+	imply PTP_1588_CLOCK
+	help
+	  This driver adds support for Intel PMC Timed GPIO Controller
+
 config PTP_1588_CLOCK_KVM
 	tristate "KVM virtual PTP clock"
 	depends on PTP_1588_CLOCK
diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile
index 677d1d178a3e..ff89c90ace82 100644
--- a/drivers/ptp/Makefile
+++ b/drivers/ptp/Makefile
@@ -7,6 +7,7 @@ ptp-y					:= ptp_clock.o ptp_chardev.o ptp_sysfs.o
 obj-$(CONFIG_PTP_1588_CLOCK)		+= ptp.o
 obj-$(CONFIG_PTP_1588_CLOCK_DTE)	+= ptp_dte.o
 obj-$(CONFIG_PTP_1588_CLOCK_IXP46X)	+= ptp_ixp46x.o
+obj-$(CONFIG_PTP_INTEL_PMC_TGPIO)	+= ptp-intel-pmc-tgpio.o
 obj-$(CONFIG_PTP_1588_CLOCK_PCH)	+= ptp_pch.o
 obj-$(CONFIG_PTP_1588_CLOCK_KVM)	+= ptp_kvm.o
 obj-$(CONFIG_PTP_1588_CLOCK_QORIQ)	+= ptp-qoriq.o
diff --git a/drivers/ptp/ptp-intel-pmc-tgpio.c b/drivers/ptp/ptp-intel-pmc-tgpio.c
new file mode 100644
index 000000000000..880ece34868a
--- /dev/null
+++ b/drivers/ptp/ptp-intel-pmc-tgpio.c
@@ -0,0 +1,378 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Intel Timed GPIO Controller Driver
+ *
+ * Copyright (C) 2018 Intel Corporation
+ * Author: Felipe Balbi <felipe.balbi@linux.intel.com>
+ */
+
+#include <linux/acpi.h>
+#include <linux/bitops.h>
+#include <linux/gpio.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/kthread.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/platform_device.h>
+#include <linux/ptp_clock_kernel.h>
+#include <asm/tsc.h>
+
+#define TGPIOCTL		0x00
+#define TGPIOCOMPV31_0		0x10
+#define TGPIOCOMPV63_32		0x14
+#define TGPIOPIV31_0		0x18
+#define TGPIOPIV63_32		0x1c
+#define TGPIOTCV31_0		0x20
+#define TGPIOTCV63_32		0x24
+#define TGPIOECCV31_0		0x28
+#define TGPIOECCV63_32		0x2c
+#define TGPIOEC31_0		0x30
+#define TGPIOEC63_32		0x34
+
+/* Control Register */
+#define TGPIOCTL_EN		BIT(0)
+#define TGPIOCTL_DIR		BIT(1)
+#define TGPIOCTL_EP		GENMASK(3, 2)
+#define TGPIOCTL_EP_RISING_EDGE	(0 << 2)
+#define TGPIOCTL_EP_FALLING_EDGE (1 << 2)
+#define TGPIOCTL_EP_TOGGLE_EDGE	(2 << 2)
+#define TGPIOCTL_PM		BIT(4)
+
+#define NSECS_PER_SEC		1000000000
+#define TGPIO_MAX_ADJ_TIME	999999900
+
+struct intel_pmc_tgpio {
+	struct ptp_clock_info	info;
+	struct ptp_clock	*clock;
+
+	struct mutex		lock;
+	struct device		*dev;
+	void __iomem		*base;
+
+	struct task_struct	*event_thread;
+	bool			input;
+};
+#define to_intel_pmc_tgpio(i)	(container_of((i), struct intel_pmc_tgpio, info))
+
+static inline u64 to_intel_pmc_tgpio_time(struct ptp_clock_time *t)
+{
+	return t->sec * NSECS_PER_SEC + t->nsec;
+}
+
+static inline u64 intel_pmc_tgpio_readq(void __iomem *base, u32 offset)
+{
+	return lo_hi_readq(base + offset);
+}
+
+static inline void intel_pmc_tgpio_writeq(void __iomem *base, u32 offset, u64 v)
+{
+	return lo_hi_writeq(v, base + offset);
+}
+
+static inline u32 intel_pmc_tgpio_readl(void __iomem *base, u32 offset)
+{
+	return readl(base + offset);
+}
+
+static inline void intel_pmc_tgpio_writel(void __iomem *base, u32 offset, u32 value)
+{
+	writel(value, base + offset);
+}
+
+static struct ptp_pin_desc intel_pmc_tgpio_pin_config[] = {
+	{					\
+		.name	= "pin0",		\
+		.index	= 0,			\
+		.func	= PTP_PF_NONE,		\
+		.chan	= 0,			\
+	}
+};
+
+static int intel_pmc_tgpio_gettime64(struct ptp_clock_info *info,
+		struct timespec64 *ts)
+{
+	struct intel_pmc_tgpio	*tgpio = to_intel_pmc_tgpio(info);
+	u64 now;
+
+	mutex_lock(&tgpio->lock);
+	now = get_art_ns_now();
+	*ts = ns_to_timespec64(now);
+	mutex_unlock(&tgpio->lock);
+
+	return 0;
+}
+
+static int intel_pmc_tgpio_settime64(struct ptp_clock_info *info,
+		const struct timespec64 *ts)
+{
+	return -EOPNOTSUPP;
+}
+
+static int intel_pmc_tgpio_event_thread(void *_tgpio)
+{
+	struct intel_pmc_tgpio	*tgpio = _tgpio;
+	u64 reg;
+
+	while (!kthread_should_stop()) {
+		bool input;
+		int i;
+
+		mutex_lock(&tgpio->lock);
+		input = tgpio->input;
+		mutex_unlock(&tgpio->lock);
+
+		if (!input)
+			schedule();
+
+		reg = intel_pmc_tgpio_readq(tgpio->base, TGPIOEC31_0);
+
+		for (i = 0; i < reg; i++) {
+			struct ptp_clock_event event;
+
+			event.type = PTP_CLOCK_EXTTS;
+			event.index = 0;
+			event.timestamp = intel_pmc_tgpio_readq(tgpio->base,
+					TGPIOTCV31_0);
+
+			ptp_clock_event(tgpio->clock, &event);
+		}
+		schedule_timeout_interruptible(10);
+	}
+
+	return 0;
+}
+
+static int intel_pmc_tgpio_config_input(struct intel_pmc_tgpio *tgpio,
+		struct ptp_extts_request *extts, int on)
+{
+	u32			ctrl;
+	bool			input;
+
+	ctrl = intel_pmc_tgpio_readl(tgpio->base, TGPIOCTL);
+	ctrl &= ~TGPIOCTL_EN;
+	intel_pmc_tgpio_writel(tgpio->base, TGPIOCTL, ctrl);
+
+	if (on) {
+		ctrl |= TGPIOCTL_DIR;
+
+		if (extts->flags & PTP_RISING_EDGE &&
+				extts->flags & PTP_FALLING_EDGE)
+			ctrl |= TGPIOCTL_EP_TOGGLE_EDGE;
+		else if (extts->flags & PTP_RISING_EDGE)
+			ctrl |= TGPIOCTL_EP_RISING_EDGE;
+		else if (extts->flags & PTP_FALLING_EDGE)
+			ctrl |= TGPIOCTL_EP_FALLING_EDGE;
+
+		/* gotta program all other bits before EN bit is set */
+		intel_pmc_tgpio_writel(tgpio->base, TGPIOCTL, ctrl);
+		ctrl |= TGPIOCTL_EN;
+		input = true;
+	} else {
+		ctrl &= ~(TGPIOCTL_DIR | TGPIOCTL_EN);
+		input = false;
+	}
+
+	intel_pmc_tgpio_writel(tgpio->base, TGPIOCTL, ctrl);
+	tgpio->input = input;
+
+	if (input)
+		wake_up_process(tgpio->event_thread);
+
+	return 0;
+}
+
+static int intel_pmc_tgpio_config_output(struct intel_pmc_tgpio *tgpio,
+		struct ptp_perout_request *perout, int on)
+{
+	u32			ctrl;
+
+	ctrl = intel_pmc_tgpio_readl(tgpio->base, TGPIOCTL);
+	if (on) {
+		struct ptp_clock_time *period = &perout->period;
+		struct ptp_clock_time *start = &perout->start;
+
+		if (ctrl & TGPIOCTL_EN)
+			return 0;
+
+		intel_pmc_tgpio_writeq(tgpio->base, TGPIOCOMPV31_0,
+				to_intel_pmc_tgpio_time(start));
+
+		intel_pmc_tgpio_writeq(tgpio->base, TGPIOPIV31_0,
+				to_intel_pmc_tgpio_time(period));
+
+		ctrl &= ~TGPIOCTL_DIR;
+		if (perout->flags & PTP_PEROUT_ONE_SHOT)
+			ctrl &= ~TGPIOCTL_PM;
+		else
+			ctrl |= TGPIOCTL_PM;
+
+		/* gotta program all other bits before EN bit is set */
+		intel_pmc_tgpio_writel(tgpio->base, TGPIOCTL, ctrl);
+
+		ctrl |= TGPIOCTL_EN;
+		intel_pmc_tgpio_writel(tgpio->base, TGPIOCTL, ctrl);
+	} else {
+		if (!(ctrl & ~TGPIOCTL_EN))
+			return 0;
+
+		ctrl &= ~(TGPIOCTL_EN | TGPIOCTL_PM);
+		intel_pmc_tgpio_writel(tgpio->base, TGPIOCTL, ctrl);
+	}
+
+	return 0;
+}
+
+static int intel_pmc_tgpio_enable(struct ptp_clock_info *info,
+		struct ptp_clock_request *req, int on)
+{
+	struct intel_pmc_tgpio	*tgpio = to_intel_pmc_tgpio(info);
+	int			ret = -EOPNOTSUPP;
+
+	mutex_lock(&tgpio->lock);
+	switch (req->type) {
+	case PTP_CLK_REQ_EXTTS:
+		ret = intel_pmc_tgpio_config_input(tgpio, &req->extts, on);
+		break;
+	case PTP_CLK_REQ_PEROUT:
+		ret = intel_pmc_tgpio_config_output(tgpio, &req->perout, on);
+		break;
+	default:
+		break;
+	}
+	mutex_unlock(&tgpio->lock);
+
+	return ret;
+}
+
+static int intel_pmc_tgpio_get_time_fn(ktime_t *device_time,
+		struct system_counterval_t *system_counter, void *_tgpio)
+{
+	get_tsc_ns(system_counter, device_time);
+	return 0;
+}
+
+static int intel_pmc_tgpio_getcrosststamp(struct ptp_clock_info *info,
+		struct system_device_crosststamp *cts)
+{
+	struct intel_pmc_tgpio	*tgpio = to_intel_pmc_tgpio(info);
+
+	return get_device_system_crosststamp(intel_pmc_tgpio_get_time_fn, tgpio,
+			NULL, cts);
+}
+
+static int intel_pmc_tgpio_counttstamp(struct ptp_clock_info *info,
+		struct ptp_event_count_tstamp *count)
+{
+	struct intel_pmc_tgpio	*tgpio = to_intel_pmc_tgpio(info);
+	u32 dt_hi_tmp;
+	u32 dt_hi;
+	u32 dt_lo;
+
+	dt_hi_tmp = intel_pmc_tgpio_readl(tgpio->base, TGPIOTCV63_32);
+	dt_lo = intel_pmc_tgpio_readl(tgpio->base, TGPIOTCV31_0);
+
+	count->event_count = intel_pmc_tgpio_readl(tgpio->base, TGPIOECCV63_32);
+	count->event_count <<= 32;
+	count->event_count |= intel_pmc_tgpio_readl(tgpio->base, TGPIOECCV31_0);
+
+	dt_hi = intel_pmc_tgpio_readl(tgpio->base, TGPIOTCV63_32);
+
+	if (dt_hi_tmp != dt_hi && dt_lo & 0x80000000)
+		count->device_time.sec = dt_hi_tmp;
+	else
+		count->device_time.sec = dt_hi;
+
+	count->device_time.nsec = dt_lo;
+
+	return 0;
+}
+
+static int intel_pmc_tgpio_verify(struct ptp_clock_info *ptp, unsigned int pin,
+		enum ptp_pin_function func, unsigned int chan)
+{
+	return 0;
+}
+
+static const struct ptp_clock_info intel_pmc_tgpio_info = {
+	.owner		= THIS_MODULE,
+	.name		= "Intel PMC TGPIO",
+	.max_adj	= 50000000,
+	.n_pins		= 1,
+	.n_ext_ts	= 1,
+	.n_per_out	= 1,
+	.pin_config	= intel_pmc_tgpio_pin_config,
+	.gettime64	= intel_pmc_tgpio_gettime64,
+	.settime64	= intel_pmc_tgpio_settime64,
+	.enable		= intel_pmc_tgpio_enable,
+	.getcrosststamp	= intel_pmc_tgpio_getcrosststamp,
+	.counttstamp	= intel_pmc_tgpio_counttstamp,
+	.verify		= intel_pmc_tgpio_verify,
+};
+
+static int intel_pmc_tgpio_probe(struct platform_device *pdev)
+{
+	struct intel_pmc_tgpio	*tgpio;
+	struct device		*dev;
+	struct resource		*res;
+
+	dev = &pdev->dev;
+	tgpio = devm_kzalloc(dev, sizeof(*tgpio), GFP_KERNEL);
+	if (!tgpio)
+		return -ENOMEM;
+
+	tgpio->dev = dev;
+	tgpio->info = intel_pmc_tgpio_info;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	tgpio->base = devm_ioremap_resource(dev, res);
+	if (!tgpio->base)
+		return -ENOMEM;
+
+	mutex_init(&tgpio->lock);
+	platform_set_drvdata(pdev, tgpio);
+
+	tgpio->event_thread = kthread_create(intel_pmc_tgpio_event_thread,
+			tgpio, dev_name(tgpio->dev));
+	if (IS_ERR(tgpio->event_thread))
+		return PTR_ERR(tgpio->event_thread);
+
+	tgpio->clock = ptp_clock_register(&tgpio->info, &pdev->dev);
+	if (IS_ERR(tgpio->clock))
+		return PTR_ERR(tgpio->clock);
+
+	wake_up_process(tgpio->event_thread);
+
+	return 0;
+}
+
+static int intel_pmc_tgpio_remove(struct platform_device *pdev)
+{
+	struct intel_pmc_tgpio	*tgpio = platform_get_drvdata(pdev);
+
+	ptp_clock_unregister(tgpio->clock);
+
+	return 0;
+}
+
+static const struct acpi_device_id intel_pmc_acpi_match[] = {
+	/* TODO */
+
+	{  },
+};
+
+/* MODULE_ALIAS("acpi*:TODO:*"); */
+
+static struct platform_driver intel_pmc_tgpio_driver = {
+	.probe		= intel_pmc_tgpio_probe,
+	.remove		= intel_pmc_tgpio_remove,
+	.driver		= {
+		.name	= "intel-pmc-tgpio",
+		.acpi_match_table = ACPI_PTR(intel_pmc_acpi_match),
+	},
+};
+
+module_platform_driver(intel_pmc_tgpio_driver);
+
+MODULE_AUTHOR("Felipe Balbi <felipe.balbi@linux.intel.com>");
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("Intel PMC Timed GPIO Controller Driver");
-- 
2.22.0


^ permalink raw reply related

* [RFC PATCH 4/5] PTP: Add flag for non-periodic output
From: Felipe Balbi @ 2019-07-16  7:20 UTC (permalink / raw)
  To: Richard Cochran
  Cc: netdev, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, linux-kernel, Christopher S . Hall,
	Felipe Balbi
In-Reply-To: <20190716072038.8408-1-felipe.balbi@linux.intel.com>

When this new flag is set, we can use single-shot output.

Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
---
 include/uapi/linux/ptp_clock.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/ptp_clock.h b/include/uapi/linux/ptp_clock.h
index 674db7de64f3..439cbdfc3d9b 100644
--- a/include/uapi/linux/ptp_clock.h
+++ b/include/uapi/linux/ptp_clock.h
@@ -67,7 +67,9 @@ struct ptp_perout_request {
 	struct ptp_clock_time start;  /* Absolute start time. */
 	struct ptp_clock_time period; /* Desired period, zero means disable. */
 	unsigned int index;           /* Which channel to configure. */
-	unsigned int flags;           /* Reserved for future use. */
+
+#define PTP_PEROUT_ONE_SHOT BIT(0)
+	unsigned int flags;           /* Bit 0 -> oneshot output. */
 	unsigned int rsv[4];          /* Reserved for future use. */
 };
 
-- 
2.22.0


^ permalink raw reply related

* [RFC PATCH 3/5] PTP: implement PTP_EVENT_COUNT_TSTAMP ioctl
From: Felipe Balbi @ 2019-07-16  7:20 UTC (permalink / raw)
  To: Richard Cochran
  Cc: netdev, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, linux-kernel, Christopher S . Hall,
	Felipe Balbi
In-Reply-To: <20190716072038.8408-1-felipe.balbi@linux.intel.com>

With this, we can request the underlying driver to count the number of
events that have been captured.

Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
---
 drivers/ptp/ptp_chardev.c      | 15 +++++++++++++++
 include/uapi/linux/ptp_clock.h |  2 ++
 2 files changed, 17 insertions(+)

diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
index 18ffe449efdf..a3e163a6acdc 100644
--- a/drivers/ptp/ptp_chardev.c
+++ b/drivers/ptp/ptp_chardev.c
@@ -114,6 +114,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
 	struct system_device_crosststamp xtstamp;
 	struct ptp_clock_info *ops = ptp->info;
 	struct ptp_sys_offset *sysoff = NULL;
+	struct ptp_event_count_tstamp counttstamp;
 	struct ptp_system_timestamp sts;
 	struct ptp_clock_request req;
 	struct ptp_clock_caps caps;
@@ -301,6 +302,20 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
 		mutex_unlock(&ptp->pincfg_mux);
 		break;
 
+	case PTP_EVENT_COUNT_TSTAMP:
+		if (!ops->counttstamp)
+			return -ENOTSUPP;
+		if (copy_from_user(&req.perout, (void __user *)arg,
+				   sizeof(counttstamp))) {
+			err = -EFAULT;
+			break;
+		}
+		err = ops->counttstamp(ops, &counttstamp);
+		if (!err && copy_to_user((void __user *)arg, &counttstamp,
+						sizeof(counttstamp)))
+			err = -EFAULT;
+		break;
+
 	default:
 		err = -ENOTTY;
 		break;
diff --git a/include/uapi/linux/ptp_clock.h b/include/uapi/linux/ptp_clock.h
index 1bc794ad957a..674db7de64f3 100644
--- a/include/uapi/linux/ptp_clock.h
+++ b/include/uapi/linux/ptp_clock.h
@@ -148,6 +148,8 @@ struct ptp_pin_desc {
 	_IOWR(PTP_CLK_MAGIC, 8, struct ptp_sys_offset_precise)
 #define PTP_SYS_OFFSET_EXTENDED \
 	_IOWR(PTP_CLK_MAGIC, 9, struct ptp_sys_offset_extended)
+#define PTP_EVENT_COUNT_TSTAMP \
+	_IOWR(PTP_CLK_MAGIC, 6, struct ptp_event_count_tstamp)
 
 struct ptp_extts_event {
 	struct ptp_clock_time t; /* Time event occured. */
-- 
2.22.0


^ permalink raw reply related

* memory leak in new_inode_pseudo (2)
From: syzbot @ 2019-07-16  7:38 UTC (permalink / raw)
  To: davem, linux-kernel, netdev, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    fec88ab0 Merge tag 'for-linus-hmm' of git://git.kernel.org..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15a3da1fa00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8422fa55ce69212c
dashboard link: https://syzkaller.appspot.com/bug?extid=e682cca30bc101a4d9d9
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16ca5aa4600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+e682cca30bc101a4d9d9@syzkaller.appspotmail.com

BUG: memory leak
unreferenced object 0xffff8881223e5980 (size 768):
   comm "syz-executor.0", pid 7093, jiffies 4294950175 (age 8.140s)
   hex dump (first 32 bytes):
     01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
   backtrace:
     [<0000000030f6ab07>] kmemleak_alloc_recursive  
include/linux/kmemleak.h:43 [inline]
     [<0000000030f6ab07>] slab_post_alloc_hook mm/slab.h:522 [inline]
     [<0000000030f6ab07>] slab_alloc mm/slab.c:3319 [inline]
     [<0000000030f6ab07>] kmem_cache_alloc+0x13f/0x2c0 mm/slab.c:3483
     [<0000000005b17a67>] sock_alloc_inode+0x1c/0xa0 net/socket.c:238
     [<00000000cae2a9b4>] alloc_inode+0x2c/0xe0 fs/inode.c:227
     [<000000004d22e56a>] new_inode_pseudo+0x18/0x70 fs/inode.c:916
     [<000000007bb4d82d>] sock_alloc+0x1c/0x90 net/socket.c:554
     [<00000000884dfd41>] __sock_create+0x8f/0x250 net/socket.c:1378
     [<000000009dc85063>] sock_create_kern+0x3b/0x50 net/socket.c:1483
     [<00000000ca0afb1d>] smc_create+0xae/0x160 net/smc/af_smc.c:1975
     [<00000000ff903d89>] __sock_create+0x164/0x250 net/socket.c:1414
     [<00000000c0787cdf>] sock_create net/socket.c:1465 [inline]
     [<00000000c0787cdf>] __sys_socket+0x69/0x110 net/socket.c:1507
     [<0000000067a4ade6>] __do_sys_socket net/socket.c:1516 [inline]
     [<0000000067a4ade6>] __se_sys_socket net/socket.c:1514 [inline]
     [<0000000067a4ade6>] __x64_sys_socket+0x1e/0x30 net/socket.c:1514
     [<000000001e7b04ac>] do_syscall_64+0x76/0x1a0  
arch/x86/entry/common.c:296
     [<000000003fe40e36>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

BUG: memory leak
unreferenced object 0xffff88811f269f50 (size 56):
   comm "syz-executor.0", pid 7093, jiffies 4294950175 (age 8.140s)
   hex dump (first 32 bytes):
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
     00 5a 3e 22 81 88 ff ff 68 9f 26 1f 81 88 ff ff  .Z>"....h.&.....
   backtrace:
     [<0000000030f6ab07>] kmemleak_alloc_recursive  
include/linux/kmemleak.h:43 [inline]
     [<0000000030f6ab07>] slab_post_alloc_hook mm/slab.h:522 [inline]
     [<0000000030f6ab07>] slab_alloc mm/slab.c:3319 [inline]
     [<0000000030f6ab07>] kmem_cache_alloc+0x13f/0x2c0 mm/slab.c:3483
     [<000000005d4d6be7>] kmem_cache_zalloc include/linux/slab.h:738 [inline]
     [<000000005d4d6be7>] lsm_inode_alloc security/security.c:522 [inline]
     [<000000005d4d6be7>] security_inode_alloc+0x33/0xb0  
security/security.c:875
     [<00000000ef89212c>] inode_init_always+0x108/0x200 fs/inode.c:169
     [<00000000647feaf5>] alloc_inode+0x49/0xe0 fs/inode.c:234
     [<000000004d22e56a>] new_inode_pseudo+0x18/0x70 fs/inode.c:916
     [<000000007bb4d82d>] sock_alloc+0x1c/0x90 net/socket.c:554
     [<00000000884dfd41>] __sock_create+0x8f/0x250 net/socket.c:1378
     [<000000009dc85063>] sock_create_kern+0x3b/0x50 net/socket.c:1483
     [<00000000ca0afb1d>] smc_create+0xae/0x160 net/smc/af_smc.c:1975
     [<00000000ff903d89>] __sock_create+0x164/0x250 net/socket.c:1414
     [<00000000c0787cdf>] sock_create net/socket.c:1465 [inline]
     [<00000000c0787cdf>] __sys_socket+0x69/0x110 net/socket.c:1507
     [<0000000067a4ade6>] __do_sys_socket net/socket.c:1516 [inline]
     [<0000000067a4ade6>] __se_sys_socket net/socket.c:1514 [inline]
     [<0000000067a4ade6>] __x64_sys_socket+0x1e/0x30 net/socket.c:1514
     [<000000001e7b04ac>] do_syscall_64+0x76/0x1a0  
arch/x86/entry/common.c:296
     [<000000003fe40e36>] entry_SYSCALL_64_after_hwframe+0x44/0xa9



---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [RFC PATCH 1/5] x86: tsc: add tsc to art helpers
From: Thomas Gleixner @ 2019-07-16  7:57 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Richard Cochran, netdev, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, linux-kernel, Christopher S . Hall
In-Reply-To: <20190716072038.8408-2-felipe.balbi@linux.intel.com>

Felipe,

On Tue, 16 Jul 2019, Felipe Balbi wrote:

-ENOCHANGELOG

As you said in the cover letter:

>  (3) The change in arch/x86/kernel/tsc.c needs to be reviewed at length
>      before going in.

So some information what those interfaces are used for and why they are
needed would be really helpful.

> +void get_tsc_ns(struct system_counterval_t *tsc_counterval, u64 *tsc_ns)
> +{
> +	u64 tmp, res, rem;
> +	u64 cycles;
> +
> +	tsc_counterval->cycles = clocksource_tsc.read(NULL);
> +	cycles = tsc_counterval->cycles;
> +	tsc_counterval->cs = art_related_clocksource;
> +
> +	rem = do_div(cycles, tsc_khz);
> +
> +	res = cycles * USEC_PER_SEC;
> +	tmp = rem * USEC_PER_SEC;
> +
> +	do_div(tmp, tsc_khz);
> +	res += tmp;
> +
> +	*tsc_ns = res;
> +}
> +EXPORT_SYMBOL(get_tsc_ns);
> +
> +u64 get_art_ns_now(void)
> +{
> +	struct system_counterval_t tsc_cycles;
> +	u64 tsc_ns;
> +
> +	get_tsc_ns(&tsc_cycles, &tsc_ns);
> +
> +	return tsc_ns;
> +}
> +EXPORT_SYMBOL(get_art_ns_now);

While the changes look innocuous I'm missing the big picture why this needs
to emulate ART instead of simply using TSC directly.

Thanks,

	tglx

^ permalink raw reply

* Re: [bpf-next RFC 3/6] bpf: add bpf_tcp_gen_syncookie helper
From: Eric Dumazet @ 2019-07-16  7:59 UTC (permalink / raw)
  To: Petar Penkov, netdev, bpf
  Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov
In-Reply-To: <20190716002650.154729-4-ppenkov.kernel@gmail.com>



On 7/16/19 2:26 AM, Petar Penkov wrote:
> From: Petar Penkov <ppenkov@google.com>
> 
> This helper function allows BPF programs to try to generate SYN
> cookies, given a reference to a listener socket. The function works
> from XDP and with an skb context since bpf_skc_lookup_tcp can lookup a
> socket in both cases.
> 
...
>  
> +BPF_CALL_5(bpf_tcp_gen_syncookie, struct sock *, sk, void *, iph, u32, iph_len,
> +	   struct tcphdr *, th, u32, th_len)
> +{
> +#ifdef CONFIG_SYN_COOKIES
> +	u32 cookie;
> +	u16 mss;
> +
> +	if (unlikely(th_len < sizeof(*th)))


You probably need to check that th_len == th->doff * 4

> +		return -EINVAL;
> +
> +	if (sk->sk_protocol != IPPROTO_TCP || sk->sk_state != TCP_LISTEN)
> +		return -EINVAL;
> +
> +	if (!sock_net(sk)->ipv4.sysctl_tcp_syncookies)
> +		return -EINVAL;
> +
> +	if (!th->syn || th->ack || th->fin || th->rst)
> +		return -EINVAL;
> +
> +	switch (sk->sk_family) {

This is strange, because a dual stack listener will have sk->sk_family set to AF_INET6.

What really matters here is if the packet is IPv4 or IPv6.

So you need to look at iph->version instead.

Then look if the socket family allows this packet to be processed
(For example AF_INET6 sockets might prevent IPv4 packets, see sk->sk_ipv6only )

> +	case AF_INET:
> +		if (unlikely(iph_len < sizeof(struct iphdr)))
> +			return -EINVAL;
> +		mss = tcp_v4_get_syncookie(sk, iph, th, &cookie);
> +		break;
> +
> +#if IS_BUILTIN(CONFIG_IPV6)
> +	case AF_INET6:
> +		if (unlikely(iph_len < sizeof(struct ipv6hdr)))
> +			return -EINVAL;
> +		mss = tcp_v6_get_syncookie(sk, iph, th, &cookie);
> +		break;
> +#endif /* CONFIG_IPV6 */
> +
> +	default:
> +		return -EPROTONOSUPPORT;
> +	}
> +	if (mss <= 0)
> +		return -ENOENT;
> +



^ permalink raw reply

* Re: [oss-drivers] Re: [RFC bpf-next 2/8] bpf: extend list based insn patching infra to verification layer
From: Jiong Wang @ 2019-07-16  8:12 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiong Wang, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Edward Cree, Naveen N. Rao, Jakub Kicinski, bpf, Networking,
	oss-drivers
In-Reply-To: <CAEf4BzYzSuwVL9W+LRbGJXcv8AszxLJ0EBTH-FXxTzcW6CCU7Q@mail.gmail.com>


Andrii Nakryiko writes:

> On Mon, Jul 15, 2019 at 3:02 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>>
>>
>> Andrii Nakryiko writes:
>>
>> > On Thu, Jul 11, 2019 at 5:20 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>> >>
>> >>
>> >> Jiong Wang writes:
>> >>
>> >> > Andrii Nakryiko writes:
>> >> >
>> >> >> On Thu, Jul 4, 2019 at 2:32 PM Jiong Wang <jiong.wang@netronome.com> wrote:
>> >> >>>
>> >> >>> Verification layer also needs to handle auxiliar info as well as adjusting
>> >> >>> subprog start.
>> >> >>>
>> >> >>> At this layer, insns inside patch buffer could be jump, but they should
>> >> >>> have been resolved, meaning they shouldn't jump to insn outside of the
>> >> >>> patch buffer. Lineration function for this layer won't touch insns inside
>> >> >>> patch buffer.
>> >> >>>
>> >> >>> Adjusting subprog is finished along with adjusting jump target when the
>> >> >>> input will cover bpf to bpf call insn, re-register subprog start is cheap.
>> >> >>> But adjustment when there is insn deleteion is not considered yet.
>> >> >>>
>> >> >>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>> >> >>> ---
>> >> >>>  kernel/bpf/verifier.c | 150 ++++++++++++++++++++++++++++++++++++++++++++++++++
>> >> >>>  1 file changed, 150 insertions(+)
>> >> >>>
>> >> >>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> >> >>> index a2e7637..2026d64 100644
>> >> >>> --- a/kernel/bpf/verifier.c
>> >> >>> +++ b/kernel/bpf/verifier.c
>> >> >>> @@ -8350,6 +8350,156 @@ static void opt_hard_wire_dead_code_branches(struct bpf_verifier_env *env)
>> >> >>>         }
>> >> >>>  }
>> >> >>>
>> >> >>> +/* Linearize bpf list insn to array (verifier layer). */
>> >> >>> +static struct bpf_verifier_env *
>> >> >>> +verifier_linearize_list_insn(struct bpf_verifier_env *env,
>> >> >>> +                            struct bpf_list_insn *list)
>> >> >>
>> >> >> It's unclear why this returns env back? It's not allocating a new env,
>> >> >> so it's weird and unnecessary. Just return error code.
>> >> >
>> >> > The reason is I was thinking we have two layers in BPF, the core and the
>> >> > verifier.
>> >> >
>> >> > For core layer (the relevant file is core.c), when doing patching, the
>> >> > input is insn list and bpf_prog, the linearization should linearize the
>> >> > insn list into insn array, and also whatever others affect inside bpf_prog
>> >> > due to changing on insns, for example line info inside prog->aux. So the
>> >> > return value is bpf_prog for core layer linearization hook.
>> >> >
>> >> > For verifier layer, it is similar, but the context if bpf_verifier_env, the
>> >> > linearization hook should linearize the insn list, and also those affected
>> >> > inside env, for example bpf_insn_aux_data, so the return value is
>> >> > bpf_verifier_env, meaning returning an updated verifier context
>> >> > (bpf_verifier_env) after insn list linearization.
>> >>
>> >> Realized your point is no new env is allocated, so just return error
>> >> code. Yes, the env pointer is not changed, just internal data is
>> >> updated. Return bpf_verifier_env mostly is trying to make the hook more
>> >> clear that it returns an updated "context" where the linearization happens,
>> >> for verifier layer, it is bpf_verifier_env, and for core layer, it is
>> >> bpf_prog, so return value was designed to return these two types.
>> >
>> > Oh, I missed that core layer returns bpf_prog*. I think this is
>> > confusing as hell and is very contrary to what one would expect. If
>> > the function doesn't allocate those objects, it shouldn't return them,
>> > except for rare cases of some accessor functions. Me reading this,
>> > I'll always be suprised and will have to go skim code just to check
>> > whether those functions really return new bpf_prog or
>> > bpf_verifier_env, respectively.
>>
>> bpf_prog_realloc do return new bpf_prog, so we will need to return bpf_prog
>> * for core layer.
>
> Ah, I see, then it would make sense for core layer, but still is very
> confusing for verifier_linearize_list_insn.
> I still hope for unified solution, so it shouldn't matter. But it
> pointed me to a bug in your code, see below.

Yeah, thanks!

>
>>
>> >
>> > Please change them both to just return error code.
>> >
>> >>
>> >> >
>> >> > Make sense?
>> >> >
>> >> > Regards,
>> >> > Jiong
>> >> >
>> >> >>
>> >> >>> +{
>> >> >>> +       u32 *idx_map, idx, orig_cnt, fini_cnt = 0;
>> >> >>> +       struct bpf_subprog_info *new_subinfo;
>> >> >>> +       struct bpf_insn_aux_data *new_data;
>> >> >>> +       struct bpf_prog *prog = env->prog;
>> >> >>> +       struct bpf_verifier_env *ret_env;
>> >> >>> +       struct bpf_insn *insns, *insn;
>> >> >>> +       struct bpf_list_insn *elem;
>> >> >>> +       int ret;
>> >> >>> +
>> >> >>> +       /* Calculate final size. */
>> >> >>> +       for (elem = list; elem; elem = elem->next)
>> >> >>> +               if (!(elem->flag & LIST_INSN_FLAG_REMOVED))
>> >> >>> +                       fini_cnt++;
>> >> >>> +
>> >> >>> +       orig_cnt = prog->len;
>> >> >>> +       insns = prog->insnsi;
>> >> >>> +       /* If prog length remains same, nothing else to do. */
>> >> >>> +       if (fini_cnt == orig_cnt) {
>> >> >>> +               for (insn = insns, elem = list; elem; elem = elem->next, insn++)
>> >> >>> +                       *insn = elem->insn;
>> >> >>> +               return env;
>> >> >>> +       }
>> >> >>> +       /* Realloc insn buffer when necessary. */
>> >> >>> +       if (fini_cnt > orig_cnt)
>> >> >>> +               prog = bpf_prog_realloc(prog, bpf_prog_size(fini_cnt),
>> >> >>> +                                       GFP_USER);
>> >> >>> +       if (!prog)
>> >> >>> +               return ERR_PTR(-ENOMEM);
>
> On realloc failure, prog will be non-NULL, so you need to handle error
> properly (and propagate it, instead of returning -ENOMEM):
>
> if (IS_ERR(prog))
>     return ERR_PTR(prog);
>
>
>> >> >>> +       insns = prog->insnsi;
>> >> >>> +       prog->len = fini_cnt;
>> >> >>> +       ret_env = env;
>> >> >>> +
>> >> >>> +       /* idx_map[OLD_IDX] = NEW_IDX */
>> >> >>> +       idx_map = kvmalloc(orig_cnt * sizeof(u32), GFP_KERNEL);
>> >> >>> +       if (!idx_map)
>> >> >>> +               return ERR_PTR(-ENOMEM);
>> >> >>> +       memset(idx_map, 0xff, orig_cnt * sizeof(u32));
>> >> >>> +
>> >> >>> +       /* Use the same alloc method used when allocating env->insn_aux_data. */
>> >> >>> +       new_data = vzalloc(array_size(sizeof(*new_data), fini_cnt));
>> >> >>> +       if (!new_data) {
>> >> >>> +               kvfree(idx_map);
>> >> >>> +               return ERR_PTR(-ENOMEM);
>> >> >>> +       }
>> >> >>> +
>> >> >>> +       /* Copy over insn + calculate idx_map. */
>> >> >>> +       for (idx = 0, elem = list; elem; elem = elem->next) {
>> >> >>> +               int orig_idx = elem->orig_idx - 1;
>> >> >>> +
>> >> >>> +               if (orig_idx >= 0) {
>> >> >>> +                       idx_map[orig_idx] = idx;
>> >> >>> +
>> >> >>> +                       if (elem->flag & LIST_INSN_FLAG_REMOVED)
>> >> >>> +                               continue;
>> >> >>> +
>> >> >>> +                       new_data[idx] = env->insn_aux_data[orig_idx];
>> >> >>> +
>> >> >>> +                       if (elem->flag & LIST_INSN_FLAG_PATCHED)
>> >> >>> +                               new_data[idx].zext_dst =
>> >> >>> +                                       insn_has_def32(env, &elem->insn);
>> >> >>> +               } else {
>> >> >>> +                       new_data[idx].seen = true;
>> >> >>> +                       new_data[idx].zext_dst = insn_has_def32(env,
>> >> >>> +                                                               &elem->insn);
>> >> >>> +               }
>> >> >>> +               insns[idx++] = elem->insn;
>> >> >>> +       }
>> >> >>> +
>> >> >>> +       new_subinfo = kvzalloc(sizeof(env->subprog_info), GFP_KERNEL);
>> >> >>> +       if (!new_subinfo) {
>> >> >>> +               kvfree(idx_map);
>> >> >>> +               vfree(new_data);
>> >> >>> +               return ERR_PTR(-ENOMEM);
>> >> >>> +       }
>> >> >>> +       memcpy(new_subinfo, env->subprog_info, sizeof(env->subprog_info));
>> >> >>> +       memset(env->subprog_info, 0, sizeof(env->subprog_info));
>> >> >>> +       env->subprog_cnt = 0;
>> >> >>> +       env->prog = prog;
>> >> >>> +       ret = add_subprog(env, 0);
>> >> >>> +       if (ret < 0) {
>> >> >>> +               ret_env = ERR_PTR(ret);
>> >> >>> +               goto free_all_ret;
>> >> >>> +       }
>> >> >>> +       /* Relocate jumps using idx_map.
>> >> >>> +        *   old_dst = jmp_insn.old_target + old_pc + 1;
>> >> >>> +        *   new_dst = idx_map[old_dst] = jmp_insn.new_target + new_pc + 1;
>> >> >>> +        *   jmp_insn.new_target = new_dst - new_pc - 1;
>> >> >>> +        */
>> >> >>> +       for (idx = 0, elem = list; elem; elem = elem->next) {
>> >> >>> +               int orig_idx = elem->orig_idx;
>> >> >>> +
>> >> >>> +               if (elem->flag & LIST_INSN_FLAG_REMOVED)
>> >> >>> +                       continue;
>> >> >>> +               if ((elem->flag & LIST_INSN_FLAG_PATCHED) || !orig_idx) {
>> >> >>> +                       idx++;
>> >> >>> +                       continue;
>> >> >>> +               }
>> >> >>> +
>> >> >>> +               ret = bpf_jit_adj_imm_off(&insns[idx], orig_idx - 1, idx,
>> >> >>> +                                         idx_map);
>> >> >>> +               if (ret < 0) {
>> >> >>> +                       ret_env = ERR_PTR(ret);
>> >> >>> +                       goto free_all_ret;
>> >> >>> +               }
>> >> >>> +               /* Recalculate subprog start as we are at bpf2bpf call insn. */
>> >> >>> +               if (ret > 0) {
>> >> >>> +                       ret = add_subprog(env, idx + insns[idx].imm + 1);
>> >> >>> +                       if (ret < 0) {
>> >> >>> +                               ret_env = ERR_PTR(ret);
>> >> >>> +                               goto free_all_ret;
>> >> >>> +                       }
>> >> >>> +               }
>> >> >>> +               idx++;
>> >> >>> +       }
>> >> >>> +       if (ret < 0) {
>> >> >>> +               ret_env = ERR_PTR(ret);
>> >> >>> +               goto free_all_ret;
>> >> >>> +       }
>> >> >>> +
>> >> >>> +       env->subprog_info[env->subprog_cnt].start = fini_cnt;
>> >> >>> +       for (idx = 0; idx <= env->subprog_cnt; idx++)
>> >> >>> +               new_subinfo[idx].start = env->subprog_info[idx].start;
>> >> >>> +       memcpy(env->subprog_info, new_subinfo, sizeof(env->subprog_info));
>> >> >>> +
>> >> >>> +       /* Adjust linfo.
>> >> >>> +        * FIXME: no support for insn removal at the moment.
>> >> >>> +        */
>> >> >>> +       if (prog->aux->nr_linfo) {
>> >> >>> +               struct bpf_line_info *linfo = prog->aux->linfo;
>> >> >>> +               u32 nr_linfo = prog->aux->nr_linfo;
>> >> >>> +
>> >> >>> +               for (idx = 0; idx < nr_linfo; idx++)
>> >> >>> +                       linfo[idx].insn_off = idx_map[linfo[idx].insn_off];
>> >> >>> +       }
>> >> >>> +       vfree(env->insn_aux_data);
>> >> >>> +       env->insn_aux_data = new_data;
>> >> >>> +       goto free_mem_list_ret;
>> >> >>> +free_all_ret:
>> >> >>> +       vfree(new_data);
>> >> >>> +free_mem_list_ret:
>> >> >>> +       kvfree(new_subinfo);
>> >> >>> +       kvfree(idx_map);
>> >> >>> +       return ret_env;
>> >> >>> +}
>> >> >>> +
>> >> >>>  static int opt_remove_dead_code(struct bpf_verifier_env *env)
>> >> >>>  {
>> >> >>>         struct bpf_insn_aux_data *aux_data = env->insn_aux_data;
>> >> >>> --
>> >> >>> 2.7.4
>> >> >>>
>> >>
>>


^ permalink raw reply

* [PATCH net] be2net: Signal that the device cannot transmit during reconfiguration
From: Benjamin Poirier @ 2019-07-16  8:16 UTC (permalink / raw)
  To: David Miller
  Cc: Sathya Perla, Ajit Khaparde, Sriharsha Basavapatna, Somnath Kotur,
	Firo Yang, Saeed Mahameed, netdev

While changing the number of interrupt channels, be2net stops adapter
operation (including netif_tx_disable()) but it doesn't signal that it
cannot transmit. This may lead dev_watchdog() to falsely trigger during
that time.

Add the missing call to netif_carrier_off(), following the pattern used in
many other drivers. netif_carrier_on() is already taken care of in
be_open().

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
---
 drivers/net/ethernet/emulex/benet/be_main.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 82015c8a5ed7..b7a246b33599 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -4697,8 +4697,12 @@ int be_update_queues(struct be_adapter *adapter)
 	struct net_device *netdev = adapter->netdev;
 	int status;
 
-	if (netif_running(netdev))
+	if (netif_running(netdev)) {
+		/* device cannot transmit now, avoid dev_watchdog timeouts */
+		netif_carrier_off(netdev);
+
 		be_close(netdev);
+	}
 
 	be_cancel_worker(adapter);
 
-- 
2.22.0


^ permalink raw reply related

* Re: [PATCH iproute2-rc 8/8] rdma: Document counter statistic
From: Gal Pressman @ 2019-07-16  8:19 UTC (permalink / raw)
  To: Leon Romanovsky, Stephen Hemminger
  Cc: Leon Romanovsky, netdev, David Ahern, Mark Zhang,
	RDMA mailing list
In-Reply-To: <20190710072455.9125-9-leon@kernel.org>

On 10/07/2019 10:24, Leon Romanovsky wrote:
> +.SH "EXAMPLES"
> +.PP
> +rdma statistic show
> +.RS 4
> +Shows the state of the default counter of all RDMA devices on the system.
> +.RE
> +.PP
> +rdma statistic show link mlx5_2/1
> +.RS 4
> +Shows the state of the default counter of specified RDMA port
> +.RE
> +.PP
> +rdma statistic qp show
> +.RS 4
> +Shows the state of all qp counters of all RDMA devices on the system.
> +.RE
> +.PP
> +rdma statistic qp show link mlx5_2/1
> +.RS 4
> +Shows the state of all qp counters of specified RDMA port.
> +.RE
> +.PP
> +rdma statistic qp show link mlx5_2 pid 30489
> +.RS 4
> +Shows the state of all qp counters of specified RDMA port and belonging to pid 30489
> +.RE
> +.PP
> +rdma statistic qp mode
> +.RS 4
> +List current counter mode on all deivces

"deivces" -> "devices".

^ permalink raw reply

* Re: memory leak in new_inode_pseudo (2)
From: syzbot @ 2019-07-16  8:28 UTC (permalink / raw)
  To: davem, linux-kernel, netdev, syzkaller-bugs
In-Reply-To: <000000000000111cbe058dc7754d@google.com>

syzbot has found a reproducer for the following crash on:

HEAD commit:    be8454af Merge tag 'drm-next-2019-07-16' of git://anongit...
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=13d5f750600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=d23a1a7bf85c5250
dashboard link: https://syzkaller.appspot.com/bug?extid=e682cca30bc101a4d9d9
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=155c5800600000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1738f800600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+e682cca30bc101a4d9d9@syzkaller.appspotmail.com

executing program
executing program
executing program
executing program
BUG: memory leak
unreferenced object 0xffff888128ea0980 (size 768):
   comm "syz-executor303", pid 7044, jiffies 4294943526 (age 13.490s)
   hex dump (first 32 bytes):
     01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
   backtrace:
     [<000000005ba542b8>] kmemleak_alloc_recursive  
include/linux/kmemleak.h:43 [inline]
     [<000000005ba542b8>] slab_post_alloc_hook mm/slab.h:522 [inline]
     [<000000005ba542b8>] slab_alloc mm/slab.c:3319 [inline]
     [<000000005ba542b8>] kmem_cache_alloc+0x13f/0x2c0 mm/slab.c:3483
     [<000000006532a1e9>] sock_alloc_inode+0x1c/0xa0 net/socket.c:238
     [<0000000014ddc967>] alloc_inode+0x2c/0xe0 fs/inode.c:227
     [<0000000056541455>] new_inode_pseudo+0x18/0x70 fs/inode.c:916
     [<000000003b5b5444>] sock_alloc+0x1c/0x90 net/socket.c:554
     [<00000000e623b353>] __sock_create+0x8f/0x250 net/socket.c:1378
     [<000000000e094708>] sock_create_kern+0x3b/0x50 net/socket.c:1483
     [<000000009fe4f64f>] smc_create+0xae/0x160 net/smc/af_smc.c:1975
     [<0000000056be84a7>] __sock_create+0x164/0x250 net/socket.c:1414
     [<000000005915e5fe>] sock_create net/socket.c:1465 [inline]
     [<000000005915e5fe>] __sys_socket+0x69/0x110 net/socket.c:1507
     [<00000000afa837b2>] __do_sys_socket net/socket.c:1516 [inline]
     [<00000000afa837b2>] __se_sys_socket net/socket.c:1514 [inline]
     [<00000000afa837b2>] __x64_sys_socket+0x1e/0x30 net/socket.c:1514
     [<00000000d0addad1>] do_syscall_64+0x76/0x1a0  
arch/x86/entry/common.c:296
     [<000000004e8e7c22>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

BUG: memory leak
unreferenced object 0xffff88811faeeab8 (size 56):
   comm "syz-executor303", pid 7044, jiffies 4294943526 (age 13.490s)
   hex dump (first 32 bytes):
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
     00 0a ea 28 81 88 ff ff d0 ea ae 1f 81 88 ff ff  ...(............
   backtrace:
     [<000000005ba542b8>] kmemleak_alloc_recursive  
include/linux/kmemleak.h:43 [inline]
     [<000000005ba542b8>] slab_post_alloc_hook mm/slab.h:522 [inline]
     [<000000005ba542b8>] slab_alloc mm/slab.c:3319 [inline]
     [<000000005ba542b8>] kmem_cache_alloc+0x13f/0x2c0 mm/slab.c:3483
     [<000000008ca63096>] kmem_cache_zalloc include/linux/slab.h:738 [inline]
     [<000000008ca63096>] lsm_inode_alloc security/security.c:522 [inline]
     [<000000008ca63096>] security_inode_alloc+0x33/0xb0  
security/security.c:875
     [<00000000b335d930>] inode_init_always+0x108/0x200 fs/inode.c:169
     [<0000000015dcffb3>] alloc_inode+0x49/0xe0 fs/inode.c:234
     [<0000000056541455>] new_inode_pseudo+0x18/0x70 fs/inode.c:916
     [<000000003b5b5444>] sock_alloc+0x1c/0x90 net/socket.c:554
     [<00000000e623b353>] __sock_create+0x8f/0x250 net/socket.c:1378
     [<000000000e094708>] sock_create_kern+0x3b/0x50 net/socket.c:1483
     [<000000009fe4f64f>] smc_create+0xae/0x160 net/smc/af_smc.c:1975
     [<0000000056be84a7>] __sock_create+0x164/0x250 net/socket.c:1414
     [<000000005915e5fe>] sock_create net/socket.c:1465 [inline]
     [<000000005915e5fe>] __sys_socket+0x69/0x110 net/socket.c:1507
     [<00000000afa837b2>] __do_sys_socket net/socket.c:1516 [inline]
     [<00000000afa837b2>] __se_sys_socket net/socket.c:1514 [inline]
     [<00000000afa837b2>] __x64_sys_socket+0x1e/0x30 net/socket.c:1514
     [<00000000d0addad1>] do_syscall_64+0x76/0x1a0  
arch/x86/entry/common.c:296
     [<000000004e8e7c22>] entry_SYSCALL_64_after_hwframe+0x44/0xa9



^ permalink raw reply

* Re: [RFC bpf-next 0/8] bpf: accelerate insn patching speed
From: Jiong Wang @ 2019-07-16  8:50 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiong Wang, Alexei Starovoitov, Daniel Borkmann, Edward Cree,
	Naveen N. Rao, Andrii Nakryiko, Jakub Kicinski, bpf, Networking,
	oss-drivers, Yonghong Song
In-Reply-To: <CAEf4BzYDAVUgajz4=dRTu5xQDddp5pi2s=T1BdFmRLZjOwGypQ@mail.gmail.com>


Andrii Nakryiko writes:

> On Mon, Jul 15, 2019 at 2:21 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>>
>>
>> Andrii Nakryiko writes:
>>
>> > On Thu, Jul 11, 2019 at 4:22 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>> >>
>> >>
>> >> Andrii Nakryiko writes:
>> >>
>> >> > On Thu, Jul 4, 2019 at 2:31 PM Jiong Wang <jiong.wang@netronome.com> wrote:
>> >> >>
>> >> >> This is an RFC based on latest bpf-next about acclerating insn patching
>> >> >> speed, it is now near the shape of final PATCH set, and we could see the
>> >> >> changes migrating to list patching would brings, so send out for
>> >> >> comments. Most of the info are in cover letter. I splitted the code in a
>> >> >> way to show API migration more easily.
>> >> >
>> >> >
>> >> > Hey Jiong,
>> >> >
>> >> >
>> >> > Sorry, took me a while to get to this and learn more about instruction
>> >> > patching. Overall this looks good and I think is a good direction.
>> >> > I'll post high-level feedback here, and some more
>> >> > implementation-specific ones in corresponding patches.
>> >>
>> >> Great, thanks very much for the feedbacks. Most of your feedbacks are
>> >> hitting those pain points I exactly had ran into. For some of them, I
>> >> thought similar solutions like yours, but failed due to various
>> >> reasons. Let's go through them again, I could have missed some important
>> >> things.
>> >>
>> >> Please see my replies below.
>> >
>> > Thanks for thoughtful reply :)
>> >
>> >>
>> >> >>
>> >> >> Test Results
>> >> >> ===
>> >> >>   - Full pass on test_verifier/test_prog/test_prog_32 under all three
>> >> >>     modes (interpreter, JIT, JIT with blinding).
>> >> >>
>> >> >>   - Benchmarking shows 10 ~ 15x faster on medium sized prog, and reduce
>> >> >>     patching time from 5100s (nearly one and a half hour) to less than
>> >> >>     0.5s for 1M insn patching.
>> >> >>
>> >> >> Known Issues
>> >> >> ===
>> >> >>   - The following warning is triggered when running scale test which
>> >> >>     contains 1M insns and patching:
>> >> >>       warning of mm/page_alloc.c:4639 __alloc_pages_nodemask+0x29e/0x330
>> >> >>
>> >> >>     This is caused by existing code, it can be reproduced on bpf-next
>> >> >>     master with jit blinding enabled, then run scale unit test, it will
>> >> >>     shown up after half an hour. After this set, patching is very fast, so
>> >> >>     it shows up quickly.
>> >> >>
>> >> >>   - No line info adjustment support when doing insn delete, subprog adj
>> >> >>     is with bug when doing insn delete as well. Generally, removal of insns
>> >> >>     could possibly cause remove of entire line or subprog, therefore
>> >> >>     entries of prog->aux->linfo or env->subprog needs to be deleted. I
>> >> >>     don't have good idea and clean code for integrating this into the
>> >> >>     linearization code at the moment, will do more experimenting,
>> >> >>     appreciate ideas and suggestions on this.
>> >> >
>> >> > Is there any specific problem to detect which line info to delete? Or
>> >> > what am I missing besides careful implementation?
>> >>
>> >> Mostly line info and subprog info are range info which covers a range of
>> >> insns. Deleting insns could causing you adjusting the range or removing one
>> >> range entirely. subprog info could be fully recalcuated during
>> >> linearization while line info I need some careful implementation and I
>> >> failed to have clean code for this during linearization also as said no
>> >> unit tests to help me understand whether the code is correct or not.
>> >>
>> >
>> > Ok, that's good that it's just about clean implementation. Try to
>> > implement it as clearly as possible. Then post it here, and if it can
>> > be improved someone (me?) will try to help to clean it up further.
>> >
>> > Not a big expert on line info, so can't comment on that,
>> > unfortunately. Maybe Yonghong can chime in (cc'ed)
>> >
>> >
>> >> I will described this latter, spent too much time writing the following
>> >> reply. Might worth an separate discussion thread.
>> >>
>> >> >>
>> >> >>     Insn delete doesn't happen on normal programs, for example Cilium
>> >> >>     benchmarks, and happens rarely on test_progs, so the test coverage is
>> >> >>     not good. That's also why this RFC have a full pass on selftest with
>> >> >>     this known issue.
>> >> >
>> >> > I hope you'll add test for deletion (and w/ corresponding line info)
>> >> > in final patch set :)
>> >>
>> >> Will try. Need to spend some time on BTF format.
>> >> >
>> >> >>
>> >> >>   - Could further use mem pool to accelerate the speed, changes are trivial
>> >> >>     on top of this RFC, and could be 2x extra faster. Not included in this
>> >> >>     RFC as reducing the algo complexity from quadratic to linear of insn
>> >> >>     number is the first step.
>> >> >
>> >> > Honestly, I think that would add more complexity than necessary, and I
>> >> > think we can further speed up performance without that, see below.
>> >> >
>> >> >>
>> >> >> Background
>> >> >> ===
>> >> >> This RFC aims to accelerate BPF insn patching speed, patching means expand
>> >> >> one bpf insn at any offset inside bpf prog into a set of new insns, or
>> >> >> remove insns.
>> >> >>
>> >> >> At the moment, insn patching is quadratic of insn number, this is due to
>> >> >> branch targets of jump insns needs to be adjusted, and the algo used is:
>> >> >>
>> >> >>   for insn inside prog
>> >> >>     patch insn + regeneate bpf prog
>> >> >>     for insn inside new prog
>> >> >>       adjust jump target
>> >> >>
>> >> >> This is causing significant time spending when a bpf prog requires large
>> >> >> amount of patching on different insns. Benchmarking shows it could take
>> >> >> more than half minutes to finish patching when patching number is more
>> >> >> than 50K, and the time spent could be more than one hour when patching
>> >> >> number is around 1M.
>> >> >>
>> >> >>   15000   :    3s
>> >> >>   45000   :   29s
>> >> >>   95000   :  125s
>> >> >>   195000  :  712s
>> >> >>   1000000 : 5100s
>> >> >>
>> >> >> This RFC introduces new patching infrastructure. Before doing insn
>> >> >> patching, insns in bpf prog are turned into a singly linked list, insert
>> >> >> new insns just insert new list node, delete insns just set delete flag.
>> >> >> And finally, the list is linearized back into array, and branch target
>> >> >> adjustment is done for all jump insns during linearization. This algo
>> >> >> brings the time complexity from quadratic to linear of insn number.
>> >> >>
>> >> >> Benchmarking shows the new patching infrastructure could be 10 ~ 15x faster
>> >> >> on medium sized prog, and for a 1M patching it reduce the time from 5100s
>> >> >> to less than 0.5s.
>> >> >>
>> >> >> Patching API
>> >> >> ===
>> >> >> Insn patching could happen on two layers inside BPF. One is "core layer"
>> >> >> where only BPF insns are patched. The other is "verification layer" where
>> >> >> insns have corresponding aux info as well high level subprog info, so
>> >> >> insn patching means aux info needs to be patched as well, and subprog info
>> >> >> needs to be adjusted. BPF prog also has debug info associated, so line info
>> >> >> should always be updated after insn patching.
>> >> >>
>> >> >> So, list creation, destroy, insert, delete is the same for both layer,
>> >> >> but lineration is different. "verification layer" patching require extra
>> >> >> work. Therefore the patch APIs are:
>> >> >>
>> >> >>    list creation:                bpf_create_list_insn
>> >> >>    list patch:                   bpf_patch_list_insn
>> >> >>    list pre-patch:               bpf_prepatch_list_insn
>> >> >
>> >> > I think pre-patch name is very confusing, until I read full
>> >> > description I couldn't understand what it's supposed to be used for.
>> >> > Speaking of bpf_patch_list_insn, patch is also generic enough to leave
>> >> > me wondering whether instruction buffer is inserted after instruction,
>> >> > or instruction is replaced with a bunch of instructions.
>> >> >
>> >> > So how about two more specific names:
>> >> > bpf_patch_list_insn -> bpf_list_insn_replace (meaning replace given
>> >> > instruction with a list of patch instructions)
>> >> > bpf_prepatch_list_insn -> bpf_list_insn_prepend (well, I think this
>> >> > one is pretty clear).
>> >>
>> >> My sense on English word is not great, will switch to above which indeed
>> >> reads more clear.
>> >>
>> >> >>    list lineration (core layer): prog = bpf_linearize_list_insn(prog, list)
>> >> >>    list lineration (veri layer): env = verifier_linearize_list_insn(env, list)
>> >> >
>> >> > These two functions are both quite involved, as well as share a lot of
>> >> > common code. I'd rather have one linearize instruction, that takes env
>> >> > as an optional parameter. If env is specified (which is the case for
>> >> > all cases except for constant blinding pass), then adjust aux_data and
>> >> > subprogs along the way.
>> >>
>> >> Two version of lineration and how to unify them was a painpoint to me. I
>> >> thought to factor out some of the common code out, but it actually doesn't
>> >> count much, the final size counting + insnsi resize parts are the same,
>> >> then things start to diverge since the "Copy over insn" loop.
>> >>
>> >> verifier layer needs to copy and initialize aux data etc. And jump
>> >> relocation is different. At core layer, the use case is JIT blinding which
>> >> could expand an jump_imm insn into a and/or/jump_reg sequence, and the
>> >
>> > Sorry, I didn't get what "could expand an jump_imm insn into a
>> > and/or/jump_reg sequence", maybe you can clarify if I'm missing
>> > something.
>> >
>> > But from your cover letter description, core layer has no jumps at
>> > all, while verifier has jumps inside patch buffer. So, if you support
>> > jumps inside of patch buffer, it will automatically work for core
>> > layer. Or what am I missing?
>>
>> I meant in core layer (JIT blinding), there is the following patching:
>>
>> input:
>>   insn 0             insn 0
>>   insn 1             insn 1
>>   jmp_imm   >>       mov_imm  \
>>   insn 2             xor_imm    insn seq expanded from jmp_imm
>>   insn 3             jmp_reg  /
>>                      insn 2
>>                      insn 3
>>
>>
>> jmp_imm is the insn that will be patched, and the actually transformation
>> is to expand it into mov_imm/xor_imm/jmp_reg sequence. "jmp_reg", sitting
>> at the end of the patch buffer, must jump to the same destination as the
>> original jmp_imm, so "jmp_reg" is an insn inside patch buffer but should
>> be relocated, and the jump destination is outside of patch buffer.
>
>
> Ok, great, thanks for explaining, yeah it's definitely something that
> we should be able to support. BUT. It got me thinking a bit more and I
> think I have simpler and more elegant solution now, again, supporting
> both core-layer and verifier-layer operations.
>
> struct bpf_patchable_insn {
>    struct bpf_patchable_insn *next;
>    struct bpf_insn insn;
>    int orig_idx; /* original non-patched index */
>    int new_idx;  /* new index, will be filled only during linearization */
> };
>
> struct bpf_patcher {
>     /* dummy head node of a chain of patchable instructions */
>     struct bpf_patchable_insn insn_head;
>     /* dynamic array of size(original instruction count)
>      * this is a map from original instruction index to a first
>      * patchable instruction that replaced that instruction (or
>      * just original instruction as bpf_patchable_insn).
>      */
>     int *orig_idx_to_patchable_insn;
>     int cnt;
> };
>
> Few points, but it should be pretty clear just from comments and definitions:
> 1. When you created bpf_patcher, you create patchabe_insn list, fill
> orig_idx_to_patchable_insn map to store proper pointers. This array is
> NEVER changed after that.
> 2. When replacing instruction, you re-use struct bpf_patchable_insn
> for first patched instruction, then append after that (not prepend to
> next instruction to not disrupt orig_idx -> patchable_insn mapping).
> 3. During linearizations, you first traverse the chain of instructions
> and trivially assing new_idxs.
> 4. No need for patchabe_insn->target anymore. All jumps use relative
> instruction offsets, right?

Yes, all jumps are pc-relative.

> So when you need to determine new
> instruction index during linearization, you just do (after you
> calculated new instruction indicies):
>
> func adjust_jmp(struct bpf_patcher* patcher, struct bpf_patchable_insn *insn) {
>    int old_jmp_idx = insn->orig_idx + jmp_offset_of(insn->insn);
>    int new_jmp_idx = patcher->orig_idx_to_patchable_insn[old_jmp_idx]->new_idx;
>    adjust_jmp_offset(insn->insn, new_jmp_idx) - insn->orig_idx;
> }

Hmm, this algo is kinds of the same this RFC, just we have organized "new_index"
as "idx_map". And in this RFC, only new_idx of one original insn matters,
no space is allocated for patched insns. (As mentioned, JIT blinding
requires the last insn inside patch buffer relocated to original jump
offset, so there was a little special handling in the relocation loop in
core layer linearization code)

> The idea is that we want to support quick look-up by original
> instruction index. That's what orig_idx_to_patchable_insn provides. On
> the other hand, no existing instruction is ever referencing newly
> patched instruction by its new offset, so with careful implementation,
> you can transparently support all the cases, regardless if it's in
> core layer or verifier layer (so, e.g., verifier layer patched
> instructions now will be able to jump out of patched buffer, if
> necessary, neat, right?).
>
> It is cleaner than everything we've discussed so far. Unless I missed
> something critical (it's all quite convoluted, so I might have
> forgotten some parts already). Let me know what you think.

Let me digest a little bit and do some coding, then I will come back. Some
issues can only shown up during in-depth coding. I kind of feel handling
aux reference in verifier layer is the part that will still introduce some
un-clean code.

<snip>
>> If there is no dead insn elimination opt, then we could just adjust
>> offsets. When there is insn deleting, I feel the logic becomes more
>> complex. One subprog could be completely deleted or partially deleted, so
>> I feel just recalculate the whole subprog info as a side-product is
>> much simpler.
>
> What's the situation where entirety of subprog can be deleted?

Suppose you have conditional jmp_imm, true path calls one subprog, false
path calls the other. If insn walker later found it is also true, then the
subprog at false path won't be marked as "seen", so it is entirely deleted.

I actually thought it is in theory one subprog could be deleted entirely,
so if we support insn deletion inside verifier, then range info like
line_info/subprog_info needs to consider one range is deleted.

Thanks.
Regards,
Jiong

^ permalink raw reply

* Re: [RFC] virtio-net: share receive_*() and add_recvbuf_*() with virtio-vsock
From: Stefano Garzarella @ 2019-07-16  9:40 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, Stefan Hajnoczi, virtualization, netdev
In-Reply-To: <20190715134115-mutt-send-email-mst@kernel.org>

On Mon, Jul 15, 2019 at 01:50:28PM -0400, Michael S. Tsirkin wrote:
> On Mon, Jul 15, 2019 at 09:44:16AM +0200, Stefano Garzarella wrote:
> > On Fri, Jul 12, 2019 at 06:14:39PM +0800, Jason Wang wrote:

[...]

> > > 
> > > 
> > > I think it's just a branch, for ethernet, go for networking stack. otherwise
> > > go for vsock core?
> > > 
> > 
> > Yes, that should work.
> > 
> > So, I should refactor the functions that can be called also from the vsock
> > core, in order to remove "struct net_device *dev" parameter.
> > Maybe creating some wrappers for the network stack.
> > 
> > Otherwise I should create a fake net_device for vsock_core.
> > 
> > What do you suggest?
> 
> Neither.
> 
> I think what Jason was saying all along is this:
> 
> virtio net doesn't actually lose packets, at least most
> of the time. And it actually most of the time
> passes all packets to host. So it's possible to use a virtio net
> device (possibly with a feature flag that says "does not lose packets,
> all packets go to host") and build vsock on top.

Yes, I got it after the latest Jason's reply.

> 
> and all of this is nice, but don't expect anything easy,
> or any quick results.

I expected this... :-(

> 
> Also, in a sense it's a missed opportunity: we could cut out a lot
> of fat and see just how fast can a protocol that is completely
> new and separate from networking stack go.

In this case, if we will try to do a PoC, what do you think is better?
    1. new AF_VSOCK + network-stack + virtio-net modified
        Maybe it is allow us to reuse a lot of stuff already written,
        but we will go through the network stack

    2. new AF_VSOCK + glue + virtio-net modified
        Intermediate approach, similar to Jason's proposal

    3, new AF_VSOCK + new virtio-vsock
        Can be the thinnest, but we have to rewrite many things, with the risk
        of making the same mistakes as the current implementation.


> Instead vsock implementation carries so much baggage from both
> networking stack - such as softirq processing - and itself such as
> workqueues, global state and crude locking - to the point where
> it's actually slower than TCP.

I agree, and I'm finding new issues while I'm trying to support nested
VMs, allowing multiple vsock transports (virtio-vsock and vhost-vsock in
the KVM case) at runtime.

> 

[...]

> > > 
> > > I suggest to do this step by step:
> > > 
> > > 1) use virtio-net but keep some protocol logic
> > > 
> > > 2) separate protocol logic and merge it to exist Linux networking stack
> > 
> > Make sense, thanks for the suggestions, I'll try to do these steps!
> > 
> > Thanks,
> > Stefano
> 
> 
> An alternative is look at sources of overhead in vsock and get rid of
> them, or rewrite it from scratch focusing on performance.

I started looking at virtio-vsock and vhost-vsock trying to do very
simple changes [1] to increase the performance. I should send a v4 of that
series as a very short term, then I'd like to have a deeper look to understand
if it is better to try to optimize or rewrite it from scratch.


Thanks,
Stefano

[1] https://patchwork.kernel.org/cover/10970145/


^ permalink raw reply

* Re: [RFC] virtio-net: share receive_*() and add_recvbuf_*() with virtio-vsock
From: Michael S. Tsirkin @ 2019-07-16 10:01 UTC (permalink / raw)
  To: Stefano Garzarella; +Cc: Jason Wang, Stefan Hajnoczi, virtualization, netdev
In-Reply-To: <20190716094024.ob43g5lxga5uwb7z@steredhat>

On Tue, Jul 16, 2019 at 11:40:24AM +0200, Stefano Garzarella wrote:
> On Mon, Jul 15, 2019 at 01:50:28PM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jul 15, 2019 at 09:44:16AM +0200, Stefano Garzarella wrote:
> > > On Fri, Jul 12, 2019 at 06:14:39PM +0800, Jason Wang wrote:
> 
> [...]
> 
> > > > 
> > > > 
> > > > I think it's just a branch, for ethernet, go for networking stack. otherwise
> > > > go for vsock core?
> > > > 
> > > 
> > > Yes, that should work.
> > > 
> > > So, I should refactor the functions that can be called also from the vsock
> > > core, in order to remove "struct net_device *dev" parameter.
> > > Maybe creating some wrappers for the network stack.
> > > 
> > > Otherwise I should create a fake net_device for vsock_core.
> > > 
> > > What do you suggest?
> > 
> > Neither.
> > 
> > I think what Jason was saying all along is this:
> > 
> > virtio net doesn't actually lose packets, at least most
> > of the time. And it actually most of the time
> > passes all packets to host. So it's possible to use a virtio net
> > device (possibly with a feature flag that says "does not lose packets,
> > all packets go to host") and build vsock on top.
> 
> Yes, I got it after the latest Jason's reply.
> 
> > 
> > and all of this is nice, but don't expect anything easy,
> > or any quick results.
> 
> I expected this... :-(
> 
> > 
> > Also, in a sense it's a missed opportunity: we could cut out a lot
> > of fat and see just how fast can a protocol that is completely
> > new and separate from networking stack go.
> 
> In this case, if we will try to do a PoC, what do you think is better?
>     1. new AF_VSOCK + network-stack + virtio-net modified
>         Maybe it is allow us to reuse a lot of stuff already written,
>         but we will go through the network stack
> 
>     2. new AF_VSOCK + glue + virtio-net modified
>         Intermediate approach, similar to Jason's proposal
> 
>     3, new AF_VSOCK + new virtio-vsock
>         Can be the thinnest, but we have to rewrite many things, with the risk
>         of making the same mistakes as the current implementation.
> 

1 or 3 imho. I wouldn't expect a lot from 2.  I slightly favor 3 and
Jason 1. So take your pick :)

> > Instead vsock implementation carries so much baggage from both
> > networking stack - such as softirq processing - and itself such as
> > workqueues, global state and crude locking - to the point where
> > it's actually slower than TCP.
> 
> I agree, and I'm finding new issues while I'm trying to support nested
> VMs, allowing multiple vsock transports (virtio-vsock and vhost-vsock in
> the KVM case) at runtime.
> 
> > 
> 
> [...]
> 
> > > > 
> > > > I suggest to do this step by step:
> > > > 
> > > > 1) use virtio-net but keep some protocol logic
> > > > 
> > > > 2) separate protocol logic and merge it to exist Linux networking stack
> > > 
> > > Make sense, thanks for the suggestions, I'll try to do these steps!
> > > 
> > > Thanks,
> > > Stefano
> > 
> > 
> > An alternative is look at sources of overhead in vsock and get rid of
> > them, or rewrite it from scratch focusing on performance.
> 
> I started looking at virtio-vsock and vhost-vsock trying to do very
> simple changes [1] to increase the performance. I should send a v4 of that
> series as a very short term, then I'd like to have a deeper look to understand
> if it is better to try to optimize or rewrite it from scratch.
> 
> 
> Thanks,
> Stefano
> 
> [1] https://patchwork.kernel.org/cover/10970145/

^ permalink raw reply

* Re: [PATCH v3 0/3] kernel/notifier.c: avoid duplicate registration
From: Vasily Averin @ 2019-07-16 10:20 UTC (permalink / raw)
  To: Xiaoming Ni, gregkh@linuxfoundation.org
  Cc: adobriyan@gmail.com, akpm@linux-foundation.org,
	anna.schumaker@netapp.com, arjan@linux.intel.com,
	bfields@fieldses.org, chuck.lever@oracle.com, davem@davemloft.net,
	jlayton@kernel.org, luto@kernel.org, mingo@kernel.org,
	Nadia.Derbey@bull.net, paulmck@linux.vnet.ibm.com,
	semen.protsenko@linaro.org, stable@kernel.org,
	stern@rowland.harvard.edu, tglx@linutronix.de,
	torvalds@linux-foundation.org, trond.myklebust@hammerspace.com,
	viresh.kumar@linaro.org, Huangjianhui (Alex), Dailei,
	linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <3bbc16ba-953c-a6b6-c5f3-4deaeaa25d10@huawei.com>

On 7/16/19 5:00 AM, Xiaoming Ni wrote:
> On 2019/7/15 13:38, Vasily Averin wrote:
>> On 7/14/19 5:45 AM, Xiaoming Ni wrote:
>>> On 2019/7/12 22:07, gregkh@linuxfoundation.org wrote:
>>>> On Fri, Jul 12, 2019 at 09:11:57PM +0800, Xiaoming Ni wrote:
>>>>> On 2019/7/11 21:57, Vasily Averin wrote:
>>>>>> On 7/11/19 4:55 AM, Nixiaoming wrote:
>>>>>>> On Wed, July 10, 2019 1:49 PM Vasily Averin wrote:
>>>>>>>> On 7/10/19 6:09 AM, Xiaoming Ni wrote:
>>>>>>>>> Registering the same notifier to a hook repeatedly can cause the hook
>>>>>>>>> list to form a ring or lose other members of the list.
>>>>>>>>
>>>>>>>> I think is not enough to _prevent_ 2nd register attempt,
>>>>>>>> it's enough to detect just attempt and generate warning to mark host in bad state.
>>>>>>>>
>>>>>>>
>>>>>>> Duplicate registration is prevented in my patch, not just "mark host in bad state"
>>>>>>>
>>>>>>> Duplicate registration is checked and exited in notifier_chain_cond_register()
>>>>>>>
>>>>>>> Duplicate registration was checked in notifier_chain_register() but only 
>>>>>>> the alarm was triggered without exiting. added by commit 831246570d34692e 
>>>>>>> ("kernel/notifier.c: double register detection")
>>>>>>>
>>>>>>> My patch is like a combination of 831246570d34692e and notifier_chain_cond_register(),
>>>>>>>  which triggers an alarm and exits when a duplicate registration is detected.
>>>>>>>
>>>>>>>> Unexpected 2nd register of the same hook most likely will lead to 2nd unregister,
>>>>>>>> and it can lead to host crash in any time: 
>>>>>>>> you can unregister notifier on first attempt it can be too early, it can be still in use.
>>>>>>>> on the other hand you can never call 2nd unregister at all.
>>>>>>>
>>>>>>> Since the member was not added to the linked list at the time of the second registration, 
>>>>>>> no linked list ring was formed. 
>>>>>>> The member is released on the first unregistration and -ENOENT on the second unregistration.
>>>>>>> After patching, the fault has been alleviated
>>>>>>
>>>>>> You are wrong here.
>>>>>> 2nd notifier's registration is a pure bug, this should never happen.
>>>>>> If you know the way to reproduce this situation -- you need to fix it. 
>>>>>>
>>>>>> 2nd registration can happen in 2 cases:
>>>>>> 1) missed rollback, when someone forget to call unregister after successfull registration, 
>>>>>> and then tried to call register again. It can lead to crash for example when according module will be unloaded.
>>>>>> 2) some subsystem is registered twice, for example from  different namespaces.
>>>>>> in this case unregister called during sybsystem cleanup in first namespace will incorrectly remove notifier used 
>>>>>> in second namespace, it also can lead to unexpacted behaviour.
>>>>>>
>>>>> So in these two cases, is it more reasonable to trigger BUG() directly when checking for duplicate registration ?
>>>>> But why does current notifier_chain_register() just trigger WARN() without exiting ?
>>>>> notifier_chain_cond_register() direct exit without triggering WARN() ?
>>>>
>>>> It should recover from this, if it can be detected.  The main point is
>>>> that not all apis have to be this "robust" when used within the kernel
>>>> as we do allow for the callers to know what they are doing :)
>>>>
>>> In the notifier_chain_register(), the condition ( (*nl) == n) is the same registration of the same hook.
>>>  We can intercept this situation and avoid forming a linked list ring to make the API more rob
>>
>> Once again -- yes, you CAN prevent list corruption, but you CANNOT recover the host and return it back to safe state.
>> If double register event was detected -- it means you have bug in kernel.
>>
>> Yes, you can add BUG here and crash the host immediately, but I prefer to use warning in such situations.
>>
>>>> If this does not cause any additional problems or slow downs, it's
>>>> probably fine to add.
>>>>
>>> Notifier_chain_register() is not a system hotspot function.
>>> At the same time, there is already a WARN_ONCE judgment. There is no new judgment in the new patch.
>>> It only changes the processing under the condition of (*nl) == n, which will not cause performance problems.
>>> At the same time, avoiding the formation of a link ring can make the system more robust.
>>
>> I disagree, 
>> yes, node will have correct list, but anyway node will work wrong and can crash the host in any time.
> 
> Sorry, my description is not accurate.
> 
> My patch feature does not prevent users from repeatedly registering hooks.
> But avoiding the chain ring caused by the user repeatedly registering the hook
> 
> There are no modules for duplicate registration hooks in the current system.
> But considering that not all modules are in the kernel source tree,
> In order to improve the robustness of the kernel API, we should avoid the linked list ring caused by repeated registration.
> Or in order to improve the efficiency of problem location, when the duplicate registration is checked, the system crashes directly.

Detect of duplicate registration means an unrecoverable error,
from this point of view it makes sense to replace WARN_ONCE by BUG_ON.
 
> On the other hand, the difference between notifier_chain_register() and notifier_chain_cond_register() for duplicate registrations is confusing:
> Blocking the formation of the linked list ring in notifier_chain_cond_register()
> There is no interception of the linked list ring in notifier_chain_register(), just an alarm.
> Give me the illusion: Isn't notifier_chain_register() allowed to create a linked list ring?

I'm not sure that I understood your question correctly but will try to answer.
As far as I see all callers of notifier_chain_cond_register checks return value, expect possible failure and handle it somehow.
On the other hand callers of notifier_chain_register() in many cases do not check return value and always expect success.
The goal of original WARN_ONCE -- to detect possible misuse of notifiers and it seems for me it correctly handles this task.


^ permalink raw reply

* Re: [RFC] virtio-net: share receive_*() and add_recvbuf_*() with virtio-vsock
From: Stefano Garzarella @ 2019-07-16 10:22 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, Stefan Hajnoczi, virtualization, netdev
In-Reply-To: <20190716055837-mutt-send-email-mst@kernel.org>

On Tue, Jul 16, 2019 at 06:01:33AM -0400, Michael S. Tsirkin wrote:
> On Tue, Jul 16, 2019 at 11:40:24AM +0200, Stefano Garzarella wrote:
> > On Mon, Jul 15, 2019 at 01:50:28PM -0400, Michael S. Tsirkin wrote:
> > > On Mon, Jul 15, 2019 at 09:44:16AM +0200, Stefano Garzarella wrote:
> > > > On Fri, Jul 12, 2019 at 06:14:39PM +0800, Jason Wang wrote:
> > 
> > [...]
> > 
> > > > > 
> > > > > 
> > > > > I think it's just a branch, for ethernet, go for networking stack. otherwise
> > > > > go for vsock core?
> > > > > 
> > > > 
> > > > Yes, that should work.
> > > > 
> > > > So, I should refactor the functions that can be called also from the vsock
> > > > core, in order to remove "struct net_device *dev" parameter.
> > > > Maybe creating some wrappers for the network stack.
> > > > 
> > > > Otherwise I should create a fake net_device for vsock_core.
> > > > 
> > > > What do you suggest?
> > > 
> > > Neither.
> > > 
> > > I think what Jason was saying all along is this:
> > > 
> > > virtio net doesn't actually lose packets, at least most
> > > of the time. And it actually most of the time
> > > passes all packets to host. So it's possible to use a virtio net
> > > device (possibly with a feature flag that says "does not lose packets,
> > > all packets go to host") and build vsock on top.
> > 
> > Yes, I got it after the latest Jason's reply.
> > 
> > > 
> > > and all of this is nice, but don't expect anything easy,
> > > or any quick results.
> > 
> > I expected this... :-(
> > 
> > > 
> > > Also, in a sense it's a missed opportunity: we could cut out a lot
> > > of fat and see just how fast can a protocol that is completely
> > > new and separate from networking stack go.
> > 
> > In this case, if we will try to do a PoC, what do you think is better?
> >     1. new AF_VSOCK + network-stack + virtio-net modified
> >         Maybe it is allow us to reuse a lot of stuff already written,
> >         but we will go through the network stack
> > 
> >     2. new AF_VSOCK + glue + virtio-net modified
> >         Intermediate approach, similar to Jason's proposal
> > 
> >     3, new AF_VSOCK + new virtio-vsock
> >         Can be the thinnest, but we have to rewrite many things, with the risk
> >         of making the same mistakes as the current implementation.
> > 
> 
> 1 or 3 imho. I wouldn't expect a lot from 2.  I slightly favor 3 and
> Jason 1. So take your pick :)
> 

Yes, I agree :)

Maybe "Jason 1" could be the short term (and an opportunity to study better the
code and sources of overhead) and "new AF_VSOCK + new virtio-vsock" the long
term goal with the multi-transport support in mind.

Thank you so much for your guidance and useful advice,
Stefano

^ permalink raw reply

* [PATCH bpf v3] selftests/bpf: fix "alu with different scalars 1" on s390
From: Ilya Leoshkevich @ 2019-07-16 10:53 UTC (permalink / raw)
  To: bpf, netdev
  Cc: gor, heiko.carstens, alexei.starovoitov, daniel, ys114321,
	Ilya Leoshkevich

BPF_LDX_MEM is used to load the least significant byte of the retrieved
test_val.index, however, on big-endian machines it ends up retrieving
the most significant byte.

Change the test to load the whole int in order to make it
endianness-independent.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---

v1->v2:
- use __BYTE_ORDER instead of __BYTE_ORDER__.
v2->v3:
- load the whole int instead of byte.

 tools/testing/selftests/bpf/verifier/value_ptr_arith.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/verifier/value_ptr_arith.c b/tools/testing/selftests/bpf/verifier/value_ptr_arith.c
index c3de1a2c9dc5..a53d99cebd9f 100644
--- a/tools/testing/selftests/bpf/verifier/value_ptr_arith.c
+++ b/tools/testing/selftests/bpf/verifier/value_ptr_arith.c
@@ -183,7 +183,7 @@
 	BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
 	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
 	BPF_EXIT_INSN(),
-	BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0),
+	BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_0, 0),
 	BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 3),
 	BPF_MOV64_IMM(BPF_REG_2, 0),
 	BPF_MOV64_IMM(BPF_REG_3, 0x100000),
-- 
2.21.0


^ permalink raw reply related

* [PATCH bpf v2] selftests/bpf: skip nmi test when perf hw events are disabled
From: Ilya Leoshkevich @ 2019-07-16 10:56 UTC (permalink / raw)
  To: bpf, netdev
  Cc: gor, heiko.carstens, andrii.nakryiko, ys114321, Ilya Leoshkevich

Some setups (e.g. virtual machines) might run with hardware perf events
disabled. If this is the case, skip the test_send_signal_nmi test.

Add a separate test involving a software perf event. This allows testing
the perf event path regardless of hardware perf event support.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---

v1->v2: Skip the test instead of using a software event.
Add a separate test with a software event.

 .../selftests/bpf/prog_tests/send_signal.c    | 33 ++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
index 67cea1686305..54218ee3c004 100644
--- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
+++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
@@ -173,6 +173,18 @@ static int test_send_signal_tracepoint(void)
 	return test_send_signal_common(&attr, BPF_PROG_TYPE_TRACEPOINT, "tracepoint");
 }
 
+static int test_send_signal_perf(void)
+{
+	struct perf_event_attr attr = {
+		.sample_period = 1,
+		.type = PERF_TYPE_SOFTWARE,
+		.config = PERF_COUNT_SW_CPU_CLOCK,
+	};
+
+	return test_send_signal_common(&attr, BPF_PROG_TYPE_PERF_EVENT,
+				       "perf_sw_event");
+}
+
 static int test_send_signal_nmi(void)
 {
 	struct perf_event_attr attr = {
@@ -181,8 +193,26 @@ static int test_send_signal_nmi(void)
 		.type = PERF_TYPE_HARDWARE,
 		.config = PERF_COUNT_HW_CPU_CYCLES,
 	};
+	int pmu_fd;
+
+	/* Some setups (e.g. virtual machines) might run with hardware
+	 * perf events disabled. If this is the case, skip this test.
+	 */
+	pmu_fd = syscall(__NR_perf_event_open, &attr, 0 /* pid */,
+			 -1 /* cpu */, -1 /* group_fd */, 0 /* flags */);
+	if (pmu_fd == -1) {
+		if (errno == ENOENT) {
+			printf("%s:SKIP:no PERF_COUNT_HW_CPU_CYCLES\n",
+				__func__);
+			return 0;
+		}
+		/* Let the test fail with a more informative message */
+	} else {
+		close(pmu_fd);
+	}
 
-	return test_send_signal_common(&attr, BPF_PROG_TYPE_PERF_EVENT, "perf_event");
+	return test_send_signal_common(&attr, BPF_PROG_TYPE_PERF_EVENT,
+				       "perf_hw_event");
 }
 
 void test_send_signal(void)
@@ -190,6 +220,7 @@ void test_send_signal(void)
 	int ret = 0;
 
 	ret |= test_send_signal_tracepoint();
+	ret |= test_send_signal_perf();
 	ret |= test_send_signal_nmi();
 	if (!ret)
 		printf("test_send_signal:OK\n");
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH] net: sctp: fix warning "NULL check before some freeing functions is not needed"
From: Neil Horman @ 2019-07-16 11:09 UTC (permalink / raw)
  To: Hariprasad Kelam
  Cc: Vlad Yasevich, Marcelo Ricardo Leitner, David S. Miller,
	linux-sctp, netdev, linux-kernel
In-Reply-To: <20190716022002.GA19592@hari-Inspiron-1545>

On Tue, Jul 16, 2019 at 07:50:02AM +0530, Hariprasad Kelam wrote:
> This patch removes NULL checks before calling kfree.
> 
> fixes below issues reported by coccicheck
> net/sctp/sm_make_chunk.c:2586:3-8: WARNING: NULL check before some
> freeing functions is not needed.
> net/sctp/sm_make_chunk.c:2652:3-8: WARNING: NULL check before some
> freeing functions is not needed.
> net/sctp/sm_make_chunk.c:2667:3-8: WARNING: NULL check before some
> freeing functions is not needed.
> net/sctp/sm_make_chunk.c:2684:3-8: WARNING: NULL check before some
> freeing functions is not needed.
> 
> Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
> ---
>  net/sctp/sm_make_chunk.c | 12 ++++--------
>  1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index ed39396..36bd8a6e 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -2582,8 +2582,7 @@ static int sctp_process_param(struct sctp_association *asoc,
>  	case SCTP_PARAM_STATE_COOKIE:
>  		asoc->peer.cookie_len =
>  			ntohs(param.p->length) - sizeof(struct sctp_paramhdr);
> -		if (asoc->peer.cookie)
> -			kfree(asoc->peer.cookie);
> +		kfree(asoc->peer.cookie);
>  		asoc->peer.cookie = kmemdup(param.cookie->body, asoc->peer.cookie_len, gfp);
>  		if (!asoc->peer.cookie)
>  			retval = 0;
> @@ -2648,8 +2647,7 @@ static int sctp_process_param(struct sctp_association *asoc,
>  			goto fall_through;
>  
>  		/* Save peer's random parameter */
> -		if (asoc->peer.peer_random)
> -			kfree(asoc->peer.peer_random);
> +		kfree(asoc->peer.peer_random);
>  		asoc->peer.peer_random = kmemdup(param.p,
>  					    ntohs(param.p->length), gfp);
>  		if (!asoc->peer.peer_random) {
> @@ -2663,8 +2661,7 @@ static int sctp_process_param(struct sctp_association *asoc,
>  			goto fall_through;
>  
>  		/* Save peer's HMAC list */
> -		if (asoc->peer.peer_hmacs)
> -			kfree(asoc->peer.peer_hmacs);
> +		kfree(asoc->peer.peer_hmacs);
>  		asoc->peer.peer_hmacs = kmemdup(param.p,
>  					    ntohs(param.p->length), gfp);
>  		if (!asoc->peer.peer_hmacs) {
> @@ -2680,8 +2677,7 @@ static int sctp_process_param(struct sctp_association *asoc,
>  		if (!ep->auth_enable)
>  			goto fall_through;
>  
> -		if (asoc->peer.peer_chunks)
> -			kfree(asoc->peer.peer_chunks);
> +		kfree(asoc->peer.peer_chunks);
>  		asoc->peer.peer_chunks = kmemdup(param.p,
>  					    ntohs(param.p->length), gfp);
>  		if (!asoc->peer.peer_chunks)
> -- 
> 2.7.4
> 
> 

Acked-by: Neil Horman <nhorman@tuxdriver.com>

^ permalink raw reply

* [PATCH 0/2] arm/arm64: Add support for function error injection
From: Leo Yan @ 2019-07-16 11:12 UTC (permalink / raw)
  To: Russell King, Oleg Nesterov, Catalin Marinas, Will Deacon,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Arnd Bergmann, linux-arm-kernel, linux-kernel,
	netdev, bpf, Masami Hiramatsu, Justin He
  Cc: Leo Yan

This small patch set is to add support for function error injection;
this can be used to eanble more advanced debugging feature, e.g.
CONFIG_BPF_KPROBE_OVERRIDE.

I only tested the first patch on arm64 platform Juno-r2 with below
steps; the second patch is for arm arch, but I absent the platform
for the testing so only pass compilation.

- Enable kernel configuration:
  CONFIG_BPF_KPROBE_OVERRIDE
  CONFIG_BTRFS_FS
  CONFIG_BPF_EVENTS=y
  CONFIG_KPROBES=y
  CONFIG_KPROBE_EVENTS=y
  CONFIG_BPF_KPROBE_OVERRIDE=y
- Build samples/bpf on Juno-r2 board with Debian rootFS:
  # cd $kernel
  # make headers_install
  # make samples/bpf/ LLC=llc-7 CLANG=clang-7
- Run the sample tracex7:
  # ./tracex7 /dev/sdb1
  [ 1975.211781] BTRFS error (device (efault)): open_ctree failed
  mount: /mnt/linux-kernel/linux-cs-dev/samples/bpf/tmpmnt: mount(2) system call failed: Cannot allocate memory.


Leo Yan (2):
  arm64: Add support for function error injection
  arm: Add support for function error injection

 arch/arm/Kconfig                         |  1 +
 arch/arm/include/asm/error-injection.h   | 13 +++++++++++++
 arch/arm/include/asm/ptrace.h            |  5 +++++
 arch/arm/lib/Makefile                    |  2 ++
 arch/arm/lib/error-inject.c              | 19 +++++++++++++++++++
 arch/arm64/Kconfig                       |  1 +
 arch/arm64/include/asm/error-injection.h | 13 +++++++++++++
 arch/arm64/include/asm/ptrace.h          |  5 +++++
 arch/arm64/lib/Makefile                  |  2 ++
 arch/arm64/lib/error-inject.c            | 19 +++++++++++++++++++
 10 files changed, 80 insertions(+)
 create mode 100644 arch/arm/include/asm/error-injection.h
 create mode 100644 arch/arm/lib/error-inject.c
 create mode 100644 arch/arm64/include/asm/error-injection.h
 create mode 100644 arch/arm64/lib/error-inject.c

-- 
2.17.1


^ permalink raw reply

* [PATCH 1/2] arm64: Add support for function error injection
From: Leo Yan @ 2019-07-16 11:13 UTC (permalink / raw)
  To: Russell King, Oleg Nesterov, Catalin Marinas, Will Deacon,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Arnd Bergmann, linux-arm-kernel, linux-kernel,
	netdev, bpf, Masami Hiramatsu, Justin He
  Cc: Leo Yan
In-Reply-To: <20190716111301.1855-1-leo.yan@linaro.org>

This patch implement regs_set_return_value() and
override_function_with_return() to support function error injection
for arm64.

In the exception flow, arm64's general register x30 contains the value
for the link register; so we can just update pt_regs::pc with it rather
than redirecting execution to a dummy function that returns.

This patch is heavily inspired by the commit 7cd01b08d35f ("powerpc:
Add support for function error injection").

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 arch/arm64/Kconfig                       |  1 +
 arch/arm64/include/asm/error-injection.h | 13 +++++++++++++
 arch/arm64/include/asm/ptrace.h          |  5 +++++
 arch/arm64/lib/Makefile                  |  2 ++
 arch/arm64/lib/error-inject.c            | 19 +++++++++++++++++++
 5 files changed, 40 insertions(+)
 create mode 100644 arch/arm64/include/asm/error-injection.h
 create mode 100644 arch/arm64/lib/error-inject.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 697ea0510729..a6d9e622977d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -142,6 +142,7 @@ config ARM64
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
 	select HAVE_FTRACE_MCOUNT_RECORD
 	select HAVE_FUNCTION_TRACER
+	select HAVE_FUNCTION_ERROR_INJECTION
 	select HAVE_FUNCTION_GRAPH_TRACER
 	select HAVE_GCC_PLUGINS
 	select HAVE_HW_BREAKPOINT if PERF_EVENTS
diff --git a/arch/arm64/include/asm/error-injection.h b/arch/arm64/include/asm/error-injection.h
new file mode 100644
index 000000000000..da057e8ed224
--- /dev/null
+++ b/arch/arm64/include/asm/error-injection.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+#ifndef __ASM_ERROR_INJECTION_H_
+#define __ASM_ERROR_INJECTION_H_
+
+#include <linux/compiler.h>
+#include <linux/linkage.h>
+#include <asm/ptrace.h>
+#include <asm-generic/error-injection.h>
+
+void override_function_with_return(struct pt_regs *regs);
+
+#endif /* __ASM_ERROR_INJECTION_H_ */
diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index dad858b6adc6..3aafbbe218a2 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -294,6 +294,11 @@ static inline unsigned long regs_return_value(struct pt_regs *regs)
 	return regs->regs[0];
 }
 
+static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
+{
+	regs->regs[0] = rc;
+}
+
 /**
  * regs_get_kernel_argument() - get Nth function argument in kernel
  * @regs:	pt_regs of that context
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 33c2a4abda04..f182ccb0438e 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -33,3 +33,5 @@ UBSAN_SANITIZE_atomic_ll_sc.o	:= n
 lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o
 
 obj-$(CONFIG_CRC32) += crc32.o
+
+obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
diff --git a/arch/arm64/lib/error-inject.c b/arch/arm64/lib/error-inject.c
new file mode 100644
index 000000000000..35661c2de4b0
--- /dev/null
+++ b/arch/arm64/lib/error-inject.c
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/error-injection.h>
+#include <linux/kprobes.h>
+
+void override_function_with_return(struct pt_regs *regs)
+{
+	/*
+	 * 'regs' represents the state on entry of a predefined function in
+	 * the kernel/module and which is captured on a kprobe.
+	 *
+	 * 'regs->regs[30]' contains the the link register for the probed
+	 * function and assign it to 'regs->pc', so when kprobe returns
+	 * back from exception it will override the end of probed function
+	 * and drirectly return to the predefined function's caller.
+	 */
+	regs->pc = regs->regs[30];
+}
+NOKPROBE_SYMBOL(override_function_with_return);
-- 
2.17.1


^ permalink raw reply related

* [PATCH 2/2] arm: Add support for function error injection
From: Leo Yan @ 2019-07-16 11:13 UTC (permalink / raw)
  To: Russell King, Oleg Nesterov, Catalin Marinas, Will Deacon,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Arnd Bergmann, linux-arm-kernel, linux-kernel,
	netdev, bpf, Masami Hiramatsu, Justin He
  Cc: Leo Yan
In-Reply-To: <20190716111301.1855-1-leo.yan@linaro.org>

This patch implement regs_set_return_value() and
override_function_with_return() to support function error injection
for arm.

In the exception flow, we can update pt_regs::ARM_pc with
pt_regs::ARM_lr so that can override the probed function return.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 arch/arm/Kconfig                       |  1 +
 arch/arm/include/asm/error-injection.h | 13 +++++++++++++
 arch/arm/include/asm/ptrace.h          |  5 +++++
 arch/arm/lib/Makefile                  |  2 ++
 arch/arm/lib/error-inject.c            | 19 +++++++++++++++++++
 5 files changed, 40 insertions(+)
 create mode 100644 arch/arm/include/asm/error-injection.h
 create mode 100644 arch/arm/lib/error-inject.c

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 8869742a85df..f7932a5e29ea 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -74,6 +74,7 @@ config ARM
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS if (CPU_V6 || CPU_V6K || CPU_V7) && MMU
 	select HAVE_EXIT_THREAD
 	select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
+	select HAVE_FUNCTION_ERROR_INJECTION if !THUMB2_KERNEL
 	select HAVE_FUNCTION_GRAPH_TRACER if !THUMB2_KERNEL && !CC_IS_CLANG
 	select HAVE_FUNCTION_TRACER if !XIP_KERNEL
 	select HAVE_GCC_PLUGINS
diff --git a/arch/arm/include/asm/error-injection.h b/arch/arm/include/asm/error-injection.h
new file mode 100644
index 000000000000..da057e8ed224
--- /dev/null
+++ b/arch/arm/include/asm/error-injection.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+#ifndef __ASM_ERROR_INJECTION_H_
+#define __ASM_ERROR_INJECTION_H_
+
+#include <linux/compiler.h>
+#include <linux/linkage.h>
+#include <asm/ptrace.h>
+#include <asm-generic/error-injection.h>
+
+void override_function_with_return(struct pt_regs *regs);
+
+#endif /* __ASM_ERROR_INJECTION_H_ */
diff --git a/arch/arm/include/asm/ptrace.h b/arch/arm/include/asm/ptrace.h
index 91d6b7856be4..3b41f37b361a 100644
--- a/arch/arm/include/asm/ptrace.h
+++ b/arch/arm/include/asm/ptrace.h
@@ -89,6 +89,11 @@ static inline long regs_return_value(struct pt_regs *regs)
 	return regs->ARM_r0;
 }
 
+static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
+{
+	regs->ARM_r0 = rc;
+}
+
 #define instruction_pointer(regs)	(regs)->ARM_pc
 
 #ifdef CONFIG_THUMB2_KERNEL
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 0bff0176db2c..d3d7430ecd76 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -43,3 +43,5 @@ ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
   CFLAGS_xor-neon.o		+= $(NEON_FLAGS)
   obj-$(CONFIG_XOR_BLOCKS)	+= xor-neon.o
 endif
+
+obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
diff --git a/arch/arm/lib/error-inject.c b/arch/arm/lib/error-inject.c
new file mode 100644
index 000000000000..96319d017114
--- /dev/null
+++ b/arch/arm/lib/error-inject.c
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/error-injection.h>
+#include <linux/kprobes.h>
+
+void override_function_with_return(struct pt_regs *regs)
+{
+	/*
+	 * 'regs' represents the state on entry of a predefined function in
+	 * the kernel/module and which is captured on a kprobe.
+	 *
+	 * 'regs->ARM_lr' contains the the link register for the probed
+	 * function and assign it to 'regs->ARM_pc', so when kprobe returns
+	 * back from exception it will override the end of probed function
+	 * and drirectly return to the predefined function's caller.
+	 */
+	regs->ARM_pc = regs->ARM_lr;
+}
+NOKPROBE_SYMBOL(override_function_with_return);
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH v5] net: netfilter: Fix rpfilter dropping vrf packets by mistake
From: Pablo Neira Ayuso @ 2019-07-16 11:16 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: kadlec, fw, davem, kuznet, yoshfuji, netfilter-devel, coreteam,
	netdev, linux-kernel, mingfangsen
In-Reply-To: <1562039976-203880-1-git-send-email-linmiaohe@huawei.com>

On Tue, Jul 02, 2019 at 03:59:36AM +0000, Miaohe Lin wrote:
> When firewalld is enabled with ipv4/ipv6 rpfilter, vrf
> ipv4/ipv6 packets will be dropped. Vrf device will pass
> through netfilter hook twice. One with enslaved device
> and another one with l3 master device. So in device may
> dismatch witch out device because out device is always
> enslaved device.So failed with the check of the rpfilter
> and drop the packets by mistake.

Applied to nf.git, thanks.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox