Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH iproute2-rc 2/8] rdma: Add "stat qp show" support
From: Stephen Hemminger @ 2019-07-16 19:01 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Leon Romanovsky, netdev, David Ahern, Mark Zhang,
	RDMA mailing list
In-Reply-To: <20190710072455.9125-3-leon@kernel.org>

On Wed, 10 Jul 2019 10:24:49 +0300
Leon Romanovsky <leon@kernel.org> wrote:

> From: Mark Zhang <markz@mellanox.com>
> 
> This patch presents link, id, task name, lqpn, as well as all sub
> counters of a QP counter.
> A QP counter is a dynamically allocated statistic counter that is
> bound with one or more QPs. It has several sub-counters, each is
> used for a different purpose.
> 
> Examples:
> $ rdma stat qp show
> link mlx5_2/1 cntn 5 pid 31609 comm client.1 rx_write_requests 0
> rx_read_requests 0 rx_atomic_requests 0 out_of_buffer 0 out_of_sequence 0
> duplicate_request 0 rnr_nak_retry_err 0 packet_seq_err 0
> implied_nak_seq_err 0 local_ack_timeout_err 0 resp_local_length_error 0
> resp_cqe_error 0 req_cqe_error 0 req_remote_invalid_request 0
> req_remote_access_errors 0 resp_remote_access_errors 0
> resp_cqe_flush_error 0 req_cqe_flush_error 0
>     LQPN: <178>
> $ rdma stat show link rocep1s0f5/1
> link rocep1s0f5/1 rx_write_requests 0 rx_read_requests 0 rx_atomic_requests 0 out_of_buffer 0 duplicate_request 0
> rnr_nak_retry_err 0 packet_seq_err 0 implied_nak_seq_err 0 local_ack_timeout_err 0 resp_local_length_error 0 resp_cqe_error 0
> req_cqe_error 0 req_remote_invalid_request 0 req_remote_access_errors 0 resp_remote_access_errors 0 resp_cqe_flush_error 0
> req_cqe_flush_error 0 rp_cnp_ignored 0 rp_cnp_handled 0 np_ecn_marked_roce_packets 0 np_cnp_sent 0
> $ rdma stat show link rocep1s0f5/1 -p
> link rocep1s0f5/1
>     rx_write_requests 0
>     rx_read_requests 0
>     rx_atomic_requests 0
>     out_of_buffer 0
>     duplicate_request 0
>     rnr_nak_retry_err 0
>     packet_seq_err 0
>     implied_nak_seq_err 0
>     local_ack_timeout_err 0
>     resp_local_length_error 0
>     resp_cqe_error 0
>     req_cqe_error 0
>     req_remote_invalid_request 0
>     req_remote_access_errors 0
>     resp_remote_access_errors 0
>     resp_cqe_flush_error 0
>     req_cqe_flush_error 0
>     rp_cnp_ignored 0
>     rp_cnp_handled 0
>     np_ecn_marked_roce_packets 0
>     np_cnp_sent 0
> 
> Signed-off-by: Mark Zhang <markz@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> ---
>  rdma/Makefile |   2 +-
>  rdma/rdma.c   |   3 +-
>  rdma/rdma.h   |   1 +
>  rdma/stat.c   | 268 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  rdma/utils.c  |   7 ++
>  5 files changed, 279 insertions(+), 2 deletions(-)
>  create mode 100644 rdma/stat.c
> 

Headers have been merged, but this patch does not apply cleanly to current iproute2


^ permalink raw reply

* Re: [PATCH net v2] skbuff: fix compilation warnings in skb_dump()
From: Willem de Bruijn @ 2019-07-16 19:01 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Qian Cai, David Miller, Willem de Bruijn, joe, clang-built-linux,
	Network Development, LKML
In-Reply-To: <20190716165136.GC37903@archlinux-threadripper>

On Tue, Jul 16, 2019 at 6:53 PM Nathan Chancellor
<natechancellor@gmail.com> wrote:
>
> On Tue, Jul 16, 2019 at 11:43:05AM -0400, Qian Cai wrote:
> > The commit 6413139dfc64 ("skbuff: increase verbosity when dumping skb
> > data") introduced a few compilation warnings.
> >
> > net/core/skbuff.c:766:32: warning: format specifies type 'unsigned
> > short' but the argument has type 'unsigned int' [-Wformat]
> >                        level, sk->sk_family, sk->sk_type,
> > sk->sk_protocol);
> >                                              ^~~~~~~~~~~
> > net/core/skbuff.c:766:45: warning: format specifies type 'unsigned
> > short' but the argument has type 'unsigned int' [-Wformat]
> >                        level, sk->sk_family, sk->sk_type,
> > sk->sk_protocol);
> > ^~~~~~~~~~~~~~~
> >
> > Fix them by using the proper types.
> >
> > Fixes: 6413139dfc64 ("skbuff: increase verbosity when dumping skb data")
> > Signed-off-by: Qian Cai <cai@lca.pw>
>
> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>

Acked-by: Willem de Bruijn <willemb@google.com>

Thanks Qian.

^ permalink raw reply

* Re: [RFC PATCH 5/5] PTP: Add support for Intel PMC Timed GPIO Controller
From: Shannon Nelson @ 2019-07-16 19:14 UTC (permalink / raw)
  To: Felipe Balbi, Richard Cochran
  Cc: netdev, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, linux-kernel, Christopher S . Hall
In-Reply-To: <20190716072038.8408-6-felipe.balbi@linux.intel.com>

On 7/16/19 12:20 AM, Felipe Balbi wrote:
> Add a driver supporting Intel Timed GPIO controller available as part
> of some Intel PMCs.
>
> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>

Hi Felipe, just a couple of quick comments:

There are several places where a line is continued on the next line, but 
should be indented to match the opening parenthesis on a function call 
or 'if' expression.

Shouldn't there be a kthread_stop() in intel_pmc_tgpio_remove(), or did 
I miss that somewhere?

Cheers,
sln


> ---
>   drivers/ptp/Kconfig               |   8 +
>   drivers/ptp/Makefile              |   1 +
>   drivers/ptp/ptp-intel-pmc-tgpio.c | 378 ++++++++++++++++++++++++++++++
>   3 files changed, 387 insertions(+)
>   create mode 100644 drivers/ptp/ptp-intel-pmc-tgpio.c
>
> diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
> index 9b8fee5178e8..bb0fce70a783 100644
> --- a/drivers/ptp/Kconfig
> +++ b/drivers/ptp/Kconfig
> @@ -107,6 +107,14 @@ config PTP_1588_CLOCK_PCH
>   	  To compile this driver as a module, choose M here: the module
>   	  will be called ptp_pch.
>   
> +config PTP_INTEL_PMC_TGPIO
> +	tristate "Intel PMC Timed GPIO"
> +	depends on X86
> +	depends on ACPI
> +	imply PTP_1588_CLOCK
> +	help
> +	  This driver adds support for Intel PMC Timed GPIO Controller
> +
>   config PTP_1588_CLOCK_KVM
>   	tristate "KVM virtual PTP clock"
>   	depends on PTP_1588_CLOCK
> diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile
> index 677d1d178a3e..ff89c90ace82 100644
> --- a/drivers/ptp/Makefile
> +++ b/drivers/ptp/Makefile
> @@ -7,6 +7,7 @@ ptp-y					:= ptp_clock.o ptp_chardev.o ptp_sysfs.o
>   obj-$(CONFIG_PTP_1588_CLOCK)		+= ptp.o
>   obj-$(CONFIG_PTP_1588_CLOCK_DTE)	+= ptp_dte.o
>   obj-$(CONFIG_PTP_1588_CLOCK_IXP46X)	+= ptp_ixp46x.o
> +obj-$(CONFIG_PTP_INTEL_PMC_TGPIO)	+= ptp-intel-pmc-tgpio.o
>   obj-$(CONFIG_PTP_1588_CLOCK_PCH)	+= ptp_pch.o
>   obj-$(CONFIG_PTP_1588_CLOCK_KVM)	+= ptp_kvm.o
>   obj-$(CONFIG_PTP_1588_CLOCK_QORIQ)	+= ptp-qoriq.o
> diff --git a/drivers/ptp/ptp-intel-pmc-tgpio.c b/drivers/ptp/ptp-intel-pmc-tgpio.c
> new file mode 100644
> index 000000000000..880ece34868a
> --- /dev/null
> +++ b/drivers/ptp/ptp-intel-pmc-tgpio.c
> @@ -0,0 +1,378 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Intel Timed GPIO Controller Driver
> + *
> + * Copyright (C) 2018 Intel Corporation
> + * Author: Felipe Balbi <felipe.balbi@linux.intel.com>
> + */
> +
> +#include <linux/acpi.h>
> +#include <linux/bitops.h>
> +#include <linux/gpio.h>
> +#include <linux/io-64-nonatomic-lo-hi.h>
> +#include <linux/kthread.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/platform_device.h>
> +#include <linux/ptp_clock_kernel.h>
> +#include <asm/tsc.h>
> +
> +#define TGPIOCTL		0x00
> +#define TGPIOCOMPV31_0		0x10
> +#define TGPIOCOMPV63_32		0x14
> +#define TGPIOPIV31_0		0x18
> +#define TGPIOPIV63_32		0x1c
> +#define TGPIOTCV31_0		0x20
> +#define TGPIOTCV63_32		0x24
> +#define TGPIOECCV31_0		0x28
> +#define TGPIOECCV63_32		0x2c
> +#define TGPIOEC31_0		0x30
> +#define TGPIOEC63_32		0x34
> +
> +/* Control Register */
> +#define TGPIOCTL_EN		BIT(0)
> +#define TGPIOCTL_DIR		BIT(1)
> +#define TGPIOCTL_EP		GENMASK(3, 2)
> +#define TGPIOCTL_EP_RISING_EDGE	(0 << 2)
> +#define TGPIOCTL_EP_FALLING_EDGE (1 << 2)
> +#define TGPIOCTL_EP_TOGGLE_EDGE	(2 << 2)
> +#define TGPIOCTL_PM		BIT(4)
> +
> +#define NSECS_PER_SEC		1000000000
> +#define TGPIO_MAX_ADJ_TIME	999999900
> +
> +struct intel_pmc_tgpio {
> +	struct ptp_clock_info	info;
> +	struct ptp_clock	*clock;
> +
> +	struct mutex		lock;
> +	struct device		*dev;
> +	void __iomem		*base;
> +
> +	struct task_struct	*event_thread;
> +	bool			input;
> +};
> +#define to_intel_pmc_tgpio(i)	(container_of((i), struct intel_pmc_tgpio, info))
> +
> +static inline u64 to_intel_pmc_tgpio_time(struct ptp_clock_time *t)
> +{
> +	return t->sec * NSECS_PER_SEC + t->nsec;
> +}
> +
> +static inline u64 intel_pmc_tgpio_readq(void __iomem *base, u32 offset)
> +{
> +	return lo_hi_readq(base + offset);
> +}
> +
> +static inline void intel_pmc_tgpio_writeq(void __iomem *base, u32 offset, u64 v)
> +{
> +	return lo_hi_writeq(v, base + offset);
> +}
> +
> +static inline u32 intel_pmc_tgpio_readl(void __iomem *base, u32 offset)
> +{
> +	return readl(base + offset);
> +}
> +
> +static inline void intel_pmc_tgpio_writel(void __iomem *base, u32 offset, u32 value)
> +{
> +	writel(value, base + offset);
> +}
> +
> +static struct ptp_pin_desc intel_pmc_tgpio_pin_config[] = {
> +	{					\
> +		.name	= "pin0",		\
> +		.index	= 0,			\
> +		.func	= PTP_PF_NONE,		\
> +		.chan	= 0,			\
> +	}
> +};
> +
> +static int intel_pmc_tgpio_gettime64(struct ptp_clock_info *info,
> +		struct timespec64 *ts)
> +{
> +	struct intel_pmc_tgpio	*tgpio = to_intel_pmc_tgpio(info);
> +	u64 now;
> +
> +	mutex_lock(&tgpio->lock);
> +	now = get_art_ns_now();
> +	*ts = ns_to_timespec64(now);
> +	mutex_unlock(&tgpio->lock);
> +
> +	return 0;
> +}
> +
> +static int intel_pmc_tgpio_settime64(struct ptp_clock_info *info,
> +		const struct timespec64 *ts)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +static int intel_pmc_tgpio_event_thread(void *_tgpio)
> +{
> +	struct intel_pmc_tgpio	*tgpio = _tgpio;
> +	u64 reg;
> +
> +	while (!kthread_should_stop()) {
> +		bool input;
> +		int i;
> +
> +		mutex_lock(&tgpio->lock);
> +		input = tgpio->input;
> +		mutex_unlock(&tgpio->lock);
> +
> +		if (!input)
> +			schedule();
> +
> +		reg = intel_pmc_tgpio_readq(tgpio->base, TGPIOEC31_0);
> +
> +		for (i = 0; i < reg; i++) {
> +			struct ptp_clock_event event;
> +
> +			event.type = PTP_CLOCK_EXTTS;
> +			event.index = 0;
> +			event.timestamp = intel_pmc_tgpio_readq(tgpio->base,
> +					TGPIOTCV31_0);
> +
> +			ptp_clock_event(tgpio->clock, &event);
> +		}
> +		schedule_timeout_interruptible(10);
> +	}
> +
> +	return 0;
> +}
> +
> +static int intel_pmc_tgpio_config_input(struct intel_pmc_tgpio *tgpio,
> +		struct ptp_extts_request *extts, int on)
> +{
> +	u32			ctrl;
> +	bool			input;
> +
> +	ctrl = intel_pmc_tgpio_readl(tgpio->base, TGPIOCTL);
> +	ctrl &= ~TGPIOCTL_EN;
> +	intel_pmc_tgpio_writel(tgpio->base, TGPIOCTL, ctrl);
> +
> +	if (on) {
> +		ctrl |= TGPIOCTL_DIR;
> +
> +		if (extts->flags & PTP_RISING_EDGE &&
> +				extts->flags & PTP_FALLING_EDGE)
> +			ctrl |= TGPIOCTL_EP_TOGGLE_EDGE;
> +		else if (extts->flags & PTP_RISING_EDGE)
> +			ctrl |= TGPIOCTL_EP_RISING_EDGE;
> +		else if (extts->flags & PTP_FALLING_EDGE)
> +			ctrl |= TGPIOCTL_EP_FALLING_EDGE;
> +
> +		/* gotta program all other bits before EN bit is set */
> +		intel_pmc_tgpio_writel(tgpio->base, TGPIOCTL, ctrl);
> +		ctrl |= TGPIOCTL_EN;
> +		input = true;
> +	} else {
> +		ctrl &= ~(TGPIOCTL_DIR | TGPIOCTL_EN);
> +		input = false;
> +	}
> +
> +	intel_pmc_tgpio_writel(tgpio->base, TGPIOCTL, ctrl);
> +	tgpio->input = input;
> +
> +	if (input)
> +		wake_up_process(tgpio->event_thread);
> +
> +	return 0;
> +}
> +
> +static int intel_pmc_tgpio_config_output(struct intel_pmc_tgpio *tgpio,
> +		struct ptp_perout_request *perout, int on)
> +{
> +	u32			ctrl;
> +
> +	ctrl = intel_pmc_tgpio_readl(tgpio->base, TGPIOCTL);
> +	if (on) {
> +		struct ptp_clock_time *period = &perout->period;
> +		struct ptp_clock_time *start = &perout->start;
> +
> +		if (ctrl & TGPIOCTL_EN)
> +			return 0;
> +
> +		intel_pmc_tgpio_writeq(tgpio->base, TGPIOCOMPV31_0,
> +				to_intel_pmc_tgpio_time(start));
> +
> +		intel_pmc_tgpio_writeq(tgpio->base, TGPIOPIV31_0,
> +				to_intel_pmc_tgpio_time(period));
> +
> +		ctrl &= ~TGPIOCTL_DIR;
> +		if (perout->flags & PTP_PEROUT_ONE_SHOT)
> +			ctrl &= ~TGPIOCTL_PM;
> +		else
> +			ctrl |= TGPIOCTL_PM;
> +
> +		/* gotta program all other bits before EN bit is set */
> +		intel_pmc_tgpio_writel(tgpio->base, TGPIOCTL, ctrl);
> +
> +		ctrl |= TGPIOCTL_EN;
> +		intel_pmc_tgpio_writel(tgpio->base, TGPIOCTL, ctrl);
> +	} else {
> +		if (!(ctrl & ~TGPIOCTL_EN))
> +			return 0;
> +
> +		ctrl &= ~(TGPIOCTL_EN | TGPIOCTL_PM);
> +		intel_pmc_tgpio_writel(tgpio->base, TGPIOCTL, ctrl);
> +	}
> +
> +	return 0;
> +}
> +
> +static int intel_pmc_tgpio_enable(struct ptp_clock_info *info,
> +		struct ptp_clock_request *req, int on)
> +{
> +	struct intel_pmc_tgpio	*tgpio = to_intel_pmc_tgpio(info);
> +	int			ret = -EOPNOTSUPP;
> +
> +	mutex_lock(&tgpio->lock);
> +	switch (req->type) {
> +	case PTP_CLK_REQ_EXTTS:
> +		ret = intel_pmc_tgpio_config_input(tgpio, &req->extts, on);
> +		break;
> +	case PTP_CLK_REQ_PEROUT:
> +		ret = intel_pmc_tgpio_config_output(tgpio, &req->perout, on);
> +		break;
> +	default:
> +		break;
> +	}
> +	mutex_unlock(&tgpio->lock);
> +
> +	return ret;
> +}
> +
> +static int intel_pmc_tgpio_get_time_fn(ktime_t *device_time,
> +		struct system_counterval_t *system_counter, void *_tgpio)
> +{
> +	get_tsc_ns(system_counter, device_time);
> +	return 0;
> +}
> +
> +static int intel_pmc_tgpio_getcrosststamp(struct ptp_clock_info *info,
> +		struct system_device_crosststamp *cts)
> +{
> +	struct intel_pmc_tgpio	*tgpio = to_intel_pmc_tgpio(info);
> +
> +	return get_device_system_crosststamp(intel_pmc_tgpio_get_time_fn, tgpio,
> +			NULL, cts);
> +}
> +
> +static int intel_pmc_tgpio_counttstamp(struct ptp_clock_info *info,
> +		struct ptp_event_count_tstamp *count)
> +{
> +	struct intel_pmc_tgpio	*tgpio = to_intel_pmc_tgpio(info);
> +	u32 dt_hi_tmp;
> +	u32 dt_hi;
> +	u32 dt_lo;
> +
> +	dt_hi_tmp = intel_pmc_tgpio_readl(tgpio->base, TGPIOTCV63_32);
> +	dt_lo = intel_pmc_tgpio_readl(tgpio->base, TGPIOTCV31_0);
> +
> +	count->event_count = intel_pmc_tgpio_readl(tgpio->base, TGPIOECCV63_32);
> +	count->event_count <<= 32;
> +	count->event_count |= intel_pmc_tgpio_readl(tgpio->base, TGPIOECCV31_0);
> +
> +	dt_hi = intel_pmc_tgpio_readl(tgpio->base, TGPIOTCV63_32);
> +
> +	if (dt_hi_tmp != dt_hi && dt_lo & 0x80000000)
> +		count->device_time.sec = dt_hi_tmp;
> +	else
> +		count->device_time.sec = dt_hi;
> +
> +	count->device_time.nsec = dt_lo;
> +
> +	return 0;
> +}
> +
> +static int intel_pmc_tgpio_verify(struct ptp_clock_info *ptp, unsigned int pin,
> +		enum ptp_pin_function func, unsigned int chan)
> +{
> +	return 0;
> +}
> +
> +static const struct ptp_clock_info intel_pmc_tgpio_info = {
> +	.owner		= THIS_MODULE,
> +	.name		= "Intel PMC TGPIO",
> +	.max_adj	= 50000000,
> +	.n_pins		= 1,
> +	.n_ext_ts	= 1,
> +	.n_per_out	= 1,
> +	.pin_config	= intel_pmc_tgpio_pin_config,
> +	.gettime64	= intel_pmc_tgpio_gettime64,
> +	.settime64	= intel_pmc_tgpio_settime64,
> +	.enable		= intel_pmc_tgpio_enable,
> +	.getcrosststamp	= intel_pmc_tgpio_getcrosststamp,
> +	.counttstamp	= intel_pmc_tgpio_counttstamp,
> +	.verify		= intel_pmc_tgpio_verify,
> +};
> +
> +static int intel_pmc_tgpio_probe(struct platform_device *pdev)
> +{
> +	struct intel_pmc_tgpio	*tgpio;
> +	struct device		*dev;
> +	struct resource		*res;
> +
> +	dev = &pdev->dev;
> +	tgpio = devm_kzalloc(dev, sizeof(*tgpio), GFP_KERNEL);
> +	if (!tgpio)
> +		return -ENOMEM;
> +
> +	tgpio->dev = dev;
> +	tgpio->info = intel_pmc_tgpio_info;
> +
> +	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +	tgpio->base = devm_ioremap_resource(dev, res);
> +	if (!tgpio->base)
> +		return -ENOMEM;
> +
> +	mutex_init(&tgpio->lock);
> +	platform_set_drvdata(pdev, tgpio);
> +
> +	tgpio->event_thread = kthread_create(intel_pmc_tgpio_event_thread,
> +			tgpio, dev_name(tgpio->dev));
> +	if (IS_ERR(tgpio->event_thread))
> +		return PTR_ERR(tgpio->event_thread);
> +
> +	tgpio->clock = ptp_clock_register(&tgpio->info, &pdev->dev);
> +	if (IS_ERR(tgpio->clock))
> +		return PTR_ERR(tgpio->clock);
> +
> +	wake_up_process(tgpio->event_thread);
> +
> +	return 0;
> +}
> +
> +static int intel_pmc_tgpio_remove(struct platform_device *pdev)
> +{
> +	struct intel_pmc_tgpio	*tgpio = platform_get_drvdata(pdev);
> +
> +	ptp_clock_unregister(tgpio->clock);
> +
> +	return 0;
> +}
> +
> +static const struct acpi_device_id intel_pmc_acpi_match[] = {
> +	/* TODO */
> +
> +	{  },
> +};
> +
> +/* MODULE_ALIAS("acpi*:TODO:*"); */
> +
> +static struct platform_driver intel_pmc_tgpio_driver = {
> +	.probe		= intel_pmc_tgpio_probe,
> +	.remove		= intel_pmc_tgpio_remove,
> +	.driver		= {
> +		.name	= "intel-pmc-tgpio",
> +		.acpi_match_table = ACPI_PTR(intel_pmc_acpi_match),
> +	},
> +};
> +
> +module_platform_driver(intel_pmc_tgpio_driver);
> +
> +MODULE_AUTHOR("Felipe Balbi <felipe.balbi@linux.intel.com>");
> +MODULE_LICENSE("GPL v2");
> +MODULE_DESCRIPTION("Intel PMC Timed GPIO Controller Driver");


^ permalink raw reply

* Re: [PATCH iproute2 0/2] Fix IPv6 tunnel add when dev param is used
From: Stephen Hemminger @ 2019-07-16 19:28 UTC (permalink / raw)
  To: Andrea Claudi; +Cc: netdev, dsahern
In-Reply-To: <cover.1562667648.git.aclaudi@redhat.com>

On Tue,  9 Jul 2019 15:16:49 +0200
Andrea Claudi <aclaudi@redhat.com> wrote:

> Commit ba126dcad20e6 ("ip6tunnel: fix 'ip -6 {show|change} dev
> <name>' cmds") breaks IPv6 tunnel creation when dev parameter
> is used.
> 
> This series revert the original commit, which mistakenly use
> dev for tunnel name, while addressing a issue on tunnel change
> when no interface name is specified.
> 
> Andrea Claudi (2):
>   Revert "ip6tunnel: fix 'ip -6 {show|change} dev <name>' cmds"
>   ip tunnel: warn when changing IPv6 tunnel without tunnel name
> 
>  ip/ip6tunnel.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 

Both applied, thanks

^ permalink raw reply

* Re: [PATCH] net: ethernet: ti: cpsw: Add of_node_put() before return and break
From: David Miller @ 2019-07-16 19:37 UTC (permalink / raw)
  To: nishkadg.linux; +Cc: grygorii.strashko, ivan.khoronzhuk, linux-omap, netdev
In-Reply-To: <20190716054843.2957-1-nishkadg.linux@gmail.com>

From: Nishka Dasgupta <nishkadg.linux@gmail.com>
Date: Tue, 16 Jul 2019 11:18:43 +0530

> Each iteration of for_each_available_child_of_node puts the previous
> node, but in the case of a return or break from the middle of the loop,
> there is no put, thus causing a memory leak.

What an incredible terribly designed loop macro, this
for_each_available_child_of_node () thing is.

A macro with non-trivial, invisible, side effects.  It requires
special handling of reference counting of objects if the loop is
terminated early.

This is so error prone.  Is it any wonder we have to go through the
entire tree fixing up nearly every use of this thing?

Instead of looking at the automated analysis of this and saying "great
here are all of these places where I can fix bugs", I would instead
appreicate it if the reaction was more like "this interface is
obviously impossible to use in a non-error-prone fashion, we should
fix it."

I guess I have no choice but to apply your fixes, but the larger issue
must be addressed instead.

^ permalink raw reply

* Re: [PATCH ghak90 V6 02/10] audit: add container id
From: Richard Guy Briggs @ 2019-07-16 19:38 UTC (permalink / raw)
  To: Paul Moore
  Cc: Tycho Andersen, containers, linux-api, Linux-Audit Mailing List,
	linux-fsdevel, LKML, netdev, netfilter-devel, sgrubb, omosnace,
	dhowells, simo, Eric Paris, Serge Hallyn, ebiederm, nhorman
In-Reply-To: <CAHC9VhRFeCFSCn=m6wgDK2tXBN1euc2+bw8o=CfNwptk8t=j7A@mail.gmail.com>

On 2019-07-15 16:38, Paul Moore wrote:
> On Mon, Jul 8, 2019 at 1:51 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> > On 2019-05-29 11:29, Paul Moore wrote:
> 
> ...
> 
> > > The idea is that only container orchestrators should be able to
> > > set/modify the audit container ID, and since setting the audit
> > > container ID can have a significant effect on the records captured
> > > (and their routing to multiple daemons when we get there) modifying
> > > the audit container ID is akin to modifying the audit configuration
> > > which is why it is gated by CAP_AUDIT_CONTROL.  The current thinking
> > > is that you would only change the audit container ID from one
> > > set/inherited value to another if you were nesting containers, in
> > > which case the nested container orchestrator would need to be granted
> > > CAP_AUDIT_CONTROL (which everyone to date seems to agree is a workable
> > > compromise).  We did consider allowing for a chain of nested audit
> > > container IDs, but the implications of doing so are significant
> > > (implementation mess, runtime cost, etc.) so we are leaving that out
> > > of this effort.
> >
> > We had previously discussed the idea of restricting
> > orchestrators/engines from only being able to set the audit container
> > identifier on their own descendants, but it was discarded.  I've added a
> > check to ensure this is now enforced.
> 
> When we weren't allowing nested orchestrators it wasn't necessary, but
> with the move to support nesting I believe this will be a requirement.
> We might also need/want to restrict audit container ID changes if a
> descendant is acting as a container orchestrator and managing one or
> more audit container IDs; although I'm less certain of the need for
> this.

I was of the opinion it was necessary before with single-layer parallel
orchestrators/engines.

> > I've also added a check to ensure that a process can't set its own audit
> > container identifier ...
> 
> What does this protect against, or what problem does this solve?
> Considering how easy it is to fork/exec, it seems like this could be
> trivially bypassed.

Well, for starters, it would remove one layer of nesting.  It would
separate the functional layers of processes.  Other than that, it seems
like a gut feeling that it is just wrong to allow it.  It seems like a
layer violation that one container orchestrator/engine could set its own
audit container identifier and then set its children as well.  It would
be its own parent.  It would make it harder to verify adherance to
descendancy and inheritance rules.

> > ... and that if the identifier is already set, then the
> > orchestrator/engine must be in a descendant user namespace from the
> > orchestrator that set the previously inherited audit container
> > identifier.
> 
> You lost me here ... although I don't like the idea of relying on X
> namespace inheritance for a hard coded policy on setting the audit
> container ID; we've worked hard to keep this independent of any
> definition of a "container" and it would sadden me greatly if we had
> to go back on that.

This would seem to be the one concession I'm reluctantly making to try
to solve this nested container orchestrator/engine challenge.

Would backing off on that descendant user namespace requirement and only
require that a nested audit container identifier only be permitted on a
descendant task be sufficient?  It may for this use case, but I suspect
not for additional audit daemons (we're not there yet) and message
routing to those daemons.

The one difference here is that it does not depend on this if the audit
container identifier has not already been set.

> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* [PATCH bpf 1/2] selftests/bpf: fix test_verifier/test_maps make dependencies
From: Andrii Nakryiko @ 2019-07-16 19:38 UTC (permalink / raw)
  To: bpf, netdev, daniel, ast
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Ilya Leoshkevich,
	Stanislav Fomichev, Martin KaFai Lau

e46fc22e60a4 ("selftests/bpf: make directory prerequisites order-only")
exposed existing problem in Makefile for test_verifier and test_maps tests:
their dependency on auto-generated header file with a list of all tests wasn't
recorded explicitly. This patch fixes these issues.

Fixes: 51a0e301a563 ("bpf: Add BPF_MAP_TYPE_SK_STORAGE test to test_maps")
Fixes: 6b7b6995c43e ("selftests: bpf: tests.h should depend on .c files, not the output")
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/testing/selftests/bpf/Makefile | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 1296253b3422..9bc68d8abc5f 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -86,8 +86,6 @@ $(OUTPUT)/urandom_read: $(OUTPUT)/%: %.c
 $(OUTPUT)/test_stub.o: test_stub.c
 	$(CC) $(TEST_PROGS_CFLAGS) $(CFLAGS) -c -o $@ $<
 
-$(OUTPUT)/test_maps: map_tests/*.c
-
 BPFOBJ := $(OUTPUT)/libbpf.a
 
 $(TEST_GEN_PROGS): $(OUTPUT)/test_stub.o $(BPFOBJ)
@@ -257,9 +255,10 @@ MAP_TESTS_DIR = $(OUTPUT)/map_tests
 $(MAP_TESTS_DIR):
 	mkdir -p $@
 MAP_TESTS_H := $(MAP_TESTS_DIR)/tests.h
+MAP_TESTS_FILES := $(wildcard map_tests/*.c)
 test_maps.c: $(MAP_TESTS_H)
 $(OUTPUT)/test_maps: CFLAGS += $(TEST_MAPS_CFLAGS)
-MAP_TESTS_FILES := $(wildcard map_tests/*.c)
+$(OUTPUT)/test_maps: test_maps.c $(MAP_TESTS_H) $(MAP_TESTS_FILES)
 $(MAP_TESTS_H): $(MAP_TESTS_FILES) | $(MAP_TESTS_DIR)
 	$(shell ( cd map_tests/; \
 		  echo '/* Generated header, do not edit */'; \
@@ -276,6 +275,7 @@ $(MAP_TESTS_H): $(MAP_TESTS_FILES) | $(MAP_TESTS_DIR)
 VERIFIER_TESTS_H := $(OUTPUT)/verifier/tests.h
 test_verifier.c: $(VERIFIER_TESTS_H)
 $(OUTPUT)/test_verifier: CFLAGS += $(TEST_VERIFIER_CFLAGS)
+$(OUTPUT)/test_verifier: test_verifier.c $(VERIFIER_TESTS_H)
 
 VERIFIER_TESTS_DIR = $(OUTPUT)/verifier
 $(VERIFIER_TESTS_DIR):
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf 2/2] selftests/bpf: structure test_{progs,maps,verifier} test runners uniformly
From: Andrii Nakryiko @ 2019-07-16 19:38 UTC (permalink / raw)
  To: bpf, netdev, daniel, ast; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190716193837.2808971-1-andriin@fb.com>

It's easier to follow the logic if it's structured the same.
There is just slight difference between test_progs/test_maps and
test_verifier. test_verifier's verifier/*.c files are not really compilable
C files (they are more of include headers), so they can't be specified as
explicit dependencies of test_verifier.

Cc: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/testing/selftests/bpf/Makefile | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 9bc68d8abc5f..11c9c62c3362 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -176,6 +176,7 @@ endif
 endif
 
 TEST_PROGS_CFLAGS := -I. -I$(OUTPUT)
+TEST_MAPS_CFLAGS := -I. -I$(OUTPUT)
 TEST_VERIFIER_CFLAGS := -I. -I$(OUTPUT) -Iverifier
 
 ifneq ($(SUBREG_CODEGEN),)
@@ -227,16 +228,14 @@ ifeq ($(DWARF2BTF),y)
 	$(BTF_PAHOLE) -J $@
 endif
 
-PROG_TESTS_H := $(OUTPUT)/prog_tests/tests.h
-test_progs.c: $(PROG_TESTS_H)
-$(OUTPUT)/test_progs: CFLAGS += $(TEST_PROGS_CFLAGS)
-$(OUTPUT)/test_progs: prog_tests/*.c
-
 PROG_TESTS_DIR = $(OUTPUT)/prog_tests
 $(PROG_TESTS_DIR):
 	mkdir -p $@
-
+PROG_TESTS_H := $(PROG_TESTS_DIR)/tests.h
 PROG_TESTS_FILES := $(wildcard prog_tests/*.c)
+test_progs.c: $(PROG_TESTS_H)
+$(OUTPUT)/test_progs: CFLAGS += $(TEST_PROGS_CFLAGS)
+$(OUTPUT)/test_progs: test_progs.c $(PROG_TESTS_H) $(PROG_TESTS_FILES)
 $(PROG_TESTS_H): $(PROG_TESTS_FILES) | $(PROG_TESTS_DIR)
 	$(shell ( cd prog_tests/; \
 		  echo '/* Generated header, do not edit */'; \
@@ -250,7 +249,6 @@ $(PROG_TESTS_H): $(PROG_TESTS_FILES) | $(PROG_TESTS_DIR)
 		  echo '#endif' \
 		 ) > $(PROG_TESTS_H))
 
-TEST_MAPS_CFLAGS := -I. -I$(OUTPUT)
 MAP_TESTS_DIR = $(OUTPUT)/map_tests
 $(MAP_TESTS_DIR):
 	mkdir -p $@
@@ -272,17 +270,15 @@ $(MAP_TESTS_H): $(MAP_TESTS_FILES) | $(MAP_TESTS_DIR)
 		  echo '#endif' \
 		 ) > $(MAP_TESTS_H))
 
-VERIFIER_TESTS_H := $(OUTPUT)/verifier/tests.h
-test_verifier.c: $(VERIFIER_TESTS_H)
-$(OUTPUT)/test_verifier: CFLAGS += $(TEST_VERIFIER_CFLAGS)
-$(OUTPUT)/test_verifier: test_verifier.c $(VERIFIER_TESTS_H)
-
 VERIFIER_TESTS_DIR = $(OUTPUT)/verifier
 $(VERIFIER_TESTS_DIR):
 	mkdir -p $@
-
+VERIFIER_TESTS_H := $(VERIFIER_TESTS_DIR)/tests.h
 VERIFIER_TEST_FILES := $(wildcard verifier/*.c)
-$(OUTPUT)/verifier/tests.h: $(VERIFIER_TEST_FILES) | $(VERIFIER_TESTS_DIR)
+test_verifier.c: $(VERIFIER_TESTS_H)
+$(OUTPUT)/test_verifier: CFLAGS += $(TEST_VERIFIER_CFLAGS)
+$(OUTPUT)/test_verifier: test_verifier.c $(VERIFIER_TESTS_H)
+$(VERIFIER_TESTS_H): $(VERIFIER_TEST_FILES) | $(VERIFIER_TESTS_DIR)
 	$(shell ( cd verifier/; \
 		  echo '/* Generated header, do not edit */'; \
 		  echo '#ifdef FILL_ARRAY'; \
-- 
2.17.1


^ permalink raw reply related

* Re: [RFC bpf-next 0/8] bpf: accelerate insn patching speed
From: Jiong Wang @ 2019-07-16 19:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jiong Wang, Andrii Nakryiko, Daniel Borkmann, Edward Cree,
	Naveen N. Rao, Andrii Nakryiko, Jakub Kicinski, bpf, Networking,
	oss-drivers, Yonghong Song
In-Reply-To: <20190716161701.mk5ye47aj2slkdjp@ast-mbp.dhcp.thefacebook.com>


Alexei Starovoitov writes:

> On Tue, Jul 16, 2019 at 09:50:25AM +0100, Jiong Wang wrote:
>> 
>> Let me digest a little bit and do some coding, then I will come back. Some
>> issues can only shown up during in-depth coding. I kind of feel handling
>> aux reference in verifier layer is the part that will still introduce some
>> un-clean code.
>
> I'm still internalizing this discussion. Only want to point out
> that I think it's better to have simpler algorithm that consumes more
> memory and slower than more complex algorithm that is more cpu/memory efficient.
> Here we're aiming at 10x improvement anyway, so extra cpu and memory
> here and there are good trade-off to make.
>
>> >> If there is no dead insn elimination opt, then we could just adjust
>> >> offsets. When there is insn deleting, I feel the logic becomes more
>> >> complex. One subprog could be completely deleted or partially deleted, so
>> >> I feel just recalculate the whole subprog info as a side-product is
>> >> much simpler.
>> >
>> > What's the situation where entirety of subprog can be deleted?
>> 
>> Suppose you have conditional jmp_imm, true path calls one subprog, false
>> path calls the other. If insn walker later found it is also true, then the
>> subprog at false path won't be marked as "seen", so it is entirely deleted.
>> 
>> I actually thought it is in theory one subprog could be deleted entirely,
>> so if we support insn deletion inside verifier, then range info like
>> line_info/subprog_info needs to consider one range is deleted.
>
> I don't think dead code elim can remove subprogs.
> cfg check rejects code with dead progs.

cfg check rejects unreachable code based on static analysis while one
subprog passed cfg check could be identified as dead later after runtime
value tracking, after check_cond_jmp_op pruning subprog call in false
path and making the subprog dead?

For example:

  static subprog1()
  static subprog2()
  
  foo(int mask)
  {
    if (mask & 0x1)
      subprog1();
    else
      subprog2();
    ...
  }

foo's incoming arg is a mask, and depending on whether the LSB is set, it
calls different init functions, subprog1 or subprog2.

foo might be called with a constant as mask, for example 0x8000. Then if
foo is not called by someone else, subprog1 is dead if there is no other
caller of it.

LLVM is smart enough to optimize out such dead functions if they are only
visible in the same compilation unit, and people might only write code in
such shape when they are encapsulated in a lib. but if case like above is
true, I think it is possible one subprog could be deleted by verifier
entirely.

> I don't think we have a test for such 'dead prog only due to verifier walk'
> situation. I wonder what happens :)


^ permalink raw reply

* Re: [PATCH bpf] selftests/bpf: make directory prerequisites order-only
From: Andrii Nakryiko @ 2019-07-16 19:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Ilya Leoshkevich, bpf, Network Development, gor,
	Heiko Carstens
In-Reply-To: <CAADnVQKzZQ_mbaMHEU6HA-JEy=1jXvBWULg8yKQY_2zwSmU86g@mail.gmail.com>

On Tue, Jul 16, 2019 at 10:49 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Jul 15, 2019 at 3:22 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > On 7/12/19 3:56 PM, Ilya Leoshkevich wrote:
> > > When directories are used as prerequisites in Makefiles, they can cause
> > > a lot of unnecessary rebuilds, because a directory is considered changed
> > > whenever a file in this directory is added, removed or modified.
> > >
> > > If the only thing a target is interested in is the existence of the
> > > directory it depends on, which is the case for selftests/bpf, this
> > > directory should be specified as an order-only prerequisite: it would
> > > still be created in case it does not exist, but it would not trigger a
> > > rebuild of a target in case it's considered changed.
> > >
> > > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> >
> > Applied, thanks!
>
> Hi Ilya,
>
> this commit breaks map_tests.

This change just exposed existing problem with Makefile. Sent out fix.

> To reproduce:
> rm map_tests/tests.h
> make
> tests.h will not be regenerated.
> Please provide a fix asap.
> We cannot ship bpf tree with such failure.

^ permalink raw reply

* Re: [PATCH net] be2net: Signal that the device cannot transmit during reconfiguration
From: David Miller @ 2019-07-16 19:41 UTC (permalink / raw)
  To: bpoirier
  Cc: sathya.perla, ajit.khaparde, sriharsha.basavapatna, somnath.kotur,
	fyang, saeedm, netdev
In-Reply-To: <20190716081655.7676-1-bpoirier@suse.com>

From: Benjamin Poirier <bpoirier@suse.com>
Date: Tue, 16 Jul 2019 17:16:55 +0900

> While changing the number of interrupt channels, be2net stops adapter
> operation (including netif_tx_disable()) but it doesn't signal that it
> cannot transmit. This may lead dev_watchdog() to falsely trigger during
> that time.
> 
> Add the missing call to netif_carrier_off(), following the pattern used in
> many other drivers. netif_carrier_on() is already taken care of in
> be_open().
> 
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>

Applied.

^ permalink raw reply

* [PATCH net-next v1] fix: taprio: Change type of txtime-delay parameter to u32
From: Vedang Patel @ 2019-07-16 19:52 UTC (permalink / raw)
  To: netdev
  Cc: jeffrey.t.kirsher, davem, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, vinicius.gomes, l, jakub.kicinski, m-karicheri2,
	sergei.shtylyov, eric.dumazet, aaron.f.brown, stephen,
	Vedang Patel

During the review of the iproute2 patches for txtime-assist mode, it was
pointed out that it does not make sense for the txtime-delay parameter to
be negative. So, change the type of the parameter from s32 to u32.

Fixes: 4cfd5779bd6e ("taprio: Add support for txtime-assist mode")
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
---
 include/uapi/linux/pkt_sched.h | 2 +-
 net/sched/sch_taprio.c         | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 1f623252abe8..18f185299f47 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -1174,7 +1174,7 @@ enum {
 	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME, /* s64 */
 	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION, /* s64 */
 	TCA_TAPRIO_ATTR_FLAGS, /* u32 */
-	TCA_TAPRIO_ATTR_TXTIME_DELAY, /* s32 */
+	TCA_TAPRIO_ATTR_TXTIME_DELAY, /* u32 */
 	__TCA_TAPRIO_ATTR_MAX,
 };
 
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 388750ddc57a..c39db507ba3f 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -75,7 +75,7 @@ struct taprio_sched {
 	struct sched_gate_list __rcu *admin_sched;
 	struct hrtimer advance_timer;
 	struct list_head taprio_list;
-	int txtime_delay;
+	u32 txtime_delay;
 };
 
 static ktime_t sched_base_time(const struct sched_gate_list *sched)
@@ -1113,7 +1113,7 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 			goto unlock;
 		}
 
-		q->txtime_delay = nla_get_s32(tb[TCA_TAPRIO_ATTR_TXTIME_DELAY]);
+		q->txtime_delay = nla_get_u32(tb[TCA_TAPRIO_ATTR_TXTIME_DELAY]);
 	}
 
 	if (!TXTIME_ASSIST_IS_ENABLED(taprio_flags) &&
@@ -1430,7 +1430,7 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 		goto options_error;
 
 	if (q->txtime_delay &&
-	    nla_put_s32(skb, TCA_TAPRIO_ATTR_TXTIME_DELAY, q->txtime_delay))
+	    nla_put_u32(skb, TCA_TAPRIO_ATTR_TXTIME_DELAY, q->txtime_delay))
 		goto options_error;
 
 	if (oper && dump_schedule(skb, oper))
-- 
2.7.3


^ permalink raw reply related

* [PATCH iproute2 net-next v4 1/6] Update kernel headers
From: Vedang Patel @ 2019-07-16 19:53 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, stephen, vinicius.gomes,
	leandro.maciel.dorileo, jakub.kicinski, m-karicheri2, dsahern,
	Vedang Patel

The type for txtime-delay parameter will change from s32 to u32. So,
make the corresponding change in the ABI file as well.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
---
 include/uapi/linux/pkt_sched.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 1f623252abe8..18f185299f47 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -1174,7 +1174,7 @@ enum {
 	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME, /* s64 */
 	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION, /* s64 */
 	TCA_TAPRIO_ATTR_FLAGS, /* u32 */
-	TCA_TAPRIO_ATTR_TXTIME_DELAY, /* s32 */
+	TCA_TAPRIO_ATTR_TXTIME_DELAY, /* u32 */
 	__TCA_TAPRIO_ATTR_MAX,
 };
 
-- 
2.7.3


^ permalink raw reply related

* [PATCH iproute2 net-next v4 2/6] etf: Add skip_sock_check
From: Vedang Patel @ 2019-07-16 19:53 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, stephen, vinicius.gomes,
	leandro.maciel.dorileo, jakub.kicinski, m-karicheri2, dsahern,
	Vedang Patel
In-Reply-To: <1563306789-2908-1-git-send-email-vedang.patel@intel.com>

ETF Qdisc currently checks for a socket with SO_TXTIME socket option. If
either is not present, the packet is dropped. In the future commits, we
want other Qdiscs to add packet with launchtime to the ETF Qdisc. Also,
there are some packets (e.g. ICMP packets) which may not have a socket
associated with them.  So, add an option to skip this check.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
---
 tc/q_etf.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tc/q_etf.c b/tc/q_etf.c
index 76aca476c61d..c2090589bc64 100644
--- a/tc/q_etf.c
+++ b/tc/q_etf.c
@@ -130,6 +130,13 @@ static int etf_parse_opt(struct qdisc_util *qu, int argc,
 				explain_clockid(*argv);
 				return -1;
 			}
+		} else if (strcmp(*argv, "skip_sock_check") == 0) {
+			if (opt.flags & TC_ETF_SKIP_SOCK_CHECK) {
+				fprintf(stderr, "etf: duplicate \"skip_sock_check\" specification\n");
+				return -1;
+			}
+
+			opt.flags |= TC_ETF_SKIP_SOCK_CHECK;
 		} else if (strcmp(*argv, "help") == 0) {
 			explain();
 			return -1;
@@ -171,8 +178,10 @@ static int etf_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
 	print_uint(PRINT_ANY, "delta", "delta %d ", qopt->delta);
 	print_string(PRINT_ANY, "offload", "offload %s ",
 				(qopt->flags & TC_ETF_OFFLOAD_ON) ? "on" : "off");
-	print_string(PRINT_ANY, "deadline_mode", "deadline_mode %s",
+	print_string(PRINT_ANY, "deadline_mode", "deadline_mode %s ",
 				(qopt->flags & TC_ETF_DEADLINE_MODE_ON) ? "on" : "off");
+	print_string(PRINT_ANY, "skip_sock_check", "skip_sock_check %s",
+				(qopt->flags & TC_ETF_SKIP_SOCK_CHECK) ? "on" : "off");
 
 	return 0;
 }
-- 
2.7.3


^ permalink raw reply related

* [PATCH iproute2 net-next v4 5/6] tc: etf: Add documentation for skip-skb-check.
From: Vedang Patel @ 2019-07-16 19:53 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, stephen, vinicius.gomes,
	leandro.maciel.dorileo, jakub.kicinski, m-karicheri2, dsahern,
	Vedang Patel
In-Reply-To: <1563306789-2908-1-git-send-email-vedang.patel@intel.com>

Document the newly added option (skip-skb-check) on the etf man-page.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
---
 man/man8/tc-etf.8 | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/man/man8/tc-etf.8 b/man/man8/tc-etf.8
index 30a12de7d2c7..2e01a591dbaa 100644
--- a/man/man8/tc-etf.8
+++ b/man/man8/tc-etf.8
@@ -106,6 +106,16 @@ referred to as "Launch Time" or "Time-Based Scheduling" by the
 documentation of network interface controllers.
 The default is for this option to be disabled.
 
+.TP
+skip_skb_check
+.br
+.BR etf(8)
+currently drops any packet which does not have a socket associated with it or
+if the socket does not have SO_TXTIME socket option set. But, this will not
+work if the launchtime is set by another entity inside the kernel (e.g. some
+other Qdisc). Setting the skip_skb_check will skip checking for a socket
+associated with the packet.
+
 .SH EXAMPLES
 
 ETF is used to enforce a Quality of Service. It controls when each
-- 
2.7.3


^ permalink raw reply related

* [PATCH iproute2 net-next v4 4/6] taprio: add support for setting txtime_delay.
From: Vedang Patel @ 2019-07-16 19:53 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, stephen, vinicius.gomes,
	leandro.maciel.dorileo, jakub.kicinski, m-karicheri2, dsahern,
	Vedang Patel
In-Reply-To: <1563306789-2908-1-git-send-email-vedang.patel@intel.com>

This adds support for setting the txtime_delay parameter which is useful
for the txtime offload mode of taprio.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
---
 tc/q_taprio.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/tc/q_taprio.c b/tc/q_taprio.c
index 1db2aba6efe7..b9954436b0f9 100644
--- a/tc/q_taprio.c
+++ b/tc/q_taprio.c
@@ -52,7 +52,7 @@ static void explain(void)
 		"		[num_tc NUMBER] [map P0 P1 ...] "
 		"		[queues COUNT@OFFSET COUNT@OFFSET COUNT@OFFSET ...] "
 		"		[ [sched-entry index cmd gate-mask interval] ... ] "
-		"		[base-time time] "
+		"		[base-time time] [txtime-delay delay]"
 		"\n"
 		"CLOCKID must be a valid SYS-V id (i.e. CLOCK_TAI)\n");
 }
@@ -160,6 +160,7 @@ static int taprio_parse_opt(struct qdisc_util *qu, int argc,
 	struct list_head sched_entries;
 	struct rtattr *tail, *l;
 	__u32 taprio_flags = 0;
+	__u32 txtime_delay = 0;
 	__s64 cycle_time = 0;
 	__s64 base_time = 0;
 	int err, idx;
@@ -293,6 +294,17 @@ static int taprio_parse_opt(struct qdisc_util *qu, int argc,
 				return -1;
 			}
 
+		} else if (strcmp(*argv, "txtime-delay") == 0) {
+			NEXT_ARG();
+			if (txtime_delay != 0) {
+				fprintf(stderr, "taprio: duplicate \"txtime-delay\" specification\n");
+				return -1;
+			}
+			if (get_u32(&txtime_delay, *argv, 0)) {
+				PREV_ARG();
+				return -1;
+			}
+
 		} else if (strcmp(*argv, "help") == 0) {
 			explain();
 			return -1;
@@ -315,6 +327,9 @@ static int taprio_parse_opt(struct qdisc_util *qu, int argc,
 	if (opt.num_tc > 0)
 		addattr_l(n, 1024, TCA_TAPRIO_ATTR_PRIOMAP, &opt, sizeof(opt));
 
+	if (txtime_delay)
+		addattr_l(n, 1024, TCA_TAPRIO_ATTR_TXTIME_DELAY, &txtime_delay, sizeof(txtime_delay));
+
 	if (base_time)
 		addattr_l(n, 1024, TCA_TAPRIO_ATTR_SCHED_BASE_TIME, &base_time, sizeof(base_time));
 
@@ -464,6 +479,13 @@ static int taprio_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
 		print_0xhex(PRINT_ANY, "flags", " flags %#x", flags);
 	}
 
+	if (tb[TCA_TAPRIO_ATTR_TXTIME_DELAY]) {
+		__u32 txtime_delay;
+
+		txtime_delay = rta_getattr_s32(tb[TCA_TAPRIO_ATTR_TXTIME_DELAY]);
+		print_uint(PRINT_ANY, "txtime_delay", " txtime delay %d", txtime_delay);
+	}
+
 	print_schedule(f, tb);
 
 	if (tb[TCA_TAPRIO_ATTR_ADMIN_SCHED]) {
-- 
2.7.3


^ permalink raw reply related

* [PATCH iproute2 net-next v4 6/6] tc: taprio: Update documentation
From: Vedang Patel @ 2019-07-16 19:53 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, stephen, vinicius.gomes,
	leandro.maciel.dorileo, jakub.kicinski, m-karicheri2, dsahern,
	Vedang Patel
In-Reply-To: <1563306789-2908-1-git-send-email-vedang.patel@intel.com>

Add documentation for the latest options, flags and txtime-delay, to the
taprio manpage.

This also adds an example to run tc in txtime offload mode.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
---
 man/man8/tc-taprio.8 | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/man/man8/tc-taprio.8 b/man/man8/tc-taprio.8
index 850be9b03649..e1d19ba19089 100644
--- a/man/man8/tc-taprio.8
+++ b/man/man8/tc-taprio.8
@@ -112,6 +112,26 @@ means that traffic class 0 is "active" for that schedule entry.
 long that state defined by <command> and <gate mask> should be held
 before moving to the next entry.
 
+.TP
+flags
+.br
+Specifies different modes for taprio. Currently, only txtime-assist is
+supported which can be enabled by setting it to 0x1. In this mode, taprio will
+set the transmit timestamp depending on the interval in which the packet needs
+to be transmitted. It will then utililize the
+.BR etf(8)
+qdisc to sort and transmit the packets at the right time. The second example
+can be used as a reference to configure this mode.
+
+.TP
+txtime-delay
+.br
+This parameter is specific to the txtime offload mode. It specifies the maximum
+time a packet might take to reach the network card from the taprio qdisc. The
+value should always be greater than the delta specified in the
+.BR etf(8)
+qdisc.
+
 .SH EXAMPLES
 
 The following example shows how an traffic schedule with three traffic
@@ -137,6 +157,26 @@ reference CLOCK_TAI. The schedule is composed of three entries each of
               clockid CLOCK_TAI
 .EE
 
+Following is an example to enable the txtime offload mode in taprio. See
+.BR etf(8)
+for more information about configuring the ETF qdisc.
+
+.EX
+# tc qdisc replace dev eth0 parent root handle 100 taprio \\
+              num_tc 3 \\
+              map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \\
+              queues 1@0 1@0 1@0 \\
+              base-time 1528743495910289987 \\
+              sched-entry S 01 300000 \\
+              sched-entry S 02 300000 \\
+              sched-entry S 04 400000 \\
+              flags 0x1 \\
+              txtime-delay 200000 \\
+              clockid CLOCK_TAI
+
+# tc qdisc replace dev $IFACE parent 100:1 etf skip_skb_check \\
+              offload delta 200000 clockid CLOCK_TAI
+.EE
 
 .SH AUTHORS
 Vinicius Costa Gomes <vinicius.gomes@intel.com>
-- 
2.7.3


^ permalink raw reply related

* [PATCH iproute2 net-next v4 3/6] taprio: Add support for setting flags
From: Vedang Patel @ 2019-07-16 19:53 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, stephen, vinicius.gomes,
	leandro.maciel.dorileo, jakub.kicinski, m-karicheri2, dsahern,
	Vedang Patel
In-Reply-To: <1563306789-2908-1-git-send-email-vedang.patel@intel.com>

From: Vinicius Costa Gomes <vinicius.gomes@intel.com>

This allows a new parameter, flags, to be passed to taprio. Currently, it
only supports enabling the txtime-assist mode. But, we plan to add
different modes for taprio (e.g. hardware offloading) and this parameter
will be useful in enabling those modes.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
---
 tc/q_taprio.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/tc/q_taprio.c b/tc/q_taprio.c
index 62c8c591da99..1db2aba6efe7 100644
--- a/tc/q_taprio.c
+++ b/tc/q_taprio.c
@@ -159,6 +159,7 @@ static int taprio_parse_opt(struct qdisc_util *qu, int argc,
 	__s64 cycle_time_extension = 0;
 	struct list_head sched_entries;
 	struct rtattr *tail, *l;
+	__u32 taprio_flags = 0;
 	__s64 cycle_time = 0;
 	__s64 base_time = 0;
 	int err, idx;
@@ -281,6 +282,17 @@ static int taprio_parse_opt(struct qdisc_util *qu, int argc,
 				explain_clockid(*argv);
 				return -1;
 			}
+		} else if (strcmp(*argv, "flags") == 0) {
+			NEXT_ARG();
+			if (taprio_flags) {
+				fprintf(stderr, "taprio: duplicate \"flags\" specification\n");
+				return -1;
+			}
+			if (get_u32(&taprio_flags, *argv, 0)) {
+				PREV_ARG();
+				return -1;
+			}
+
 		} else if (strcmp(*argv, "help") == 0) {
 			explain();
 			return -1;
@@ -297,6 +309,9 @@ static int taprio_parse_opt(struct qdisc_util *qu, int argc,
 	if (clockid != CLOCKID_INVALID)
 		addattr_l(n, 1024, TCA_TAPRIO_ATTR_SCHED_CLOCKID, &clockid, sizeof(clockid));
 
+	if (taprio_flags)
+		addattr_l(n, 1024, TCA_TAPRIO_ATTR_FLAGS, &taprio_flags, sizeof(taprio_flags));
+
 	if (opt.num_tc > 0)
 		addattr_l(n, 1024, TCA_TAPRIO_ATTR_PRIOMAP, &opt, sizeof(opt));
 
@@ -442,6 +457,13 @@ static int taprio_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
 
 	print_string(PRINT_ANY, "clockid", "clockid %s", get_clock_name(clockid));
 
+	if (tb[TCA_TAPRIO_ATTR_FLAGS]) {
+		__u32 flags;
+
+		flags = rta_getattr_u32(tb[TCA_TAPRIO_ATTR_FLAGS]);
+		print_0xhex(PRINT_ANY, "flags", " flags %#x", flags);
+	}
+
 	print_schedule(f, tb);
 
 	if (tb[TCA_TAPRIO_ATTR_ADMIN_SCHED]) {
-- 
2.7.3


^ permalink raw reply related

* Re: [PATCH bpf 1/2] selftests/bpf: fix test_verifier/test_maps make dependencies
From: Stanislav Fomichev @ 2019-07-16 19:55 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev, daniel, ast, andrii.nakryiko, kernel-team,
	Ilya Leoshkevich, Stanislav Fomichev, Martin KaFai Lau
In-Reply-To: <20190716193837.2808971-1-andriin@fb.com>

On 07/16, Andrii Nakryiko wrote:
> e46fc22e60a4 ("selftests/bpf: make directory prerequisites order-only")
> exposed existing problem in Makefile for test_verifier and test_maps tests:
> their dependency on auto-generated header file with a list of all tests wasn't
> recorded explicitly. This patch fixes these issues.
Why adding it explicitly fixes it? At least for test_verifier, we have
the following rule:

	test_verifier.c: $(VERIFIER_TESTS_H)

And there should be implicit/builtin test_verifier -> test_verifier.c
dependency rule.

Same for maps, I guess:

	$(OUTPUT)/test_maps: map_tests/*.c
	test_maps.c: $(MAP_TESTS_H)

So why is it not working as is? What I'm I missing?

^ permalink raw reply

* Re: [PATCH net-next v1] fix: taprio: Change type of txtime-delay parameter to u32
From: Patel, Vedang @ 2019-07-16 20:15 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: Kirsher, Jeffrey T, David Miller, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, intel-wired-lan@lists.osuosl.org, Gomes, Vinicius,
	l@dorileo.org, Jakub Kicinski, Murali Karicheri, Sergei Shtylyov,
	Eric Dumazet, Brown, Aaron F, Stephen Hemminger
In-Reply-To: <1563306738-2779-1-git-send-email-vedang.patel@intel.com>

I request that this patch should also be considered for the net tree since it fixes the data type of of the txtime_delay parameter and should go in with the iproute2 patches which implement support for txtime-assist mode. 

Thanks,
Vedang Patel

> On Jul 16, 2019, at 12:52 PM, Patel, Vedang <vedang.patel@intel.com> wrote:
> 
> During the review of the iproute2 patches for txtime-assist mode, it was
> pointed out that it does not make sense for the txtime-delay parameter to
> be negative. So, change the type of the parameter from s32 to u32.
> 
> Fixes: 4cfd5779bd6e ("taprio: Add support for txtime-assist mode")
> Reported-by: Stephen Hemminger <stephen@networkplumber.org>
> Signed-off-by: Vedang Patel <vedang.patel@intel.com>
> ---
> include/uapi/linux/pkt_sched.h | 2 +-
> net/sched/sch_taprio.c         | 6 +++---
> 2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
> index 1f623252abe8..18f185299f47 100644
> --- a/include/uapi/linux/pkt_sched.h
> +++ b/include/uapi/linux/pkt_sched.h
> @@ -1174,7 +1174,7 @@ enum {
> 	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME, /* s64 */
> 	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION, /* s64 */
> 	TCA_TAPRIO_ATTR_FLAGS, /* u32 */
> -	TCA_TAPRIO_ATTR_TXTIME_DELAY, /* s32 */
> +	TCA_TAPRIO_ATTR_TXTIME_DELAY, /* u32 */
> 	__TCA_TAPRIO_ATTR_MAX,
> };
> 
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index 388750ddc57a..c39db507ba3f 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -75,7 +75,7 @@ struct taprio_sched {
> 	struct sched_gate_list __rcu *admin_sched;
> 	struct hrtimer advance_timer;
> 	struct list_head taprio_list;
> -	int txtime_delay;
> +	u32 txtime_delay;
> };
> 
> static ktime_t sched_base_time(const struct sched_gate_list *sched)
> @@ -1113,7 +1113,7 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
> 			goto unlock;
> 		}
> 
> -		q->txtime_delay = nla_get_s32(tb[TCA_TAPRIO_ATTR_TXTIME_DELAY]);
> +		q->txtime_delay = nla_get_u32(tb[TCA_TAPRIO_ATTR_TXTIME_DELAY]);
> 	}
> 
> 	if (!TXTIME_ASSIST_IS_ENABLED(taprio_flags) &&
> @@ -1430,7 +1430,7 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)
> 		goto options_error;
> 
> 	if (q->txtime_delay &&
> -	    nla_put_s32(skb, TCA_TAPRIO_ATTR_TXTIME_DELAY, q->txtime_delay))
> +	    nla_put_u32(skb, TCA_TAPRIO_ATTR_TXTIME_DELAY, q->txtime_delay))
> 		goto options_error;
> 
> 	if (oper && dump_schedule(skb, oper))
> -- 
> 2.7.3
> 


^ permalink raw reply

* Re: [PATCH V35 23/29] bpf: Restrict bpf when kernel lockdown is in confidentiality mode
From: Matthew Garrett @ 2019-07-16 20:32 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: James Morris, LSM List, Linux Kernel Mailing List, Linux API,
	David Howells, Alexei Starovoitov, Network Development,
	Chun-Yi Lee
In-Reply-To: <5d363f09-d649-5693-45c0-bb99d69f0f38@iogearbox.net>

On Mon, Jul 15, 2019 at 3:54 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> Hmm, does security_locked_down() ever return a code > 0 or why do you
> have the double check on return code? If not, then for clarity the
> ret code from security_locked_down() should be checked as 'ret < 0'
> as well and out label should be at the memset directly instead.

It doesn't, so I'll update. Thanks!

^ permalink raw reply

* Re: [PATCH RFC 0/4] Add support to directly attach BPF program to ftrace
From: Alexei Starovoitov @ 2019-07-16 20:54 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Adrian Ratiu, Alexei Starovoitov, bpf,
	Brendan Gregg, connoro, Daniel Borkmann, duyuchao, Ingo Molnar,
	jeffv, Karim Yaghmour, kernel-team, linux-kselftest,
	Manali Shukla, Manjo Raja Rao, Martin KaFai Lau, Masami Hiramatsu,
	Matt Mullins, Michal Gregorczyk, Michal Gregorczyk,
	Mohammad Husain, namhyung, namhyung, netdev, paul.chaignon,
	primiano, Qais Yousef, Shuah Khan, Song Liu, Srinivas Ramana,
	Steven Rostedt, Tamir Carmeli, Yonghong Song
In-Reply-To: <20190710141548.132193-1-joel@joelfernandes.org>

On Wed, Jul 10, 2019 at 10:15:44AM -0400, Joel Fernandes (Google) wrote:
> Hi,

why are you cc-ing the whole world for this patch set?
I'll reply to all as well, but I suspect a bunch of folks consider it spam.
Please read Documentation/bpf/bpf_devel_QA.rst

Also, I think, netdev@vger rejects emails with 80+ characters in cc as spam,
so I'm not sure this set reached public mailing lists.

> These patches make it possible to attach BPF programs directly to tracepoints
> using ftrace (/sys/kernel/debug/tracing) without needing the process doing the
> attach to be alive. This has the following benefits:
> 
> 1. Simplified Security: In Android, we have finer-grained security controls to
> specific ftrace trace events using SELinux labels. We control precisely who is
> allowed to enable an ftrace event already. By adding a node to ftrace for
> attaching BPF programs, we can use the same mechanism to further control who is
> allowed to attach to a trace event.
> 
> 2. Process lifetime: In Android we are adding usecases where a tracing program
> needs to be attached all the time to a tracepoint, for the full life time of
> the system. Such as to gather statistics where there no need for a detach for
> the full system lifetime. With perf or bpf(2)'s BPF_RAW_TRACEPOINT_OPEN, this
> means keeping a process alive all the time.  However, in Android our BPF loader
> currently (for hardeneded security) involves just starting a process at boot
> time, doing the BPF program loading, and then pinning them to /sys/fs/bpf.  We
> don't keep this process alive all the time. It is more suitable to do a
> one-shot attach of the program using ftrace and not need to have a process
> alive all the time anymore for this. Such process also needs elevated
> privileges since tracepoint program loading currently requires CAP_SYS_ADMIN
> anyway so by design Android's bpfloader runs once at init and exits.
> 
> This series add a new bpf file to /sys/kernel/debug/tracing/events/X/Y/bpf
> The following commands can be written into it:
> attach:<fd>     Attaches BPF prog fd to tracepoint
> detach:<fd>     Detaches BPF prog fd to tracepoint

Looks like, to detach a program the user needs to read a text file,
parse bpf prog id from text into binary. Then call fd_from_id bpf syscall,
get a binary FD, convert it back to text and write as a text back into this file.
I think this is just a single example why text based apis are not accepted
in bpf anymore.

Through the patch set you call it ftrace. As far as I can see, this set
has zero overlap with ftrace. There is no ftrace-bpf connection here at all
that we discussed in the past Steven. It's all quite confusing.

I suggest android to solve sticky raw_tracepoint problem with user space deamon.
The reasons, you point out why user daemon cannot be used, sound weak to me.
Another acceptable solution would be to introduce pinning of raw_tp objects.
bpf progs and maps can be pinned in bpffs already. Pinning raw_tp would
be natural extension.

^ permalink raw reply

* [Patch net v2] net_sched: unset TCQ_F_CAN_BYPASS when adding filters
From: Cong Wang @ 2019-07-16 20:57 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang, Eric Dumazet

For qdisc's that support TC filters and set TCQ_F_CAN_BYPASS,
notably fq_codel, it makes no sense to let packets bypass the TC
filters we setup in any scenario, otherwise our packets steering
policy could not be enforced.

This can be reproduced easily with the following script:

 ip li add dev dummy0 type dummy
 ifconfig dummy0 up
 tc qd add dev dummy0 root fq_codel
 tc filter add dev dummy0 parent 8001: protocol arp basic action mirred egress redirect dev lo
 tc filter add dev dummy0 parent 8001: protocol ip basic action mirred egress redirect dev lo
 ping -I dummy0 192.168.112.1

Without this patch, packets are sent directly to dummy0 without
hitting any of the filters. With this patch, packets are redirected
to loopback as expected.

This fix is not perfect, it only unsets the flag but does not set it back
because we have to save the information somewhere in the qdisc if we
really want that. Note, both fq_codel and sfq clear this flag in their
->bind_tcf() but this is clearly not sufficient when we don't use any
class ID.

Fixes: 23624935e0c4 ("net_sched: TCQ_F_CAN_BYPASS generalization")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/sched/cls_api.c      | 1 +
 net/sched/sch_fq_codel.c | 2 --
 net/sched/sch_sfq.c      | 2 --
 3 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 278014e26aec..d144233423c5 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -2152,6 +2152,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
 			       RTM_NEWTFILTER, false, rtnl_held);
 		tfilter_put(tp, fh);
+		q->flags &= ~TCQ_F_CAN_BYPASS;
 	}
 
 errout:
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index e2faf33d282b..d59fbcc745d1 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -596,8 +596,6 @@ static unsigned long fq_codel_find(struct Qdisc *sch, u32 classid)
 static unsigned long fq_codel_bind(struct Qdisc *sch, unsigned long parent,
 			      u32 classid)
 {
-	/* we cannot bypass queue discipline anymore */
-	sch->flags &= ~TCQ_F_CAN_BYPASS;
 	return 0;
 }
 
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 420bd8411677..68404a9d2ce4 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -824,8 +824,6 @@ static unsigned long sfq_find(struct Qdisc *sch, u32 classid)
 static unsigned long sfq_bind(struct Qdisc *sch, unsigned long parent,
 			      u32 classid)
 {
-	/* we cannot bypass queue discipline anymore */
-	sch->flags &= ~TCQ_F_CAN_BYPASS;
 	return 0;
 }
 
-- 
2.21.0


^ permalink raw reply related

* [Patch net v2 1/2] fib: relax source validation check for loopback packets
From: Cong Wang @ 2019-07-16 20:58 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang, Julian Anastasov, David Ahern

In a rare case where we redirect local packets from veth to lo,
these packets fail to pass the source validation when rp_filter
is turned on, as the tracing shows:

  <...>-311708 [040] ..s1 7951180.957825: fib_table_lookup: table 254 oif 0 iif 1 src 10.53.180.130 dst 10.53.180.130 tos 0 scope 0 flags 0
  <...>-311708 [040] ..s1 7951180.957826: fib_table_lookup_nh: nexthop dev eth0 oif 4 src 10.53.180.130

So, the fib table lookup returns eth0 as the nexthop even though
the packets are local and should be routed to loopback nonetheless,
but they can't pass the dev match check in fib_info_nh_uses_dev()
without this patch.

It should be safe to relax this check for this special case, as
normally packets coming out of loopback device still have skb_dst
so they won't even hit this slow path.

Cc: Julian Anastasov <ja@ssi.bg>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/ipv4/fib_frontend.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 317339cd7f03..e8bc939b56dd 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -388,6 +388,11 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
 	fib_combine_itag(itag, &res);
 
 	dev_match = fib_info_nh_uses_dev(res.fi, dev);
+	/* This is not common, loopback packets retain skb_dst so normally they
+	 * would not even hit this slow path.
+	 */
+	dev_match = dev_match || (res.type == RTN_LOCAL &&
+				  dev == net->loopback_dev);
 	if (dev_match) {
 		ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST;
 		return ret;
-- 
2.21.0


^ permalink raw reply related

* [Patch net v2 2/2] selftests: add a test case for rp filter
From: Cong Wang @ 2019-07-16 20:58 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang, David Ahern

Add a test case to simulate the loopback packet case fixed
in the previous patch.

This test gets passed after the fix:

IPv4 rp_filter tests
    TEST: rp_filter passes local packets                                [ OK ]
    TEST: rp_filter passes loopback packets                             [ OK ]

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 tools/testing/selftests/net/fib_tests.sh | 30 +++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index 9457aaeae092..a9e45471edfe 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -9,12 +9,13 @@ ret=0
 ksft_skip=4
 
 # all tests in this script. Can be overridden with -t option
-TESTS="unregister down carrier nexthop ipv6_rt ipv4_rt ipv6_addr_metric ipv4_addr_metric ipv6_route_metrics ipv4_route_metrics ipv4_route_v6_gw"
+TESTS="unregister down carrier nexthop ipv6_rt ipv4_rt ipv6_addr_metric ipv4_addr_metric ipv6_route_metrics ipv4_route_metrics ipv4_route_v6_gw rp_filter"
 
 VERBOSE=0
 PAUSE_ON_FAIL=no
 PAUSE=no
 IP="ip -netns ns1"
+NETNS="ip netns exec ns1"
 
 log_test()
 {
@@ -433,6 +434,32 @@ fib_carrier_test()
 	fib_carrier_unicast_test
 }
 
+fib_rp_filter_test()
+{
+	echo
+	echo "IPv4 rp_filter tests"
+
+	setup
+
+	$IP link set dev lo address 52:54:00:6a:c7:5e
+	$IP link set dev dummy0 address 52:54:00:6a:c7:5e
+	echo 1 | $NETNS tee /proc/sys/net/ipv4/conf/all/rp_filter > /dev/null
+	echo 1 | $NETNS tee /proc/sys/net/ipv4/conf/all/accept_local > /dev/null
+	echo 1 | $NETNS tee /proc/sys/net/ipv4/conf/all/route_localnet > /dev/null
+
+	$NETNS tc qd add dev dummy0 parent root handle 1: fq_codel
+	$NETNS tc filter add dev dummy0 parent 1: protocol arp basic action mirred egress redirect dev lo
+	$NETNS tc filter add dev dummy0 parent 1: protocol ip basic action mirred egress redirect dev lo
+
+	run_cmd "ip netns exec ns1 ping -I dummy0 -w1 -c1 198.51.100.1"
+	log_test $? 0 "rp_filter passes local packets"
+
+	run_cmd "ip netns exec ns1 ping -I dummy0 -w1 -c1 127.0.0.1"
+	log_test $? 0 "rp_filter passes loopback packets"
+
+	cleanup
+}
+
 ################################################################################
 # Tests on nexthop spec
 
@@ -1557,6 +1584,7 @@ do
 	fib_unreg_test|unregister)	fib_unreg_test;;
 	fib_down_test|down)		fib_down_test;;
 	fib_carrier_test|carrier)	fib_carrier_test;;
+	fib_rp_filter_test|rp_filter)	fib_rp_filter_test;;
 	fib_nexthop_test|nexthop)	fib_nexthop_test;;
 	ipv6_route_test|ipv6_rt)	ipv6_route_test;;
 	ipv4_route_test|ipv4_rt)	ipv4_route_test;;
-- 
2.21.0


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox