Netdev List
* Re: [PATCH] i40e: mark expected switch fall-through
From: Gustavo A. R. Silva @ 2018-08-31 22:57 UTC
  To: Kirsher, Jeffrey T, David S. Miller
  Cc: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <61CC2BC414934749BD9F5BF3D5D94044950056B7@ORSMSX112.amr.corp.intel.com>



On 8/30/18 4:09 PM, Kirsher, Jeffrey T wrote:
>> -----Original Message-----
>> From: Gustavo A. R. Silva [mailto:gustavo@embeddedor.com]
>> Sent: Thursday, August 30, 2018 11:50
>> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; David S. Miller
>> <davem@davemloft.net>
>> Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; linux-
>> kernel@vger.kernel.org; Gustavo A. R. Silva <gustavo@embeddedor.com>
>> Subject: [PATCH] i40e: mark expected switch fall-through
>>
>> In preparation to enabling -Wimplicit-fallthrough, mark switch cases where
>> we are expecting to fall through.
>>
>> Addresses-Coverity-ID: 1473099 ("Missing break in switch")
>> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
>> ---
>>  drivers/net/ethernet/intel/i40e/i40e_xsk.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> I have picked this up Dave.
> 

Thanks, Jeffrey.
--
Gustavo


* [PATCH v6 4/4] gpiolib: Implement fast processing path in get/set array
From: Janusz Krzysztofik @ 2018-08-31 22:56 UTC
  To: Linus Walleij
  Cc: Jonathan Corbet, Miguel Ojeda Sandonis, Peter Korsgaard,
	Peter Rosin, Ulf Hansson, Andrew Lunn, Florian Fainelli,
	David S. Miller, Dominik Brodowski, Greg Kroah-Hartman,
	Kishon Vijay Abraham I, Lars-Peter Clausen, Michael Hennerich,
	Jonathan Cameron, Hartmut Knaack, Peter Meerwald-Stadler,
	Jiri Slaby, Willy Tarreau, Geert Uytterhoeven
In-Reply-To: <20180831225616.29221-1-jmkrzyszt@gmail.com>

Certain GPIO descriptor arrays returned by gpiod_get_array() may contain
information on direct mapping of array members to pins of a single GPIO
chip in hardware order.  In such cases, bitmaps of values can be passed
directly from/to the chip's .get/set_multiple() callbacks without
wasting time on iterations.

Add respective code to gpiod_get/set_array_value_complex() functions.
Pins not applicable for fast path are processed as before, skipping
over the 'fast' ones.
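
To illustrate the idea, here is a minimal userspace sketch, not the
gpiolib code.  It assumes a hypothetical struct fake_chip whose pins fit
in one unsigned long and a hypothetical fake_get_multiple() helper
standing in for the chip's .get_multiple() callback; fast_mask plays the
role of the get_mask carried by array_info.  Members marked in fast_mask
are fetched with a single chip access, the rest are visited one by one:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a GPIO chip: one word of pin levels. */
struct fake_chip {
	unsigned long pins;	/* current level of each hardware pin */
	unsigned int reads;	/* number of chip accesses, for illustration */
};

/* Models a .get_multiple() callback: one access returns all masked pins. */
static unsigned long fake_get_multiple(struct fake_chip *chip,
				       unsigned long mask)
{
	chip->reads++;
	return chip->pins & mask;
}

/*
 * Read array_size values.  Members whose bit is set in fast_mask sit at
 * hardware pin == array index, so their bits are copied in one step.
 * The remaining members are looked up one by one through hwnum[].
 */
static unsigned long get_array(struct fake_chip *chip,
			       const unsigned int *hwnum,
			       size_t array_size, unsigned long fast_mask)
{
	unsigned long values = 0;
	size_t i;

	assert(array_size <= 8 * sizeof(unsigned long));

	if (fast_mask)
		values = fake_get_multiple(chip, fast_mask);

	for (i = 0; i < array_size; i++) {
		if (fast_mask & (1UL << i))
			continue;	/* already handled by the fast path */
		if (fake_get_multiple(chip, 1UL << hwnum[i]))
			values |= 1UL << i;
	}
	return values;
}
```

With four members of which the first three are in hardware order, the
sketch needs two chip accesses instead of four.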

Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com>
---
 Documentation/driver-api/gpio/board.rst    | 15 ++++++
 Documentation/driver-api/gpio/consumer.rst |  8 +++
 drivers/gpio/gpiolib.c                     | 87 ++++++++++++++++++++++++++++--
 3 files changed, 105 insertions(+), 5 deletions(-)

diff --git a/Documentation/driver-api/gpio/board.rst b/Documentation/driver-api/gpio/board.rst
index 2c112553df84..c66821e033c2 100644
--- a/Documentation/driver-api/gpio/board.rst
+++ b/Documentation/driver-api/gpio/board.rst
@@ -193,3 +193,18 @@ And the table can be added to the board code as follows::
 
 The line will be hogged as soon as the gpiochip is created or - in case the
 chip was created earlier - when the hog table is registered.
+
+Arrays of pins
+--------------
+In addition to requesting pins belonging to a function one by one, a device may
+also request an array of pins assigned to the function.  The way those pins are
+mapped to the device determines if the array qualifies for fast bitmap
+processing.  If yes, a bitmap is passed over get/set array functions directly
+between a caller and a respective .get/set_multiple() callback of a GPIO chip.
+
+In order to qualify for fast bitmap processing, the pin mapping must meet the
+following requirements:
+- it must belong to the same chip as other 'fast' pins of the function,
+- its index within the function must match its hardware number within the chip.
+
+Open drain and open source pins are excluded from fast bitmap output processing.
diff --git a/Documentation/driver-api/gpio/consumer.rst b/Documentation/driver-api/gpio/consumer.rst
index 0afd95a12b10..cf992e5ab976 100644
--- a/Documentation/driver-api/gpio/consumer.rst
+++ b/Documentation/driver-api/gpio/consumer.rst
@@ -388,6 +388,14 @@ array_info should be set to NULL.
 Note that for optimal performance GPIOs belonging to the same chip should be
 contiguous within the array of descriptors.
 
+Still better performance may be achieved if array indexes of the descriptors
+match hardware pin numbers of a single chip.  If an array passed to a get/set
+array function matches the one obtained from gpiod_get_array() and array_info
+associated with the array is also passed, the function may take a fast bitmap
+processing path, passing the value_bitmap argument directly to the respective
+.get/set_multiple() callback of the chip.  That allows for utilization of GPIO
+banks as data I/O ports without much loss of performance.
+
 The return value of gpiod_get_array_value() and its variants is 0 on success
 or negative on error. Note the difference to gpiod_get_value(), which returns
 0 or 1 on success to convey the GPIO value. With the array functions, the GPIO
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 4d26cdbdb7cf..b799a89c4c17 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -2787,7 +2787,36 @@ int gpiod_get_array_value_complex(bool raw, bool can_sleep,
 				  struct gpio_array *array_info,
 				  unsigned long *value_bitmap)
 {
-	int i = 0;
+	int err, i = 0;
+
+	/*
+	 * Validate array_info against desc_array and its size.
+	 * It should immediately follow desc_array if both
+	 * have been obtained from the same gpiod_get_array() call.
+	 */
+	if (array_info && array_info->desc == desc_array &&
+	    array_size <= array_info->size &&
+	    (void *)array_info == desc_array + array_info->size) {
+		if (!can_sleep)
+			WARN_ON(array_info->chip->can_sleep);
+
+		err = gpio_chip_get_multiple(array_info->chip,
+					     array_info->get_mask,
+					     value_bitmap);
+		if (err)
+			return err;
+
+		if (!raw && !bitmap_empty(array_info->invert_mask, array_size))
+			bitmap_xor(value_bitmap, value_bitmap,
+				   array_info->invert_mask, array_size);
+
+		if (bitmap_full(array_info->get_mask, array_size))
+			return 0;
+
+		i = find_first_zero_bit(array_info->get_mask, array_size);
+	} else {
+		array_info = NULL;
+	}
 
 	while (i < array_size) {
 		struct gpio_chip *chip = desc_array[i]->gdev->chip;
@@ -2818,7 +2847,12 @@ int gpiod_get_array_value_complex(bool raw, bool can_sleep,
 			int hwgpio = gpio_chip_hwgpio(desc);
 
 			__set_bit(hwgpio, mask);
-			i++;
+
+			if (array_info)
+				i = find_next_zero_bit(array_info->get_mask,
+						       array_size, i + 1);
+			else
+				i++;
 		} while ((i < array_size) &&
 			 (desc_array[i]->gdev->chip == chip));
 
@@ -2829,7 +2863,7 @@ int gpiod_get_array_value_complex(bool raw, bool can_sleep,
 			return ret;
 		}
 
-		for (j = first; j < i; j++) {
+		for (j = first; j < i; ) {
 			const struct gpio_desc *desc = desc_array[j];
 			int hwgpio = gpio_chip_hwgpio(desc);
 			int value = test_bit(hwgpio, bits);
@@ -2838,6 +2872,11 @@ int gpiod_get_array_value_complex(bool raw, bool can_sleep,
 				value = !value;
 			__assign_bit(j, value_bitmap, value);
 			trace_gpio_value(desc_to_gpio(desc), 1, value);
+
+			if (array_info)
+				j = find_next_zero_bit(array_info->get_mask, i, j + 1);
+			else
+				j++;
 		}
 
 		if (mask != fastpath)
@@ -3039,6 +3078,32 @@ int gpiod_set_array_value_complex(bool raw, bool can_sleep,
 {
 	int i = 0;
 
+	/*
+	 * Validate array_info against desc_array and its size.
+	 * It should immediately follow desc_array if both
+	 * have been obtained from the same gpiod_get_array() call.
+	 */
+	if (array_info && array_info->desc == desc_array &&
+	    array_size <= array_info->size &&
+	    (void *)array_info == desc_array + array_info->size) {
+		if (!can_sleep)
+			WARN_ON(array_info->chip->can_sleep);
+
+		if (!raw && !bitmap_empty(array_info->invert_mask, array_size))
+			bitmap_xor(value_bitmap, value_bitmap,
+				   array_info->invert_mask, array_size);
+
+		gpio_chip_set_multiple(array_info->chip, array_info->set_mask,
+				       value_bitmap);
+
+		if (bitmap_full(array_info->set_mask, array_size))
+			return 0;
+
+		i = find_first_zero_bit(array_info->set_mask, array_size);
+	} else {
+		array_info = NULL;
+	}
+
 	while (i < array_size) {
 		struct gpio_chip *chip = desc_array[i]->gdev->chip;
 		unsigned long fastpath[2 * BITS_TO_LONGS(FASTPATH_NGPIO)];
@@ -3066,7 +3131,14 @@ int gpiod_set_array_value_complex(bool raw, bool can_sleep,
 			int hwgpio = gpio_chip_hwgpio(desc);
 			int value = test_bit(i, value_bitmap);
 
-			if (!raw && test_bit(FLAG_ACTIVE_LOW, &desc->flags))
+			/*
+			 * Pins applicable for fast input but not for
+			 * fast output processing may have been already
+			 * inverted inside the fast path, skip them.
+			 */
+			if (!raw && !(array_info &&
+			    test_bit(i, array_info->invert_mask)) &&
+			    test_bit(FLAG_ACTIVE_LOW, &desc->flags))
 				value = !value;
 			trace_gpio_value(desc_to_gpio(desc), 0, value);
 			/*
@@ -3085,7 +3157,12 @@ int gpiod_set_array_value_complex(bool raw, bool can_sleep,
 					__clear_bit(hwgpio, bits);
 				count++;
 			}
-			i++;
+
+			if (array_info)
+				i = find_next_zero_bit(array_info->set_mask,
+						       array_size, i + 1);
+			else
+				i++;
 		} while ((i < array_size) &&
 			 (desc_array[i]->gdev->chip == chip));
 		/* push collected bits to outputs */
-- 
2.16.4


* [PATCH v6 2/4] gpiolib: Identify arrays matching GPIO hardware
From: Janusz Krzysztofik @ 2018-08-31 22:56 UTC
  To: Linus Walleij
  Cc: Andrew Lunn, Ulf Hansson, linux-doc, linux-iio, Dominik Brodowski,
	Peter Rosin, netdev, linux-i2c, Peter Meerwald-Stadler, devel,
	Florian Fainelli, Jonathan Corbet, Janusz Krzysztofik,
	Kishon Vijay Abraham I, Geert Uytterhoeven, linux-serial,
	Jiri Slaby, Michael Hennerich, linux-gpio, Lars-Peter Clausen,
	Greg Kroah-Hartman, linux-mmc, linux-kernel, Willy Tarreau,
	Miguel Ojeda Sandonis
In-Reply-To: <20180831225616.29221-1-jmkrzyszt@gmail.com>

Certain GPIO array lookup results may map directly to GPIO pins of a
single GPIO chip in hardware order.  If that condition is recognized
and handled efficiently, significant performance gain of get/set array
functions may be possible.

While processing a request for an array of GPIO descriptors, identify
those which represent corresponding pins of a single GPIO chip.  Skip
over pins which require open source or open drain special processing.
Moreover, identify pins which require inversion.  Pass a pointer to
that information with the array to the caller so it can benefit from
enhanced performance as soon as get/set array functions can accept and
make efficient use of it.
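
The classification described above can be sketched in plain userspace C.
This is only an illustration under stated assumptions: struct line,
struct masks and classify() are hypothetical names, not gpiolib types,
chip ids are assumed non-negative, and each mask is assumed to fit in
one unsigned long.  The first member whose array index equals its
hardware pin number selects the candidate chip; every later member on
that chip and in hardware order is marked in get_mask, open drain or
open source lines are left out of set_mask, and active low lines are
recorded in invert_mask:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one requested line; not a gpiolib type. */
struct line {
	int chip_id;		/* which chip the line belongs to (>= 0) */
	unsigned int hwnum;	/* hardware pin number within that chip */
	bool open_drain_or_source;
	bool active_low;
};

struct masks {
	int chip_id;		/* chip selected for the fast path, -1 if none */
	unsigned long get_mask;
	unsigned long set_mask;
	unsigned long invert_mask;
};

static struct masks classify(const struct line *lines, size_t count)
{
	struct masks m = { .chip_id = -1 };
	size_t i;

	assert(count <= 8 * sizeof(unsigned long));

	for (i = 0; i < count; i++) {
		const struct line *l = &lines[i];

		/* The first member with index == pin number picks the chip. */
		if (m.chip_id < 0 && l->hwnum == i)
			m.chip_id = l->chip_id;

		if (m.chip_id < 0 || l->chip_id != m.chip_id || l->hwnum != i)
			continue;	/* stays on the slow path */

		m.get_mask |= 1UL << i;
		if (!l->open_drain_or_source)
			m.set_mask |= 1UL << i;
		if (l->active_low)
			m.invert_mask |= 1UL << i;
	}
	return m;
}
```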

Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com>
---
 Documentation/driver-api/gpio/consumer.rst |  4 +-
 drivers/gpio/gpiolib.c                     | 72 +++++++++++++++++++++++++++++-
 drivers/gpio/gpiolib.h                     |  9 ++++
 include/linux/gpio/consumer.h              |  9 ++++
 4 files changed, 92 insertions(+), 2 deletions(-)

diff --git a/Documentation/driver-api/gpio/consumer.rst b/Documentation/driver-api/gpio/consumer.rst
index ed68042ddccf..7e0298b9a7b9 100644
--- a/Documentation/driver-api/gpio/consumer.rst
+++ b/Documentation/driver-api/gpio/consumer.rst
@@ -109,9 +109,11 @@ For a function using multiple GPIOs all of those can be obtained with one call::
 					   enum gpiod_flags flags)
 
 This function returns a struct gpio_descs which contains an array of
-descriptors::
+descriptors.  It also contains a pointer to a gpiolib private structure which,
+if passed back to get/set array functions, may speed up I/O processing::
 
 	struct gpio_descs {
+		struct gpio_array *info;
 		unsigned int ndescs;
 		struct gpio_desc *desc[];
 	}
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index f0e9ffa8cab6..c1ed1c759345 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -4174,7 +4174,9 @@ struct gpio_descs *__must_check gpiod_get_array(struct device *dev,
 {
 	struct gpio_desc *desc;
 	struct gpio_descs *descs;
-	int count;
+	struct gpio_array *array_info = NULL;
+	struct gpio_chip *chip;
+	int count, bitmap_size;
 
 	count = gpiod_count(dev, con_id);
 	if (count < 0)
@@ -4190,9 +4192,77 @@ struct gpio_descs *__must_check gpiod_get_array(struct device *dev,
 			gpiod_put_array(descs);
 			return ERR_CAST(desc);
 		}
+
 		descs->desc[descs->ndescs] = desc;
+
+		chip = gpiod_to_chip(desc);
+		/*
+		 * Select a chip of first array member
+		 * whose index matches its pin hardware number
+		 * as a candidate for fast bitmap processing.
+		 */
+		if (!array_info && gpio_chip_hwgpio(desc) == descs->ndescs) {
+			struct gpio_descs *array;
+
+			bitmap_size = BITS_TO_LONGS(chip->ngpio > count ?
+						    chip->ngpio : count);
+
+			array = kzalloc(struct_size(descs, desc, count) +
+					struct_size(array_info, invert_mask,
+					3 * bitmap_size), GFP_KERNEL);
+			if (!array) {
+				gpiod_put_array(descs);
+				return ERR_PTR(-ENOMEM);
+			}
+
+			memcpy(array, descs,
+			       struct_size(descs, desc, descs->ndescs + 1));
+			kfree(descs);
+
+			descs = array;
+			array_info = (void *)(descs->desc + count);
+			array_info->get_mask = array_info->invert_mask +
+						  bitmap_size;
+			array_info->set_mask = array_info->get_mask +
+						  bitmap_size;
+
+			array_info->desc = descs->desc;
+			array_info->size = count;
+			array_info->chip = chip;
+			bitmap_set(array_info->get_mask, descs->ndescs,
+				   count - descs->ndescs);
+			bitmap_set(array_info->set_mask, descs->ndescs,
+				   count - descs->ndescs);
+			descs->info = array_info;
+		}
+		/*
+		 * Unmark members which don't qualify for fast bitmap
+		 * processing (different chip, not in hardware order)
+		 */
+		if (array_info && (chip != array_info->chip ||
+		    gpio_chip_hwgpio(desc) != descs->ndescs)) {
+			__clear_bit(descs->ndescs, array_info->get_mask);
+			__clear_bit(descs->ndescs, array_info->set_mask);
+		} else if (array_info) {
+			/* Exclude open drain or open source from fast output */
+			if (gpiochip_line_is_open_drain(chip, descs->ndescs) ||
+			    gpiochip_line_is_open_source(chip, descs->ndescs))
+				__clear_bit(descs->ndescs,
+					    array_info->set_mask);
+			/* Identify 'fast' pins which require inversion */
+			if (gpiod_is_active_low(desc))
+				__set_bit(descs->ndescs,
+					  array_info->invert_mask);
+		}
+
 		descs->ndescs++;
 	}
+	if (array_info)
+		dev_dbg(dev,
+			"GPIO array info: chip=%s, size=%d, get_mask=%lx, set_mask=%lx, invert_mask=%lx\n",
+			array_info->chip->label, array_info->size,
+			*array_info->get_mask, *array_info->set_mask,
+			*array_info->invert_mask);
 	return descs;
 }
 EXPORT_SYMBOL_GPL(gpiod_get_array);
diff --git a/drivers/gpio/gpiolib.h b/drivers/gpio/gpiolib.h
index 11e83d2eef89..b60905d558b1 100644
--- a/drivers/gpio/gpiolib.h
+++ b/drivers/gpio/gpiolib.h
@@ -183,6 +183,15 @@ static inline bool acpi_can_fallback_to_crs(struct acpi_device *adev,
 }
 #endif
 
+struct gpio_array {
+	struct gpio_desc	**desc;
+	unsigned int		size;
+	struct gpio_chip	*chip;
+	unsigned long		*get_mask;
+	unsigned long		*set_mask;
+	unsigned long		invert_mask[];
+};
+
 struct gpio_desc *gpiochip_get_desc(struct gpio_chip *chip, u16 hwnum);
 int gpiod_get_array_value_complex(bool raw, bool can_sleep,
 				  unsigned int array_size,
diff --git a/include/linux/gpio/consumer.h b/include/linux/gpio/consumer.h
index 1b21dc7b0fad..8dede3e886af 100644
--- a/include/linux/gpio/consumer.h
+++ b/include/linux/gpio/consumer.h
@@ -17,11 +17,20 @@ struct device;
  */
 struct gpio_desc;
 
+/**
+ * Opaque descriptor for a structure of GPIO array attributes.  This structure
+ * is attached to struct gpio_descs obtained from gpiod_get_array() and can be
+ * passed back to get/set array functions in order to activate fast processing
+ * path if applicable.
+ */
+struct gpio_array;
+
 /**
  * Struct containing an array of descriptors that can be obtained using
  * gpiod_get_array().
  */
 struct gpio_descs {
+	struct gpio_array *info;
 	unsigned int ndescs;
 	struct gpio_desc *desc[];
 };
-- 
2.16.4


* [PATCH v6 0/4] gpiolib: speed up GPIO array processing
From: Janusz Krzysztofik @ 2018-08-31 22:56 UTC
  To: Linus Walleij
  Cc: Jonathan Corbet, Miguel Ojeda Sandonis, Peter Korsgaard,
	Peter Rosin, Ulf Hansson, Andrew Lunn, Florian Fainelli,
	David S. Miller, Dominik Brodowski, Greg Kroah-Hartman,
	Kishon Vijay Abraham I, Lars-Peter Clausen, Michael Hennerich,
	Jonathan Cameron, Hartmut Knaack, Peter Meerwald-Stadler,
	Jiri Slaby, Willy Tarreau, Geert Uytterhoeven
In-Reply-To: <20180829204900.19390-1-jmkrzyszt@gmail.com>


The goal is to boost performance of get/set array functions while
processing GPIO arrays which represent pins of a single chip in
hardware order.  If resulting performance is close to PIO, GPIO API
can be used for data I/O without much loss of speed.

Created and tested on a low end Amstrad Delta board with NAND driver
updated to use GPIO API for data I/O.  Performance degradation compared
to PIO is much smaller than before the optimization but still not quite
satisfactory.

Janusz Krzysztofik (4):
      gpiolib: Pass bitmaps, not integer arrays, to get/set array
      gpiolib: Identify arrays matching GPIO hardware
      gpiolib: Pass array info to get/set array functions
      gpiolib: Implement fast processing path in get/set array

Changelog:
v6:
[PATCH v6 1/4] gpiolib: Pass bitmaps, not integer arrays, to get/set
- use DECLARE_BITMAP() macro for declaring value_bitmap - great idea by
  David Laight, thanks!
drivers/auxdisplay/hd44780.c:
- simplify the code and adjust comments as recommended by Geert
  Uytterhoeven - thanks!,
drivers/i2c/muxes/i2c-mux-gpio.c:
- drop .values member of struct gpiomux - details provided by Peter
  Rosin, thanks!,
drivers/mux/gpio.c:
- drop .val member of struct mux_gpio - details provided by Peter
  Rosin, thanks!,
drivers/net/phy/mdio-mux-gpio.c:
- drop .values member of struct mdio_mux_gpio_state and its processing.

v5:
[PATCH v5 1/4] gpiolib: Pass bitmaps, not integer arrays, to get/set
- drivers/i2c/muxes/i2c-mux-gpio.c:
  - drop assignment of values to struct gpiomux.values, as recommended
    by Peter Rosin - thanks!,
  - mark the .values member of the structure as obsolete,
- drivers/mux/gpio.c:
  - drop assignment of values to struct mux_gpio.val, also recommended
    by Peter Rosin - thanks!,
  - mark the .val member of the structure as obsolete,
- drivers/auxdisplay/hd44780.c:
  - fix incorrect bitmap size,
  - use >>= operator to simplify notation,
  both caught by Miguel Ojeda - thanks!,
- add Cc: clauses as well as Acked-by: collected so far.
[PATCH v5 2/4] gpiolib: Identify arrays matching GPIO hardware
- add Cc: clause.
[PATCH v5 3/4] gpiolib: Pass array info to get/set array functions
- add Cc: clauses as well as Acked-by: collected so far.
[PATCH v5 4/4] gpiolib: Implement fast processing path in get/set
- add Cc: clause.

v4:
That series was a follow-up to the former "mtd: rawnand: ams-delta: Use
gpio-omap accessors for data I/O" series, which already contained some
changes to gpiolib.  Those previous attempts were commented on by Boris
Brezillon, who suggested using GPIO API modified to accept bitmaps, and
by Linus Walleij, who suggested still more great ideas for further
improvement of the proposed API changes - thanks!

diffstat:
 Documentation/driver-api/gpio/board.rst     |   15 +
 Documentation/driver-api/gpio/consumer.rst  |   48 +++-
 drivers/auxdisplay/hd44780.c                |   74 +++----
 drivers/bus/ts-nbus.c                       |   27 +-
 drivers/gpio/gpio-max3191x.c                |   23 +-
 drivers/gpio/gpiolib.c                      |  279 ++++++++++++++++++++++------
 drivers/gpio/gpiolib.h                      |   15 +
 drivers/i2c/muxes/i2c-mux-gpio.c            |   16 -
 drivers/mmc/core/pwrseq_simple.c            |   15 -
 drivers/mux/gpio.c                          |   18 -
 drivers/net/phy/mdio-mux-gpio.c             |   13 -
 drivers/pcmcia/soc_common.c                 |   14 -
 drivers/phy/motorola/phy-mapphone-mdm6600.c |   21 +-
 drivers/staging/iio/adc/ad7606.c            |   12 -
 drivers/tty/serial/serial_mctrl_gpio.c      |    9 
 include/linux/gpio/consumer.h               |   35 ++-
 16 files changed, 417 insertions(+), 217 deletions(-)


* Re: [bpf-next PATCH 3/3] xdp: split code for map vs non-map redirect
From: Daniel Borkmann @ 2018-08-31 18:37 UTC
  To: Jesper Dangaard Brouer, netdev; +Cc: Alexei Starovoitov
In-Reply-To: <153572917749.27338.853717648840898807.stgit@firesoul>

On 08/31/2018 05:26 PM, Jesper Dangaard Brouer wrote:
> The compiler does an efficient job of inlining static C functions.
> Perf top clearly shows that almost everything gets inlined into the
> function call xdp_do_redirect.
> 
> The function xdp_do_redirect ends up containing and interleaving the
> map and non-map redirect code.  This is sub-optimal, as it would be
> strange for an XDP program to use both types of redirect in the same
> program. The two use-cases are separate, and interleaving the code
> just causes more instruction-cache pressure.
> 
> I would like to stress (again) that the non-map variant bpf_redirect
> is very slow compared to the bpf_redirect_map variant, approx half the
> speed.  Measured with driver i40e the difference is:
> 
> - map     redirect: 13,250,350 pps
> - non-map redirect:  7,491,425 pps
> 
> For this reason, the non-map variant of redirect has been named
> xdp_do_redirect_slow.  This hopefully gives a hint, when using perf,
> that this is not the optimal XDP redirect operating mode.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  net/core/filter.c |   52 ++++++++++++++++++++++++++++++----------------------
>  1 file changed, 30 insertions(+), 22 deletions(-)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index ec1b4eb0d3d4..c4ad1b93167f 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3170,6 +3170,32 @@ static int __bpf_tx_xdp(struct net_device *dev,
>  	return 0;
>  }
>  
> +/* non-static to avoid inline by compiler */
> +int xdp_do_redirect_slow(struct net_device *dev, struct xdp_buff *xdp,

Nit: should be 'static noinline' in that case then.

> +			struct bpf_prog *xdp_prog, struct bpf_redirect_info *ri)
> +{
> +	struct net_device *fwd;
> +	u32 index = ri->ifindex;
> +	int err;
> +
> +	fwd = dev_get_by_index_rcu(dev_net(dev), index);
> +	ri->ifindex = 0;
> +	if (unlikely(!fwd)) {
> +		err = -EINVAL;
> +		goto err;
> +	}
> +
> +	err = __bpf_tx_xdp(fwd, NULL, xdp, 0);
> +	if (unlikely(err))
> +		goto err;
> +
> +	_trace_xdp_redirect(dev, xdp_prog, index);
> +	return 0;
> +err:
> +	_trace_xdp_redirect_err(dev, xdp_prog, index, err);
> +	return err;
> +}
> +
>  static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
>  			    struct bpf_map *map,
>  			    struct xdp_buff *xdp,
> @@ -3264,9 +3290,9 @@ void bpf_clear_redirect_map(struct bpf_map *map)
>  }
>  
>  static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
> -			       struct bpf_prog *xdp_prog, struct bpf_map *map)
> +			       struct bpf_prog *xdp_prog, struct bpf_map *map,
> +			       struct bpf_redirect_info *ri)
>  {
> -	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
>  	u32 index = ri->ifindex;
>  	void *fwd = NULL;
>  	int err;
> @@ -3299,29 +3325,11 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
>  {
>  	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
>  	struct bpf_map *map = READ_ONCE(ri->map);
> -	struct net_device *fwd;
> -	u32 index = ri->ifindex;
> -	int err;
>  
>  	if (likely(map))
> -		return xdp_do_redirect_map(dev, xdp, xdp_prog, map);
> +		return xdp_do_redirect_map(dev, xdp, xdp_prog, map, ri);
>  
> -	fwd = dev_get_by_index_rcu(dev_net(dev), index);
> -	ri->ifindex = 0;
> -	if (unlikely(!fwd)) {
> -		err = -EINVAL;
> -		goto err;
> -	}
> -
> -	err = __bpf_tx_xdp(fwd, NULL, xdp, 0);
> -	if (unlikely(err))
> -		goto err;
> -
> -	_trace_xdp_redirect(dev, xdp_prog, index);
> -	return 0;
> -err:
> -	_trace_xdp_redirect_err(dev, xdp_prog, index, err);
> -	return err;
> +	return xdp_do_redirect_slow(dev, xdp, xdp_prog, ri);
>  }
>  EXPORT_SYMBOL_GPL(xdp_do_redirect);
>  
> 
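
For reference on the 'static noinline' nit above, a minimal userspace
sketch of the pattern.  It is not the filter.c code: redirect() and
redirect_slow() are hypothetical names, and the noinline macro below is
a stand-in, assuming GCC or Clang, for the annotation the kernel already
provides.  The slow path stays file-local but is kept out of line, so it
still shows up as a separate symbol in perf:

```c
#include <assert.h>

/* Userspace stand-in for the kernel's noinline annotation (GCC/Clang). */
#define noinline __attribute__((__noinline__))

/* Rarely taken path: kept out of line, but still file-local. */
static noinline int redirect_slow(int ifindex)
{
	return ifindex > 0 ? 0 : -22;	/* -22 mirrors -EINVAL */
}

/* Hot path: map != 0 models the likely map-based redirect. */
static inline int redirect(int map, int ifindex)
{
	if (__builtin_expect(map != 0, 1))
		return 0;
	return redirect_slow(ifindex);
}
```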


* Cutting out
From: Jimmy Wilson @ 2018-08-31 13:07 UTC
  To: netdev

Hi,

Have you received my email from last week?

I would like to speak with the person who manage your photos for your
company?

We are here to provide you all kinds of imaging editing.

What we can provide you:
Cutting out for photos
Clipping path for photos
Masking for photos
Retouching for your photos
Retouching for the Beauty Model and Portraits.

We have 20 staffs in house and daily basis 1000 images can be processed.

We give testing for your photos.

Thanks,
Jimmy Wilson


* Re: [RFC PATCH v1 0/3] device property: Support MAC address in VPD
From: Brian Norris @ 2018-08-31 21:28 UTC
  To: Srinivas Kandagatla
  Cc: Rob Herring, Brian Norris, Florian Fainelli, Greg Kroah-Hartman,
	Rafael J. Wysocki, Andrew Lunn, Dmitry Torokhov, Guenter Roeck,
	netdev, devicetree, linux-kernel, Julius Werner, Stephen Boyd
In-Reply-To: <44c7fc7a-9c1c-e894-f27f-5320c061aafc@linaro.org>

Hi Srinivas,

On Fri, Aug 31, 2018 at 09:43:30AM +0100, Srinivas Kandagatla wrote:

> On 31/08/18 02:26, Brian Norris wrote:
> > > Seems to me that nvmem needs to be extended to allow providers to
> > > retrieve and interpret data. Not everything is at some fixed offset and
> > > size. Something like this is valid dts:
> > > 
> > > nvmem = <&phandle> "a-string";
> > > 
> 
> There has been some discussion on extending nvmem support to MTD and non-DT
> users in https://patchwork.kernel.org/cover/10562365/
> 
> One of the things which we discussed in this thread is adding compatible
> strings to cells mainly to
> 1> Differentiate between actual cells and partitions for MTD case.

I'm not particularly worried about the MTD case. As I mentioned earlier,
while VPD is stored on flash (and could be exposed as an MTD), coreboot
places these tables in memory, and we currently just read them from
there instead of from flash.

> 2> Allow provider specific bindings for each cell, in the VPD case key
> string & value length could be one of them.

I'm not sure we'd need to have a binding for value length -- VPD encodes
the length itself, and for many properties, the length is understood by
both sides anyway (2x6 bytes for a MAC address).

> This means that we would end up adding xlate callback support to the
> nvmem_config.

OK, but that's not in the current series, correct?

> AFAIU, From consumer side old bindings should still work!

I'm still trying to wrap my head around all the existing and proposed
behaviors of nvmem, but I see a few things lacking (IIUC):

(1) for the new "lookup" method, you would only support a single MAC
    address, identified by looking up "mac-address" -- this means
    you can't support two devices (e.g., we have systems with VPD
    entries for "ethernet_mac0" and "ethernet_mac1")
(2) the consumer API isn't very flexible -- it assumes that the data you
    read out of an NVMEM cell is directly usable as a MAC address; this
    fails for VPD, because VPD uses ASCII encoding. To resolve this,
    you'd need the consumer/provider relationship to know something
    about the data type -- to know that we should decode the ASCII
    values
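
To make point (2) concrete, a minimal userspace sketch of the decoding
step a consumer or provider would have to perform.  It assumes the VPD
value is exactly 12 ASCII hex digits with no separators (my reading of
"2x6 bytes" above); vpd_mac_decode() and hex_nibble() are hypothetical
helpers, not existing nvmem or VPD API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert one ASCII hex digit to its value, or -1 if it is not one. */
static int hex_nibble(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Decode a 12-character ASCII hex string into a 6-byte MAC address.
 * Returns 0 on success, -1 if the length or any character is invalid.
 */
static int vpd_mac_decode(const char *ascii, size_t len, uint8_t mac[6])
{
	size_t i;

	assert(ascii != NULL && mac != NULL);

	if (len != 12)
		return -1;

	for (i = 0; i < 6; i++) {
		int hi = hex_nibble(ascii[2 * i]);
		int lo = hex_nibble(ascii[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		mac[i] = (uint8_t)((hi << 4) | lo);
	}
	return 0;
}
```

A raw NVMEM cell read would hand the consumer the 12 ASCII bytes; only
after a step like this are they usable as a MAC address.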

> From non-dt or ACPI side these cells can be parsed by the provider driver
> and added to the nvmem_config.

I think that might work, except for the above problems. But perhaps I'm
misreading.

> Does this make sense? Or Did I miss anything obvious ?
> 
> 
> > > But that's pretty uncommon (I can't think of a binding that actually
> > > uses that). Perhaps the provider has an array of keys defined and the
> > > consumer just provides the index.
> > In the case of VPD, all keys are 0-terminated strings (there's also a
> > length field, but the key is still 0-terminated), so that scheme could
> > work. (I'm not sure an indexed provider is extremely relevant right now,
> > although it probably could be supported if I expand the of_nvmem
> > retrieval to support a generic of_xlate() override anyway.) The
> > information represented is almost the same as in my proposal, except that:

@Rob:
I forgot about problem (2) above -- NVMEM is not very expressive about
the *type* of information. My proposal makes it explicit that the
provider is presenting MAC addresses. To make a generic VPD NVMEM
provider, I'd need to do ASCII decoding on some fields but not on
others.

Brian

> > (a) now I have to give the VPD a phandle -- so far, I've avoided that,
> >      as it's just an auto-enumerated device underneath the
> >      /firmware/coreboot device (see drivers/firmware/google/vpd.c)
> > (b) this is no longer directly useful to ACPI systems -- I'm not
> >      actually sure how (if at all) nvmem provider/consumer is supposed to
> >      work there
> > 
> > But maybe this isn't really that useful to ACPI, and it's sufficient to
> > just have fwnode_get_mac_address() call of_get_nvmem_mac_address() when
> > we're using DT.
> > 
> > > Or we could do '<key>-nvmem = <&phandle>', but parsing that is a bit
> > > more complicated.
> > That doesn't seem to have much advantage to me.


* Re: [PATCH net-next] veth: report NEWLINK event when moving the peer device in a new namespace
From: Lorenzo Bianconi @ 2018-08-31 16:54 UTC
  To: David Ahern; +Cc: davem, netdev
In-Reply-To: <b3d8a5f8-4824-b450-2236-ea32aa52da83@gmail.com>

> On 8/31/18 10:19 AM, Lorenzo Bianconi wrote:
> >> On 8/31/18 5:43 AM, Lorenzo Bianconi wrote:
> >>> When moving a veth device to another namespace, userspace receives a
> >>> RTM_DELLINK message indicating the device has been removed from current
> >>> netns. However, the other peer does not receive a netlink event
> >>> containing new values for IFLA_LINK_NETNSID and IFLA_LINK veth
> >>> attributes.
> >>> Fix that behaviour sending to userspace a RTM_NEWLINK message in the peer
> >>> namespace to report new IFLA_LINK_NETNSID/IFLA_LINK values
> >>>
> >>
> >> A newlink message is generated in the new namespace. What information is
> >> missing from that message?
> >>
> > 
> > Hi David,
> > 
> > let's assume we have two veth paired devices (veth0 and veth1) on inet
> > namespace. When moving a veth1 to another namespace, userspace is notified
> > with RTM_DELLINK event on inet namespace to indicate that veth1 has been
> > moved to another namespace. However some userspace applications
> > (e.g. NetworkManager), listening for events on inet namespace, are interested
> > in veth1 ifindex in the new namespace. This patch sends a new RTM_NEWLINK event
> > in inet namespace to provide new values for IFLA_LINK_NETNSID/IFLA_LINK 
> 
> This is in init namespace
> $ ip li set veth2 netns foo
> 
> $ ip monitor
> Deleted 20: veth2@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc
> noop state DOWN group default
>     link/ether c6:d0:d6:c5:23:7d brd ff:ff:ff:ff:ff:ff new-netns foo
> new-ifindex 20
> 
> It shows the new namespace in the delete message.

Oops, I had not noticed that this info was already introduced in
commit 38e01b30563a ("dev: advertise the new ifindex when the netns
iface changes"). Thanks for the hint.

DaveM, please drop this patch.

Regards,
Lorenzo


* [PATCH bpf-next] adding selftest for bpf's (set|get)_sockopt for SAVE_SYN
From: Nikita V. Shirokov @ 2018-08-31 16:43 UTC
  To: Alexei Starovoitov, Daniel Borkmann; +Cc: netdev, Nikita V. Shirokov

adding selftest for feature, introduced in
commit 9452048c79404 ("bpf: add TCP_SAVE_SYN/TCP_SAVED_SYN options for
bpf_(set|get)sockopt")
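
For readers unfamiliar with the two options, the userspace equivalent of
what the BPF test exercises can be sketched roughly as below. This is only
an illustration and not part of the patch: it assumes an IPv4 loopback
connection (the selftest itself uses IPv6), the helper name
saved_syn_flag() is made up for this sketch, and error handling is reduced
to returning -1.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the SYN saved for an accepted
 * loopback connection has the SYN flag set, 0 if not, -1 on error.
 */
int saved_syn_flag(void)
{
	unsigned char hdr[128];
	socklen_t hlen = sizeof(hdr);
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	int lfd = -1, cfd = -1, afd = -1;
	int one = 1, ret = -1;
	size_t ihl;

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd < 0)
		goto out;

	/* Ask the kernel to keep a copy of each incoming SYN. */
	if (setsockopt(lfd, IPPROTO_TCP, TCP_SAVE_SYN, &one, sizeof(one)))
		goto out;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */
	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) ||
	    listen(lfd, 1) ||
	    getsockname(lfd, (struct sockaddr *)&addr, &alen))
		goto out;

	cfd = socket(AF_INET, SOCK_STREAM, 0);
	if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof(addr)))
		goto out;

	afd = accept(lfd, NULL, NULL);
	if (afd < 0)
		goto out;

	/* The saved data starts at the IP header of the SYN packet. */
	if (getsockopt(afd, IPPROTO_TCP, TCP_SAVED_SYN, hdr, &hlen))
		goto out;

	ihl = (size_t)(hdr[0] & 0x0f) * 4;	/* IPv4 header length */
	if (hlen < ihl + 14)
		goto out;

	/* Byte 13 of the TCP header holds the flags; SYN is 0x02. */
	ret = (hdr[ihl + 13] & 0x02) ? 1 : 0;
out:
	if (afd >= 0)
		close(afd);
	if (cfd >= 0)
		close(cfd);
	if (lfd >= 0)
		close(lfd);
	return ret;
}
```

The BPF program in the patch does the same two steps from the sockops
callbacks: TCP_SAVE_SYN on the listener, then TCP_SAVED_SYN on the
established socket, checking thdr->syn.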

Signed-off-by: Nikita V. Shirokov <tehnerd@fb.com>
---
 .../testing/selftests/bpf/test_tcpbpf_kern.c  | 38 +++++++++++++++++--
 .../testing/selftests/bpf/test_tcpbpf_user.c  | 31 ++++++++++++++-
 2 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_tcpbpf_kern.c b/tools/testing/selftests/bpf/test_tcpbpf_kern.c
index 4b7fd540cea9..74f73b33a7b0 100644
--- a/tools/testing/selftests/bpf/test_tcpbpf_kern.c
+++ b/tools/testing/selftests/bpf/test_tcpbpf_kern.c
@@ -5,6 +5,7 @@
 #include <linux/if_ether.h>
 #include <linux/if_packet.h>
 #include <linux/ip.h>
+#include <linux/ipv6.h>
 #include <linux/types.h>
 #include <linux/socket.h>
 #include <linux/tcp.h>
@@ -17,6 +18,13 @@ struct bpf_map_def SEC("maps") global_map = {
 	.type = BPF_MAP_TYPE_ARRAY,
 	.key_size = sizeof(__u32),
 	.value_size = sizeof(struct tcpbpf_globals),
+	.max_entries = 4,
+};
+
+struct bpf_map_def SEC("maps") sockopt_results = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(int),
 	.max_entries = 2,
 };
 
@@ -45,11 +53,14 @@ int _version SEC("version") = 1;
 SEC("sockops")
 int bpf_testcb(struct bpf_sock_ops *skops)
 {
-	int rv = -1;
-	int bad_call_rv = 0;
+	char header[sizeof(struct ipv6hdr) + sizeof(struct tcphdr)];
+	struct tcphdr *thdr;
 	int good_call_rv = 0;
-	int op;
+	int bad_call_rv = 0;
+	int save_syn = 1;
+	int rv = -1;
 	int v = 0;
+	int op;
 
 	op = (int) skops->op;
 
@@ -82,6 +93,21 @@ int bpf_testcb(struct bpf_sock_ops *skops)
 		v = 0xff;
 		rv = bpf_setsockopt(skops, SOL_IPV6, IPV6_TCLASS, &v,
 				    sizeof(v));
+		if (skops->family == AF_INET6) {
+			v = bpf_getsockopt(skops, IPPROTO_TCP, TCP_SAVED_SYN,
+					   header, (sizeof(struct ipv6hdr) +
+						    sizeof(struct tcphdr)));
+			if (!v) {
+				int offset = sizeof(struct ipv6hdr);
+
+				thdr = (struct tcphdr *)(header + offset);
+				v = thdr->syn;
+				__u32 key = 1;
+
+				bpf_map_update_elem(&sockopt_results, &key, &v,
+						    BPF_ANY);
+			}
+		}
 		break;
 	case BPF_SOCK_OPS_RTO_CB:
 		break;
@@ -111,6 +137,12 @@ int bpf_testcb(struct bpf_sock_ops *skops)
 		break;
 	case BPF_SOCK_OPS_TCP_LISTEN_CB:
 		bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_STATE_CB_FLAG);
+		v = bpf_setsockopt(skops, IPPROTO_TCP, TCP_SAVE_SYN,
+				   &save_syn, sizeof(save_syn));
+		/* Update global map w/ result of setsock opt */
+		__u32 key = 0;
+
+		bpf_map_update_elem(&sockopt_results, &key, &v, BPF_ANY);
 		break;
 	default:
 		rv = -1;
diff --git a/tools/testing/selftests/bpf/test_tcpbpf_user.c b/tools/testing/selftests/bpf/test_tcpbpf_user.c
index a275c2971376..e6eebda7d112 100644
--- a/tools/testing/selftests/bpf/test_tcpbpf_user.c
+++ b/tools/testing/selftests/bpf/test_tcpbpf_user.c
@@ -54,6 +54,26 @@ int verify_result(const struct tcpbpf_globals *result)
 	return -1;
 }
 
+int verify_sockopt_result(int sock_map_fd)
+{
+	__u32 key = 0;
+	int res;
+	int rv;
+
+	/* check setsockopt for SAVE_SYN */
+	rv = bpf_map_lookup_elem(sock_map_fd, &key, &res);
+	EXPECT_EQ(0, rv, "d");
+	EXPECT_EQ(0, res, "d");
+	key = 1;
+	/* check getsockopt for SAVED_SYN */
+	rv = bpf_map_lookup_elem(sock_map_fd, &key, &res);
+	EXPECT_EQ(0, rv, "d");
+	EXPECT_EQ(1, res, "d");
+	return 0;
+err:
+	return -1;
+}
+
 static int bpf_find_map(const char *test, struct bpf_object *obj,
 			const char *name)
 {
@@ -70,11 +90,11 @@ static int bpf_find_map(const char *test, struct bpf_object *obj,
 int main(int argc, char **argv)
 {
 	const char *file = "test_tcpbpf_kern.o";
+	int prog_fd, map_fd, sock_map_fd;
 	struct tcpbpf_globals g = {0};
 	const char *cg_path = "/foo";
 	int error = EXIT_FAILURE;
 	struct bpf_object *obj;
-	int prog_fd, map_fd;
 	int cg_fd = -1;
 	__u32 key = 0;
 	int rv;
@@ -110,6 +130,10 @@ int main(int argc, char **argv)
 	if (map_fd < 0)
 		goto err;
 
+	sock_map_fd = bpf_find_map(__func__, obj, "sockopt_results");
+	if (sock_map_fd < 0)
+		goto err;
+
 	rv = bpf_map_lookup_elem(map_fd, &key, &g);
 	if (rv != 0) {
 		printf("FAILED: bpf_map_lookup_elem returns %d\n", rv);
@@ -121,6 +145,11 @@ int main(int argc, char **argv)
 		goto err;
 	}
 
+	if (verify_sockopt_result(sock_map_fd)) {
+		printf("FAILED: Wrong sockopt stats\n");
+		goto err;
+	}
+
 	printf("PASSED!\n");
 	error = 0;
 err:
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH net-next] failover: remove set but not used variable 'primary_dev'
From: Samudrala, Sridhar @ 2018-08-31 16:39 UTC (permalink / raw)
  To: YueHaibing, David S. Miller, Stephen Hemminger, Dan Carpenter,
	Alexander Duyck, Jeff Kirsher, Liran Alon, Joao Martins
  Cc: netdev, kernel-janitors
In-Reply-To: <1535687218-23169-1-git-send-email-yuehaibing@huawei.com>

On 8/30/2018 8:46 PM, YueHaibing wrote:
> Fixes gcc '-Wunused-but-set-variable' warning:
>
> drivers/net/net_failover.c: In function 'net_failover_slave_unregister':
> drivers/net/net_failover.c:598:35: warning:
>   variable 'primary_dev' set but not used [-Wunused-but-set-variable]

Actually this gcc option found a bug.
We need to add this check after accessing primary_dev and standby_dev.

         if (slave_dev != primary_dev && slave_dev != standby_dev)
                 return -ENODEV;

Can you resubmit with the right fix?
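
To make the intent concrete, here is a minimal userspace sketch of the
guard, not the actual driver change. The two structs are stubs standing in
for the kernel structures, and the function name slave_unregister_check()
is hypothetical; only primary_dev, standby_dev and slave_dev come from the
code above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stub types standing in for the kernel structures (hypothetical). */
struct net_device {
	int ifindex;
};

struct net_failover_info {
	struct net_device *primary_dev;
	struct net_device *standby_dev;
};

/* Sketch of the guard: refuse to unregister a device that is neither
 * the primary nor the standby slave of this failover instance.
 */
int slave_unregister_check(const struct net_failover_info *nfo_info,
			   const struct net_device *slave_dev)
{
	const struct net_device *primary_dev;
	const struct net_device *standby_dev;

	assert(nfo_info != NULL);

	primary_dev = nfo_info->primary_dev;
	standby_dev = nfo_info->standby_dev;

	if (slave_dev != primary_dev && slave_dev != standby_dev)
		return -ENODEV;

	return 0;
}
```

With the check in place primary_dev is actually read, so the
-Wunused-but-set-variable warning goes away without dropping the variable.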


>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> ---
>   drivers/net/net_failover.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/net/net_failover.c b/drivers/net/net_failover.c
> index 7ae1856..e103c94e 100644
> --- a/drivers/net/net_failover.c
> +++ b/drivers/net/net_failover.c
> @@ -595,12 +595,11 @@ static int net_failover_slave_pre_unregister(struct net_device *slave_dev,
>   static int net_failover_slave_unregister(struct net_device *slave_dev,
>   					 struct net_device *failover_dev)
>   {
> -	struct net_device *standby_dev, *primary_dev;
> +	struct net_device *standby_dev;
>   	struct net_failover_info *nfo_info;
>   	bool slave_is_standby;
>   
>   	nfo_info = netdev_priv(failover_dev);
> -	primary_dev = rtnl_dereference(nfo_info->primary_dev);
>   	standby_dev = rtnl_dereference(nfo_info->standby_dev);
>   
>   	vlan_vids_del_by_dev(slave_dev, failover_dev);
>

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] netlink: ipv4 igmp join notifications
From: Roopa Prabhu @ 2018-08-31 16:29 UTC (permalink / raw)
  To: Patrick Ruddy; +Cc: netdev, Jiří Pírko, Stephen Hemminger
In-Reply-To: <20180831112024.30477-1-pruddy@vyatta.att-mail.com>

On Fri, Aug 31, 2018 at 4:20 AM, Patrick Ruddy
<pruddy@vyatta.att-mail.com> wrote:
> Some userspace applications need to know about IGMP joins from the kernel
> for 2 reasons
> 1. To allow the programming of multicast MAC filters in hardware
> 2. To form a multicast FORUS list for non link-local multicast
>    groups to be sent to the kernel and from there to the interested
>    party.
> (1) can be fulfilled by simply sending the hardware multicast MAC
> address to be programmed but (2) requires the L3 address to be sent
> since this cannot be constructed from the MAC address whereas the
> reverse translation is a standard library function.
>
> This commit provides addition and deletion of multicast addresses
> using the RTM_NEWADDR and RTM_DELADDR messages. It also provides
> the RTM_GETADDR extension to allow multicast join state to be read
> from the kernel.
>
> Signed-off-by: Patrick Ruddy <pruddy@vyatta.att-mail.com>
> ---
> v2: fix kbuild warnings.

I am still going through the series, but AFAICT, user-space caches listening to
RTNLGRP_IPV4_IFADDR will now also get multicast addresses by default?


>
>  include/linux/igmp.h |  4 ++
>  net/ipv4/devinet.c   | 39 +++++++++++++------
>  net/ipv4/igmp.c      | 90 ++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 122 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/igmp.h b/include/linux/igmp.h
> index 119f53941c12..644a548024ed 100644
> --- a/include/linux/igmp.h
> +++ b/include/linux/igmp.h
> @@ -19,6 +19,8 @@
>  #include <linux/timer.h>
>  #include <linux/in.h>
>  #include <linux/refcount.h>
> +#include <linux/netlink.h>
> +#include <linux/netdevice.h>
>  #include <uapi/linux/igmp.h>
>
>  static inline struct igmphdr *igmp_hdr(const struct sk_buff *skb)
> @@ -130,6 +132,8 @@ extern void ip_mc_unmap(struct in_device *);
>  extern void ip_mc_remap(struct in_device *);
>  extern void ip_mc_dec_group(struct in_device *in_dev, __be32 addr);
>  extern void ip_mc_inc_group(struct in_device *in_dev, __be32 addr);
> +extern int ip_mc_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb,
> +                            struct net_device *dev);
>  int ip_mc_check_igmp(struct sk_buff *skb, struct sk_buff **skb_trimmed);
>
>  #endif
> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
> index ea4bd8a52422..42f7dcc4fb5e 100644
> --- a/net/ipv4/devinet.c
> +++ b/net/ipv4/devinet.c
> @@ -57,6 +57,7 @@
>  #endif
>  #include <linux/kmod.h>
>  #include <linux/netconf.h>
> +#include <linux/igmp.h>
>
>  #include <net/arp.h>
>  #include <net/ip.h>
> @@ -1651,6 +1652,7 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
>         int h, s_h;
>         int idx, s_idx;
>         int ip_idx, s_ip_idx;
> +       int multicast, mcast_idx;
>         struct net_device *dev;
>         struct in_device *in_dev;
>         struct in_ifaddr *ifa;
> @@ -1659,6 +1661,8 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
>         s_h = cb->args[0];
>         s_idx = idx = cb->args[1];
>         s_ip_idx = ip_idx = cb->args[2];
> +       multicast = cb->args[3];
> +       mcast_idx = cb->args[4];
>
>         for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
>                 idx = 0;
> @@ -1675,18 +1679,29 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
>                         if (!in_dev)
>                                 goto cont;
>
> -                       for (ifa = in_dev->ifa_list, ip_idx = 0; ifa;
> -                            ifa = ifa->ifa_next, ip_idx++) {
> -                               if (ip_idx < s_ip_idx)
> -                                       continue;
> -                               if (inet_fill_ifaddr(skb, ifa,
> -                                            NETLINK_CB(cb->skb).portid,
> -                                            cb->nlh->nlmsg_seq,
> -                                            RTM_NEWADDR, NLM_F_MULTI) < 0) {
> -                                       rcu_read_unlock();
> -                                       goto done;
> +                       if (!multicast) {
> +                               for (ifa = in_dev->ifa_list, ip_idx = 0; ifa;
> +                                    ifa = ifa->ifa_next, ip_idx++) {
> +                                       if (ip_idx < s_ip_idx)
> +                                               continue;
> +                                       if (inet_fill_ifaddr(skb, ifa,
> +                                                            NETLINK_CB(cb->skb).portid,
> +                                                            cb->nlh->nlmsg_seq,
> +                                                            RTM_NEWADDR,
> +                                                            NLM_F_MULTI) < 0) {
> +                                               rcu_read_unlock();
> +                                               goto done;
> +                                       }
> +                                       nl_dump_check_consistent(cb,
> +                                                                nlmsg_hdr(skb));
>                                 }
> -                               nl_dump_check_consistent(cb, nlmsg_hdr(skb));
> +                               /* set for multicast loop */
> +                               multicast++;
> +                       }
> +                       /* loop over multicast addresses */
> +                       if (ip_mc_dump_ifaddr(skb, cb, dev) < 0) {
> +                               rcu_read_unlock();
> +                               goto done;
>                         }
>  cont:
>                         idx++;
> @@ -1698,6 +1713,8 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
>         cb->args[0] = h;
>         cb->args[1] = idx;
>         cb->args[2] = ip_idx;
> +       cb->args[3] = multicast;
> +       cb->args[4] = mcast_idx;
>
>         return skb->len;
>  }
> diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
> index cf75f8944b05..c9bbd1d27124 100644
> --- a/net/ipv4/igmp.c
> +++ b/net/ipv4/igmp.c
> @@ -86,6 +86,7 @@
>  #include <linux/inetdevice.h>
>  #include <linux/igmp.h>
>  #include <linux/if_arp.h>
> +#include <net/netlink.h>
>  #include <linux/rtnetlink.h>
>  #include <linux/times.h>
>  #include <linux/pkt_sched.h>
> @@ -1384,6 +1385,91 @@ static void ip_mc_hash_remove(struct in_device *in_dev,
>  }
>
>
> +static int fill_addr(struct sk_buff *skb, struct net_device *dev, __be32 addr,
> +                    int type, unsigned int flags)
> +{
> +       struct nlmsghdr *nlh;
> +       struct ifaddrmsg *ifm;
> +
> +       nlh = nlmsg_put(skb, 0, 0, type, sizeof(*ifm), flags);
> +       if (!nlh)
> +               return -EMSGSIZE;
> +
> +       ifm = nlmsg_data(nlh);
> +       ifm->ifa_family = AF_INET;
> +       ifm->ifa_prefixlen = 32;
> +       ifm->ifa_flags = IFA_F_PERMANENT;
> +       ifm->ifa_scope = RT_SCOPE_LINK;
> +       ifm->ifa_index = dev->ifindex;
> +
> +       if (nla_put_in_addr(skb, IFA_ADDRESS, addr))
> +               goto nla_put_failure;
> +       nlmsg_end(skb, nlh);
> +       return 0;
> +
> +nla_put_failure:
> +       nlmsg_cancel(skb, nlh);
> +       return -EMSGSIZE;
> +}
> +
> +static inline size_t addr_nlmsg_size(void)
> +{
> +       return NLMSG_ALIGN(sizeof(struct ifaddrmsg))
> +               + nla_total_size(sizeof(__be32));
> +}
> +
> +static void ip_mc_addr_notify(struct net_device *dev, __be32 addr, int type)
> +{
> +       struct net *net = dev_net(dev);
> +       struct sk_buff *skb;
> +       int err = -ENOBUFS;
> +
> +       skb = nlmsg_new(addr_nlmsg_size(), GFP_ATOMIC);
> +       if (!skb)
> +               goto errout;
> +
> +       err = fill_addr(skb, dev, addr, type, 0);
> +       if (err < 0) {
> +               WARN_ON(err == -EMSGSIZE);
> +               kfree_skb(skb);
> +               goto errout;
> +       }
> +       rtnl_notify(skb, net, 0, RTNLGRP_IPV4_IFADDR, NULL, GFP_ATOMIC);
> +       return;
> +errout:
> +       if (err < 0)
> +               rtnl_set_sk_err(net, RTNLGRP_LINK, err);


s/RTNLGRP_LINK/RTNLGRP_IPV4_IFADDR/




> +}
> +
> +int ip_mc_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb,
> +                     struct net_device *dev)
> +{
> +       int s_idx;
> +       int idx = 0;
> +       struct ip_mc_list *im;
> +       struct in_device *in_dev;
> +
> +       ASSERT_RTNL();
> +
> +       s_idx = cb->args[4];
> +       in_dev = __in_dev_get_rtnl(dev);
> +
> +       for_each_pmc_rtnl(in_dev, im) {
> +               if (idx < s_idx)
> +                       continue;
> +               if (fill_addr(skb, dev, im->multiaddr, RTM_NEWADDR,
> +                             NLM_F_MULTI) < 0)
> +                       goto done;
> +               nl_dump_check_consistent(cb, nlmsg_hdr(skb));
> +               idx++;
> +       }
> +
> + done:
> +       cb->args[4] = idx;
> +
> +       return skb->len;
> +}
> +
>  /*
>   *     A socket has joined a multicast group on device dev.
>   */
> @@ -1433,6 +1519,8 @@ static void __ip_mc_inc_group(struct in_device *in_dev, __be32 addr,
>         igmpv3_del_delrec(in_dev, im);
>  #endif
>         igmp_group_added(im);
> +
> +       ip_mc_addr_notify(in_dev->dev, addr, RTM_NEWADDR);
>         if (!in_dev->dead)
>                 ip_rt_multicast_event(in_dev);
>  out:
> @@ -1664,6 +1752,8 @@ void ip_mc_dec_group(struct in_device *in_dev, __be32 addr)
>                                 in_dev->mc_count--;
>                                 igmp_group_dropped(i);
>                                 ip_mc_clear_src(i);
> +                               ip_mc_addr_notify(in_dev->dev, addr,
> +                                                 RTM_DELADDR);
>
>                                 if (!in_dev->dead)
>                                         ip_rt_multicast_event(in_dev);
> --
> 2.17.1
>

^ permalink raw reply

* [PATCH net-next] net: nixge: Fix Kconfig warning with OF_MDIO
From: Moritz Fischer @ 2018-08-31 20:30 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, andrew, moritz.fischer, linux-next,
	Moritz Fischer

Fix Kconfig warning with OF_MDIO where OF_MDIO was
selected unconditionally instead of only when
OF is actually enabled.

Fixes: 7e8d5755be0e ("net: nixge: Add support for 64-bit platforms")
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
---
 drivers/net/ethernet/ni/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ni/Kconfig b/drivers/net/ethernet/ni/Kconfig
index 04e315704f71..c73978474c4b 100644
--- a/drivers/net/ethernet/ni/Kconfig
+++ b/drivers/net/ethernet/ni/Kconfig
@@ -20,7 +20,7 @@ config NI_XGE_MANAGEMENT_ENET
 	tristate "National Instruments XGE management enet support"
 	depends on HAS_IOMEM && HAS_DMA
 	select PHYLIB
-	select OF_MDIO
+	select OF_MDIO if OF
 	help
 	  Simple LAN device for debug or management purposes. Can
 	  support either 10G or 1G PHYs via SFP+ ports.
-- 
2.18.0

^ permalink raw reply related

* Re: [PATCH v2 02/29] Documentation: nvmem: document lookup entries
From: Brian Norris @ 2018-08-31 20:30 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Jonathan Corbet, Sekhar Nori, Kevin Hilman, Russell King,
	Arnd Bergmann, Greg Kroah-Hartman, David Woodhouse,
	Boris Brezillon, Marek Vasut, Richard Weinberger,
	Grygorii Strashko, David S . Miller, Srinivas Kandagatla, Naren,
	Mauro Carvalho Chehab, Andrew Morton, Lukas Wunner, Dan Carpenter,
	Florian Fainelli
In-Reply-To: <20180810080526.27207-3-brgl@bgdev.pl>

On Fri, Aug 10, 2018 at 10:04:59AM +0200, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski <bgolaszewski@baylibre.com>
> 
> Describe the usage of nvmem cell lookup tables.
> 
> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
> ---
>  Documentation/nvmem/nvmem.txt | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/Documentation/nvmem/nvmem.txt b/Documentation/nvmem/nvmem.txt
> index 8d8d8f58f96f..9d5e3ca2b4f3 100644
> --- a/Documentation/nvmem/nvmem.txt
> +++ b/Documentation/nvmem/nvmem.txt
> @@ -58,6 +58,34 @@ static int qfprom_probe(struct platform_device *pdev)
>  It is mandatory that the NVMEM provider has a regmap associated with its
>  struct device. Failure to do would return error code from nvmem_register().
>  
> +Additionally it is possible to create nvmem cell lookup entries and register
> +them with the nvmem framework from machine code as shown in the example below:

Maybe this is partly a gap in the existing documentation, but what do
"name" and "nvmem_name" mean here? AFAICT, "nvmem_name" is akin to a
provider identifier, and "name" is a key to match with the consumer. It
feels like this should be explained in either the header / kerneldoc
or this file. Or maybe both.

Does this mean there can only be a single "mac-address" cell in the
system? I have systems where there are multiple MACs provided in flash
storage, and we need to map them to ethernet0 and ethernet1. Is that
supported here?

Brian

> +static struct nvmem_cell_lookup foobar_lookup = {
> +	.info = {
> +		.name = "mac-address",
> +		.offset = 0xd000,
> +		.bytes = ETH_ALEN,
> +	},
> +	.nvmem_name = "foobar",
> +};
> +
> +static void foobar_register(void)
> +{
> +	...
> +	nvmem_add_lookup_table(&foobar_lookup, 1);
> +	...
> +}
> +
> +A lookup entry table can be later removed if needed:
> +
> +static void foobar_fini(void)
> +{
> +	...
> +	nvmem_del_lookup_table(&foobar_lookup, 1);
> +	...
> +}
> +
>  NVMEM Consumers
>  +++++++++++++++
>  
> -- 
> 2.18.0
> 

^ permalink raw reply

* Re: [PATCH net-next] veth: report NEWLINK event when moving the peer device in a new namespace
From: David Ahern @ 2018-08-31 16:21 UTC (permalink / raw)
  To: Lorenzo Bianconi; +Cc: davem, netdev
In-Reply-To: <20180831161902.GD6236@localhost.localdomain>

On 8/31/18 10:19 AM, Lorenzo Bianconi wrote:
>> On 8/31/18 5:43 AM, Lorenzo Bianconi wrote:
>>> When moving a veth device to another namespace, userspace receives a
>>> RTM_DELLINK message indicating the device has been removed from current
>>> netns. However, the other peer does not receive a netlink event
>>> containing new values for IFLA_LINK_NETNSID and IFLA_LINK veth
>>> attributes.
>>> Fix that behaviour sending to userspace a RTM_NEWLINK message in the peer
>>> namespace to report new IFLA_LINK_NETNSID/IFLA_LINK values
>>>
>>
>> A newlink message is generated in the new namespace. What information is
>> missing from that message?
>>
> 
> Hi David,
> 
> let's assume we have two veth paired devices (veth0 and veth1) on inet
> namespace. When moving a veth1 to another namespace, userspace is notified
> with RTM_DELLINK event on inet namespace to indicate that veth1 has been
> moved to another namespace. However some userspace applications
> (e.g. NetworkManager), listening for events on inet namespace, are interested
> in veth1 ifindex in the new namespace. This patch sends a new RTM_NEWLINK event
> in inet namespace to provide new values for IFLA_LINK_NETNSID/IFLA_LINK 

This is in init namespace
$ ip li set veth2 netns foo

$ ip monitor
Deleted 20: veth2@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc
noop state DOWN group default
    link/ether c6:d0:d6:c5:23:7d brd ff:ff:ff:ff:ff:ff new-netns foo
new-ifindex 20

It shows the new namespace in the delete message.

^ permalink raw reply

* Re: [PATCH net-next] veth: report NEWLINK event when moving the peer device in a new namespace
From: Lorenzo Bianconi @ 2018-08-31 16:19 UTC (permalink / raw)
  To: David Ahern; +Cc: davem, netdev
In-Reply-To: <1321a4ad-75c5-358f-3a5d-1ec1549a9474@gmail.com>

> On 8/31/18 5:43 AM, Lorenzo Bianconi wrote:
> > When moving a veth device to another namespace, userspace receives a
> > RTM_DELLINK message indicating the device has been removed from current
> > netns. However, the other peer does not receive a netlink event
> > containing new values for IFLA_LINK_NETNSID and IFLA_LINK veth
> > attributes.
> > Fix that behaviour sending to userspace a RTM_NEWLINK message in the peer
> > namespace to report new IFLA_LINK_NETNSID/IFLA_LINK values
> > 
> 
> A newlink message is generated in the new namespace. What information is
> missing from that message?
> 

Hi David,

let's assume we have two veth paired devices (veth0 and veth1) on inet
namespace. When moving a veth1 to another namespace, userspace is notified
with RTM_DELLINK event on inet namespace to indicate that veth1 has been
moved to another namespace. However some userspace applications
(e.g. NetworkManager), listening for events on inet namespace, are interested
in veth1 ifindex in the new namespace. This patch sends a new RTM_NEWLINK event
in inet namespace to provide new values for IFLA_LINK_NETNSID/IFLA_LINK 

Regards,
Lorenzo

^ permalink raw reply

* Re: [RFC PATCH v2 bpf-next 0/2] verifier liveness simplification
From: Edward Cree @ 2018-08-31 15:50 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: ast, daniel, netdev
In-Reply-To: <20180830021837.smtakvn62ieastst@ast-mbp>

On 30/08/18 03:18, Alexei Starovoitov wrote:
> I think it's a better base to continue debugging.
> In particular:
> 1. we have instability issue in the verifier.
>  from time to time the verifier goes to process extra 7 instructions on one
>  of the cilium tests. This was happening before and after this set.
I can't reproduce this; I always get 36926.  Can you try recording the
 verifier log and diff the output between the two cases?
> 2. there is a nice improvement in number of processed insns with this set,
>  but the difference I cannot explain, hence it has to be debugged.
>  In theory the liveness rewrite shouldn't cause the difference in processed insns.
I shall attack this with the same methods I used for the other delta.
 Since that one turned out to be a real bug in the patch, I'm not so
 sanguine as to dismiss this one as probably connected to issue 1.

-Ed

^ permalink raw reply

* [bpf-next PATCH 3/3] xdp: split code for map vs non-map redirect
From: Jesper Dangaard Brouer @ 2018-08-31 15:26 UTC (permalink / raw)
  To: netdev; +Cc: Daniel Borkmann, Alexei Starovoitov, Jesper Dangaard Brouer
In-Reply-To: <153572913891.27338.14887146010547808356.stgit@firesoul>

The compiler does an efficient job of inlining static C functions.
Perf top clearly shows that almost everything gets inlined into the
function call xdp_do_redirect.

The function xdp_do_redirect ends up containing and interleaving the
map and non-map redirect code.  This is sub-optimal, as it would be
strange for an XDP program to use both types of redirect in the same
program. The two use-cases are separate, and interleaving the code
just causes more instruction-cache pressure.

I would like to stress (again) that the non-map variant bpf_redirect
is very slow compared to the bpf_redirect_map variant, approx half the
speed.  Measured with driver i40e the difference is:

- map     redirect: 13,250,350 pps
- non-map redirect:  7,491,425 pps

For this reason, the non-map variant of redirect has been named
xdp_do_redirect_slow.  This hopefully gives a hint, when using perf,
that this is not the optimal XDP redirect operating mode.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/filter.c |   52 ++++++++++++++++++++++++++++++----------------------
 1 file changed, 30 insertions(+), 22 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index ec1b4eb0d3d4..c4ad1b93167f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3170,6 +3170,32 @@ static int __bpf_tx_xdp(struct net_device *dev,
 	return 0;
 }
 
+/* non-static to avoid inline by compiler */
+int xdp_do_redirect_slow(struct net_device *dev, struct xdp_buff *xdp,
+			struct bpf_prog *xdp_prog, struct bpf_redirect_info *ri)
+{
+	struct net_device *fwd;
+	u32 index = ri->ifindex;
+	int err;
+
+	fwd = dev_get_by_index_rcu(dev_net(dev), index);
+	ri->ifindex = 0;
+	if (unlikely(!fwd)) {
+		err = -EINVAL;
+		goto err;
+	}
+
+	err = __bpf_tx_xdp(fwd, NULL, xdp, 0);
+	if (unlikely(err))
+		goto err;
+
+	_trace_xdp_redirect(dev, xdp_prog, index);
+	return 0;
+err:
+	_trace_xdp_redirect_err(dev, xdp_prog, index, err);
+	return err;
+}
+
 static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
 			    struct bpf_map *map,
 			    struct xdp_buff *xdp,
@@ -3264,9 +3290,9 @@ void bpf_clear_redirect_map(struct bpf_map *map)
 }
 
 static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
-			       struct bpf_prog *xdp_prog, struct bpf_map *map)
+			       struct bpf_prog *xdp_prog, struct bpf_map *map,
+			       struct bpf_redirect_info *ri)
 {
-	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
 	u32 index = ri->ifindex;
 	void *fwd = NULL;
 	int err;
@@ -3299,29 +3325,11 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 {
 	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
 	struct bpf_map *map = READ_ONCE(ri->map);
-	struct net_device *fwd;
-	u32 index = ri->ifindex;
-	int err;
 
 	if (likely(map))
-		return xdp_do_redirect_map(dev, xdp, xdp_prog, map);
+		return xdp_do_redirect_map(dev, xdp, xdp_prog, map, ri);
 
-	fwd = dev_get_by_index_rcu(dev_net(dev), index);
-	ri->ifindex = 0;
-	if (unlikely(!fwd)) {
-		err = -EINVAL;
-		goto err;
-	}
-
-	err = __bpf_tx_xdp(fwd, NULL, xdp, 0);
-	if (unlikely(err))
-		goto err;
-
-	_trace_xdp_redirect(dev, xdp_prog, index);
-	return 0;
-err:
-	_trace_xdp_redirect_err(dev, xdp_prog, index, err);
-	return err;
+	return xdp_do_redirect_slow(dev, xdp, xdp_prog, ri);
 }
 EXPORT_SYMBOL_GPL(xdp_do_redirect);
 

^ permalink raw reply related

* [bpf-next PATCH 2/3] xdp: explicit inline __xdp_map_lookup_elem
From: Jesper Dangaard Brouer @ 2018-08-31 15:26 UTC (permalink / raw)
  To: netdev; +Cc: Daniel Borkmann, Alexei Starovoitov, Jesper Dangaard Brouer
In-Reply-To: <153572913891.27338.14887146010547808356.stgit@firesoul>

The compiler chooses to not-inline the function __xdp_map_lookup_elem,
because it can see that it is used by both Generic-XDP and native-XDP
do redirect calls (xdp_do_generic_redirect_map and xdp_do_redirect_map).

The compiler cannot know that this is a bad choice, as it cannot know
that a net device cannot run both XDP modes (Generic or Native) at the
same time.  Thus, mark this function inline, even though we normally
leave this up to the compiler.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/filter.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 520f5e9e0b73..ec1b4eb0d3d4 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3232,7 +3232,7 @@ void xdp_do_flush_map(void)
 }
 EXPORT_SYMBOL_GPL(xdp_do_flush_map);
 
-static void *__xdp_map_lookup_elem(struct bpf_map *map, u32 index)
+static inline void *__xdp_map_lookup_elem(struct bpf_map *map, u32 index)
 {
 	switch (map->map_type) {
 	case BPF_MAP_TYPE_DEVMAP:
@@ -3275,7 +3275,7 @@ static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
 	WRITE_ONCE(ri->map, NULL);
 
 	fwd = __xdp_map_lookup_elem(map, index);
-	if (!fwd) {
+	if (unlikely(!fwd)) {
 		err = -EINVAL;
 		goto err;
 	}
@@ -3303,7 +3303,7 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 	u32 index = ri->ifindex;
 	int err;
 
-	if (map)
+	if (likely(map))
 		return xdp_do_redirect_map(dev, xdp, xdp_prog, map);
 
 	fwd = dev_get_by_index_rcu(dev_net(dev), index);

^ permalink raw reply related

* [bpf-next PATCH 1/3] xdp: unlikely instrumentation for xdp map redirect
From: Jesper Dangaard Brouer @ 2018-08-31 15:26 UTC (permalink / raw)
  To: netdev; +Cc: Daniel Borkmann, Alexei Starovoitov, Jesper Dangaard Brouer
In-Reply-To: <153572913891.27338.14887146010547808356.stgit@firesoul>

The ASM code layout generated by the compiler was suboptimal.  It
treated map enqueue errors as the likely case, which it shouldn't.
It also treated the xdp_do_flush_map() call as likely, i.e. it assumed
maps change between packets, which should be very unlikely.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/filter.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index c25eb36f1320..520f5e9e0b73 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3182,7 +3182,7 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
 		struct bpf_dtab_netdev *dst = fwd;
 
 		err = dev_map_enqueue(dst, xdp, dev_rx);
-		if (err)
+		if (unlikely(err))
 			return err;
 		__dev_map_insert_ctx(map, index);
 		break;
@@ -3191,7 +3191,7 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
 		struct bpf_cpu_map_entry *rcpu = fwd;
 
 		err = cpu_map_enqueue(rcpu, xdp, dev_rx);
-		if (err)
+		if (unlikely(err))
 			return err;
 		__cpu_map_insert_ctx(map, index);
 		break;
@@ -3279,7 +3279,7 @@ static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
 		err = -EINVAL;
 		goto err;
 	}
-	if (ri->map_to_flush && ri->map_to_flush != map)
+	if (ri->map_to_flush && unlikely(ri->map_to_flush != map))
 		xdp_do_flush_map();
 
 	err = __bpf_tx_xdp_map(dev, fwd, map, xdp, index);

^ permalink raw reply related

* [bpf-next PATCH 0/3] XDP micro optimizations for redirect
From: Jesper Dangaard Brouer @ 2018-08-31 15:26 UTC (permalink / raw)
  To: netdev; +Cc: Daniel Borkmann, Alexei Starovoitov, Jesper Dangaard Brouer

This patchset contains XDP micro optimizations for the redirect core.
These are not functional changes.  The optimizations revolve around
getting the compiler to lay out the code in a way that reflects how XDP
redirect is used.

Today the compiler chooses to inline and uninline (static C functions)
in a suboptimal way, compared to how XDP redirect can be used. Perf
top clearly shows that almost everything gets inlined into the
function call xdp_do_redirect.

The way the compiler chooses to inline does not reflect how XDP
redirect is used, as the compiler cannot know this.
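
As a rough illustration of the idea (not the code in this series), the
sketch below forces a small lookup helper inline and keeps the map and
non-map paths as separate out-of-line functions. It assumes GCC/Clang
function attributes; every demo_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: GCC/Clang attribute syntax. The kernel has its own
 * __always_inline and noinline macros; these are local stand-ins. */
#define demo_always_inline static inline __attribute__((always_inline))
#define demo_noinline      static __attribute__((noinline))

/* Small hot helper: forced into its caller. Returns -1 if out of range. */
demo_always_inline int demo_lookup(const int *table, size_t size, size_t index)
{
	return index < size ? table[index] : -1;
}

/* Map-based redirect path, kept as its own function. */
demo_noinline int demo_redirect_map(const int *table, size_t size, size_t index)
{
	int target = demo_lookup(table, size, index);

	return target < 0 ? -1 : target;
}

/* Non-map redirect path, also kept separate. */
demo_noinline int demo_redirect_plain(int ifindex)
{
	return ifindex > 0 ? ifindex : -1;
}

/* Thin dispatcher: only the branch stays here, not both bodies. */
static int demo_redirect(const int *table, size_t size, size_t index,
			 int ifindex)
{
	if (table)
		return demo_redirect_map(table, size, index);
	return demo_redirect_plain(ifindex);
}
```

With this split, a profile shows the two paths as separate symbols
instead of one large function with everything inlined into it.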

---

Jesper Dangaard Brouer (3):
      xdp: unlikely instrumentation for xdp map redirect
      xdp: explicit inline __xdp_map_lookup_elem
      xdp: split code for map vs non-map redirect


 net/core/filter.c |   64 ++++++++++++++++++++++++++++++-----------------------
 1 file changed, 36 insertions(+), 28 deletions(-)

^ permalink raw reply

* Re: [PATCH net-next] veth: report NEWLINK event when moving the peer device in a new namespace
From: David Ahern @ 2018-08-31 15:24 UTC (permalink / raw)
  To: Lorenzo Bianconi, davem; +Cc: netdev
In-Reply-To: <51722660f2ef860779e227541dab77046496f135.1535712096.git.lorenzo.bianconi@redhat.com>

On 8/31/18 5:43 AM, Lorenzo Bianconi wrote:
> When moving a veth device to another namespace, userspace receives a
> RTM_DELLINK message indicating the device has been removed from current
> netns. However, the other peer does not receive a netlink event
> containing new values for IFLA_LINK_NETNSID and IFLA_LINK veth
> attributes.
> Fix that behaviour sending to userspace a RTM_NEWLINK message in the peer
> namespace to report new IFLA_LINK_NETNSID/IFLA_LINK values
> 

A newlink message is generated in the new namespace. What information is
missing from that message?

^ permalink raw reply

* Re: [PATCH RFC net-next 00/11] udp gso
From: Willem de Bruijn @ 2018-08-31 15:11 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Sowmini Varadhan, Network Development, Willem de Bruijn
In-Reply-To: <6fad62328ef919ada8036735cd0b835900d2f747.camel@redhat.com>

On Fri, Aug 31, 2018 at 9:44 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Fri, 2018-08-31 at 09:08 -0400, Willem de Bruijn wrote:
> > On Fri, Aug 31, 2018 at 5:09 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > Hi,
> > >
> > > On Tue, 2018-04-17 at 17:07 -0400, Willem de Bruijn wrote:
> > > > That said, for negotiated flows an inverse GRO feature could
> > > > conceivably be implemented to reduce rx stack traversal, too.
> > > > Though due to interleaving of packets on the wire, aggregation
> > > > would be best effort, similar to TCP TSO and GRO using the
> > > > PSH bit as packetization signal.
> > >
> > > Reviving this old thread, before I forget again. I have some local
> > > patches implementing UDP GRO in a dual way to the current GSO_UDP_L4
> > > implementation: several datagrams with the same length are aggregated
> > > into a single one, and user space receives a single larger packet
> > > instead of multiple ones. I hope QUIC can leverage such a scenario, but I
> > > really know nothing about the protocol.
> > >
> > > I measure roughly a 50% performance improvement with udpgso_bench
> > > with respect to UDP GSO, and ~100% using a pktgen sender, and a reduced CPU
> > > usage on the receiver[1]. Some additional hacking to the general GRO
> > > bits is required to avoid useless socket lookups for ingress UDP
> > > packets when UDP_GSO is not enabled.
> > >
> > > If there is interest in this topic, I can share some RFC patches
> > > (hopefully sometime next week).
> >
> > As Eric pointed out, QUIC reception on mobile clients over the WAN
> > may not see much gain. But apparently there is a non-trivial amount
> > of traffic the other way, to servers. Again, WAN might limit whatever
> > gain we get, but I do want to look into that. And there are other UDP high
> > throughput workloads (with or without QUIC) between servers.
> >
> > If you have patches, please do share them.
>
> I'll try to clean them up and send them next week (as RFC).
>
> > I actually also have a rough
> > patch that I did not consider ready to share yet. Based on Tom's existing
> > socket lookup in udp_gro_receive to detect whether a local destination
> > exists and whether it has set an option to support receiving coalesced
> > payloads (along with a cmsg to share the segment size).
>
> That is more or less what I'm doing here.
> Side note: I had to test it on bare metal, as veth/lo do not trigger the
> GRO path: a selftest of this feature is not so straightforward.
>
> > Converting udp_recvmsg to split apart gso packets to make this
> > transparent seems to me to be too complex and not worth the effort.
>
> Agreed. Moreover, doing many small recvmsg() calls instead of a single
> large one will hit performance very badly due to PTI and
> HARDENED_USERCOPY.
>
> > If a local socket is not found in udp_gro_receive, this could also be
> > tentatively interpreted as a non-local path (with false positives), enabling
> > transparent use of GRO + GSO batching on the forwarding path.
>
> That sounds interesting, even if false positives look dangerous to me.
> Just to be on the same page, which false positive examples are you
> thinking of? UDP sockets bound to a local address behind NAT?

Any packets that would otherwise get dropped, such as packets with a
local destination but no locally bound socket. This may increase the
number of cycles spent on such packets, potentially increasing a DoS
vector.
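
To make the receive side discussed above concrete: if user space gets one
coalesced payload plus the segment size, it can walk the original
datagrams as sketched below. This is only an illustration under stated
assumptions: the segment size is taken as already extracted (the cmsg is
a proposal in this thread, not an existing API here), the trailing
segment is allowed to be shorter, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Callback invoked once per recovered datagram. */
typedef void (*demo_segment_fn)(const unsigned char *data, size_t len,
				void *ctx);

/* Walk a coalesced buffer of 'total' bytes in steps of 'seg_size'.
 * Assumption: every datagram is seg_size bytes long, except that the
 * last one may be shorter. Returns the number of datagrams visited;
 * returns 0 if seg_size is 0. */
static size_t demo_split_coalesced(const unsigned char *buf, size_t total,
				   size_t seg_size, demo_segment_fn fn,
				   void *ctx)
{
	size_t count = 0;

	if (seg_size == 0)
		return 0;

	for (size_t off = 0; off < total; off += seg_size) {
		size_t len = total - off < seg_size ? total - off : seg_size;

		if (fn)
			fn(buf + off, len, ctx);
		count++;
	}
	return count;
}

/* Example callback: accumulate the total number of payload bytes seen. */
static void demo_sum_lengths(const unsigned char *data, size_t len, void *ctx)
{
	(void)data;
	*(size_t *)ctx += len;
}
```

For a 2500-byte buffer and a 1000-byte segment size, the walk visits
three datagrams of 1000, 1000 and 500 bytes from a single receive call.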

^ permalink raw reply

* linux-next: Signed-off-by missing for commit in the bpf-next tree
From: Stephen Rothwell @ 2018-08-31 19:07 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, Networking
  Cc: Linux-Next Mailing List, Linux Kernel Mailing List

Hi all,

Commits

  0f93c3995c93 ("bpf: add TCP_SAVE_SYN/TCP_SAVED_SYN sample program")
  9452048c7940 ("bpf: add TCP_SAVE_SYN/TCP_SAVED_SYN options for bpf_(set|get)sockopt")
  175eba5962b4 ("xdp: remove redundant variable 'headroom'")

are missing a Signed-off-by from their committers.

-- 
Cheers,
Stephen Rothwell

^ permalink raw reply

* IPv6 neighbor discovery issues on 4.18
From: Brian Rak @ 2018-08-31 14:49 UTC (permalink / raw)
  To: netdev

We've upgraded a few machines to a 4.18.3 kernel and we're running into 
weird IPv6 neighbor discovery issues.  Basically, the machines stop 
responding to inbound IPv6 neighbor solicitation requests, which very 
quickly breaks all IPv6 connectivity.

It seems like the routing table gets confused:

# ip -6 route get fe80::4e16:fc00:c7a0:7800 dev br0
RTNETLINK answers: Network is unreachable
# ping6 fe80::4e16:fc00:c7a0:7800 -I br0
connect: Network is unreachable
yet

# ip -6 route | grep fe80 | grep br0
fe80::/64 dev br0 proto kernel metric 256 pref medium

fe80::4e16:fc00:c7a0:7800 is the link-local IP of the server's default 
gateway.

In this case, br0 has a single adapter attached to it.

I haven't been able to come up with any sort of reproduction steps here, 
this seems to happen after a few days of uptime in our environment.  The 
last known good release we have here is 4.17.13.

Any suggestions for troubleshooting this?  Sometimes we see machines fix 
themselves, but we haven't been able to figure out what's happening that 
helps.

^ permalink raw reply

* Re: linux-next: Tree for Aug 31 (nixge.c and phy)
From: Andrew Lunn @ 2018-08-31 18:19 UTC (permalink / raw)
  To: Moritz Fischer
  Cc: Randy Dunlap, sfr, linux-next, Linux Kernel Mailing List, netdev
In-Reply-To: <CAJYdmeOELPeG=xwkW5ODgQ+hVnHXFn6YOhtCaCdZS2z-uY2Uvw@mail.gmail.com>

> Sounds good. What's the process here? Follow-up patch to net-next?

Hi Moritz

Yes. Please include a Fixes: tag, so we have some traceability.

     Andrew

^ permalink raw reply

