Netdev List


* RE: [patch v1 1/2] dt-bindings: net: add binding documentation for mlxsw thermal control
From: Vadim Pasternak @ 2017-08-29 17:57 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: robh+dt-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org,
	jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org,
	ivecera-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20170829172254.GG8235-g2DYL2Zd6BY@public.gmane.org>



> -----Original Message-----
> From: Andrew Lunn [mailto:andrew-g2DYL2Zd6BY@public.gmane.org]
> Sent: Tuesday, August 29, 2017 8:23 PM
> To: Vadim Pasternak <vadimp-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: robh+dt-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org; davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org; jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org;
> ivecera-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org; devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [patch v1 1/2] dt-bindings: net: add binding documentation for
> mlxsw thermal control
> 
> > +- compatible		: "mellanox,mlxsw_minimal"
> 
> Interesting product name. Is there a mlxsw_maximal planned?
> 

Hi Andrew,

Thank you very much for the review.

No plans for such a product. We just have fully functional drivers for different
kinds of Mellanox switch devices, like spectrum, switchib and switchx2. All of them
work over the PCI bus. The minimal driver is meant for chassis management, and
we use it on the BMC side. It works over the I2C bus and doesn't depend on the
switch type. It has only minimal functionality, hence the name "minimal".

> > +- reg			: The I2C address of the device.
> > +
> > +Optional properties:
> > +- cooling-phandle	: phandle of the cooling device, which is to be used
> > +			  for the zone thermal control.
> > +			  If absent, cooling device controlled internally by
> > +			  the ASIC may be used.
> > +
> > +- trips			: the nodes to describe a point in the temperature
> > +			  domain with key temperatures at which cooling is
> > +			  recommended. Each node must contain the next values:
> > +			  - type: the trip type. Expected values are:
> > +			    0 - a trip point to enable active cooling;
> > +			    1 - a trip point to enable passive cooling;
> > +			    2 - a trip point to notify emergency;
> > +			  - temperature: unsigned integer indicating the trip
> > +			    temperature level in millicelsius;
> > +			  - minimum cooling state allowed within the trip node;
> > +			  - maximum cooling state allowed within the trip node;
> > +
> > +Example:
> > +	asic_thermal: mlxsw_minimal@48 {
> > +		compatible = "mlxsw_minimal";
> 
> You missed the vendor part.

Acked.

> 
> > +		reg = <0x48>;
> > +		status = "disabled";
> 
> An example with it disabled?

We just use it this way on the BMC side. It's disabled by default, and the
device driver is connected upon an event indicating good health of the
device. I can remove it from the example, but for the BMC it's actually
the default state.

> 
> > +		cooling-phandle = <&cooling>;
> > +
> > +		trips {
> > +			trip@0 {
> > +				trip = <0 75000 0 0>;
> > +			};
> 
> I don't know much about the thermal subsystem. But looking at other
> example binding documents, you seem to do something different here to
> other drivers. Why do you not use what seems to be the common format:

In the mlxsw_thermal driver we have a definition for the thermal trips, which contains
the type, like "active" or "passive", the temperature in millicelsius and the min/max states
for the cooling device. These vectors define the thermal trip points.
The hysteresis parameter is not relevant.

For example, ASIC thermal sensor is associated with the cooling device like:
&pwm_tacho {
...
	cooling: fan@0 {
		reg = <0x00>;
		cooling-levels = /bits/ 8 <125 151 177 203 229 255>;
		aspeed,fan-tach-ch = /bits/ 8 <0x00>;
	};

And the below sub-nodes
			trip@0 {
				trip = <0 75000 0 0>;
			};
			trip@1 {
				trip = <2 85000 1 5>;
			};
			trip@3 {
				trip = <2 105000 5 5>;
			};

define that PWM should be at the default speed (125) while the temperature is
below 75000, at max speed (255) while the temperature is above
105000, and should step according to the temperature trend in between.
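
As a rough illustration of how such trip vectors could be evaluated, here is a minimal userspace sketch. It is not the mlxsw_thermal code: next_state() and pwm_for_state() are hypothetical names, and the one-step-per-call policy is only an assumption about what "step according to the trend" means. The trip values and cooling-levels are copied from the example above.

```c
#include <stddef.h>

/* One trip vector as in the proposed binding: <type temperature min max>. */
struct trip {
	int type;       /* trip type value, copied from the example */
	int temp_mc;    /* trip temperature in millicelsius */
	int min_state;  /* minimum cooling state within the trip */
	int max_state;  /* maximum cooling state within the trip */
};

static const struct trip trips[] = {
	{ 0,  75000, 0, 0 },
	{ 2,  85000, 1, 5 },
	{ 2, 105000, 5, 5 },
};

/* cooling-levels from the fan node: index is the cooling state. */
static const unsigned char cooling_levels[] = { 125, 151, 177, 203, 229, 255 };

/*
 * Hypothetical helper: move the cooling state one step up or down
 * (depending on the temperature trend) and clamp it to the min/max
 * states of the highest trip that has been crossed. Below the first
 * trip the state is pinned to 0.
 */
int next_state(int temp_mc, int cur_state, int rising)
{
	int min = 0, max = 0;
	size_t i;

	for (i = 0; i < sizeof(trips) / sizeof(trips[0]); i++) {
		if (temp_mc >= trips[i].temp_mc) {
			min = trips[i].min_state;
			max = trips[i].max_state;
		}
	}

	cur_state += rising ? 1 : -1;
	if (cur_state < min)
		cur_state = min;
	if (cur_state > max)
		cur_state = max;
	return cur_state;
}

/* Hypothetical helper: map a cooling state (0..5) to a PWM duty value. */
unsigned char pwm_for_state(int state)
{
	return cooling_levels[state];
}
```

With these numbers, any temperature below 75000 ends up at PWM 125, anything at or above 105000 at PWM 255, and values in between walk through the intermediate levels one step at a time.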

Thanks,
Vadim.

> 
>                trips {
>                         cpu_alert0: cpu-alert0 {
>                                 temperature = <90000>; /* millicelsius */
>                                 hysteresis = <2000>; /* millicelsius */
>                                 type = "active";
>                         };
>                         cpu_alert1: cpu-alert1 {
>                                 temperature = <100000>; /* millicelsius */
>                                 hysteresis = <2000>; /* millicelsius */
>                                 type = "passive";
>                         };
>                         cpu_crit: cpu-crit {
>                                 temperature = <125000>; /* millicelsius */
>                                 hysteresis = <2000>; /* millicelsius */
>                                 type = "critical";
>                         };
>                 };
> 
> 	Andrew


* Re: [PATCH] net: stmmac: constify clk_div_table
From: David Miller @ 2017-08-29 17:56 UTC (permalink / raw)
  To: arvind.yadav.cs
  Cc: khilman, carlo, alexandre.torgue, peppe.cavallaro, linux-kernel,
	linux-amlogic, linux-arm-kernel, netdev
In-Reply-To: <a54fe9764d287df370c16b0f0814d09d8fa24591.1503899422.git.arvind.yadav.cs@gmail.com>

From: Arvind Yadav <arvind.yadav.cs@gmail.com>
Date: Mon, 28 Aug 2017 11:22:20 +0530

> clk_div_table are not supposed to change at runtime.
> meson8b_dwmac structure is working with const clk_div_table.
> So mark the non-const structs as const.
> 
> Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>

Applied.


* Re: [PATCH net] ipv6: do not set sk_destruct in IPV6_ADDRFORM sockopt
From: David Miller @ 2017-08-29 17:56 UTC (permalink / raw)
  To: lucien.xin; +Cc: netdev, pabeni, chunwang, syzkaller
In-Reply-To: <4cfe77b4a829c0d6134b842fe2ea7c41b6b210ff.1503888301.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Mon, 28 Aug 2017 10:45:01 +0800

> ChunYu found a kernel warn_on during syzkaller fuzzing:
> 
> [40226.038539] WARNING: CPU: 5 PID: 23720 at net/ipv4/af_inet.c:152 inet_sock_destruct+0x78d/0x9a0
> [40226.144849] Call Trace:
> [40226.147590]  <IRQ>
> [40226.149859]  dump_stack+0xe2/0x186
> [40226.176546]  __warn+0x1a4/0x1e0
> [40226.180066]  warn_slowpath_null+0x31/0x40
> [40226.184555]  inet_sock_destruct+0x78d/0x9a0
> [40226.246355]  __sk_destruct+0xfa/0x8c0
> [40226.290612]  rcu_process_callbacks+0xaa0/0x18a0
> [40226.336816]  __do_softirq+0x241/0x75e
> [40226.367758]  irq_exit+0x1f6/0x220
> [40226.371458]  smp_apic_timer_interrupt+0x7b/0xa0
> [40226.376507]  apic_timer_interrupt+0x93/0xa0
> 
> The warn_on happened when sk->sk_rmem_alloc wasn't 0 in inet_sock_destruct.
> As after commit f970bd9e3a06 ("udp: implement memory accounting helpers"),
> udp has changed to use udp_destruct_sock as sk_destruct where it would
> udp_rmem_release all rmem.
> 
> But IPV6_ADDRFORM sockopt sets sk_destruct with inet_sock_destruct after
> changing family to PF_INET. If rmem is not 0 at that time, and there is
> no place to release rmem before calling inet_sock_destruct, the warn_on
> will be triggered.
> 
> This patch is to fix it by not setting sk_destruct in IPV6_ADDRFORM sockopt
> any more. As IPV6_ADDRFORM sockopt only works for tcp and udp. TCP sock has
> already set its sk_destruct with inet_sock_destruct and UDP has set with
> udp_destruct_sock since they're created.
> 
> Fixes: f970bd9e3a06 ("udp: implement memory accounting helpers")
> Reported-by: ChunYu Wang <chunwang@redhat.com>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Applied and queued up for -stable, thanks.
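
The failure mode described in the quoted commit message can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: struct sock_model and all model_* functions are hypothetical stand-ins for sk->sk_rmem_alloc, sk->sk_destruct, udp_destruct_sock() and inet_sock_destruct(), and a false return value stands in for the WARN.

```c
#include <stdbool.h>

struct sock_model;
typedef bool (*destruct_fn)(struct sock_model *sk);

struct sock_model {
	int rmem_alloc;       /* stands in for sk->sk_rmem_alloc */
	destruct_fn destruct; /* stands in for sk->sk_destruct */
};

/* Models inet_sock_destruct(): returns false where the kernel would WARN. */
bool model_inet_destruct(struct sock_model *sk)
{
	return sk->rmem_alloc == 0;
}

/* Models udp_destruct_sock(): release rmem first, then run the inet part. */
bool model_udp_destruct(struct sock_model *sk)
{
	sk->rmem_alloc = 0;
	return model_inet_destruct(sk);
}

/*
 * Models the IPV6_ADDRFORM sockopt. With set_destruct == true it behaves
 * like the code before the patch and overwrites the destructor; with
 * false it behaves like the patched code and leaves it alone.
 */
void model_addrform(struct sock_model *sk, bool set_destruct)
{
	if (set_destruct)
		sk->destruct = model_inet_destruct;
}
```

A UDP socket model with non-zero rmem_alloc hits the warning path only when the sockopt has replaced its destructor, which is the case the patch removes.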


* Re: UDP sockets oddities
From: Florian Fainelli @ 2017-08-29 17:53 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: netdev, pabeni, willemb
In-Reply-To: <deb227c9-728a-bd98-f282-1478d71353a8@gmail.com>

On 08/26/2017 11:56 AM, Florian Fainelli wrote:
> 
> 
> On 08/26/2017 05:47 AM, Eric Dumazet wrote:
>> On Fri, 2017-08-25 at 21:19 -0700, David Miller wrote:
>>
>>> Agreed, but the ARP resolution queue really needs to scale its backlog
>>> to the physical technology it is attached to.
>> Yes, last time (in 2011) we increased the old limit of 3 packets :/
>>
>> We probably should match sysctl_wmem_max so that a single socket
>> provider would hit its sk_sndbuf limit

Eric, do you want to post this as a formal patch? I don't think I
understand these tunables enough to provide a good commit message
anyways. Thanks!

> 
> Before:
> /proc/sys/net/ipv4/neigh/eth0/unres_qlen:34
> /proc/sys/net/ipv4/neigh/eth0/unres_qlen_bytes:65536
> /proc/sys/net/ipv4/neigh/gphy/unres_qlen:34
> /proc/sys/net/ipv4/neigh/gphy/unres_qlen_bytes:65536
> 
> After:
> /proc/sys/net/ipv4/neigh/eth0/unres_qlen:106
> /proc/sys/net/ipv4/neigh/eth0/unres_qlen_bytes:229376
> /proc/sys/net/ipv4/neigh/gphy/unres_qlen:106
> /proc/sys/net/ipv4/neigh/gphy/unres_qlen_bytes:229376
> 
> and this does help a lot with the test case reported over an hour, only
> 2 packets lost:
> 
> # perf record -a -g -e skb:kfree_skb iperf -c 192.168.1.23 -b 900M -t
> 3600 -u
> ------------------------------------------------------------
> Client connecting to 192.168.1.23, UDP port 5001
> Sending 1470 byte datagrams, IPG target: 13.07 us (kalman adjust)
> UDP buffer size:  224 KByte (default)
> ------------------------------------------------------------
> [  4] local 192.168.1.66 port 48209 connected with 192.168.1.23 port 5001
> write failed: Invalid argument
> [ ID] Interval       Transfer     Bandwidth
> [  4]  0.0-404.9 sec  4.51 GBytes  95.7 Mbits/sec
> [  4] Sent 3294727 datagrams
> [  4] Server Report:
> [  4]  0.0-405.1 sec  4.51 GBytes  95.6 Mbits/sec  14.979 ms
> 2/3294728 (6.1e-05%)
> 
> Thanks Eric!
> 
>>
>> Something like :
>>
>> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
>> index 6b0bc0f715346a097a6df46e2ba2771359abcd23..7777dceb78107c0019fb39d5b69be1959005b78e 100644
>> --- a/Documentation/networking/ip-sysctl.txt
>> +++ b/Documentation/networking/ip-sysctl.txt
>> @@ -109,7 +109,8 @@ neigh/default/unres_qlen_bytes - INTEGER
>>  	queued for each	unresolved address by other network layers.
>>  	(added in linux 3.3)
>>  	Setting negative value is meaningless and will return error.
>> -	Default: 65536 Bytes(64KB)
>> +	Default: SK_WMEM_MAX, enough to store 256 packets of medium size
>> +		 (less than 256 bytes per packet)
>>  
>>  neigh/default/unres_qlen - INTEGER
>>  	The maximum number of packets which may be queued for each
>> diff --git a/include/net/sock.h b/include/net/sock.h
>> index 1c2912d433e81b10f3fdc87bcfcbb091570edc03..03a362568357acc7278a318423dd3873103f90ca 100644
>> --- a/include/net/sock.h
>> +++ b/include/net/sock.h
>> @@ -2368,6 +2368,16 @@ bool sk_net_capable(const struct sock *sk, int cap);
>>  
>>  void sk_get_meminfo(const struct sock *sk, u32 *meminfo);
>>  
>> +/* Take into consideration the size of the struct sk_buff overhead in the
>> + * determination of these values, since that is non-constant across
>> + * platforms.  This makes socket queueing behavior and performance
>> + * not depend upon such differences.
>> + */
>> +#define _SK_MEM_PACKETS		256
>> +#define _SK_MEM_OVERHEAD	SKB_TRUESIZE(256)
>> +#define SK_WMEM_MAX		(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS)
>> +#define SK_RMEM_MAX		(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS)
>> +
>>  extern __u32 sysctl_wmem_max;
>>  extern __u32 sysctl_rmem_max;
>>  
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index dfdd14cac775e9bfcee0085ee32ffcd0ab28b67b..9b7b6bbb2a23e7652a1f34a305f29d49de00bc8c 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -307,16 +307,6 @@ static struct lock_class_key af_wlock_keys[AF_MAX];
>>  static struct lock_class_key af_elock_keys[AF_MAX];
>>  static struct lock_class_key af_kern_callback_keys[AF_MAX];
>>  
>> -/* Take into consideration the size of the struct sk_buff overhead in the
>> - * determination of these values, since that is non-constant across
>> - * platforms.  This makes socket queueing behavior and performance
>> - * not depend upon such differences.
>> - */
>> -#define _SK_MEM_PACKETS		256
>> -#define _SK_MEM_OVERHEAD	SKB_TRUESIZE(256)
>> -#define SK_WMEM_MAX		(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS)
>> -#define SK_RMEM_MAX		(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS)
>> -
>>  /* Run time adjustable parameters. */
>>  __u32 sysctl_wmem_max __read_mostly = SK_WMEM_MAX;
>>  EXPORT_SYMBOL(sysctl_wmem_max);
>> diff --git a/net/decnet/dn_neigh.c b/net/decnet/dn_neigh.c
>> index 21dedf6fd0f76dec22b2b3685beb89cfefea7ded..22bf0b95d6edc3c27ef3a99d27cb70a1551e3e0e 100644
>> --- a/net/decnet/dn_neigh.c
>> +++ b/net/decnet/dn_neigh.c
>> @@ -94,7 +94,7 @@ struct neigh_table dn_neigh_table = {
>>  			[NEIGH_VAR_BASE_REACHABLE_TIME] = 30 * HZ,
>>  			[NEIGH_VAR_DELAY_PROBE_TIME] = 5 * HZ,
>>  			[NEIGH_VAR_GC_STALETIME] = 60 * HZ,
>> -			[NEIGH_VAR_QUEUE_LEN_BYTES] = 64*1024,
>> +			[NEIGH_VAR_QUEUE_LEN_BYTES] = SK_WMEM_MAX,
>>  			[NEIGH_VAR_PROXY_QLEN] = 0,
>>  			[NEIGH_VAR_ANYCAST_DELAY] = 0,
>>  			[NEIGH_VAR_PROXY_DELAY] = 0,
>> diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
>> index 8b52179ddc6e54eabf6d3c2ed0132083228680bb..7c45b8896709815c5dde5972fd57cb5c3bcb2648 100644
>> --- a/net/ipv4/arp.c
>> +++ b/net/ipv4/arp.c
>> @@ -171,7 +171,7 @@ struct neigh_table arp_tbl = {
>>  			[NEIGH_VAR_BASE_REACHABLE_TIME] = 30 * HZ,
>>  			[NEIGH_VAR_DELAY_PROBE_TIME] = 5 * HZ,
>>  			[NEIGH_VAR_GC_STALETIME] = 60 * HZ,
>> -			[NEIGH_VAR_QUEUE_LEN_BYTES] = 64 * 1024,
>> +			[NEIGH_VAR_QUEUE_LEN_BYTES] = SK_WMEM_MAX,
>>  			[NEIGH_VAR_PROXY_QLEN] = 64,
>>  			[NEIGH_VAR_ANYCAST_DELAY] = 1 * HZ,
>>  			[NEIGH_VAR_PROXY_DELAY]	= (8 * HZ) / 10,
>> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
>> index 5e338eb89509b1df6ebd060f8bd19fcb4b86fe05..266a530414d7be4f1e7be922e465bbab46f7cbac 100644
>> --- a/net/ipv6/ndisc.c
>> +++ b/net/ipv6/ndisc.c
>> @@ -127,7 +127,7 @@ struct neigh_table nd_tbl = {
>>  			[NEIGH_VAR_BASE_REACHABLE_TIME] = ND_REACHABLE_TIME,
>>  			[NEIGH_VAR_DELAY_PROBE_TIME] = 5 * HZ,
>>  			[NEIGH_VAR_GC_STALETIME] = 60 * HZ,
>> -			[NEIGH_VAR_QUEUE_LEN_BYTES] = 64 * 1024,
>> +			[NEIGH_VAR_QUEUE_LEN_BYTES] = SK_WMEM_MAX,
>>  			[NEIGH_VAR_PROXY_QLEN] = 64,
>>  			[NEIGH_VAR_ANYCAST_DELAY] = 1 * HZ,
>>  			[NEIGH_VAR_PROXY_DELAY] = (8 * HZ) / 10,
>>
>>
> 


-- 
Florian
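
For reference, the arithmetic behind the new default can be sketched as follows. This is a userspace approximation, not kernel code: the model_* names are hypothetical, and the per-skb overhead is passed in as a parameter because in the kernel it comes from the aligned sizes of struct sk_buff and struct skb_shared_info, which vary by platform. The value 640 used in the note below is merely inferred from the 229376 reported above (229376 / 256 = 896 = 256 + 640).

```c
/* Mirrors _SK_MEM_PACKETS from the quoted diff. */
#define MODEL_SK_MEM_PACKETS 256u

/*
 * Userspace stand-in for SKB_TRUESIZE(x): payload size plus the
 * per-skb bookkeeping overhead (platform dependent in the kernel).
 */
unsigned int model_skb_truesize(unsigned int payload, unsigned int overhead)
{
	return payload + overhead;
}

/* Stand-in for SK_WMEM_MAX: 256 packets of SKB_TRUESIZE(256) each. */
unsigned int model_sk_wmem_max(unsigned int overhead)
{
	return model_skb_truesize(256u, overhead) * MODEL_SK_MEM_PACKETS;
}
```

With an overhead of 640 bytes this gives 229376, matching the "After" value of unres_qlen_bytes; with zero overhead it degenerates to the old 65536 default.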


* Re: [PATCH net-next 0/7] XDP redirect tracepoints
From: David Miller @ 2017-08-29 17:51 UTC (permalink / raw)
  To: brouer; +Cc: netdev, john.fastabend
In-Reply-To: <150401743083.16384.15778781741742858567.stgit@firesoul>

From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Tue, 29 Aug 2017 16:37:35 +0200

> I feel this is as far as I can take the tracepoint infrastructure to
> assist XDP monitoring.
> 
> Tracepoints come with a base overhead of 25 nanosec for an attached
> bpf_prog, and 48 nanosec for using a full perf record. This is
> problematic for the XDP use-case, but it is very convenient to use the
> existing perf infrastructure.
> 
> From a performance perspective, the real solution would be to attach
> another bpf_prog (that understands xdp_buff), but I'm not sure we want
> to introduce yet another bpf attach API for this.
> 
> One thing left is to standardize the possible err return codes, to a
> limited set, to allow easier (and faster) mapping into a bpf map.

Series applied, thanks Jesper.
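
To put the quoted overhead numbers in perspective, here is a small worked calculation. It is only a sketch: the function names are hypothetical, and the packet rate of 10 million packets per second used in the note below is an assumed round figure, not a number from the thread.

```c
/* Per-packet time budget in nanoseconds at a given packet rate. */
double budget_ns(double packets_per_sec)
{
	return 1e9 / packets_per_sec;
}

/* Fraction of that budget consumed by a fixed per-packet overhead. */
double overhead_fraction(double overhead_ns, double packets_per_sec)
{
	return overhead_ns / budget_ns(packets_per_sec);
}
```

At an assumed 10 Mpps the budget is 100 ns per packet, so the 25 ns bpf_prog tracepoint cost is a quarter of it and the 48 ns perf record cost is close to half.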


* Re: [PATCH v2 net-next] irda: fix link order if IRDA is built into the kernel
From: David Miller @ 2017-08-29 17:49 UTC (permalink / raw)
  To: gregkh; +Cc: devel, netdev, samuel, linux-kernel, fengguang.wu, geert
In-Reply-To: <20170829174622.GA25926@kroah.com>

From: Greg KH <gregkh@linuxfoundation.org>
Date: Tue, 29 Aug 2017 19:46:22 +0200

> When moving the IRDA code out of net/ into drivers/staging/irda/net, the
> link order changes when IRDA is built into the kernel.  That causes a
> kernel crash at boot time as netfilter isn't initialized yet.
> 
> To fix this, build and link the irda networking code in the same exact
> order that it was previously before the move.
> 
> Reported-by: kernel test robot <fengguang.wu@intel.com>
> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

Greg, just change the initializer in IRDA so that it will run
after subsys_init() when built statically.

IRDA is definitely not the first potentially statically built
thing that needs netlink up and available.
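
The two options under discussion (restoring the link order versus changing the initializer level) can be compared with a toy model of built-in init ordering. This is a sketch under assumptions: built-in initializers run by level first and by link position within a level, and the names, levels and positions passed in are purely illustrative, not the actual ones used by IRDA or netfilter.

```c
#include <stdlib.h>
#include <string.h>

struct initcall_model {
	const char *name;
	int level;     /* lower levels run first */
	int link_pos;  /* position in the final link; tie-breaker within a level */
};

static int cmp_initcall(const void *a, const void *b)
{
	const struct initcall_model *x = a, *y = b;

	if (x->level != y->level)
		return x->level - y->level;
	return x->link_pos - y->link_pos;
}

/* Returns nonzero if 'first' runs before 'second' once the calls are ordered. */
int runs_before(struct initcall_model *calls, size_t n,
		const char *first, const char *second)
{
	size_t i;

	qsort(calls, n, sizeof(*calls), cmp_initcall);
	for (i = 0; i < n; i++) {
		if (!strcmp(calls[i].name, first))
			return 1;
		if (!strcmp(calls[i].name, second))
			return 0;
	}
	return 0;
}
```

In this model a dependency is satisfied either by linking the dependent code later at the same level, or by moving it to a later level, in which case the link position no longer matters.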


* [PATCH v2 net-next] irda: fix link order if IRDA is built into the kernel
From: Greg KH @ 2017-08-29 17:46 UTC (permalink / raw)
  To: David Miller
  Cc: devel, samuel, netdev, linux-kernel, Geert Uytterhoeven,
	kernel test robot
In-Reply-To: <20170829173129.GA11029@kroah.com>

When moving the IRDA code out of net/ into drivers/staging/irda/net, the
link order changes when IRDA is built into the kernel.  That causes a
kernel crash at boot time as netfilter isn't initialized yet.

To fix this, build and link the irda networking code in the same exact
order that it was previously before the move.

Reported-by: kernel test robot <fengguang.wu@intel.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
v2 - don't force irda to be a module, make the Makefiles put irda back
     where it was before in the link order.

 drivers/staging/Makefile | 1 -
 net/Makefile             | 1 +
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index fced929a0e67..1192caa94435 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -2,7 +2,6 @@
 
 obj-y				+= media/
 obj-y				+= typec/
-obj-$(CONFIG_IRDA)		+= irda/net/
 obj-$(CONFIG_IRDA)		+= irda/drivers/
 obj-$(CONFIG_PRISM2_USB)	+= wlan-ng/
 obj-$(CONFIG_COMEDI)		+= comedi/
diff --git a/net/Makefile b/net/Makefile
index 3d3feff3643b..ddd059c3dfa4 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -31,6 +31,7 @@ obj-$(CONFIG_NETROM)		+= netrom/
 obj-$(CONFIG_ROSE)		+= rose/
 obj-$(CONFIG_AX25)		+= ax25/
 obj-$(CONFIG_CAN)		+= can/
+obj-$(CONFIG_IRDA)		+= ../drivers/staging/irda/net/
 obj-$(CONFIG_BT)		+= bluetooth/
 obj-$(CONFIG_SUNRPC)		+= sunrpc/
 obj-$(CONFIG_AF_RXRPC)		+= rxrpc/
-- 
2.14.1


* Re: [PATCH net-next] staging: irda: force to be a kernel module
From: David Miller @ 2017-08-29 17:40 UTC (permalink / raw)
  To: gregkh; +Cc: samuel, netdev, linux-kernel, devel
In-Reply-To: <20170829172608.GA4700@kroah.com>

From: Greg KH <gregkh@linuxfoundation.org>
Date: Tue, 29 Aug 2017 19:26:08 +0200

> On Tue, Aug 29, 2017 at 09:35:07AM -0700, David Miller wrote:
>> From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Date: Tue, 29 Aug 2017 11:14:17 +0200
>> 
>> > Now that the IRDA networking code has moved into drivers/staging/, the
>> > link order is changed for when it is initialized if built into the
>> > system.  This can cause a crash when initializing as the netfilter core
>> > hasn't been initialized yet.
>> > 
>> > So force the IRDA code to be built as a module, preventing the crash.
>> > 
>> > Reported-by: kernel test robot <fengguang.wu@intel.com>
>> > Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
>> 
>> I don't think this is reasonable.
>> 
>> IRDA being built in was broken by moving it to staging, so it's a
>> regression and we should find a way to fix it.
> 
> Hm, this is due to netlink coming before irda in the link order before
> this patch series.  I can't change the link order to put all of net/
> before drivers/, which would solve this, and I don't think I can put:
> 	obj-$(CONFIG_IRDA) += ../../drivers/staging/irda/net/
> in a networking Makefile, can I?  Does "../" even work in a Makefile
> like that?
> 
> Any other thoughts?

Change the initialization type in IRDA from subsys_init() to ...
something else?

Amazing!


* Re: [PATCH net-next v2] bridge: fdb add and delete tracepoints
From: Roopa Prabhu @ 2017-08-29 17:36 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: davem@davemloft.net, netdev@vger.kernel.org, Nikolay Aleksandrov,
	bridge
In-Reply-To: <8737eb73-3437-5529-6f4d-7aa52c770357@gmail.com>

On Tue, Aug 29, 2017 at 9:46 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> On 08/28/2017 09:22 PM, Roopa Prabhu wrote:
>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>
>> A few useful tracepoints to trace bridge forwarding
>> database updates.
>>
>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>
> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
>
> Small nit below, but probably not a candidate for a v3
>
>> ---
>
>> +
>> +     TP_printk("dev %s addr %02x:%02x:%02x:%02x:%02x:%02x vid %u nlh_flags %04x ndm_flags = %02x",
>
> Small nit, any particular reason why ndm_flags got a special treatment
> with an equal character and not the other?
>


good eyes, that's a typo. I did scan them once and removed the '=' and
missed a spot.
I will send v3.


* Re: [PATCH net-next] staging: irda: force to be a kernel module
From: Greg KH @ 2017-08-29 17:31 UTC (permalink / raw)
  To: David Miller; +Cc: devel, netdev, samuel, linux-kernel
In-Reply-To: <20170829172608.GA4700@kroah.com>

On Tue, Aug 29, 2017 at 07:26:08PM +0200, Greg KH wrote:
> On Tue, Aug 29, 2017 at 09:35:07AM -0700, David Miller wrote:
> > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Date: Tue, 29 Aug 2017 11:14:17 +0200
> > 
> > > Now that the IRDA networking code has moved into drivers/staging/, the
> > > link order is changed for when it is initialized if built into the
> > > system.  This can cause a crash when initializing as the netfilter core
> > > hasn't been initialized yet.
> > > 
> > > So force the IRDA code to be built as a module, preventing the crash.
> > > 
> > > Reported-by: kernel test robot <fengguang.wu@intel.com>
> > > Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
> > 
> > I don't think this is reasonable.
> > 
> > IRDA being built in was broken by moving it to staging, so it's a
> > regression and we should find a way to fix it.
> 
> Hm, this is due to netlink coming before irda in the link order before
> this patch series.  I can't change the link order to put all of net/
> before drivers/, which would solve this, and I don't think I can put:
> 	obj-$(CONFIG_IRDA) += ../../drivers/staging/irda/net/
> in a networking Makefile, can I?  Does "../" even work in a Makefile
> like that?

Wait, I think that does work, let me go test this some more...

thanks,

greg k-h


* Re: [PATCH net-next] staging: irda: force to be a kernel module
From: Greg KH @ 2017-08-29 17:26 UTC (permalink / raw)
  To: David Miller; +Cc: samuel, netdev, linux-kernel, devel
In-Reply-To: <20170829.093507.2166038228205751885.davem@davemloft.net>

On Tue, Aug 29, 2017 at 09:35:07AM -0700, David Miller wrote:
> From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Date: Tue, 29 Aug 2017 11:14:17 +0200
> 
> > Now that the IRDA networking code has moved into drivers/staging/, the
> > link order is changed for when it is initialized if built into the
> > system.  This can cause a crash when initializing as the netfilter core
> > hasn't been initialized yet.
> > 
> > So force the IRDA code to be built as a module, preventing the crash.
> > 
> > Reported-by: kernel test robot <fengguang.wu@intel.com>
> > Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
> 
> I don't think this is reasonable.
> 
> IRDA being built in was broken by moving it to staging, so it's a
> regression and we should find a way to fix it.

Hm, this is due to netlink coming before irda in the link order before
this patch series.  I can't change the link order to put all of net/
before drivers/, which would solve this, and I don't think I can put:
	obj-$(CONFIG_IRDA) += ../../drivers/staging/irda/net/
in a networking Makefile, can I?  Does "../" even work in a Makefile
like that?

Any other thoughts?

> It's one thing if IRDA on its own has deteriorated and broken in some
> ways over time due to lack of maintenance, it's another to knowingly
> do something to it that causes a regression which is what happened
> here.

It has deteriorated and is broken and does not work at all from the
reports I have gotten, Linus pointing this out to me directly due to his
involvement in irda-related dive computers.  So I don't think anyone is
using this at all right now, it seems to crash when used anyway.  So no
one is running this "built in" code at the moment :)

ideas?

thanks,

greg k-h


* Re: [PATCH net-next 7/7] samples/bpf: xdp_monitor tool based on tracepoints
From: Daniel Borkmann @ 2017-08-29 17:24 UTC (permalink / raw)
  To: Alexei Starovoitov, Jesper Dangaard Brouer; +Cc: netdev, John Fastabend
In-Reply-To: <20170829170551.5uws25py4fcpem73@ast-mbp>

On 08/29/2017 07:05 PM, Alexei Starovoitov wrote:
> On Tue, Aug 29, 2017 at 04:38:11PM +0200, Jesper Dangaard Brouer wrote:
>> This tool xdp_monitor demonstrates how to use the different xdp_redirect
>> tracepoints xdp_redirect{,_map}{,_err} from a BPF program.
>>
>> The default mode is to only monitor the error counters, to avoid
>> affecting the per packet performance. Tracepoints come with a base
>> overhead of 25 nanosec for an attached bpf_prog, and 48 nanosec for
>> using a full perf record (with non-matching filter).  Thus, default
>> loading the --stats mode could affect the maximum performance.
>>
>> This version of the tool is very simple and counts all types of errors
>> as one.  It will be natural to extend this later with the different
>> types of errors that can occur, which should help users quickly
>> identify common mistakes.
>>
>> Because the TP_STRUCT was kept in sync all the tracepoints load the
>> same BPF code.  It would also be natural to extend the map version to
>> demonstrate how the map information could be used.
>>
>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>
> Nice. Did you consider using libbpf (instead of the old bpf_load.c hack)
> and making a full standalone tool out of it? Looks very useful.

+1 also my suggestion. ;)

> Acked-by: Alexei Starovoitov <ast@kernel.org>


* Re: [patch v1 1/2] dt-bindings: net: add binding documentation for mlxsw thermal control
From: Andrew Lunn @ 2017-08-29 17:22 UTC (permalink / raw)
  To: Vadim Pasternak; +Cc: robh+dt, davem, jiri, ivecera, devicetree, netdev
In-Reply-To: <1504032311-195988-2-git-send-email-vadimp@mellanox.com>

> +- compatible		: "mellanox,mlxsw_minimal"

Interesting product name. Is there a mlxsw_maximal planned?

> +- reg			: The I2C address of the device.
> +
> +Optional properties:
> +- cooling-phandle	: phandle of the cooling device, which is to be used
> +			  for the zone thermal control.
> +			  If absent, cooling device controlled internally by
> +			  the ASIC may be used.
> +
> +- trips			: the nodes to describe a point in the temperature
> +			  domain with key temperatures at which cooling is
> +			  recommended. Each node must contain the next values:
> +			  - type: the trip type. Expected values are:
> +			    0 - a trip point to enable active cooling;
> +			    1 - a trip point to enable passive cooling;
> +			    2 - a trip point to notify emergency;
> +			  - temperature: unsigned integer indicating the trip
> +			    temperature level in millicelsius;
> +			  - minimum cooling state allowed within the trip node;
> +			  - maximum cooling state allowed within the trip node;
> +
> +Example:
> +	asic_thermal: mlxsw_minimal@48 {
> +		compatible = "mlxsw_minimal";

You missed the vendor part.

> +		reg = <0x48>;
> +		status = "disabled";

An example with it disabled?

> +		cooling-phandle = <&cooling>;
> +
> +		trips {
> +			trip@0 {
> +				trip = <0 75000 0 0>;
> +			};

I don't know much about the thermal subsystem. But looking at other
example binding documents, you seem to do something different here to
other drivers. Why do you not use what seems to be the common format:

               trips {
                        cpu_alert0: cpu-alert0 {
                                temperature = <90000>; /* millicelsius */
                                hysteresis = <2000>; /* millicelsius */
                                type = "active";
                        };
                        cpu_alert1: cpu-alert1 {
                                temperature = <100000>; /* millicelsius */
                                hysteresis = <2000>; /* millicelsius */
                                type = "passive";
                        };
                        cpu_crit: cpu-crit {
                                temperature = <125000>; /* millicelsius */
                                hysteresis = <2000>; /* millicelsius */
                                type = "critical";
                        };
                };

	Andrew


* [PATCH net-next 6/6] vxlan: support flow dissect
From: Tom Herbert @ 2017-08-29 17:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Tom Herbert
In-Reply-To: <20170829171942.8974-1-tom@quantonium.net>

Populate offload flow_dissect callback appropriately for VXLAN and
VXLAN-GPE.
---
 drivers/net/vxlan.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index ae3a1da703c2..41e50de40af4 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1336,6 +1336,55 @@ static bool vxlan_ecn_decapsulate(struct vxlan_sock *vs, void *oiph,
 	return err <= 1;
 }
 
+static enum flow_dissect_ret vxlan_flow_dissect(struct sock *sk,
+			const struct sk_buff *skb,
+			struct flow_dissector_key_control *key_control,
+			struct flow_dissector *flow_dissector,
+			void *target_container, void *data,
+			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
+			int *p_hlen, unsigned int flags)
+{
+	__be16 protocol = htons(ETH_P_TEB);
+	struct vxlanhdr *vhdr, _vhdr;
+	struct vxlan_sock *vs;
+
+	vhdr = __skb_header_pointer(skb, *p_nhoff + sizeof(struct udphdr),
+				    sizeof(_vhdr), data, *p_hlen, &_vhdr);
+	if (!vhdr)
+		return FLOW_DISSECT_RET_OUT_BAD;
+
+	vs = rcu_dereference_sk_user_data(sk);
+	if (!vs)
+		return FLOW_DISSECT_RET_OUT_BAD;
+
+	if (vs->flags & VXLAN_F_GPE) {
+		struct vxlanhdr_gpe *gpe = (struct vxlanhdr_gpe *)vhdr;
+
+		/* Need to have Next Protocol set for interfaces in GPE mode. */
+		if (gpe->version != 0 || !gpe->np_applied || gpe->oam_flag)
+			return FLOW_DISSECT_RET_CONTINUE;
+
+		switch (gpe->next_protocol) {
+		case VXLAN_GPE_NP_IPV4:
+			protocol = htons(ETH_P_IP);
+			break;
+		case VXLAN_GPE_NP_IPV6:
+			protocol = htons(ETH_P_IPV6);
+			break;
+		case VXLAN_GPE_NP_ETHERNET:
+			protocol = htons(ETH_P_TEB);
+			break;
+		default:
+			return FLOW_DISSECT_RET_CONTINUE;
+		}
+	}
+
+	*p_nhoff += sizeof(struct udphdr) + sizeof(_vhdr);
+	*p_proto = protocol;
+
+	return FLOW_DISSECT_RET_PROTO_AGAIN;
+}
+
 /* Callback from net/ipv4/udp.c to receive packets */
 static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
 {
@@ -2864,6 +2913,7 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, bool ipv6,
 	tunnel_cfg.encap_destroy = NULL;
 	tunnel_cfg.gro_receive = vxlan_gro_receive;
 	tunnel_cfg.gro_complete = vxlan_gro_complete;
+	tunnel_cfg.flow_dissect = vxlan_flow_dissect;
 
 	setup_udp_tunnel_sock(net, sock, &tunnel_cfg);
 
-- 
2.11.0

* [PATCH net-next 5/6] fou: Support flow dissection
From: Tom Herbert @ 2017-08-29 17:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Tom Herbert
In-Reply-To: <20170829171942.8974-1-tom@quantonium.net>

Populate the offload flow_dissect callback appropriately for fou and gue.
---
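For readers unfamiliar with the GUE header layout, the offset arithmetic
in gue_flow_dissect can be sketched in userspace C. This is only an
illustrative model that works on raw bytes instead of the kernel's
struct guehdr; the demo_ names and the -1 "not handled" return value are
assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_UDP_HLEN	8	/* size of a UDP header */
#define DEMO_GUE_HLEN	4	/* size of the base GUE header */

/*
 * Hypothetical helper: returns the inner IP protocol number and advances
 * *nhoff past the encapsulation, or returns -1 when the packet should be
 * left alone (the kernel code returns FLOW_DISSECT_RET_CONTINUE there).
 * "gue" points at the first byte after the UDP header.
 */
int demo_gue_next(const uint8_t *gue, size_t avail, size_t *nhoff)
{
	uint8_t version, control, hlen;

	if (avail < DEMO_GUE_HLEN)
		return -1;

	version = gue[0] >> 6;		/* top two bits */
	control = (gue[0] >> 5) & 1;	/* C bit */
	hlen = gue[0] & 0x1f;		/* optional fields, in 32-bit words */

	switch (version) {
	case 0:
		if (control)
			return -1;
		*nhoff += DEMO_UDP_HLEN + DEMO_GUE_HLEN + ((size_t)hlen << 2);
		return gue[1];		/* proto_ctype */
	case 1:
		/* Directly encapsulated IP: first nibble is the IP version. */
		switch (gue[0] >> 4) {
		case 4:
			*nhoff += DEMO_UDP_HLEN;
			return 4;	/* IPPROTO_IPIP */
		case 6:
			*nhoff += DEMO_UDP_HLEN;
			return 41;	/* IPPROTO_IPV6 */
		default:
			return -1;
		}
	default:
		return -1;
	}
}
```

Note that for version 1 there is no GUE header to skip, so only the UDP
header length is added to the offset.
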
 net/ipv4/fou.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index 1540db65241a..a831dd49fb28 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -282,6 +282,20 @@ static int fou_gro_complete(struct sock *sk, struct sk_buff *skb,
 	return err;
 }
 
+static enum flow_dissect_ret fou_flow_dissect(struct sock *sk,
+			const struct sk_buff *skb,
+			struct flow_dissector_key_control *key_control,
+			struct flow_dissector *flow_dissector,
+			void *target_container, void *data,
+			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
+			int *p_hlen, unsigned int flags)
+{
+	*p_ip_proto = fou_from_sock(sk)->protocol;
+	*p_nhoff += sizeof(struct udphdr);
+
+	return FLOW_DISSECT_RET_IPPROTO_AGAIN;
+}
+
 static struct guehdr *gue_gro_remcsum(struct sk_buff *skb, unsigned int off,
 				      struct guehdr *guehdr, void *data,
 				      size_t hdrlen, struct gro_remcsum *grc,
@@ -500,6 +514,53 @@ static int gue_gro_complete(struct sock *sk, struct sk_buff *skb, int nhoff)
 	return err;
 }
 
+static enum flow_dissect_ret gue_flow_dissect(struct sock *sk,
+			const struct sk_buff *skb,
+			struct flow_dissector_key_control *key_control,
+			struct flow_dissector *flow_dissector,
+			void *target_container, void *data,
+			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
+			int *p_hlen, unsigned int flags)
+{
+	struct guehdr *guehdr, _guehdr;
+
+	guehdr = __skb_header_pointer(skb, *p_nhoff + sizeof(struct udphdr),
+				      sizeof(_guehdr), data, *p_hlen, &_guehdr);
+	if (!guehdr)
+		return FLOW_DISSECT_RET_OUT_BAD;
+
+	switch (guehdr->version) {
+	case 0:
+		if (unlikely(guehdr->control))
+			return FLOW_DISSECT_RET_CONTINUE;
+
+		*p_ip_proto = guehdr->proto_ctype;
+		*p_nhoff += sizeof(struct udphdr) +
+		    sizeof(*guehdr) + (guehdr->hlen << 2);
+
+		break;
+	case 1:
+		switch (((struct iphdr *)guehdr)->version) {
+		case 4:
+			*p_ip_proto = IPPROTO_IPIP;
+			break;
+		case 6:
+			*p_ip_proto = IPPROTO_IPV6;
+			break;
+		default:
+			return FLOW_DISSECT_RET_CONTINUE;
+		}
+
+		*p_nhoff += sizeof(struct udphdr);
+
+		break;
+	default:
+		return FLOW_DISSECT_RET_CONTINUE;
+	}
+
+	return FLOW_DISSECT_RET_IPPROTO_AGAIN;
+}
+
 static int fou_add_to_port_list(struct net *net, struct fou *fou)
 {
 	struct fou_net *fn = net_generic(net, fou_net_id);
@@ -570,12 +631,14 @@ static int fou_create(struct net *net, struct fou_cfg *cfg,
 		tunnel_cfg.encap_rcv = fou_udp_recv;
 		tunnel_cfg.gro_receive = fou_gro_receive;
 		tunnel_cfg.gro_complete = fou_gro_complete;
+		tunnel_cfg.flow_dissect = fou_flow_dissect;
 		fou->protocol = cfg->protocol;
 		break;
 	case FOU_ENCAP_GUE:
 		tunnel_cfg.encap_rcv = gue_udp_recv;
 		tunnel_cfg.gro_receive = gue_gro_receive;
 		tunnel_cfg.gro_complete = gue_gro_complete;
+		tunnel_cfg.flow_dissect = gue_flow_dissect;
 		break;
 	default:
 		err = -EINVAL;
-- 
2.11.0

* [PATCH net-next 4/6] udp: flow dissector offload
From: Tom Herbert @ 2017-08-29 17:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Tom Herbert
In-Reply-To: <20170829171942.8974-1-tom@quantonium.net>

Add support for UDP-specific flow dissection. This is primarily
intended for dissecting packets that are carried in a UDP
encapsulation.

This patch adds a flow_dissect offload for UDP4 and UDP6. The backend
function performs a socket lookup and calls the flow_dissect function
if a socket is found.
---
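The lookup-then-dispatch pattern described above can be modelled outside
the kernel as follows. This is a simplified sketch under stated
assumptions: struct demo_sock, the linear port lookup and the callback
signature are hypothetical stand-ins for struct udp_sock, the
udp_lookup_t function and the real flow_dissect prototype.

```c
#include <stddef.h>
#include <stdint.h>

enum flow_dissect_ret {
	FLOW_DISSECT_RET_OUT_GOOD,
	FLOW_DISSECT_RET_OUT_BAD,
	FLOW_DISSECT_RET_PROTO_AGAIN,
	FLOW_DISSECT_RET_IPPROTO_AGAIN,
	FLOW_DISSECT_RET_CONTINUE,
};

/* Hypothetical socket: a port plus an optional per-socket callback. */
struct demo_sock {
	uint16_t port;
	enum flow_dissect_ret (*flow_dissect)(const struct demo_sock *sk,
					      int *p_nhoff);
};

/* Hypothetical lookup: linear search by destination port. */
const struct demo_sock *demo_lookup(const struct demo_sock *tbl, size_t n,
				    uint16_t dport)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i].port == dport)
			return &tbl[i];
	return NULL;
}

/* No socket, or a socket without a callback, means "continue as before". */
enum flow_dissect_ret demo_udp_flow_dissect(const struct demo_sock *tbl,
					    size_t n, uint16_t dport,
					    int *p_nhoff)
{
	const struct demo_sock *sk = demo_lookup(tbl, n, dport);

	if (sk && sk->flow_dissect)
		return sk->flow_dissect(sk, p_nhoff);
	return FLOW_DISSECT_RET_CONTINUE;
}

/* Example callback in the spirit of fou_flow_dissect: skip the UDP header. */
enum flow_dissect_ret demo_skip_udp(const struct demo_sock *sk, int *p_nhoff)
{
	(void)sk;
	*p_nhoff += 8;
	return FLOW_DISSECT_RET_IPPROTO_AGAIN;
}
```

The default of FLOW_DISSECT_RET_CONTINUE keeps plain (non-tunnel) UDP
sockets on the existing port-extraction path.
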
 include/linux/udp.h      |  8 ++++++++
 include/net/udp.h        |  8 ++++++++
 include/net/udp_tunnel.h |  8 ++++++++
 net/ipv4/udp_offload.c   | 45 +++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/udp_tunnel.c    |  1 +
 net/ipv6/udp_offload.c   | 13 +++++++++++++
 6 files changed, 83 insertions(+)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index eaea63bc79bb..2e90b189ef6a 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -79,6 +79,14 @@ struct udp_sock {
 	int			(*gro_complete)(struct sock *sk,
 						struct sk_buff *skb,
 						int nhoff);
+	/* Flow dissector function for a UDP socket */
+	enum flow_dissect_ret (*flow_dissect)(struct sock *sk,
+			const struct sk_buff *skb,
+			struct flow_dissector_key_control *key_control,
+			struct flow_dissector *flow_dissector,
+			void *target_container, void *data,
+			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
+			int *p_hlen, unsigned int flags);
 
 	/* udp_recvmsg try to use this before splicing sk_receive_queue */
 	struct sk_buff_head	reader_queue ____cacheline_aligned_in_smp;
diff --git a/include/net/udp.h b/include/net/udp.h
index f3d1de6f0983..499e4faf8b14 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -174,6 +174,14 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
 				 struct udphdr *uh, udp_lookup_t lookup);
 int udp_gro_complete(struct sk_buff *skb, int nhoff, udp_lookup_t lookup);
 
+enum flow_dissect_ret udp_flow_dissect(const struct sk_buff *skb,
+			udp_lookup_t lookup,
+			struct flow_dissector_key_control *key_control,
+			struct flow_dissector *flow_dissector,
+			void *target_container, void *data,
+			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
+			int *p_hlen, unsigned int flags);
+
 static inline struct udphdr *udp_gro_udphdr(struct sk_buff *skb)
 {
 	struct udphdr *uh;
diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
index 10cce0dd4450..b7102e0f41a9 100644
--- a/include/net/udp_tunnel.h
+++ b/include/net/udp_tunnel.h
@@ -69,6 +69,13 @@ typedef struct sk_buff **(*udp_tunnel_gro_receive_t)(struct sock *sk,
 						     struct sk_buff *skb);
 typedef int (*udp_tunnel_gro_complete_t)(struct sock *sk, struct sk_buff *skb,
 					 int nhoff);
+typedef enum flow_dissect_ret (*udp_tunnel_flow_dissect_t)(struct sock *sk,
+			const struct sk_buff *skb,
+			struct flow_dissector_key_control *key_control,
+			struct flow_dissector *flow_dissector,
+			void *target_container, void *data,
+			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
+			int *p_hlen, unsigned int flags);
 
 struct udp_tunnel_sock_cfg {
 	void *sk_user_data;     /* user data used by encap_rcv call back */
@@ -78,6 +85,7 @@ struct udp_tunnel_sock_cfg {
 	udp_tunnel_encap_destroy_t encap_destroy;
 	udp_tunnel_gro_receive_t gro_receive;
 	udp_tunnel_gro_complete_t gro_complete;
+	udp_tunnel_flow_dissect_t flow_dissect;
 };
 
 /* Setup the given (UDP) sock to receive UDP encapsulated packets */
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 97658bfc1b58..7f0a7ed4a6f7 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -328,11 +328,56 @@ static int udp4_gro_complete(struct sk_buff *skb, int nhoff)
 	return udp_gro_complete(skb, nhoff, udp4_lib_lookup_skb);
 }
 
+enum flow_dissect_ret udp_flow_dissect(const struct sk_buff *skb,
+			udp_lookup_t lookup,
+			struct flow_dissector_key_control *key_control,
+			struct flow_dissector *flow_dissector,
+			void *target_container, void *data,
+			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
+			int *p_hlen, unsigned int flags)
+{
+	enum flow_dissect_ret ret = FLOW_DISSECT_RET_CONTINUE;
+	struct udphdr *uh, _uh;
+	struct sock *sk;
+
+	uh = __skb_header_pointer(skb, *p_nhoff, sizeof(_uh), data,
+				  *p_hlen, &_uh);
+	if (!uh)
+		return FLOW_DISSECT_RET_OUT_BAD;
+
+	rcu_read_lock();
+
+	sk = (*lookup)(skb, uh->source, uh->dest);
+
+	if (sk && udp_sk(sk)->flow_dissect)
+		ret = udp_sk(sk)->flow_dissect(sk, skb, key_control,
+					       flow_dissector, target_container,
+					       data, p_proto, p_ip_proto,
+					       p_nhoff, p_hlen, flags);
+	rcu_read_unlock();
+
+	return ret;
+}
+EXPORT_SYMBOL(udp_flow_dissect);
+
+static enum flow_dissect_ret udp4_flow_dissect(const struct sk_buff *skb,
+			struct flow_dissector_key_control *key_control,
+			struct flow_dissector *flow_dissector,
+			void *target_container, void *data,
+			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
+			int *p_hlen, unsigned int flags)
+{
+	return udp_flow_dissect(skb, udp4_lib_lookup_skb, key_control,
+				flow_dissector, target_container, data,
+				p_proto, p_ip_proto, p_nhoff, p_hlen, flags);
+}
+
 static const struct net_offload udpv4_offload = {
 	.callbacks = {
 		.gso_segment = udp4_tunnel_segment,
 		.gro_receive  =	udp4_gro_receive,
 		.gro_complete =	udp4_gro_complete,
+		.flow_dissect = udp4_flow_dissect,
 	},
 };
 
diff --git a/net/ipv4/udp_tunnel.c b/net/ipv4/udp_tunnel.c
index 6539ff15e9a3..a4eec2a044d2 100644
--- a/net/ipv4/udp_tunnel.c
+++ b/net/ipv4/udp_tunnel.c
@@ -71,6 +71,7 @@ void setup_udp_tunnel_sock(struct net *net, struct socket *sock,
 	udp_sk(sk)->encap_destroy = cfg->encap_destroy;
 	udp_sk(sk)->gro_receive = cfg->gro_receive;
 	udp_sk(sk)->gro_complete = cfg->gro_complete;
+	udp_sk(sk)->flow_dissect = cfg->flow_dissect;
 
 	udp_tunnel_encap_enable(sock);
 }
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 455fd4e39333..99ade504eaf7 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -73,11 +73,24 @@ static int udp6_gro_complete(struct sk_buff *skb, int nhoff)
 	return udp_gro_complete(skb, nhoff, udp6_lib_lookup_skb);
 }
 
+static enum flow_dissect_ret udp6_flow_dissect(const struct sk_buff *skb,
+			struct flow_dissector_key_control *key_control,
+			struct flow_dissector *flow_dissector,
+			void *target_container, void *data,
+			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
+			int *p_hlen, unsigned int flags)
+{
+	return udp_flow_dissect(skb, udp6_lib_lookup_skb, key_control,
+				flow_dissector, target_container, data,
+				p_proto, p_ip_proto, p_nhoff, p_hlen, flags);
+}
+
 static const struct net_offload udpv6_offload = {
 	.callbacks = {
 		.gso_segment	=	udp6_tunnel_segment,
 		.gro_receive	=	udp6_gro_receive,
 		.gro_complete	=	udp6_gro_complete,
+		.flow_dissect	=	udp6_flow_dissect,
 	},
 };
 
-- 
2.11.0

* [PATCH net-next 3/6] flow_dissector: Add protocol specific flow dissection offload
From: Tom Herbert @ 2017-08-29 17:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Tom Herbert
In-Reply-To: <20170829171942.8974-1-tom@quantonium.net>

Add offload capability for performing protocol specific flow dissection
(either by EtherType or IP protocol).

Specifically:

- Add flow_dissect to offload callbacks
- Move flow_dissect_ret enum to flow_dissector.h, clean up the names and
  add a couple of values
- Create GOTO_BY_RESULT macro to use in the main flow dissector switch to
  simplify handling of functions that return flow_dissect_ret enum
- In __skb_flow_dissect, add default case for switch(proto) as well as
  switch(ip_proto) that looks up and calls protocol specific flow
  dissection
---
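As a reading aid, the difference between GOTO_BY_RESULT and
GOTO_OR_CONT_BY_RESULT can be modelled in plain C. The sketch below is a
userspace model, not the kernel code: the goto labels are represented by
a hypothetical enum demo_target, and the demo_ function names are
assumptions made for the example.

```c
/* Mirrors the enum that this patch moves to flow_dissector.h. */
enum flow_dissect_ret {
	FLOW_DISSECT_RET_OUT_GOOD,
	FLOW_DISSECT_RET_OUT_BAD,
	FLOW_DISSECT_RET_PROTO_AGAIN,
	FLOW_DISSECT_RET_IPPROTO_AGAIN,
	FLOW_DISSECT_RET_CONTINUE,
};

/* Hypothetical stand-ins for the goto labels in __skb_flow_dissect. */
enum demo_target {
	DEMO_OUT_GOOD,
	DEMO_OUT_BAD,
	DEMO_PROTO_AGAIN,
	DEMO_IP_PROTO_AGAIN,
	DEMO_FALL_THROUGH,	/* no jump: execution continues after the macro */
};

/* Models GOTO_BY_RESULT: every value jumps; unlisted ones go to out_bad. */
enum demo_target demo_goto_by_result(enum flow_dissect_ret ret)
{
	switch (ret) {
	case FLOW_DISSECT_RET_OUT_GOOD:
		return DEMO_OUT_GOOD;
	case FLOW_DISSECT_RET_PROTO_AGAIN:
		return DEMO_PROTO_AGAIN;
	case FLOW_DISSECT_RET_IPPROTO_AGAIN:
		return DEMO_IP_PROTO_AGAIN;
	case FLOW_DISSECT_RET_OUT_BAD:
	default:
		return DEMO_OUT_BAD;
	}
}

/* Models GOTO_OR_CONT_BY_RESULT: CONTINUE is the only value that stays. */
enum demo_target demo_goto_or_cont_by_result(enum flow_dissect_ret ret)
{
	if (ret != FLOW_DISSECT_RET_CONTINUE)
		return demo_goto_by_result(ret);
	return DEMO_FALL_THROUGH;
}
```

In this model, a callback that returns FLOW_DISSECT_RET_CONTINUE ends in
out_bad when dispatched by EtherType, but falls through to the normal
port handling when dispatched by IP protocol.
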
 include/linux/netdevice.h    |   7 +++
 include/net/flow_dissector.h |   9 +++
 net/core/dev.c               |  14 +++++
 net/core/flow_dissector.c    | 132 +++++++++++++++++++++++++++++++------------
 net/ipv4/route.c             |   4 +-
 5 files changed, 128 insertions(+), 38 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c5475b37a631..90ccb434e127 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2208,6 +2208,12 @@ struct offload_callbacks {
 	struct sk_buff		**(*gro_receive)(struct sk_buff **head,
 						 struct sk_buff *skb);
 	int			(*gro_complete)(struct sk_buff *skb, int nhoff);
+	enum flow_dissect_ret (*flow_dissect)(const struct sk_buff *skb,
+			struct flow_dissector_key_control *key_control,
+			struct flow_dissector *flow_dissector,
+			void *target_container, void *data,
+			__be16 *p_proto, u8 *p_ip_proto, int *p_nhoff,
+			int *p_hlen, unsigned int flags);
 };
 
 struct packet_offload {
@@ -3253,6 +3259,7 @@ struct sk_buff *napi_get_frags(struct napi_struct *napi);
 gro_result_t napi_gro_frags(struct napi_struct *napi);
 struct packet_offload *gro_find_receive_by_type(__be16 type);
 struct packet_offload *gro_find_complete_by_type(__be16 type);
+struct packet_offload *flow_dissect_find_by_type(__be16 type);
 
 static inline void napi_free_frags(struct napi_struct *napi)
 {
diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index e2663e900b0a..ad75bbfd1c9c 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -19,6 +19,14 @@ struct flow_dissector_key_control {
 #define FLOW_DIS_FIRST_FRAG	BIT(1)
 #define FLOW_DIS_ENCAPSULATION	BIT(2)
 
+enum flow_dissect_ret {
+	FLOW_DISSECT_RET_OUT_GOOD,
+	FLOW_DISSECT_RET_OUT_BAD,
+	FLOW_DISSECT_RET_PROTO_AGAIN,
+	FLOW_DISSECT_RET_IPPROTO_AGAIN,
+	FLOW_DISSECT_RET_CONTINUE,
+};
+
 /**
  * struct flow_dissector_key_basic:
  * @thoff: Transport header offset
@@ -205,6 +213,7 @@ enum flow_dissector_key_id {
 #define FLOW_DISSECTOR_F_STOP_AT_L3		BIT(1)
 #define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL	BIT(2)
 #define FLOW_DISSECTOR_F_STOP_AT_ENCAP		BIT(3)
+#define FLOW_DISSECTOR_F_STOP_AT_L4		BIT(4)
 
 struct flow_dissector_key {
 	enum flow_dissector_key_id key_id;
diff --git a/net/core/dev.c b/net/core/dev.c
index 270b54754821..22ea8daa930c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4860,6 +4860,20 @@ struct packet_offload *gro_find_receive_by_type(__be16 type)
 }
 EXPORT_SYMBOL(gro_find_receive_by_type);
 
+struct packet_offload *flow_dissect_find_by_type(__be16 type)
+{
+	struct list_head *offload_head = &offload_base;
+	struct packet_offload *ptype;
+
+	list_for_each_entry_rcu(ptype, offload_head, list) {
+		if (ptype->type != type || !ptype->callbacks.flow_dissect)
+			continue;
+		return ptype;
+	}
+	return NULL;
+}
+EXPORT_SYMBOL(flow_dissect_find_by_type);
+
 struct packet_offload *gro_find_complete_by_type(__be16 type)
 {
 	struct list_head *offload_head = &offload_base;
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 12302acdb073..6a2cf240069a 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -9,6 +9,7 @@
 #include <net/ipv6.h>
 #include <net/gre.h>
 #include <net/pptp.h>
+#include <net/protocol.h>
 #include <linux/igmp.h>
 #include <linux/icmp.h>
 #include <linux/sctp.h>
@@ -115,12 +116,6 @@ __be32 __skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto,
 }
 EXPORT_SYMBOL(__skb_flow_get_ports);
 
-enum flow_dissect_ret {
-	FLOW_DISSECT_RET_OUT_GOOD,
-	FLOW_DISSECT_RET_OUT_BAD,
-	FLOW_DISSECT_RET_OUT_PROTO_AGAIN,
-};
-
 static enum flow_dissect_ret
 __skb_flow_dissect_mpls(const struct sk_buff *skb,
 			struct flow_dissector *flow_dissector,
@@ -322,7 +317,7 @@ __skb_flow_dissect_gre(const struct sk_buff *skb,
 	if (flags & FLOW_DISSECTOR_F_STOP_AT_ENCAP)
 		return FLOW_DISSECT_RET_OUT_GOOD;
 
-	return FLOW_DISSECT_RET_OUT_PROTO_AGAIN;
+	return FLOW_DISSECT_RET_PROTO_AGAIN;
 }
 
 static void
@@ -383,6 +378,27 @@ __skb_flow_dissect_ipv6(const struct sk_buff *skb,
 	key_ip->ttl = iph->hop_limit;
 }
 
+#define GOTO_BY_RESULT(ret) do {				\
+	switch (ret) {						\
+	case FLOW_DISSECT_RET_OUT_GOOD:				\
+		goto out_good;					\
+	case FLOW_DISSECT_RET_PROTO_AGAIN:			\
+		goto proto_again;				\
+	case FLOW_DISSECT_RET_IPPROTO_AGAIN:			\
+		goto ip_proto_again;				\
+	case FLOW_DISSECT_RET_OUT_BAD:				\
+	default:						\
+		goto out_bad;					\
+	}							\
+} while (0)
+
+#define GOTO_OR_CONT_BY_RESULT(ret) do {			\
+	enum flow_dissect_ret __ret = (ret);			\
+								\
+	if (__ret != FLOW_DISSECT_RET_CONTINUE)			\
+		GOTO_BY_RESULT(__ret);				\
+} while (0)
+
 /**
  * __skb_flow_dissect - extract the flow_keys struct and return it
  * @skb: sk_buff to extract the flow from, can be NULL if the rest are specified
@@ -659,15 +675,10 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 	case htons(ETH_P_MPLS_UC):
 	case htons(ETH_P_MPLS_MC):
 mpls:
-		switch (__skb_flow_dissect_mpls(skb, flow_dissector,
-						target_container, data,
-						nhoff, hlen)) {
-		case FLOW_DISSECT_RET_OUT_GOOD:
-			goto out_good;
-		case FLOW_DISSECT_RET_OUT_BAD:
-		default:
-			goto out_bad;
-		}
+		GOTO_BY_RESULT(__skb_flow_dissect_mpls(skb, flow_dissector,
+						       target_container, data,
+						       nhoff, hlen));
+
 	case htons(ETH_P_FCOE):
 		if ((hlen - nhoff) < FCOE_HEADER_LEN)
 			goto out_bad;
@@ -677,32 +688,44 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 
 	case htons(ETH_P_ARP):
 	case htons(ETH_P_RARP):
-		switch (__skb_flow_dissect_arp(skb, flow_dissector,
-					       target_container, data,
-					       nhoff, hlen)) {
-		case FLOW_DISSECT_RET_OUT_GOOD:
-			goto out_good;
-		case FLOW_DISSECT_RET_OUT_BAD:
-		default:
-			goto out_bad;
+		GOTO_BY_RESULT(__skb_flow_dissect_arp(skb, flow_dissector,
+						      target_container, data,
+						      nhoff, hlen));
+
+	default: {
+		struct packet_offload *ptype;
+		enum flow_dissect_ret ret;
+
+		rcu_read_lock();
+
+		ptype = flow_dissect_find_by_type(proto);
+
+		if (ptype) {
+			ret = ptype->callbacks.flow_dissect(skb, key_control,
+						flow_dissector,
+						target_container,
+						data, &proto, &ip_proto, &nhoff,
+						&hlen, flags);
+			rcu_read_unlock();
+
+			GOTO_BY_RESULT(ret);
+		} else {
+			rcu_read_unlock();
 		}
-	default:
+
 		goto out_bad;
 	}
+	}
 
 ip_proto_again:
 	switch (ip_proto) {
 	case IPPROTO_GRE:
-		switch (__skb_flow_dissect_gre(skb, key_control, flow_dissector,
-					       target_container, data,
-					       &proto, &nhoff, &hlen, flags)) {
-		case FLOW_DISSECT_RET_OUT_GOOD:
-			goto out_good;
-		case FLOW_DISSECT_RET_OUT_BAD:
-			goto out_bad;
-		case FLOW_DISSECT_RET_OUT_PROTO_AGAIN:
-			goto proto_again;
-		}
+		GOTO_BY_RESULT(__skb_flow_dissect_gre(skb, key_control,
+						      flow_dissector,
+						      target_container, data,
+						      &proto, &nhoff, &hlen,
+						      flags));
+
 	case NEXTHDR_HOP:
 	case NEXTHDR_ROUTING:
 	case NEXTHDR_DEST: {
@@ -768,9 +791,43 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 		__skb_flow_dissect_tcp(skb, flow_dissector, target_container,
 				       data, nhoff, hlen);
 		break;
-	default:
+	default: {
+		const struct net_offload *ops = NULL;
+
+		if (flags & FLOW_DISSECTOR_F_STOP_AT_L4)
+			break;
+
+		rcu_read_lock();
+
+		switch (proto) {
+		case htons(ETH_P_IP):
+			ops = rcu_dereference(inet_offloads[ip_proto]);
+			break;
+		case htons(ETH_P_IPV6):
+			ops = rcu_dereference(inet6_offloads[ip_proto]);
+			break;
+		default:
+			break;
+		}
+
+		if (ops && ops->callbacks.flow_dissect) {
+			enum flow_dissect_ret ret;
+
+			ret = ops->callbacks.flow_dissect(skb, key_control,
+						flow_dissector,
+						target_container,
+						data, &proto, &ip_proto, &nhoff,
+						&hlen, flags);
+			rcu_read_unlock();
+
+			GOTO_OR_CONT_BY_RESULT(ret);
+		} else {
+			rcu_read_unlock();
+		}
+
 		break;
 	}
+	}
 
 	if (dissector_uses_key(flow_dissector,
 			       FLOW_DISSECTOR_KEY_PORTS)) {
@@ -935,7 +992,8 @@ static inline u32 ___skb_get_hash(const struct sk_buff *skb,
 				  struct flow_keys *keys, u32 keyval)
 {
 	skb_flow_dissect_flow_keys(skb, keys,
-				   FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL);
+				   FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL |
+				   FLOW_DISSECTOR_F_STOP_AT_L4);
 
 	return __flow_hash_from_keys(keys, keyval);
 }
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 94d4cd2d5ea4..85f12b8e0b7f 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1811,7 +1811,8 @@ int fib_multipath_hash(const struct fib_info *fi, const struct flowi4 *fl4,
 	case 1:
 		/* skb is currently provided only when forwarding */
 		if (skb) {
-			unsigned int flag = FLOW_DISSECTOR_F_STOP_AT_ENCAP;
+			unsigned int flag = FLOW_DISSECTOR_F_STOP_AT_ENCAP |
+					    FLOW_DISSECTOR_F_STOP_AT_L4;
 			struct flow_keys keys;
 
 			/* short-circuit if we already have L4 hash present */
-- 
2.11.0

* [PATCH net-next 2/6] udp: Constify skb argument in lookup functions
From: Tom Herbert @ 2017-08-29 17:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Tom Herbert
In-Reply-To: <20170829171942.8974-1-tom@quantonium.net>

For UDP socket lookup functions, and associated functions that take an
skb as an argument, declare the skb argument as const.

One caveat is that reuseport_select_sock can be called from the UDP
lookup functions with an skb argument. This function temporarily
modifies the skb data pointer (in run_bpf, via a pull/push sequence).
To avoid a compiler warning, run_bpf now declares a local skb pointer
that is not const and is assigned from the skb argument with an
explicit cast.
---
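The cast described above follows a common C pattern, sketched here in
userspace form. struct demo_buf and demo_run_filter are hypothetical and
only illustrate the pull/restore idea; they are not the sk_buff API.
Writing through the cast pointer is only well defined when the
underlying object was not itself declared const, and the temporary
modification is visible to any concurrent reader.

```c
#include <stddef.h>

/* Hypothetical buffer: a view onto packet bytes. */
struct demo_buf {
	const unsigned char *data;
	size_t len;
};

/*
 * Takes a const pointer for the benefit of callers, but temporarily
 * advances past hdr_len bytes ("pull") and restores the view ("push")
 * before returning. Returns the first payload byte, 0 for an empty
 * payload, or -1 if the header does not fit.
 */
int demo_run_filter(const struct demo_buf *_buf, size_t hdr_len)
{
	struct demo_buf *buf = (struct demo_buf *)_buf; /* override const */
	int verdict;

	if (hdr_len > buf->len)
		return -1;

	buf->data += hdr_len;		/* pull */
	buf->len -= hdr_len;

	verdict = buf->len ? buf->data[0] : 0;

	buf->data -= hdr_len;		/* push: restore the original view */
	buf->len += hdr_len;

	return verdict;
}
```

From the caller's point of view the buffer is unchanged on return, which
is what makes the const-qualified prototype reasonable.
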
 include/net/ip.h             |  2 +-
 include/net/sock_reuseport.h |  2 +-
 include/net/udp.h            | 11 ++++++-----
 net/core/sock_reuseport.c    |  5 +++--
 net/ipv4/udp.c               | 11 ++++++-----
 net/ipv6/udp.c               | 10 +++++-----
 6 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 9896f46cbbf1..8c0d84ffc659 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -79,7 +79,7 @@ struct ipcm_cookie {
 #define PKTINFO_SKB_CB(skb) ((struct in_pktinfo *)((skb)->cb))
 
 /* return enslaved device index if relevant */
-static inline int inet_sdif(struct sk_buff *skb)
+static inline int inet_sdif(const struct sk_buff *skb)
 {
 #if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
 	if (skb && ipv4_l3mdev_skb(IPCB(skb)->flags))
diff --git a/include/net/sock_reuseport.h b/include/net/sock_reuseport.h
index aecd30308d50..d25352a848d9 100644
--- a/include/net/sock_reuseport.h
+++ b/include/net/sock_reuseport.h
@@ -20,7 +20,7 @@ extern int reuseport_add_sock(struct sock *sk, struct sock *sk2);
 extern void reuseport_detach_sock(struct sock *sk);
 extern struct sock *reuseport_select_sock(struct sock *sk,
 					  u32 hash,
-					  struct sk_buff *skb,
+					  const struct sk_buff *skb,
 					  int hdr_len);
 extern struct bpf_prog *reuseport_attach_prog(struct sock *sk,
 					      struct bpf_prog *prog);
diff --git a/include/net/udp.h b/include/net/udp.h
index 4e5f23fec35e..f3d1de6f0983 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -167,7 +167,7 @@ static inline void udp_csum_pull_header(struct sk_buff *skb)
 	UDP_SKB_CB(skb)->cscov -= sizeof(struct udphdr);
 }
 
-typedef struct sock *(*udp_lookup_t)(struct sk_buff *skb, __be16 sport,
+typedef struct sock *(*udp_lookup_t)(const struct sk_buff *skb, __be16 sport,
 				     __be16 dport);
 
 struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
@@ -288,8 +288,9 @@ struct sock *udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
 			     __be32 daddr, __be16 dport, int dif);
 struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
 			       __be32 daddr, __be16 dport, int dif, int sdif,
-			       struct udp_table *tbl, struct sk_buff *skb);
-struct sock *udp4_lib_lookup_skb(struct sk_buff *skb,
+			       struct udp_table *tbl,
+			       const struct sk_buff *skb);
+struct sock *udp4_lib_lookup_skb(const struct sk_buff *skb,
 				 __be16 sport, __be16 dport);
 struct sock *udp6_lib_lookup(struct net *net,
 			     const struct in6_addr *saddr, __be16 sport,
@@ -299,8 +300,8 @@ struct sock *__udp6_lib_lookup(struct net *net,
 			       const struct in6_addr *saddr, __be16 sport,
 			       const struct in6_addr *daddr, __be16 dport,
 			       int dif, int sdif, struct udp_table *tbl,
-			       struct sk_buff *skb);
-struct sock *udp6_lib_lookup_skb(struct sk_buff *skb,
+			       const struct sk_buff *skb);
+struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb,
 				 __be16 sport, __be16 dport);
 
 /* UDP uses skb->dev_scratch to cache as much information as possible and avoid
diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
index eed1ebf7f29d..a17f13b33189 100644
--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -164,9 +164,10 @@ void reuseport_detach_sock(struct sock *sk)
 EXPORT_SYMBOL(reuseport_detach_sock);
 
 static struct sock *run_bpf(struct sock_reuseport *reuse, u16 socks,
-			    struct bpf_prog *prog, struct sk_buff *skb,
+			    struct bpf_prog *prog, const struct sk_buff *_skb,
 			    int hdr_len)
 {
+	struct sk_buff *skb = (struct sk_buff *)_skb; /* Override const */
 	struct sk_buff *nskb = NULL;
 	u32 index;
 
@@ -205,7 +206,7 @@ static struct sock *run_bpf(struct sock_reuseport *reuse, u16 socks,
  */
 struct sock *reuseport_select_sock(struct sock *sk,
 				   u32 hash,
-				   struct sk_buff *skb,
+				   const struct sk_buff *skb,
 				   int hdr_len)
 {
 	struct sock_reuseport *reuse;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index bf6c406bf5e7..a851026ef28b 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -135,7 +135,8 @@ EXPORT_SYMBOL(udp_memory_allocated);
 #define PORTS_PER_CHAIN (MAX_UDP_PORTS / UDP_HTABLE_SIZE_MIN)
 
 /* IPCB reference means this can not be used from early demux */
-static bool udp_lib_exact_dif_match(struct net *net, struct sk_buff *skb)
+static bool udp_lib_exact_dif_match(struct net *net,
+				    const struct sk_buff *skb)
 {
 #if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
 	if (!net->ipv4.sysctl_udp_l3mdev_accept &&
@@ -445,7 +446,7 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 				     __be32 daddr, unsigned int hnum,
 				     int dif, int sdif, bool exact_dif,
 				     struct udp_hslot *hslot2,
-				     struct sk_buff *skb)
+				     const struct sk_buff *skb)
 {
 	struct sock *sk, *result;
 	int score, badness, matches = 0, reuseport = 0;
@@ -484,7 +485,7 @@ static struct sock *udp4_lib_lookup2(struct net *net,
  */
 struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 		__be16 sport, __be32 daddr, __be16 dport, int dif,
-		int sdif, struct udp_table *udptable, struct sk_buff *skb)
+		int sdif, struct udp_table *udptable, const struct sk_buff *skb)
 {
 	struct sock *sk, *result;
 	unsigned short hnum = ntohs(dport);
@@ -552,7 +553,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 }
 EXPORT_SYMBOL_GPL(__udp4_lib_lookup);
 
-static inline struct sock *__udp4_lib_lookup_skb(struct sk_buff *skb,
+static inline struct sock *__udp4_lib_lookup_skb(const struct sk_buff *skb,
 						 __be16 sport, __be16 dport,
 						 struct udp_table *udptable)
 {
@@ -563,7 +564,7 @@ static inline struct sock *__udp4_lib_lookup_skb(struct sk_buff *skb,
 				 inet_sdif(skb), udptable, skb);
 }
 
-struct sock *udp4_lib_lookup_skb(struct sk_buff *skb,
+struct sock *udp4_lib_lookup_skb(const struct sk_buff *skb,
 				 __be16 sport, __be16 dport)
 {
 	return __udp4_lib_lookup_skb(skb, sport, dport, &udp_table);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 976f30391356..e9aa4db3ba53 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -56,7 +56,7 @@
 #include <trace/events/skb.h>
 #include "udp_impl.h"
 
-static bool udp6_lib_exact_dif_match(struct net *net, struct sk_buff *skb)
+static bool udp6_lib_exact_dif_match(struct net *net, const struct sk_buff *skb)
 {
 #if defined(CONFIG_NET_L3_MASTER_DEV)
 	if (!net->ipv4.sysctl_udp_l3mdev_accept &&
@@ -181,7 +181,7 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 		const struct in6_addr *saddr, __be16 sport,
 		const struct in6_addr *daddr, unsigned int hnum,
 		int dif, int sdif, bool exact_dif,
-		struct udp_hslot *hslot2, struct sk_buff *skb)
+		struct udp_hslot *hslot2, const struct sk_buff *skb)
 {
 	struct sock *sk, *result;
 	int score, badness, matches = 0, reuseport = 0;
@@ -221,7 +221,7 @@ struct sock *__udp6_lib_lookup(struct net *net,
 			       const struct in6_addr *saddr, __be16 sport,
 			       const struct in6_addr *daddr, __be16 dport,
 			       int dif, int sdif, struct udp_table *udptable,
-			       struct sk_buff *skb)
+			       const struct sk_buff *skb)
 {
 	struct sock *sk, *result;
 	unsigned short hnum = ntohs(dport);
@@ -290,7 +290,7 @@ struct sock *__udp6_lib_lookup(struct net *net,
 }
 EXPORT_SYMBOL_GPL(__udp6_lib_lookup);
 
-static struct sock *__udp6_lib_lookup_skb(struct sk_buff *skb,
+static struct sock *__udp6_lib_lookup_skb(const struct sk_buff *skb,
 					  __be16 sport, __be16 dport,
 					  struct udp_table *udptable)
 {
@@ -301,7 +301,7 @@ static struct sock *__udp6_lib_lookup_skb(struct sk_buff *skb,
 				 inet6_sdif(skb), udptable, skb);
 }
 
-struct sock *udp6_lib_lookup_skb(struct sk_buff *skb,
+struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb,
 				 __be16 sport, __be16 dport)
 {
 	const struct ipv6hdr *iph = ipv6_hdr(skb);
-- 
2.11.0

* [PATCH net-next 1/6] flow_dissector: Move ETH_P_TEB processing to main switch
From: Tom Herbert @ 2017-08-29 17:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Tom Herbert
In-Reply-To: <20170829171942.8974-1-tom@quantonium.net>

Support for processing TEB is currently in GRE flow dissection as a
special case. This can be moved to be a case in the main proto switch
in __skb_flow_dissect.
---
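The new ETH_P_TEB case amounts to reading the inner Ethernet header,
taking its EtherType as the next protocol and advancing the offset by
the header length. A minimal userspace sketch of that step, assuming a
flat byte buffer in place of the skb and returning the EtherType in host
byte order (the kernel keeps it in network order); demo_dissect_teb is a
hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_HLEN	14	/* dst MAC (6) + src MAC (6) + EtherType (2) */

/*
 * Returns 0 and updates *nhoff and *proto on success, or -1 if the inner
 * Ethernet header is not fully contained in the first hlen bytes.
 */
int demo_dissect_teb(const uint8_t *data, size_t hlen, size_t *nhoff,
		     uint16_t *proto)
{
	const uint8_t *eth;

	if (*nhoff > hlen || hlen - *nhoff < DEMO_ETH_HLEN)
		return -1;

	eth = data + *nhoff;
	*proto = (uint16_t)((eth[12] << 8) | eth[13]);
	*nhoff += DEMO_ETH_HLEN;

	return 0;
}
```

After a successful call the caller would loop back to the protocol
switch, which corresponds to the goto proto_again in the patch.
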
 net/core/flow_dissector.c | 44 +++++++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index e2eaa1ff948d..12302acdb073 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -288,27 +288,8 @@ __skb_flow_dissect_gre(const struct sk_buff *skb,
 	if (hdr->flags & GRE_SEQ)
 		offset += sizeof(((struct pptp_gre_header *) 0)->seq);
 
-	if (gre_ver == 0) {
-		if (*p_proto == htons(ETH_P_TEB)) {
-			const struct ethhdr *eth;
-			struct ethhdr _eth;
-
-			eth = __skb_header_pointer(skb, *p_nhoff + offset,
-						   sizeof(_eth),
-						   data, *p_hlen, &_eth);
-			if (!eth)
-				return FLOW_DISSECT_RET_OUT_BAD;
-			*p_proto = eth->h_proto;
-			offset += sizeof(*eth);
-
-			/* Cap headers that we access via pointers at the
-			 * end of the Ethernet header as our maximum alignment
-			 * at that point is only 2 bytes.
-			 */
-			if (NET_IP_ALIGN)
-				*p_hlen = *p_nhoff + offset;
-		}
-	} else { /* version 1, must be PPTP */
+	/* version 1, must be PPTP */
+	if (gre_ver == 1) {
 		u8 _ppp_hdr[PPP_HDRLEN];
 		u8 *ppp_hdr;
 
@@ -573,6 +554,27 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 
 		break;
 	}
+	case htons(ETH_P_TEB): {
+		const struct ethhdr *eth;
+		struct ethhdr _eth;
+
+		eth = __skb_header_pointer(skb, nhoff, sizeof(_eth),
+					   data, hlen, &_eth);
+		if (!eth)
+			goto out_bad;
+
+		proto = eth->h_proto;
+		nhoff += sizeof(*eth);
+
+		/* Cap headers that we access via pointers at the
+		 * end of the Ethernet header as our maximum alignment
+		 * at that point is only 2 bytes.
+		 */
+		if (NET_IP_ALIGN)
+			hlen = nhoff;
+
+		goto proto_again;
+	}
 	case htons(ETH_P_8021AD):
 	case htons(ETH_P_8021Q): {
 		const struct vlan_hdr *vlan;
-- 
2.11.0

* [PATCH net-next 0/6] flow_dissector: Protocol specific flow dissector offload
From: Tom Herbert @ 2017-08-29 17:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Tom Herbert

This patch set adds a new offload type to perform flow dissection for
specific protocols (either by EtherType or by IP protocol). This is
primarily useful to crack open UDP encapsulations (like VXLAN, GUE) for
the purposes of parsing the encapsulated packet.

Items in this patch set:
- Constify skb argument to UDP lookup functions
- Create a new protocol case in __skb_flow_dissect for ETH_P_TEB. This
  is based on the code in the GRE dissect function, so the special
  handling in GRE can now be removed (GRE sets the protocol to
  ETH_P_TEB and returns, so that goto proto_again is done)
- Add infrastructure for protocol specific flow dissection offload
- Add infrastructure to perform UDP flow dissection. This uses the same
  model as GRO, where a flow_dissect callback can be associated with a
  UDP socket
- Use the infrastructure to support flow dissection of VXLAN and GUE
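
For the VXLAN case, the dissector has to translate the VXLAN-GPE Next
Protocol field into an EtherType before looping back into the protocol
switch. A small userspace sketch of that mapping follows; the DEMO_
constants carry the values I believe the kernel headers use, and
demo_gpe_to_ethertype is a hypothetical helper, not a function from
this series.

```c
#include <stdint.h>

/* VXLAN-GPE Next Protocol values. */
#define DEMO_GPE_NP_IPV4	0x01
#define DEMO_GPE_NP_IPV6	0x02
#define DEMO_GPE_NP_ETHERNET	0x03

/* EtherTypes, in host byte order for this sketch. */
#define DEMO_ETH_P_IP		0x0800
#define DEMO_ETH_P_IPV6		0x86DD
#define DEMO_ETH_P_TEB		0x6558

/*
 * Returns the EtherType to continue dissection with, or 0 when the
 * Next Protocol is not handled (the patch returns
 * FLOW_DISSECT_RET_CONTINUE in that case).
 */
uint16_t demo_gpe_to_ethertype(uint8_t next_protocol)
{
	switch (next_protocol) {
	case DEMO_GPE_NP_IPV4:
		return DEMO_ETH_P_IP;
	case DEMO_GPE_NP_IPV6:
		return DEMO_ETH_P_IPV6;
	case DEMO_GPE_NP_ETHERNET:
		return DEMO_ETH_P_TEB;
	default:
		return 0;
	}
}
```

Plain (non-GPE) VXLAN always carries an Ethernet frame, which is why
the patch defaults the protocol to ETH_P_TEB.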

Tested:

Forced RPS to call flow dissection for VXLAN, FOU, and GUE. Observed
that inner packet was being properly dissected.

Tom Herbert (6):
  flow_dissector: Move ETH_P_TEB processing to main switch
  udp: Constify skb argument in lookup functions
  flow_dissector: Add protocol specific flow dissection offload
  udp: flow dissector offload
  fou: Support flow dissection
  vxlan: support flow dissect

 drivers/net/vxlan.c          |  50 ++++++++++++
 include/linux/netdevice.h    |   7 ++
 include/linux/udp.h          |   8 ++
 include/net/flow_dissector.h |   9 +++
 include/net/ip.h             |   2 +-
 include/net/sock_reuseport.h |   2 +-
 include/net/udp.h            |  19 +++--
 include/net/udp_tunnel.h     |   8 ++
 net/core/dev.c               |  14 ++++
 net/core/flow_dissector.c    | 176 +++++++++++++++++++++++++++++--------------
 net/core/sock_reuseport.c    |   5 +-
 net/ipv4/fou.c               |  63 ++++++++++++++++
 net/ipv4/route.c             |   4 +-
 net/ipv4/udp.c               |  11 +--
 net/ipv4/udp_offload.c       |  45 +++++++++++
 net/ipv4/udp_tunnel.c        |   1 +
 net/ipv6/udp.c               |  10 +--
 net/ipv6/udp_offload.c       |  13 ++++
 18 files changed, 369 insertions(+), 78 deletions(-)

-- 
2.11.0

^ permalink raw reply

* Re: [PATCH] dt-binding: net/phy: fix interrupts description
From: Rob Herring @ 2017-08-29 17:15 UTC (permalink / raw)
  To: Baruch Siach
  Cc: David S. Miller, devicetree-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Mark Rutland
In-Reply-To: <20170828102039.wdylojqcrn2avv2f@tarshish>

On Mon, Aug 28, 2017 at 01:20:39PM +0300, Baruch Siach wrote:
> Hi Dave,
> 
> On Wed, Aug 23, 2017 at 09:11:00AM +0300, Baruch Siach wrote:
> > Commit b053dc5a722ea (powerpc: Refactor device tree binding) split the
> > Ethernet PHY binding documentation out of the big booting-without-of.txt
> > file, leaving a dangling reference to "section 2" in the 'interrupts'
> > property description. Drop that reference, and make the description look
> > more like the rest.
> > 
> > While at it, make the example interrupt-parent phandle look more like a
> > real world phandle, and use an IRQ_TYPE_ macro for the 'interrupts'
> > type.
> 
> This patch is now marked 'Not Applicable' in the netdev patchwork. Why is 
> that? Should it go through some other tree?

If it is only a binding change, I can apply. Though David does often 
pick them up anyway.

Rob

^ permalink raw reply

* Re: [PATCH] dt-binding: net/phy: fix interrupts description
From: Rob Herring @ 2017-08-29 17:14 UTC (permalink / raw)
  To: Baruch Siach; +Cc: Mark Rutland, David S . Miller, devicetree, netdev
In-Reply-To: <b3756346473feadfeba70ecb71960cad48e66621.1503468660.git.baruch@tkos.co.il>

On Wed, Aug 23, 2017 at 09:11:00AM +0300, Baruch Siach wrote:
> Commit b053dc5a722ea (powerpc: Refactor device tree binding) split the
> Ethernet PHY binding documentation out of the big booting-without-of.txt
> file, leaving a dangling reference to "section 2" in the 'interrupts'
> property description. Drop that reference, and make the description look
> more like the rest.
> 
> While at it, make the example interrupt-parent phandle look more like a
> real world phandle, and use an IRQ_TYPE_ macro for the 'interrupts'
> type.
> 
> Signed-off-by: Baruch Siach <baruch@tkos.co.il>
> ---
>  Documentation/devicetree/bindings/net/phy.txt | 10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)

Applied.

Rob

^ permalink raw reply

* Re: [PATCH net-next 3/4] net: add NSH header structures and helpers
From: Jiri Benc @ 2017-08-29 17:10 UTC (permalink / raw)
  To: netdev; +Cc: Yi Yang, Eric Garver, Jan Scheurich, Ben Pfaff
In-Reply-To: <4abd24a9ec958622186efacb4e46d709831fbaae.1503948295.git.jbenc@redhat.com>

On Mon, 28 Aug 2017 21:43:23 +0200, Jiri Benc wrote:
> This patch adds NSH header structures and helpers for NSH GSO
> support and Open vSwitch NSH support.
> 
> [1] https://datatracker.ietf.org/doc/draft-ietf-sfc-nsh/

One thing to know before applying this to the kernel: NSH is still a
draft. It's not standardized yet. And the draft evolves, compare the
version from February to the latest one:

https://tools.ietf.org/html/draft-ietf-sfc-nsh-12#section-3.2
https://tools.ietf.org/html/draft-ietf-sfc-nsh-19#section-2.2

There's really no guarantee there won't be further changes.

This patchset by itself is harmless as there's no user and everything
can be changed. We should think through any uAPIs we're going to merge,
though. We don't want to repeat the problems with VXLAN and changed UDP
port.

Again, this patchset is harmless and can be applied even if the draft
changes. Further patches may not be and need to be designed very
carefully.

 Jiri

^ permalink raw reply

* Re: [PATCH net-next 7/7] samples/bpf: xdp_monitor tool based on tracepoints
From: Alexei Starovoitov @ 2017-08-29 17:05 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: netdev, John Fastabend
In-Reply-To: <150401749138.16384.17129327124102881342.stgit@firesoul>

On Tue, Aug 29, 2017 at 04:38:11PM +0200, Jesper Dangaard Brouer wrote:
> This tool xdp_monitor demonstrates how to use the different xdp_redirect
> tracepoints xdp_redirect{,_map}{,_err} from a BPF program.
> 
> The default mode is to only monitor the error counters, to avoid
> affecting the per packet performance. Tracepoints come with a base
> overhead of 25 nanosec for an attached bpf_prog, and 48 nanosec for
> using a full perf record (with non-matching filter).  Thus, loading
> the --stats mode by default could affect the maximum performance.
> 
> This version of the tool is very simple and counts all types of errors
> as one.  It will be natural to extend this later with the different
> types of errors that can occur, which should help users quickly
> identify common mistakes.
> 
> Because the TP_STRUCT was kept in sync, all the tracepoints load the
> same BPF code.  It would also be natural to extend the map version to
> demonstrate how the map information could be used.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Nice. Did you consider using libbpf (instead of the old bpf_load.c hack)
and making a full standalone tool out of it? Looks very useful.
Acked-by: Alexei Starovoitov <ast@kernel.org>

^ permalink raw reply

* Re: [PATCH net-next 4/7] xdp: separate xdp_redirect tracepoint in error case
From: Alexei Starovoitov @ 2017-08-29 17:02 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: netdev, John Fastabend
In-Reply-To: <150401747611.16384.8021135230122395742.stgit@firesoul>

On Tue, Aug 29, 2017 at 04:37:56PM +0200, Jesper Dangaard Brouer wrote:
> There is a need to separate the xdp_redirect tracepoint into two
> tracepoints, for separating the error case from the normal forward
> case.
> 
> Due to the extreme speeds XDP is operating at, loading a tracepoint
> has a measurable impact.  Single core XDP REDIRECT (ethtool tuned
> rx-usecs 25) can do 13.7 Mpps forwarding, but loading a simple
> bpf_prog at the tracepoint (with a return 0) reduces perf to 10.2 Mpps
> (CPU E5-1650 v4 @ 3.60GHz, driver: ixgbe)
> 
> The overhead of loading a bpf-based tracepoint can be calculated to
> cost 25 nanosec ((1/13782002-1/10267937)*10^9 = -24.83 ns).
> 
> Using perf record on the tracepoint event, with a non-matching --filter
> expression, the overhead is much larger. Performance drops to 8.3 Mpps,
> cost 48 nanosec ((1/13782002-1/8312497)*10^9 = -47.74 ns).
> 
> Having a separate tracepoint for err cases, which should be less
> frequent, allows running a continuous monitor for errors while not
> affecting the redirect forward performance (this has also been
> verified by measurements).
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Thanks for the detailed analysis of the performance impact of the changes.
Looks great to me.
Acked-by: Alexei Starovoitov <ast@kernel.org>

^ permalink raw reply
