Netdev List
* Re: [PATCH v2 net] nfp: cast sizeof() to int when comparing with error code
From: Joe Perches @ 2018-06-26  2:10 UTC (permalink / raw)
  To: Chengguang Xu, jakub.kicinski, davem, LKML, Julia Lawall, cocci
  Cc: oss-drivers, netdev, Dmitry Torokhov, linux-input, linux-s390
In-Reply-To: <20180626011631.22717-1-cgxu519@gmx.com>

On Tue, 2018-06-26 at 09:16 +0800, Chengguang Xu wrote:
> sizeof() returns an unsigned value, so in the error check a
> negative error code will always compare larger than sizeof().

This looks like a general class of error in the kernel
where a signed result that could be returning a -errno
is tested against < or <= sizeof()

A couple examples:

drivers/input/mouse/elan_i2c_smbus.c:

		len = i2c_smbus_read_block_data(client,
						ETP_SMBUS_IAP_PASSWORD_READ,
						val);
		if (len < sizeof(u16)) {

i2c_smbus_read_block_data can return a negative errno


net/smc/smc_clc.c:

	len = kernel_sendmsg(smc->clcsock, &msg, &vec, 1,
			     sizeof(struct smc_clc_msg_decline));
	if (len < sizeof(struct smc_clc_msg_decline))

where kernel_sendmsg can return a negative errno

There are probably others, I didn't look hard.

Perhaps a cocci script to find these could be generated?
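
To make the pitfall concrete, here is a minimal user-space sketch. It is
not kernel code, and the helper names are made up for illustration. It
mirrors the "len < sizeof(u16)" test from the elan_i2c example: the int is
converted to size_t before the comparison, so a negative errno looks like
a very large length and the "too short" branch is never taken.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring "if (len < sizeof(u16))".
 * The usual arithmetic conversions turn len into size_t before the
 * comparison, so a negative value becomes a huge unsigned one.
 */
int looks_too_short(int len)
{
	return len < sizeof(uint16_t);
}

/* Documents the observed behaviour of the check above. */
void show_pitfall(void)
{
	assert(looks_too_short(1));	/* a genuine short read is caught */
	assert(!looks_too_short(2));	/* a full read passes */
	assert(!looks_too_short(-5));	/* a -errno slips through as "long enough" */
}
```

Here -5 is only a stand-in for whatever -errno the callee returns.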


* [PATCH net-next] liquidio: fix kernel panic when NIC firmware is older than 1.7.2
From: Felix Manlunas @ 2018-06-26 11:58 UTC (permalink / raw)
  To: davem
  Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
	ricardo.farrington, felix.manlunas

From: Rick Farrington <ricardo.farrington@cavium.com>

Pre-1.7.2 NIC firmware does not support (and does not respond to) the "get
speed" command which is sent by the 1.7.2 driver during modprobe.  Due to a
bug in older firmware (with respect to unknown commands), this unsupported
command causes a cascade of errors that ends in a kernel panic.

Fix it by making the sending of the "get speed" command conditional on the
firmware version.

Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
---
Note: To avoid checkpatch.pl "WARNING: line over 80 characters", the comma
      that separates the arguments in the call to strcmp() was placed one
      line below the usual spot.

 drivers/net/ethernet/cavium/liquidio/lio_main.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 7cb4e75..f83f884 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -3671,7 +3671,16 @@ static int setup_nic_devices(struct octeon_device *octeon_dev)
 			OCTEON_CN2350_25GB_SUBSYS_ID ||
 		    octeon_dev->subsystem_id ==
 			OCTEON_CN2360_25GB_SUBSYS_ID) {
-			liquidio_get_speed(lio);
+			/* speed control unsupported in f/w older than 1.7.2 */
+			if (strcmp(octeon_dev->fw_info.liquidio_firmware_version
+			   , "1.7.2") < 0) {
+				dev_info(&octeon_dev->pci_dev->dev,
+					 "speed setting not supported by f/w.");
+				octeon_dev->speed_setting = 25;
+				octeon_dev->no_speed_setting = 1;
+			} else {
+				liquidio_get_speed(lio);
+			}
 
 			if (octeon_dev->speed_setting == 0) {
 				octeon_dev->speed_setting = 25;
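
A side note on the strcmp() test in the hunk above: strcmp() compares
bytes, not numbers, so it orders "1.6.1" before "1.7.2" as intended, but
it would also order a two-digit field such as "1.10.0" before "1.7.2".
Whether such firmware versions can exist is not stated in this patch. The
sketch below is user-space C with hypothetical helper names, and it
assumes a dotted major.minor.micro format. It shows one way to compare
the fields numerically instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse "major.minor.micro"; fields that are missing stay 0. */
static int parse_version(const char *s, unsigned int v[3])
{
	v[0] = v[1] = v[2] = 0;
	return sscanf(s, "%u.%u.%u", &v[0], &v[1], &v[2]);
}

/* Returns <0, 0 or >0 like strcmp(), but compares the fields as numbers. */
int fw_version_cmp(const char *a, const char *b)
{
	unsigned int av[3], bv[3];
	int i;

	assert(a != NULL && b != NULL);
	parse_version(a, av);
	parse_version(b, bv);

	for (i = 0; i < 3; i++) {
		if (av[i] != bv[i])
			return av[i] < bv[i] ? -1 : 1;
	}
	return 0;
}

/* Bytewise ordering, as used in the patch, for comparison. */
int fw_version_strcmp(const char *a, const char *b)
{
	return strcmp(a, b);
}
```

For "1.6.1" against "1.7.2" both helpers return a negative value; they
only disagree once a field grows past one digit.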


* Re: [PATCH v2 net] nfp: cast sizeof() to int when comparing with error code
From: Joe Perches @ 2018-06-26  2:29 UTC (permalink / raw)
  To: Chengguang Xu, jakub.kicinski, davem; +Cc: oss-drivers, netdev
In-Reply-To: <20180626011631.22717-1-cgxu519@gmx.com>

On Tue, 2018-06-26 at 09:16 +0800, Chengguang Xu wrote:
> sizeof() returns an unsigned value, so in the error check a
> negative error code will always compare larger than sizeof().
[]
> diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nffw.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nffw.c
[]
> @@ -232,7 +232,7 @@ struct nfp_nffw_info *nfp_nffw_info_open(struct nfp_cpp *cpp)
>  	err = nfp_cpp_read(cpp, nfp_resource_cpp_id(state->res),
>  			   nfp_resource_address(state->res),
>  			   fwinf, sizeof(*fwinf));
> -	if (err < sizeof(*fwinf))
> +	if (err < (int)sizeof(*fwinf))
>  		goto err_release;
>  
>  	if (!nffw_res_flg_init_get(fwinf))

The way this is done in several places in the kernel is
to test first for < 0 and then test for < sizeof

	if (err < 0 || err < sizeof(etc...)

see net/ceph/ceph_common.c etc...
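
For reference, the two styles side by side as a user-space sketch. The
function names and the 64-byte size are placeholders, not taken from the
nfp code. Both take the error path for a negative return and for a short
read, provided the size fits in an int.

```c
#include <assert.h>
#include <stddef.h>

/* Style of the v2 patch: cast the size so the comparison stays signed. */
int bad_read_cast(int err, size_t want)
{
	return err < (int)want;
}

/* Style suggested above: reject errors first, then check for a short read. */
int bad_read_split(int err, size_t want)
{
	return err < 0 || (size_t)err < want;
}

/* Both styles agree on error returns, short reads and full reads. */
void check_styles_agree(void)
{
	const size_t want = 64;		/* stand-in for sizeof(*fwinf) */
	const int samples[] = { -5, 0, 63, 64, 65 };
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(bad_read_cast(samples[i], want) ==
		       bad_read_split(samples[i], want));
}
```

The split form keeps the error case visible to the reader; the cast form
is shorter but relies on the size never exceeding INT_MAX.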


* Re: [PATCH v2 net] nfp: cast sizeof() to int when comparing with error code
From: cgxu519 @ 2018-06-26  2:48 UTC (permalink / raw)
  To: Joe Perches, jakub.kicinski, davem; +Cc: oss-drivers, netdev
In-Reply-To: <308957d91294ec1883ec492eecb8ffe1d51a0689.camel@perches.com>



On 06/26/2018 10:29 AM, Joe Perches wrote:
> On Tue, 2018-06-26 at 09:16 +0800, Chengguang Xu wrote:
>> sizeof() returns an unsigned value, so in the error check a
>> negative error code will always compare larger than sizeof().
> []
>> diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nffw.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nffw.c
> []
>> @@ -232,7 +232,7 @@ struct nfp_nffw_info *nfp_nffw_info_open(struct nfp_cpp *cpp)
>>   	err = nfp_cpp_read(cpp, nfp_resource_cpp_id(state->res),
>>   			   nfp_resource_address(state->res),
>>   			   fwinf, sizeof(*fwinf));
>> -	if (err < sizeof(*fwinf))
>> +	if (err < (int)sizeof(*fwinf))
>>   		goto err_release;
>>   
>>   	if (!nffw_res_flg_init_get(fwinf))
> The way this is done in several places in the kernel is
> to test first for < 0 and then test for < sizeof
>
> 	if (err < 0 || err < sizeof(etc...)
>
> see net/ceph/ceph_common.c etc...
If we need to distinguish the cases <0 and >0 && <sizeof(), then that
approach is better.
If not, I think a cast to int will be enough.

Thanks,
Chengguang.


* Re: [PATCH net-next] tcp: add SNMP counter for zero-window drops
From: David Miller @ 2018-06-26  2:50 UTC (permalink / raw)
  To: laoar.shao; +Cc: netdev
In-Reply-To: <1529848974-6384-1-git-send-email-laoar.shao@gmail.com>

From: Yafang Shao <laoar.shao@gmail.com>
Date: Sun, 24 Jun 2018 10:02:54 -0400

> It would be helpful to display the drops due to a zero window or
> insufficient window space.
> So a new SNMP MIB entry is added to track this behavior.
> This entry is named LINUX_MIB_TCPZEROWINDOWDROP and published in
> /proc/net/netstat in TcpExt line as TCPZeroWindowDrop.
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>

Applied.


* Re: INFO: rcu detected stall in vprintk_emit
From: Steven Rostedt @ 2018-06-26  2:59 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: syzbot, linux-kernel, pmladek, sergey.senozhatsky, syzkaller-bugs,
	Samuel Ortiz, David S. Miller, linux-wireless, netdev
In-Reply-To: <20180626014924.GB11229@jagdpanzerIV>

On Tue, 26 Jun 2018 10:49:24 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> So we can try switching to ratelimited error reporting
> [that would be option A]:
> 
> ---
> 
> diff --git a/net/nfc/llcp_commands.c b/net/nfc/llcp_commands.c
> index 2ceefa183cee..2f3becb709b8 100644
> --- a/net/nfc/llcp_commands.c
> +++ b/net/nfc/llcp_commands.c
> @@ -755,7 +755,7 @@ int nfc_llcp_send_ui_frame(struct nfc_llcp_sock *sock, u8 ssap, u8 dsap,
>  		pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, MSG_DONTWAIT,
>  					 frag_len + LLCP_HEADER_SIZE, &err);
>  		if (pdu == NULL) {
> -			pr_err("Could not allocate PDU\n");
> +			pr_err_ratelimited("Could not allocate PDU\n");
>  			continue;
>  		}
>  
> ---
> 
> 
> Or ratelimited error reporting and cond_resched()
> [that would be option B]:

I don't think this is a printk() issue per se, so I think Option B is
the only option. You should not get stuck in an infinite loop if we run
short on memory. Perhaps we could have an Option C which would exit
this loop gracefully with some kind of error. But I haven't looked at
the surrounding code to be sure if that is possible.

-- Steve

> 
> ---
> 
> diff --git a/net/nfc/llcp_commands.c b/net/nfc/llcp_commands.c
> index 2ceefa183cee..61741db4c4e6 100644
> --- a/net/nfc/llcp_commands.c
> +++ b/net/nfc/llcp_commands.c
> @@ -755,7 +755,8 @@ int nfc_llcp_send_ui_frame(struct nfc_llcp_sock *sock, u8 ssap, u8 dsap,
>  		pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, MSG_DONTWAIT,
>  					 frag_len + LLCP_HEADER_SIZE, &err);
>  		if (pdu == NULL) {
> -			pr_err("Could not allocate PDU\n");
> +			pr_err_ratelimited("Could not allocate PDU\n");
> +			cond_resched();
>  			continue;
>  		}
>  
> ---
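
As a rough illustration of what an "Option C" could look like, here is a
user-space sketch of a fragment loop that gives up on allocation failure
instead of retrying forever. Everything in it (send_fragments, alloc_fn,
failing_alloc) is hypothetical; it says nothing about whether the real
nfc_llcp_send_ui_frame() can bail out this way, which still needs a look
at the surrounding code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

typedef void *(*alloc_fn)(size_t len);

/* Hypothetical allocator that always fails, to model memory pressure. */
void *failing_alloc(size_t len)
{
	(void)len;
	return NULL;
}

/*
 * Send `total` bytes in fragments of at most `frag_max` bytes.
 * Returns the number of bytes sent, or -ENOMEM if nothing could be sent.
 */
int send_fragments(size_t total, size_t frag_max, alloc_fn alloc)
{
	size_t sent = 0;

	assert(frag_max > 0);
	while (sent < total) {
		size_t remaining = total - sent;
		size_t frag_len = remaining < frag_max ? remaining : frag_max;
		void *pdu = alloc(frag_len);

		/* Leave the loop on failure rather than "continue". */
		if (pdu == NULL)
			return sent ? (int)sent : -ENOMEM;

		/* ... fill and queue the fragment here ... */
		free(pdu);
		sent += frag_len;
	}
	return (int)sent;
}
```

With failing_alloc the call returns -ENOMEM right away, so the loop can
no longer spin while memory is short.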


* Re: [net-next PATCH v4 3/7] net: sock: Change tx_queue_mapping in sock_common to unsigned short
From: Alexander Duyck @ 2018-06-26  3:25 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Amritha Nambiar, Linux Kernel Network Developers, David S. Miller,
	Alexander Duyck, Willem de Bruijn, Sridhar Samudrala,
	Eric Dumazet, Hannes Frederic Sowa
In-Reply-To: <CALx6S37uFs1shuPmno+L=p_Hyy1Q2qNaK+AqYvrk4HXTApL_Vg@mail.gmail.com>

On Mon, Jun 25, 2018 at 6:34 PM, Tom Herbert <tom@herbertland.com> wrote:
>
>
> On Mon, Jun 25, 2018 at 11:04 AM, Amritha Nambiar
> <amritha.nambiar@intel.com> wrote:
>>
>> Change 'skc_tx_queue_mapping' field in sock_common structure from
>> 'int' to 'unsigned short' type with 0 indicating unset and
>> a positive queue value being set. This way it is consistent with
>> the queue_mapping field in the sk_buff. This will also accommodate
>> adding a new 'unsigned short' field in sock_common in the next
>> patch for rx_queue_mapping.
>>
>> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
>> ---
>>  include/net/sock.h |   10 ++++++----
>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/net/sock.h b/include/net/sock.h
>> index b3b7541..009fd30 100644
>> --- a/include/net/sock.h
>> +++ b/include/net/sock.h
>> @@ -214,7 +214,7 @@ struct sock_common {
>>                 struct hlist_node       skc_node;
>>                 struct hlist_nulls_node skc_nulls_node;
>>         };
>> -       int                     skc_tx_queue_mapping;
>> +       unsigned short          skc_tx_queue_mapping;
>>         union {
>>                 int             skc_incoming_cpu;
>>                 u32             skc_rcv_wnd;
>> @@ -1681,17 +1681,19 @@ static inline int sk_receive_skb(struct sock *sk,
>> struct sk_buff *skb,
>>
>>  static inline void sk_tx_queue_set(struct sock *sk, int tx_queue)
>>  {
>> -       sk->sk_tx_queue_mapping = tx_queue;
>> +       /* sk_tx_queue_mapping accept only upto a 16-bit value */
>> +       WARN_ON((unsigned short)tx_queue > USHRT_MAX);
>
>
> Shouldn't this be USHRT_MAX - 1 ?

Actually just a ">=" would probably do as well.

>
>> +       sk->sk_tx_queue_mapping = tx_queue + 1;
>>  }
>>
>>  static inline void sk_tx_queue_clear(struct sock *sk)
>>  {
>> -       sk->sk_tx_queue_mapping = -1;
>>
>> +       sk->sk_tx_queue_mapping = 0;
>
>
> I think it's slightly better to define a new constant like NO_QUEUE_MAPPING
> to be USHRT_MAX. That avoids needing to do the arithmetic every time the
> value is accessed.
>>
>>  }
>>
>>  static inline int sk_tx_queue_get(const struct sock *sk)
>>  {
>> -       return sk ? sk->sk_tx_queue_mapping : -1;
>> +       return sk ? sk->sk_tx_queue_mapping - 1 : -1;
>
>
> Doesn't the comparison in __netdev_pick_tx need to be simultaneously changed
> for this?

This doesn't change the result. It was still -1 if the queue mapping
is not set. It was just initialized to 0 instead of to -1 so we have
to perform the operation to get there.

Also in regards to the comment above about needing an extra operation
I am not sure it makes much difference.

In the case of us starting with 0 as a reserved value I think the
instruction count should be about the same. We move the unsigned short
into an unsigned int, then decrement, and if the value is non-negative
we can assume it is valid. Although maybe I should double check the
code to make certain it is doing what I thought it was supposed to be
doing.
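
A small user-space sketch of the encoding, to check the reasoning. The
struct and function names are stand-ins, not the real sock helpers. Zero
means "unset", a set queue is stored as queue + 1, and the getter relies
on the unsigned short being promoted to int before the decrement. Note
that a check of the form "(unsigned short)x > USHRT_MAX" can never be
true, so the sketch bounds the int value before the cast instead.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

struct sock_sketch {
	unsigned short tx_queue_mapping;	/* 0 = unset, else queue + 1 */
};

void tx_queue_set(struct sock_sketch *sk, int tx_queue)
{
	/* queue + 1 must fit, so the largest storable queue is USHRT_MAX - 1 */
	assert(tx_queue >= 0 && tx_queue < USHRT_MAX);
	sk->tx_queue_mapping = (unsigned short)(tx_queue + 1);
}

void tx_queue_clear(struct sock_sketch *sk)
{
	sk->tx_queue_mapping = 0;
}

int tx_queue_get(const struct sock_sketch *sk)
{
	/* unsigned short promotes to int here, so 0 - 1 yields -1 */
	return sk ? sk->tx_queue_mapping - 1 : -1;
}
```

Callers that compare the result against 0 keep working unchanged, since
an unset mapping still reads back as -1.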

>
>>
>>
>>
>>  }
>>
>>  static inline void sk_set_socket(struct sock *sk, struct socket *sock)
>>
>


* [PATCH net-next] neighbour: force neigh_invalidate when NUD_FAILED update is from admin
From: Roopa Prabhu @ 2018-06-26  3:32 UTC (permalink / raw)
  To: davem; +Cc: netdev

From: Roopa Prabhu <roopa@cumulusnetworks.com>

In systems where neigh gc thresholds are set to high values,
admin-deleted neigh entries (e.g. ip neigh flush or ip neigh del) can
linger around in NUD_FAILED state for a long time until periodic gc kicks
in. This patch forces neigh_invalidate when NUD_FAILED neigh_update is
from an admin.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
My testing has not shown any problems with this patch. But I
am not sure why historically neigh admin was not considered here:
I am assuming that it is because the problem is not very obvious in
default low gc threshold deployments.
 net/core/neighbour.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 8e3fda9..cbe85d8 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1148,7 +1148,8 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
 		neigh->nud_state = new;
 		err = 0;
 		notify = old & NUD_VALID;
-		if ((old & (NUD_INCOMPLETE | NUD_PROBE)) &&
+		if (((old & (NUD_INCOMPLETE | NUD_PROBE)) ||
+		     (flags & NEIGH_UPDATE_F_ADMIN)) &&
 		    (new & NUD_FAILED)) {
 			neigh_invalidate(neigh);
 			notify = 1;
-- 
2.1.4
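
The effect of the changed condition can be checked in isolation with the
user-space sketch below. The SK_* constants are stand-ins chosen to
resemble the kernel's NUD_* and NEIGH_UPDATE_F_ADMIN values, and the two
functions only model the if-condition, not neigh_update() itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
	SK_NUD_INCOMPLETE = 0x01,
	SK_NUD_REACHABLE  = 0x02,
	SK_NUD_PROBE      = 0x10,
	SK_NUD_FAILED     = 0x20,
};
#define SK_UPDATE_F_ADMIN 0x80000000u

/* Condition before the patch: only INCOMPLETE/PROBE entries are invalidated. */
bool invalidate_before(uint8_t old, uint8_t new_state, uint32_t flags)
{
	(void)flags;
	return (old & (SK_NUD_INCOMPLETE | SK_NUD_PROBE)) &&
	       (new_state & SK_NUD_FAILED);
}

/* Condition after the patch: an admin update to FAILED also invalidates. */
bool invalidate_after(uint8_t old, uint8_t new_state, uint32_t flags)
{
	return ((old & (SK_NUD_INCOMPLETE | SK_NUD_PROBE)) ||
		(flags & SK_UPDATE_F_ADMIN)) &&
	       (new_state & SK_NUD_FAILED);
}

/* The case the patch targets: admin moves a REACHABLE entry to FAILED. */
void check_admin_case(void)
{
	assert(!invalidate_before(SK_NUD_REACHABLE, SK_NUD_FAILED,
				  SK_UPDATE_F_ADMIN));
	assert(invalidate_after(SK_NUD_REACHABLE, SK_NUD_FAILED,
				SK_UPDATE_F_ADMIN));
}
```

Non-admin updates behave the same under both conditions.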


* [PATCH net v2 0/2] nfp: MPLS and shared blocks TC offload fixes
From: Jakub Kicinski @ 2018-06-26  3:36 UTC (permalink / raw)
  To: davem; +Cc: jiri, oss-drivers, netdev, Jakub Kicinski

Hi!

This series brings two fixes to TC filter/action offload code.
Pieter fixes matching MPLS packets when the match is purely on
the MPLS ethertype and none of the MPLS fields are used.
John provides a fix for offload of shared blocks.  Unfortunately,
with shared blocks there is currently no guarantee that filters
which were added by the core will be removed before block unbind.
Our simple fix is to not support offload of rules on shared blocks
at all; a revert of this fix will be sent for -next once the
reoffload infrastructure lands.  The shared blocks became important
as we are trying to use them for bonding offload (managed from user
space) and lack of remove calls leads to resource leaks.

v2:
 - fix build error reported by kbuild bot due to missing
   tcf_block_shared() helper.

John Hurley (1):
  nfp: reject binding to shared blocks

Pieter Jansen van Vuuren (1):
  nfp: flower: fix mpls ether type detection

 drivers/net/ethernet/netronome/nfp/bpf/main.c      |  3 +++
 drivers/net/ethernet/netronome/nfp/flower/match.c  | 14 ++++++++++++++
 .../net/ethernet/netronome/nfp/flower/offload.c    | 11 +++++++++++
 include/net/pkt_cls.h                              |  5 +++++
 4 files changed, 33 insertions(+)

-- 
2.17.1


* [PATCH net v2 1/2] nfp: flower: fix mpls ether type detection
From: Jakub Kicinski @ 2018-06-26  3:36 UTC (permalink / raw)
  To: davem; +Cc: jiri, oss-drivers, netdev, Pieter Jansen van Vuuren
In-Reply-To: <20180626033628.17660-1-jakub.kicinski@netronome.com>

From: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>

Previously it was not possible to distinguish between mpls ether types and
other ether types. This leads to incorrect classification of offloaded
filters that match on mpls ether type. For example the following two
filters overlap:

 # tc filter add dev eth0 parent ffff: \
    protocol 0x8847 flower \
    action mirred egress redirect dev eth1

 # tc filter add dev eth0 parent ffff: \
    protocol 0x0800 flower \
    action mirred egress redirect dev eth2

The driver now correctly includes the mac_mpls layer where HW stores mpls
fields, when it detects an mpls ether type. It also sets the MPLS_Q bit to
indicate that the filter should match mpls packets.

Fixes: bb055c198d9b ("nfp: add mpls match offloading support")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/match.c  | 14 ++++++++++++++
 .../net/ethernet/netronome/nfp/flower/offload.c    |  8 ++++++++
 2 files changed, 22 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/match.c b/drivers/net/ethernet/netronome/nfp/flower/match.c
index 91935405f586..84f7a5dbea9d 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/match.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/match.c
@@ -123,6 +123,20 @@ nfp_flower_compile_mac(struct nfp_flower_mac_mpls *frame,
 			 NFP_FLOWER_MASK_MPLS_Q;
 
 		frame->mpls_lse = cpu_to_be32(t_mpls);
+	} else if (dissector_uses_key(flow->dissector,
+				      FLOW_DISSECTOR_KEY_BASIC)) {
+		/* Check for mpls ether type and set NFP_FLOWER_MASK_MPLS_Q
+		 * bit, which indicates an mpls ether type but without any
+		 * mpls fields.
+		 */
+		struct flow_dissector_key_basic *key_basic;
+
+		key_basic = skb_flow_dissector_target(flow->dissector,
+						      FLOW_DISSECTOR_KEY_BASIC,
+						      flow->key);
+		if (key_basic->n_proto == cpu_to_be16(ETH_P_MPLS_UC) ||
+		    key_basic->n_proto == cpu_to_be16(ETH_P_MPLS_MC))
+			frame->mpls_lse = cpu_to_be32(NFP_FLOWER_MASK_MPLS_Q);
 	}
 }
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index c42e64f32333..477f584f6d28 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -264,6 +264,14 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
 		case cpu_to_be16(ETH_P_ARP):
 			return -EOPNOTSUPP;
 
+		case cpu_to_be16(ETH_P_MPLS_UC):
+		case cpu_to_be16(ETH_P_MPLS_MC):
+			if (!(key_layer & NFP_FLOWER_LAYER_MAC)) {
+				key_layer |= NFP_FLOWER_LAYER_MAC;
+				key_size += sizeof(struct nfp_flower_mac_mpls);
+			}
+			break;
+
 		/* Will be included in layer 2. */
 		case cpu_to_be16(ETH_P_8021Q):
 			break;
-- 
2.17.1
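
The ether type test added in match.c can be tried out in user space with
the sketch below. The SK_ETH_P_* values are the standard ether types; the
SK_MASK_MPLS_Q bit position is a placeholder for NFP_FLOWER_MASK_MPLS_Q
and is an assumption, as are the function names.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_ETH_P_IP      0x0800
#define SK_ETH_P_MPLS_UC 0x8847
#define SK_ETH_P_MPLS_MC 0x8848
#define SK_MASK_MPLS_Q   0x1u	/* placeholder bit, not the real mask value */

/* n_proto is in network byte order, as in the flow dissector basic key. */
bool is_mpls_ethertype(uint16_t n_proto)
{
	return n_proto == htons(SK_ETH_P_MPLS_UC) ||
	       n_proto == htons(SK_ETH_P_MPLS_MC);
}

/* Value for the mpls_lse field when no MPLS fields are matched. */
uint32_t mpls_lse_for(uint16_t n_proto)
{
	return is_mpls_ethertype(n_proto) ? htonl(SK_MASK_MPLS_Q) : 0;
}

/* The two tc filters from the commit message are now told apart. */
void check_overlap_example(void)
{
	assert(mpls_lse_for(htons(SK_ETH_P_MPLS_UC)) !=
	       mpls_lse_for(htons(SK_ETH_P_IP)));
}
```

Both the unicast and multicast MPLS ether types set the bit; every other
protocol leaves the field at zero.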


* [PATCH net v2 2/2] nfp: reject binding to shared blocks
From: Jakub Kicinski @ 2018-06-26  3:36 UTC (permalink / raw)
  To: davem; +Cc: jiri, oss-drivers, netdev, John Hurley, Jakub Kicinski
In-Reply-To: <20180626033628.17660-1-jakub.kicinski@netronome.com>

From: John Hurley <john.hurley@netronome.com>

TC shared blocks allow multiple qdiscs to be grouped together and filters
shared between them. Currently the chains of filters attached to a block
are only flushed when the block is removed. If a qdisc is removed from a
block but the block still exists, flow del messages are not passed to the
callback registered for that qdisc. For the NFP, this presents the
possibility of rules still existing in hw when they should be removed.

Prevent binding to shared blocks until the kernel can send per qdisc del
messages when block unbinds occur.

tcf_block_shared() was not used outside of the core until now, so also
add an empty implementation for builds with CONFIG_NET_CLS=n.

Fixes: 4861738775d7 ("net: sched: introduce shared filter blocks infrastructure")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
v2:
 - add a tcf_block_shared() for !CONFIG_NET_CLS
---
 drivers/net/ethernet/netronome/nfp/bpf/main.c       | 3 +++
 drivers/net/ethernet/netronome/nfp/flower/offload.c | 3 +++
 include/net/pkt_cls.h                               | 5 +++++
 3 files changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index fcdfb8e7fdea..6b15e3b11956 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -202,6 +202,9 @@ static int nfp_bpf_setup_tc_block(struct net_device *netdev,
 	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
 		return -EOPNOTSUPP;
 
+	if (tcf_block_shared(f->block))
+		return -EOPNOTSUPP;
+
 	switch (f->command) {
 	case TC_BLOCK_BIND:
 		return tcf_block_cb_register(f->block,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 477f584f6d28..525057bee0ed 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -631,6 +631,9 @@ static int nfp_flower_setup_tc_block(struct net_device *netdev,
 	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
 		return -EOPNOTSUPP;
 
+	if (tcf_block_shared(f->block))
+		return -EOPNOTSUPP;
+
 	switch (f->command) {
 	case TC_BLOCK_BIND:
 		return tcf_block_cb_register(f->block,
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index a3c1a2c47cd4..20b059574e60 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -111,6 +111,11 @@ void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
 {
 }
 
+static inline bool tcf_block_shared(struct tcf_block *block)
+{
+	return false;
+}
+
 static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
 {
 	return NULL;
-- 
2.17.1
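
The stub pattern used for tcf_block_shared() can be sketched in user
space as follows. All names are hypothetical, SKETCH_NET_CLS stands in
for CONFIG_NET_CLS, and treating a nonzero index as "shared" is an
assumption made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct block_sketch {
	unsigned int index;	/* assumed: nonzero means the block is shared */
};

#define SKETCH_NET_CLS 1	/* set to 0 to model CONFIG_NET_CLS=n */

#if SKETCH_NET_CLS
static inline bool block_shared(const struct block_sketch *block)
{
	return block->index != 0;
}
#else
/* Empty implementation so that callers still build without the feature. */
static inline bool block_shared(const struct block_sketch *block)
{
	(void)block;
	return false;
}
#endif

/* Driver-side check: refuse to offload anything bound to a shared block. */
int setup_block(const struct block_sketch *block)
{
	assert(block != NULL);
	if (block_shared(block))
		return -EOPNOTSUPP;
	return 0;
}
```

With the stub in place the driver code needs no #ifdef of its own; the
check simply never fires when the feature is compiled out.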


* Re: [PATCH v2 2/3] bpfilter: include bpfilter_umh in assembly instead of using objcopy
From: Masahiro Yamada @ 2018-06-26  3:44 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: netdev, Alexei Starovoitov, David S . Miller, Arnd Bergmann,
	Geert Uytterhoeven, Linux Kernel Mailing List, YueHaibing,
	Daniel Borkmann
In-Reply-To: <20180615004704.u5gofft7k6ehmhwi@ast-mbp.dhcp.thefacebook.com>

Hi Alexei,


2018-06-15 9:47 GMT+09:00 Alexei Starovoitov <alexei.starovoitov@gmail.com>:
> On Thu, Jun 14, 2018 at 11:39:31PM +0900, Masahiro Yamada wrote:
>> What we want here is to embed a user-space program into the kernel.
>> Instead of the complex ELF magic, let's simply wrap it in the assembly
>> with the '.incbin' directive.
>>
>> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
>> ---
>>
>> Changes in v2:
>>   - Rebase
>>
>>  net/bpfilter/Makefile            | 15 ++-------------
>>  net/bpfilter/bpfilter_kern.c     | 11 +++++------
>>  net/bpfilter/bpfilter_umh_blob.S |  7 +++++++
>>  3 files changed, 14 insertions(+), 19 deletions(-)
>>  create mode 100644 net/bpfilter/bpfilter_umh_blob.S
>>
>> diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
>> index e0bbe75..39c6980 100644
>> --- a/net/bpfilter/Makefile
>> +++ b/net/bpfilter/Makefile
>> @@ -15,18 +15,7 @@ ifeq ($(CONFIG_BPFILTER_UMH), y)
>>  HOSTLDFLAGS += -static
>>  endif
>>
>> -# a bit of elf magic to convert bpfilter_umh binary into a binary blob
>> -# inside bpfilter_umh.o elf file referenced by
>> -# _binary_net_bpfilter_bpfilter_umh_start symbol
>> -# which bpfilter_kern.c passes further into umh blob loader at run-time
>> -quiet_cmd_copy_umh = GEN $@
>> -      cmd_copy_umh = echo ':' > $(obj)/.bpfilter_umh.o.cmd; \
>> -      $(OBJCOPY) -I binary -O `$(OBJDUMP) -f $<|grep format|cut -d' ' -f8` \
>> -      -B `$(OBJDUMP) -f $<|grep architecture|cut -d, -f1|cut -d' ' -f2` \
>> -      --rename-section .data=.init.rodata $< $@
>> -
>> -$(obj)/bpfilter_umh.o: $(obj)/bpfilter_umh
>> -     $(call cmd,copy_umh)
>> +$(obj)/bpfilter_umh_blob.o: $(obj)/bpfilter_umh
>>
>>  obj-$(CONFIG_BPFILTER_UMH) += bpfilter.o
>> -bpfilter-objs += bpfilter_kern.o bpfilter_umh.o
>> +bpfilter-objs += bpfilter_kern.o bpfilter_umh_blob.o
>> diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c
>> index 0952257..6de3ae5 100644
>> --- a/net/bpfilter/bpfilter_kern.c
>> +++ b/net/bpfilter/bpfilter_kern.c
>> @@ -10,11 +10,8 @@
>>  #include <linux/file.h>
>>  #include "msgfmt.h"
>>
>> -#define UMH_start _binary_net_bpfilter_bpfilter_umh_start
>> -#define UMH_end _binary_net_bpfilter_bpfilter_umh_end
>> -
>> -extern char UMH_start;
>> -extern char UMH_end;
>> +extern char bpfilter_umh_start;
>> +extern char bpfilter_umh_end;
>>
>>  static struct umh_info info;
>>  /* since ip_getsockopt() can run in parallel, serialize access to umh */
>> @@ -93,7 +90,9 @@ static int __init load_umh(void)
>>       int err;
>>
>>       /* fork usermode process */
>> -     err = fork_usermode_blob(&UMH_start, &UMH_end - &UMH_start, &info);
>> +     err = fork_usermode_blob(&bpfilter_umh_end,
>> +                              &bpfilter_umh_end - &bpfilter_umh_start,
>> +                              &info);
>>       if (err)
>>               return err;
>>       pr_info("Loaded bpfilter_umh pid %d\n", info.pid);
>> diff --git a/net/bpfilter/bpfilter_umh_blob.S b/net/bpfilter/bpfilter_umh_blob.S
>> new file mode 100644
>> index 0000000..40311d1
>> --- /dev/null
>> +++ b/net/bpfilter/bpfilter_umh_blob.S
>> @@ -0,0 +1,7 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +     .section .init.rodata, "a"
>> +     .global bpfilter_umh_start
>> +bpfilter_umh_start:
>> +     .incbin "net/bpfilter/bpfilter_umh"
>> +     .global bpfilter_umh_end
>> +bpfilter_umh_end:
>
> for some reason it doesn't work.
> fork_usermode_blob() returns ENOEXEC
> You should be able to test it simply running 'iptables -L'.
> Without this patch you should see:
> [   12.696937] bpfilter: Loaded bpfilter_umh pid 225
> Started bpfilter
>
> where first line comes from kernel module and second from umh.


Sorry for the late reply.

Unfortunately, I will be busy for a while.

I will come back eventually
to check it out, but I cannot tell when.


Somebody else sent a patch equivalent to 1/3, so it is fine.

3/3 can go independently, so I will send it as a separate patch for now.





-- 
Best Regards
Masahiro Yamada
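
One observation on the quoted load_umh() hunk: the removed line passed
&UMH_start as the first argument of fork_usermode_blob(), while the added
line passes &bpfilter_umh_end. Whether that is related to the ENOEXEC
report has not been verified here. The user-space sketch below, with
hypothetical names, just shows how a pair of start/end symbols is
normally turned into a data pointer and a length.

```c
#include <assert.h>
#include <stddef.h>

struct blob_span {
	const char *data;	/* first byte of the embedded image */
	size_t len;		/* number of bytes between the two symbols */
};

/* The data pointer comes from the start symbol, the length from end - start. */
struct blob_span blob_from_symbols(const char *start, const char *end)
{
	struct blob_span span;

	assert(end >= start);
	span.data = start;
	span.len = (size_t)(end - start);
	return span;
}
```

A loader handed the end symbol as its data pointer would read whatever
follows the blob instead of the image itself.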


* Re: [PATCH nf-next v2] openvswitch: use nf_ct_get_tuplepr, invert_tuplepr
From: Pravin Shelar @ 2018-06-26  3:46 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netfilter-devel, Linux Kernel Network Developers, ovs dev
In-Reply-To: <20180625155532.20577-1-fw@strlen.de>

On Mon, Jun 25, 2018 at 8:55 AM, Florian Westphal <fw@strlen.de> wrote:
> These versions deal with the l3proto/l4proto details internally.
> This removes the only caller of nf_ct_get_tuple, so make it static.
>
> After this, l3proto->get_l4proto() can be removed in a followup patch.
>
> Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pravin B Shelar <pshelar@ovn.org>


* [RESEND PATCH] bpfilter: check compiler capability in Kconfig
From: Masahiro Yamada @ 2018-06-26  3:55 UTC (permalink / raw)
  To: David S . Miller, netdev, Alexei Starovoitov
  Cc: Matteo Croce, Arnd Bergmann, Masahiro Yamada, linux-kbuild,
	Alexei Starovoitov, linux-kernel, Michal Marek, Daniel Borkmann

With the brand-new syntax extension of Kconfig, we can directly
check the compiler capability in the configuration phase.

If cc-can-link.sh fails, BPFILTER_UMH is automatically
hidden by the dependency.

I also deleted 'default n', which is a no-op.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
---

 Makefile               | 5 -----
 net/Makefile           | 4 ----
 net/bpfilter/Kconfig   | 2 +-
 scripts/cc-can-link.sh | 2 +-
 4 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/Makefile b/Makefile
index a6d4872..f71ea52 100644
--- a/Makefile
+++ b/Makefile
@@ -520,11 +520,6 @@ ifeq ($(shell $(CONFIG_SHELL) $(srctree)/scripts/gcc-goto.sh $(CC) $(KBUILD_CFLA
   KBUILD_AFLAGS += -DCC_HAVE_ASM_GOTO
 endif
 
-ifeq ($(shell $(CONFIG_SHELL) $(srctree)/scripts/cc-can-link.sh $(CC)), y)
-  CC_CAN_LINK := y
-  export CC_CAN_LINK
-endif
-
 # The expansion should be delayed until arch/$(SRCARCH)/Makefile is included.
 # Some architectures define CROSS_COMPILE in arch/$(SRCARCH)/Makefile.
 # CC_VERSION_TEXT is referenced from Kconfig (so it needs export),
diff --git a/net/Makefile b/net/Makefile
index 13ec0d5..bdaf539 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -20,11 +20,7 @@ obj-$(CONFIG_TLS)		+= tls/
 obj-$(CONFIG_XFRM)		+= xfrm/
 obj-$(CONFIG_UNIX)		+= unix/
 obj-$(CONFIG_NET)		+= ipv6/
-ifneq ($(CC_CAN_LINK),y)
-$(warning CC cannot link executables. Skipping bpfilter.)
-else
 obj-$(CONFIG_BPFILTER)		+= bpfilter/
-endif
 obj-$(CONFIG_PACKET)		+= packet/
 obj-$(CONFIG_NET_KEY)		+= key/
 obj-$(CONFIG_BRIDGE)		+= bridge/
diff --git a/net/bpfilter/Kconfig b/net/bpfilter/Kconfig
index a948b07..76deb66 100644
--- a/net/bpfilter/Kconfig
+++ b/net/bpfilter/Kconfig
@@ -1,6 +1,5 @@
 menuconfig BPFILTER
 	bool "BPF based packet filtering framework (BPFILTER)"
-	default n
 	depends on NET && BPF && INET
 	help
 	  This builds experimental bpfilter framework that is aiming to
@@ -9,6 +8,7 @@ menuconfig BPFILTER
 if BPFILTER
 config BPFILTER_UMH
 	tristate "bpfilter kernel module with user mode helper"
+	depends on $(success,$(srctree)/scripts/cc-can-link.sh $(CC))
 	default m
 	help
 	  This builds bpfilter kernel module with embedded user mode helper
diff --git a/scripts/cc-can-link.sh b/scripts/cc-can-link.sh
index 208eb28..6efcead 100755
--- a/scripts/cc-can-link.sh
+++ b/scripts/cc-can-link.sh
@@ -1,7 +1,7 @@
 #!/bin/sh
 # SPDX-License-Identifier: GPL-2.0
 
-cat << "END" | $@ -x c - -o /dev/null >/dev/null 2>&1 && echo "y"
+cat << "END" | $@ -x c - -o /dev/null >/dev/null 2>&1
 #include <stdio.h>
 int main(void)
 {
-- 
2.7.4


* Re: [PATCH net-next] net: preserve sock reference when scrubbing the skb.
From: Cong Wang @ 2018-06-26  4:15 UTC (permalink / raw)
  To: Flavio Leitner
  Cc: Linux Kernel Network Developers, Eric Dumazet, Paolo Abeni,
	David Miller, Florian Westphal, NetFilter
In-Reply-To: <20180625155610.30802-1-fbl@redhat.com>

On Mon, Jun 25, 2018 at 8:59 AM Flavio Leitner <fbl@redhat.com> wrote:
>
> The sock reference is lost when scrubbing the packet and that breaks
> TSQ (TCP Small Queues) and XPS (Transmit Packet Steering) causing
> performance impacts of about 50% in a single TCP stream when crossing
> network namespaces.
>
> XPS breaks because the queue mapping stored in the socket is not
> available, so another random queue might be selected when the stack
> needs to transmit something like a TCP ACK, or TCP Retransmissions.
> That causes packet re-ordering and/or performance issues.
>
> TSQ breaks because it orphans the packet while it is still in the
> host, so packets are queued contributing to the buffer bloat problem.

Why should TSQ in one stack care about buffer bloat in another stack?

Actually, I think the current behavior is correct, once the packet leaves
its current stack (or netns), it should relieve the backpressure on TCP
socket in this stack, whether it will be queued in another stack is beyond
its concern. This breaks the isolation between networking stacks.


* Re: [PATCH rdma-next 08/12] overflow.h: Add arithmetic shift helper
From: Leon Romanovsky @ 2018-06-26  4:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Rasmus Villemoes, Doug Ledford, Kees Cook, RDMA mailing list,
	Hadar Hen Zion, Matan Barak, Michael J Ruhl, Noa Osherovich,
	Raed Salem, Yishai Hadas, Saeed Mahameed, linux-netdev,
	linux-kernel
In-Reply-To: <20180625171157.GE5356@mellanox.com>

[-- Attachment #1: Type: text/plain, Size: 1413 bytes --]

On Mon, Jun 25, 2018 at 11:11:57AM -0600, Jason Gunthorpe wrote:
> On Mon, Jun 25, 2018 at 11:26:05AM +0200, Rasmus Villemoes wrote:
>
> >    check_shift_overflow(a, s, d) {
> >        unsigned _nbits = 8*sizeof(a);
> >        typeof(a) _a = (a);
> >        typeof(s) _s = (s);
> >        typeof(d) _d = (d);
> >
> >        *_d = ((u64)(_a) << (_s & (_nbits-1)));
> >        _s >= _nbits || (_s > 0 && (_a >> (_nbits - _s -
> >    is_signed_type(a))) != 0);
> >    }
>
> Those types are not quite right.. What about this?
>
>     check_shift_overflow(a, s, d) ({
>         unsigned int _nbits = 8*sizeof(d) - is_signed_type(d);
>         typeof(d) _a = a;  // Shift is always performed on type 'd'
>         typeof(s) _s = s;
>         typeof(d) _d = d;
>
>         *_d = (_a << (_s & (_nbits-1)));
>
> 	((*_d) >> (_s & (_nbits-1))) != _a;
>     })
>
> And can we use mathematical invertibility to prove no overflow and
> bound _a? As above.

Rasmus and Jason,

Thanks for the feedback.
The reason I introduced a function is that I wanted to reuse the
check_mul_overflow macro, but for reasons I no longer remember,
I had a hard time fixing the compilation errors.

Anyway, I'll resubmit.

Thanks
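
The invertibility idea quoted above can be sketched in plain userspace C.
This is only an illustration under stated assumptions, not the helper that
was eventually submitted: the name shl_overflows_u32 is hypothetical, the
type is fixed to uint32_t rather than being type-generic, and a shift count
at or above the type width is simply reported as overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel's check_shift_overflow().
 * Stores a << s in *d and returns true if the shift lost any set bits.
 */
static bool shl_overflows_u32(uint32_t a, unsigned int s, uint32_t *d)
{
	const unsigned int nbits = 8 * sizeof(a);

	assert(d != NULL);

	if (s >= nbits) {
		/* Shifting by the type width or more is undefined in C. */
		*d = 0;
		return true;
	}

	*d = a << s;

	/* Invertibility: if no bits fell off the top, shifting back restores a. */
	return (*d >> s) != a;
}
```

With this sketch, shifting 1 by 31 is reported as fine, while shifting 3 by
31 is reported as overflow because the top bit of the operand is lost.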


>
> Jason

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: [PATCH rdma-next 00/12] RDMA fixes 2018-06-24
From: Leon Romanovsky @ 2018-06-26  4:21 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, RDMA mailing list, Matan Barak, Michael J Ruhl,
	Noa Osherovich, Raed Salem, Yishai Hadas, Saeed Mahameed,
	linux-netdev
In-Reply-To: <20180625213438.GA19857@ziepe.ca>

[-- Attachment #1: Type: text/plain, Size: 1720 bytes --]

On Mon, Jun 25, 2018 at 03:34:38PM -0600, Jason Gunthorpe wrote:
> On Sun, Jun 24, 2018 at 11:23:41AM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@mellanox.com>
> >
> > Hi,
> >
> > This is a bunch of patches triggered by running syzkaller internally.
> >
> > I'm sending them based on rdma-next mainly for two reasons:
> > 1. Most of the patches fix old issues, and it doesn't matter when
> > they hit Linus's tree: now, or in a couple of weeks
> > during the merge window.
> > 2. They interleave with code cleanup, mlx5-next patches and Michael's
> > feedback on flow counters series.
> >
> > Thanks
> >
> > Leon Romanovsky (12):
> >   RDMA/uverbs: Protect from attempts to create flows on unsupported QP
> >   RDMA/uverbs: Fix slab-out-of-bounds in ib_uverbs_ex_create_flow
>
> I applied these two to for-rc
>
> >   RDMA/uverbs: Check existence of create_flow callback
> >   RDMA/verbs: Drop kernel variant of create_flow
> >   RDMA/verbs: Drop kernel variant of destroy_flow
> >   net/mlx5: Rate limit errors in command interface
> >   RDMA/uverbs: Don't overwrite NULL pointer with ZERO_SIZE_PTR
> >   RDMA/umem: Don't check for negative return value of dma_map_sg_attrs()
> >   RDMA/uverbs: Remove redundant check
>
> These to for-next

Jason,

We would like to see patch "[PATCH mlx5-next 05/12] net/mlx5:
Rate limit errors in command interface" in our mlx5-next. Is it possible
at this point to drop it from for-next, so that I can take it into
mlx5-next?

Thanks

>
> >   overflow.h: Add arithmetic shift helper
> >   RDMA/mlx5: Fix shift overflow in mlx5_ib_create_wq
> >   RDMA/mlx5: Reuse existed shift_overlow helper
>
> And these will have to be respun.
>
> Thanks,
> Jason

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: INFO: rcu detected stall in vprintk_emit
From: Sergey Senozhatsky @ 2018-06-26  4:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, syzbot, linux-kernel, pmladek,
	sergey.senozhatsky, syzkaller-bugs, Samuel Ortiz, David S. Miller,
	linux-wireless, netdev
In-Reply-To: <20180625225937.43aee76c@vmware.local.home>

On (06/25/18 22:59), Steven Rostedt wrote:
> > Or ratelimited error reporting and cond_resched()
> > [that would be option B]:
> 
> I don't think this is a printk() issue per se, so I think Option B is
> the only option. You should not get stuck in an infinite loop if we run
> short on memory. Perhaps we could have an Option C which would exit
> this loop gracefully with some kind of error. But I haven't looked at
> the surrounding code to be sure if that is possible.

Agreed. I like Option B: the endless loop is the root cause, and at the same
time filling up the logbuf with useless data helps nobody. I can't tell whether
Option C is feasible; that is up to the networking people.

	-ss
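
For readers unfamiliar with what the "ratelimited" half of Option B buys,
here is a simplified userspace model of interval/burst rate limiting. It is
a sketch of the general idea only, not the kernel's implementation: rl_state,
rl_allow, and the tick units are hypothetical names chosen for illustration,
and the current time is passed in explicitly so the behaviour is easy to
follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: allow at most `burst` messages per `interval` ticks. */
struct rl_state {
	unsigned long interval;	/* window length, in caller-defined ticks */
	unsigned int burst;	/* messages allowed per window */
	unsigned int printed;	/* messages let through in this window */
	unsigned int missed;	/* messages suppressed so far (cumulative) */
	unsigned long begin;	/* tick at which the current window started */
	bool started;		/* false until the first call */
};

/* Returns true if the caller may print a message at time `now`. */
static bool rl_allow(struct rl_state *rs, unsigned long now)
{
	assert(rs != NULL);

	/* Open a new window on first use or once the old one has expired. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->started = true;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	rs->missed++;
	return false;
}
```

A loop that fails thousands of times per second then emits only `burst`
lines per window. As noted above, this limits the log noise but does not by
itself stop the loop from spinning.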

^ permalink raw reply

* Re: [PATCH rdma-next 08/12] overflow.h: Add arithmetic shift helper
From: Leon Romanovsky @ 2018-06-26  4:24 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Doug Ledford, Jason Gunthorpe, Kees Cook, RDMA mailing list,
	Hadar Hen Zion, Matan Barak, Michael J Ruhl, Noa Osherovich,
	Raed Salem, Yishai Hadas, Saeed Mahameed, linux-netdev,
	linux-kernel
In-Reply-To: <CAKwiHFhgsyWYD+q+JFb2HJEphnjiiOp=o4Airv3MW031q2jx8w@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2242 bytes --]

On Mon, Jun 25, 2018 at 11:26:05AM +0200, Rasmus Villemoes wrote:
> On 24 June 2018 at 10:23, Leon Romanovsky <leon@kernel.org> wrote:
>
> > From: Leon Romanovsky <leonro@mellanox.com>
> >
> > Add a shift_overflow() helper to help driver authors ensure that
> > a shift operation doesn't overflow, which is a very common pattern
> > in RDMA drivers.
> >
>
> Not a huge fan. The other _overflow functions have a different behaviour
> (in how they return the result and the overflow status) and are
> type-generic, and I think someone at some point will use such a
> generically-named helper for stuff other than size_t. At least the
> array_size and struct_size helpers have size in their name and are
> specifically about computing the size of something, and are designed to be
> used directly as arguments to allocators, where SIZE_MAX is a suitable
> sentinel. I can't see the other patches in this series, so I don't know how
> you plan on using it, but it should also be usable outside rdma.
>
> Aside: why does b have type size_t?
>
> Does __must_check really make sense for a function without side effects? It
> doesn't tell gcc to warn if the result is not used in a conditional, it
> just warns if the result is not used at all, which wouldn't realistically
> happen for a pure function.
>
> I'd much rather see a type-generic check_shift_overflow (we can agree to
> leave "left" out of the name) with semantics similar to the other
> check_*_overflow functions. Then, if a size_t-eating, SIZE_MAX-returning
> helper is more convenient for rdma, that should be easy to implement on top
> of that. It shouldn't really be that hard to do. Something like
>
> check_shift_overflow(a, s, d) {
>     unsigned _nbits = 8*sizeof(a);
>     typeof(a) _a = (a);
>     typeof(s) _s = (s);
>     typeof(d) _d = (d);
>
>     *_d = ((u64)(_a) << (_s & (_nbits-1)));
>
>     _s >= _nbits || (_s > 0 && (_a >> (_nbits - _s - is_signed_type(a))) != 0);
> }
>
> which should also handle shifts of signed types (though it allows << 0 for
> negative values; that's easy to also disallow). But the exact semantics
> should be documented via a bunch of tests (hint hint) exercising corner
> cases.

I'll respin.

Thanks for the feedback.
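
The corner cases mentioned in the quoted review are easier to see in a small
userspace sketch of the "inspect the bits that would be shifted out"
approach for a signed type. Everything here is an assumption made for
illustration: the name shl_overflows_s32 is hypothetical, the type is fixed
to int32_t, negative operands are accepted only for a shift of 0 (matching
the behaviour the review describes), and the result is computed on the
unsigned representation to avoid undefined behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical signed variant, not the kernel macro.
 * Stores the (wrapped) shift result in *d and returns true on overflow.
 */
static bool shl_overflows_s32(int32_t a, unsigned int s, int32_t *d)
{
	const unsigned int nbits = 8 * sizeof(a);	/* 32 */
	const unsigned int value_bits = nbits - 1;	/* sign bit excluded */

	assert(d != NULL);

	/* Shift the unsigned representation; mask the count to stay defined. */
	*d = (int32_t)((uint32_t)a << (s & (nbits - 1)));

	if (s >= nbits)
		return true;
	if (a < 0)
		return s > 0;	/* only << 0 is accepted for negative values */

	/* Overflow if any bit that would reach or pass the sign bit is set. */
	return s > 0 && (a >> (value_bits - s)) != 0;
}
```

A handful of boundary checks (1 << 30 is fine, 1 << 31 is not, -1 << 0 is
allowed, -1 << 1 is not) then serve as the kind of documentation-by-test the
review asks for.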

>
> Rasmus

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 3/5] sctp: add spp_ipv6_flowlabel and spp_dscp for sctp_paddrparams
From: Xin Long @ 2018-06-26  4:33 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: 吉藤英明, Neil Horman, David Miller,
	network dev, linux-sctp, yoshfuji
In-Reply-To: <20180625163157.GA542@localhost.localdomain>

On Tue, Jun 26, 2018 at 12:31 AM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> Hi,
>
> On Tue, Jun 26, 2018 at 01:12:00AM +0900, 吉藤英明 wrote:
>> Hi,
>>
>> 2018-06-25 22:03 GMT+09:00 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>:
>> > On Mon, Jun 25, 2018 at 07:28:47AM -0400, Neil Horman wrote:
>> >> On Mon, Jun 25, 2018 at 04:31:26PM +0900, David Miller wrote:
>> >> > From: Xin Long <lucien.xin@gmail.com>
>> >> > Date: Mon, 25 Jun 2018 10:14:35 +0800
>> >> >
>> >> > >  struct sctp_paddrparams {
>> >> > > @@ -773,6 +775,8 @@ struct sctp_paddrparams {
>> >> > >   __u32                   spp_pathmtu;
>> >> > >   __u32                   spp_sackdelay;
>> >> > >   __u32                   spp_flags;
>> >> > > + __u32                   spp_ipv6_flowlabel;
>> >> > > + __u8                    spp_dscp;
>> >> > >  } __attribute__((packed, aligned(4)));
>> >> >
>> >> > I don't think you can change the size of this structure like this.
>> >> >
>> >> > This check in sctp_setsockopt_peer_addr_params():
>> >> >
>> >> >     if (optlen != sizeof(struct sctp_paddrparams))
>> >> >             return -EINVAL;
>> >> >
>> >> > is going to trigger in old kernels when executing programs
>> >> > built against the new struct definition.
>> >
>> > That will happen, yes, but do we really care about being future-proof
>> > here? I mean: if we also update such check(s) to support dealing with
>> > smaller-than-supported structs, newer kernels will be able to run
>> > programs built against the old struct, and the new one; while building
>> > using newer headers and running on older kernel may fool the
>> > application in other ways too (like enabling support for something
>> > that is available on newer kernel and that is not present in the older
>> > one).
>>
>> We should not break existing apps.
>> We still accept apps of pre-2.4 era without sin6_scope_id
>> (e.g., net/ipv6/af_inet6.c:inet6_bind()).
>
> Yes. That's what I tried to say. That is supporting an old app built
> with old kernel headers and running on a newer kernel, and not the
> other way around (an app built with fresh headers and running on an
> old kernel).
To make that work, I will update the checks like this:

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 1df5d07..c949d8c 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -2715,13 +2715,18 @@ static int sctp_setsockopt_peer_addr_params(struct sock *sk,
        struct sctp_sock        *sp = sctp_sk(sk);
        int error;
        int hb_change, pmtud_change, sackdelay_change;
+       int plen = sizeof(params);
+       int old_plen = plen - sizeof(u32) * 2;

-       if (optlen != sizeof(struct sctp_paddrparams))
+       if (optlen != plen && optlen != old_plen)
                return -EINVAL;

        if (copy_from_user(&params, optval, optlen))
                return -EFAULT;

+       if (optlen == old_plen)
+               params.spp_flags &= ~(SPP_DSCP | SPP_IPV6_FLOWLABEL);
+
        /* Validate flags and value parameters. */
        hb_change        = params.spp_flags & SPP_HB;
        pmtud_change     = params.spp_flags & SPP_PMTUD;
@@ -5591,10 +5596,13 @@ static int sctp_getsockopt_peer_addr_params(struct sock *sk, int len,
        struct sctp_transport   *trans = NULL;
        struct sctp_association *asoc = NULL;
        struct sctp_sock        *sp = sctp_sk(sk);
+       int plen = sizeof(params);
+       int old_plen = plen - sizeof(u32) * 2;

-       if (len < sizeof(struct sctp_paddrparams))
+       if (len < old_plen)
                return -EINVAL;
-       len = sizeof(struct sctp_paddrparams);
+
+       len = len >= plen ? plen : old_plen;
        if (copy_from_user(&params, optval, len))
                return -EFAULT;

Does it look OK to you?
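
One detail that may be worth double-checking is how old_plen is derived.
Because the struct is declared packed and aligned(4), sizeof() is rounded up
to a multiple of 4, so subtracting the nominal size of the added fields does
not necessarily give the size of the old struct. The userspace sketch below
compares that subtraction with an offsetof()-based computation. The two
struct definitions are stand-ins: the fields before spp_pathmtu are an
assumption about the uapi layout (with a 128-byte array in place of struct
sockaddr_storage), and the helper names and ALIGN_UP macro are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round x up to a multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Stand-in for the struct before the patch (leading fields are assumed). */
struct paddrparams_old {
	int32_t  spp_assoc_id;
	uint8_t  spp_address[128];	/* placeholder for struct sockaddr_storage */
	uint32_t spp_hbinterval;
	uint16_t spp_pathmaxrxt;
	uint32_t spp_pathmtu;
	uint32_t spp_sackdelay;
	uint32_t spp_flags;
} __attribute__((packed, aligned(4)));

/* Stand-in for the struct after the patch: two fields appended. */
struct paddrparams_new {
	int32_t  spp_assoc_id;
	uint8_t  spp_address[128];
	uint32_t spp_hbinterval;
	uint16_t spp_pathmaxrxt;
	uint32_t spp_pathmtu;
	uint32_t spp_sackdelay;
	uint32_t spp_flags;
	uint32_t spp_ipv6_flowlabel;
	uint8_t  spp_dscp;
} __attribute__((packed, aligned(4)));

/* The computation used in the proposed diff. */
static size_t old_len_by_subtraction(void)
{
	return sizeof(struct paddrparams_new) - sizeof(uint32_t) * 2;
}

/* Alternative: offset of the first new field, rounded up to the alignment. */
static size_t old_len_by_offsetof(void)
{
	return ALIGN_UP(offsetof(struct paddrparams_new, spp_ipv6_flowlabel), 4);
}
```

Under this stand-in layout the old struct is 152 bytes and the new one 156,
so the subtraction yields 148 while the offsetof() form yields 152. If the
real layout matches, an optlen sent by an old binary would be 152 and would
not equal old_plen as computed in the diff.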

^ permalink raw reply related

* Re: [PATCH net-next 1/5] ipv4: add __ip_queue_xmit() that supports tos param
From: Xin Long @ 2018-06-26  4:38 UTC (permalink / raw)
  To: Neil Horman
  Cc: David Miller, network dev, linux-sctp, Marcelo Ricardo Leitner
In-Reply-To: <20180625111312.GA16772@hmswarspite.think-freely.org>

On Mon, Jun 25, 2018 at 7:13 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
> On Mon, Jun 25, 2018 at 04:26:54PM +0900, David Miller wrote:
>> From: Xin Long <lucien.xin@gmail.com>
>> Date: Mon, 25 Jun 2018 10:14:33 +0800
>>
>> > +EXPORT_SYMBOL(__ip_queue_xmit);
>> > +
>> > +int ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl)
>> > +{
>> > +   return __ip_queue_xmit(sk, skb, fl, inet_sk(sk)->tos);
>> > +}
>> >  EXPORT_SYMBOL(ip_queue_xmit);
>>
>> Maybe better to only export __ip_queue_xmit() and make ip_queue_xmit() an
>> inline function in net/ip.h?
>>
> I concur.  No need to export both here
>
will do that.
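
The shape being agreed on (one out-of-line function taking an explicit tos,
plus a thin inline wrapper that supplies the socket's default) can be
modelled in userspace as below. This is only a sketch of the pattern: the
demo_ types and function names are hypothetical stand-ins for struct sock,
struct sk_buff, __ip_queue_xmit() and ip_queue_xmit(), and the flowi
argument is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-ins for the kernel types; only the fields used here. */
struct demo_sock { uint8_t tos; };
struct demo_skb  { uint8_t ip_tos; };

/* Plays the role of __ip_queue_xmit(): the single out-of-line function. */
static int demo_queue_xmit_tos(struct demo_sock *sk, struct demo_skb *skb,
			       uint8_t tos)
{
	assert(sk != NULL && skb != NULL);
	skb->ip_tos = tos;	/* the caller-chosen TOS ends up in the packet */
	return 0;
}

/* Plays the role of ip_queue_xmit(): an inline wrapper, as in a header. */
static inline int demo_queue_xmit(struct demo_sock *sk, struct demo_skb *skb)
{
	return demo_queue_xmit_tos(sk, skb, sk->tos);
}
```

Existing callers keep using the wrapper unchanged, while a caller that needs
a per-packet value calls the explicit variant directly.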

^ permalink raw reply

* [PATCH] NFC: llcp: fix nfc_llcp_send_ui_frame() lockup
From: Sergey Senozhatsky @ 2018-06-26  4:41 UTC (permalink / raw)
  To: Samuel Ortiz, David S. Miller
  Cc: Steven Rostedt, Petr Mladek, syzkaller-bugs, linux-wireless,
	netdev, linux-kernel, syzbot, sergey.senozhatsky

syzbot reported the following nfc_llcp_send_ui_frame() lockup:

The kernel is built with CONFIG_PREEMPT_VOLUNTARY=y, and llcp_sock_sendmsg()
is stuck in an infinite error-reporting loop because the system is low on
memory and the MSG_DONTWAIT nfc_alloc_send_skb() allocations fail:

        do {
        ...
                pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, MSG_DONTWAIT,
                                         frag_len + LLCP_HEADER_SIZE, &err);
                if (pdu == NULL) {
                        pr_err("Could not allocate PDU\n");
                        continue;
                }
        ...
        } while (remaining_len > 0);

nfc_llcp_send_ui_frame() spent long enough (94+ sec) trying to allocate
a PDU that it triggered an RCU stall under PREEMPT_VOLUNTARY:

 llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
 llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
...
 llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
 llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
 INFO: rcu_sched self-detected stall on CPU
         1-....: (20918 ticks this GP) idle=55a/1/4611686018427387906
 softirq=11347/11347 fqs=20240
          (t=125005 jiffies g=5572 c=5571 q=149)
 NMI backtrace for cpu 1
 CPU: 1 PID: 4811 Comm: syz-executor0 Not tainted 4.18.0-rc1+ #115
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
 01/01/2011
 Call Trace:
  <IRQ>
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
  rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1336
  print_cpu_stall kernel/rcu/tree.c:1485 [inline]
  check_cpu_stall.isra.60.cold.78+0x36c/0x5a6 kernel/rcu/tree.c:1553
  __rcu_pending kernel/rcu/tree.c:3244 [inline]
  rcu_pending kernel/rcu/tree.c:3291 [inline]
  rcu_check_callbacks+0x23f/0xcd0 kernel/rcu/tree.c:2646
  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
  tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
  tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
  __run_hrtimer kernel/time/hrtimer.c:1398 [inline]
  __hrtimer_run_queues+0x3eb/0x10c0 kernel/time/hrtimer.c:1460
  hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1518
  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
  smp_apic_timer_interrupt+0x165/0x730 arch/x86/kernel/apic/apic.c:1050
  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
  </IRQ>
 RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783 [inline]
 RIP: 0010:console_unlock+0xc84/0x10b0 kernel/printk/printk.c:2397
 Code: c1 e8 03 42 80 3c 38 00 0f 85 bd 03 00 00 48 83 3d 38 f7 8e 07 00 0f 84
 69 02 00 00 e8 45 56 19 00 48 8b bd b0 fe ff ff 57 9d <0f> 1f 44 00 00 e9 96
 f5 ff ff e8 2d 56 19 00 48 8b 7d 08 e8 94 cf
 RSP: 0018:ffff8801aab0f358 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff13
 RAX: ffff8801aa2802c0 RBX: 0000000000000200 RCX: 1ffff10035450163
 RDX: 0000000000000000 RSI: ffffffff8162b8fb RDI: 0000000000000293
 RBP: ffff8801aab0f4c0 R08: ffff8801aa280af8 R09: 0000000000000006
 R10: ffff8801aa2802c0 R11: 0000000000000000 R12: 0000000000000000
 R13: ffffffff84ea9880 R14: 0000000000000001 R15: dffffc0000000000
  vprintk_emit+0x6c6/0xdf0 kernel/printk/printk.c:1907
  vprintk_default+0x28/0x30 kernel/printk/printk.c:1948
  vprintk_func+0x7a/0xe7 kernel/printk/printk_safe.c:382
  printk+0xa7/0xcf kernel/printk/printk.c:1981
  nfc_llcp_send_ui_frame.cold.9+0x18/0x1f net/nfc/llcp_commands.c:758
  llcp_sock_sendmsg+0x278/0x350 net/nfc/llcp_sock.c:786
  sock_sendmsg_nosec net/socket.c:645 [inline]
  sock_sendmsg+0xd5/0x120 net/socket.c:655
  ___sys_sendmsg+0x51d/0x930 net/socket.c:2161
  __sys_sendmmsg+0x240/0x6f0 net/socket.c:2256
  __do_sys_sendmmsg net/socket.c:2285 [inline]
  __se_sys_sendmmsg net/socket.c:2282 [inline]
  __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2282
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Address the issue by rate limiting the nfc_alloc_send_skb() allocation
error message, to avoid logbuf pollution, and by calling cond_resched()
before llcp attempts to allocate the PDU again.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: syzbot+d29d18215e477cfbfbdd@syzkaller.appspotmail.com
---
 net/nfc/llcp_commands.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/nfc/llcp_commands.c b/net/nfc/llcp_commands.c
index 2ceefa183cee..e19fadaa9022 100644
--- a/net/nfc/llcp_commands.c
+++ b/net/nfc/llcp_commands.c
@@ -20,6 +20,7 @@
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
+#include <linux/sched.h>
 #include <linux/nfc.h>
 
 #include <net/nfc/nfc.h>
@@ -755,7 +756,8 @@ int nfc_llcp_send_ui_frame(struct nfc_llcp_sock *sock, u8 ssap, u8 dsap,
 		pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, MSG_DONTWAIT,
 					 frag_len + LLCP_HEADER_SIZE, &err);
 		if (pdu == NULL) {
-			pr_err("Could not allocate PDU\n");
+			pr_err_ratelimited("Could not allocate PDU\n");
+			cond_resched();
 			continue;
 		}
 
-- 
2.18.0
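
The loop quoted in the changelog retries for as long as allocation keeps
failing. The alternative raised elsewhere in this thread, leaving the loop
with an error, can be modelled in userspace as follows. This is a sketch
under stated assumptions and not a proposed change to llcp_commands.c: the
demo_ names, the pluggable allocator, and the choice of -ENOMEM are all
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocator hook so the failure path can be exercised deterministically. */
typedef void *(*demo_alloc_fn)(size_t len);

static void *demo_alloc_ok(size_t len)
{
	return malloc(len);
}

static void *demo_alloc_fail(size_t len)
{
	(void)len;
	return NULL;	/* models a system that is out of memory */
}

/*
 * Send `remaining` bytes in fragments of at most `frag_max` bytes.
 * Returns 0 on success, or a negative errno as soon as an allocation fails
 * instead of retrying forever. *sent reports how many bytes went out.
 */
static int demo_send_frames(size_t remaining, size_t frag_max,
			    demo_alloc_fn alloc, size_t *sent)
{
	assert(alloc != NULL && sent != NULL);

	*sent = 0;
	if (frag_max == 0)
		return -EINVAL;

	while (remaining > 0) {
		size_t frag_len = remaining < frag_max ? remaining : frag_max;
		void *pdu = alloc(frag_len);

		if (pdu == NULL)
			return -ENOMEM;	/* bail out; the caller decides what to do */

		free(pdu);	/* stand-in for queueing the fragment */
		remaining -= frag_len;
		*sent += frag_len;
	}

	return 0;
}
```

Whether a partial send followed by an error is acceptable to callers of the
real function is a separate question that this sketch does not answer.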

^ permalink raw reply related

* Re: [PATCHv2 net-next] sctp: add support for SCTP_REUSE_PORT sockopt
From: Xin Long @ 2018-06-26  4:41 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: network dev, linux-sctp, Neil Horman, Michael Tuexen, davem
In-Reply-To: <20180625133024.GB820@localhost.localdomain>

On Mon, Jun 25, 2018 at 9:30 PM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Mon, Jun 25, 2018 at 10:06:46AM +0800, Xin Long wrote:
>> This feature is actually already supported by sk->sk_reuse which can be
>> set by socket level opt SO_REUSEADDR. But it's not working exactly as
>> RFC6458 demands in section 8.1.27, like:
>>
>>   - This option only supports one-to-one style SCTP sockets
>>   - This socket option must not be used after calling bind()
>>     or sctp_bindx().
>>
>> Besides, SCTP_REUSE_PORT sockopt should be provided for user's programs.
>> Otherwise, the programs with SCTP_REUSE_PORT from other systems will not
>> work in linux.
>>
>> To separate it from the socket level version, this patch adds 'reuse' in
>> sctp_sock and it works pretty much as sk->sk_reuse, but with some extra
>> setup limitations that are needed when it is being enabled.
>>
>> "It should be noted that the behavior of the socket-level socket option
>> to reuse ports and/or addresses for SCTP sockets is unspecified", so it
>> leaves SO_REUSEADDR as is for the compatibility.
>>
>> Note that the name SCTP_REUSE_PORT is kind of confusing, it is identical
>> to SO_REUSEADDR with some extra restriction, so here it uses 'reuse' in
>> sctp_sock instead of 'reuseport'. As for sk->sk_reuseport support for
>> SCTP, it will be added in another patch.
>
> To help changelog readers later, please update to something like:
>
> """\
> Note that the name SCTP_REUSE_PORT is somewhat confusing, as its
> functionality is nearly identical to SO_REUSEADDR, but with some
> extra restrictions. Here it uses 'reuse' in sctp_sock instead of
> 'reuseport'. As for sk->sk_reuseport support for SCTP, it will be
> added in another patch.
> """
>
> Makes sense, can you note the difference?
Sure, will post v3. thanks.

^ permalink raw reply

* Re: INFO: rcu detected stall in vprintk_emit
From: Dmitry Vyukov @ 2018-06-26  4:56 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: syzbot, LKML, Petr Mladek, Steven Rostedt, Sergey Senozhatsky,
	syzkaller-bugs, Samuel Ortiz, David S. Miller, linux-wireless,
	netdev
In-Reply-To: <20180626014924.GB11229@jagdpanzerIV>

On Tue, Jun 26, 2018 at 3:49 AM, Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com> wrote:
> On (06/25/18 16:19), syzbot wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:    77072ca59fdd Merge tag 'for-linus-20180623' of git://git.k..
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=169c7c04400000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=befbcd7305e41bb0
>> dashboard link: https://syzkaller.appspot.com/bug?extid=d29d18215e477cfbfbdd
>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=1585147f800000
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+d29d18215e477cfbfbdd@syzkaller.appspotmail.com
>>
>> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
>> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
>> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
>> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
>> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
>
> Hi,
>
> Thanks for the report.
>
> I'll Cc networking people on this.
>
> I've a strong feeling that we saw it before.

A very similar bug was reported before:

https://groups.google.com/d/msg/syzkaller-bugs/Axw2t6DvU60/TfLUoXsjBAAJ



> The kernel is
> CONFIG_PREEMPT_VOLUNTARY=y, llcp_sock_sendmsg() stuck in an error
> reporting loop:
>
>         do {
>         ...
>                 pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, MSG_DONTWAIT,
>                                          frag_len + LLCP_HEADER_SIZE, &err);
>                 if (pdu == NULL) {
>                         pr_err("Could not allocate PDU\n");
>                         continue;
>                 }
>         ...
>         } while (remaining_len > 0);
>
> [ 1004.674843] llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> [ 1004.681035] llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> ...
> [ 1098.508526] llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> [ 1098.514698] llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> [ 1098.520844] INFO: rcu_sched self-detected stall on CPU
>
> 94 seconds worth of heavy printing, no preemption:
>
>> INFO: rcu_sched self-detected stall on CPU
>>       1-....: (20918 ticks this GP) idle=55a/1/4611686018427387906
>> softirq=11347/11347 fqs=20240
>>        (t=125005 jiffies g=5572 c=5571 q=149)
>> NMI backtrace for cpu 1
>> CPU: 1 PID: 4811 Comm: syz-executor0 Not tainted 4.18.0-rc1+ #115
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> Google 01/01/2011
>> Call Trace:
>>  <IRQ>
>>  __dump_stack lib/dump_stack.c:77 [inline]
>>  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
>>  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
>>  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
>>  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>>  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
>>  rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1336
>>  print_cpu_stall kernel/rcu/tree.c:1485 [inline]
>>  check_cpu_stall.isra.60.cold.78+0x36c/0x5a6 kernel/rcu/tree.c:1553
>>  __rcu_pending kernel/rcu/tree.c:3244 [inline]
>>  rcu_pending kernel/rcu/tree.c:3291 [inline]
>>  rcu_check_callbacks+0x23f/0xcd0 kernel/rcu/tree.c:2646
>>  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
>>  tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
>>  tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
>>  __run_hrtimer kernel/time/hrtimer.c:1398 [inline]
>>  __hrtimer_run_queues+0x3eb/0x10c0 kernel/time/hrtimer.c:1460
>>  hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1518
>>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>>  smp_apic_timer_interrupt+0x165/0x730 arch/x86/kernel/apic/apic.c:1050
>>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
>>  </IRQ>
>> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783
>> [inline]
>> RIP: 0010:console_unlock+0xc84/0x10b0 kernel/printk/printk.c:2397
>> Code: c1 e8 03 42 80 3c 38 00 0f 85 bd 03 00 00 48 83 3d 38 f7 8e 07 00 0f
>> 84 69 02 00 00 e8 45 56 19 00 48 8b bd b0 fe ff ff 57 9d <0f> 1f 44 00 00 e9
>> 96 f5 ff ff e8 2d 56 19 00 48 8b 7d 08 e8 94 cf
>> RSP: 0018:ffff8801aab0f358 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff13
>> RAX: ffff8801aa2802c0 RBX: 0000000000000200 RCX: 1ffff10035450163
>> RDX: 0000000000000000 RSI: ffffffff8162b8fb RDI: 0000000000000293
>> RBP: ffff8801aab0f4c0 R08: ffff8801aa280af8 R09: 0000000000000006
>> R10: ffff8801aa2802c0 R11: 0000000000000000 R12: 0000000000000000
>> R13: ffffffff84ea9880 R14: 0000000000000001 R15: dffffc0000000000
>>  vprintk_emit+0x6c6/0xdf0 kernel/printk/printk.c:1907
>>  vprintk_default+0x28/0x30 kernel/printk/printk.c:1948
>>  vprintk_func+0x7a/0xe7 kernel/printk/printk_safe.c:382
>>  printk+0xa7/0xcf kernel/printk/printk.c:1981
>>  nfc_llcp_send_ui_frame.cold.9+0x18/0x1f net/nfc/llcp_commands.c:758
>>  llcp_sock_sendmsg+0x278/0x350 net/nfc/llcp_sock.c:786
>>  sock_sendmsg_nosec net/socket.c:645 [inline]
>>  sock_sendmsg+0xd5/0x120 net/socket.c:655
>>  ___sys_sendmsg+0x51d/0x930 net/socket.c:2161
>>  __sys_sendmmsg+0x240/0x6f0 net/socket.c:2256
>>  __do_sys_sendmmsg net/socket.c:2285 [inline]
>>  __se_sys_sendmmsg net/socket.c:2282 [inline]
>>  __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2282
>>  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
>
> So we can try switching to ratelimited error reporting
> [that would be option A]:
>
> ---
>
> diff --git a/net/nfc/llcp_commands.c b/net/nfc/llcp_commands.c
> index 2ceefa183cee..2f3becb709b8 100644
> --- a/net/nfc/llcp_commands.c
> +++ b/net/nfc/llcp_commands.c
> @@ -755,7 +755,7 @@ int nfc_llcp_send_ui_frame(struct nfc_llcp_sock *sock, u8 ssap, u8 dsap,
>                 pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, MSG_DONTWAIT,
>                                          frag_len + LLCP_HEADER_SIZE, &err);
>                 if (pdu == NULL) {
> -                       pr_err("Could not allocate PDU\n");
> +                       pr_err_ratelimited("Could not allocate PDU\n");
>                         continue;
>                 }
>
> ---
>
>
> Or ratelimited error reporting and cond_resched()
> [that would be option B]:
>
> ---
>
> diff --git a/net/nfc/llcp_commands.c b/net/nfc/llcp_commands.c
> index 2ceefa183cee..61741db4c4e6 100644
> --- a/net/nfc/llcp_commands.c
> +++ b/net/nfc/llcp_commands.c
> @@ -755,7 +755,8 @@ int nfc_llcp_send_ui_frame(struct nfc_llcp_sock *sock, u8 ssap, u8 dsap,
>                 pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, MSG_DONTWAIT,
>                                          frag_len + LLCP_HEADER_SIZE, &err);
>                 if (pdu == NULL) {
> -                       pr_err("Could not allocate PDU\n");
> +                       pr_err_ratelimited("Could not allocate PDU\n");
> +                       cond_resched();
>                         continue;
>                 }
>
> ---
>
>         -ss
>

^ permalink raw reply

* Re: [patch net-next 0/9] net: sched: introduce chain templates support with offloading to mlxsw
From: Jakub Kicinski @ 2018-06-26  4:58 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, jhs, xiyou.wangcong, simon.horman, john.hurley,
	dsahern, mlxsw
In-Reply-To: <20180625210148.9386-1-jiri@resnulli.us>

On Mon, 25 Jun 2018 23:01:39 +0200, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> For the TC clsact offload these days, some of HW drivers need
> to hold a magic ball. The reason is, with the first inserted rule inside
> HW they need to guess what fields will be used for the matching. If
> later on this guess proves to be wrong and user adds a filter with a
> different field to match, there's a problem. Mlxsw resolves it now with
> couple of patterns. Those try to cover as many match fields as possible.
> This approach is far from optimal, both performance-wise and scale-wise.
> Also, there is a combination of filters that in certain order won't
> succeed.
> 
> Most of the time, when the user inserts filters in a chain, he knows right away
> what the filters are going to look like - what type and options they will
> have. For example, he knows that he will only insert filters of type
> flower matching destination IP address. He can specify a template that
> would cover all the filters in the chain.

Perhaps it's lack of sleep, but this paragraph threw me a little off
the track.  IIUC the goal of this set is to provide a way to inform the
HW about expected matches before any rule is programmed into the HW.
Not before any rule is added to a particular chain.  One can just use
the first rule in the chain to make a guess about the chain, but thanks
to this set user can configure *all* chains before any rules are added.

And that's needed because once any rule is added the tcam config can no
longer be easily modified?

^ permalink raw reply

