netdev.vger.kernel.org archive mirror
* [PATCH] net/mlx5e: Fix zero table prio set by user.
@ 2019-07-25 11:24 wenxu
  2019-07-25 21:22 ` Saeed Mahameed
  0 siblings, 1 reply; 6+ messages in thread
From: wenxu @ 2019-07-25 11:24 UTC
  To: saeedm; +Cc: netdev

From: wenxu <wenxu@ucloud.cn>

The flow_cls_common_offload prio can be zero, which leads to an
invalid table prio in hw.

Error: Could not process rule: Invalid argument

kernel log:
mlx5_core 0000:81:00.0: E-Switch: Failed to create FDB Table err -22 (table prio: 65535, level: 0, size: 4194304)

The driver computes
  table_prio = (chain * FDB_MAX_PRIO) + prio - 1;
so it should check that (chain * FDB_MAX_PRIO) + prio is not 0
before subtracting 1.

Signed-off-by: wenxu <wenxu@ucloud.cn>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 089ae4d..64ca90f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -970,7 +970,9 @@ static int esw_add_fdb_miss_rule(struct mlx5_eswitch *esw)
 		flags |= (MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT |
 			  MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);
 
-	table_prio = (chain * FDB_MAX_PRIO) + prio - 1;
+	table_prio = (chain * FDB_MAX_PRIO) + prio;
+	if (table_prio)
+		table_prio = table_prio - 1;
 
 	/* create earlier levels for correct fs_core lookup when
 	 * connecting tables
-- 
1.8.3.1



* Re: [PATCH] net/mlx5e: Fix zero table prio set by user.
  2019-07-25 11:24 [PATCH] net/mlx5e: Fix zero table prio set by user wenxu
@ 2019-07-25 21:22 ` Saeed Mahameed
  2019-07-26 12:19   ` Or Gerlitz
  0 siblings, 1 reply; 6+ messages in thread
From: Saeed Mahameed @ 2019-07-25 21:22 UTC
  To: wenxu@ucloud.cn, Roi Dayan, Or Gerlitz, Mark Bloch, Paul Blakey
  Cc: netdev@vger.kernel.org

On Thu, 2019-07-25 at 19:24 +0800, wenxu@ucloud.cn wrote:
> From: wenxu <wenxu@ucloud.cn>
> 
> The flow_cls_common_offload prio is zero
> 
> It leads the invalid table prio in hw.
> 
> Error: Could not process rule: Invalid argument
> 
> kernel log:
> mlx5_core 0000:81:00.0: E-Switch: Failed to create FDB Table err -22
> (table prio: 65535, level: 0, size: 4194304)
> 
> table_prio = (chain * FDB_MAX_PRIO) + prio - 1;
> should check (chain * FDB_MAX_PRIO) + prio is not 0
> 
> Signed-off-by: wenxu <wenxu@ucloud.cn>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git
> a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> index 089ae4d..64ca90f 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> @@ -970,7 +970,9 @@ static int esw_add_fdb_miss_rule(struct 

This piece of code isn't in this function; weird how it got into the
diff. The patch applies correctly, though!

> mlx5_eswitch *esw)
>  		flags |= (MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT |
>  			  MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);
>  
> -	table_prio = (chain * FDB_MAX_PRIO) + prio - 1;
> +	table_prio = (chain * FDB_MAX_PRIO) + prio;
> +	if (table_prio)
> +		table_prio = table_prio - 1;
>  

This is black magic, even before this fix.
The -1 seems to be needed in order to call
create_next_size_table(table_prio) with the previous "table prio"
(table_prio - 1)?

The whole thing looks wrong to me, since when prio is 0 and chain is 0
there is no such thing as table_prio - 1.

mlnx eswitch guys in the cc, please advise.

Thanks,
Saeed.

>  	/* create earlier levels for correct fs_core lookup when
>  	 * connecting tables


* Re: [PATCH] net/mlx5e: Fix zero table prio set by user.
  2019-07-25 21:22 ` Saeed Mahameed
@ 2019-07-26 12:19   ` Or Gerlitz
  2019-07-26 12:39     ` wenxu
  0 siblings, 1 reply; 6+ messages in thread
From: Or Gerlitz @ 2019-07-26 12:19 UTC
  To: Saeed Mahameed, wenxu@ucloud.cn
  Cc: Roi Dayan, Mark Bloch, Paul Blakey, netdev@vger.kernel.org

On Fri, Jul 26, 2019 at 12:24 AM Saeed Mahameed <saeedm@mellanox.com> wrote:
>
> On Thu, 2019-07-25 at 19:24 +0800, wenxu@ucloud.cn wrote:
> > From: wenxu <wenxu@ucloud.cn>
> >
> > The flow_cls_common_offload prio is zero
> >
> > It leads the invalid table prio in hw.
> >
> > Error: Could not process rule: Invalid argument
> >
> > kernel log:
> > mlx5_core 0000:81:00.0: E-Switch: Failed to create FDB Table err -22
> > (table prio: 65535, level: 0, size: 4194304)
> >
> > table_prio = (chain * FDB_MAX_PRIO) + prio - 1;
> > should check (chain * FDB_MAX_PRIO) + prio is not 0
> >
> > Signed-off-by: wenxu <wenxu@ucloud.cn>
> > ---
> >  drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git
> > a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> > b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> > index 089ae4d..64ca90f 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> > @@ -970,7 +970,9 @@ static int esw_add_fdb_miss_rule(struct
>
> this piece of code isn't in this function, weird how it got to the
> diff, patch applies correctly though !
>
> > mlx5_eswitch *esw)
> >               flags |= (MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT |
> >                         MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);
> >
> > -     table_prio = (chain * FDB_MAX_PRIO) + prio - 1;
> > +     table_prio = (chain * FDB_MAX_PRIO) + prio;
> > +     if (table_prio)
> > +             table_prio = table_prio - 1;
> >
>
> This is black magic, even before this fix.
> this -1 seems to be needed in order to call
> create_next_size_table(table_prio) with the previous "table prio" ?
> (table_prio - 1)  ?
>
> The whole thing looks wrong to me since when prio is 0 and chain is 0,
> there is not such thing table_prio - 1.
>
> mlnx eswitch guys in the cc, please advise.

Basically, prio 0 is not something we ever get in the driver: if user
space specifies 0, the kernel generates some random non-zero prio, and
we support only prios 1-16. Wenxu, what do you run to get this error?



>
> Thanks,
> Saeed.
>
> >       /* create earlier levels for correct fs_core lookup when
> >        * connecting tables


* Re: [PATCH] net/mlx5e: Fix zero table prio set by user.
  2019-07-26 12:19   ` Or Gerlitz
@ 2019-07-26 12:39     ` wenxu
  2019-07-26 14:01       ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 6+ messages in thread
From: wenxu @ 2019-07-26 12:39 UTC
  To: Or Gerlitz, Saeed Mahameed
  Cc: Roi Dayan, Mark Bloch, Paul Blakey, netdev@vger.kernel.org


On 2019/7/26 20:19, Or Gerlitz wrote:
> On Fri, Jul 26, 2019 at 12:24 AM Saeed Mahameed <saeedm@mellanox.com> wrote:
>> On Thu, 2019-07-25 at 19:24 +0800, wenxu@ucloud.cn wrote:
>>> From: wenxu <wenxu@ucloud.cn>
>>>
>>> The flow_cls_common_offload prio is zero
>>>
>>> It leads the invalid table prio in hw.
>>>
>>> Error: Could not process rule: Invalid argument
>>>
>>> kernel log:
>>> mlx5_core 0000:81:00.0: E-Switch: Failed to create FDB Table err -22
>>> (table prio: 65535, level: 0, size: 4194304)
>>>
>>> table_prio = (chain * FDB_MAX_PRIO) + prio - 1;
>>> should check (chain * FDB_MAX_PRIO) + prio is not 0
>>>
>>> Signed-off-by: wenxu <wenxu@ucloud.cn>
>>> ---
>>>  drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git
>>> a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>>> b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>>> index 089ae4d..64ca90f 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>>> @@ -970,7 +970,9 @@ static int esw_add_fdb_miss_rule(struct
>> this piece of code isn't in this function, weird how it got to the
>> diff, patch applies correctly though !
>>
>>> mlx5_eswitch *esw)
>>>               flags |= (MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT |
>>>                         MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);
>>>
>>> -     table_prio = (chain * FDB_MAX_PRIO) + prio - 1;
>>> +     table_prio = (chain * FDB_MAX_PRIO) + prio;
>>> +     if (table_prio)
>>> +             table_prio = table_prio - 1;
>>>
>> This is black magic, even before this fix.
>> this -1 seems to be needed in order to call
>> create_next_size_table(table_prio) with the previous "table prio" ?
>> (table_prio - 1)  ?
>>
>> The whole thing looks wrong to me since when prio is 0 and chain is 0,
>> there is not such thing table_prio - 1.
>>
>> mlnx eswitch guys in the cc, please advise.
> basically, prio 0 is not something we ever get in the driver, since if
> user space
> specifies 0, the kernel generates some random non-zero prio, and we support
> only prios 1-16 -- Wenxu -- what do you run to get this error?
>
>
I run offload with nftables (not tc), where there is no prio for each rule.

The prio of flow_cls_common_offload is left at its initial value of 0:

static void nft_flow_offload_common_init(struct flow_cls_common_offload *common,
					 __be16 proto,
					 struct netlink_ext_ack *extack)
{
	common->protocol = proto;
	common->extack = extack;
}


flow_cls_common_offload



* Re: [PATCH] net/mlx5e: Fix zero table prio set by user.
  2019-07-26 12:39     ` wenxu
@ 2019-07-26 14:01       ` Marcelo Ricardo Leitner
  2019-07-28 10:04         ` Paul Blakey
  0 siblings, 1 reply; 6+ messages in thread
From: Marcelo Ricardo Leitner @ 2019-07-26 14:01 UTC
  To: wenxu
  Cc: Or Gerlitz, Saeed Mahameed, Roi Dayan, Mark Bloch, Paul Blakey,
	pablo, netdev@vger.kernel.org

On Fri, Jul 26, 2019 at 08:39:43PM +0800, wenxu wrote:
> 
> On 2019/7/26 20:19, Or Gerlitz wrote:
> > On Fri, Jul 26, 2019 at 12:24 AM Saeed Mahameed <saeedm@mellanox.com> wrote:
> >> On Thu, 2019-07-25 at 19:24 +0800, wenxu@ucloud.cn wrote:
> >>> From: wenxu <wenxu@ucloud.cn>
> >>>
> >>> The flow_cls_common_offload prio is zero
> >>>
> >>> It leads the invalid table prio in hw.
> >>>
> >>> Error: Could not process rule: Invalid argument
> >>>
> >>> kernel log:
> >>> mlx5_core 0000:81:00.0: E-Switch: Failed to create FDB Table err -22
> >>> (table prio: 65535, level: 0, size: 4194304)
> >>>
> >>> table_prio = (chain * FDB_MAX_PRIO) + prio - 1;
> >>> should check (chain * FDB_MAX_PRIO) + prio is not 0
> >>>
> >>> Signed-off-by: wenxu <wenxu@ucloud.cn>
> >>> ---
> >>>  drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 4 +++-
> >>>  1 file changed, 3 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git
> >>> a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> >>> b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> >>> index 089ae4d..64ca90f 100644
> >>> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> >>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> >>> @@ -970,7 +970,9 @@ static int esw_add_fdb_miss_rule(struct
> >> this piece of code isn't in this function, weird how it got to the
> >> diff, patch applies correctly though !
> >>
> >>> mlx5_eswitch *esw)
> >>>               flags |= (MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT |
> >>>                         MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);
> >>>
> >>> -     table_prio = (chain * FDB_MAX_PRIO) + prio - 1;
> >>> +     table_prio = (chain * FDB_MAX_PRIO) + prio;
> >>> +     if (table_prio)
> >>> +             table_prio = table_prio - 1;
> >>>
> >> This is black magic, even before this fix.
> >> this -1 seems to be needed in order to call
> >> create_next_size_table(table_prio) with the previous "table prio" ?
> >> (table_prio - 1)  ?
> >>
> >> The whole thing looks wrong to me since when prio is 0 and chain is 0,
> >> there is not such thing table_prio - 1.
> >>
> >> mlnx eswitch guys in the cc, please advise.
> > basically, prio 0 is not something we ever get in the driver, since if
> > user space
> > specifies 0, the kernel generates some random non-zero prio, and we support
> > only prios 1-16 -- Wenxu -- what do you run to get this error?
> >
> >
> I run offload with nftables (but not tc), there is no prio for each rule.
> 
> prio of flow_cls_common_offload init as 0.
> 
> static void nft_flow_offload_common_init(struct flow_cls_common_offload *common,
> 
>                      __be16 proto,
>                     struct netlink_ext_ack *extack)
> {
>     common->protocol = proto;
>     common->extack = extack;
> }
> 
> 
> flow_cls_common_offload

Note that in
[PATCH net-next] netfilter: nf_table_offload: Fix zero prio of flow_cls_common_offload
I asked Pablo how nftables should behave in this situation.

It's the same issue as in the patch above, but fixed at a
different level.


* RE: [PATCH] net/mlx5e: Fix zero table prio set by user.
  2019-07-26 14:01       ` Marcelo Ricardo Leitner
@ 2019-07-28 10:04         ` Paul Blakey
  0 siblings, 0 replies; 6+ messages in thread
From: Paul Blakey @ 2019-07-28 10:04 UTC
  To: Marcelo Ricardo Leitner, wenxu
  Cc: Or Gerlitz, Saeed Mahameed, Roi Dayan, Mark Bloch,
	pablo@netfilter.org, netdev@vger.kernel.org


On 7/26/2019 5:01 PM, Marcelo Ricardo Leitner wrote:
> On Fri, Jul 26, 2019 at 08:39:43PM +0800, wenxu wrote:
>>
>> On 2019/7/26 20:19, Or Gerlitz wrote:
>>> On Fri, Jul 26, 2019 at 12:24 AM Saeed Mahameed <saeedm@mellanox.com> wrote:
>>>> On Thu, 2019-07-25 at 19:24 +0800, wenxu@ucloud.cn wrote:
>>>>> From: wenxu <wenxu@ucloud.cn>
>>>>>
>>>>> The flow_cls_common_offload prio is zero
>>>>>
>>>>> It leads the invalid table prio in hw.
>>>>>
>>>>> Error: Could not process rule: Invalid argument
>>>>>
>>>>> kernel log:
>>>>> mlx5_core 0000:81:00.0: E-Switch: Failed to create FDB Table err -22
>>>>> (table prio: 65535, level: 0, size: 4194304)
>>>>>
>>>>> table_prio = (chain * FDB_MAX_PRIO) + prio - 1;
>>>>> should check (chain * FDB_MAX_PRIO) + prio is not 0
>>>>>
>>>>> Signed-off-by: wenxu <wenxu@ucloud.cn>
>>>>> ---
>>>>>  drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 4 +++-
>>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git
>>>>> a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>>>>> b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>>>>> index 089ae4d..64ca90f 100644
>>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>>>>> @@ -970,7 +970,9 @@ static int esw_add_fdb_miss_rule(struct
>>>> this piece of code isn't in this function, weird how it got to the
>>>> diff, patch applies correctly though !
>>>>
>>>>> mlx5_eswitch *esw)
>>>>>               flags |= (MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT |
>>>>>                         MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);
>>>>>
>>>>> -     table_prio = (chain * FDB_MAX_PRIO) + prio - 1;
>>>>> +     table_prio = (chain * FDB_MAX_PRIO) + prio;
>>>>> +     if (table_prio)
>>>>> +             table_prio = table_prio - 1;
>>>>>
>>>> This is black magic, even before this fix.
>>>> this -1 seems to be needed in order to call
>>>> create_next_size_table(table_prio) with the previous "table prio" ?
>>>> (table_prio - 1)  ?
>>>>
>>>> The whole thing looks wrong to me since when prio is 0 and chain is 0,
>>>> there is not such thing table_prio - 1.
>>>>
>>>> mlnx eswitch guys in the cc, please advise.
>>> basically, prio 0 is not something we ever get in the driver, since if
>>> user space
>>> specifies 0, the kernel generates some random non-zero prio, and we support
>>> only prios 1-16 -- Wenxu -- what do you run to get this error?
>>>
>>>
>> I run offload with nftables (but not tc), there is no prio for each rule.
>>
>> prio of flow_cls_common_offload init as 0.
>>
>> static void nft_flow_offload_common_init(struct flow_cls_common_offload *common,
>>
>>                      __be16 proto,
>>                     struct netlink_ext_ack *extack)
>> {
>>     common->protocol = proto;
>>     common->extack = extack;
>> }
>>
>>
>> flow_cls_common_offload
>
> Note that on
> [PATCH net-next] netfilter: nf_table_offload: Fix zero prio of flow_cls_common_offload
> I asked Pablo on how nftables should behave on this situation.
>
> It's the same issue as in the patch above but being fixed at a
> different level.

That's better. Since the original code relied on prio 0 never being
valid, the suggested fix (net/mlx5e: Fix zero table prio set by user)
maps NFT offload prio 0 and tc prio 1 to the same hardware table.
This is wrong and can cause issues.


