Netdev List
 help / color / mirror / Atom feed
* CONGRATULATIONS !!!
From: Cheveron Texaco @ 2016-09-22 18:59 UTC (permalink / raw)




Attention Winner,

Congratulations !!! Your email Address has won you a substantial amount of 1 Million United States dollars in the just concluded Chevron Texaco Promo-Lottery.

You are to contact your claim agent for claims process at Email:   chevron.claimsagent1401@gmail.com


Best Regards
Ms Stella Newton

^ permalink raw reply

* Re: [PATCH net] net: rtnl_register in net_ns_init need rtnl_lock
From: Hannes Frederic Sowa @ 2016-09-22 19:02 UTC (permalink / raw)
  To: Cong Wang; +Cc: Eric Dumazet, Linux Kernel Network Developers
In-Reply-To: <CAM_iQpUsYLWxWvcuVzHt=Q9hRjZ1vKJHWjS6ZKWc0rjdtdOo8w@mail.gmail.com>

On Thu, Sep 22, 2016, at 19:20, Cong Wang wrote:
> > I don't think it is a big issue but wanted the writes to the
> > rtnl_msg_handlers array to be strictly serialized. I was working on
> > adding this to other places, too. Maybe better for net-next even?
> 
> But they are called during boot, why is it possible to have a parallel
> reader/writer at that time?

It also happens during module load time, which isn't synchronized with
rtnl_lock.

Bye,
Hannes

^ permalink raw reply

* [PATCH v3] tcp: fix wrong checksum calculation on MTU probing
From: Douglas Caetano dos Santos @ 2016-09-22 18:52 UTC (permalink / raw)
  To: Sergei Shtylyov, David Miller; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <5ffc1020-a36f-75b4-ff25-4f58c243f803@cogentembedded.com>

With TCP MTU probing enabled and offload TX checksumming disabled,
tcp_mtu_probe() calculated the wrong checksum when a fragment being copied
into the probe's SKB had an odd length. This was caused by the direct use
of skb_copy_and_csum_bits() to calculate the checksum, as it pads the
fragment being copied, if needed. When this fragment was not the last, a
subsequent call used the previous checksum without considering this
padding.

The effect was a stale connection in one way, as even retransmissions
wouldn't solve the problem, because the checksum was never recalculated for
the full SKB length.

Signed-off-by: Douglas Caetano dos Santos <douglascs@taghos.com.br>
---
 net/ipv4/tcp_output.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f53d0cc..2d32952 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1966,12 +1966,14 @@ static int tcp_mtu_probe(struct sock *sk)
 	len = 0;
 	tcp_for_write_queue_from_safe(skb, next, sk) {
 		copy = min_t(int, skb->len, probe_size - len);
-		if (nskb->ip_summed)
+		if (nskb->ip_summed) {
 			skb_copy_bits(skb, 0, skb_put(nskb, copy), copy);
-		else
-			nskb->csum = skb_copy_and_csum_bits(skb, 0,
-							    skb_put(nskb, copy),
-							    copy, nskb->csum);
+		} else {
+			__wsum csum = skb_copy_and_csum_bits(skb, 0,
+							     skb_put(nskb, copy),
+							     copy, 0);
+			nskb->csum = csum_block_add(nskb->csum, csum, len);
+		}

 		if (skb->len <= copy) {
 			/* We've eaten all the data from this skb.
-- 
2.5.0

^ permalink raw reply related

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
From: Eric Dumazet @ 2016-09-22 18:42 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: David S. Miller, Jamal Hadi Salim, WANG Cong, Eric Dumazet,
	netdev
In-Reply-To: <20160922212745.789734c5@halley>

On Thu, 2016-09-22 at 21:27 +0300, Shmulik Ladkani wrote:
> On Thu, 22 Sep 2016 07:54:13 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Hmm... we probably need to apply the full rcu protection before this
> > patch.
> > 
> > https://patchwork.ozlabs.org/patch/667680/
> 
> Are you referring to order of application into net-next?
> 
> This patch seems to present no new tcf_mirred_params members nor
> need-to-be-protected code regions (please, correct me if wrong).
> So it does not _depend_ on the 'full rcu fixes', does it?

No, simply a reminder that we run lockless there, so you might need to
read some control variables once, and in a consistent way.

(Or a concurrent writer could change params in the middle of the
function)

^ permalink raw reply

* Re: [PATCH v3 2/2] netfilter: Create revision 2 of xt_hashlimit to support higher pps rates
From: Vishwanath Pai @ 2016-09-22 18:39 UTC (permalink / raw)
  To: Jan Engelhardt, pablo
  Cc: kaber, kadlec, johunt, netfilter-devel, coreteam, netdev,
	pai.vishwain
In-Reply-To: <alpine.LSU.2.20.1609221850530.27197@nerf40.vanv.qr>

Thanks for pointing this out, I will reorder the fields to:

struct hashlimit_cfg2 {
	__u64 avg;    /* Average secs between packets * scale */
	__u64 burst;
	__u32 mode; /* bitmask of XT_HASHLIMIT_HASH_* */

This should fix the hole and avoid padding.

-Vishwanath

On 09/22/2016 12:53 PM, Jan Engelhardt wrote:
> On Thursday 2016-09-22 18:43, Vishwanath Pai wrote:
>> >+struct hashlimit_cfg2 {
>> >+	__u32 mode;	  /* bitmask of XT_HASHLIMIT_HASH_* */
>> >+	__u64 avg;    /* Average secs between packets * scale */
>> >+	__u64 burst;  /* Period multiplier for upper limit. */
> This would have different sizes between i386 and x86_64,
> necessiting additional compat functions. It should be padded
> or reordered instead.
> 


^ permalink raw reply

* [PATCH v1] bpf: Set register type according to is_valid_access()
From: Mickaël Salaün @ 2016-09-22 18:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Alexei Starovoitov, Andy Lutomirski,
	Daniel Borkmann, Kees Cook, Sargun Dhillon, Tejun Heo, netdev

This fix a pointer leak when an unprivileged eBPF program read a pointer
value from the context. Even if is_valid_access() returns a pointer
type, the eBPF verifier replace it with UNKNOWN_VALUE. The register
value containing an address is then allowed to leak. Moreover, this
prevented unprivileged eBPF programs to use functions with (legitimate)
pointer arguments.

This bug is not an issue for now because the only unprivileged eBPF
program allowed is of type BPF_PROG_TYPE_SOCKET_FILTER and all the types
from its context are UNKNOWN_VALUE. However, this fix is important for
future unprivileged eBPF program types which could use pointers in their
context.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Fixes: 969bf05eb3ce ("bpf: direct packet access")
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Kees Cook <keescook@chromium.org>
Acked-by: Sargun Dhillon <sargun@sargun.me>
---
 kernel/bpf/verifier.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index daea765d72e6..0698ccd67715 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -794,10 +794,8 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
 		}
 		err = check_ctx_access(env, off, size, t, &reg_type);
 		if (!err && t == BPF_READ && value_regno >= 0) {
-			mark_reg_unknown_value(state->regs, value_regno);
-			if (env->allow_ptr_leaks)
-				/* note that reg.[id|off|range] == 0 */
-				state->regs[value_regno].type = reg_type;
+			/* note that reg.[id|off|range] == 0 */
+			state->regs[value_regno].type = reg_type;
 		}
 
 	} else if (reg->type == FRAME_PTR || reg->type == PTR_TO_STACK) {
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
From: Shmulik Ladkani @ 2016-09-22 18:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Jamal Hadi Salim, WANG Cong, Eric Dumazet,
	netdev, Shmulik Ladkani
In-Reply-To: <1474556053.23058.111.camel@edumazet-glaptop3.roam.corp.google.com>

On Thu, 22 Sep 2016 07:54:13 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Hmm... we probably need to apply the full rcu protection before this
> patch.
> 
> https://patchwork.ozlabs.org/patch/667680/

Are you referring to order of application into net-next?

This patch seems to present no new tcf_mirred_params members nor
need-to-be-protected code regions (please, correct me if wrong).
So it does not _depend_ on the 'full rcu fixes', does it?

Thanks,
Shmulik

^ permalink raw reply

* Re: XDP (eXpress Data Path) documentation
From: Jesper Dangaard Brouer via iovisor-dev @ 2016-09-22 18:16 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org
  Cc: Nathan Willis, Alexei Starovoitov, Tom Herbert, Jonathan Corbet,
	linux-doc-u79uwXL29TY76Z2rM5mHXA, Saeed Mahameed, Edward Cree
In-Reply-To: <20160920110844.661965be-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>


On Tue, 20 Sep 2016 11:08:44 +0200 Jesper Dangaard Brouer <brouer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> As promised, I've started documenting the XDP eXpress Data Path):
> 
>  [1] https://prototype-kernel.readthedocs.io/en/latest/networking/XDP/index.html
> 
> IMHO the documentation have reached a stage where it is useful for the
> XDP project, BUT I request collaboration on improving the documentation
> from all. (Native English speakers are encouraged to send grammar fixes ;-))

I want to publicly thanks Edward Cree for being the first contributor
to the XDP documentation with formulation and grammar fixes.

Pulled and pushed:
 https://github.com/netoptimizer/prototype-kernel/commit/fb6a3de95

Thanks!
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* [PATCH v2 iproute2 net-next] tc: m_vlan: Add vlan modify action
From: Shmulik Ladkani @ 2016-09-22 18:01 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Jamal Hadi Salim, Jiri Pirko, Shmulik Ladkani

The 'vlan modify' action allows to replace an existing 802.1q tag
according to user provided settings.
It accepts same arguments as the 'vlan push' action.

For example, this replaces vid 6 with vid 5:

 # tc filter add dev veth0 parent ffff: pref 1 protocol 802.1q \
      basic match 'meta(vlan mask 0xfff eq 6)' \
      action vlan modify id 5 continue

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
 v2: Coding. No need to encapsule action_names[] access into a function

 include/linux/tc_act/tc_vlan.h |  1 +
 man/man8/tc-vlan.8             | 25 +++++++++++++++++++------
 tc/m_vlan.c                    | 40 +++++++++++++++++++++++++++++++---------
 3 files changed, 51 insertions(+), 15 deletions(-)

diff --git a/include/linux/tc_act/tc_vlan.h b/include/linux/tc_act/tc_vlan.h
index be72b6e384..bddb272b84 100644
--- a/include/linux/tc_act/tc_vlan.h
+++ b/include/linux/tc_act/tc_vlan.h
@@ -16,6 +16,7 @@
 
 #define TCA_VLAN_ACT_POP	1
 #define TCA_VLAN_ACT_PUSH	2
+#define TCA_VLAN_ACT_MODIFY	3
 
 struct tc_vlan {
 	tc_gen;
diff --git a/man/man8/tc-vlan.8 b/man/man8/tc-vlan.8
index 4d0c5c8a15..af3de1c54e 100644
--- a/man/man8/tc-vlan.8
+++ b/man/man8/tc-vlan.8
@@ -6,7 +6,7 @@ vlan - vlan manipulation module
 .in +8
 .ti -8
 .BR tc " ... " "action vlan" " { " pop " |"
-.IR PUSH " } [ " CONTROL " ]"
+.IR PUSH " | " MODIFY " } [ " CONTROL " ]"
 
 .ti -8
 .IR PUSH " := "
@@ -17,22 +17,30 @@ vlan - vlan manipulation module
 .BI id " VLANID"
 
 .ti -8
+.IR MODIFY " := "
+.BR modify " [ " protocol
+.IR VLANPROTO " ]"
+.BR " [ " priority
+.IR VLANPRIO " ] "
+.BI id " VLANID"
+
+.ti -8
 .IR CONTROL " := { "
 .BR reclassify " | " pipe " | " drop " | " continue " | " pass " }"
 .SH DESCRIPTION
 The
 .B vlan
 action allows to perform 802.1Q en- or decapsulation on a packet, reflected by
-the two operation modes
-.IR POP " and " PUSH .
+the operation modes
+.IR POP ", " PUSH " and " MODIFY .
 The
 .I POP
 mode is simple, as no further information is required to just drop the
 outer-most VLAN encapsulation. The
-.I PUSH
-mode on the other hand requires at least a
+.IR PUSH " and " MODIFY
+modes require at least a
 .I VLANID
-and allows to optionally choose the
+and allow to optionally choose the
 .I VLANPROTO
 to use.
 .SH OPTIONS
@@ -45,6 +53,11 @@ Encapsulation mode. Requires at least
 .B id
 option.
 .TP
+.B modify
+Replace mode. Existing 802.1Q tag is replaced. Requires at least
+.B id
+option.
+.TP
 .BI id " VLANID"
 Specify the VLAN ID to encapsulate into.
 .I VLANID
diff --git a/tc/m_vlan.c b/tc/m_vlan.c
index 05a63b48f1..b32f746015 100644
--- a/tc/m_vlan.c
+++ b/tc/m_vlan.c
@@ -19,10 +19,17 @@
 #include "tc_util.h"
 #include <linux/tc_act/tc_vlan.h>
 
+static const char * const action_names[] = {
+	[TCA_VLAN_ACT_POP] = "pop",
+	[TCA_VLAN_ACT_PUSH] = "push",
+	[TCA_VLAN_ACT_MODIFY] = "modify",
+};
+
 static void explain(void)
 {
 	fprintf(stderr, "Usage: vlan pop\n");
 	fprintf(stderr, "       vlan push [ protocol VLANPROTO ] id VLANID [ priority VLANPRIO ] [CONTROL]\n");
+	fprintf(stderr, "       vlan modify [ protocol VLANPROTO ] id VLANID [ priority VLANPRIO ] [CONTROL]\n");
 	fprintf(stderr, "       VLANPROTO is one of 802.1Q or 802.1AD\n");
 	fprintf(stderr, "            with default: 802.1Q\n");
 	fprintf(stderr, "       CONTROL := reclassify | pipe | drop | continue | pass\n");
@@ -34,6 +41,11 @@ static void usage(void)
 	exit(-1);
 }
 
+static bool has_push_attribs(int action)
+{
+	return action == TCA_VLAN_ACT_PUSH || action == TCA_VLAN_ACT_MODIFY;
+}
+
 static int parse_vlan(struct action_util *a, int *argc_p, char ***argv_p,
 		      int tca_id, struct nlmsghdr *n)
 {
@@ -71,9 +83,17 @@ static int parse_vlan(struct action_util *a, int *argc_p, char ***argv_p,
 				return -1;
 			}
 			action = TCA_VLAN_ACT_PUSH;
+		} else if (matches(*argv, "modify") == 0) {
+			if (action) {
+				fprintf(stderr, "unexpected \"%s\" - action already specified\n",
+					*argv);
+				explain();
+				return -1;
+			}
+			action = TCA_VLAN_ACT_MODIFY;
 		} else if (matches(*argv, "id") == 0) {
-			if (action != TCA_VLAN_ACT_PUSH) {
-				fprintf(stderr, "\"%s\" is only valid for push\n",
+			if (!has_push_attribs(action)) {
+				fprintf(stderr, "\"%s\" is only valid for push/modify\n",
 					*argv);
 				explain();
 				return -1;
@@ -83,8 +103,8 @@ static int parse_vlan(struct action_util *a, int *argc_p, char ***argv_p,
 				invarg("id is invalid", *argv);
 			id_set = 1;
 		} else if (matches(*argv, "protocol") == 0) {
-			if (action != TCA_VLAN_ACT_PUSH) {
-				fprintf(stderr, "\"%s\" is only valid for push\n",
+			if (!has_push_attribs(action)) {
+				fprintf(stderr, "\"%s\" is only valid for push/modify\n",
 					*argv);
 				explain();
 				return -1;
@@ -94,8 +114,8 @@ static int parse_vlan(struct action_util *a, int *argc_p, char ***argv_p,
 				invarg("protocol is invalid", *argv);
 			proto_set = 1;
 		} else if (matches(*argv, "priority") == 0) {
-			if (action != TCA_VLAN_ACT_PUSH) {
-				fprintf(stderr, "\"%s\" is only valid for push\n",
+			if (!has_push_attribs(action)) {
+				fprintf(stderr, "\"%s\" is only valid for push/modify\n",
 					*argv);
 				explain();
 				return -1;
@@ -129,8 +149,9 @@ static int parse_vlan(struct action_util *a, int *argc_p, char ***argv_p,
 		}
 	}
 
-	if (action == TCA_VLAN_ACT_PUSH && !id_set) {
-		fprintf(stderr, "id needs to be set for push\n");
+	if (has_push_attribs(action) && !id_set) {
+		fprintf(stderr, "id needs to be set for %s\n",
+			action_names[action]);
 		explain();
 		return -1;
 	}
@@ -186,7 +207,8 @@ static int print_vlan(struct action_util *au, FILE *f, struct rtattr *arg)
 		fprintf(f, " pop");
 		break;
 	case TCA_VLAN_ACT_PUSH:
-		fprintf(f, " push");
+	case TCA_VLAN_ACT_MODIFY:
+		fprintf(f, " %s", action_names[parm->v_action]);
 		if (tb[TCA_VLAN_PUSH_VLAN_ID]) {
 			val = rta_getattr_u16(tb[TCA_VLAN_PUSH_VLAN_ID]);
 			fprintf(f, " id %u", val);
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH v2] fs/select: add vmalloc fallback for select(2)
From: Vlastimil Babka @ 2016-09-22 17:55 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexander Viro, Andrew Morton, linux-fsdevel, linux-kernel,
	linux-mm, Michal Hocko, netdev, Linux API, linux-man
In-Reply-To: <1474564068.23058.144.camel@edumazet-glaptop3.roam.corp.google.com>

On 09/22/2016 07:07 PM, Eric Dumazet wrote:
> On Thu, 2016-09-22 at 18:56 +0200, Vlastimil Babka wrote:
>> On 09/22/2016 06:49 PM, Eric Dumazet wrote:
>> > On Thu, 2016-09-22 at 18:43 +0200, Vlastimil Babka wrote:
>> >> The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows
>> >> with the number of fds passed. We had a customer report page allocation
>> >> failures of order-4 for this allocation. This is a costly order, so it might
>> >> easily fail, as the VM expects such allocation to have a lower-order fallback.
>> >>
>> >> Such trivial fallback is vmalloc(), as the memory doesn't have to be
>> >> physically contiguous. Also the allocation is temporary for the duration of the
>> >> syscall, so it's unlikely to stress vmalloc too much.
>> >
>> > vmalloc() uses a vmap_area_lock spinlock, and TLB flushes.
>> >
>> > So I guess allowing vmalloc() being called from an innocent application
>> > doing a select() might be dangerous, especially if this select() happens
>> > thousands of time per second.
>>
>> Isn't seq_buf_alloc() similarly exposed? And ipc_alloc()?
>
> Possibly.
>
> We don't have a library function (attempting kmalloc(), fallback to
> vmalloc() presumably to avoid abuses, but I guess some patches were
> accepted without thinking about this.

So in the case of select() it seems like the memory we need 6 bits per file 
descriptor, multiplied by the highest possible file descriptor (nfds) as passed 
to the syscall. According to the man page of select:

        EINVAL nfds is negative or exceeds the RLIMIT_NOFILE resource limit (see 
getrlimit(2)).

The code actually seems to silently cap the value instead of returning EINVAL 
though? (IIUC):

        /* max_fds can increase, so grab it once to avoid race */
         rcu_read_lock();
         fdt = files_fdtable(current->files);
         max_fds = fdt->max_fds;
         rcu_read_unlock();
         if (n > max_fds)
                 n = max_fds;

The default for this cap seems to be 1024 where I checked (again, IIUC, it's 
what ulimit -n returns?). I wasn't able to change it to more than 2048, which 
makes the bitmaps still below PAGE_SIZE.

So if I get that right, the system admin would have to allow really large 
RLIMIT_NOFILE to even make vmalloc() possible here. So I don't see it as a large 
concern?

Vlastimil

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* [PATCH][V2] cxgb4: fix signed wrap around when decrementing index idx
From: Colin King @ 2016-09-22 17:48 UTC (permalink / raw)
  To: Hariprasad S, netdev; +Cc: linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Change predecrement compare to post decrement compare to avoid an
unsigned integer wrap-around comparison when decrementing idx in
the while loop.

For example, when idx is zero, the current situation will
predecrement idx in the while loop, wrapping idx to the maximum
signed integer and cause out of bounds reads on rxq_info->msix_tbl[idx].

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
index d12a73e..42a9c8d 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
@@ -367,7 +367,7 @@ int request_msix_queue_irqs_uld(struct adapter *adap, unsigned int uld_type)
 	}
 	return 0;
 unwind:
-	while (--idx >= 0) {
+	while (idx-- > 0) {
 		bmap_idx = rxq_info->msix_tbl[idx];
 		free_msix_idx_in_bmap(adap, bmap_idx);
 		free_irq(adap->msix_info_ulds[bmap_idx].vec,
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH iproute2 net-next] tc: m_vlan: Add vlan modify action
From: Shmulik Ladkani @ 2016-09-22 17:44 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Jamal Hadi Salim, Jiri Pirko, Shmulik Ladkani
In-Reply-To: <20160922090504.3eaa5266@xeon-e3>

On Thu, 22 Sep 2016 09:05:04 -0700 Stephen Hemminger <stephen@networkplumber.org> wrote:
> On Thu, 22 Sep 2016 12:31:10 +0300
> Shmulik Ladkani <shmulik.ladkani@ravellosystems.com> wrote:
> 
> > +
> > +static const char *action_name(int action)
> > +{
> > +	static const char * const names[] = {
> > +		[TCA_VLAN_ACT_POP] = "pop",
> > +		[TCA_VLAN_ACT_PUSH] = "push",
> > +		[TCA_VLAN_ACT_MODIFY] = "modify",
> > +	};
> > +	return names[action];
> > +}
> > +  
> 
> Why are you wrapping a simple array lookup in a function?

No reason in particular, was probably code evolution, will amend, thanks.

^ permalink raw reply

* Re: [PATCH 2/2] net: thunderx: Support for byte queue limits
From: Florian Fainelli @ 2016-09-22 17:41 UTC (permalink / raw)
  To: sunil.kovvuri, netdev; +Cc: Sunil Goutham, linux-kernel, linux-arm-kernel
In-Reply-To: <1474535121-13958-3-git-send-email-sunil.kovvuri@gmail.com>

On 09/22/2016 02:05 AM, sunil.kovvuri@gmail.com wrote:
> From: Sunil Goutham <sgoutham@cavium.com>
> 
> This patch adds support for byte queue limits
> 
> Signed-off-by: Sunil Goutham <sgoutham@cavium.com>

Where is the code that calls netdev_tx_reset_queue()? This is needed in
the function that brings down the interface, did your test survive a
up/down/up sequence?

> ---
>  drivers/net/ethernet/cavium/thunder/nicvf_main.c   | 11 ++++++--
>  drivers/net/ethernet/cavium/thunder/nicvf_queues.c | 30 ++++++++++++++--------
>  2 files changed, 29 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
> index 7d00162..453e3a0 100644
> --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
> +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
> @@ -516,7 +516,8 @@ static int nicvf_init_resources(struct nicvf *nic)
>  static void nicvf_snd_pkt_handler(struct net_device *netdev,
>  				  struct cmp_queue *cq,
>  				  struct cqe_send_t *cqe_tx,
> -				  int cqe_type, int budget)
> +				  int cqe_type, int budget,
> +				  unsigned int *tx_pkts, unsigned int *tx_bytes)
>  {
>  	struct sk_buff *skb = NULL;
>  	struct nicvf *nic = netdev_priv(netdev);
> @@ -547,6 +548,8 @@ static void nicvf_snd_pkt_handler(struct net_device *netdev,
>  		}
>  		nicvf_put_sq_desc(sq, hdr->subdesc_cnt + 1);
>  		prefetch(skb);
> +		(*tx_pkts)++;
> +		*tx_bytes += skb->len;
>  		napi_consume_skb(skb, budget);
>  		sq->skbuff[cqe_tx->sqe_ptr] = (u64)NULL;
>  	} else {
> @@ -662,6 +665,7 @@ static int nicvf_cq_intr_handler(struct net_device *netdev, u8 cq_idx,
>  	struct cmp_queue *cq = &qs->cq[cq_idx];
>  	struct cqe_rx_t *cq_desc;
>  	struct netdev_queue *txq;
> +	unsigned int tx_pkts = 0, tx_bytes = 0;
>  
>  	spin_lock_bh(&cq->lock);
>  loop:
> @@ -701,7 +705,7 @@ loop:
>  		case CQE_TYPE_SEND:
>  			nicvf_snd_pkt_handler(netdev, cq,
>  					      (void *)cq_desc, CQE_TYPE_SEND,
> -					      budget);
> +					      budget, &tx_pkts, &tx_bytes);
>  			tx_done++;
>  		break;
>  		case CQE_TYPE_INVALID:
> @@ -730,6 +734,9 @@ done:
>  		netdev = nic->pnicvf->netdev;
>  		txq = netdev_get_tx_queue(netdev,
>  					  nicvf_netdev_qidx(nic, cq_idx));
> +		if (tx_pkts)
> +			netdev_tx_completed_queue(txq, tx_pkts, tx_bytes);
> +
>  		nic = nic->pnicvf;
>  		if (netif_tx_queue_stopped(txq) && netif_carrier_ok(netdev)) {
>  			netif_tx_start_queue(txq);
> diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
> index 178c5c7..a4fc501 100644
> --- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
> +++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
> @@ -1082,6 +1082,24 @@ static inline void nicvf_sq_add_cqe_subdesc(struct snd_queue *sq, int qentry,
>  	imm->len = 1;
>  }
>  
> +static inline void nicvf_sq_doorbell(struct nicvf *nic, struct sk_buff *skb,
> +				     int sq_num, int desc_cnt)
> +{
> +	struct netdev_queue *txq;
> +
> +	txq = netdev_get_tx_queue(nic->pnicvf->netdev,
> +				  skb_get_queue_mapping(skb));
> +
> +	netdev_tx_sent_queue(txq, skb->len);
> +
> +	/* make sure all memory stores are done before ringing doorbell */
> +	smp_wmb();
> +
> +	/* Inform HW to xmit all TSO segments */
> +	nicvf_queue_reg_write(nic, NIC_QSET_SQ_0_7_DOOR,
> +			      sq_num, desc_cnt);
> +}
> +
>  /* Segment a TSO packet into 'gso_size' segments and append
>   * them to SQ for transfer
>   */
> @@ -1141,12 +1159,8 @@ static int nicvf_sq_append_tso(struct nicvf *nic, struct snd_queue *sq,
>  	/* Save SKB in the last segment for freeing */
>  	sq->skbuff[hdr_qentry] = (u64)skb;
>  
> -	/* make sure all memory stores are done before ringing doorbell */
> -	smp_wmb();
> +	nicvf_sq_doorbell(nic, skb, sq_num, desc_cnt);
>  
> -	/* Inform HW to xmit all TSO segments */
> -	nicvf_queue_reg_write(nic, NIC_QSET_SQ_0_7_DOOR,
> -			      sq_num, desc_cnt);
>  	nic->drv_stats.tx_tso++;
>  	return 1;
>  }
> @@ -1219,12 +1233,8 @@ doorbell:
>  		nicvf_sq_add_cqe_subdesc(sq, qentry, tso_sqe, skb);
>  	}
>  
> -	/* make sure all memory stores are done before ringing doorbell */
> -	smp_wmb();
> +	nicvf_sq_doorbell(nic, skb, sq_num, subdesc_cnt);
>  
> -	/* Inform HW to xmit new packet */
> -	nicvf_queue_reg_write(nic, NIC_QSET_SQ_0_7_DOOR,
> -			      sq_num, subdesc_cnt);
>  	return 1;
>  
>  append_fail:
> 


-- 
Florian

^ permalink raw reply

* Fwd: Re: nfs broken on Fedora-24, 32-bit?
From: Ben Greear @ 2016-09-22 17:35 UTC (permalink / raw)
  To: netdev
In-Reply-To: <08ad07ee-72e2-b478-d70d-87b08c24452f@candelatech.com>


This is probably not an NFS specific issue, though I guess possibly it is.

Forwarding to netdev in case someone wants to take a look at it.

Thanks,
Ben


-------- Forwarded Message --------
Subject: Re: nfs broken on Fedora-24, 32-bit?
Date: Fri, 16 Sep 2016 16:31:51 -0700
From: Ben Greear <greearb@candelatech.com>
Organization: Candela Technologies
To: Trond Myklebust <trondmy@primarydata.com>
CC: List Linux NFS Mailing <linux-nfs@vger.kernel.org>

On 09/15/2016 04:06 PM, Ben Greear wrote:
> On 09/15/2016 04:00 PM, Trond Myklebust wrote:
>> Hi Ben,
>>
>>> On Sep 15, 2016, at 18:32, Ben Greear <greearb@candelatech.com> wrote:
>>>
>>> I have a Fedora-24 machine mounting an NFS server running Fedora-13 (kernel 2.6.34.9-69.fc13.x86_64).
>>>
>>> F24 machine has this in /etc/fstab:
>>>
>>> 192.168.100.3:/mnt/d2 /mnt/d2                   nfs     nfsvers=3       0 0
>>>
>>> When I copy a file from f24-32 to the F-13 machine, the file size is the same,
>>> but the file is corrupted on the file server.  I see a different md5sum each time.
>>>
>>> Various other systems (F21, F19, etc) can all copy to the F13 machine fine.
>>>
>>> And, F24-64 machine can copy to the F13 machine fine.
>>>
>>> Anyone seen something similar?
>>
>> Do you know if the corruption is happening on the read()s or on the write()s? Do you, for instance get the same corruption if you copy from a local file on
>> the F-24 client to the server? ..or if you copy from a file on the server to a local directory on the F-24 client?
>>
>> Cheers
>>   Trond
>>
>
> Seems to be a write issue:
>
> # This is the nfs server:
>
> [greearb@fs3 candela_cdrom.5.3.5]$ md5sum gua-f21-32
> ad4073fa8b806bb82b85a645e21f5e67  gua-f21-32
> [greearb@fs3 candela_cdrom.5.3.5]$ md5sum ../greearb/tmp/gua-f21-32
> 582bfea0cc8cc52aa38dc0f5048d0156  ../greearb/tmp/gua-f21-32
> [greearb@fs3 candela_cdrom.5.3.5]$
>
>
> # This is the v-f24-32 client:
>
> greearb@v-f24-32 ~]$ cp /mnt/d2/pub/candela_cdrom.5.3.5/gua-f21-32 ./
> [greearb@v-f24-32 ~]$ md5sum gua-f21-32
> ad4073fa8b806bb82b85a645e21f5e67  gua-f21-32
> [greearb@v-f24-32 ~]$ cp gua-f21-32 /mnt/d2/pub/greearb/tmp/
> [greearb@v-f24-32 ~]$ md5sum /mnt/d2/pub/greearb/tmp/gua-f21-32
> ad4073fa8b806bb82b85a645e21f5e67  /mnt/d2/pub/greearb/tmp/gua-f21-32
>
>
> Interesting that the client reads back the file it copied over as if it were correct, but
> it shows up wrong on the nfs server.  Maybe it is just reading a local cache?
>
> Thanks,
> Ben
>

Here is some more info on this:

We can only reproduce this on virtual machines using the KVM infrastructure, and only
when we use the rtl8139 virtual hardware (in bridge mode).  With the e1000 virtual hardware
we cannot reproduce the problem.

Also, multiple different nfs servers (including much newer kernels) all show the same
behaviour with this broken nfs client.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: [PATCH net] net: rtnl_register in net_ns_init need rtnl_lock
From: Cong Wang @ 2016-09-22 17:20 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Eric Dumazet, Linux Kernel Network Developers
In-Reply-To: <4c1b9aba-e6e8-babc-1bbe-aef2bc0bca53@stressinduktion.org>

On Thu, Sep 22, 2016 at 6:41 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On 22.09.2016 15:03, Eric Dumazet wrote:
>> On Thu, 2016-09-22 at 13:03 +0200, Hannes Frederic Sowa wrote:
>>> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>> ---
>>>  net/core/net_namespace.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
>>> index 2c2eb1b629b11d..a2ace299f28355 100644
>>> --- a/net/core/net_namespace.c
>>> +++ b/net/core/net_namespace.c
>>> @@ -758,9 +758,11 @@ static int __init net_ns_init(void)
>>>
>>>      register_pernet_subsys(&net_ns_ops);
>>>
>>> +    rtnl_lock();
>>>      rtnl_register(PF_UNSPEC, RTM_NEWNSID, rtnl_net_newid, NULL, NULL);
>>>      rtnl_register(PF_UNSPEC, RTM_GETNSID, rtnl_net_getid, rtnl_net_dumpid,
>>>                    NULL);
>>> +    rtnl_unlock();
>>>
>>>      return 0;
>>>  }
>>
>> Hi Hannes
>>
>> Why is this needed here, and not in other places ?
>
> I found this during working on the file and actually saw no live issues
> (belonged to another series which I just split up).
>
> I don't think it is a big issue but wanted the writes to the
> rtnl_msg_handlers array to be strictly serialized. I was working on
> adding this to other places, too. Maybe better for net-next even?

But they are called during boot, why is it possible to have a parallel
reader/writer at that time?

^ permalink raw reply

* Re: [PATCH net-next v2 0/3] add support for RGMII on GMAC0 through TRGMII hardware module
From: Sergei Shtylyov @ 2016-09-22 17:08 UTC (permalink / raw)
  To: David Miller, sean.wang
  Cc: john, nbd, netdev, linux-kernel, linux-mediatek, andrew,
	f.fainelli, keyhaede, objelf
In-Reply-To: <20160922.082247.198689057396545219.davem@davemloft.net>

Hello.

On 09/22/2016 03:22 PM, David Miller wrote:

>> By default, GMAC0 is connected to built-in switch called
>> MT7530 through the proprietary interface called Turbo RGMII
>> (TRGMII). TRGMII also supports well for RGMII as generic external
>> PHY uses but requires some slight changes to the setup of TRGMII
>> and doesn't have well support on current driver.
>>
>> So this patchset
>> 1) provides the slight changes of the setup for RGMII can work
>>    through TRGMII
>> 2) adds additional setting "trgmii" as PHY_INTERFACE_MODE_TRGMII
>>    about phy-mode on device tree to make GMAC0 distinguish which
>>    mode it runs
>> 3) changes dynamically source clock, TX/RX delay and interface
>>    mode on TRGMII for adapting various link
>>
>> Changes since v1:
>> - fixed the style of comment which doesn't have a space at
>>    the beginning and end of comment lines
>> - add support for phy-mode "trgmii" as PHY_INTERFACE_MODE_TRGMII
>>    into linux/phy.h
>> - enhance the Documentation about device tree binding for trgmii
>>   which is applicable only for GMAC0 which uses fixed-link
>
> Series applied.

    Despite my comments? Sigh...

MBR, Sergei

^ permalink raw reply

* Re: [PATCH v2] fs/select: add vmalloc fallback for select(2)
From: Eric Dumazet @ 2016-09-22 17:07 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Alexander Viro, Andrew Morton, linux-fsdevel, linux-kernel,
	linux-mm, Michal Hocko, netdev
In-Reply-To: <12efc491-a0e7-1012-5a8b-6d3533c720db@suse.cz>

On Thu, 2016-09-22 at 18:56 +0200, Vlastimil Babka wrote:
> On 09/22/2016 06:49 PM, Eric Dumazet wrote:
> > On Thu, 2016-09-22 at 18:43 +0200, Vlastimil Babka wrote:
> >> The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows
> >> with the number of fds passed. We had a customer report page allocation
> >> failures of order-4 for this allocation. This is a costly order, so it might
> >> easily fail, as the VM expects such allocation to have a lower-order fallback.
> >>
> >> Such trivial fallback is vmalloc(), as the memory doesn't have to be
> >> physically contiguous. Also the allocation is temporary for the duration of the
> >> syscall, so it's unlikely to stress vmalloc too much.
> >
> > vmalloc() uses a vmap_area_lock spinlock, and TLB flushes.
> >
> > So I guess allowing vmalloc() being called from an innocent application
> > doing a select() might be dangerous, especially if this select() happens
> > thousands of time per second.
> 
> Isn't seq_buf_alloc() similarly exposed? And ipc_alloc()?

Possibly.

We don't have a library function (attempting kmalloc(), fallback to
vmalloc() presumably to avoid abuses, but I guess some patches were
accepted without thinking about this.




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* [PATCH net-next 7/9] net/mlx5: E-Switch, Support VLAN actions in the offloads mode
From: Saeed Mahameed @ 2016-09-22 17:01 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Hadar Hen-Zion, Jiri Pirko, Andy Gospodarek,
	Jesse Brandeburg, John Fastabend, Amir Vadai, Saeed Mahameed
In-Reply-To: <1474563709-17943-1-git-send-email-saeedm@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>

Many virtualization systems use a policy under which a vlan tag is
pushed to packets sent by guests, and popped before the packet is
forwarded to the VM.

The current generation of the mlx5 HW doesn't fully support that on
a per flow level. As such, we are addressing the above common use
case with the SRIOV e-Switch abilities to push vlan into packets
sent by VFs and pop vlan from packets forwarded to VFs.

The HW can match on the correct vlan being present in packets
forwarded to VFs (eSwitch steering is done before stripping
the tag), so this part is offloaded as is.

A common practice for vlans is to avoid both push vlan and pop vlan
for inter-host VM/VM (east-west) communication because in this case,
push on egress cancels out with pop on ingress.

For supporting that, we use a global eswitch vlan pop policy, hence
allowing guest A to communicate with both remote VM B and local VM C.
This works since the HW pops the vlan only if it exists (e.g for
C --> A packets but not for B --> A packets).

On the slow path, when a VF vport has an offloaded flow which involves
pushing vlans, wheres another flow is not currently offloaded, the
packets from the 2nd flow seen by the VF representor on the host have
vlan. The VF rep driver removes such vlan before calling into the host
networking stack.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  21 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |  33 ++++
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  15 ++
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 180 +++++++++++++++++++++
 5 files changed, 249 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 3460154..460363b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -869,6 +869,7 @@ void mlx5e_nic_rep_unload(struct mlx5_eswitch *esw,
 int mlx5e_add_sqs_fwd_rules(struct mlx5e_priv *priv);
 void mlx5e_remove_sqs_fwd_rules(struct mlx5e_priv *priv);
 int mlx5e_attr_get(struct net_device *dev, struct switchdev_attr *attr);
+void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe);
 
 int mlx5e_create_direct_rqts(struct mlx5e_priv *priv);
 void mlx5e_destroy_rqt(struct mlx5e_priv *priv, struct mlx5e_rqt *rqt);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index b309e7c..c127923 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -446,6 +446,16 @@ static void mlx5e_rq_free_mpwqe_info(struct mlx5e_rq *rq)
 	kfree(rq->mpwqe.info);
 }
 
+static bool mlx5e_is_vf_vport_rep(struct mlx5e_priv *priv)
+{
+	struct mlx5_eswitch_rep *rep = (struct mlx5_eswitch_rep *)priv->ppriv;
+
+	if (rep && rep->vport != FDB_UPLINK_VPORT)
+		return true;
+
+	return false;
+}
+
 static int mlx5e_create_rq(struct mlx5e_channel *c,
 			   struct mlx5e_rq_param *param,
 			   struct mlx5e_rq *rq)
@@ -487,6 +497,11 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
 
 	switch (priv->params.rq_wq_type) {
 	case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
+		if (mlx5e_is_vf_vport_rep(priv)) {
+			err = -EINVAL;
+			goto err_rq_wq_destroy;
+		}
+
 		rq->handle_rx_cqe = mlx5e_handle_rx_cqe_mpwrq;
 		rq->alloc_wqe = mlx5e_alloc_rx_mpwqe;
 		rq->dealloc_wqe = mlx5e_dealloc_rx_mpwqe;
@@ -512,7 +527,11 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
 			goto err_rq_wq_destroy;
 		}
 
-		rq->handle_rx_cqe = mlx5e_handle_rx_cqe;
+		if (mlx5e_is_vf_vport_rep(priv))
+			rq->handle_rx_cqe = mlx5e_handle_rx_cqe_rep;
+		else
+			rq->handle_rx_cqe = mlx5e_handle_rx_cqe;
+
 		rq->alloc_wqe = mlx5e_alloc_rx_wqe;
 		rq->dealloc_wqe = mlx5e_dealloc_rx_wqe;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index e836e47..c6de6fb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -36,6 +36,7 @@
 #include <net/busy_poll.h>
 #include "en.h"
 #include "en_tc.h"
+#include "eswitch.h"
 
 static inline bool mlx5e_rx_hw_stamp(struct mlx5e_tstamp *tstamp)
 {
@@ -803,6 +804,38 @@ wq_ll_pop:
 		       &wqe->next.next_wqe_index);
 }
 
+void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
+{
+	struct net_device *netdev = rq->netdev;
+	struct mlx5e_priv *priv = netdev_priv(netdev);
+	struct mlx5_eswitch_rep *rep = priv->ppriv;
+	struct mlx5e_rx_wqe *wqe;
+	struct sk_buff *skb;
+	__be16 wqe_counter_be;
+	u16 wqe_counter;
+	u32 cqe_bcnt;
+
+	wqe_counter_be = cqe->wqe_counter;
+	wqe_counter    = be16_to_cpu(wqe_counter_be);
+	wqe            = mlx5_wq_ll_get_wqe(&rq->wq, wqe_counter);
+	cqe_bcnt       = be32_to_cpu(cqe->byte_cnt);
+
+	skb = skb_from_cqe(rq, cqe, wqe_counter, cqe_bcnt);
+	if (!skb)
+		goto wq_ll_pop;
+
+	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
+
+	if (rep->vlan && skb_vlan_tag_present(skb))
+		skb_vlan_pop(skb);
+
+	napi_gro_receive(rq->cq.napi, skb);
+
+wq_ll_pop:
+	mlx5_wq_ll_pop(&rq->wq, wqe_counter_be,
+		       &wqe->next.next_wqe_index);
+}
+
 static inline void mlx5e_mpwqe_fill_rx_skb(struct mlx5e_rq *rq,
 					   struct mlx5_cqe64 *cqe,
 					   struct mlx5e_mpw_info *wi,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index eeeeadc..2e2938e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -157,6 +157,7 @@ struct mlx5_eswitch_fdb {
 			struct mlx5_flow_group *send_to_vport_grp;
 			struct mlx5_flow_group *miss_grp;
 			struct mlx5_flow_rule  *miss_rule;
+			int vlan_push_pop_refcount;
 		} offloads;
 	};
 };
@@ -183,6 +184,8 @@ struct mlx5_eswitch_rep {
 
 	struct mlx5_flow_rule *vport_rx_rule;
 	struct list_head       vport_sqs_list;
+	u16		       vlan;
+	u32		       vlan_refcount;
 	bool		       valid;
 };
 
@@ -252,11 +255,16 @@ enum {
 	SET_VLAN_INSERT	= BIT(1)
 };
 
+#define MLX5_FLOW_CONTEXT_ACTION_VLAN_POP  0x40
+#define MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH 0x80
+
 struct mlx5_esw_flow_attr {
 	struct mlx5_eswitch_rep *in_rep;
 	struct mlx5_eswitch_rep *out_rep;
 
 	int	action;
+	u16	vlan;
+	bool	vlan_handled;
 };
 
 int mlx5_eswitch_sqs2vport_start(struct mlx5_eswitch *esw,
@@ -273,6 +281,13 @@ void mlx5_eswitch_register_vport_rep(struct mlx5_eswitch *esw,
 void mlx5_eswitch_unregister_vport_rep(struct mlx5_eswitch *esw,
 				       int vport_index);
 
+int mlx5_eswitch_add_vlan_action(struct mlx5_eswitch *esw,
+				 struct mlx5_esw_flow_attr *attr);
+int mlx5_eswitch_del_vlan_action(struct mlx5_eswitch *esw,
+				 struct mlx5_esw_flow_attr *attr);
+int __mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw,
+				  int vport, u16 vlan, u8 qos, u8 set_flags);
+
 #define MLX5_DEBUG_ESWITCH_MASK BIT(3)
 
 #define esw_info(dev, format, ...)				\
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index c0d9d1a..c910858 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -89,6 +89,186 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 	return rule;
 }
 
+static int esw_set_global_vlan_pop(struct mlx5_eswitch *esw, u8 val)
+{
+	struct mlx5_eswitch_rep *rep;
+	int vf_vport, err = 0;
+
+	esw_debug(esw->dev, "%s applying global %s policy\n", __func__, val ? "pop" : "none");
+	for (vf_vport = 1; vf_vport < esw->enabled_vports; vf_vport++) {
+		rep = &esw->offloads.vport_reps[vf_vport];
+		if (!rep->valid)
+			continue;
+
+		err = __mlx5_eswitch_set_vport_vlan(esw, rep->vport, 0, 0, val);
+		if (err)
+			goto out;
+	}
+
+out:
+	return err;
+}
+
+static struct mlx5_eswitch_rep *
+esw_vlan_action_get_vport(struct mlx5_esw_flow_attr *attr, bool push, bool pop)
+{
+	struct mlx5_eswitch_rep *in_rep, *out_rep, *vport = NULL;
+
+	in_rep  = attr->in_rep;
+	out_rep = attr->out_rep;
+
+	if (push)
+		vport = in_rep;
+	else if (pop)
+		vport = out_rep;
+	else
+		vport = in_rep;
+
+	return vport;
+}
+
+static int esw_add_vlan_action_check(struct mlx5_esw_flow_attr *attr,
+				     bool push, bool pop, bool fwd)
+{
+	struct mlx5_eswitch_rep *in_rep, *out_rep;
+
+	if ((push || pop) && !fwd)
+		goto out_notsupp;
+
+	in_rep  = attr->in_rep;
+	out_rep = attr->out_rep;
+
+	if (push && in_rep->vport == FDB_UPLINK_VPORT)
+		goto out_notsupp;
+
+	if (pop && out_rep->vport == FDB_UPLINK_VPORT)
+		goto out_notsupp;
+
+	/* vport has vlan push configured, can't offload VF --> wire rules w.o it */
+	if (!push && !pop && fwd)
+		if (in_rep->vlan && out_rep->vport == FDB_UPLINK_VPORT)
+			goto out_notsupp;
+
+	/* protects against (1) setting rules with different vlans to push and
+	 * (2) setting rules w.o vlans (attr->vlan = 0) && w. vlans to push (!= 0)
+	 */
+	if (push && in_rep->vlan_refcount && (in_rep->vlan != attr->vlan))
+		goto out_notsupp;
+
+	return 0;
+
+out_notsupp:
+	return -ENOTSUPP;
+}
+
+int mlx5_eswitch_add_vlan_action(struct mlx5_eswitch *esw,
+				 struct mlx5_esw_flow_attr *attr)
+{
+	struct offloads_fdb *offloads = &esw->fdb_table.offloads;
+	struct mlx5_eswitch_rep *vport = NULL;
+	bool push, pop, fwd;
+	int err = 0;
+
+	push = !!(attr->action & MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH);
+	pop  = !!(attr->action & MLX5_FLOW_CONTEXT_ACTION_VLAN_POP);
+	fwd  = !!(attr->action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST);
+
+	err = esw_add_vlan_action_check(attr, push, pop, fwd);
+	if (err)
+		return err;
+
+	attr->vlan_handled = false;
+
+	vport = esw_vlan_action_get_vport(attr, push, pop);
+
+	if (!push && !pop && fwd) {
+		/* tracks VF --> wire rules without vlan push action */
+		if (attr->out_rep->vport == FDB_UPLINK_VPORT) {
+			vport->vlan_refcount++;
+			attr->vlan_handled = true;
+		}
+
+		return 0;
+	}
+
+	if (!push && !pop)
+		return 0;
+
+	if (!(offloads->vlan_push_pop_refcount)) {
+		/* it's the 1st vlan rule, apply global vlan pop policy */
+		err = esw_set_global_vlan_pop(esw, SET_VLAN_STRIP);
+		if (err)
+			goto out;
+	}
+	offloads->vlan_push_pop_refcount++;
+
+	if (push) {
+		if (vport->vlan_refcount)
+			goto skip_set_push;
+
+		err = __mlx5_eswitch_set_vport_vlan(esw, vport->vport, attr->vlan, 0,
+						    SET_VLAN_INSERT | SET_VLAN_STRIP);
+		if (err)
+			goto out;
+		vport->vlan = attr->vlan;
+skip_set_push:
+		vport->vlan_refcount++;
+	}
+out:
+	if (!err)
+		attr->vlan_handled = true;
+	return err;
+}
+
+int mlx5_eswitch_del_vlan_action(struct mlx5_eswitch *esw,
+				 struct mlx5_esw_flow_attr *attr)
+{
+	struct offloads_fdb *offloads = &esw->fdb_table.offloads;
+	struct mlx5_eswitch_rep *vport = NULL;
+	bool push, pop, fwd;
+	int err = 0;
+
+	if (!attr->vlan_handled)
+		return 0;
+
+	push = !!(attr->action & MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH);
+	pop  = !!(attr->action & MLX5_FLOW_CONTEXT_ACTION_VLAN_POP);
+	fwd  = !!(attr->action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST);
+
+	vport = esw_vlan_action_get_vport(attr, push, pop);
+
+	if (!push && !pop && fwd) {
+		/* tracks VF --> wire rules without vlan push action */
+		if (attr->out_rep->vport == FDB_UPLINK_VPORT)
+			vport->vlan_refcount--;
+
+		return 0;
+	}
+
+	if (push) {
+		vport->vlan_refcount--;
+		if (vport->vlan_refcount)
+			goto skip_unset_push;
+
+		vport->vlan = 0;
+		err = __mlx5_eswitch_set_vport_vlan(esw, vport->vport,
+						    0, 0, SET_VLAN_STRIP);
+		if (err)
+			goto out;
+	}
+
+skip_unset_push:
+	offloads->vlan_push_pop_refcount--;
+	if (offloads->vlan_push_pop_refcount)
+		return 0;
+
+	/* no more vlan rules, stop global vlan pop policy */
+	err = esw_set_global_vlan_pop(esw, 0);
+
+out:
+	return err;
+}
+
 static struct mlx5_flow_rule *
 mlx5_eswitch_add_send_to_vport_rule(struct mlx5_eswitch *esw, int vport, u32 sqn)
 {
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 5/9] net/mlx5: Put elements related to offloaded TC rule in one struct
From: Saeed Mahameed @ 2016-09-22 17:01 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Hadar Hen-Zion, Jiri Pirko, Andy Gospodarek,
	Jesse Brandeburg, John Fastabend, Amir Vadai, Saeed Mahameed
In-Reply-To: <1474563709-17943-1-git-send-email-saeedm@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>

Put the representors related to the source and dest vports and the
action in struct mlx5_esw_flow_attr which is used while setting the FDB rule.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    | 51 ++++++++++++----------
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  | 10 ++++-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c |  9 ++--
 3 files changed, 44 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 783e122..3eb319b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -39,6 +39,7 @@
 #include <linux/rhashtable.h>
 #include <net/switchdev.h>
 #include <net/tc_act/tc_mirred.h>
+#include <net/tc_act/tc_vlan.h>
 #include "en.h"
 #include "en_tc.h"
 #include "eswitch.h"
@@ -47,6 +48,7 @@ struct mlx5e_tc_flow {
 	struct rhash_head	node;
 	u64			cookie;
 	struct mlx5_flow_rule	*rule;
+	struct mlx5_esw_flow_attr *attr;
 };
 
 #define MLX5E_TC_TABLE_NUM_ENTRIES 1024
@@ -114,15 +116,11 @@ err_create_ft:
 
 static struct mlx5_flow_rule *mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 						    struct mlx5_flow_spec *spec,
-						    u32 action, u32 dst_vport)
+						    struct mlx5_esw_flow_attr *attr)
 {
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
-	struct mlx5_eswitch_rep *rep = priv->ppriv;
-	u32 src_vport;
 
-	src_vport = rep->vport;
-
-	return mlx5_eswitch_add_offloaded_rule(esw, spec, action, src_vport, dst_vport);
+	return mlx5_eswitch_add_offloaded_rule(esw, spec, attr);
 }
 
 static void mlx5e_tc_del_flow(struct mlx5e_priv *priv,
@@ -358,7 +356,7 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
 }
 
 static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
-				u32 *action, u32 *dest_vport)
+				struct mlx5_esw_flow_attr *attr)
 {
 	const struct tc_action *a;
 	LIST_HEAD(actions);
@@ -366,17 +364,18 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
 	if (tc_no_actions(exts))
 		return -EINVAL;
 
-	*action = 0;
+	memset(attr, 0, sizeof(*attr));
+	attr->in_rep = priv->ppriv;
 
 	tcf_exts_to_list(exts, &actions);
 	list_for_each_entry(a, &actions, list) {
 		/* Only support a single action per rule */
-		if (*action)
+		if (attr->action)
 			return -EINVAL;
 
 		if (is_tcf_gact_shot(a)) {
-			*action = MLX5_FLOW_CONTEXT_ACTION_DROP |
-				  MLX5_FLOW_CONTEXT_ACTION_COUNT;
+			attr->action = MLX5_FLOW_CONTEXT_ACTION_DROP |
+				       MLX5_FLOW_CONTEXT_ACTION_COUNT;
 			continue;
 		}
 
@@ -384,7 +383,6 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
 			int ifindex = tcf_mirred_ifindex(a);
 			struct net_device *out_dev;
 			struct mlx5e_priv *out_priv;
-			struct mlx5_eswitch_rep *out_rep;
 
 			out_dev = __dev_get_by_index(dev_net(priv->netdev), ifindex);
 
@@ -394,10 +392,9 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
 				return -EINVAL;
 			}
 
+			attr->action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
 			out_priv = netdev_priv(out_dev);
-			out_rep  = out_priv->ppriv;
-			*dest_vport = out_rep->vport;
-			*action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+			attr->out_rep = out_priv->ppriv;
 			continue;
 		}
 
@@ -411,18 +408,27 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol,
 {
 	struct mlx5e_tc_table *tc = &priv->fs.tc;
 	int err = 0;
-	u32 flow_tag, action, dest_vport = 0;
+	bool fdb_flow = false;
+	u32 flow_tag, action;
 	struct mlx5e_tc_flow *flow;
 	struct mlx5_flow_spec *spec;
 	struct mlx5_flow_rule *old = NULL;
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
 
+	if (esw && esw->mode == SRIOV_OFFLOADS)
+		fdb_flow = true;
+
 	flow = rhashtable_lookup_fast(&tc->ht, &f->cookie,
 				      tc->ht_params);
-	if (flow)
+	if (flow) {
 		old = flow->rule;
-	else
-		flow = kzalloc(sizeof(*flow), GFP_KERNEL);
+	} else {
+		if (fdb_flow)
+			flow = kzalloc(sizeof(*flow) + sizeof(struct mlx5_esw_flow_attr),
+				       GFP_KERNEL);
+		else
+			flow = kzalloc(sizeof(*flow), GFP_KERNEL);
+	}
 
 	spec = mlx5_vzalloc(sizeof(*spec));
 	if (!spec || !flow) {
@@ -436,11 +442,12 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol,
 	if (err < 0)
 		goto err_free;
 
-	if (esw && esw->mode == SRIOV_OFFLOADS) {
-		err = parse_tc_fdb_actions(priv, f->exts, &action, &dest_vport);
+	if (fdb_flow) {
+		flow->attr  = (struct mlx5_esw_flow_attr *)(flow + 1);
+		err = parse_tc_fdb_actions(priv, f->exts, flow->attr);
 		if (err < 0)
 			goto err_free;
-		flow->rule = mlx5e_tc_add_fdb_flow(priv, spec, action, dest_vport);
+		flow->rule = mlx5e_tc_add_fdb_flow(priv, spec, flow->attr);
 	} else {
 		err = parse_tc_nic_actions(priv, f->exts, &action, &flow_tag);
 		if (err < 0)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 4f5391a..eeeeadc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -238,11 +238,12 @@ int mlx5_eswitch_get_vport_stats(struct mlx5_eswitch *esw,
 				 struct ifla_vf_stats *vf_stats);
 
 struct mlx5_flow_spec;
+struct mlx5_esw_flow_attr;
 
 struct mlx5_flow_rule *
 mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 				struct mlx5_flow_spec *spec,
-				u32 action, u32 src_vport, u32 dst_vport);
+				struct mlx5_esw_flow_attr *attr);
 struct mlx5_flow_rule *
 mlx5_eswitch_create_vport_rx_rule(struct mlx5_eswitch *esw, int vport, u32 tirn);
 
@@ -251,6 +252,13 @@ enum {
 	SET_VLAN_INSERT	= BIT(1)
 };
 
+struct mlx5_esw_flow_attr {
+	struct mlx5_eswitch_rep *in_rep;
+	struct mlx5_eswitch_rep *out_rep;
+
+	int	action;
+};
+
 int mlx5_eswitch_sqs2vport_start(struct mlx5_eswitch *esw,
 				 struct mlx5_eswitch_rep *rep,
 				 u16 *sqns_array, int sqns_num);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index b901cd4..c0d9d1a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -46,19 +46,22 @@ enum {
 struct mlx5_flow_rule *
 mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 				struct mlx5_flow_spec *spec,
-				u32 action, u32 src_vport, u32 dst_vport)
+				struct mlx5_esw_flow_attr *attr)
 {
 	struct mlx5_flow_destination dest = { 0 };
 	struct mlx5_fc *counter = NULL;
 	struct mlx5_flow_rule *rule;
 	void *misc;
+	int action;
 
 	if (esw->mode != SRIOV_OFFLOADS)
 		return ERR_PTR(-EOPNOTSUPP);
 
+	action = attr->action;
+
 	if (action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) {
 		dest.type = MLX5_FLOW_DESTINATION_TYPE_VPORT;
-		dest.vport_num = dst_vport;
+		dest.vport_num = attr->out_rep->vport;
 		action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
 	} else if (action & MLX5_FLOW_CONTEXT_ACTION_COUNT) {
 		counter = mlx5_fc_create(esw->dev, true);
@@ -69,7 +72,7 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 	}
 
 	misc = MLX5_ADDR_OF(fte_match_param, spec->match_value, misc_parameters);
-	MLX5_SET(fte_match_set_misc, misc, source_port, src_vport);
+	MLX5_SET(fte_match_set_misc, misc, source_port, attr->in_rep->vport);
 
 	misc = MLX5_ADDR_OF(fte_match_param, spec->match_criteria, misc_parameters);
 	MLX5_SET_TO_ONES(fte_match_set_misc, misc, source_port);
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 4/9] net/mlx5: E-Switch, Allow fine tuning of eswitch vport push/pop vlan
From: Saeed Mahameed @ 2016-09-22 17:01 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Hadar Hen-Zion, Jiri Pirko, Andy Gospodarek,
	Jesse Brandeburg, John Fastabend, Amir Vadai, Saeed Mahameed
In-Reply-To: <1474563709-17943-1-git-send-email-saeedm@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>

The HW can be programmed to push vlan, pop vlan or both.

A factorization step towards using the push/pop capabilties in the
eswitch offloads mode. This patch doesn't add new functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 33 +++++++++++++++--------
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h |  5 ++++
 2 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 4927494..6605453 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -127,7 +127,7 @@ static int modify_esw_vport_context_cmd(struct mlx5_core_dev *dev, u16 vport,
 }
 
 static int modify_esw_vport_cvlan(struct mlx5_core_dev *dev, u32 vport,
-				  u16 vlan, u8 qos, bool set)
+				  u16 vlan, u8 qos, u8 set_flags)
 {
 	u32 in[MLX5_ST_SZ_DW(modify_esw_vport_context_in)] = {0};
 
@@ -135,14 +135,18 @@ static int modify_esw_vport_cvlan(struct mlx5_core_dev *dev, u32 vport,
 	    !MLX5_CAP_ESW(dev, vport_cvlan_insert_if_not_exist))
 		return -ENOTSUPP;
 
-	esw_debug(dev, "Set Vport[%d] VLAN %d qos %d set=%d\n",
-		  vport, vlan, qos, set);
-	if (set) {
+	esw_debug(dev, "Set Vport[%d] VLAN %d qos %d set=%x\n",
+		  vport, vlan, qos, set_flags);
+
+	if (set_flags & SET_VLAN_STRIP)
 		MLX5_SET(modify_esw_vport_context_in, in,
 			 esw_vport_context.vport_cvlan_strip, 1);
+
+	if (set_flags & SET_VLAN_INSERT) {
 		/* insert only if no vlan in packet */
 		MLX5_SET(modify_esw_vport_context_in, in,
 			 esw_vport_context.vport_cvlan_insert, 1);
+
 		MLX5_SET(modify_esw_vport_context_in, in,
 			 esw_vport_context.cvlan_pcp, qos);
 		MLX5_SET(modify_esw_vport_context_in, in,
@@ -1777,25 +1781,21 @@ int mlx5_eswitch_get_vport_config(struct mlx5_eswitch *esw,
 	return 0;
 }
 
-int mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw,
-				int vport, u16 vlan, u8 qos)
+int __mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw,
+				  int vport, u16 vlan, u8 qos, u8 set_flags)
 {
 	struct mlx5_vport *evport;
 	int err = 0;
-	int set = 0;
 
 	if (!ESW_ALLOWED(esw))
 		return -EPERM;
 	if (!LEGAL_VPORT(esw, vport) || (vlan > 4095) || (qos > 7))
 		return -EINVAL;
 
-	if (vlan || qos)
-		set = 1;
-
 	mutex_lock(&esw->state_lock);
 	evport = &esw->vports[vport];
 
-	err = modify_esw_vport_cvlan(esw->dev, vport, vlan, qos, set);
+	err = modify_esw_vport_cvlan(esw->dev, vport, vlan, qos, set_flags);
 	if (err)
 		goto unlock;
 
@@ -1813,6 +1813,17 @@ unlock:
 	return err;
 }
 
+int mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw,
+				int vport, u16 vlan, u8 qos)
+{
+	u8 set_flags = 0;
+
+	if (vlan || qos)
+		set_flags = SET_VLAN_STRIP | SET_VLAN_INSERT;
+
+	return __mlx5_eswitch_set_vport_vlan(esw, vport, vlan, qos, set_flags);
+}
+
 int mlx5_eswitch_set_vport_spoofchk(struct mlx5_eswitch *esw,
 				    int vport, bool spoofchk)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index ebfcde0..4f5391a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -246,6 +246,11 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 struct mlx5_flow_rule *
 mlx5_eswitch_create_vport_rx_rule(struct mlx5_eswitch *esw, int vport, u32 tirn);
 
+enum {
+	SET_VLAN_STRIP	= BIT(0),
+	SET_VLAN_INSERT	= BIT(1)
+};
+
 int mlx5_eswitch_sqs2vport_start(struct mlx5_eswitch *esw,
 				 struct mlx5_eswitch_rep *rep,
 				 u16 *sqns_array, int sqns_num);
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 8/9] net/mlx5e: Add TC vlan action for SRIOV offloads
From: Saeed Mahameed @ 2016-09-22 17:01 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Hadar Hen-Zion, Jiri Pirko, Andy Gospodarek,
	Jesse Brandeburg, John Fastabend, Amir Vadai, Saeed Mahameed
In-Reply-To: <1474563709-17943-1-git-send-email-saeedm@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>

Parse TC vlan actions and set the required elements to allow offloading.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 43 ++++++++++++++++++-------
 1 file changed, 32 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 3eb319b..e61bd52 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -119,17 +119,27 @@ static struct mlx5_flow_rule *mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 						    struct mlx5_esw_flow_attr *attr)
 {
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+	int err;
+
+	err = mlx5_eswitch_add_vlan_action(esw, attr);
+	if (err)
+		return ERR_PTR(err);
 
 	return mlx5_eswitch_add_offloaded_rule(esw, spec, attr);
 }
 
 static void mlx5e_tc_del_flow(struct mlx5e_priv *priv,
-			      struct mlx5_flow_rule *rule)
+			      struct mlx5_flow_rule *rule,
+			      struct mlx5_esw_flow_attr *attr)
 {
+	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
 	struct mlx5_fc *counter = NULL;
 
 	counter = mlx5_flow_rule_counter(rule);
 
+	if (esw && esw->mode == SRIOV_OFFLOADS)
+		mlx5_eswitch_del_vlan_action(esw, attr);
+
 	mlx5_del_flow_rule(rule);
 
 	mlx5_fc_destroy(priv->mdev, counter);
@@ -369,13 +379,9 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
 
 	tcf_exts_to_list(exts, &actions);
 	list_for_each_entry(a, &actions, list) {
-		/* Only support a single action per rule */
-		if (attr->action)
-			return -EINVAL;
-
 		if (is_tcf_gact_shot(a)) {
-			attr->action = MLX5_FLOW_CONTEXT_ACTION_DROP |
-				       MLX5_FLOW_CONTEXT_ACTION_COUNT;
+			attr->action |= MLX5_FLOW_CONTEXT_ACTION_DROP |
+					MLX5_FLOW_CONTEXT_ACTION_COUNT;
 			continue;
 		}
 
@@ -392,12 +398,25 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
 				return -EINVAL;
 			}
 
-			attr->action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+			attr->action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
 			out_priv = netdev_priv(out_dev);
 			attr->out_rep = out_priv->ppriv;
 			continue;
 		}
 
+		if (is_tcf_vlan(a)) {
+			if (tcf_vlan_action(a) == VLAN_F_POP) {
+				attr->action |= MLX5_FLOW_CONTEXT_ACTION_VLAN_POP;
+			} else if (tcf_vlan_action(a) == VLAN_F_PUSH) {
+				if (tcf_vlan_push_proto(a) != htons(ETH_P_8021Q))
+					return -EOPNOTSUPP;
+
+				attr->action |= MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH;
+				attr->vlan = tcf_vlan_push_vid(a);
+			}
+			continue;
+		}
+
 		return -EINVAL;
 	}
 	return 0;
@@ -413,6 +432,7 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol,
 	struct mlx5e_tc_flow *flow;
 	struct mlx5_flow_spec *spec;
 	struct mlx5_flow_rule *old = NULL;
+	struct mlx5_esw_flow_attr *old_attr;
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
 
 	if (esw && esw->mode == SRIOV_OFFLOADS)
@@ -422,6 +442,7 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol,
 				      tc->ht_params);
 	if (flow) {
 		old = flow->rule;
+		old_attr = flow->attr;
 	} else {
 		if (fdb_flow)
 			flow = kzalloc(sizeof(*flow) + sizeof(struct mlx5_esw_flow_attr),
@@ -466,7 +487,7 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol,
 		goto err_del_rule;
 
 	if (old)
-		mlx5e_tc_del_flow(priv, old);
+		mlx5e_tc_del_flow(priv, old, old_attr);
 
 	goto out;
 
@@ -494,7 +515,7 @@ int mlx5e_delete_flower(struct mlx5e_priv *priv,
 
 	rhashtable_remove_fast(&tc->ht, &flow->node, tc->ht_params);
 
-	mlx5e_tc_del_flow(priv, flow->rule);
+	mlx5e_tc_del_flow(priv, flow->rule, flow->attr);
 
 	kfree(flow);
 
@@ -551,7 +572,7 @@ static void _mlx5e_tc_del_flow(void *ptr, void *arg)
 	struct mlx5e_tc_flow *flow = ptr;
 	struct mlx5e_priv *priv = arg;
 
-	mlx5e_tc_del_flow(priv, flow->rule);
+	mlx5e_tc_del_flow(priv, flow->rule, flow->attr);
 	kfree(flow);
 }
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 2/9] net/mlx5: E-Switch, Set the vport when registering the uplink rep
From: Saeed Mahameed @ 2016-09-22 17:01 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Hadar Hen-Zion, Jiri Pirko, Andy Gospodarek,
	Jesse Brandeburg, John Fastabend, Amir Vadai, Saeed Mahameed
In-Reply-To: <1474563709-17943-1-git-send-email-saeedm@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>

Set the vport value in the PF entry to be that of the uplink so
we can use it blindly over the tc / eswitch offload code without
translating it each time we deal with the uplink representor.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  6 ++---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    | 10 ++------
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  3 ++-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 27 +++++++++++-----------
 4 files changed, 20 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index a9fc9d4..b309e7c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3726,9 +3726,9 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
 		mlx5_query_nic_vport_mac_address(mdev, 0, rep.hw_id);
 		rep.load = mlx5e_nic_rep_load;
 		rep.unload = mlx5e_nic_rep_unload;
-		rep.vport = 0;
+		rep.vport = FDB_UPLINK_VPORT;
 		rep.priv_data = priv;
-		mlx5_eswitch_register_vport_rep(esw, &rep);
+		mlx5_eswitch_register_vport_rep(esw, 0, &rep);
 	}
 }
 
@@ -3867,7 +3867,7 @@ static void mlx5e_register_vport_rep(struct mlx5_core_dev *mdev)
 		rep.unload = mlx5e_vport_rep_unload;
 		rep.vport = vport;
 		ether_addr_copy(rep.hw_id, mac);
-		mlx5_eswitch_register_vport_rep(esw, &rep);
+		mlx5_eswitch_register_vport_rep(esw, vport, &rep);
 	}
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 22cfc4a..783e122 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -120,10 +120,7 @@ static struct mlx5_flow_rule *mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 	struct mlx5_eswitch_rep *rep = priv->ppriv;
 	u32 src_vport;
 
-	if (rep->vport) /* set source vport for the flow */
-		src_vport = rep->vport;
-	else
-		src_vport = FDB_UPLINK_VPORT;
+	src_vport = rep->vport;
 
 	return mlx5_eswitch_add_offloaded_rule(esw, spec, action, src_vport, dst_vport);
 }
@@ -399,10 +396,7 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
 
 			out_priv = netdev_priv(out_dev);
 			out_rep  = out_priv->ppriv;
-			if (out_rep->vport == 0)
-				*dest_vport = FDB_UPLINK_VPORT;
-			else
-				*dest_vport = out_rep->vport;
+			*dest_vport = out_rep->vport;
 			*action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
 			continue;
 		}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index b96e8c9..6d8c5a2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -254,9 +254,10 @@ void mlx5_eswitch_sqs2vport_stop(struct mlx5_eswitch *esw,
 int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode);
 int mlx5_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode);
 void mlx5_eswitch_register_vport_rep(struct mlx5_eswitch *esw,
+				     int vport_index,
 				     struct mlx5_eswitch_rep *rep);
 void mlx5_eswitch_unregister_vport_rep(struct mlx5_eswitch *esw,
-				       int vport);
+				       int vport_index);
 
 #define MLX5_DEBUG_ESWITCH_MASK BIT(3)
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 3dc83a9..a73721b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -144,16 +144,12 @@ int mlx5_eswitch_sqs2vport_start(struct mlx5_eswitch *esw,
 {
 	struct mlx5_flow_rule *flow_rule;
 	struct mlx5_esw_sq *esw_sq;
-	int vport;
 	int err;
 	int i;
 
 	if (esw->mode != SRIOV_OFFLOADS)
 		return 0;
 
-	vport = rep->vport == 0 ?
-		FDB_UPLINK_VPORT : rep->vport;
-
 	for (i = 0; i < sqns_num; i++) {
 		esw_sq = kzalloc(sizeof(*esw_sq), GFP_KERNEL);
 		if (!esw_sq) {
@@ -163,7 +159,7 @@ int mlx5_eswitch_sqs2vport_start(struct mlx5_eswitch *esw,
 
 		/* Add re-inject rule to the PF/representor sqs */
 		flow_rule = mlx5_eswitch_add_send_to_vport_rule(esw,
-								vport,
+								rep->vport,
 								sqns_array[i]);
 		if (IS_ERR(flow_rule)) {
 			err = PTR_ERR(flow_rule);
@@ -612,27 +608,30 @@ int mlx5_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode)
 }
 
 void mlx5_eswitch_register_vport_rep(struct mlx5_eswitch *esw,
-				     struct mlx5_eswitch_rep *rep)
+				     int vport_index,
+				     struct mlx5_eswitch_rep *__rep)
 {
 	struct mlx5_esw_offload *offloads = &esw->offloads;
+	struct mlx5_eswitch_rep *rep;
+
+	rep = &offloads->vport_reps[vport_index];
 
-	memcpy(&offloads->vport_reps[rep->vport], rep,
-	       sizeof(struct mlx5_eswitch_rep));
+	memcpy(rep, __rep, sizeof(struct mlx5_eswitch_rep));
 
-	INIT_LIST_HEAD(&offloads->vport_reps[rep->vport].vport_sqs_list);
-	offloads->vport_reps[rep->vport].valid = true;
+	INIT_LIST_HEAD(&rep->vport_sqs_list);
+	rep->valid = true;
 }
 
 void mlx5_eswitch_unregister_vport_rep(struct mlx5_eswitch *esw,
-				       int vport)
+				       int vport_index)
 {
 	struct mlx5_esw_offload *offloads = &esw->offloads;
 	struct mlx5_eswitch_rep *rep;
 
-	rep = &offloads->vport_reps[vport];
+	rep = &offloads->vport_reps[vport_index];
 
-	if (esw->mode == SRIOV_OFFLOADS && esw->vports[vport].enabled)
+	if (esw->mode == SRIOV_OFFLOADS && esw->vports[vport_index].enabled)
 		rep->unload(esw, rep);
 
-	offloads->vport_reps[vport].valid = false;
+	rep->valid = false;
 }
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 3/9] net/mlx5: E-Switch, Set vport representor fields explicitly on registration
From: Saeed Mahameed @ 2016-09-22 17:01 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Hadar Hen-Zion, Jiri Pirko, Andy Gospodarek,
	Jesse Brandeburg, John Fastabend, Amir Vadai, Saeed Mahameed
In-Reply-To: <1474563709-17943-1-git-send-email-saeedm@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>

The structure we use for the eswitch vport representor (mlx5_eswitch_rep)
has some fields which are set from upper layers in the driver when they
register the rep. Use explicit setting on registration time for them and
avoid global memcpy. This patch doesn't add new functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h          | 5 +++--
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 8 +++++++-
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 6d8c5a2..ebfcde0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -178,11 +178,12 @@ struct mlx5_eswitch_rep {
 	void		       (*unload)(struct mlx5_eswitch *esw,
 					 struct mlx5_eswitch_rep *rep);
 	u16		       vport;
-	struct mlx5_flow_rule *vport_rx_rule;
+	u8		       hw_id[ETH_ALEN];
 	void		      *priv_data;
+
+	struct mlx5_flow_rule *vport_rx_rule;
 	struct list_head       vport_sqs_list;
 	bool		       valid;
-	u8		       hw_id[ETH_ALEN];
 };
 
 struct mlx5_esw_offload {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index a73721b..b901cd4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -616,7 +616,13 @@ void mlx5_eswitch_register_vport_rep(struct mlx5_eswitch *esw,
 
 	rep = &offloads->vport_reps[vport_index];
 
-	memcpy(rep, __rep, sizeof(struct mlx5_eswitch_rep));
+	memset(rep, 0, sizeof(*rep));
+
+	rep->load   = __rep->load;
+	rep->unload = __rep->unload;
+	rep->vport  = __rep->vport;
+	rep->priv_data = __rep->priv_data;
+	ether_addr_copy(rep->hw_id, __rep->hw_id);
 
 	INIT_LIST_HEAD(&rep->vport_sqs_list);
 	rep->valid = true;
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 9/9] net/mlx5e: Add TC vlan match parsing
From: Saeed Mahameed @ 2016-09-22 17:01 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Hadar Hen-Zion, Jiri Pirko, Andy Gospodarek,
	Jesse Brandeburg, John Fastabend, Amir Vadai, Saeed Mahameed
In-Reply-To: <1474563709-17943-1-git-send-email-saeedm@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>

Enhance the parsing of offloaded TC rules matches to handle vlans.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index e61bd52..a350b71 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -164,6 +164,7 @@ static int parse_cls_flower(struct mlx5e_priv *priv, struct mlx5_flow_spec *spec
 	    ~(BIT(FLOW_DISSECTOR_KEY_CONTROL) |
 	      BIT(FLOW_DISSECTOR_KEY_BASIC) |
 	      BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS) |
+	      BIT(FLOW_DISSECTOR_KEY_VLAN) |
 	      BIT(FLOW_DISSECTOR_KEY_IPV4_ADDRS) |
 	      BIT(FLOW_DISSECTOR_KEY_IPV6_ADDRS) |
 	      BIT(FLOW_DISSECTOR_KEY_PORTS))) {
@@ -227,6 +228,24 @@ static int parse_cls_flower(struct mlx5e_priv *priv, struct mlx5_flow_spec *spec
 				key->src);
 	}
 
+	if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_VLAN)) {
+		struct flow_dissector_key_vlan *key =
+			skb_flow_dissector_target(f->dissector,
+						  FLOW_DISSECTOR_KEY_VLAN,
+						  f->key);
+		struct flow_dissector_key_vlan *mask =
+			skb_flow_dissector_target(f->dissector,
+						  FLOW_DISSECTOR_KEY_VLAN,
+						  f->mask);
+		if (mask->vlan_id) {
+			MLX5_SET(fte_match_set_lyr_2_4, headers_c, vlan_tag, 1);
+			MLX5_SET(fte_match_set_lyr_2_4, headers_v, vlan_tag, 1);
+
+			MLX5_SET(fte_match_set_lyr_2_4, headers_c, first_vid, mask->vlan_id);
+			MLX5_SET(fte_match_set_lyr_2_4, headers_v, first_vid, key->vlan_id);
+		}
+	}
+
 	if (addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS) {
 		struct flow_dissector_key_ipv4_addrs *key =
 			skb_flow_dissector_target(f->dissector,
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 6/9] net/mlx5e: Refactor retrival of skb from rx completion element (cqe)
From: Saeed Mahameed @ 2016-09-22 17:01 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Hadar Hen-Zion, Jiri Pirko, Andy Gospodarek,
	Jesse Brandeburg, John Fastabend, Amir Vadai, Saeed Mahameed
In-Reply-To: <1474563709-17943-1-git-send-email-saeedm@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>

Factor the relevant code into a static inline helper (skb_from_cqe)
doing that.

Move the call to napi_gro_receive to be carried out just
after mlx5e_complete_rx_cqe returns.

Both changes are to be used for the VF representor as well
in the next commit.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 41 +++++++++++++++++--------
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 0a81bd3..e836e47 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -629,7 +629,6 @@ static inline void mlx5e_complete_rx_cqe(struct mlx5e_rq *rq,
 	rq->stats.packets++;
 	rq->stats.bytes += cqe_bcnt;
 	mlx5e_build_rx_skb(cqe, cqe_bcnt, rq, skb);
-	napi_gro_receive(rq->cq.napi, skb);
 }
 
 static inline void mlx5e_xmit_xdp_doorbell(struct mlx5e_sq *sq)
@@ -733,20 +732,15 @@ static inline bool mlx5e_xdp_handle(struct mlx5e_rq *rq,
 	}
 }
 
-void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
+static inline
+struct sk_buff *skb_from_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe,
+			     u16 wqe_counter, u32 cqe_bcnt)
 {
 	struct bpf_prog *xdp_prog = READ_ONCE(rq->xdp_prog);
 	struct mlx5e_dma_info *di;
-	struct mlx5e_rx_wqe *wqe;
-	__be16 wqe_counter_be;
 	struct sk_buff *skb;
-	u16 wqe_counter;
 	void *va, *data;
-	u32 cqe_bcnt;
 
-	wqe_counter_be = cqe->wqe_counter;
-	wqe_counter    = be16_to_cpu(wqe_counter_be);
-	wqe            = mlx5_wq_ll_get_wqe(&rq->wq, wqe_counter);
 	di             = &rq->dma_info[wqe_counter];
 	va             = page_address(di->page);
 	data           = va + MLX5_RX_HEADROOM;
@@ -757,22 +751,21 @@ void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 				      rq->buff.wqe_sz,
 				      DMA_FROM_DEVICE);
 	prefetch(data);
-	cqe_bcnt = be32_to_cpu(cqe->byte_cnt);
 
 	if (unlikely((cqe->op_own >> 4) != MLX5_CQE_RESP_SEND)) {
 		rq->stats.wqe_err++;
 		mlx5e_page_release(rq, di, true);
-		goto wq_ll_pop;
+		return NULL;
 	}
 
 	if (mlx5e_xdp_handle(rq, xdp_prog, di, data, cqe_bcnt))
-		goto wq_ll_pop; /* page/packet was consumed by XDP */
+		return NULL; /* page/packet was consumed by XDP */
 
 	skb = build_skb(va, RQ_PAGE_SIZE(rq));
 	if (unlikely(!skb)) {
 		rq->stats.buff_alloc_err++;
 		mlx5e_page_release(rq, di, true);
-		goto wq_ll_pop;
+		return NULL;
 	}
 
 	/* queue up for recycling ..*/
@@ -782,7 +775,28 @@ void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	skb_reserve(skb, MLX5_RX_HEADROOM);
 	skb_put(skb, cqe_bcnt);
 
+	return skb;
+}
+
+void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
+{
+	struct mlx5e_rx_wqe *wqe;
+	__be16 wqe_counter_be;
+	struct sk_buff *skb;
+	u16 wqe_counter;
+	u32 cqe_bcnt;
+
+	wqe_counter_be = cqe->wqe_counter;
+	wqe_counter    = be16_to_cpu(wqe_counter_be);
+	wqe            = mlx5_wq_ll_get_wqe(&rq->wq, wqe_counter);
+	cqe_bcnt       = be32_to_cpu(cqe->byte_cnt);
+
+	skb = skb_from_cqe(rq, cqe, wqe_counter, cqe_bcnt);
+	if (!skb)
+		goto wq_ll_pop;
+
 	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
+	napi_gro_receive(rq->cq.napi, skb);
 
 wq_ll_pop:
 	mlx5_wq_ll_pop(&rq->wq, wqe_counter_be,
@@ -861,6 +875,7 @@ void mlx5e_handle_rx_cqe_mpwrq(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 
 	mlx5e_mpwqe_fill_rx_skb(rq, cqe, wi, cqe_bcnt, skb);
 	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
+	napi_gro_receive(rq->cq.napi, skb);
 
 mpwrq_cqe_out:
 	if (likely(wi->consumed_strides < rq->mpwqe_num_strides))
-- 
2.7.4

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox